OpenAI’s fiction-spewing AI is learning to generate images

栏目: IT技术 · 发布时间: 3年前

Artificial intelligence / Machine learning

OpenAI’s fiction-spewing AI is learning to generate images

By training GPT-2 on pixels instead of words, the model can accept half an image and predict how to complete it.

July 16, 2020

Ben Barry / OpenAI

In February of last year, the San Francisco-based research lab OpenAIannounced that itsAI system could now write convincing passages of English. Feed the beginning of a sentence or paragraph into GPT-2, as it was called, and it could continue the thought for as long as an essay with almost human-like coherence.

Now, the lab is exploring what would happen if the same algorithm were instead fed part of an image. The results , which were given an honorable mention for best paper award at this week’s International Conference on Machine Learning, open up a new avenue for image generation, ripe with opportunity and consequences.

At its core, GPT-2 is a powerful prediction engine. It learned to grasp the structure of the English language by looking at billions of examples of words, sentences, and paragraphs, scraped from the corners of the internet. With that structure, it could then manipulate words into new sentences by statistically predicting the order in which they should appear.

So researchers at OpenAI decided to swap the words for pixels and train the same algorithm on images in ImageNet, the most popular image bank for deep learning. Because the algorithm was designed to work with one-dimensional data, i.e.: strings of text, they unfurled the images into a single sequence of pixels. They found that the new model, named iGPT, was still able to grasp the two-dimensional structures of the visual world. Given the sequence of pixels for the first half of an image, it could predict the second half in ways that a human would deem sensible.

Below, you can see a few examples. The left-most column is the input, the right-most column is the original, and the middle columns are iGPT’s predicted completions. (See more examples here .)

OpenAI’s fiction-spewing AI is learning to generate images

OPENAI

The results are startlingly impressive and demonstrate a new path for using unsupervised learning, which trains on unlabeled data, in the development of computer vision systems. While early computer vision systems in the mid-2000s trialed such techniques before, they fell out of favor as supervised learning, which uses labeled data, proved far more successful. The benefit of unsupervised learning, however, is that it allows an AI system to learn about the world without a human filter, and significantly reduces the manual labor of labeling data.

The fact that iGPT uses the same algorithm as GPT-2 also shows its promising adaptability across domains. This is in line with OpenAI’s ultimate ambition to achieve more generalizable machine intelligence.

At the same time, the method presents a concerning new way to create deepfake images. Generative adversarial networks , the most common category of algorithms used to create deepfakes in the past, must be trained on highly curated data. To get a GAN to generate a face, for example, its training data should only include faces. iGPT, by contrast, simply learns enough of the structure of the visual world across millions and billions of examples to spit out images that could feasibly exist within it. While training the model is still computationally expensive, offering a natural barrier to its access, that may not be the case for long.

OpenAI did not grant an interview request, and therefore did not provide additional context for  future plans with regards to its research. But in an internal policy team meeting that MIT Technology Review attended last year, its policy director Jack Clark mused about the future risks of GPT-style generation, including what would happen if it were applied to images. “Video is coming,” he said, projecting where he saw the field’s research trajectory going. “In probably five years, you’ll have conditional video generation over a five to ten second horizon. The sort of thing I’m imagining is eventually you’ll be able to put a photo of Angela Merkel as the condition, with an explosion next to her, and it will generate a likely output, which will be Angela Merkel getting killed.”


以上就是本文的全部内容,希望对大家的学习有所帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

21天学通C语言

21天学通C语言

(美国)琼斯(Bradley L.Jones) (美国)埃特肯(Peter Aitken) / 信达工作室 / 人民邮电出版社 / 2012-8 / 69.00元

《21天学通C语言(第6版•修订版)》是初学者学习C语言的经典教程。本版按最新的标准(ISO∕IEC:9899-1999),以循序渐进的方式介绍了C语言编程方面知识,并提供了丰富的实例和大量的练习。通过学习实例,并将所学的知识用于完成练习,读者将逐步了解、熟悉并精通C语言。《21天学通C语言(第6版•修订版)》包括四周的课程。第一周的课程介绍了C语言程序的基本元素,包括变量、常量、语句、表达式、函......一起来看看 《21天学通C语言》 这本书的介绍吧!

随机密码生成器
随机密码生成器

多种字符组合密码

SHA 加密
SHA 加密

SHA 加密工具

Markdown 在线编辑器
Markdown 在线编辑器

Markdown 在线编辑器