A tool for Collaborating over GAN’s latent space



In January 2020 we finalized the development phase of Marrow. Shirin Anlen and I are sharing lessons learned during this process, and our post about optimizing and augmenting a small dataset was recently published on towardsdatascience. This post looks at how custom web-based tools can inspire a collaborative artistic workflow when working with machine learning models.

Apr 29 · 8 min read

Shadow animation from GAN’s latent space using the web explorer tool

Myself and Marrow

Marrow is a hands-on research project and an interactive theater experience by Shirin Anlen that explores the possibilities of mental disorders in machine learning. I have previously worked with Shirin on a number of projects, most notably the VR documentary Tzina: Symphony of Longing. In 2018 I joined Shirin to preview Marrow as an installation at IDFA Doclab 2018. The prototype was a success, and one year later we entered, as collaborators, an intensive development phase co-produced by the National Film Board of Canada and Atlas V.

About GAN and its latent space

The Generative Adversarial Network, or GAN, was the first machine learning model we decided to research. It focuses on generative visual imagery and exhibits a very clear dissonance if you attempt to train it on complex concepts using banal stock images. In a previous post we described how we created a dataset of ‘Perfect family dinner’ images and used it to train StyleGAN V1. This particular dataset was constructed to serve the story of the experience: that of a dysfunctional family that sees itself only through the distorted data it was trained on. Because of this, we aimed for results that are imperfect and that expose the glitches that emerge when the model tries to go deep into social narratives.

Our dataset was a bundle of around 6,500 images containing figures of four family members, stripped from their family dinner setting. Once StyleGAN finished the training process, we ended up with a vast space of possibilities for newly generated images containing four distorted familial figures. This infinite, continuous space of possible output images is called the Latent Space. It is “latent” because the output image generated by GAN is determined by a seemingly hidden process of mathematical transformations, starting from a series of numbers and ending with a bitmap image. When you change any of the initial numbers in the series, the resulting image is slightly different. The transformation network is so deep that it’s hard to predict what will change in the image.
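To make the idea of "a series of numbers in, a bitmap out" more concrete, here is a minimal sketch in plain Python. It does not use our trained StyleGAN model: `make_toy_generator` is a hypothetical stand-in (a small fixed random network), and the 512-number latent size and the 8x8 output are assumptions chosen for illustration. It shows how sliding gradually from one series of numbers to another with simple linear interpolation yields a sequence of gradually changing images, which is the basic idea behind a latent space animation.

```python
import math
import random

LATENT_DIM = 512  # assumed latent size for this toy example
IMAGE_SIDE = 8    # toy resolution, far smaller than a real GAN output


def make_toy_generator(seed=0):
    """Hypothetical stand-in for a trained generator.

    Returns a function that maps a latent vector (a list of LATENT_DIM
    floats) to a flat list of IMAGE_SIDE * IMAGE_SIDE pixel values in [0, 1].
    """
    rng = random.Random(seed)
    hidden = 16
    w1 = [[rng.gauss(0, 1) / math.sqrt(LATENT_DIM) for _ in range(LATENT_DIM)]
          for _ in range(hidden)]
    w2 = [[rng.gauss(0, 1) / math.sqrt(hidden) for _ in range(hidden)]
          for _ in range(IMAGE_SIDE * IMAGE_SIDE)]

    def generate(z):
        # Two fixed layers: latent numbers -> hidden features -> pixels.
        h = [math.tanh(sum(w * x for w, x in zip(row, z))) for row in w1]
        return [0.5 + 0.5 * math.tanh(sum(w * x for w, x in zip(row, h)))
                for row in w2]

    return generate


def sample_latent(rng):
    """Draw one random point in the latent space."""
    return [rng.gauss(0, 1) for _ in range(LATENT_DIM)]


def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors, with t in [0, 1]."""
    return [(1 - t) * a + t * b for a, b in zip(z_a, z_b)]


def latent_walk(generate, z_a, z_b, steps):
    """Generate `steps` frames (steps >= 2) moving from z_a to z_b."""
    return [generate(lerp(z_a, z_b, i / (steps - 1))) for i in range(steps)]


rng = random.Random(42)
generate = make_toy_generator()
z_start, z_end = sample_latent(rng), sample_latent(rng)
frames = latent_walk(generate, z_start, z_end, steps=10)
print(len(frames), "frames of", len(frames[0]), "pixels each")
```

With a real model, `generate` would be replaced by a call into the trained network. Linear interpolation is only the simplest choice for moving between two points; other interpolation schemes are also commonly used.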

An animation of latent space transitions

If you have a good enough dataset and algorithm, you might be able to reach disentanglement: that is, when each of the input numbers controls one meaningful element in the resulting image. For example, one number would change the age of one generated person, while another changes their hair color. Needless to say, we were not able to achieve disentanglement with our small dataset. A change in a single number from the initial series could induce various changes in multiple family members. The same number could simultaneously control one family member’s pose, another member’s smile, and the appearance of a Christmas hat on a third figure (a repeating motif in stock images, it seems). The family members were in fact entangled.
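The difference between the two situations can be illustrated with a toy comparison. This is a sketch under stated assumptions, not a measurement of our model: both "generators" below are hypothetical linear mappings with 4 input numbers and 16 output pixels. In the disentangled one, each input number drives its own block of pixels; in the entangled one, every input number feeds into every pixel through a dense random matrix. Nudging a single input and counting the pixels that move shows the contrast.

```python
import random

LATENT_DIM = 4   # toy latent size (assumption)
NUM_PIXELS = 16  # toy image size (assumption)


def entangled_generator(seed=0):
    """Hypothetical dense mapping: every input number touches every pixel."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(LATENT_DIM)]
               for _ in range(NUM_PIXELS)]
    return lambda z: [sum(w * x for w, x in zip(row, z)) for row in weights]


def disentangled_generator():
    """Hypothetical ideal mapping: input k drives only pixel block k."""
    block = NUM_PIXELS // LATENT_DIM
    return lambda z: [z[i // block] for i in range(NUM_PIXELS)]


def pixels_affected(generate, z, index, delta=0.5, eps=1e-9):
    """Count the pixels that change when one latent number is nudged."""
    nudged = list(z)
    nudged[index] += delta
    before, after = generate(z), generate(nudged)
    return sum(1 for b, a in zip(before, after) if abs(a - b) > eps)


z = [0.1, -0.3, 0.7, 0.2]
entangled_count = pixels_affected(entangled_generator(), z, index=0)
disentangled_count = pixels_affected(disentangled_generator(), z, index=0)
print("entangled:", entangled_count, "disentangled:", disentangled_count)
```

In the disentangled toy mapping only the one block of pixels tied to the nudged number changes, while in the dense mapping the change spreads across the whole image, much like a single number altering a pose, a smile, and a Christmas hat at once.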

The Shadow Allegory

Marrow tracks the ‘thinking’ process of each of its models and questions what could go wrong. In GAN, the latent space gives us information about how input data is broken down and then reconstructed into something new. But as intriguing as visualizing the latent space is, we were looking for ways to integrate storytelling into the experience. We wanted to materialize GAN’s distorted image of the world.

When watching the ongoing training process of GAN, we started noticing things that are other than human coming from the source dataset. It was like staring at Rorschach tests: flat images that appear different depending on who is watching. We realized that we were learning more about GAN not by seeing the result that we expected, but by seeing its in-between spaces. Plato’s Allegory of the Cave speaks about finding meaning in the simple and flattened representation of things. The people in the allegory are stuck in a cave with a fire burning behind them. The fire projects the shadows of passing objects on the cave’s walls, and that is all they can see of reality. They are so used to those shadows that once a prisoner breaks free, their eyes get burned by the flaring sun. When the prisoner’s eyes are finally accustomed to reality, they come back to the cave to tell the others, but now they are unable to see anything in the darkness. The other prisoners assume that something evil lies outside.

Interestingly, Plato’s Allegory of the Cave corresponds quite well with the structure and training process of GAN. GAN is in constant conflict between reality, representations of reality, and fantasy. When the algorithm generates images that are too close to the original dataset, it finds itself stuck in a simple and flat representation of the world, unable to escape to pathways of creativity. When GAN’s generations are too fantastical, they are inevitably deemed fake and wrong. GAN is in a constant struggle to find the balance between the real and the imaginary. Therefore, we decided to visualize GAN’s struggle by using the shadow representation of the distorted family outputs.

Transitions in full color VS in shadow mode

Animating over the latent space

Marrow is an interactive theater piece where the participants play the role of machine learning models in a family dinner setting. In the experience, a participant who represents GAN tells their story about the difficulties they face in discerning memory from imagination. Both of those perceptions are in fact distorted in GAN, so at this phase we decided to explore an additional, fantastical animated layer over the world of shadows, one that would represent the character’s struggle between the real and the fake. We worked with the talented Paloma Dawkins, a master of hand-drawn animations and alternate dimensions. Now we had to ask ourselves: how do we orchestrate a workflow that starts in the mathematical depths of GAN, but ends with hand-drawn animations that perfectly match GAN’s latent movements across the image space? The answer came in the form of our custom-developed tool: Marrow GAN Explorer.
