Why Do GANs Need So Much Noise?



Image Source: Pexels

Generative Adversarial Networks (GANs) are a tool for generating new, “fake” samples given a set of old, “real” samples. These samples can be practically anything: hand-drawn digits, photographs of faces, expressionist paintings, you name it. To do this, GANs learn the underlying distribution behind the original dataset. Throughout training, the generator approximates this distribution while the discriminator tells it what it got wrong, and the two improve in alternation through an arms race. In order to draw random samples from the distribution, the generator is given random noise as input. But have you ever wondered why GANs need random input? The common answer is “so they don’t generate the same thing every time”, and that’s true, but the full answer is a bit more nuanced than that.

Random Sampling

Before we continue with GANs, let’s take a detour and consider sampling from the normal distribution. Suppose you want to do this in Python, but you never read the numpy docs and don’t know that np.random.normal() exists. Instead, all you’ve got to work with is random.random(), which produces values uniformly in the half-open interval [0, 1).


Figure 1: A histogram of 100k samples drawn from our input, uniform distribution (blue) and our target, normal distribution (orange).
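
To make the setup concrete, here is a minimal sketch of how the two histograms in figure 1 might be produced (assuming numpy and matplotlib for plotting; the call to np.random.normal() is only there to draw the target, since the premise is that we only get to sample with random.random()):

import random
import numpy as np
import matplotlib.pyplot as plt

# Our only sampling tool: 100k draws from random.random(), uniform on [0, 1)
uniform_samples = np.array([random.random() for _ in range(100_000)])

# The target we want to reach (drawn with numpy purely for reference)
normal_samples = np.random.normal(loc=0.0, scale=1.0, size=100_000)

plt.hist(uniform_samples, bins=100, alpha=0.6, label="uniform (input)")
plt.hist(normal_samples, bins=100, alpha=0.6, label="normal (target)")
plt.legend()
plt.show()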

In short, we want to transform the blue distribution into the orange distribution in figure 1. Fortunately, there is a function to do this: the inverse cumulative distribution function, also called the quantile function. The (non-inverted) cumulative distribution function, or CDF, illustrated in figure 2, describes the probability that any random value drawn from the distribution in question will be equal to or less than x, for some specified x.


Figure 2: The CDF of the standard normal distribution.

For instance, at the point x=0 in figure 2, y=0.5; this means that 50% of the distribution lies below zero. A handy quality of the CDF is that the output ranges from 0 to 1, which is exactly the input we have available to us from the random.random() function! If we invert the CDF (flip it on its side), we get the quantile function:


Figure 3: The quantile function of the standard normal distribution.

This function gives us the exact relationship between the quantile (our x, ranging from 0 to 1) and the corresponding value in the normal distribution, allowing us to sample directly from the normal distribution. That is, f(random.random()) ~ N(0, 1), where each point in the input space corresponds to a unique point in the output space.


Figure 4: An animation illustrating the uniform distribution (blue) being mapped to the normal distribution (orange) using the quantile function.
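
In code, this whole trick is one line of inverse-transform sampling. Here is a sketch that uses scipy’s norm.ppf as the quantile function (scipy is my choice for illustration; any implementation of the standard normal quantile function would do):

import random
from scipy.stats import norm

norm.cdf(0.0)   # 0.5: half of the standard normal's mass lies below zero

# norm.ppf is the quantile function (inverse CDF) of the standard normal.
# random.random() returns values in [0, 1); ppf(0.0) would be -inf, but
# drawing exactly 0.0 is vanishingly unlikely in practice.
samples = [norm.ppf(random.random()) for _ in range(100_000)]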

What does this have to do with GANs?

In the above scenario, we had the quantile function at our disposal, but what if we didn’t, and had to learn a mapping from the input space to the output space? That is exactly the problem that GANs aim to solve. In a previous article, I illustrated how GANs can be used to sample from the normal distribution if you’re in a data emergency and don’t have the quantile function available to you. In this light, I find it much more helpful to think of GANs not as tools for random sampling, but as functions that map some k-dimensional latent (input) space to some p-dimensional sample (output) space, which can then be used to transform samples from the latent space into samples from the sample space. In this view, much like the quantile function, there’s no randomness involved.
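
To emphasize that point, here is a toy, untrained, purely hypothetical generator in PyTorch: nothing more than a deterministic function from a k-dimensional latent space to a p-dimensional sample space, with no randomness of its own.

import torch
import torch.nn as nn

k, p = 1, 2  # latent (input) dimension, sample (output) dimension

# An illustrative generator: just a deterministic map from R^k to R^p
G = nn.Sequential(nn.Linear(k, 32), nn.ReLU(), nn.Linear(32, p))

z = torch.rand(5, k)             # the only randomness lives in the latent samples
print(G(z).shape)                # torch.Size([5, 2]): five points in sample space
print(torch.equal(G(z), G(z)))   # True: same latent input, same output, every time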

With mappings in mind, let’s consider how we might draw random samples from a 2D normal distribution with only 1D random samples between 0 and 1 as input.


Figure 5: A 2D normal distribution (orange) and a 1D uniform distribution (blue), each with 100k samples.

How would we map the 100k samples in that blue line to the 100k samples in the orange blob? There’s no good way to do it. Sure, we could use Peano curves, but then we lose the useful property that points close together in the input space end up close together in the output space, and vice versa. It’s for this reason that the dimensionality of the latent space of a GAN must equal or exceed the dimensionality of its sample space. That way, the function has enough degrees of freedom to map the input to the output.

But just for fun, let’s visualize what happens when a GAN with only one-dimensional input is tasked with learning multi-dimensional distributions. The results hopefully won’t surprise you, but they are fun to watch.

2D Gaussian

Let’s start out with the issue illustrated in figure 5: mapping the 1D range between 0 and 1 to the 2D normal (or “Gaussian”) distribution. We will be using a typical vanilla GAN architecture (code available at the end of the article).
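
The exact architecture and hyperparameters live in the repo linked at the end of the article; as a rough sketch of what “vanilla GAN” means here, something along these lines (the layer sizes, learning rates, and batch size below are my own guesses, written in PyTorch):

import torch
import torch.nn as nn

latent_dim, sample_dim = 1, 2  # 1D noise in, 2D points out

G = nn.Sequential(             # generator: latent space -> sample space
    nn.Linear(latent_dim, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, sample_dim),
)
D = nn.Sequential(             # discriminator: sample -> probability of being real
    nn.Linear(sample_dim, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

opt_G = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_D = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()
batch = 256

for step in range(30_000):
    real = torch.randn(batch, sample_dim)  # the target: a 2D standard normal
    z = torch.rand(batch, latent_dim)      # the latent input: uniform noise
    fake = G(z)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    d_loss = (bce(D(real), torch.ones(batch, 1)) +
              bce(D(fake.detach()), torch.zeros(batch, 1)))
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # Generator step: try to make the discriminator call the fakes real
    g_loss = bce(D(fake), torch.ones(batch, 1))
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()

Swapping latent_dim for 2, 3, 10, or 100 gives the comparison runs shown later in this article.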


Figure 6: A GAN with a latent dimension of 1 trying to learn the 2D Gaussian distribution. Grey points are samples drawn from the true distribution, red points are generated samples. Each frame is one training step.

As you can see, the poor thing is at a loss for what to do. Having only one degree of freedom, it is hardly able to explore the sample space. What’s worse, because the generated samples are so densely packed along that 1D manifold (there are as many grey dots in this gif as red dots!), the discriminator is able to slack off, never having to try hard to discern the real points from the fakes, and as such the generator doesn’t get very useful information (and certainly not enough to learn a space-filling curve, even if it had the capacity!).

Figure 6 shows the first 600 training steps. After 30k, this was the result:


Figure 7: The distribution learned by the GAN from figure 6 after 30k training steps.

It’s a cute little squiggle, but hardly a Gaussian distribution. The GAN completely failed to learn the mapping after 30k steps. For context, let’s consider how a GAN with the same architecture and training routine fares when given 2D, 3D, 10D, and 100D latent spaces to map to the above distribution:


Figure 8: Output from GANs with latent spaces of 2D, 3D, 10D, and 100D after 30k training steps

The 2D-latent-space GAN is much better than the 1D GAN above, but it is still nowhere near the target distribution and has several obvious kinks in it. The 3D and 10D latent spaces produced GANs with visually convincing results, and the 100D GAN produced what appears to be a Gaussian distribution with the right variance but the wrong mean. But we should keep in mind that the high-dimensional GANs are cheating on this particular problem, since the mean of many independent uniform random variables is approximately normally distributed.
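
That “cheat” is just the central limit theorem at work, and it is easy to verify numerically:

import numpy as np

# The mean of 100 independent uniforms is already close to Gaussian, which is
# why a 100D uniform latent space gets a head start on this particular target.
means = np.random.rand(100_000, 100).mean(axis=1)
print(means.mean(), means.std())  # roughly 0.5 and 0.029 (i.e. sqrt(1/1200))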

Eight Gaussians


Figure 9: The eight Gaussians distribution

The eight Gaussians distribution (figure 9) is exactly what it sounds like: a mixture of eight 2D Gaussians arranged in a circle about the origin, each with small enough variance that they hardly overlap, and with zero covariance. Although the sample space is 2D, a reasonable encoding of this distribution has three dimensions: the first dimension being discrete and describing the mode (numbered one through eight), and the other two describing the x and y displacement from that mode, respectively.
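
The exact parameters of the target aren’t stated here (they are in the repo); a plausible sampler, with the circle radius and per-mode standard deviation as guesses, looks like this:

import numpy as np

def sample_eight_gaussians(n, radius=2.0, std=0.1):
    """Mixture of eight Gaussians evenly spaced on a circle about the origin."""
    angles = np.random.randint(0, 8, size=n) * (2 * np.pi / 8)  # pick one of 8 modes
    centers = np.stack([radius * np.cos(angles), radius * np.sin(angles)], axis=1)
    return centers + np.random.normal(scale=std, size=(n, 2))   # isotropic noise per mode

real_points = sample_eight_gaussians(100_000)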

I trained a GAN with latent_dim=1 on the eight Gaussians distribution for 600 steps, and these were the results:


Figure 10: A GAN with a latent dimension of 1 trying to learn the eight Gaussians distribution. Grey points are samples drawn from the true distribution, red points are generated samples. Each frame is one training step.

As expected, the GAN struggles to learn an effective mapping. After 30k steps, this is the learned distribution:


Figure 11: The distribution learned by the GAN from figure 10 after 30k training steps.

The GAN is clearly struggling to map the 1D latent space to this effectively three-dimensional distribution: the rightmost mode is ignored, a considerable number of samples are generated between modes, and the samples aren’t normally distributed. For comparison, let’s consider four more GANs after 30k steps, with latent dimensions of 2, 3, 10, and 100:


Figure 12: Output from GANs with latent spaces of 2D, 3D, 10D, and 100D after 30k training steps

It’s hard to tell which is best without actually measuring the KL divergence between the true distribution and the learned distribution (coming soon™️ in a follow-up article!), but the low-dimensional GANs seem to produce fewer samples in the negative space between modes. Even more interesting, the 2D GAN does not show mode collapse, the 3D and 10D GANs show only slight mode collapse, and the 100D GAN failed to generate samples in two of the modes.

Spiral


Figure 13: Spiral distribution. The distribution decreases in density as the spiral extends outward from the circle, and is uniform in density laterally across the arm

The spiral distribution, illustrated in figure 13, is in some ways simpler than the eight Gaussians distribution. Because it has only one mode (albeit an elongated and twisty one), the GAN isn’t forced to discretize its continuous input. The distribution can be described efficiently with two dimensions: one describing position along the spiral, the other describing position laterally within the spiral.
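
Again, the precise target lives in the repo; a rough guess at a sampler with the properties described in figure 13 (all constants here are assumptions) might be:

import numpy as np

def sample_spiral(n, turns=2.0, width=0.2):
    """An Archimedean-style spiral: density thins out along the arm because the
    angle is sampled uniformly while the arm stretches, and the lateral (radial)
    jitter is uniform across the arm's width."""
    t = np.random.rand(n) * turns * 2 * np.pi                     # position along the spiral
    r = 0.5 + 0.3 * t + np.random.uniform(-width, width, size=n)  # lateral jitter
    return np.stack([r * np.cos(t), r * np.sin(t)], axis=1)

real_points = sample_spiral(100_000)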

I trained a GAN with latent_dim=1 for 600 steps, and these were the results:


Figure 14: A GAN with a latent dimension of 1 trying to fit the spiral distribution. Grey points are samples drawn from the true distribution, red points are generated samples. Each frame is one training step.

Again, the GAN struggles to learn an effective mapping. After 30k steps, this is the learned distribution:


Figure 15: The distribution learned by the GAN from figure 14 after 30k training steps.

Similar to the case of the eight Gaussians distribution, the GAN does a poor job of mapping the spiral distribution. Two regions of the spiral are omitted and many samples are generated in the negative space. I address this inefficient mapping problem in detail in another article, so I won’t belabour the point here; instead, let’s consider four more GANs tasked with learning this distribution after 30k steps, again with latent dimensions of 2, 3, 10, and 100:


Figure 16: Output from GANs with latent spaces of 2D, 3D, 10D, and 100D after 30k training steps

Again, it’s hard to tell which is best without actually measuring the KL divergence, but the differences in coverage, uniformity, and amount of sampling in negative space are interesting to consider.

Closing Thoughts

It’s easy to get caught up in the GAN fervor and treat these models like magic machines that use random numbers as fuel to pop out new samples. Understanding the fundamentals of how a tool works is essential to using it effectively and troubleshooting it when it breaks. With GANs, that means understanding that the generator is learning a mapping from some latent space to some sample space, and understanding how that learning unfolds. The extreme case of mapping a 1D distribution to a higher-dimensional one clearly illustrates how complicated this task is.

All code used in this project is available in the accompanying GitHub repo.

