Alphabet GAN: AI Generates English Letters!

Category: IT Technology · Published: 3 years ago



This is how I created a GAN that can generate the letters of the English alphabet.

Jul 12 · 7 min read


Evolution of the GAN’s output over the epochs

First, you need to know what a GAN really is. Well, here’s a brief description. A Generative Adversarial Network (GAN) combines two models: a Generator and a Discriminator. The Generator tries to produce fake data that mimics the original data, while the Discriminator tries to tell whether a given sample is original or fake. Thanks to this adversarial setup, both models keep getting better at their tasks over time. Of course, there’s much more to understand about GANs. Please watch this video if you are curious…

How do GANs work?

In this article, I want to show you how to implement one such GAN. I’ll also share a whole bunch of tips that will help you train your first GAN. But before jumping into the model, let’s understand the dataset.

Dataset: A-Z Handwritten Alphabets

Here, I’m using an MNIST-style dataset of handwritten English letters. The A-Z dataset contains 372,450 characters from 26 classes. Each sample is a grayscale image of a single letter. As in MNIST, each image is 28px*28px and is represented as a 784-dimensional (28*28) vector. Let’s visualize a few of them…


100 random images from the EMNIST Letters dataset

Originally, the pixel values lie in the range [0, 255], and we should normalize them before feeding them to any machine learning model. Usually we scale the pixels to [0, 1] by dividing by 255.0, but here we scale them to [-1, 1] instead. This is because the generator’s last layer uses the tanh function (range of tanh = (-1, 1)), so real and generated images end up on the same scale.
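A minimal sketch of that scaling and its inverse (the inverse is handy for turning generator outputs back into viewable images). The function names are hypothetical, and plain Python lists are used so the snippet runs anywhere; the same expression `x / 127.5 - 1.0` also works element-wise on a NumPy array.

```python
# Map raw pixels in [0, 255] to the tanh range [-1, 1] and back.
# Hypothetical helper names; not taken from the original code.

def to_tanh_range(pixels):
    """Scale raw pixel values in [0, 255] to [-1, 1]."""
    return [p / 127.5 - 1.0 for p in pixels]


def to_pixel_range(values):
    """Invert the scaling: map values in [-1, 1] back to [0, 255]."""
    return [(v + 1.0) * 127.5 for v in values]


print(to_tanh_range([0, 255]))       # [-1.0, 1.0]
print(to_pixel_range([-1.0, 1.0]))   # [0.0, 255.0]
```

Dividing by 127.5 (half of 255) and subtracting 1 is just the [0, 1] scaling stretched to twice the width and shifted down by one.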

Now let’s build our GAN. I like to do it in 4 steps.

1. Build the Generator (G)

The generator is a neural network that takes a 100-dimensional noise vector as input and outputs an image of a single English letter. As we are working with image data, it makes sense to use a Convolutional Neural Network. The idea is to increase the spatial dimensions of the input as it passes through the layers until it reaches the desired output shape (28px*28px). The first two layers of the network are Dense layers with ReLU activation. I’d highly recommend applying BatchNormalization to the output of each layer.

Note: BatchNormalization makes the training converge faster. A lot faster.

Notice that the first Dense layer contains 1024 neurons and the second one contains 6272 neurons. After that comes the Reshape layer. The reshaping is important because we want to apply convolutions next, and convolutions operate on matrix-like feature maps (height*width*channels) rather than on flat row/column vectors.

Note: To find the correct dimensions, think backward! First, decide the dimensions of the matrices (7*7) and how many of them you want (128), then multiply them to get the size of the Dense layer (7*7*128 = 6272).
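The “think backward” rule can be written as a tiny calculation: start at the 28*28 output, undo each (2, 2) upsampling step, then multiply by the number of feature maps. The helper below is a hypothetical illustration that assumes the layout used here (two upsampling steps, 128 maps after the Reshape layer).

```python
# Work backward from the output size to the Dense layer size.
# Assumes two (2, 2) upsampling steps and 128 feature maps after Reshape.

def dense_units_for(target_side, n_upsamples, n_maps, factor=2):
    """Return (side of the reshaped maps, units the Dense layer needs)."""
    side = target_side
    for _ in range(n_upsamples):
        if side % factor:
            raise ValueError("target size is not divisible by the upsampling factor")
        side //= factor              # undo one upsampling step
    return side, side * side * n_maps


side, units = dense_units_for(28, n_upsamples=2, n_maps=128)
print(side, units)   # 7 6272
```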

Before applying convolution, we upsample the matrices. I’ve used (2, 2) upsampling, which increases their dimensions from 7*7 to 14*14.

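UpSampling is a kind of inverse of Pooling: where 2*2 pooling shrinks each 2*2 block to a single value, (2, 2) upsampling expands each value into a 2*2 block.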
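Here is a small sketch of that relationship on a single 2D feature map. It uses nearest-neighbour repetition, which is what Keras’s `UpSampling2D` does with its default interpolation; the helper names are hypothetical. Max-pooling an upsampled map gives back the original, although the reverse direction is not true in general, which is why upsampling is only “a kind of” inverse.

```python
# (2, 2) nearest-neighbour upsampling and 2*2 max pooling on one feature map.

def upsample_nearest(grid, factor=2):
    """Repeat every value into a factor*factor block."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]   # repeat columns
        for _ in range(factor):
            out.append(list(wide))                        # repeat rows
    return out


def max_pool(grid, size=2):
    """Non-overlapping max pooling with a size*size window."""
    return [
        [max(grid[i + di][j + dj] for di in range(size) for dj in range(size))
         for j in range(0, len(grid[0]), size)]
        for i in range(0, len(grid), size)
    ]


small = [[1, 2],
         [3, 4]]
big = upsample_nearest(small)
# big == [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(max_pool(big) == small)   # True
```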

After that, we have a convolution layer with 64 filters of size 2*2. Notice that I have initialized the kernel weights from a Normal distribution. The output is batch-normalized, and the activation for this layer is LeakyReLU. Then we again have an upsampling layer followed by a convolution layer. This time, the UpSampling layer outputs 28*28 matrices. The last convolution layer contains only 1 filter because we want a single channel for our grayscale image. Its activation function is tanh, which is why we normalized the pixel values to [-1, 1].

Note: We could have avoided the UpSampling layers by using transposed convolutions, since they can also increase the matrix dimensions.
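To see why a transposed convolution can replace upsampling, here is a 1D sketch with stride 2 and no padding (Keras’s 2D layer for this is `Conv2DTranspose`; this plain-Python version is only an illustration). Each input value stamps a scaled copy of the kernel onto the output, so the length grows to (n - 1) * stride + kernel_size.

```python
# 1D transposed convolution, stride 2, no padding.

def conv_transpose_1d(x, kernel, stride=2):
    out = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, xi in enumerate(x):
        for k, wk in enumerate(kernel):
            out[i * stride + k] += xi * wk   # stamp a scaled kernel copy
    return out


# With the fixed kernel [1, 1] this reproduces nearest-neighbour upsampling;
# in a real layer the kernel weights are learned instead.
print(conv_transpose_1d([1.0, 2.0, 3.0], [1.0, 1.0]))   # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
print(len(conv_transpose_1d([0.0] * 7, [1.0, 1.0])))    # 14
```

The design trade-off: upsampling followed by a convolution uses a fixed expansion plus a learned filter, while a transposed convolution learns the expansion itself in one layer.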

Code:

The generator

Architecture:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_1 (Dense)              (None, 1024)              103424    
_________________________________________________________________
batch_normalization_1 (Batch (None, 1024)              4096      
_________________________________________________________________
activation_1 (Activation)    (None, 1024)              0         
_________________________________________________________________
dense_2 (Dense)              (None, 6272)              6428800   
_________________________________________________________________
batch_normalization_2 (Batch (None, 6272)              25088     
_________________________________________________________________
activation_2 (Activation)    (None, 6272)              0         
_________________________________________________________________
reshape_1 (Reshape)          (None, 7, 7, 128)         0         
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 14, 14, 128)       0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 64)        32832     
_________________________________________________________________
batch_normalization_3 (Batch (None, 14, 14, 64)        256       
_________________________________________________________________
leaky_re_lu_1 (LeakyReLU)    (None, 14, 14, 64)        0         
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 28, 28, 64)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 28, 28, 1)         577       
=================================================================
Total params: 6,595,073
Trainable params: 6,580,353
Non-trainable params: 14,720
_________________________________________________________________

Did you notice that I didn’t compile the generator here? It gets compiled as part of the combined model in the 3rd step.
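The parameter counts in the generator summary above can be checked by hand, which is a useful habit when debugging an architecture. The sketch below assumes Keras’s usual counting: a Dense layer has in*out weights plus out biases, a Conv2D layer has k*k*in*out weights plus out biases, and BatchNormalization has 4 values per channel, of which 2 (the moving mean and variance) are not trainable. The last kernel size is inferred from the summary: 577 = 3*3*64*1 + 1 implies a 3*3 kernel.

```python
# Recompute the generator's parameter counts from its layer sizes.

def dense_params(n_in, n_out):
    return n_in * n_out + n_out                 # weights + biases

def conv_params(k, c_in, c_out):
    return k * k * c_in * c_out + c_out         # k*k kernel per in/out channel pair + biases

def bn_params(channels):
    # gamma and beta are trained; moving mean and variance are not
    return {"trainable": 2 * channels, "non_trainable": 2 * channels}

weighted = [
    dense_params(100, 1024),     # dense_1
    dense_params(1024, 6272),    # dense_2
    conv_params(2, 128, 64),     # conv2d_1 (2*2 kernels)
    conv_params(3, 64, 1),       # conv2d_2 (3*3 kernel, inferred from 577)
]
bns = [bn_params(c) for c in (1024, 6272, 64)]

trainable = sum(weighted) + sum(b["trainable"] for b in bns)
non_trainable = sum(b["non_trainable"] for b in bns)
print(trainable + non_trainable, trainable, non_trainable)  # 6595073 6580353 14720
```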

2. Build the Discriminator (D)

Our discriminator is just a binary classifier that takes a grayscale image as input and predicts whether it’s an original image or a fake one, i.e. one created by the generator. The first two layers are convolution layers. Notice that I’ve used a stride of 2, which makes each layer’s output smaller than its input (from 28*28 to 14*14, then to 5*5, as the summary below shows), so we don’t need Pooling layers. The filter size is 5*5 for both layers, but the second layer has more filters (128 vs 64).
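The output sizes follow from the standard formulas for Keras’s two padding modes: `"same"` gives ceil(n / stride), and `"valid"` gives floor((n - kernel) / stride) + 1. The text doesn’t say which padding each layer uses; the sketch below assumes `"same"` for the first layer and `"valid"` for the second, because that is the only combination that matches the summary shapes (28 to 14, then 14 to 5).

```python
import math

# Output side length of a strided convolution.

def conv_out(n, kernel, stride, padding):
    if padding == "same":
        return math.ceil(n / stride)
    if padding == "valid":
        return (n - kernel) // stride + 1
    raise ValueError(f"unknown padding: {padding}")


first = conv_out(28, kernel=5, stride=2, padding="same")        # 14
second = conv_out(first, kernel=5, stride=2, padding="valid")   # 5
print(first, second, second * second * 128)  # 14 5 3200 (the Flatten size)
```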

Note: While building the discriminator, keep in mind that our aim is to favor the generator, because generating fake images is the whole point. Hence, make the discriminator a bit weaker than the generator. For example, the discriminator here has far fewer parameters than the generator (about 1.0 million vs 6.6 million).

After the convolution layers, we need to Flatten the output so that we can pass it to a Dense layer. The Dense layer has 256 units, followed by 50% dropout. Finally, we have a single sigmoid output unit, just like any other binary classifier. Now we compile the discriminator: the loss should be binary cross-entropy, and I’ve used a custom Adam optimizer (learning rate = 0.0002).
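For reference, here is binary cross-entropy for a single prediction p = D(x) and label y, written out in plain Python. This is the standard formula; the clipping step mirrors the small epsilon Keras uses to avoid log(0), though the exact value here is an assumption.

```python
import math

# Binary cross-entropy for one prediction p in (0, 1) and a label y.

def binary_cross_entropy(y, p, eps=1e-7):
    p = min(max(p, eps), 1.0 - eps)     # keep p away from 0 and 1 to avoid log(0)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def mean_bce(labels, preds):
    """Average loss over a batch, as a training step would report it."""
    return sum(binary_cross_entropy(y, p) for y, p in zip(labels, preds)) / len(labels)


print(round(binary_cross_entropy(1.0, 0.9), 4))  # 0.1054: confident and right
print(round(binary_cross_entropy(0.0, 0.9), 4))  # 2.3026: confident and wrong
```

The asymmetry is the point: a confident wrong answer costs far more than a confident right one earns, which pushes the discriminator toward calibrated probabilities.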

Note: The default Adam learning rate (0.001) is too high for GANs, so always customize the Adam optimizer.
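One reason the learning rate matters so directly with Adam: thanks to bias correction, the very first update moves each weight by about `lr` in the direction opposite the gradient, regardless of how large the gradient is. The sketch below is a single-scalar version of the Adam update rule (Kingma & Ba) using Keras’s default betas and epsilon; it illustrates the step size and is not Keras’s implementation.

```python
import math

# One Adam update for a single scalar weight.

def adam_step(w, g, m, v, t, lr, beta1=0.9, beta2=0.999, eps=1e-7):
    m = beta1 * m + (1 - beta1) * g          # running mean of gradients
    v = beta2 * v + (1 - beta2) * g * g      # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    return w, m, v


# On step 1, m_hat = g and v_hat = g**2, so the step is about lr * sign(g).
for lr in (0.001, 0.0002):
    w, _, _ = adam_step(w=0.0, g=5.0, m=0.0, v=0.0, t=1, lr=lr)
    print(lr, round(-w, 6))   # the step size is roughly lr itself
```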

Code:

The discriminator

Architecture:

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_3 (Conv2D)            (None, 14, 14, 64)        1664      
_________________________________________________________________
leaky_re_lu_2 (LeakyReLU)    (None, 14, 14, 64)        0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 5, 5, 128)         204928    
_________________________________________________________________
leaky_re_lu_3 (LeakyReLU)    (None, 5, 5, 128)         0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 3200)              0         
_________________________________________________________________
dense_3 (Dense)              (None, 256)               819456    
_________________________________________________________________
leaky_re_lu_4 (LeakyReLU)    (None, 256)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 1)                 257       
=================================================================
Total params: 1,026,305
Trainable params: 1,026,305
Non-trainable params: 0
_________________________________________________________________

3. Combine G & D

According to the original GAN paper, we have to train the generator and the discriminator separately. Then why this step?

The discriminator can be trained directly by back-propagating the loss computed at the last sigmoid layer. But for training the generator, we need to send this loss back to the generator without affecting the weights of the discriminator!

One way to achieve this is to create a new model by stacking the generator and the discriminator, which is why I didn’t compile the generator before. Let’s call the new model gan. It takes the noise vector as input and passes it through the generator to create a fake image. The image is then passed through the discriminator, which computes the probability of it being an original image. When we train this gan, the discriminator should not learn anything, hence ‘discriminator.trainable = False’. Only the weights of the generator will be modified.

Code:

generator + discriminator = gan

Architecture:

Model: "model_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, 100)               0         
_________________________________________________________________
sequential_1 (Sequential)    (None, 28, 28, 1)         6595073   
_________________________________________________________________
sequential_2 (Sequential)    (None, 1)                 1026305   
=================================================================
Total params: 7,621,378
Trainable params: 6,580,353
Non-trainable params: 1,041,025
_________________________________________________________________
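The numbers in this summary follow directly from freezing the discriminator. The sketch below just reuses the counts from the two earlier summaries. Keep in mind that in Keras a change to `trainable` only takes effect when a model is compiled, which is why this pattern compiles the discriminator before freezing it and compiles gan afterwards.

```python
# How discriminator.trainable = False shows up in the combined "gan" summary.
# All numbers are copied from the generator and discriminator summaries.

generator = {"trainable": 6_580_353, "non_trainable": 14_720}
discriminator_total = 1_026_305   # fully trainable when compiled on its own

# Inside gan, every discriminator weight counts as non-trainable,
# so only the generator's trainable weights receive updates.
gan_trainable = generator["trainable"]
gan_non_trainable = generator["non_trainable"] + discriminator_total
print(gan_trainable + gan_non_trainable, gan_trainable, gan_non_trainable)
# 7621378 6580353 1041025
```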

4. Train

Finally, we are ready to train our GAN! Does the code look weird to you? Don’t worry, I’m gonna explain each step.

Code:

Training loop for GAN

The outer loop iterates over the epochs and the inner one over the batches. I’ve trained the models for 80 epochs with a batch_size of 128, so one epoch has 2909 steps (steps_per_epoch = ⌊no. of data samples / batch_size⌋ = ⌊372,450 / 128⌋ = 2909).
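The same arithmetic in code. The floor means the last partial batch is skipped, so a few images go unused in each epoch. The total assumes, as described below, that each step updates the discriminator once and the generator once.

```python
n_samples = 372_450
batch_size = 128
epochs = 80

steps_per_epoch = n_samples // batch_size            # floor division
leftover = n_samples - steps_per_epoch * batch_size   # images not used in an epoch
print(steps_per_epoch, leftover, steps_per_epoch * epochs)  # 2909 98 232720
```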

Train D while G is fixed:

First, batch_size noise vectors are formed by drawing numbers randomly from a standard normal distribution. These vectors are given to the generator to create fake images. Then we draw batch_size real images from the training data. To form the discriminator’s input, we concatenate the fake and the real data and build the matching label vector (0: fake data, 1: real data). But wait, the code says 0.1 and 0.9 instead! WTH is going on?

This technique is called label smoothing. It prevents the discriminator from becoming overconfident about its predictions.
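A minimal sketch of how one discriminator batch could be assembled, with fakes first and reals second and the smoothed 0.1/0.9 labels. `fake_generator` and the list of real images are hypothetical stand-ins for the Keras generator and the normalized training set, and drawing real images at random without repeats is an assumption (the text doesn’t say how they are picked).

```python
import random

NOISE_DIM = 100

def make_d_batch(fake_generator, real_images, batch_size, rng=random):
    # batch_size noise vectors drawn from a standard normal distribution
    noise = [[rng.gauss(0.0, 1.0) for _ in range(NOISE_DIM)] for _ in range(batch_size)]
    fakes = [fake_generator(z) for z in noise]
    reals = rng.sample(real_images, batch_size)      # assumption: random, no repeats
    x = fakes + reals                                # concatenate fake and real data
    y = [0.1] * batch_size + [0.9] * batch_size      # smoothed labels instead of 0 / 1
    return x, y


def dummy_generator(z):
    """Toy stand-in: returns a blank flattened 28*28 'image'."""
    return [0.0] * 784


real_set = [[0.5] * 784 for _ in range(1000)]
x, y = make_d_batch(dummy_generator, real_set, batch_size=128)
print(len(x), y[0], y[-1])   # 256 0.1 0.9
```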

Then we call the train_on_batch function for the discriminator and pass the data-label pairs.

Train G while D is fixed:

Here, we need only the noise vectors and labels. The label vector contains 1s. Wait, the generator makes fake data, so shouldn’t the labels be 0?

Yes. But here we are deliberately giving ‘wrong’ labels, telling the frozen discriminator that the fake images are real. The reason is that we want the generator to outperform the discriminator. The loss now measures how far D’s predictions on G’s images are from ‘real’, so back-propagating it through D tells G how to change its weights to fool D. Remember that at this stage we are not changing the weights of the discriminator, so the discriminator is not ‘unlearning’ anything.
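A toy illustration of this step, using hypothetical one-dimensional “networks” rather than the Keras models: G(z) = a*z + b is trainable and D(x) = sigmoid(w*x + c) is frozen. With label 1, binary cross-entropy reduces to -log D(G(z)), which is the generator loss the original GAN paper recommends in practice, and its gradient with respect to D’s logit is simply p - 1. Gradient descent on a and b alone raises D’s score for the generated samples while w and c never change.

```python
import math
import random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def generator_step(a, b, w, c, zs, lr=0.1):
    """One gradient step on mean(-log D(G(z))) that updates only G's a and b."""
    grad_a = grad_b = 0.0
    for z in zs:
        p = sigmoid(w * (a * z + b) + c)   # D's belief that G(z) is real
        dlogit = p - 1.0                   # d(-log p)/d(logit) for label 1
        grad_a += dlogit * w * z           # chain rule through frozen D into G
        grad_b += dlogit * w
    n = len(zs)
    return a - lr * grad_a / n, b - lr * grad_b / n

def mean_d_output(a, b, w, c, zs):
    return sum(sigmoid(w * (a * z + b) + c) for z in zs) / len(zs)


rng = random.Random(0)
zs = [rng.gauss(0.0, 1.0) for _ in range(128)]
a, b, w, c = 0.5, -2.0, 1.5, 0.0          # D currently rejects most of G's outputs
before = mean_d_output(a, b, w, c, zs)
for _ in range(50):
    a, b = generator_step(a, b, w, c, zs)  # w and c are never touched
after = mean_d_output(a, b, w, c, zs)
print(after > before)   # True: D's average score for G's samples went up
```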

Now, to train the generator, we call the train_on_batch function of the combined gan model and pass the noise-label pairs. And that, friends, is how a GAN is trained!

