How to create fake images on a computer using GANs

Nyla Pirani
7 min read · Feb 23, 2019


Now we’ve all been in that spot in kindergarten when it’s art time. You look around and you see all the other kids drawing nice little pictures that look like this:

But you're embarrassed because you can't draw. So you end up trying to hide your picture while you're drawing, and long story short, it looks like this:

Don't worry, we've all been there. I have too. But what if I told you that you could create realistic-looking pictures of fake people with just a laptop and some code?

This is what the pictures would look like

Now, as you can tell, a lot of them are deformed and some are blurry, but I have a feeling this is way more realistic than your kindergarten drawings. They definitely outmatch mine.

Plus, the fact that a computer can generate these images of people that aren’t even real is insane!

So how do these computers come up with these magical images of fake people? It's all thanks to something called GANs (Generative Adversarial Networks).

What are GANs and how exactly do they work?

GANs are machine learning models that learn the distribution of a dataset. For example, all faces usually have the same distribution of features: the eyes, nose, mouth, etc. are generally in the same place on everyone. So a GAN is able to figure out what makes a face and where all of those features generally sit.

I know this might seem weird: that someone with no artistic ability, like me, can all of a sudden create an almost realistic picture of a fake person with just a laptop and some code. So how exactly did I do this?

Adversarial Training:

To be able to create these high-quality fake images, GANs use a clever training method made up of two competing neural networks: a generator network and a discriminator network.

The generator generates the pictures, and the discriminator is then given real training images (in this case, a bunch of faces) along with the images the generator generated.

It then tries to determine which ones are real and which were created by the generator.

This is the adversarial part of GANs.

This is a diagram showing the flow of a GAN

In the beginning, both neural networks are basically terrible. But over time they keep training and improving. With every iteration, the generator learns what kinds of images are fooling the discriminator and starts generating more of those. But the discriminator is also getting better at telling real images from fake ones.

Minimax game:

When one network (the discriminator) is trying to maximize the chances of it being correct, while the other network (the generator) is trying to fool it and minimize those chances, it's called a minimax game.

The discriminator returns a number between 0 and 1 representing the probability that an image is real: 1 is real and 0 is fake. Sometimes the results are correct and other times they're not, but each time both networks learn more. If a picture the generator produces gets a probability of 50% (0.5) or higher from the discriminator, it has officially fooled it. When the GAN reaches equilibrium, the generator's fake images of people are so good that they look exactly like the images from the dataset, and the discriminator has no choice but to guess randomly whether an image is real or not (therefore returning 0.5).
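To make the game concrete, here's a small numeric sketch in PyTorch. The probability values are made up for illustration; the binary cross-entropy loss is a standard way to score how far the discriminator's guesses are from the right answers:

```python
import torch
import torch.nn as nn

# Made-up discriminator outputs for a batch of 4 images (sigmoid outputs,
# so each is a probability between 0 and 1 that the image is real).
d_out_real = torch.tensor([0.9, 0.8, 0.95, 0.7])  # guesses on real images
d_out_fake = torch.tensor([0.1, 0.4, 0.6, 0.2])   # guesses on generated images

criterion = nn.BCELoss()

# The discriminator wants real -> 1 and fake -> 0 (maximize correctness).
d_loss = criterion(d_out_real, torch.ones(4)) + criterion(d_out_fake, torch.zeros(4))

# The generator wants the discriminator to output 1 for its fakes.
g_loss = criterion(d_out_fake, torch.ones(4))

# Any fake scored above 0.5 has officially fooled the discriminator.
fooled = (d_out_fake > 0.5).sum().item()
print(fooled)  # 1 (only the 0.6)
```

Pushing `d_loss` down makes the discriminator stricter, and pushing `g_loss` down makes the fakes more convincing, which is exactly the tug-of-war described above.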

Deep Convolutional GANs

One of the best GAN models for generating these images is called a Deep Convolutional GAN (DCGAN). A DCGAN incorporates convolutional layers into the GAN to help it run more efficiently.

What are Convolutional Layers?

So what exactly are these convolutional layers? They're a way to extract specific features from an image in a computationally efficient way. Every neuron in the layer scans a different part of the image (instead of every neuron looking at every pixel), and it extracts different features. The more layers you stack together, the more complex the features you can extract.

With one layer you could pick up a simple feature like a line, but with five you could pick up an entire face. So convolutional layers in a GAN let you train and run your network much faster than you could without them.
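To make that concrete, here's a tiny sketch of a single convolutional layer in PyTorch (the channel counts and window size are example values, not the ones used later in this article) scanning a 64x64 input:

```python
import torch
import torch.nn as nn

# One convolutional layer: 8 filters, each scanning a 4x4 window of a
# 1-channel image, moving 2 pixels at a time (stride) with 1 pixel of padding.
conv = nn.Conv2d(in_channels=1, out_channels=8, kernel_size=4, stride=2, padding=1)

image = torch.randn(1, 1, 64, 64)   # (batch, channels, height, width)
features = conv(image)
print(features.shape)               # torch.Size([1, 8, 32, 32])

# Stacking a second layer extracts features of those features.
conv2 = nn.Conv2d(8, 16, kernel_size=4, stride=2, padding=1)
print(conv2(features).shape)        # torch.Size([1, 16, 16, 16])
```

Each neuron in `conv` only ever sees a 4x4 patch, which is what makes this so much cheaper than a fully connected layer looking at all 4,096 pixels at once.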

How I did this

I was able to generate realistic images using the PyTorch deep learning framework. Here is what I did.

The discriminator:

class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*2) x 16 x 16
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid()
        )

    def forward(self, input):
        return self.main(input)

So, as you can see in the code above, the discriminator uses convolutional layers as well as batch normalization layers to help the network train faster. Leaky ReLU activation functions sit between the layers, and a final sigmoid function squashes the output into a probability between 0 and 1.

The initial image, on the left, goes through a bunch of convolutional layers (shown by the blocks) before it reaches the layer at the end, where the network outputs whether the image is real or fake (1 for real, 0 for fake).
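You can sanity-check those layer sizes with the standard convolution output-size formula. This small helper function (hypothetical, just for illustration) traces a 64x64 input through the five layers in the code above:

```python
# Output size of a convolution: (size + 2*padding - kernel) // stride + 1
def conv_out(size, kernel=4, stride=2, padding=1):
    return (size + 2 * padding - kernel) // stride + 1

size = 64
for _ in range(4):                # the four stride-2 layers above
    size = conv_out(size)
    print(size)                   # prints 32, then 16, then 8, then 4

# The final 4x4 convolution with stride 1 and no padding collapses the
# 4x4 map to 1x1: the single real-vs-fake probability.
print(conv_out(size, kernel=4, stride=1, padding=0))  # prints 1
```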

The generator:

class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is Z, going into a convolution
            nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4
            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8
            nn.ConvTranspose2d(ngf * 4, ngf * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16
            nn.ConvTranspose2d(ngf * 2, ngf, 4, 2, 1, bias=False),
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32
            nn.ConvTranspose2d(ngf, nc, 4, 2, 1, bias=False),
            nn.Tanh()
            # state size. (nc) x 64 x 64
        )

    def forward(self, input):
        return self.main(input)

So, as you can see, the generator has a similar structure, but inverted.

It starts with a vector of 100 random values. Then it puts the vector through a bunch of convolutional transpose layers (as well as some normalization layers and activation functions) to turn the vector into an image.
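Here's a small sketch of just that first step, assuming the usual DCGAN-tutorial sizes nz = 100 (the length of the random vector) and ngf = 64 (the base number of feature maps). The vector enters as a 1x1 "image" with 100 channels, and each transposed convolution grows the spatial size:

```python
import torch
import torch.nn as nn

nz, ngf = 100, 64   # assumed values: latent vector size, base feature maps

# The random starting vector, shaped (batch, nz, 1, 1) as PyTorch expects.
z = torch.randn(1, nz, 1, 1)

# The generator's first layer expands the 1x1 vector into a 4x4 feature map...
first = nn.ConvTranspose2d(nz, ngf * 8, 4, 1, 0, bias=False)
print(first(z).shape)       # torch.Size([1, 512, 4, 4])

# ...and every later layer doubles the size: 4 -> 8 -> 16 -> 32 -> 64.
up = nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False)
print(up(first(z)).shape)   # torch.Size([1, 256, 8, 8])
```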

A convolutional transpose layer does the opposite of a convolutional layer. Instead of mapping a group of values into 1 value it maps 1 value into a group of values.

For example, a convolutional layer might turn the numbers 1, 2, 3, 4, 5 into the single number 3, and a convolutional transpose layer might turn the number 3 back into the numbers 1, 2, 3, 4, 5.
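Here's a toy 1-D version of that exact example in PyTorch, with hand-picked weights (chosen purely so the numbers work out) showing the many-to-one and one-to-many directions:

```python
import torch
import torch.nn as nn

# A convolution maps a window of values down to one value...
conv = nn.Conv1d(1, 1, kernel_size=5, bias=False)
conv.weight.data.fill_(0.2)                  # each weight is 0.2, so the
x = torch.tensor([[[1., 2., 3., 4., 5.]]])   # output is 0.2 * (1+2+3+4+5)
print(round(conv(x).item(), 4))              # 3.0 -- five values became one

# ...and a transposed convolution maps one value back out to a window.
deconv = nn.ConvTranspose1d(1, 1, kernel_size=5, bias=False)
deconv.weight.data.copy_(torch.tensor([[[1/3, 2/3, 1.0, 4/3, 5/3]]]))
y = torch.tensor([[[3.]]])
out = [round(v, 4) for v in deconv(y).detach().flatten().tolist()]
print(out)   # [1.0, 2.0, 3.0, 4.0, 5.0] -- one value became five
```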

Every image the generator produces is original, because it uses the starting vector (the 100 random values) like a seed, which is then mapped into the statistical distribution of the dataset.

During the training, the generator has to figure out the best ways to transform the vectors into the fake faces.

Training

For the training process, I trained the GAN with these steps:

  1. Use the discriminator to classify a bunch of real photos
  2. Adjust the discriminator based on the results
  3. Use the discriminator to classify a bunch of fake photos
  4. Adjust the discriminator and generator based on the results
  5. Repeat

This process lets the discriminator and the generator improve and learn at the same time. If one network is significantly better than the other, the weaker one has a hard time improving, so alternating like this makes training a lot easier.
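The five steps above can be sketched as one PyTorch training iteration. The tiny stand-in networks and made-up data below are hypothetical, just to keep the sketch self-contained and runnable; in the real setup, netD and netG are the Discriminator and Generator defined earlier:

```python
import torch
import torch.nn as nn

# Stand-in networks (made up, just so this sketch runs end to end).
nz = 8
netG = nn.Sequential(nn.Linear(nz, 16), nn.ReLU(), nn.Linear(16, 4))
netD = nn.Sequential(nn.Linear(4, 16), nn.LeakyReLU(0.2),
                     nn.Linear(16, 1), nn.Sigmoid())
optimizerD = torch.optim.Adam(netD.parameters(), lr=2e-4)
optimizerG = torch.optim.Adam(netG.parameters(), lr=2e-4)
criterion = nn.BCELoss()

real_batch = torch.randn(32, 4)   # stand-in for a batch of real photos

# Steps 1-2: classify real photos, nudge the discriminator toward "real" (1).
optimizerD.zero_grad()
loss_real = criterion(netD(real_batch), torch.ones(32, 1))
# Step 3: classify fake photos; the discriminator should say "fake" (0).
fake_batch = netG(torch.randn(32, nz))
loss_fake = criterion(netD(fake_batch.detach()), torch.zeros(32, 1))
(loss_real + loss_fake).backward()
optimizerD.step()

# Step 4: adjust the generator; it wants the discriminator to say "real" (1).
optimizerG.zero_grad()
loss_g = criterion(netD(fake_batch), torch.ones(32, 1))
loss_g.backward()
optimizerG.step()
# Step 5: repeat over many batches and epochs.
```

Note the `detach()` in step 3: it stops the discriminator's update from also changing the generator, so each network only learns on its own turn.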

Future and applications of GANs

GANs have tons of applications and a huge future. They can do things like make music and create fake images of just about anything, and there are also some very practical applications.

For example, GANs can help you make bank.

This is a real picture that a GAN created

This picture sold for $432,500, and the person in it isn't even real. So you can use GANs to make money.

They also have other applications, in commercial use and in the medical and health care fields.


Takeaways

GANs are a newer field in machine learning but they are very promising and definitely something to be aware of.

Here are some takeaways and things to keep in mind about GANs:

  • A GAN uses two competing neural networks to generate data that closely resembles the training data.
  • The discriminator network tries to determine whether the image it’s given is real or fake.
  • The generator tries to fool the discriminator into thinking its images are real.

GANs have a ton of potential in commercial and medical applications making them a very promising technology. They are also super cool. I think that they are something companies and researchers should look into and invest in soon.
