🧐

GAN3: Apply GANs

Some notes on the GAN (Generative Adversarial Network) Specialization Course by Yunhao Cao (GitHub @ToiletCommander)

👉
Note: This note covers weeks 1-3 (version as of 2022/6/16)

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Acknowledgements: Some of the resources (images, animations) are taken from:

  1. Sharon Zhou & Andrew Ng and DeepLearning.AI’s GAN Specialization Course Track
  1. articles on towardsdatascience.com
  1. images from Google image search (from other online sources)

Since this article is not for commercial use, please let me know if you would like your resource removed from here by creating an issue on my GitHub repository.

Last updated: 2022/6/16 14:15 CST

Applying GANs

GANs can be used for

  1. Image-to-image translation
  1. Data augmentation
    1. supplement data when real data is too expensive or rare
    1. traditionally done by cropping, rotating, flipping images, etc.
    1. now we can use generators (see the sketch after this list)
    1. the RandAugment paper tells us how to augment based on the scenario
    1. GAN-generated data has been shown to be better than traditional synthetic data, plus it can be used to generate labeled examples
      1. encourages data sharing, is less expensive, and protects the real data
      1. looks real even to the eyes of professionals (e.g., pathologists) who look at samples every day
    1. but a GAN's diversity is always limited by the data available
      1. so GANs can generate samples that are too close to the real ones
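
As a rough illustration of the idea, here is a minimal sketch of GAN-based augmentation, assuming we already have some trained class-conditional generator; the name `G` and its `(noise, label)` signature are hypothetical, not something prescribed by the course.

```python
import torch

# Minimal sketch of GAN-based data augmentation. `G` is assumed to be an
# already-trained class-conditional generator with a (noise, label) -> image
# signature; both the name and the signature are hypothetical.
def augment_with_gan(G, n_samples, n_classes, z_dim=64, device="cpu"):
    G.eval()  # turn off training-only behavior (dropout, batch-norm updates)
    with torch.no_grad():
        z = torch.randn(n_samples, z_dim, device=device)                   # noise vectors
        labels = torch.randint(0, n_classes, (n_samples,), device=device)  # random target classes
        fake_images = G(z, labels)                                         # generated, labeled samples
    return fake_images, labels
```

The generated (image, label) pairs can then be mixed into the real training set, keeping in mind that their diversity is bounded by the data the generator was trained on.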

Image-to-image translation

🔥
Translating an image from one style to another
  1. Black-and-white image to color?
  1. Segmentation Map to photorealistic images
  1. Paired image-to-image translation
    1. Image to image
  1. Other Translations
    1. Text to image?

Unpaired image-to-image translation

With paired image-to-image translation, you get a one-to-one correspondence between the input and the output, while in unpaired image-to-image translation the input and the output are generally two different piles of images with different styles.

So in an unpaired image translation task, you want to

  1. Still learn a mapping between the two piles
  1. Examine the common elements of the two piles (content) and unique elements of each pile (style)
🔥
So we introduce CycleGAN; see below for details of this network

Pix2Pix

Paired Image-to-Image Translation Model from UC Berkeley, Yeah!!!!

Instead of inputting a noise vector, we input a real input (a segmentation map, etc.) and the model translates it into another, paired output. A noise vector is not introduced; instead, dropout is used to add randomness to the output.

The generator is upgraded with a “U-Net” and the discriminator is upgraded to give specific feedback about sub-regions of the image

Pix2Pix Discriminator - PatchGAN

  1. Outputs a matrix of evaluations instead of just one value; each element of the matrix is between 0 and 1.
    1. gives feedback on specific patches of the image
    1. value closer to 0 ⇒ fake
    1. value closer to 1 ⇒ real
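
A minimal PyTorch sketch of a PatchGAN-style discriminator; the layer widths are illustrative (not the exact Pix2Pix configuration), and the point is that the final convolution produces a grid of per-patch scores rather than a single value.

```python
import torch
import torch.nn as nn

# Minimal PatchGAN-style discriminator (layer widths are illustrative).
# Input is the condition image concatenated with the real or generated
# output along the channel dimension.
class PatchDiscriminator(nn.Module):
    def __init__(self, in_channels=6, hidden=64):
        super().__init__()

        def block(c_in, c_out, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2))
            return layers

        self.net = nn.Sequential(
            *block(in_channels, hidden, norm=False),
            *block(hidden, hidden * 2),
            *block(hidden * 2, hidden * 4),
            # Final 1-channel conv: each output element scores one patch of the input.
            nn.Conv2d(hidden * 4, 1, kernel_size=4, stride=1, padding=1),
            nn.Sigmoid(),  # values in (0, 1): ~0 => fake patch, ~1 => real patch
        )

    def forward(self, condition, image):
        x = torch.cat([condition, image], dim=1)
        return self.net(x)  # shape (N, 1, H', W'): a matrix of per-patch scores
```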

Pix2Pix Generator - UNet

UNet has been very successful for image segmentation

See my notes on DL Specialization course 4 (DLS 4: CNN): https://toiletcommander.github.io/Opensourced-Study-Notes-Berkeley/DLSpecialization/Notes/4-CNN/#13c4df75-9ae6-496e-943c-33233abce034
(figure: an illustration of transposed convolution with a 3×3 kernel and stride (2,2), where the right/green layer is the input, the grey layer is the applied filter with subscripts denoting the filter values, and the left/blue layer is the output; the dotted parts of the output are padding)

But segmentation and generation are different tasks:

  1. There is no single correct answer for generation
    1. for a car area, we can generate a car, a truck, a bus, etc.
  1. Since we feed in an image instead of a noise vector, the generator has to be beefier.
  1. Skip connections help with the vanishing gradient problem, and the bottleneck helps to embed information.

Each encoder “Block” ⇒ Conv + BatchNorm + LeakyReLU
Each decoder “Block” ⇒ Transposed Conv + BatchNorm + ReLU

Dropout is added to some decoder blocks (the first 3 blocks) ⇒ adds noise to the network

At inference time, dropout is actually deactivated, and scaling is used to keep the activation distribution stable
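
Here is a minimal sketch of such encoder/decoder blocks in PyTorch, with illustrative channel handling (not the exact Pix2Pix configuration); the decoder block consumes the skip connection from the matching encoder block, and dropout in the first few decoder blocks supplies the randomness.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    # "Block" => Conv + BatchNorm + LeakyReLU (downsamples by 2)
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(c_out),
            nn.LeakyReLU(0.2),
        )

    def forward(self, x):
        return self.net(x)

class DecoderBlock(nn.Module):
    # "Block" => Transposed Conv + BatchNorm + ReLU (upsamples by 2),
    # with optional dropout in the first few decoder blocks to inject noise.
    def __init__(self, c_in, c_out, use_dropout=False):
        super().__init__()
        layers = [
            nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
        ]
        if use_dropout:
            layers.append(nn.Dropout(0.5))
        self.net = nn.Sequential(*layers)

    def forward(self, x, skip):
        # Concatenate the skip connection from the matching encoder block,
        # which helps with vanishing gradients and preserves spatial detail.
        x = torch.cat([x, skip], dim=1)
        return self.net(x)
```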

PixelDistance Loss Term

An additional loss term for pix2pix model

Objective

$\min_g \max_d L(g,d)$

So loss function for pix2pix model

$L(g,d)=\text{Adversarial Loss (BCE Loss / W-Loss)}+\lambda \times \text{Pixel loss term}$

The Pixel Loss is the L1 (absolute) pixel distance between the generated output and the paired real output, $\mathbb{E}\big[\lVert y - g(x) \rVert_1\big]$, where $x$ is the input and $y$ is the real paired output.

It comes in handy for getting the output close to what is real.

It also adds a layer of supervision... which is kind of bad (but it is a very soft constraint, since it uses an absolute rather than squared distance).
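
A hedged sketch of the generator-side loss built from these pieces, assuming a BCE adversarial term, a discriminator that outputs per-patch probabilities (as in the PatchGAN sketch above), and an illustrative weight λ:

```python
import torch
import torch.nn as nn

adv_criterion = nn.BCELoss()   # adversarial term (a W-loss could be used instead)
pixel_criterion = nn.L1Loss()  # pixel distance term: absolute, not squared
lambda_pixel = 100.0           # illustrative weight on the pixel loss term

def pix2pix_generator_loss(gen, disc, condition, real):
    fake = gen(condition)
    disc_pred = disc(condition, fake)  # per-patch probabilities in (0, 1)
    # Adversarial part: the generator wants every patch scored as real (1).
    adv_loss = adv_criterion(disc_pred, torch.ones_like(disc_pred))
    # Pixel part: pulls the generated output toward the paired real target.
    pixel_loss = pixel_criterion(fake, real)
    return adv_loss + lambda_pixel * pixel_loss
```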

Successors: Pix2PixHD, GauGAN

CycleGAN

Unpaired image-to-image translation model

CycleGAN has two generator models (and two PatchGAN discriminators): one transfers from one style to the other, and the second model goes the opposite way, so together they form a cycle. The generators look very similar to a U-Net structure (encoder and decoder interconnected with skip connections) plus a bottleneck section, built from DCGAN-style convolutional blocks.
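
A minimal sketch of the cycle, where `gen_AB` and `gen_BA` are placeholder names for the two generators (style A → B and B → A):

```python
def cycle_pass(gen_AB, gen_BA, real_A, real_B):
    """One forward pass around the CycleGAN cycle.

    gen_AB translates style A -> B and gen_BA translates B -> A; both are
    placeholders for the U-Net-like generators described above.
    """
    fake_B = gen_AB(real_A)    # A -> B
    cycle_A = gen_BA(fake_B)   # back to A: should reconstruct real_A
    fake_A = gen_BA(real_B)    # B -> A
    cycle_B = gen_AB(fake_A)   # back to B: should reconstruct real_B
    return fake_A, fake_B, cycle_A, cycle_B
```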

(figure: CycleGAN generator architecture)

Loss Terms

All loss terms have to be applied to both of the two generators ⇒ 6 loss terms in the entire loss function!

Adversarial Loss - Squared Loss

We use the least squares loss in the adversarial loss term for CycleGAN because

  1. Least squares does not have the extreme vanishing-gradient behavior of BCE loss
    1. the gradient is only flat when the prediction is exactly correct
  1. It is an easy function to calculate and optimize when we want to bring the fakes close to the reals (and it takes outliers into account)

Discriminator Loss:

$\mathbb{E}_x[(d(x)-1)^2]+\mathbb{E}_z[(d(g(z))-0)^2]$

Generator Loss:

$\mathbb{E}_z[(d(g(z))-1)^2]$
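
These least-squares terms translate directly into code; a minimal sketch with placeholder `disc`/`fake` arguments:

```python
import torch

def lsgan_disc_loss(disc, real, fake):
    # E_x[(d(x) - 1)^2] + E_z[(d(g(z)) - 0)^2]
    fake = fake.detach()  # do not backprop into the generator during the disc update
    return torch.mean((disc(real) - 1) ** 2) + torch.mean((disc(fake) - 0) ** 2)

def lsgan_gen_loss(disc, fake):
    # E_z[(d(g(z)) - 1)^2]: the generator wants its fakes scored as real
    return torch.mean((disc(fake) - 1) ** 2)
```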

Cycle Consistency

We want the cycle-generated image to be similar to the original one; only the style should have changed, so it is appropriate to take the pixel distance.

Plus, we also want two-way cycle consistency, so the entire term should be

$\mathbb{E}\big[\lVert g_{BA}(g_{AB}(x)) - x \rVert_1\big] + \mathbb{E}\big[\lVert g_{AB}(g_{BA}(y)) - y \rVert_1\big]$

where $g_{AB}, g_{BA}$ are the two generators and $x, y$ are real images from the two piles.

Identity Loss

An additional, optional loss term added to the loss function

If we put a horse into a zebra-to-horse translator, we would expect the horse to come out still looking like the same horse. This technique:

  1. Discourages the Z ⇒ H mapping from distorting color
  1. is two-way as well, just like the cycle consistency loss term
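
Putting the pieces together, here is a hedged sketch of the generator-side CycleGAN loss (adversarial + cycle consistency + identity); the weights are illustrative and all module names are placeholders for the components described above.

```python
import torch
import torch.nn as nn

l1 = nn.L1Loss()
lambda_cycle = 10.0                   # illustrative weights, not prescribed values
lambda_identity = 0.5 * lambda_cycle

def cyclegan_generator_loss(gen_AB, gen_BA, disc_A, disc_B, real_A, real_B):
    fake_B = gen_AB(real_A)
    fake_A = gen_BA(real_B)

    # Adversarial (least squares): each generator wants its fakes scored as real.
    adv = torch.mean((disc_B(fake_B) - 1) ** 2) + torch.mean((disc_A(fake_A) - 1) ** 2)

    # Cycle consistency: two-way pixel (L1) distance between reconstruction and original.
    cycle = l1(gen_BA(fake_B), real_A) + l1(gen_AB(fake_A), real_B)

    # Identity: feeding a real B image into the A -> B generator should change nothing,
    # and vice versa (discourages unnecessary color distortion).
    identity = l1(gen_AB(real_B), real_B) + l1(gen_BA(real_A), real_A)

    return adv + lambda_cycle * cycle + lambda_identity * identity
```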