GAN Authoring Made Easy …

Figure 1: Can you identify which ones are real celebrity images and which ones were generated by a generative model?

Look at the above eight images. Can you identify which ones are real images of celebrities and which ones were generated by a generative model? (Scroll to the end of this post for the answer. Read through the post for more information!)

Automated generative models have progressed a lot over the last decade [1], [2]. One of the latest and most successful variants of generative models today is the Generative Adversarial Network, aka GAN. Ever since its inception in 2014 [3], there has been a lot of discussion and research about GANs (link). To put things in perspective, in the 4.5 years since GANs were proposed, the original paper has received roughly 4,800 citations. That is more than 1,000 scientific papers and documents per year citing this work; in other words, about 2.9 citing documents published per day! Today, GANs can generate high-resolution, almost-real-looking images from nothing but random numbers as input.

Now, you might be thinking, “Wow! This is really amazing. Can I start using GANs now?” Err… that’s exactly where the challenge kicks in. To an extent, yes, you can start using GANs from one of these places: PyTorch-GAN, Keras-GAN, tf-GAN. However, these libraries require that you are an expert in GANs, well versed in Python, and fluent in the corresponding library’s usage. While acquiring all of this is not impossible, it takes a significant amount of time to get past the initial learning curve and start playing around with GAN models.

What is missing in the literature is a ready-to-use toolkit that is agnostic of the underlying library or language and gives users a very intuitive interface for playing around with GAN models. The true challenge of democratization is to enable any software engineer or developer to consume GAN models without needing expert-level knowledge.

To solve this set of challenges in consuming GAN models, we at the IBM Research lab in India developed an open-source GAN Toolkit (found here: https://github.com/IBM/gan-toolkit). The features of the toolkit are as follows:

It provides a highly modularized, language-agnostic, and intuitive representation of GANs based on these modules,

It provides a highly flexible, no-code way of implementing GAN models. The details of a GAN model can be provided as a config file or as command-line arguments, with no requirement to write any code!

The following figure shows the highly modularized GAN formulation and the no-code approach of designing GAN architectures.

The highly modularized, library-agnostic representation of GAN in gan-toolkit.

According to our modular representation, a GAN can be represented by defining the following six components:

Generator function

Discriminator function

Loss function

Optimizer function

Training process

Real training data
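To make this concrete, the six components above could map onto a config file along the following lines. This is only an illustrative sketch: the key names and accepted values shown here are hypothetical, and the toolkit's actual schema is defined in its documentation on GitHub.

```json
{
    "generator": { "choice": "gan" },
    "discriminator": { "choice": "gan" },
    "loss": "binary_crossentropy",
    "optimizer": { "choice": "adam", "learning_rate": 0.0002 },
    "training": { "epochs": 50, "batch_size": 64 },
    "data_path": "datasets/mnist.p"
}
```

Each of the six components becomes a named block (or a simple key) in the file, so swapping one component out means editing one entry rather than rewriting training code.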

How to Implement using our Toolkit

Let’s take the example of a popular GAN model for images: the Deep Convolutional GAN, aka DCGAN (link). The DCGAN model’s architecture is shown visually as follows:

As can be seen, the DCGAN’s generator is a deconvolutional architecture and the discriminator is a CNN, with a binary cross-entropy loss function and an Adam optimizer. The implementation of DCGAN in TensorFlow can be found here. It takes roughly 500 lines of Python code written in TensorFlow to implement DCGAN. This is not practical for every developer and software engineer.

The config file of the gan-toolkit not only provides abstraction and easy-to-use capabilities, but also enough flexibility and customization. For example, the same DCGAN could be defined as follows:
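As a hedged illustration (the field names here are hypothetical and may differ from the toolkit's exact schema), a DCGAN definition could be as compact as:

```json
{
    "GAN_model": { "epochs": 50 },
    "generator": { "choice": "dcgan" },
    "discriminator": { "choice": "dcgan" },
    "loss": "binary_crossentropy",
    "optimizer": { "choice": "adam" },
    "data_path": "datasets/celebA.p"
}
```

Compare this handful of lines against the roughly 500 lines of framework code a from-scratch implementation requires: the architecture, loss, and optimizer are all selected declaratively.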

Comparison with Other Toolkits

Recognizing the importance of making GAN training easy, a few other toolkits are available in the open-source domain, such as Keras-GAN, TF-GAN, and PyTorch-GAN. However, our gan-toolkit has the following advantages:

Highly modularized representation of GAN model for easy mix-and-match of components across architectures. For instance, one can use the generator component from DCGAN and the discriminator component from CGAN, with the training process of WGAN.

An abstract representation of GAN architecture to provide multi-library support. Currently, we provide PyTorch support for the config file, and in the future we plan to support Keras and TensorFlow as well. Thus, the abstract representation is library agnostic.

Code-free way of designing GAN models. A simple JSON file is all that is required to define a GAN architecture; there is no need to write any training code to train the GAN model.
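The mix-and-match idea from the first point could be expressed in the same config style. Again, this is a sketch with hypothetical key names, not the toolkit's verbatim schema; it shows a DCGAN generator paired with a CGAN discriminator under a WGAN training process, as in the example above:

```json
{
    "generator": { "choice": "dcgan" },
    "discriminator": { "choice": "cgan" },
    "train_process": { "choice": "wgan" },
    "data_path": "datasets/celebA.p"
}
```

Because each component is an independent entry, crossing architectures requires no glue code; the toolkit is responsible for wiring the chosen components together.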

The code and documentation of our GAN Toolkit can be found here: https://github.com/IBM/gan-toolkit. For any queries or issues, please raise an issue on GitHub or reach out to Anush Sankaran (anussank@in.ibm.com).

Answer to Figure 1: The images in the bottom row are real celebrity face images from the public CelebA face dataset (LINK). The images in the top row were generated by a GAN model (described here) trained on the CelebA dataset. These images are generated from nothing but random numbers!