
I have recently become fascinated with (Variational) Autoencoders and with PyTorch.

Kevin Frans has a beautiful blog post online explaining variational autoencoders, with examples in TensorFlow and, importantly, with cat pictures. Jaan Altosaar’s blog post takes an even deeper look at VAEs from both the deep learning perspective and the perspective of graphical models. Both of these posts, as well as Diederik Kingma’s original 2014 paper, Auto-Encoding Variational Bayes, are more than worth your time.

In my case, I wanted to understand VAEs from the perspective of a PyTorch implementation. I started with the VAE example on the PyTorch GitHub, adding explanatory comments and Python type annotations as I worked my way through it.

This post summarises my understanding, and contains my commented and annotated version of the PyTorch VAE example. I hope it helps!

What is PyTorch?

PyTorch is FAIR’s (that’s Facebook AI Research) Python dynamic deep learning / neural network library. The way that FAIR has managed to make neural network experimentation so dynamic and so natural is nothing short of miraculous. Read this post by fast.ai to find out more about their reasons for excitement, many of which I share.

What is an autoencoder?

The general idea of the autoencoder (AE) is to squeeze information through a narrow bottleneck between the mirrored encoder (input) and decoder (output) parts of a neural network (see the diagram below).

Because the network architecture and loss function are set up so that the output tries to emulate the input, the network has to learn how to encode input data in the very limited space represented by the bottleneck.
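Here is a minimal sketch of that idea in PyTorch. The class name TinyAutoencoder, the layer sizes (784, 128, 8) and the mean-squared-error loss are my own assumptions for illustration; they are not taken from the PyTorch example.

```python
import torch
from torch import nn


class TinyAutoencoder(nn.Module):
    """Illustrative autoencoder: 784 -> 128 -> 8 -> 128 -> 784 (sizes are assumptions)."""

    def __init__(self, in_dim: int = 784, hidden_dim: int = 128,
                 bottleneck_dim: int = 8) -> None:
        super().__init__()
        # Encoder: compress the input down to the narrow bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, bottleneck_dim),
        )
        # Decoder: mirror of the encoder, expanding back to the input size.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, in_dim),
            nn.Sigmoid(),  # keep outputs in [0, 1], like normalised pixel values
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        code = self.encoder(x)     # shape: (batch, bottleneck_dim)
        return self.decoder(code)  # shape: (batch, in_dim)


model = TinyAutoencoder()
batch = torch.rand(16, 784)  # stand-in for 16 flattened 28x28 images
reconstruction = model(batch)

# The loss compares the output with the input itself; there are no labels.
loss = nn.functional.mse_loss(reconstruction, batch)
```

The only training signal is how well the output matches the input, so everything the decoder needs has to pass through the 8 numbers in the bottleneck.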

What is a variational autoencoder?

Variational Autoencoders, or VAEs, are an extension of AEs that additionally force the network to ensure that samples are normally distributed over the space represented by the bottleneck.

They do this by having the encoder output two n-dimensional vectors (where n is the number of dimensions in the latent space) representing the mean and the standard deviation of a Gaussian for each latent dimension. These Gaussians are sampled, and the samples are sent through the decoder. This is the reparameterization step; see also my comments in the reparameterize() function.
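The sampling step can be sketched as follows. This is a simplified illustration, not necessarily identical to the reparameterize() function in the annotated example: it assumes the encoder hands over the standard deviation directly, whereas many implementations have the encoder emit the log of the variance and convert it first. The tensor shapes (a batch of 4, 20 latent dimensions) are also assumptions.

```python
import torch


def reparameterize(mu: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
    """Draw z ~ N(mu, std^2) as a differentiable function of mu and std."""
    # The randomness comes from eps, which does not depend on the network.
    eps = torch.randn_like(std)
    # Shift and scale the unit-Gaussian noise by the encoder outputs.
    return mu + eps * std


# Assumed shapes for illustration: batch of 4, 20 latent dimensions.
mu = torch.zeros(4, 20, requires_grad=True)
std = torch.ones(4, 20, requires_grad=True)

z = reparameterize(mu, std)
z.sum().backward()  # gradients reach mu and std despite the random draw
```

Because the noise is drawn separately and then shifted and scaled, backpropagation can pass through the sample to the encoder's outputs, which is what makes the trick work.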

What a fabulous trick!

The loss function has a term for input-output similarity, and, importantly, it has a second term that uses the Kullback–Leibler divergence to measure how close the learned Gaussians are to unit Gaussians.
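For reference, when the learned distribution is a Gaussian with independent dimensions and the target is a unit Gaussian, the KL term has a well-known closed form. The symbols below are notation I am introducing here: μ_j and σ_j are the mean and standard deviation the encoder produces for latent dimension j, and n is the number of latent dimensions.

```latex
D_{\mathrm{KL}}\!\left(\mathcal{N}\!\left(\mu, \operatorname{diag}(\sigma^2)\right) \,\middle\|\, \mathcal{N}(0, I)\right)
  = -\frac{1}{2} \sum_{j=1}^{n} \left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)
```

This expression is zero exactly when every μ_j is 0 and every σ_j is 1, and it grows as the learned Gaussians drift away from the unit Gaussian.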

In other words, this extension to AEs enables us to derive Gaussian distributed latent spaces from arbitrary data. Given, for example, a large set of shapes, the latent space would be a high-dimensional space where each shape is represented by a single point, and the points would be normally distributed over all dimensions. With this one can represent existing shapes, but one can also synthesise completely new and plausible shapes by sampling points in latent space.
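Synthesising new examples then amounts to drawing points from a unit Gaussian and decoding them. The sketch below uses a stand-in decoder with random, untrained weights, so its output is noise; the layer sizes (20, 400, 784) and the 28x28 output shape are assumptions for illustration, and in practice the decoder of a trained VAE would take its place.

```python
import torch
from torch import nn

ZDIMS = 20  # assumed number of latent dimensions

# Stand-in decoder with random weights; a trained decoder would go here.
decoder = nn.Sequential(
    nn.Linear(ZDIMS, 400),
    nn.ReLU(),
    nn.Linear(400, 784),
    nn.Sigmoid(),  # outputs in [0, 1], interpretable as pixel intensities
)

with torch.no_grad():  # no gradients needed when only generating
    z = torch.randn(64, ZDIMS)             # 64 points from the unit Gaussian
    samples = decoder(z).view(64, 28, 28)  # one 28x28 image per latent point
```

No encoder is involved at this stage: because training pushed the latent codes towards a unit Gaussian, points drawn from that same distribution should land in regions the decoder knows how to handle.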

Results using MNIST

Below you see 64 random samples of a two-dimensional latent space of MNIST digits that I made with the example below, with ZDIMS=2.

Next is the reconstruction of 8 random unseen test digits via a more reasonable 20-dimensional latent space. Keep in mind that the VAE has learned a 20-dimensional normal distribution for any input digit, from which samples are drawn that the decoder turns into outputs that appear similar to the input.

A diagram of a simple VAE

An example VAE, incidentally also the one implemented in the PyTorch code below, looks like this:

A simple VAE implemented using PyTorch

To explore the code below, I used PyCharm in remote interpreter mode, with the interpreter running on a machine with a CUDA-capable GPU. PyCharm parses the type annotations, which helps with code completion. I also made extensive use of the debugger to better understand logic flow and variable contents. (Debuggability is one of PyTorch’s strong points.)

Let me know in the comments to this post if you have any suggestions on how the code comments could be further improved.