From edge2cat to edge2anything with TensorFlow

Unless you have been hiding under a rock for the past few months, you have likely seen Christopher Hesse’s demo of image-to-image translation (a TensorFlow port of pix2pix by Isola et al.). In case you missed it, search for edge2cat, and a whole new world of cat-infused artificial intelligence will open up to you. The model is trained on cat images, and it can translate hand-drawn cats into realistic images of cats! Here are a few of our personal favorite “edge” image to cat translations generated by Chris’s model, ranging from accurate to horrifying:

But models like this aren’t restricted to creating weird cat images that take over the Internet. They are actually very generic and can be trained to perform a variety of translations including day to night, building layout to buildings, black and white to color, and many more:

You might even be thinking that you want to try using your own set of images to see what kinds of crazy things this model will start pumping out. But that has to be a tricky process, right?

Nope! Pachyderm has created a totally reusable and generic pipeline that takes care of all the training, pre-processing, etc. for you, so you can jump right into the fun parts! This machine learning pipeline template (produced by the team at Pachyderm in collaboration with Chris) shows how easy it can be to deploy and manage image generation models (like those pictured above). Everything you need to run the reusable pipeline can be found here on GitHub and is described below.

The Model

Christopher Hesse’s image-to-image demos use a TensorFlow implementation of the Generative Adversarial Network (GAN) model presented in this article. Chris’s full TensorFlow implementation of this model can be found on GitHub and includes documentation about how to perform training, testing, pre-processing of images, exporting of the models for serving, and more.

In this post we will utilize Chris’s code in that repo along with a Docker image based on an image he created to run the scripts (which you can also utilize in your experiments).

The Pipeline

To deploy and manage the model, we will execute its training, model export, pre-processing, and image generation in the reusable Pachyderm pipeline mentioned above. This will allow us to:

Keep a rigorous historical record of exactly what models were used on what data to produce which results.

Easily revert to other versions of an ML model when a new model is not performing or when “bad data” is introduced into a training data set.

The general structure of our pipeline looks like this:

The cylinders represent data “repositories” in which Pachyderm will version training data, model data, and so on (think “git for data”). These data repositories then serve as the inputs and outputs of the linked data processing stages (represented by the boxes in the figure).

Getting Up and Running with Pachyderm

You can experiment with this pipeline locally using a quick local installation of Pachyderm. Alternatively, you can quickly spin up a real Pachyderm cluster in any one of the popular cloud providers. Check out the Pachyderm docs for more details on deployment.

Once Pachyderm is deployed, you will be able to use Pachyderm’s pachctl CLI tool to create data repositories and start our deep learning pipeline.

Preparing the Training and Model Export Stages

Chris’s pix2pix.py script has a “train” mode that we will use to train our model on a set of paired images (such as facades paired with labels or edges paired with cats). This training will output a “checkpoint” representing a persisted state of the trained model.

The script also has an “export” mode that will then allow us to create an exported version of the checkpointed model to use in our image generation.

Thus, our “Model training and export” stage can be split into a training stage (called “checkpoint”) producing a model checkpoint and an export stage (called “model”) producing a persisted model used for image generation:

Supply Pachyderm with a JSON specification, training_and_export.json, telling Pachyderm to: (i) run Chris’s pix2pix.py script in “train” mode on the data in the “training” repository outputting a checkpoint to the “checkpoint” repository, and (ii) run the pix2pix.py script in “export” mode on the data in the “checkpoint” repository outputting a persisted model to the “model” repository. This can be done by running pachctl create-pipeline -f training_and_export.json.
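
As a rough illustration of what such a specification might contain, the Python sketch below builds the two stages as dictionaries and prints them as JSON. The key names (pipeline, transform, input, pfs, glob), the image placeholder, and the --max_epochs value are assumptions made for illustration. Pachyderm’s spec format has changed between releases, so the training_and_export.json in the GitHub repo remains the reference. The /pfs/<repo> input paths and the /pfs/out output path follow Pachyderm’s convention for mounting data into a stage’s container.

```python
import json

# Placeholder only: the post does not name the Docker image to use.
IMAGE = "<pix2pix-docker-image>"


def make_stage(name, input_repo, cmd):
    """Describe one pipeline stage: its name, what to run, and which repo feeds it."""
    return {
        "pipeline": {"name": name},
        "transform": {"image": IMAGE, "cmd": cmd},
        # Assumed input layout; key names vary across Pachyderm versions.
        "input": {"pfs": {"repo": input_repo, "glob": "/"}},
    }


# Stage (i): train on the "training" repo and write a checkpoint to the output.
checkpoint_stage = make_stage(
    "checkpoint",
    "training",
    ["python", "pix2pix.py", "--mode", "train",
     "--input_dir", "/pfs/training",
     "--output_dir", "/pfs/out",
     "--max_epochs", "200"],  # epoch count is an arbitrary example value
)

# Stage (ii): export the checkpoint as a persisted model.
model_stage = make_stage(
    "model",
    "checkpoint",
    ["python", "pix2pix.py", "--mode", "export",
     "--checkpoint", "/pfs/checkpoint",
     "--output_dir", "/pfs/out"],
)

# One JSON object per stage, printed one after the other.
spec_text = "\n".join(json.dumps(s, indent=2) for s in (checkpoint_stage, model_stage))
print(spec_text)
```

Because each stage writes to a repository named after the stage itself, the “model” stage can simply name “checkpoint” as its input, which is what chains the two stages together.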

Preparing the Pre-processing and Image Generation Stages

Next, let’s prepare our pre-processing and image generation stages. Our trained model will be expecting PNG image data with certain properties (256 x 256 in size, 8-bit/color RGB, non-interlaced). As such, we need to pre-process (specifically resize) our images as they are coming into our pipeline, and Chris has us covered with a process.py script to perform the resizing.
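
To make those image requirements concrete, here is a small standard-library Python sketch that reads a PNG header and checks it against the properties listed above. It only inspects files and does not resize anything (process.py handles that). The function names are hypothetical and not part of Chris’s code.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data):
    """Parse the IHDR chunk, which is always the first chunk of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("missing or malformed IHDR chunk")
    width, height, bit_depth, color_type, _compression, _filter, interlace = (
        struct.unpack(">IIBBBBB", data[16:29])
    )
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,  # 2 means truecolor RGB
        "interlace": interlace,    # 0 means non-interlaced
    }


def is_model_ready(data):
    """True if the PNG is 256 x 256, 8-bit RGB, and non-interlaced."""
    header = read_png_header(data)
    return (
        header["width"] == 256
        and header["height"] == 256
        and header["bit_depth"] == 8
        and header["color_type"] == 2
        and header["interlace"] == 0
    )


def make_header_only_png(width, height, bit_depth=8, color_type=2, interlace=0):
    """Build a PNG signature plus IHDR chunk (no pixel data) to exercise the check."""
    ihdr = struct.pack(">IIBBBBB", width, height, bit_depth, color_type, 0, 0, interlace)
    crc = zlib.crc32(b"IHDR" + ihdr) & 0xFFFFFFFF
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + b"IHDR" + ihdr + struct.pack(">I", crc)
```

For a real file, you would pass in its bytes, for example is_model_ready(open(path, "rb").read()), before adding the image to the pipeline.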

To actually perform our image-to-image translation, we need to use a process_local.py script. This script will take our pre-processed images and persisted model as input and output the generated, translated result:

Supply Pachyderm with another JSON specification, pre-processing_and_generation.json, telling Pachyderm to: (i) run the process.py script on the data in the “input_images” repository, outputting to the “preprocess_images” repository, and (ii) run the process_local.py script with the model in the “model” repository and the images in the “preprocess_images” repository as input. This can be done by running pachctl create-pipeline -f pre-processing_and_generation.json.

Putting it All Together, Generating Images

Now that we have created our input data repositories (“input_images” and “training”) and we have told Pachyderm about all of our processing stages, our production-ready deep learning pipeline will run automatically when we put data into “training” and “input_images.” It just works.

Chris provides a nice guide for preparing training sets here. You can use cat images, dog images, buildings, or anything that might interest you. Be creative and show us what you come up with! When you have your training and input images ready, you can get them into Pachyderm using the pachctl CLI tool or one of the Pachyderm clients (discussed in more detail here).
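
As a sketch of the pachctl route, the Python snippet below assembles one put-file command per image and prints the commands without running them. The put-file argument order and the -c and -f flags are assumptions based on the hyphenated, older-style CLI that create-pipeline suggests. Newer pachctl releases use a different syntax, so check pachctl’s built-in help for your version. The file names are hypothetical.

```python
import posixpath
import shlex


def put_file_commands(repo, local_paths, branch="master"):
    """Build one `pachctl put-file` command per local image (nothing is executed)."""
    commands = []
    for path in local_paths:
        # Store each file at the root of the repo under its own file name.
        dest = posixpath.basename(path)
        commands.append(["pachctl", "put-file", repo, branch, dest, "-c", "-f", path])
    return commands


# Hypothetical file names, for illustration only.
training = put_file_commands("training", ["pairs/cat_001.png", "pairs/cat_002.png"])
inputs = put_file_commands("input_images", ["sketches/my_cat.png"])

# Print the commands in a copy-and-paste friendly form.
for cmd in training + inputs:
    print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands first makes it easy to review what will be committed to “training” and “input_images” before anything triggers the pipeline.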

For some inspiration, we ran Pachyderm’s pipeline with Google Maps images paired with satellite images to create a model that translates Google Maps screenshots into pictures resembling satellite images. Once we had our model trained, we could stream Google Maps screenshots into the pipeline to create translations like this:

Conclusions/Resources

Prepare your training data set, deploy the above template pipeline, and be sure to share your results! We can’t wait to see what crazy stuff you come up with.

Be sure to:

Check out this GitHub repo to get the above reference pipeline specs along with even more detailed instructions.