This document is an introduction to using the Sketch RNN model in JavaScript to generate images. The Sketch RNN model is trained on stroke-based vector drawings and can generate vector images unconditionally.

Alternatively, the model can act as an autoencoder. We can feed an existing image into the model's encoder and obtain a vector of 128 floating point numbers (the latent vector) that represents this image. We can then feed this vector into the model's decoder to generate a similar-looking image.

Pre-trained models

We have provided around 100 pre-trained sketch-rnn models. We have trained each model either as the decoder model only (with a .gen.json extension), or as the full variational auto-encoder model (with a .vae.json extension). Use the vae model if you plan on using latent vectors, otherwise use the gen models.

You should load and parse the json files using whatever method you are comfortable with, and pass the parsed blob into the constructor of the sketch_rnn object. We will get to this later.

We sometimes found it simpler to copy the contents of the json file into an inline .js file so that the demo loads the model synchronously. Some of our examples below do this.

Run Pre-built Examples

There are a number of proof-of-concept demos built to use the Sketch RNN model. You can look at the corresponding code to study in detail how the model works. To run these examples, it is recommended to use a simple local webserver, such as the http-server package that can be obtained using npm, and load the local html file from that server. Some examples require this, since they need to dynamically load .json model files, and pages opened directly from the local file system are not allowed to do this in many browsers.

If you use http-server, you would run an example by entering something like http://127.0.0.1:8080/basic_predict.html in the address bar in Chrome. For debugging, it is recommended that you open the console on the side of the screen to look at the log messages.

1) basic_vae.html / basic_vae.js

This basic demo generates images on the web page unconditionally, given random latent vectors. In addition, we demonstrate what an image looks like if we average two latent vectors.

2) basic_predict.html / basic_predict.js

Similar to basic_vae, this demo will keep on generating random vector images unconditionally. Unlike basic_vae, which generates the entire image at once, this demo generates one point per time frame (at 30 or 60 fps). In this demo, you can adjust the "temperature" variable, which controls the uncertainty of the strokes.

3) simple_predict.html / simple_predict.js

This demo also generates unconditionally, attempting to finish the drawing that the user starts.
If the user doesn't draw anything, the computer will keep on drawing stuff from scratch.

Hitting restart will clear the current human drawing and start from scratch.

In this demo, you can also select other classes, like "cat", "snail", "duck", "bus", etc. The demo will dynamically load the json files in the models directory but cache previously loaded json models.

4) predict.html / predict.js

Same as the previous demo, but made to be interactive so the user can draw the beginning of a sketch on the canvas. Similar to the first AI experiment.

5) interp.html / interp.js

This demo uses the conditional generative model, and samples 2 different images (using 2 latent space vectors encoded by samples from the evaluation set). These 2 auto-encoded images will be displayed at two sides of the screen, and the images generated in between the 2 sides will be the interpolated images based off linear-interpolation of the 128-dim latent vectors. In the future, for better effect, spherical interpolation rather than linear can be used.
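As a rough illustration of the two interpolation schemes mentioned above, the sketch below interpolates between two latent vectors stored as plain arrays. The function names lerpLatent and slerpLatent are hypothetical and are not part of sketch_rnn.js; the spherical version falls back to linear interpolation when the two vectors are nearly parallel.

```javascript
// Linear interpolation between two latent vectors of equal length.
// t = 0 returns z0, t = 1 returns z1.
function lerpLatent(z0, z1, t) {
  return z0.map(function (v, i) { return (1 - t) * v + t * z1[i]; });
}

// Spherical interpolation: moves along the arc between z0 and z1
// instead of along the straight line between them.
function slerpLatent(z0, z1, t) {
  var dot = 0, n0 = 0, n1 = 0;
  for (var i = 0; i < z0.length; i++) {
    dot += z0[i] * z1[i];
    n0 += z0[i] * z0[i];
    n1 += z1[i] * z1[i];
  }
  var denom = Math.sqrt(n0) * Math.sqrt(n1);
  if (denom === 0) { return lerpLatent(z0, z1, t); }
  // Clamp to guard against rounding pushing the cosine outside [-1, 1].
  var cosOmega = Math.max(-1, Math.min(1, dot / denom));
  var omega = Math.acos(cosOmega);
  var sinOmega = Math.sin(omega);
  // Nearly parallel vectors: the arc is almost a straight line.
  if (sinOmega < 1e-6) { return lerpLatent(z0, z1, t); }
  var w0 = Math.sin((1 - t) * omega) / sinOmega;
  var w1 = Math.sin(t * omega) / sinOmega;
  return z0.map(function (v, i) { return w0 * v + w1 * z1[i]; });
}
```

To produce a row of in-between images, you would evaluate one of these functions at evenly spaced values of t between 0 and 1 and decode each resulting vector.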

6) multi_vae.html / multi_vae.js

The demo is a variational autoencoder built to mimic your drawings and produce similar drawings. You are to draw a complete drawing of a specified object. After you draw a complete sketch inside the area on the left, hit the auto-encode button and the model will start drawing similar sketches inside the smaller boxes on the right. Rather than drawing a perfect duplicate of your drawing, the model will try to mimic it. You can experiment with drawing objects that are not the category you are supposed to draw, and see how the model interprets your drawing. For example, try to draw a cat, and have a model trained to draw crabs generate cat-like crabs.

7) multi_predict.html / multi_predict.js

The demo is similar to simple_predict. In this version, you will draw the beginning of a sketch inside the area on the left, and the model will predict the rest of the drawing inside the smaller boxes on the right. This way, you can see a variety of different endings predicted by the model. You can also choose different categories to get the model to draw different objects based on the same incomplete starting sketch. For example, you can get the model to draw things like square cats or circular trucks. You can always interrupt the model and continue working on your drawing inside the area on the left, and the model will then continue predicting from where you left off.

8) simplify_lines.html / simplify_lines.js

This one does not use a machine learning model at all. We demonstrate how data_tool.js is used to help us simplify lines. When you draw something on the screen, after you release the mouse, the line you have just drawn will be automatically simplified using the RDP (Ramer-Douglas-Peucker) algorithm with an epsilon parameter of 2.0. All models are trained to assume simplified line data with epsilon 2.0, so for best effect it is wise to convert all input data with the DataTool.simplify_lines() function (a very efficient JS implementation of RDP) before using DataTool.lines_to_strokes() to convert it to the stroke-based data format that the sketch_rnn.js model processes.
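To make the simplification step more concrete, here is a minimal recursive sketch of the RDP algorithm on a single polyline of [x, y] points. It is a standalone illustration and not the implementation in data_tool.js; the names perpendicularDistance and simplifyLineSketch are hypothetical.

```javascript
// Distance from point p to the infinite line through a and b.
// If a and b coincide, fall back to the distance from p to a.
function perpendicularDistance(p, a, b) {
  var dx = b[0] - a[0];
  var dy = b[1] - a[1];
  var len = Math.sqrt(dx * dx + dy * dy);
  if (len === 0) {
    return Math.sqrt(Math.pow(p[0] - a[0], 2) + Math.pow(p[1] - a[1], 2));
  }
  return Math.abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / len;
}

// Ramer-Douglas-Peucker: keep the endpoints, find the interior point
// farthest from the chord, and recurse only if it exceeds epsilon.
function simplifyLineSketch(points, epsilon) {
  if (points.length < 3) { return points.slice(); }
  var first = points[0];
  var last = points[points.length - 1];
  var maxDist = 0;
  var index = 0;
  for (var i = 1; i < points.length - 1; i++) {
    var d = perpendicularDistance(points[i], first, last);
    if (d > maxDist) { maxDist = d; index = i; }
  }
  if (maxDist <= epsilon) { return [first, last]; }
  var left = simplifyLineSketch(points.slice(0, index + 1), epsilon);
  var right = simplifyLineSketch(points.slice(index), epsilon);
  // Drop the duplicated split point before joining the two halves.
  return left.slice(0, -1).concat(right);
}
```

With epsilon set to 2.0, small wobbles of a pixel or two are removed while sharp corners are kept.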

Usage of Sketch RNN model

Pre-trained weight files

The RNN model has 2 modes: unconditional and conditional generation. Unconditional generation means the model will just generate a random vector image from scratch and not use any latent vectors as an input. Conditional generation mode requires a latent vector (128-dim) as an input, and whatever the model generates will be defined by those 128 numbers that can control various aspects of the image.

Whether conditional or not, all of the raw weights of these models are individually stored as .json files inside the models directory. For example, for the 'butterfly' class, there are 2 models that come pretrained:

butterfly.gen.json - unconditional model

butterfly.vae.json - conditional model

In addition to the neural network weight matrices, several pieces of meta information are stored in each of these files, including the version of the model, the name of the class, the actual reconstruction and KL losses obtained for the evaluation set, the size of the training set used, the scale factor used to normalize the data, etc.

Some of these models are also stored in .js format for convenience, in case you just want to load a single model synchronously within a purely static website demo.

Sketch RNN

The main model is stored inside sketch_rnn.js. Before using the model, you need some method to import the desired .json pre-trained weight file, and parse that into a JS object first.
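A minimal sketch of that import step, assuming you already have the text of a weight file as a string. The helper name createModel is hypothetical; the model constructor is passed in as a parameter so the sketch can run on its own, and in the browser you would pass SketchRNN from sketch_rnn.js.

```javascript
// Parse the raw text of a .json weight file and hand the parsed object
// to a model constructor. ModelClass is expected to accept the parsed
// object as its only argument, as the SketchRNN constructor does.
function createModel(rawJsonText, ModelClass) {
  var model_data = JSON.parse(rawJsonText);
  return new ModelClass(model_data);
}
```

In a page that has loaded sketch_rnn.js, this would be called as createModel(text, SketchRNN), where text is whatever your loading method returned for the chosen .json file.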

Currently, once you create a model, you cannot replace the weights with another JSON file, and must instead destroy this object and create another new SketchRNN object using another model_data.

To view the meta information for the pre-trained weights, just do a console.log(model.info) to dump it out.

Scale Factors

When training the models, all the offset data was normalized to have a standard deviation of 1.0 on the training set, after simplifying the strokes. Neural nets work best when trained on normalized data. However, the original data recorded with the QuickDraw web app stored everything in pixels, so it was scaled down so that on average the stroke offsets have a length of about 1.0. Thus each data class has its own scale factor, and these numbers are usually between 60 and 120 depending on the dataset. The scale factor is stored in model.info.scale_factor. The model will assume all inputs and outputs to be in pixel space, not normalized space, and will do all the scaling for you. You can modify the scale factor in the model using model.set_scale_factor(), but it is not recommended. Rather than overwriting the scale_factor, modify the pixel_factor instead, as described in the next paragraph.

If using PaperJS, it is recommended that you leave everything as it is. When using p5.js, all the recorded data looks bigger than in the original app by a factor of exactly 2, and this is likely due to the anti-aliasing functionality of web browsers. Hence the model has an extra scaling factor called pixel_factor. If you want to make interactive apps and receive realtime drawing data from the user, and you are using PaperJS, it is best to call model.set_pixel_factor(1.0). For p5.js, call model.set_pixel_factor(2.0). For non-interactive applications, using a larger pixel factor will reduce the size of the generated image.

Line Data vs Stroke Data

Data collected by the original QuickDraw app is stored in the format below, which is a list of lists of ["x", "y"] pixel points.

With data_tool.js, this Line Data format must first be simplified using simplify_lines or simplify_line (depending on whether it is a list of polylines or just a single polyline). Afterwards, the simplified lines are fed into lines_to_strokes to convert them into the Stroke Data format used by the model.

In the Stroke Data format, we assume the drawing starts at the origin, and store only the offset points from the previous location. The format is 2 dimensional, rather than 3 dimensional as in the Line Data format:

Each row of the stroke will be 5 elements:

[dx, dy, p0, p1, p2]

dx, dy are the offsets in pixels from the previous point.

p0, p1, p2 are binary values; exactly one of them is 1 and the other two must be 0.

p0 = 1 means the pen stays on the paper at the next stroke.
p1 = 1 means the pen is lifted above the paper after this stroke. The next stroke will be the start of a new line.
p2 = 1 means the drawing has stopped. Stop drawing anything!

The drawing will be decomposed into a list of [dx, dy, p0, p1, p2] strokes.

The mapping from Line Data to Stroke Data will lose the information about the starting position of the drawing, so you may want to record LineData[0][0] to keep this info.
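To illustrate the mapping, here is a simplified standalone sketch that converts a list of polylines of [x, y] points into [dx, dy, p0, p1, p2] rows. It is not the DataTool.lines_to_strokes implementation, and details such as the leading zero-offset row and the final p2 row are assumptions made for this example; the name linesToStrokesSketch is hypothetical.

```javascript
// Convert polylines of absolute [x, y] points into offset-based rows.
// Assumes every polyline has at least one point.
function linesToStrokesSketch(lines) {
  var strokes = [];
  // The first point becomes the origin; its position is not kept.
  var prev = lines[0][0];
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i];
    for (var j = 0; j < line.length; j++) {
      var isLast = (j === line.length - 1);
      strokes.push([
        line[j][0] - prev[0], // dx from the previous point
        line[j][1] - prev[1], // dy from the previous point
        isLast ? 0 : 1,       // p0: pen stays on the paper
        isLast ? 1 : 0,       // p1: pen is lifted after this point
        0                     // p2: drawing has not ended yet
      ]);
      prev = line[j];
    }
  }
  // Terminating row: p2 = 1 marks the end of the drawing.
  strokes.push([0, 0, 0, 0, 1]);
  return strokes;
}
```

For example, two short polylines starting at (10, 10) and (20, 20) produce rows whose offsets no longer mention (10, 10) at all, which is why the starting position has to be recorded separately if you need it.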

Unconditional Generation of Vector Images

Unconditional Generation - Everything at once

Now that the preliminaries of data format and line simplification are out of the way, let's generate some vector images.

The most basic way to generate a vector image is to use an unconditional model, i.e. load ant.gen.json into model_data and create the model with model = new SketchRNN(model_data);

To generate an entire drawing, as stroke data format:

var example = model.generate();

And draw that out using your favourite method onto the canvas, or as svg's. That's it!
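One possible way to prepare stroke data for drawing is to turn it back into polylines of absolute coordinates, which can then be passed to any canvas or SVG line routine. The sketch below assumes each row moves the pen by (dx, dy) and then reports the pen state; the name strokesToLinesSketch is hypothetical and is not part of sketch_rnn.js.

```javascript
// Rebuild polylines of absolute [x, y] points from stroke rows,
// starting from a chosen position on the canvas.
function strokesToLinesSketch(strokes, startX, startY) {
  var lines = [];
  var current = [];
  var x = startX;
  var y = startY;
  for (var i = 0; i < strokes.length; i++) {
    var s = strokes[i];
    if (s[4] === 1) { break; }   // p2: the drawing has ended
    x += s[0];
    y += s[1];
    current.push([x, y]);
    if (s[3] === 1) {            // p1: pen lifted, close this polyline
      lines.push(current);
      current = [];
    }
  }
  if (current.length > 0) { lines.push(current); }
  return lines;
}
```

Each returned polyline can be drawn by connecting its consecutive points with straight segments.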

There are more bells and whistles though. You can specify a temperature parameter to control the uncertainty and amount of variation of the image. I recommend keeping this parameter between 0.1 and 1.0.
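For intuition about what a temperature does, the sketch below shows one common way a temperature is applied to a categorical distribution: log-probabilities are divided by the temperature and renormalized. This is a generic illustration only; whether sketch_rnn.js applies temperature in exactly this way is an assumption, and the name applyTemperatureSketch is hypothetical.

```javascript
// Sharpen (temperature < 1) or leave unchanged (temperature = 1)
// a list of probabilities that sums to 1.
function applyTemperatureSketch(probs, temperature) {
  var logits = probs.map(function (p) { return Math.log(p) / temperature; });
  // Subtract the largest value before exponentiating for stability.
  var maxLogit = Math.max.apply(null, logits);
  var exps = logits.map(function (l) { return Math.exp(l - maxLogit); });
  var sum = exps.reduce(function (a, b) { return a + b; }, 0);
  return exps.map(function (e) { return e / sum; });
}
```

At a temperature close to 0.1 nearly all of the probability mass moves to the most likely outcome, which matches the observation that low temperatures give less varied drawings.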

If you have written a simple draw_example routine like me (i.e. in basic_predict.js), and want to center and scale the image before rendering it, there are some tools in the model to do this.

Say you have already generated a cat using example = model.generate(temperature), and want to draw that cat into a 100x100px bounding box between (10, 50) and (110, 150). You can scale the image first, and then center it before plotting it out.

This will draw a scaled drawing to fill the bounding box and draw it at the center. Note that this creates a new list called mod_example to store the modified version in order to keep the original example list unmodified.
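The idea of scaling and centering can be sketched without the model's own tools. The standalone helper below, with the hypothetical name fitToBoxSketch, scales the offsets so the drawing fills a square box and computes the starting position that centers it. It returns a new list and leaves the input unmodified; it is an illustration under these assumptions, not the library's implementation.

```javascript
// Scale stroke offsets to fit a size x size box whose top-left corner
// is (left, top), and compute where the pen should start.
function fitToBoxSketch(strokes, left, top, size) {
  // Bounding box of the path, with the pen starting at the origin.
  var x = 0, y = 0;
  var minX = 0, maxX = 0, minY = 0, maxY = 0;
  for (var i = 0; i < strokes.length; i++) {
    x += strokes[i][0];
    y += strokes[i][1];
    minX = Math.min(minX, x); maxX = Math.max(maxX, x);
    minY = Math.min(minY, y); maxY = Math.max(maxY, y);
  }
  var extent = Math.max(maxX - minX, maxY - minY);
  var scale = extent > 0 ? size / extent : 1;
  // Copy each row so the original example is not modified.
  var scaled = strokes.map(function (s) {
    return [s[0] * scale, s[1] * scale, s[2], s[3], s[4]];
  });
  // Starting position that centers the scaled bounding box.
  var startX = left + (size - (maxX - minX) * scale) / 2 - minX * scale;
  var startY = top + (size - (maxY - minY) * scale) / 2 - minY * scale;
  return { strokes: scaled, startX: startX, startY: startY };
}
```

For the bounding box between (10, 50) and (110, 150), this would be called as fitToBoxSketch(example, 10, 50, 100), and the drawing would then be plotted starting from the returned startX and startY.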

Unconditional Generation - One Stroke at a Time

If you want the model to generate one stroke at a time, you can use the previous method to pre-generate the entire image, and then plot one stroke every 1/60 of a second. Alternatively, you may want to spread out the computation and generate on the fly. This is useful for interactive applications.

To generate a stroke at a time, let's study basic.js, a p5.js example. Almost pseudo-code:
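The following is a minimal standalone sketch of such a loop, not the contents of basic.js. So that it runs without the library, it uses a stub in place of a SketchRNN instance; the stub's method names (zero_state, update, get_pdf, sample) and their signatures are assumptions made for illustration, so check sketch_rnn.js for the actual interface. Instead of drawing to a canvas, segments are collected in an array.

```javascript
// Stub standing in for a SketchRNN model: it replays a fixed list of
// strokes. Method names and signatures are assumptions for this sketch.
function makeStubModel(strokes) {
  return {
    zero_state: function () { return { step: 0 }; },
    update: function (stroke, state) { return { step: state.step + 1 }; },
    get_pdf: function (state) { return { step: state.step }; },
    sample: function (pdf, temperature) {
      return strokes[Math.min(pdf.step - 1, strokes.length - 1)];
    }
  };
}

var model;               // assigned before setup() is called
var rnn_state;           // hidden state carried between frames
var prev_stroke;         // last [dx, dy, p0, p1, p2] fed to the model
var x, y;                // absolute pen position
var finished;            // true once p2 = 1 has been sampled
var temperature = 0.65;  // amount of uncertainty when sampling
var segments = [];       // [x0, y0, x1, y1] segments to draw

// Reset everything needed to start a new drawing with the same model.
function restart() {
  rnn_state = model.zero_state();
  prev_stroke = [0, 0, 1, 0, 0];
  x = 0;
  y = 0;
  finished = false;
  segments = [];
}

function setup() {
  restart();
}

// Called once per frame: advance the model by one stroke.
function draw() {
  if (finished) { return; }
  rnn_state = model.update(prev_stroke, rnn_state);
  var pdf = model.get_pdf(rnn_state);
  var s = model.sample(pdf, temperature);
  if (s[4] === 1) {      // end of drawing; p5.js code would call noLoop()
    finished = true;
    return;
  }
  if (prev_stroke[2] === 1) {  // pen was on the paper: record a segment
    segments.push([x, y, x + s[0], y + s[1]]);
  }
  x += s[0];
  y += s[1];
  prev_stroke = s;
}
```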

In the above example, using the p5.js framework, the setup() method is called first to initialize everything. Afterwards, draw() is called 60 times a second, until noLoop() is called when we finish. If you want to use the same model to draw other things again in the same session, just reinitialize the rnn_state as in the setup() function. You should use another routine like init() or restart() to do this and not rely on the p5.js setup() routine.

Variational Autoencoder - Conditional Generation of Vector Images

In this section, you will see how to use the model to encode a given vector image into a 128-dimension vector of floating point numbers (the "latent vector" Z), and also how to take a given Z (which can be either previously encoded, modified by the user, or even entirely generated), and decode it into a vector image.

To create a model, say for the ant class, you must load the full VAE weights, i.e. ant.vae.json, rather than the decoder-only ant.gen.json.

Encoding a Vector Image into Latent Space

The encoding function, by itself, may be useful for t-SNE or clustering applications. To encode an image from the raw QuickDraw data, it must first be converted to the stroke data format as described earlier, using the DataTool object:

1) Remove the "x" and "y" keys from the data and put the points into a list of polylines of [x, y] pairs.

2) Simplify the lines with DataTool.simplify_lines.

3) Convert the lines to stroke format using DataTool.lines_to_strokes.

After this process, say you store the final result in a variable called example. You can then encode this example to latent space using:

var z = model.encode(example);

Unlike in the traditional VAE paper, z is deterministic for a given example. If you want to encode like the original VAE and make z a random variable, you can use an optional temperature argument:

The 2nd method provides more artistic variation, but which is best for you depends on your application. If you are doing clustering and prefer more certainty, then the default method may be better and easier to debug.

If you collect a group of z's, you can do PCA or t-sne or other clustering methods to analyze your data, and even use the z's for classification. As we may upgrade the model weights in the future, each model has a versioning system stored in model.info.version, so you may want to keep track of the model version with the z's of each class if you intend to save them to use at a later point.

Decoding a Latent Vector into a Vector Image

Assuming you obtained z earlier via encoding, you can convert it back into a vector image with model.decode, as below:

The process of reconstruction is also a stochastic process. This means that for a given z, running model.decode a few times will give you different reconstructed images. Keeping the temperature between 0.01 and 0.1 will generally give you very similar images, which is useful for animation applications.

For models with very low KL loss, i.e. < 0.50, you can even sample z from a Gaussian distribution and use that z to produce a legitimate vector image. To sample z from a Gaussian distribution:
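The library may provide its own helper for this; as a standalone sketch, a vector of independent standard normal samples can be drawn with the Box-Muller transform. The name sampleGaussianLatentSketch and the optional rng parameter are hypothetical and exist only to make the example self-contained.

```javascript
// Draw `size` independent samples from a standard normal distribution
// using the Box-Muller transform. `rng` should return uniform numbers
// in [0, 1) and defaults to Math.random.
function sampleGaussianLatentSketch(size, rng) {
  var uniform = rng || Math.random;
  var z = [];
  for (var i = 0; i < size; i++) {
    var u1 = 1 - uniform(); // in (0, 1], avoids Math.log(0)
    var u2 = uniform();
    z.push(Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2));
  }
  return z;
}

// A 128-dimensional latent vector, matching the size described above.
var random_z = sampleGaussianLatentSketch(128);
```

The resulting random_z can then be passed to the decoder in place of an encoded z.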