A Visit to Disney’s Magic Kingdom

I just finished giving a short presentation to several thousand screaming fans at the D23 Disney fan convention in Anaheim, California. When I say “screaming fans,” what I mean is Disney fans who were literally screaming at what I had to say.

This already somewhat improbable situation was made all the more surprising by the fact that they were screaming about FindClusters.

Well, technically, most of them may not have actually realized that’s what they were screaming about, because they were seeing only the output of the command, not the actual Mathematica code. But the thing they were so excited about was direct output from Mathematica, and the key differentiating factor that made it so interesting to them was the ability of FindClusters to discern, differentiate, and illuminate the shifting moods and emotions of animated feature films.

By way of background, I should explain that this was a result of spending almost a year writing and designing an iPad app just released by the partnership of Touch Press (my app publishing company), the Disney Interactive Entertainment division, and Walt Disney Animation Studios (the original Disney animation company, now a division tucked away inside “Big Disney”). The app is a big, rich, and deep exploration of the history and current practice of Disney animation. And a surprising amount of it was designed, prototyped, or created with Mathematica (though, as is so often the case, there is no actual Mathematica technology in the shipping product).

In the “designed and prototyped” category are things like a live, physics-based simulation of magical snow modeled on the algorithms used in the production of a particular visual effect in the upcoming Disney feature Frozen (November 2013). The effect involves capturing the path of the user’s finger on the screen, spinning a balanced set of swirls off it, constructing a vector field that drives particles along the path, and finally sending a particle emitter along the paths spewing out virtual snow particles that get swept up in the vector field.

In order to create this effect in real time on an iPad, we of course needed to write highly optimized C and OpenGL code. But before we could do that, we needed to refine the algorithm and parameters through a series of prototypes and meetings with the visual effects supervisors at Disney, to be sure our implementation maintained the spirit of the original, even though it could manage only about a tenth as many snow particles.

That prototyping would have been very painful to do in C code, but by building the prototype in Mathematica, I was able to make adjustments on the spot, and in one case almost completely rewrite the algorithm overnight, then take it back to them the next morning in much improved form.

In the making of animated movies, there is a very well-developed system for communicating visual storytelling ideas before any actual animation is done. Simple sketches are assembled into storyboards, and from those, “animatic” films are made that show the simple sketches playing out in movie form. People need to be able to communicate their ideas quickly, make revisions quickly, and not worry too much about throwing out or rewriting scenes that aren’t working.

The Mathematica prototype of the Frozen snow feature, as well as Mathematica prototypes of several other features in the app, served the same purpose, and were useful for the same reason: Creating final animation, or final deployed C code, is very time consuming and expensive. We have used this technique of code-storyboarding for several Touch Press projects, and it is a powerful bottle on our shelf of secret sauces.

But where Mathematica was really able to shine was in the feature we came to call the Color Maps.

One of the early stages in the making of most movies, and particularly animated movies, is the creation of a “color script.” This is typically a series of several dozen images painted by artists in the visual development department that spell out the color palette of the movie and how that palette should be deployed to communicate different moods, emotions, and actions through the course of the movie. This gives the film a consistent look, and is important to the continuity and character of the film. It is referred to throughout the subsequent stages of production.

I thought it would be fun to see if it’s possible to sort of extract the color script back out of the final film as it was released to the public, to see just how consistent the final product was with the plan. My goal was a single wide, thin image where each vertical stripe represented one frame of the film, start to finish. I hoped that the overall color palette of the movie would become visible in this signature image. This turns out to be rather tricky, and I’m not aware of anyone having done it before in a satisfactory way.

There is a genre of art that consists of entire movies reduced to tiny little thumbnail images arranged in sequence, but this does not serve the purpose of highlighting dominant colors. When you compress an image down to a few dots, all the colors are necessarily blended, which is to say, turned to mud. A frame from the movie that consists of patches of bright primary colors will look no different, when compressed, than a scene consisting mostly of neutral grays and browns.

To see why this must be so, consider that the bright primary red, green, and blue sub-pixels on a computer screen blend invisibly into the particular colors they are asked to represent: No matter how high-resolution an image is, when colors are shown very close to each other, the eye merges them into an average color. You, for example, might think you are reading black text on a white background, but you’re not. Unless you’ve printed out this blog post, you are reading black text on a background consisting of stripes of 100% saturated red, green, and blue.
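The blending problem is easy to reproduce numerically. The following is a minimal Python sketch, not anything from the app or from the Mathematica code; the two toy "frames" and the `average_color` helper are made up for illustration. It shows that a frame of pure primaries and a frame of dull gray collapse to the same average color.

```python
def average_color(pixels):
    """Mean of a list of (r, g, b) tuples, each channel in 0..255."""
    n = len(pixels)
    return tuple(round(sum(p[c] for p in pixels) / n) for c in range(3))

# Hypothetical frame made of equal patches of pure red, green, and blue.
primaries = [(255, 0, 0)] * 100 + [(0, 255, 0)] * 100 + [(0, 0, 255)] * 100

# Hypothetical frame that is uniformly dull gray.
gray = [(85, 85, 85)] * 300

print(average_color(primaries))  # (85, 85, 85)
print(average_color(gray))       # (85, 85, 85)
```

Once the averages are identical, nothing downstream can tell the vivid frame from the muddy one.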

So micro-thumbnails are no good, and neither is a previous invention known as “movie barcodes,” which similarly blend colors within and between frames. I wanted a way to sort and separate the colors in each frame to create patches of color large enough that the eye would be able to distinguish them.

Unfortunately this idea runs into a basic problem: You can’t sort colors. Because the human eye has three distinct color sensors (red, green, and blue), color is fundamentally a three-dimensional quantity, and there is no linear ordering that brings together “similar” colors. If you sort first by amount of red, for example, then you may bring together wildly different hues and brightnesses. If you sort by hue, then you bring together wildly different degrees of saturation and brightness, and so on. There’s just no way to do it.
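A small Python illustration of why one-dimensional sorts fail (a sketch with a hand-picked palette, using the standard-library `colorsys` module for the hue conversion; none of this is from the original Mathematica code):

```python
import colorsys

# A handful of named colors as (r, g, b) with channels in 0..255.
colors = {
    "red": (255, 0, 0),
    "white": (255, 255, 255),
    "yellow": (255, 255, 0),
    "black": (0, 0, 0),
    "blue": (0, 0, 255),
    "dark red": (128, 0, 0),
}

def hue(rgb):
    """Hue in 0..1 from the standard-library HSV conversion."""
    r, g, b = (c / 255 for c in rgb)
    return colorsys.rgb_to_hsv(r, g, b)[0]

# Python's sort is stable, so ties keep their dictionary order.
by_red = sorted(colors, key=lambda name: colors[name][0])
by_hue = sorted(colors, key=lambda name: hue(colors[name]))

print(by_red)  # ['black', 'blue', 'dark red', 'red', 'white', 'yellow']
print(by_hue)  # ['red', 'white', 'black', 'dark red', 'yellow', 'blue']
```

Sorting by red puts pure blue next to black, and pure red next to white and yellow. Sorting by hue lumps white and black in with the reds, because grays have no hue and the conversion reports 0 for them.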

Which is frustrating, because particularly in animated movies, and even more particularly in classic hand-drawn animated films, there are very often large expanses of flat, uniform color onscreen for extended periods of time. That’s because in the hand-drawn era it was very difficult to create moving characters that had color gradients in their rendering. They would have had to paint those gradients perfectly, by hand, on hundreds of separate cels. The slightest variation in the manually painted gradients would have shown up as violent flickering when the images were played back at full speed. So they just didn’t do it: The static background images may have been painted with great subtlety, but the moving characters were composed of discrete patches of uniform, flat color.

I eventually realized that I could use this tendency for scenes to contain patches of uniform color to solve the problem of how to sort colors. Instead of sorting, I needed to use clustering. Here’s a 3D plot where each dot represents a pixel of color from a frame of film. The large dots are the average color of each of the sets of colors returned by a straight call to FindClusters (no options, just FindClusters applied to a list of {r, g, b} values for each pixel). Click the image to play a video of it rotating, which is really necessary to see the 3D structure of the color information.

As you can see, there are fairly distinct zones of similar colors. Not perfect, and there is definitely some blending of dissimilar colors going on, but by and large it has identified the dominant colors in this part of the movie.
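For readers without Mathematica, the clustering step can be approximated in a few lines of Python. This is only a sketch under loud assumptions: it uses plain k-means with a fixed number of clusters `k` and a deterministic farthest-point start, whereas FindClusters called with no options picks its own method and number of clusters. The toy `frame` data and all function names are hypothetical.

```python
def dist2(a, b):
    """Squared Euclidean distance between two (r, g, b) points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Component-wise mean of a non-empty list of (r, g, b) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def kmeans(pixels, k, iterations=20):
    """Group pixels into at most k clusters of similar color."""
    # ASSUMPTION: farthest-point initialization, chosen so the result
    # is deterministic; it is not what FindClusters does internally.
    centers = [pixels[0]]
    while len(centers) < k:
        centers.append(
            max(pixels, key=lambda p: min(dist2(p, c) for c in centers))
        )
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in pixels:
            nearest = min(range(len(centers)),
                          key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return [c for c in clusters if c]

# Hypothetical frame: three patches of nearly flat color (channels 0..1).
frame = (
    [(0.1 + 0.01 * i, 0.3, 0.9) for i in range(5)]    # bluish
    + [(0.2, 0.7 + 0.01 * i, 0.1) for i in range(5)]  # greenish
    + [(0.9, 0.6, 0.4 + 0.01 * i) for i in range(5)]  # tan
)

clusters = kmeans(frame, k=3)
print([len(c) for c in clusters])  # [5, 5, 5]
```

The mean of each returned cluster plays the role of the large dots in the 3D plot.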

To go from this cluster analysis to a single-pixel-wide vertical stripe representing the frame of film, I still had to sort the colors in some way, but now I could use the clusters as the primary sort. What I ended up doing was to sort the clusters by perceptual brightness of their mean color, and then within each cluster sort the individual pixels also by perceived brightness. So the final pixels in the vertical stripe have a one-to-one correspondence with the pixels in the original frame; they have just been reorganized into a stripe where similar colors, as defined by cluster analysis, are brought together.
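The two-level sort can be sketched in Python as follows. This is an illustration, not the original SortBy code. The brightness formula is an assumption: the post does not say which measure was used, so the sketch borrows the common Rec. 709 luma weights. The example clusters are made up.

```python
def brightness(rgb):
    """ASSUMPTION: Rec. 709 luma weights as a stand-in for
    'perceptual brightness'; channels are in 0..1."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def mean_color(pixels):
    """Component-wise mean of a non-empty list of (r, g, b) pixels."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))

def stripe(clusters):
    """Order clusters by the brightness of their mean color, then
    order the pixels inside each cluster by their own brightness."""
    ordered = sorted(clusters, key=lambda c: brightness(mean_color(c)))
    return [p for c in ordered for p in sorted(c, key=brightness)]

# Hypothetical clusters, as they might come back from a clustering step.
green = [(0.2, 0.8, 0.1), (0.2, 0.7, 0.1)]
dark = [(0.1, 0.1, 0.2), (0.0, 0.0, 0.1)]
blue = [(0.1, 0.3, 0.9), (0.1, 0.2, 0.8)]

column = stripe([green, dark, blue])
for pixel in column:
    print(pixel)
```

The stripe contains exactly the input pixels, only reordered: dark first, then blue, then green, with each group internally sorted from dimmer to brighter.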

A further practical difficulty is that if you apply the cluster analysis to each frame individually, there is too much variation in which groupings are identified as clusters. The sort by brightness scatters what should be similar colors between frames to different vertical positions. So the clustering needs to be done on the pool of pixel values from all the frames within a zone of the film representing one “scene” or camera cut (i.e. between places where there is a big, dramatic shift in the colors present, like when the action cuts to a new location). So I had to write a further layer of code to identify such dramatic color shifts and split the movie up into groups of typically a few hundred frames each.
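The post does not describe how the color shifts were detected, so the following Python sketch is purely an assumption about one simple way to do it: start a new scene whenever the average color jumps by more than a threshold between consecutive frames, then pool each scene's pixels for clustering. The threshold value, the function names, and the toy frames are all hypothetical.

```python
def mean_color(frame):
    """Component-wise mean of a frame given as a list of (r, g, b)."""
    n = len(frame)
    return tuple(sum(p[i] for p in frame) / n for i in range(3))

def split_into_scenes(frames, threshold=0.25):
    """Return lists of frame indices, one list per scene.
    ASSUMPTION: a cut is any jump in mean color larger than threshold."""
    if not frames:
        return []
    scenes = [[0]]
    for i in range(1, len(frames)):
        a, b = mean_color(frames[i - 1]), mean_color(frames[i])
        jump = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
        if jump > threshold:
            scenes.append([])  # dramatic shift: start a new scene
        scenes[-1].append(i)
    return scenes

# Hypothetical two-pixel frames: three night frames, then two at sunrise.
night = [(0.05, 0.05, 0.2), (0.1, 0.1, 0.3)]
sunrise = [(0.9, 0.5, 0.1), (1.0, 0.6, 0.2)]
frames = [night, night, night, sunrise, sunrise]

scenes = split_into_scenes(frames)
print(scenes)  # [[0, 1, 2], [3, 4]]

# Pool every pixel in a scene so clustering sees the whole scene at once.
pooled = [[p for i in scene for p in frames[i]] for scene in scenes]
print([len(pixels) for pixels in pooled])  # [6, 4]
```

Clustering the pooled pixels once per scene means every frame in that scene shares the same clusters, so similar colors land at the same height in neighboring stripes.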

Here’s a video I made that attempts to illustrate this process graphically, showing how an image is first split into pixels, then the pixels are clustered, then sorted into a stripe, and then assembled with neighboring stripes to form the complete image.

It turns out that the result is really quite pretty, and does in fact pull out dominant colors very effectively. Here’s the color map for The Lion King. Click it to open a larger version in a separate window: Be sure to expand the image to its full native size, then scroll around horizontally.

You can clearly see the shifting moods. Take a look, for example, at the ending (ignoring the long expanse of nearly solid black, which is the credits). You see strong, violent reds and yellows: That’s the climactic battle scene between Scar and the returning Simba, come to reclaim his father’s throne (it’s Hamlet with lions). Then things are bad, bad, bad: It’s dark, it’s raining, the pride lands are ruined, the black of night descends literally, figuratively, and in the color map. And then the sun rises on Pride Rock! Bright blues and greens burst out, the new king holds his son up to the sky, and we’re off to the credits!

In the app version of this color map you can touch anywhere on it and see a thumbnail image from the movie at that point. If you’ve seen the movie, this instantly reminds you of the scene, and helps make sense of the color map. But it’s the color map itself that communicates the rhythm of the film, and its general tone and palette.

It’s quite a bit of work to create one of these color map images. Not so much for me, but for the poor computer that has to grind through every frame of the movie, split it into groups of frames, then cluster the colors in each group. It took about a day of processing per film on an 8-core Mac tower.

At first, I only intended to make these for a couple of films, to see how much variation there was. But then I started thinking, you know, it might be kind of cool to do a few more…. Following the general principle that anything worth doing is worth over-doing, I quickly decided that I needed to do all of them.

Fortunately the scope of the app we were working on very clearly defined what “all” meant: Every single full-length animated feature film created by Walt Disney Animation Studios. There are exactly 52 of them, from Snow White and the Seven Dwarfs in 1937 through to Wreck-It Ralph in 2012. (Frozen in 2013 will make it 53, but I can’t make the color map until they finish the movie!) That means no Pixar movies (different division), no animated shorts (of which there are hundreds), and no Mary Poppins or Song of the South (because they are mostly live action, not animated).

The result is this image. In my talk I brought it up as an example of just how comprehensively our app tells the story of Disney animation. Because this is pretty much the definition of comprehensive. This is it. This is 75 years of movie making (as a wild estimate, something like 30 million person-hours of work) encapsulated in one image. At the resolution of a Retina-display iPad, each pixel, occupying about 25 one-millionths of a square inch, represents about one full 8-hour work day of someone’s life. Go on, click the image and look at it full-size: A lot of work went into making those pixels!
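The work-day-per-pixel figure can be checked with a line of arithmetic. In this Python sketch the 30 million person-hours is the wild estimate from the text. The 2048 by 1536 screen size is an assumption about which Retina iPad is meant, and it further assumes the image fills the whole screen.

```python
person_hours = 30_000_000     # the "wild estimate" from the text
width, height = 2048, 1536    # ASSUMPTION: full Retina iPad screen, in pixels

pixels = width * height
hours_per_pixel = person_hours / pixels

print(pixels)                     # 3145728
print(round(hours_per_pixel, 1))  # 9.5
```

Under those assumptions each pixel stands for roughly nine and a half hours, which is in line with the one work day quoted above.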

This image caused quite a stir in the audience, and the fact is that the entire image is direct output from Mathematica without any intervention by other software, or any human intervention. It’s about four pages of code, much of that administrative in nature. The only real algorithms are one to split up the movie into consistent-color blocks, and one call to FindClusters followed by a few SortBy calls.

The audience, dedicated Disney fans all, immediately grasped what a marvelous thing it is to be able to see so much in one place, and yet be able to make out individual features. If you know your Disney, you can easily pick out, for example, the Winnie the Pooh movies (two of them) that are much brighter than average. Fantasia also stands out for its intense blue scenes (where Leopold Stokowski is shown against a blue curtain). Broad trends are also visible: Notice how about halfway through, there suddenly start being uniform bands of color, mostly black, at the end of each movie. Those are the end credits. Before about 1980, they didn’t have end credits, only a very limited number of credits for the director and major stars at the beginning. Customs change: Now they list the guy who runs the coffee shop as well as the babies born to staff members during the production. (It is a sobering fact that there were about as many babies born during the production of Wreck-It Ralph as there are employees of Touch Press.)

I love images like this, which compress into a single place vast expanses of human effort. My previous favorite of this genre is an image I created (also with Mathematica, of course) several years ago.

This is a network where each node is a particular isotope of a particular element, and the connections are radioactive decay modes that transform one isotope or element into another. Every one of those isotopes is probably someone’s PhD thesis, and together this image represents countless billions of dollars in nuclear research. Possibly even more than was spent to make all those Disney movies!

But I digress: Back to the D23 audience in Anaheim. They liked the composite color map image, but that’s not entirely what they were screaming about. The real outpouring of love came when I showed them that I could touch the image anywhere and see a thumbnail image from the corresponding movie. Any thumbnail from anywhere in any full-length animated feature film ever made by Walt Disney Animation Studios.

Frankly I’m still amazed that Disney decided it was OK to ship something with such a comprehensive set of images from every movie, even if they are only in thumbnail form. Fortunately we had enthusiastic support at the highest levels of the company, from the CEO Bob Iger and CCO John Lasseter on down.

If you’d like to give the interactive, touchable color map a try, I’m afraid you’ll have to buy the app, because that is the only form where release of the images is permitted, and where the technology allows for butter-smooth swiping around the whole image. Yes, I have a Mathematica implementation of this where the thumbnails are full-screen, the color maps are 50,000 pixels wide, and the files add up to a couple of terabytes. But that’s unfortunately going to have to stay locked in the vault, as they like to say at Disney, and besides, the touchable app version is actually far more fun to play with than the clunky, if more exhaustive, clickable Mathematica implementation.

But I can share an interesting variation of the composite image that we ended up not using for various reasons. In the version above and in the app, each movie’s color map has been stretched out to the same width, so the entire screen is filled with color. In the version below, the width of the strip is proportional to the running time of the movie (except Fantasia, third from the top, which is so crazy-long that it would have messed up the scale for all the others).

Here you can see that, after an early flirtation with long movies, they backed off for many decades, before finally building back up to the original ambition of making two-hour movies.

If you want to know more about the app in general and why it is imperative that you buy a copy immediately, please visit Touch Press. To read my more Disney-centric blog post about it, see the Touch Press Blog post. And if you’re still not convinced to buy an iPad if necessary, here are some links to press coverage of the app.