Simple solutions to make videos with R

I'm talking about streaming data displayed as video rather than as static charts, for example 200 scatter plots updated continuously, as in my recent three-part video series, From Chaos to Clusters.

In this article, I explain and illustrate how to produce these videos. You don't need to be a data scientist to understand.

Here's one frame from one version of video clip #3.

Here's the solution:

1. Produce the data that you want to visualize

Using Python, R, Perl, Excel, SAS or any other tool, produce a text file called rfile.txt, with 4 columns:

k: frame number

x: x-coordinate

y: y-coordinate

z: color associated with (x,y)

Or download my rfile.txt (sample data to visualize) if you want to exactly replicate my experiment. To access the most recent data, source code (R or Perl), new videos and explanations about the data, click here.
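If you would rather generate test data than download it, here is a minimal sketch in R of what step 1 could look like. The column names iter, x, y and new match the labels used in my rfile.txt, but the simulation itself (uniform points, with about 5% replaced at each frame) is purely an assumption for illustration, as are the tab separator and the header row.

```r
# Minimal sketch: simulate a point cloud in which a few points are
# replaced at each frame. The replacement model is illustrative only.
set.seed(1)
n_frames <- 200   # number of frames (iterations)
n_points <- 500   # points per frame

x <- runif(n_points)
y <- runif(n_points)
frames <- vector("list", n_frames)
for (k in seq_len(n_frames)) {
  # Replace about 5% of the points with fresh ones and flag them as new
  is_new <- runif(n_points) < 0.05
  x[is_new] <- runif(sum(is_new))
  y[is_new] <- runif(sum(is_new))
  frames[[k]] <- data.frame(iter = k, x = x, y = y, new = as.integer(is_new))
}
all_frames <- do.call(rbind, frames)

# Tab-separated text file with a header row (assumed layout)
write.table(all_frames, "rfile.txt", sep = "\t", row.names = FALSE, quote = FALSE)
```

The resulting file has one row per point per frame, so 200 frames of 500 points give 100,000 rows.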

2. Run the following R script

Note that the first variable in the R script (and the first column of my rfile.txt) is labeled iter: it is the frame number k, and it comes from an iterative algorithm that produces an updated data set of 500 (x,y) observations at each iteration. The fourth field is called new: it plays the role of z and indicates whether the point (x,y) is new at a given iteration. New points appear in red, old ones in black.
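For readers who want something to start from, here is a minimal sketch of a script in this spirit. It is not my exact script: the function name play_frames, the pause length and the tiny demo data set are assumptions made for illustration. It draws one scatter plot per iteration, with new points in red and old ones in black.

```r
# Draw one scatter plot per iteration on the active graphics device.
# vv: data frame with columns iter, x, y, new (new = 1 for a new point).
# pause: seconds to wait between frames.
play_frames <- function(vv, pause = 0.05) {
  xr <- range(vv$x)   # fixed axes so the frames line up
  yr <- range(vv$y)
  iters <- sort(unique(vv$iter))
  for (k in iters) {
    frame <- vv[vv$iter == k, ]
    plot(frame$x, frame$y, xlim = xr, ylim = yr, pch = 20,
         col = ifelse(frame$new == 1, "red", "black"),
         xlab = "x", ylab = "y", main = paste("Iteration", k))
    Sys.sleep(pause)
  }
  invisible(length(iters))   # number of frames drawn
}

# Tiny made-up data set so the sketch runs without rfile.txt
demo <- data.frame(iter = rep(1:3, each = 4),
                   x = runif(12), y = runif(12),
                   new = rep(c(1, 0, 0, 0), times = 3))
n_drawn <- play_frames(demo, pause = 0)
```

With the real data, and assuming the file has a header row, the call would be play_frames(read.table("rfile.txt", header = TRUE)), run in an interactive R session while your screen-cast software records the plot window.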

This new R script has the following features (compared with the previous R script):

I have removed the dev.copy and dev.off calls, to stop producing the png images on the hard drive (we don't need them here since we use screen-casts). Producing the png files slows down the whole process, and creates flickering videos. Thus this step removes most of the flickering.

I use the function Sys.sleep to pause briefly between frames, which makes the video smoother.

I use rgb(r, g, b) inside the plot command to assign a color to each dot: (x, y) gets assigned a color that is a function of z and u, at each iteration.

The size of the dot (cex), in the plot command, now depends on the variable u: that's why you see bubbles of various sizes that grow or shrink.

Note that d2init (fourth column in the rfile2.txt input data used to produce the video) is the distance between the location of (x,y) at the current iteration and its location at the first iteration; d2last (fifth column) is the distance between its locations at the current and previous iterations. A point is colored a more intense blue if it made a big move since the previous iteration.
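The features above can be sketched as follows. This is only an illustration under stated assumptions, not my exact script: I assume the first three columns are iter, x and y, that the blue intensity is d2last rescaled to [0, 1], and that the dot size grows with d2init. The helper names rescale01 and play_bubbles are made up for this example.

```r
# Scale a numeric vector to [0, 1]; a constant vector maps to 0.
rescale01 <- function(v) {
  r <- range(v)
  if (r[2] == r[1]) return(rep(0, length(v)))
  (v - r[1]) / (r[2] - r[1])
}

# vv: data frame with columns iter, x, y, d2init, d2last (assumed layout).
play_bubbles <- function(vv, pause = 0.05) {
  blue <- rescale01(vv$d2last)              # bigger move -> more intense blue
  size <- 0.5 + 2.5 * rescale01(vv$d2init)  # farther from start -> bigger dot
  xr <- range(vv$x)
  yr <- range(vv$y)
  iters <- sort(unique(vv$iter))
  for (k in iters) {
    sel <- vv$iter == k
    # No dev.copy / dev.off here: frames go to the screen only
    plot(vv$x[sel], vv$y[sel], xlim = xr, ylim = yr, pch = 20,
         col = rgb(0, 0, blue[sel]), cex = size[sel],
         xlab = "x", ylab = "y", main = paste("Iteration", k))
    Sys.sleep(pause)
  }
  invisible(length(iters))   # number of frames drawn
}

# Tiny made-up data set: three points, two iterations
demo <- data.frame(iter = rep(1:2, each = 3),
                   x = c(0.1, 0.5, 0.9, 0.2, 0.5, 0.7),
                   y = c(0.2, 0.4, 0.8, 0.3, 0.4, 0.6),
                   d2init = c(0, 0, 0, 0.14, 0, 0.28),
                   d2last = c(0, 0, 0, 0.14, 0, 0.28))
n_drawn <- play_bubbles(demo, pause = 0)
```

Rescaling over the whole data set, rather than frame by frame, keeps colors and sizes comparable from one frame to the next.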

Enjoy, and hopefully you can replicate my steps and impress your boss! It did not cost me any money. By the way, which version of the video do you like best? Of course, I'm going to play more with these tools and see how to produce better videos, including by optimizing my Perl script to produce slow-moving, rectangular frames. Stay tuned!

I'm also wondering whether, instead of producing a video, it might be faster and more efficient to access the graphics memory directly with low-level code (maybe in plain old C) and update each point in turn. Or maybe a Web app (SaaS) could do the job: it would consist of an API accepting frames (or better, R code) as input, and producing the video as output.

The whole process (producing the output data, running the R script, producing the video) took less than 5 minutes. I wonder whether anyone has ever created an infinite video: one that goes on non-stop, with thousands of new frames added every hour. I can actually produce the frames in my video faster than the streaming device delivers them. This is really crazy: I could call it faster than real time (FRT).