As our first summer draws to a close, more than 30 newsrooms are using DocumentCloud to augment their reporting by publishing selected source documents. You can see some examples of DocumentCloud in action on our list of featured documents or in our recent MediaShift post. We’ll soon be allowing the general public to search the catalog of primary source documents, and when someone runs a search, we’d like to send them to the embedded version of a document on the contributing organization’s site, if it’s available. To do that, we need to know the location of the page where the document is embedded. To help automate this, we created Pixel Ping.

Pixel Ping is a lightweight pixel tracker we developed in collaboration with ProPublica, who are using it to track republishing of their Creative Commons licensed stories. Today, we’re releasing the code to Pixel Ping, and making it available as a standalone application through NPM. Since Pixel Ping doesn’t store traffic statistics itself, but merely passes them along to your web application, ProPublica is also releasing their simple backend, which presents the hits organized by date range: Pixel Pong.

The Technical Scoop

The main idea behind Pixel Ping is this: You have a piece of content that your users can embed on their web sites, and you want to know when and where that content is being accessed. In our case, the embedded document is served entirely statically, so there’s no simple way for us to record the hit. Instead, the embed code includes a reference to a single-pixel transparent image. We want to record the location of the page that requests this pixel, but we don’t want to incur the overhead of the entire Rails stack for such a simple and frequent operation. So we have a speedy Node.js backend which serves the pixel and records the hit in memory. Every so often, Pixel Ping makes a call to our central database, and flushes its list of aggregated hits for every embedded document.
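
To make the record-then-flush idea concrete, here is a minimal sketch in Python. It is not Pixel Ping’s actual code, which is written in CoffeeScript and runs on Node.js. The `HitStore` class, the `/pixel.gif` path, the `key` query parameter, and the `send` callback are all assumptions made up for this illustration.

```python
import json
from collections import Counter
from urllib.parse import urlparse, parse_qs


class HitStore:
    """Aggregates pixel hits in memory until they are flushed (hypothetical)."""

    def __init__(self):
        self.hits = Counter()

    def record(self, request_url):
        """Count one hit for the `key` query parameter of a pixel request."""
        query = parse_qs(urlparse(request_url).query)
        keys = query.get("key")
        if keys:  # ignore requests that carry no key
            self.hits[keys[0]] += 1

    def flush(self, send):
        """Hand the aggregated counts to `send` as JSON, then reset them."""
        if not self.hits:
            return  # nothing recorded since the last flush
        payload = json.dumps(dict(self.hits), sort_keys=True)
        self.hits = Counter()
        send(payload)


# Usage: three pixel requests are collapsed into a single payload.
store = HitStore()
store.record("/pixel.gif?key=doc-1")
store.record("/pixel.gif?key=doc-1")
store.record("/pixel.gif?key=doc-2")
store.flush(print)
```

In a real deployment, `flush` would presumably run on a timer, and `send` would make an HTTP request to the main web application rather than print the payload. The point of the design is that the heavy stack sees one request per flush interval instead of one per page view.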

In this fashion, we hope to be able to automatically route search traffic to your web site, as soon as you’ve embedded a document. We hope that the underlying technology proves useful for folks in similar situations, where you need to keep track of a piece of embedded content, and aren’t willing to incur the expense of sending each page view through a heavy-duty web stack.

As a final technical note, Pixel Ping is written in CoffeeScript: a little language that compiles into JavaScript, and aims to clean up and streamline some of JavaScript’s more awkward parts. I’ve been working on CoffeeScript as a side project for a while now, and Pixel Ping was a good proving ground for it. Our Pixel Ping daemon has been running for several months now: it hasn’t crashed, uses a negligible amount of CPU, and is still sitting at 8 MB of resident memory.

If you end up using Pixel Ping for a project, or have improvements you’d like to see made, be in touch.