
Using MixPanel with Python

In my recent post about my great search for a decent analytics solution, I introduced MixPanel, the new analytics system that we're test-driving at Newstex. I went over a little of how we've been testing it: we imported all the data from our old system into MixPanel so we could review it against our actual live data. While doing so, I decided to write something a little more generic and robust for tracking events on the server side. After all, it could be useful in other places within our system, letting us track events directly from the backend.

Tracking events server-side has a pretty big advantage in that it doesn't depend on a specific client application. Since everything we build is an API consumed by clients, events are logged the same way for all of our clients, no matter what platform they're running on.

We do everything in Python, and the documentation does provide a relatively rudimentary API for pushing events to the server, but interacting directly with the REST API just seemed a lot easier. I wrote a very simple class that handles pushing events to the server, both synchronously and asynchronously. Instead of handing events off to a task queue, as mixpanel-celery does, this just spawns a new thread for each event when you call track_async. It also lets you pass in a callback function that is executed once the tracking request has completed, which helps if you need to be absolutely sure your event was sent.
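To make "interacting directly with the REST API" concrete, here is a minimal sketch of how a single event ends up as a URL. It is written for Python 3, unlike the class in this post, and `build_track_url` is a hypothetical helper name of my own, not part of any Mixpanel library. It only builds the URL and sends nothing.

```python
import base64
import json
import time
from urllib.parse import quote

TRACK_BASE_URL = "http://api.mixpanel.com/track/?data=%s"


def build_track_url(event, properties, token):
    """Return the tracking URL for one event (hypothetical helper)."""
    payload = dict(properties)  # copy so the caller's dict is not modified
    payload.setdefault("token", token)
    payload.setdefault("time", int(time.time()))
    body = json.dumps({"event": event, "properties": payload})
    encoded = base64.b64encode(body.encode("utf-8")).decode("ascii")
    # Base64 output may contain '+', '/' and '=', so percent-encode it
    # before placing it in the query string.
    return TRACK_BASE_URL % quote(encoded, safe="")


url = build_track_url("signup", {"distinct_id": "user-42"}, "my-token")
```

The event is just a JSON object with an event name and a properties dict, base64-encoded and passed as the data query parameter.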

But enough talking, here's the full class:

"""
Event tracking, currently uses Mixpanel: https://mixpanel.com
"""
import base64
import json
import time
import urllib
import urllib2
from threading import Thread

TRACK_BASE_URL = "http://api.mixpanel.com/track/?data=%s"
ARCHIVE_BASE_URL = "http://api.mixpanel.com/import/?data=%s&api_key=%s"


class EventTracker(object):
    """Simple Event Tracker
    Designed to be generic, but currently uses Mixpanel
    to actually handle the tracking of the events
    """

    def __init__(self, token, api_key=None):
        """Create a new event tracker
        :param token: The auth token to use to validate each request
        :type token: str
        :param api_key: Optional API key; when set, events are sent to
            the import endpoint instead of the track endpoint
        :type api_key: str
        """
        self.token = token
        self.api_key = api_key

    def track(self, event, properties=None, callback=None):
        """Track a single event
        :param event: The name of the event to track
        :type event: str
        :param properties: An optional dict of properties to describe the event
        :type properties: dict
        :param callback: An optional callback to execute when
            the event has been tracked.
            The callback function should accept two arguments, the event
            and properties, just as they are provided to this function
            This is mostly used for handling Async operations
        :type callback: function
        """
        if properties is None:
            properties = {}
        # Fill in defaults without overriding values the caller supplied
        if "token" not in properties:
            properties['token'] = self.token
        if "time" not in properties:
            properties['time'] = int(time.time())
        assert "distinct_id" in properties, "Must specify a distinct ID"
        params = {"event": event, "properties": properties}
        # Base64 output can contain '+' and '=', so quote it for the URL
        data = urllib.quote(base64.b64encode(json.dumps(params)))
        if self.api_key:
            resp = urllib2.urlopen(ARCHIVE_BASE_URL % (data, self.api_key))
        else:
            resp = urllib2.urlopen(TRACK_BASE_URL % data)
        resp.read()
        if callback is not None:
            callback(event, properties)

    def track_async(self, event, properties=None, callback=None):
        """Track an event asynchronously, essentially this runs the track
        event in a new thread
        :param event: The name of the event to track
        :type event: str
        :param properties: An optional dict of properties to describe the event
        :type properties: dict
        :param callback: An optional callback to execute when the event has been
            tracked. The callback function should accept two arguments, the event
            and properties, just as they are provided to this function
        :type callback: function
        :return: Thread object that will process this request
        :rtype: :class:`threading.Thread`
        """
        t = Thread(target=self.track, kwargs={
            'event': event,
            'properties': properties,
            'callback': callback
        })
        t.start()
        return t
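
As a rough usage sketch, this is how the callback and the returned thread can be combined to confirm that an event went through. Because the real class makes an HTTP request, the example uses a hypothetical stand-in, `RecordingTracker`, which I made up for illustration: it has the same `track` and `track_async` signatures but only records events in a list. The names `on_tracked` and `confirmed` are likewise assumptions.

```python
import threading


class RecordingTracker(object):
    """Hypothetical stand-in for EventTracker: same method signatures,
    but events are appended to a list instead of being sent over HTTP."""

    def __init__(self):
        self.sent = []

    def track(self, event, properties=None, callback=None):
        properties = dict(properties or {})
        assert "distinct_id" in properties, "Must specify a distinct ID"
        self.sent.append((event, properties))
        if callback is not None:
            callback(event, properties)

    def track_async(self, event, properties=None, callback=None):
        t = threading.Thread(target=self.track, kwargs={
            "event": event,
            "properties": properties,
            "callback": callback,
        })
        t.start()
        return t


confirmed = []


def on_tracked(event, properties):
    # Runs in the worker thread once the event has been handled.
    confirmed.append((event, properties["distinct_id"]))


tracker = RecordingTracker()
thread = tracker.track_async("login", {"distinct_id": "user-42"},
                             callback=on_tracked)
thread.join()  # wait for the worker; skip this for fire-and-forget tracking
```

One thing to keep in mind: an exception raised inside the worker thread, such as the assertion for a missing distinct_id, does not propagate to the caller. The callback simply never runs, so a missing callback is the only signal that tracking failed.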
