Arguing with Algorithms

Friday, May 9, 2014

I recently gave a tech talk for the Khan Academy dev team describing the latest iteration of our infrastructure for saving edits from our content editors to the server. While this seems like a simple problem (doesn’t Backbone already handle that stuff?) there are complexities when you want to support multiple simultaneous users and make the UI fast and responsive. I think the pattern that we gravitated to is a generally useful one, and this was validated by a recent blog post describing an almost identical solution to the same problems by the Dropbox team, which you can read here (they even called it an operation queue!). If you’re writing a client-side system for making changes to global state that need to be persisted to a server, this may help you skip some of the less-optimal steps along the path to content syncing nirvana.

Viewing content on Khan Academy (as an admin)

Editing content on Khan Academy

Responsiveness

First, some definitions. What does responsiveness mean in an editing interface?

No save buttons: The user doesn’t have to remember to click a button to apply the changes. Once the user toggles the UI to “edit” mode, all the fields are edit-in-place where possible and everything autosaves immediately as soon as you edit it. (There is always some internal opposition to autosave on the grounds of “users expect a Save button”, but Google Docs has proven it can be done effectively.)

Instant feedback: If something is wrong we tell the user immediately, rather than returning an error minutes or hours later when they try to publish their changes.

No interruptions: The user can continue working after making changes, without having to wait for the data to get saved to the server. The user should only see a progress bar when they navigate to a different piece of content.

First pass: The Backbone Way

We started migrating over to Backbone as a JavaScript framework in early 2012. Backbone’s models are a straightforward way to handle synchronizing with the server and listening for property changes, and we are still using them for that purpose. Here is the naive way to implement an editor using Backbone’s views and models:

This has some issues:

Every time the editor calls save(), it must wait for that save to complete before issuing the command again, since multiple AJAX requests can arrive out-of-order and changes will be lost.

If a save fails for whatever reason, the editor must handle that in every place that can trigger a save().

The model’s internal state is only updated either when save() is called or when the save succeeds and the server state is updated. In either case the UI can be temporarily out of sync with the model state. This makes it unsafe to listen for model changes and update the UI since the user may have unsaved changes.

Second pass: Autosave

To address some of these issues, I designed a new system that would be capable of watching the UI for changes and automatically queuing up those changes to be sent to the server in the background:

This new Autosave component would receive a request to update a field whenever the UI registered a change, and add it to a queue of attributes to save for the model. It would encapsulate calling save() on each model that had changes queued, and be smart about merging attribute changes that were queued to reduce the number of AJAX calls issued.
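The mechanics can be sketched in a few lines. Here is a hedged, framework-free Python sketch of the idea (the real component is JavaScript on top of Backbone; the class and method names are made up for illustration):

```python
class AutosaveQueue:
    """Queues attribute changes per model, merges them, and keeps at
    most one save request in flight at a time."""

    def __init__(self, send):
        self.send = send        # send(model_id, attrs) kicks off the save call
        self.pending = {}       # model_id -> merged attrs waiting to be saved
        self.in_flight = False

    def enqueue(self, model_id, attrs):
        # Merge with any changes already queued for this model, so a
        # burst of edits collapses into a single request.
        self.pending.setdefault(model_id, {}).update(attrs)
        self._flush()

    def _flush(self):
        # Only issue a new request once the previous one has completed,
        # so saves cannot arrive at the server out of order.
        if self.in_flight or not self.pending:
            return
        model_id, attrs = self.pending.popitem()
        self.in_flight = True
        self.send(model_id, attrs)

    def on_save_complete(self):
        self.in_flight = False
        self._flush()
```

The merging in enqueue is what collapses a burst of edits into a single AJAX call, and the in-flight flag guarantees saves go out one at a time, in order.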

For error handling, in order to allow the user to continue working I chose to make the system optimistic and assume that changes being saved would succeed. If there was a validation error, it would be caught by the queue and shown to the user, and the user could clear the error by making further edits that were successfully saved. All this was handled in one place for all editors, and things were pretty good.

Warning: Collision imminent!

The problems begin when we have multiple users editing the same content at the same time, or one user has a tab open for a long time with stale data and then comes back and edits it. This leads to conflicts and data loss.

Here is a simple example, where an item (which in this case just contains a list of children A, B, and C) is edited by both Alice and Bob at the same time:

Alice adds B to the item list that just contained A and saves the new list [ A, B ] to the server. Since Bob’s item model is not updated to include the new child B, he sends up his own list with just A and C, and now Alice’s change has been reverted without either Alice or Bob being notified.

Now, we could require Bob’s Autosave component to automatically fetch the latest version of the item before overwriting it, but that would not be transactionally safe (it requires multiple AJAX calls, and Alice could make her changes in between them), nor would the merge be straightforward: Bob has children A & C and the server has A & B, and now we have to write some sort of client-side merge algorithm. And, as stated before, we cannot easily update Bob’s UI once we’ve received changes from the server, since that might erase changes he has made that have not been saved!

Barring a complete rewrite, the first step is to detect when Bob is about to overwrite Alice’s changes. It just so happens that we create a unique revision ID after each edit, which is stored with the item, and the client knows what revision ID it has for each piece of content. So when the save() is issued, the client sends up the latest revision ID it received before making the changes. The server can easily compare this revision to the latest one it has, and return an error if they do not match:

Now when Bob sends up his change to the item, he includes in the JSON representation the revision ID that he thinks is current, in this case revision 1. The server looks at the latest revision - Alice’s - and sees that it’s revision 2, and returns a special HTTP status code: 409 Conflict. This doesn’t make Bob especially happy, since he knows his changes cannot be saved. At this point he has to reload the page and make his changes again. But the key improvement is that someone is made aware that data is lost, and since it’s the person who just made the change it is easy enough for him to redo his work.
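The server-side check is tiny. Here is a minimal Python sketch, using an in-memory dict in place of the datastore and integer revision numbers in place of the real (opaque, unique) revision IDs:

```python
def save_item(store, item_id, new_fields, client_revision):
    """Reject the save with a 409 if the client edited a stale revision.

    `store` maps item_id -> {"revision": n, "fields": {...}}.
    """
    current = store[item_id]
    if client_revision != current["revision"]:
        return 409, current          # stale: someone else saved in between
    current["fields"].update(new_fields)
    current["revision"] += 1         # a real system mints a new unique ID
    return 200, current
```

Alice's save against the revision she fetched succeeds and bumps the revision; Bob's save against the same (now stale) revision is rejected instead of silently clobbering her change.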

Ideal? No, not really. But the correct solution requires a bit bigger change.

OperationQueue

At this point I had to give up the band-aid solutions and look at a bigger change. What we want is to be able to pull down the latest version and play back Bob’s changes on top of Alice’s so that no work is lost. So rather than storing the attribute values directly, we create an “operation” - a function + data - that can be applied to a model at any point to make the desired change.

It’s a subtle change:

For most fields like “title” the operation will be (setAttribute, {title: X}), or “set ‘title’ to ‘X’” with ‘title’ and ‘X’ being stored in the operation’s data and setAttribute being a generic function. But for adding or removing items from a list, we would store the child being added/removed and the index, rather than the actual list.
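In sketch form (Python here for illustration; the production code is JavaScript, and these function names are made up), an operation is just a function plus its data:

```python
def set_attribute(attrs, data):
    # Generic "set these fields" operation, e.g. (set_attribute, {"title": X}).
    new = dict(attrs)
    new.update(data)
    return new

def add_child(attrs, data):
    # Store the child and index, not the whole list, so the operation
    # can be replayed against whatever list the server has now.
    children = list(attrs.get("children", []))
    index = data.get("index", len(children))
    children.insert(index, data["child"])
    new = dict(attrs)
    new["children"] = children
    return new

def apply_op(attrs, op):
    fn, data = op          # an operation is (function, data)
    return fn(attrs, data)
```

Because operations take their base state as an argument and never mutate it, the same queued operation can be applied to the user's stale model or to a freshly fetched server version.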

This is how the conflict plays out now:

Now Alice and Bob no longer enqueue changes to the child list directly, but rather enqueue two operations: (addChild, “B”) and (addChild, “C”). Alice’s operation runs first and adds B to [ A ] to make [ A, B ], and saves. Bob adds C to [ A ] to make [ A, C ], but his save fails immediately. The client then fetches Alice’s new version [ A, B ], replays the failed operation on top of it to make [ A, B, C ], and attempts to save again. This one succeeds, and now both Alice’s and Bob’s changes have been saved. No one has lost any work!
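The retry loop can be sketched like this (a simplified Python model of the queue's behavior; the fetch and save callables stand in for the real AJAX endpoints):

```python
def add_child(attrs, data):
    # Minimal list-append operation for the sketch.
    attrs["children"] = attrs["children"] + [data["child"]]
    return attrs

def save_with_replay(fetch_latest, try_save, operations):
    """Apply queued operations to the latest known server state; on a
    conflict, refetch and replay the same operations on the new version."""
    base = fetch_latest()
    while True:
        state = dict(base, children=list(base["children"]))
        for fn, data in operations:
            state = fn(state, data)
        ok, latest = try_save(state, base["revision"])
        if ok:
            return state
        base = latest   # 409: someone else saved first; replay on top
```

The key point is that the loop never re-sends a stale snapshot of the list; it re-derives the list from the server's latest version plus the still-pending operations.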

Now, since we’ve updated the model during the course of the save, Bob has to be shown the new values in case he wants to make further changes. This was a problem before since Bob could already have made more changes and queued them up to be saved, and we don’t want to revert them in the UI during a refresh. Luckily, having those changes as operations means we can now calculate what the UI should look like at any given moment by taking the model state (which corresponds to the latest server state) and applying any operations in the queue that have not yet been saved. This is what the getUIAttributes() method does in the above diagram. This gives Bob a consistent view with all the changes he’s made even though the model underneath is always in sync with the server:
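In sketch form, getUIAttributes() is just a fold of the unsaved operations over the model state (Python for illustration; the real method is JavaScript):

```python
def add_child(attrs, data):
    # Illustrative operation used by the example below.
    return dict(attrs, children=attrs["children"] + [data["child"]])

def get_ui_attributes(model_attrs, unsaved_ops):
    # The model always mirrors the server; the UI shows the model plus
    # every queued operation that hasn't been saved yet.
    state = dict(model_attrs)
    for fn, data in unsaved_ops:
        state = fn(state, data)
    return state
```

So even if a server refresh brings down Alice's [ A, B ], Bob's unsaved addChild("C") is still reflected in what he sees.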

Even though the UI has been refreshed when the item was reloaded from the server, Bob’s changes remain.

It is important to understand that we only get as much protection from data loss as the operations are constructed to give - adding children is safe but if two users change the title of an item in different ways the first one will still be overwritten unless the operation does something smarter internally (like a text diff). Check boxes, combo boxes, and numeric fields cannot be merged and will always overwrite other changes. The best way to deal with this is to synchronize changes between all clients as soon as changes are made on the server, and at least we have a refresh path on the client that makes that fairly easy.

Conclusion

If you want to give this system a try yourself, feel free to check out the source code and use it in your own projects. It's fairly well documented and should be really easy to get up and running. Let me know how it works for you!

So far it seems like the saving infrastructure outlined here is pretty close to the ideal one for handling multiple users editing structured data asynchronously. We’ve had fewer reports of lost data and productivity among our content creators continues to increase as the tools get better. Now we can focus on making the editors themselves more intuitive and user-friendly. As we like to say at Khan Academy: Onward!

Friday, January 3, 2014

This article provides more implementation details for our versioned content store. To see the motivating challenges and overall design for building a versioned content store in App Engine, see the companion blog post.

The truth is, once I had a rough design for a Git-alike content store, the main challenges were all implementation details and attention to backward-compatibility (I had to add the new versioning system incrementally without any downtime in the editing tools or - God forbid - for site users). The simplicity owed a lot to the Git storage model (storing versions of entities labeled by their SHA-1 hash), which aligns neatly with the way App Engine’s High Replication Datastore likes to store and access data. There were just a few issues to work out, and I’ll list them here.

< architecture >

The simplest way to implement a Git-like store in App Engine is to create a db/ndb Model class for the object, with the properties you’d like to store and a method for creating new revisions of that model. Unlike traditional entities, which are overwritten whenever a change is made, in this case you create a completely new entity (a “revision”) on every change. This might sound wasteful compared to storing diffs, but the invariant that revisions are immutable makes the implementation easier and enables easy caching of revisions. This is one example where we rely on App Engine’s scalability to make our lives easier, and compared to the hundreds of millions of user-generated data items, the number of entities here will be relatively small. If this keeps you up at night you can always prune orphaned revisions later.

One decision we made fairly early on was to keep editing models (revisions) separate from the live models that the site uses. The primary reason for this was that we had live entities already (Video and Exercise), and finding all the places where we fetch them by key would have been an onerous and error-prone task. This choice turned out to have some other advantages as well. So the inheritance tree looks like this:

BaseVideo is a plain Python class that stores the common DB properties and methods shared between the editing version (VideoRevision) and the run-time published version (Video). Common functionality for working with live content and revisions is in VersionedContent and BaseRevision, respectively. In our case, we could not make BaseVideo a PolyModel (as that would have changed the kind and therefore invalidated all the existing keys) so we had to introduce a metaclass to allow the subclasses to share DB properties. This enables us to add properties to VersionedContent and BaseRevision that will be inherited by subclasses. I will use Video/VideoRevision as my examples from now on, but everything stated applies equally to our other content types.

As in Git, the (key) ID of the VideoRevision is a hex-encoded SHA-1 hash of the contents of the object, which in this case is a JSON representation of the entity’s fields. There is also a string property which references the Video’s key ID, so we can track history and keep references to an object across revisions. When we create a VideoRevision for a new video, we generate a random ID, ensuring that a new Video entity will be created at publish time. Note that there may be many VideoRevision entities for a single video (tracking historical changes) but there is only ever one Video entity, preserving the current ability of published entities to reference each other by datastore key. The Video also contains the key ID of its latest VideoRevision that has been published; that is, the fields of both should be the same.
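Computing the revision ID is straightforward; a sketch (the exact JSON canonicalization in our code may differ):

```python
import hashlib
import json

def revision_id(fields):
    """Content-addressed ID: the SHA-1 of a canonical JSON encoding of
    the entity's fields, as in Git's object store."""
    payload = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()
```

Sorting the keys makes the encoding canonical, so two entities with the same fields always get the same ID regardless of field order.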

This means that to find the latest revision of an object, we need a table of content ID → revision ID. This is the “stage” (or sandbox), which represents the current editing state.

So, to make an edit to an object:

Look up the latest edit version of the object in the stage (this is a get-by-key, which is very efficient)

Apply the requested changes

Compute a new revision ID from the updated properties

Create the new revision entity with the revision ID as its key ID and put it into the datastore

Update the stage to point to the new revision ID
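The five steps above can be sketched with plain dicts standing in for the datastore (the real code works on ndb entities):

```python
import hashlib
import json

def edit(stage, revisions, content_id, changes):
    # 1. Look up the latest revision of the object in the stage (get-by-key).
    fields = dict(revisions[stage[content_id]])
    # 2. Apply the requested changes.
    fields.update(changes)
    # 3. Compute a new revision ID from the updated properties.
    rev_id = hashlib.sha1(
        json.dumps(fields, sort_keys=True).encode("utf-8")).hexdigest()
    # 4. Create the new (immutable) revision entity.
    revisions[rev_id] = fields
    # 5. Point the stage at the new revision.
    stage[content_id] = rev_id
    return rev_id
```

Note that the old revision entity is never touched; history is preserved for free.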

Once the content author is done making changes, they can create a commit, which is just a snapshot of the stage at a particular moment in time, freezing the revision IDs to specific values. The commit contains an author ID and commit message, and it references the previous “head” commit, forming a chain of changes that can be used to recover the entire history of the content. The commit becomes the new “head” commit and is automatically queued up to be published to the site.

This is what the whole setup looks like after a commit:

And here is what it looks like after a second commit, where three of the four entities have been changed:

Having the snapshot and commits be tables of revision IDs means that doing a diff is very efficient: just look for entries that differ between the two tables, fetch only those revisions, and diff their properties. This makes it easy to recompute values or invalidate caches based on just the content that has changed without having to compare old and new properties for every piece of content.
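A sketch of that diff, with commits as plain {content_id: revision_id} tables:

```python
def diff_commits(old_commit, new_commit):
    """The diff is every entry whose revision ID differs between the
    two tables, including additions and removals."""
    changed = {}
    for cid in set(old_commit) | set(new_commit):
        if old_commit.get(cid) != new_commit.get(cid):
            changed[cid] = (old_commit.get(cid), new_commit.get(cid))
    return changed
```

Only the revisions named in the result ever need to be fetched, no matter how large the content library grows.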

< publishing >

At its core, publishing a commit to the site involves identifying which content revisions have changed (by comparing the revision IDs in the commits’ tables), fetching those revisions, and copying their fields onto the corresponding live entities, creating any entities that don’t already exist. Then the “currently published commit” setting is updated, which instantaneously changes the version of the entities that all the user-facing handlers look at when rendering the site, and you’re done. The process works equally well for rolling back to an earlier commit.
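Stripped of datastore details, publish looks roughly like this (dicts stand in for entities, and unpublish/delete handling is omitted):

```python
def publish(old_commit, new_commit, revisions, live):
    """Copy only changed revisions onto the live entities, then return
    the new commit to install as the "currently published commit"."""
    for cid in set(old_commit) | set(new_commit):
        old_rev, new_rev = old_commit.get(cid), new_commit.get(cid)
        if old_rev == new_rev:
            continue               # untouched content costs nothing
        if new_rev is not None:
            entity = live.setdefault(cid, {})   # create if missing
            entity.clear()
            entity.update(revisions[new_rev])
    return new_commit
```

Swapping the arguments publishes in the other direction, which is why rollback falls out of the same code path.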

Publishing is also a great place for denormalizing data. Since Video and VideoRevision are separate models, we can add properties to just Video (such as the canonical URL) that are calculated and saved on the entity during publish. We can also pre-warm some caches that are invalidated on each publish, so that users never see a cache miss.

Separating live entities from editing entities does add some complexity to the system, but after publish we can now reference a Video by its key (which is stable) or we can run datastore queries on its indexed fields, neither of which we could have done with just the revisions.

< sync / merge >

Because of the simplicity of the versioning system, if I want to import the latest copy of the topic tree to my local dev datastore, all I need to do is:

Download the latest commit from the live site

Make a list of the revision entities that I don’t have

Download the revisions in bulk and push them directly into my datastore

Set the downloaded commit as the “head” commit
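Because revisions are keyed by content hash, the pull step above reduces to a set difference; a sketch:

```python
def sync_from_remote(local_revisions, remote_revisions, remote_head):
    """Pull: copy over only the revision entities we don't already
    have, then adopt the remote head commit. Hash keys make this safe,
    since the same ID always means the same content."""
    missing = [rid for rid in remote_head.values()
               if rid not in local_revisions]
    for rid in missing:
        local_revisions[rid] = dict(remote_revisions[rid])
    return remote_head, missing
```

In the real system the missing revisions are downloaded in bulk over HTTP rather than read from a dict, but the bookkeeping is the same.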

From there I can do a normal publish and everything should behave identically to the way it does on live. If I make local changes, I can run the same process on the live site to pull the changes back up.

It is possible that the commit that has been synced is not a descendant of the local head commit (there have been changes both locally and remotely since the last sync). In this case we can create a “merge” commit, which finds the common ancestor and then performs an automatic three-way merge between the two branches. The algorithm is trivial when no entity has been modified in both branches; when one has, it’s still possible to do a field-by-field merge, which should cover a majority of cases. This allows us to copy all our content to a new datastore, make some changes, publish and preview them, and then merge those changes back to the live site automatically.
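The commit-level merge is a three-way merge over the revision tables. A sketch that raises on a double change instead of attempting the field-by-field fallback:

```python
def three_way_merge(base, ours, theirs):
    """Merge two commit tables ({content_id: revision_id}) given the
    table of their common-ancestor commit."""
    merged = {}
    for cid in set(base) | set(ours) | set(theirs):
        b, o, t = base.get(cid), ours.get(cid), theirs.get(cid)
        if o == b:
            rev = t        # only the other branch touched it (or nobody did)
        elif t == b or t == o:
            rev = o        # only we touched it, or both made the same change
        else:
            # Both branches changed the same entity differently; the real
            # system falls back to a field-by-field merge here.
            raise ValueError("both branches changed %s" % cid)
        if rev is not None:
            merged[cid] = rev
    return merged
```

Deletions are handled implicitly: a content ID missing from the winning side simply drops out of the merged table.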

< tl;dr >

I wanted to go into some detail to give you an idea of the design decisions and trade-offs we made on the road to having a flexible, extensible versioning system that works efficiently on App Engine. Some of these decisions are specific to our particular situation, but a lot of these ideas could be generally useful in building a CMS on a NoSQL-type platform like Google’s.

If you find yourself using these patterns please drop me a line and let me know how it’s working for you.

Over the past two years, I've been working largely behind the scenes at Khan Academy on the infrastructure the content team uses to upload and publish content (videos, exercises, articles, and interactive programs) to the site. Most of the changes I've made over the past year are not directly visible to users of the site but without them we could not produce the quality and quantity of lessons we need to provide a "world-class education for anyone, anywhere". One of our strengths as a company is knowing when to hack something together and when to invest in flexible and extensible systems, and I would like to share the solution that we've come up with in case others find it useful.

< context >

Creative solutions for cramped spaces

When I first arrived at Khan Academy's humble office (devs huddled around one long table, with the implementations team occupying the single conference room) the content situation was this: playlists of videos were downloaded directly from Sal Khan's YouTube account, and there was a single editor for the exercises on the knowledge map, changes to which would show up on the live site immediately upon saving. These entities (Video, VideoPlaylist, and Exercise) were the basis for everything on the site. There was no versioning, no organization, and no direct editing - if Sal wanted to fix a typo in a video title he had to do it in YouTube, and only he had access.

Fast forward a year. We have content from a half dozen content creators teaching math, science, and art history. We've added the concept of a topic tree to organize the content that any developer can edit, and a conceptually simple versioning system: edits are made to an "editing" copy of the tree, and when the version is published, a full copy of it is made that becomes the new editing version. All runtime code instantly switches to using the newly "live" version via a global setting. The system works fairly well, and we are able to build new functionality on top of it: topic pages, power mode for exercises in a topic, and tutorial navigation.

However, last fall it was already becoming clear that the system just wouldn't scale. As we brought on more content creators, coordination became more and more of an issue: hitting publish at the wrong time could push out someone else's in-progress changes, and there was no way to see who was currently making edits or who had edited something in the past. As a stopgap measure, a HipChat room was set up to coordinate publishes. As the number of topics in the topic tree grew, publish times ballooned to an unreasonable 45 minutes (owing partly to the need to duplicate all the topic entities and update their child keys), during which time no editing could happen. Rolling content back was a difficult, manual process. Furthermore, many errors were caught only at publish time, allowing one author’s simple oversight to block others from getting their changes out.

< solution >

Experienced developers tend to prefer incremental improvements over rewriting from scratch, but in this case after some discussion we decided to re-architect the system to one that could fulfill not only our current needs, but our aspirations. We want the best and the brightest teachers to share their knowledge on our platform, whether it's Andover teaching calculus or the Getty and MoMA inspiring a generation of art students. Having to coordinate between creators is a bottleneck, and having to wait an hour for content to appear on the site is untenable. At the same time, our growing dev team is adding features at an ever-increasing rate, and they need something stable and flexible to build on.

When looking at various CMS storage and versioning designs, I tried to keep the primary user in mind. Since the infrastructure should always be invisible to content creators, the primary users are in fact the developers who are building features on top of it. When it comes to versioning and deployment, developers are used to code versioning systems, so I opted to start with a design based entirely on Git, a popular distributed revision control system.

In the context of content management on App Engine, the Git model has distinct advantages:

Git's storage model is basically a key-value store, with the SHA-1 hash of the data as the key. This maps really well to App Engine's datastore API.

Git stores references to a snapshot of each object on each commit, so we never have to apply a diff or walk the history to see how something looked at a given point in time, but the hash-based reference structure means we don't have to duplicate objects that haven't changed between commits.

Using hashes as keys means that changes made on one machine can be easily merged with changes from another, which is critical in a distributed environment. For instance, adding a new object generates a random key that cannot collide with any other new object.

By design, it is easy to pull and merge changes between copies of the repository. This makes operations such as syncing a development copy of the site with production as easy as copying over any new keys and setting the head commit pointer.

Also by design, calculating a diff between any two commits is easy - just compare hashes for each object. This means that publish can incrementally update only objects that have changed, speeding up the process considerably. This works equally well for rolling back to an earlier version.

Git's content storage model is really simple to understand and implement.

I didn't copy Git's design wholesale, nor did I actually expose a Git-compatible API (although it would be really cool to someday be able to check out our content as a repository; it would give us access to a whole bunch of useful tools). However I did find that having a fully working design to crib from made implementation much easier and helped me explain the inner workings to other developers.

So far I've been very happy with the way this system has functioned. It is flexible, so we can implement various versioning or permissions schemes on top of it. Different content types use this system in slightly different ways, and that’s fine. Building helpful tools such as diff viewers and remote syncing on top of this infrastructure is really easy, and we could get as fancy as we want to support branching & merging, pull requests from third parties, etc. Most importantly, other developers can jump in and create their own versioned entities and get their code working in a very short time, eliminating a dependency on me when implementing new features.

This new architecture also enabled me to reach the goal I had set for myself: publishes now complete in about a minute. This has had a profound impact on how it feels to author content for the site.

If you’re curious about the details of the implementation, I will go into sordid detail in a companion blog post.

Thursday, March 14, 2013

<ship, ship, ship!>

This past Friday, I shipped about a month's worth of extensive refactoring to the content editing infrastructure of the site. As with many refactoring projects, the best possible outcome would have been that no one would notice I'd done anything. The worst would have been visible breakage of the site for content authors or users. That launch was a success (whew!), and although it took a lot of effort and planning on my part to make that happen, I want to credit two powerful methodologies that ensured I wasn't working alone: unit tests and peer code reviews. I want to focus on code reviews here, because while the benefits of unit tests (especially in refactoring projects) are well-documented, I've found that subjecting code to peer review has subtly and unexpectedly changed the way I actually write my code.

About a year ago, Khan Academy instituted a policy to peer review all non-trivial code commits. For coders who don't follow this regime, there are several benefits we were looking for:

Fewer bugs. Bugs won't reach production if they are caught in a review first.

Improved code quality. A strong check on coders' tendencies to take shortcuts, sacrifice code readability or understandability, or put in temporary half-measures that don't solve the underlying problem. In many cases criticism in a review caused me to rethink a problem and come up with a more elegant solution.

On-boarding of new developers. Getting your head around a new codebase can be challenging. By reviewing new devs' commits we catch redundancy, unwanted side effects, and potential conflicts, as well as enforcing our style guide and setting a standard for high quality code from the very beginning.

Diffusion of knowledge. Anything that facilitates communication between members of the team pays dividends over the long run. If nothing else, there will be a day when a critical developer will be on vacation/trapped in an elevator/at home with the swine flu and a reviewer will come to the rescue.

It's worth acknowledging the obvious cost to reviewing every commit: time. Time that used to be spent writing, testing and committing new code is now taken up with conversation and iteration on already-written code. So is the net result a drop in productivity for the entire development team? Not necessarily. Let's look at the list above again:

Fewer bugs. A bug not caught during review will still have to be found and fixed later on. Tracking down a bug in production takes significantly more time, and fixing it is more difficult.

Improved code quality. Quality code is easier to read, easier to implement new features on top of, and incurs less technical debt (TODO-laden code that will have to be revisited in the future). Of course, code reviews don't force this, they enforce whatever the team decides. If the team needs something done yesterday, then by all means do the quick thing and come back and fix it later.

On-boarding of new developers. Efficient mentoring of new developers bends the productivity curve in a positive way. Even experienced devs are less productive while they are learning a new codebase.

Diffusion of knowledge. Good documentation won't replace having multiple devs who understand any piece of code. Projects won't stall because the one developer who knows a system is busy doing something else and the code isn't clear. Our mantra is "anybody can fix anything", and it's critical that anyone be able to jump into any piece of code and understand it.

I won't go much further with this argument; in my mind it's a settled matter that compulsory code reviews are a Good Thing and they have helped us in many ways as an organization. What I hope to share here is a surprising and totally non-obvious fact: code reviews have changed the way I design and write code for the better.

All the lessons I've drawn from this ongoing experiment have taken some time to understand and internalize. When we first instituted mandatory code reviews, I didn't notice any immediate changes. For trivial bug fixes, reviews are transparent: I make the fix, commit to stable, and send off the review after the fact. The fix might ship before the review is done, but bug fixes are high priority and that's OK.

Similarly for minor features: I make the change in a private branch, test and document, then send a review. Then there is a period of answering reviewer questions and iteration. In the meantime I might move on to other work, and when the review is accepted I merge to stable and deploy.

With larger projects and changesets, I began to notice breakdowns happening when I got to the review stage. Reviews were too large for reviewers to comprehend, or too convoluted to follow. By the time a reviewer responded to a review, I would have several more reviews open for subsequent commits, and it wasn't clear which review a given fix should be attached to. Reviewers were reviewing already-replaced or rewritten code. It became a real mess.

While it was clear what the problem was - too many and too large reviews - the solution wasn't obvious. I could cut the size of the commits, and stop to wait for reviewers before proceeding, but that would mean dramatically slowing my progress - a busy reviewer can take hours or days to thoroughly read an important review. Instead, I adopted (with lots of guidance from coworkers +Ben Komalo, Craig Silverstein, and Ben Kamens) some habits that enable me to get useful feedback on code reviews and use that feedback effectively. Here is what I learned:

Make one conceptual change per commit. (As opposed to one functional change per commit.) When I started working on a change, I would often be thinking about a requirement: "The user can set a bookmark." I would add a UserBookmark object, write the get/set API calls, and some UI. Later I would come back and write some unit tests. This is all one functional change, but many conceptual changes, and they deserve their own commits: A new UserBookmark object, with full documentation and unit tests. Then the API calls, with their own documentation and unit tests. Each change is much easier to understand, can be critiqued on its own merits, and will tend to be confined to a particular part of the code.

Cut the thread. Many times, especially while refactoring, a change will have cascading effects: While testing I find multiple side effects from my change, fix them, and then those fixes cause more side effects, and so on. Sometimes fixing a side effect triggers another refactor, or fixing a totally unrelated bug, and when the code finally gets committed it is both unreasonably large and completely unfathomable. Even I can't remember exactly what prompted a particular change in a day-long marathon of bugfixing. I could try to split the fixes among several commits after the fact, but that's error-prone and difficult. A better solution is to "cut the thread" at some reasonable point and start sprinkling TODOs liberally where fixes need to be made. This makes it clear to the reviewer that the code is not yet functional, and exactly what the side effects of the change are, without actually fixing those side effects in that commit. Best of all, it's clear from looking at the commit history what motivated each fix.

Write throw-away experimental code. When starting a project, I find it helpful to quickly iterate on a prototype implementation for a thorny code problem before settling on a final solution. These sketches are not useful to have reviewed; rather it is better to write up the proposed solution in a Google Doc or email and iterate on that with reviewers before sitting down to implement it for real. The second implementation takes into account reviewer feedback and is written more carefully, with documentation and unit tests that would be a waste of time during a prototyping phase. The prototype is eventually thrown away and doesn't become part of the commit history.

Work on multiple tasks in parallel. There is some inevitable downtime while waiting for reviewers to look at newly submitted code. Having a list of bugs or small tasks unrelated to my main project gives me something productive to do in that downtime. It's a great way to make sure that small, lower-priority tasks don't get crowded out by larger projects. I can assign the reviews for these smaller tasks to different reviewers, balancing the review load among the team.

Don't get too far ahead of the reviewer. I do my best never to push changes that build on unreviewed changes. When I can't switch to a different task, I keep new changes local and don't push them until the previous reviews are done. That way I can implement changes from reviews cleanly on top of the pushed commits and rebase my local changes on top. (I use Mercurial bookmarks or Git branches for this.) Sometimes, after discussion with the reviewer, changes will be made that force the later work to be rewritten, and that is fine. It's better than having to roll back later commits to fix something from a review!

Document everything, even if it's in progress. If I'm going to have to explain some complex bit of logic or a long list of method arguments to a reviewer, I may as well just do it in the code itself. Even if the code has ## TEMPORARY ## or // TODO(HACK) UGLY HACK all over it, it still gets documented. It never ceases to surprise me how long those things live in the codebase, and the short-term solutions are the ones that need the most explanation: Why do we need this? What should replace this? When can it safely be replaced?
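To illustrate what that kind of documentation looks like (the function and field names here are hypothetical, not from our actual codebase), even a short-lived hack gets the full why/what/when treatment:

```python
def get_user_points(user):
    # TODO(tom): UGLY HACK. Why we need this: old entities still carry
    # the legacy "points_" attribute from before the rename, and reading
    # them would otherwise crash. What should replace it: a one-off
    # backfill job that renames the property on every stored entity.
    # When it can safely go away: once that backfill has completed.
    if hasattr(user, "points_"):
        return user.points_
    return user.points
```

The comment costs a minute to write and saves the reviewer (and whoever finds this hack alive a year later) from having to reverse-engineer the intent.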

I firmly believe that all of these techniques make my commit history easier for me and others to understand and make me more efficient as a programmer, and I probably wouldn't have adopted them if not for the practice of peer code reviews. I have also learned a lot about Python/JS/IE/life from my peers and maybe taught someone something they didn't know.

Even if you work alone or don't do code reviews in your team, perhaps you may benefit from these tips in your own work. If you do participate in code reviews, do you organize or think about your code differently to take full advantage of the review process? I'd love to hear from you.

Tuesday, October 9, 2012

About a month ago I had the great fortune of having an amazing new person stumble into my life, ready to challenge by his very existence the way I think about myself and my role in the world. That person is my first child, a beautiful baby boy. Even at his young age he is already full of contradictions: At times, he studies the world with the attentiveness and calm of a philosopher, taking it all in without pausing to divulge what pearls of wisdom he has gleaned. By contrast, he is barely cognizant of his own internal state, and it's taken weeks of careful observation to discern which cries are hunger, gas, or fatigue. We have had to teach him how to eat and how to sleep, which has really driven home the point that literally everything he will know or do as an adult decades from now he will have to somehow be taught.

Two weeks after he was born I returned to the office to resume work. There was a pang of regret that I expect many a father feels at being away from his only child all day, but there was also something else I didn't expect: a new urgency, a ticking clock that wasn't there before. I'll elaborate in a minute, but first some back story.

< childhood >

Last year I was a computer game developer. There is a fair amount of truth to what most people assume about game developers: we spend a respectable portion of our time play testing games, the atmosphere is one of creativity, fun, and self-expression (relative to most software development teams), and in the end we have our share of fans because we are essentially in the same industry as Hollywood and record labels. There is also, however, a fair amount of dysfunction: unrealistic deadlines, dysfunctional corporate partnerships and licensing agreements, and a disturbingly high rate of failed studios and canceled projects. I could write a whole blog post on this subject. It's a shame, because I know from experience that games can elicit strong emotions, games can change attitudes, and most importantly, games can teach. This last point is important because it got me thinking seriously about how I could use my technical skills to make an impact in the field of education.

The observation that depresses me most about our education system in America is how little things have changed since I was a child. Students I speak to going to elementary, middle and high school now have almost the same experience in class that I did - they are taught the same subjects in the same ways. So, what is the problem? Just look at the rest of their lives: they carry in their pockets computers orders of magnitude more powerful than those of two decades ago, social networking connects them in realtime with their friends and family, and millions of videos (at least half of which are not cat-related) are free to watch anytime. The possibilities for learning have never been greater: Wikipedia dwarfs any encyclopedia or public library, you can visit many museums and art galleries virtually, thousands of books can be downloaded wirelessly onto a single portable reader, and many research journals are now publicly available online. Such massive technological changes in the world have been incredibly disruptive in many industries - finance, commerce, entertainment, journalism, health care, art, the list goes on and on. But no matter how much we all agree that these same seismic changes will eventually happen in education, as a society we have so far been too timid to experiment with new ways to teach our children.

< maturity >

Enter Khan Academy. Sal's project began very humbly, tutoring his niece in Math over the Internet. I have also tutored middle and high school students in Math, and there is a key insight I took away from the experience: the students who needed tutoring were not dumb, or slow, or stupid. Surprisingly, the thing they all had in common was a serious gap in their understanding of some basic concept. Over time they had developed an attitude of defensiveness and unwillingness to address the issue, especially if the key concept was several grade levels beneath them. As a tutor I found myself in the awkward position of trying to get someone through Algebra who didn't really understand the relationship between fractions, percentages, and decimals, and who felt that understanding to be permanently out of her reach. What I saw Khan Academy doing was addressing both issues: By using videos, a student who needs basic concepts explained over and over can get that in the privacy of their room without feeling judged or rushed. And by encouraging self-paced learning, a student will not be tutored in Algebra until she has demonstrated a thorough understanding of the prerequisite concepts. Most importantly, I saw no other company that was experimenting with new teaching methods and reaching actual students. All these reasons convinced me that by joining this team I would have a positive social impact, and that is what I did.

By its very mandate, Khan Academy is disruptive. We put resources and tools out there for free, for anyone to use, and let schools who dare to experiment with new ideas and teaching methods come to us. The fact that schools and districts have actually done so is very encouraging to me, and it gives me hope that by the time my son reaches school age, schools all across the country will be experiencing a much-needed renaissance of creativity, experimentation, and progress. As a parent I want to encourage him to find his own interests, his own voice, and a love for learning for its own sake. Rather than being a conduit for dry instruction, schools can teach kids to be informed citizens and morally conscious people. This is far more important and difficult than conveying a specific bit of knowledge or administering a standardized test. Sure, it is counterintuitive to speak of online education as a key tool for unlocking the human element of education, but that is what I see in the pilot schools and in our community of students and teachers. There is a hunger for change, and as more startups appear to feed that hunger the change will accelerate.

< parenthood >

The clock has been ticking for a long time, even though it didn't hit home for me until my son was born. It lights a fire beneath me to work harder at this job that I love, not just because of the wonderful people I get to work with but because we're working toward a common mission: to make education accessible and effective for everybody. This is not an abstract goal - in the next decade tens of millions of children will walk into their first classroom. How many will eventually go to college? How many will be able to compete in a global economy? How many will develop a lifelong love of learning?

As much as I would love to spend all my time staying home with my son, I take comfort in the knowledge that what we strive for every day is a brighter future for him and millions like him all over the world. I couldn't imagine a more rewarding way to spend my days.

< / tom >

P.S. Do you write code? Do you get goosebumps pondering the potential of technology to transform education and get a whole generation excited about learning? Do you enjoy freshly baked bread? We have a software developer position open! Email me at tom at khanacademy.org to find out more.

Wednesday, May 2, 2012

Statistics: The science of producing unreliable facts from reliable figures.

-Evan Esar

There is a trendy practice being advocated in a number of software development teams, which is to measure everything and A/B test radical ideas. It sounds good on paper: Instead of crowding together in a meeting room and debating pros and cons, or relying on a designer to make all the decisions, why not let the customers vote with their clicks? On the other hand, there are some outspoken critics of A/B testing who claim that it marginalizes good design and leads to bad decision-making in many cases. At Khan Academy, a company that provides a free service to a global population diverse in age and nationality, there is really no way to know what the effect of many of our changes will be except to try them and measure the results.

The result of our experiments thus far boils down to this:

Data is easy to collect but hard to interpret; it rarely gives you a clear result that confirms your hypotheses. However, even noisy and confusing data is invaluable for forming and testing hypotheses about user behavior.

Long story short, so far the best we've gotten is some comfort knowing we haven't made things worse, and maybe a little insight into how users behave contrary to our expectations.

< process >

Last week at Khan Academy we completed a sequence of experiments over a few months aimed at improving navigation through the site. On this project we intentionally divided the work into small shippable changes so we could observe the changes in user behavior. Why is this process important? Because we as developers are biased by our prior knowledge of how the site works and how we want it to work, and the best direct user feedback we can hope for is a vocal minority. We will tend to err on the side of changing everything, and the vocal minority tend to be those highly invested in the status quo. Therefore, we need objective measurements of how visitors to the site behave and how this changes over time.

Our data collection is as comprehensive as possible: we report user actions to Google Analytics and MixPanel. We also collect conversion data for A/B tests using GAE/Bingo. After each deploy, we monitored all these statistics for a week to see the effect of the changes on both weekend and weekday traffic, which can differ dramatically. Then we took the results into account when deciding the next steps.
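Under the hood, an A/B testing framework like GAE/Bingo boils down to two operations: deterministically bucketing each user into an alternative, and tallying conversions per alternative. This is a minimal sketch of that idea, not the actual GAE/Bingo API; the experiment and alternative names are made up:

```python
import hashlib
from collections import defaultdict

participants = defaultdict(int)
conversions = defaultdict(int)

def alternative(experiment, user_id, alternatives):
    """Deterministically assign a user to one alternative.

    Hashing (experiment, user) means the same user always sees the
    same variant, with no per-user state to store.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return alternatives[int(digest, 16) % len(alternatives)]

def participate(experiment, user_id, alternatives):
    """Record that a user entered the experiment; return their variant."""
    alt = alternative(experiment, user_id, alternatives)
    participants[alt] += 1
    return alt

def convert(experiment, user_id, alternatives):
    """Record a conversion (e.g. 'started a video') for the user's variant."""
    conversions[alternative(experiment, user_id, alternatives)] += 1
```

The conversion rate for each alternative is then just `conversions[alt] / participants[alt]`, which is the number we watched for a week after each deploy.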

< long list of links >

The khanacademy.org homepage is a textbook example of a KISS solution that doesn't scale especially well. When you click the "Watch" link you get this:

Rather than diving in and trying to improve things, we started by trying to understand user behavior. How effective was this layout (long page full of text links with a sticky navigation header at the top of the screen) at getting users to the video content they are looking for? Here is some sample data from MixPanel:

What you see here:

Of all the users in the sample who landed on the homepage, 22.29% selected a topic from the topic browser. (There are other ways to get to videos, but it's useful to know what subset of the population we're looking at.)

Of those who clicked on the topic browser, 80.67% clicked on a video in the Big List of Links.

Of those who clicked on a video, 86.71% started watching the video.

Of those who started watching a video, 74.27% completed it.

There is clearly a lot of room for improvement here.
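Multiplying the funnel steps together (a quick back-of-the-envelope calculation using the rates above) shows that only about 12% of the homepage visitors in this sample ended up completing a video:

```python
# Each step's rate applies only to the survivors of the previous step.
funnel = [
    ("selected a topic", 0.2229),
    ("clicked a video", 0.8067),
    ("started the video", 0.8671),
    ("completed the video", 0.7427),
]

survivors = 1.0
for step, rate in funnel:
    survivors *= rate
    print(f"{step}: {survivors:.1%} of homepage visitors remain")
```

A small improvement at any single step compounds through the whole funnel, which is why even the 80%+ steps are worth attacking.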

< the fun part >

So now we come to the changes we've been rolling out. Since I last wrote about our topics reorganization in February, there have been two major changes tested and shipped that take advantage of the new structure:

The Watch menu replaced the link that took you to the top of the video list with a drop-down menu of topics. At first, selecting a topic navigated you to that topic's section of the list; once topic pages were released, it took you to the relevant topic page instead.

Topic pages were designed carefully to help visitors find content faster without overwhelming them with thousands of video titles. They are also a big step forward visually - I am particularly proud of the Art History page. Go on, have a look. Isn't it beautiful?

Aside from helping visitors get their bearings, topic pages afford many secondary benefits: They are easy to bookmark and link to, they can load much faster than the homepage, they can be included in site search results, and they can be indexed directly by search engines. As we build new features based on topics, they will naturally be surfaced here, and doing further A/B tests on these pages is easier than doing them on the homepage.

< the results >

Here is a chart of some of our key video conversion results for topic pages:

The columns show data for "supertopic" pages (such as Algebra), "subtopic" pages (such as Solving Linear Equations), and "content topic" pages (such as Finance, which has no subtopics) as well as the homepage topic browser for comparison. The top three rows show the percentage of visitors who clicked on, started, and completed a single video. The bottom three rows show the retention rates between steps.

There are two things to notice here:

The retention rates after the visitor has clicked on a video link are marginally better than on the homepage. This seems good.

The click-through rate is quite a bit lower in some cases than the topic browser on the homepage. This seems bad.

These two contradictory trends roughly cancel each other out in the final number of visitors completing a video! Obviously, we were hoping for a significant improvement in overall conversions and we aren't seeing one.

< the analysis >

Our general attitude at Khan Academy is to ship often and adjust course as necessary, which is a tacit admission that no plan survives contact with the user base. What has amazed me more than anything else is how similar the conversion numbers are, even when comparing completely different pages and navigation styles! While the numbers aren't an obvious home run, we can see that visitors who found a video were more likely to watch it, which is a sign that they are indeed more easily finding the video they are looking for. And while the slightly lower initial click-through rate could be improved, the number of visitors to the topic pages seems to be rising over time, and with it the total number of clicks on video links from topic pages.

It's also important to step back from the numbers for a bit and look at the big picture: Topic pages give us more flexibility for future features, they are a better experience for navigation than the homepage, and we can get massive SEO benefits from them. Analytics is just one part of the decision-making process, and one we are taking with a grain of salt while we continue to try to better understand it. I'm really happy with how this project turned out and look forward to sharing our next amazing contributions to online education.

Wednesday, February 22, 2012

Last week we at Khan Academy revealed the first pieces of a long-term project to reorganize the content on the homepage and throughout the site. You may have noticed the removal of a number of playlists, notably "Developmental Math" and "Pre-Algebra", and the appearance of newly organized topics called "Algebra" and "Arithmetic and Pre-Algebra". We have grouped the videos under each topic into subtopics as well, to better expose how the thematic building blocks come together and the logical ordering between them. We think showing the structure and consolidating related subtopics under one supertopic is a big improvement over the somewhat haphazard collection of playlists that had evolved over time. Here is what one of the new topics looks like:

The shift in terminology from "playlist" to "topic" is significant: Playlists are completely linear and imply a passive mode of consumption. The concept of a playlist does not include exercises (although the videos in the playlist do have related exercises) and so the natural flow of watching a playlist precludes stopping to do an exercise to practice the knowledge that you've just absorbed.

< topics >

So what is a topic? A topic can represent a single concept or a related group of concepts, and unlike playlists, topics are arranged in a tree. So "Math" is a top-level concept, with "Algebra" beneath it and "Solving linear equations" beneath that. The lowest-level topics contain our content: not just videos, but eventually exercises and any other tools that we come up with to teach that topic. The ordering is meaningful, so a student can look at the content in a topic and see which videos and exercises teach this concept and the logical order in which to complete them.
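The structure described above can be sketched as a simple recursive type. The field names here are illustrative, not our actual model: interior topics hold ordered child topics, and leaf topics hold ordered content.

```python
class Topic:
    """A node in the topic tree: either a group of subtopics or a leaf
    holding ordered content (videos, exercises, ...)."""

    def __init__(self, title, children=None, content=None):
        self.title = title
        self.children = children or []  # ordered subtopics
        self.content = content or []    # ordered videos/exercises

    def walk(self, depth=0):
        """Yield (depth, title) pairs in display order."""
        yield depth, self.title
        for child in self.children:
            yield from child.walk(depth + 1)

math = Topic("Math", children=[
    Topic("Algebra", children=[
        Topic("Solving linear equations",
              content=["video: one-step equations",
                       "exercise: one-step equations"]),
    ]),
])
```

Because ordering is meaningful at every level, a depth-first walk of the tree recovers the same logical sequence a student would see on the homepage.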

This has both immediate and long-term effects on how students find and use our content. In the short term, we hope that specific and relevant content will be easier to find on the homepage, and there should be less confusion about what concepts a student is actually learning in each topic. In the longer term, we want to encourage a more active and nonlinear mode of participation: students will have the option to interleave videos and exercises for a more active learning session, and students will be able to tackle specific topics by setting goals and getting more granular feedback on how they're progressing. This will also help us clean up the Knowledge Map so it's easier to read and navigate.

< api >

One important note for users of the public API: If you have been using http://www.khanacademy.org/api/v1/playlists to get a list of playlists for displaying our videos in your application, you will want to migrate to the new topic tree API: http://www.khanacademy.org/api/v1/topictree. The playlists are becoming less coherent as we reorganize content and will eventually be deprecated in favor of the new organization, and all the new features in the API will be referencing topics. We look forward to seeing what navigation methods you find effective for the topic tree.
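A minimal consumer of the new endpoint might look like the sketch below. The traversal assumes each JSON node carries "title" and "children" keys; verify that against the live response before relying on it:

```python
import json
from urllib.request import urlopen

def topic_titles(node, depth=0):
    """Flatten a topic-tree node into (depth, title) pairs,
    preserving the tree's meaningful ordering."""
    titles = [(depth, node.get("title", ""))]
    for child in node.get("children", []):
        titles.extend(topic_titles(child, depth + 1))
    return titles

def fetch_topic_tree():
    """Download the full topic tree from the public API."""
    url = "http://www.khanacademy.org/api/v1/topictree"
    with urlopen(url) as resp:
        return json.load(resp)

# topic_titles(fetch_topic_tree()) would list every topic in the tree.
```

Unlike the flat playlist response, the tree lets your application pick its own depth: show only top-level topics, or drill down to the leaves.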

< ta-da! >

We are really proud of this architectural and pedagogical improvement. We hope to have all the topics under Math organized in the next few months, and to start rolling out exciting new features that leverage this infrastructure soon. Watch this space, and please leave your feedback in the comments!