17 July 2012

Futures as a Design Pattern for Refactoring

We've been using the NDB library in our app to manage all interactions with the DB. With NDB you use Futures to express asynchronous Python code. It feels a lot like gevent mixed with SQLAlchemy's ORM, but more thoughtfully integrated. What's interesting is how future-based programming has emerged as a design pattern for refactoring for performance. Let me explain:

We often have code that looks like this, especially for dashboards, where we need to do N independent queries to construct a view:
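The original code sample is not reproduced here, so what follows is only a minimal sketch of the serial shape being described. It does not use NDB: `query_posts` and `query_comments` are hypothetical stand-ins that sleep to simulate datastore latency, and this `DashboardHandler` is a plain class rather than a real request handler.

```python
import time


# Hypothetical stand-ins for datastore queries; the sleep simulates I/O latency.
def query_posts(user_id):
    time.sleep(0.05)
    return ["post-1", "post-2"]


def query_comments(user_id):
    time.sleep(0.05)
    return ["comment-1"]


class DashboardHandler(object):
    def get(self, user_id):
        # Each query blocks until it finishes, so the latencies add up.
        posts = query_posts(user_id)
        comments = query_comments(user_id)
        return {"posts": posts, "comments": comments}
```

With two queries the cost is tolerable; every extra query appended to get() adds its full latency to the response time.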

Initially this is fine because it's only a couple of queries in serial. Eventually our DashboardHandler grows to 10 or 20 separate operations that need to join together to construct the response. At that point the DashboardHandler becomes excruciatingly slow, because every query waits for the one before it. It also gets long, as the various methods that fetch and traverse objects are added to the handler's get() method.

Using NDB we split up a handler like this. We do it by finding the logical units of work in the DB (focusing on what's being fetched) and breaking them out into methods that are tasklets which execute concurrently:
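The original sample is missing here too, so this is a rough sketch of the same refactoring using the standard library's asyncio as a stand-in for NDB, which only runs on App Engine. The coroutines play the role of tasklets; in NDB the corresponding pieces would be the `@ndb.tasklet` decorator and `yield` on futures. The function bodies and the `handle_get` helper are assumptions made for illustration.

```python
import asyncio


# Each coroutine is one logical unit of work, named for what it fetches.
async def get_posts(user_id):
    await asyncio.sleep(0.05)  # simulated datastore round trip
    return ["post-1", "post-2"]


async def get_comments(user_id):
    await asyncio.sleep(0.05)  # simulated datastore round trip
    return ["comment-1"]


async def get_dashboard(user_id):
    # Start both units of work, then wait for both results together.
    posts, comments = await asyncio.gather(
        get_posts(user_id),
        get_comments(user_id),
    )
    return {"posts": posts, "comments": comments}


def handle_get(user_id):
    # Synchronous entry point, as a request handler's get() would be.
    return asyncio.run(get_dashboard(user_id))
```

Because both waits overlap, the handler takes roughly as long as the slowest query rather than the sum of all of them.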

Now get_posts() and get_comments() run concurrently, so their I/O waits overlap instead of adding up. That cuts the time the handler's thread sits idle and improves throughput. At the same time we've refactored our code to be more readable, logically separated, and potentially reusable. But it still reads like procedural code and can be tested synchronously, like this:
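As a hedged illustration of the synchronous test (the original sample is not reproduced), this sketch again uses asyncio in place of NDB. The test drives the coroutine to completion and asserts on the plain return value; with NDB the equivalent step would be calling `get_result()` on the future the tasklet returns. The `get_posts` body and the test case are hypothetical.

```python
import asyncio
import unittest


async def get_posts(user_id):
    await asyncio.sleep(0)  # simulated datastore round trip
    return ["post-%d-a" % user_id, "post-%d-b" % user_id]


class GetPostsTest(unittest.TestCase):
    def test_get_posts(self):
        # Block until the coroutine finishes, then check the value
        # exactly as we would for an ordinary function.
        posts = asyncio.run(get_posts(7))
        self.assertEqual(posts, ["post-7-a", "post-7-b"])
```

The test body contains no callbacks and no event-loop plumbing beyond the single blocking call, which is the point being made above.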

So this style of refactoring using futures is all win with very little effort. With little more than copying and pasting code sections you can get tremendous latency improvements through concurrent I/O. And it's way easier to understand than continuation-passing style asynchronous programming. In general I wish more APIs worked this way. Maybe Tornado could be paired with something for a non-App-Engine solution?

As an aside, this also illustrates why I'm not optimistic about Node.js's longevity. Programmers don't understand asynchronous programming, even when they've been warned. Futures are the sanest way to transition a synchronous project to async without using threads. When I see resistance to officially adopting the project that makes Node feel imperative, its future as an application platform looks grim. I think what gets popular is what's easy to learn and solves a real need; what lasts is what's easy to make robust and fast.

Onward
