Seeing something in space fly overhead with the naked eye is pretty awesome. It's not only a technical marvel to enjoy; for me it helps keep things in perspective and my thoughts on the big picture. Ever since the first time I saw the International Space Station fly by, I've felt it's something everyone should experience. I started looking at existing software to remind me (and others) when the next pass of the ISS would be, and while there are a few good websites out there for satellite prediction, none of them did exactly what I wanted. So a few months ago I decided to try to build my own. The biggest piece of the project was going to be getting the actual satellite data. I looked around for existing data APIs I could use, but couldn't find anything short of scraping existing sites (which is not only fragile, but some sites have no published data policy). So the first step was to figure out how to generate the predictions myself.

When developing a web application, it's common for feature development to run ahead of whatever data administration tools you might need. Some frameworks, such as Django, have built-in admin tools for managing your application's data, but many don't. At my current job we use Zend Framework, and have rolled our own lightweight model layer, which has no fancy automatic admin tools. In addition to supporting database pools and selectors for partitioning data, it also provides transparent caching. While this is good for speed, it means any data storage change has to go through the model layer, or the cache will get out of sync. For now, rather than running DB queries directly, we do them through CLI jobs built for specific tasks (batch updates, etc.). This works fairly well, but things often come up that aren't yet supported in the application's admin tools, such as changing a user's status bits. Those are the things a CLI console would be useful for.

We've been using Twitter's kestrel queue server for a while now at work, but only from our service layer, which is written in python. Now that we have some queueing needs from our application layer, written in PHP, I spent a few days this week adding queue support to our web application. I thought I'd share what I learned, and how I implemented it.

Goals

The kestrel server itself was pretty straightforward to get up and running. The only thing I would point out is that I recommend sticking to release branches, as master was fairly unstable when I tried to use it. For the client itself, I had a few goals in mind when I started:

Since kestrel is built on the memcache protocol, leverage an existing memcache client rather than building one from scratch

Keep the queue interface generic in case we change queue servers later

Utilize existing kestrel management tools, and only build out the functionality we need

With these goals in mind, I ended up with four components: a kestrel client, a producer, a consumer, and a very small CLI harness for running the consumer. But before I wrote any code, I set up kestrel web, a web UI for kestrel written by my co-worker Matt Erkkila. Kestrel web lets you view statistics on kestrel, manage queues, and sort and filter queues based on manual inputs. Having this tool up and running from the get-go made it easy to watch jobs get added to and consumed from my test queue, and to flush the queues as needed.
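To make the first two goals concrete, here's a minimal sketch of what the client side could look like. All class and method names here are hypothetical illustrations, not the actual implementation: a generic queue interface keeps application code independent of the queue server, and since kestrel speaks the memcache protocol (SET enqueues a job, GET dequeues one), the kestrel-backed client can lean on the pecl Memcached extension instead of implementing a wire protocol from scratch.

```php
<?php
// Hypothetical sketch; names and structure are illustrative only.

// A generic interface so the queue server can be swapped out later.
interface QueueClient
{
    public function put($queue, $job);
    public function get($queue);
}

// Kestrel speaks (a subset of) the memcache protocol, so this wraps
// the existing pecl Memcached extension rather than a custom client.
class KestrelClient implements QueueClient
{
    private $memcached;

    public function __construct($host, $port = 22133)
    {
        $this->memcached = new Memcached();
        $this->memcached->addServer($host, $port);
    }

    public function put($queue, $job)
    {
        // On kestrel, SET appends a job to the queue.
        return $this->memcached->set($queue, $job);
    }

    public function get($queue)
    {
        // GET pops the next job; Memcached returns false when empty.
        $job = $this->memcached->get($queue);
        return $job === false ? null : $job;
    }
}

// An in-memory implementation of the same interface, handy for unit
// tests and local development without a kestrel server running.
class InMemoryQueue implements QueueClient
{
    private $queues = array();

    public function put($queue, $job)
    {
        $this->queues[$queue][] = $job;
        return true;
    }

    public function get($queue)
    {
        if (empty($this->queues[$queue])) {
            return null;
        }
        return array_shift($this->queues[$queue]);
    }
}
```

The producer and consumer can then be written against `QueueClient` only, which is what keeps the interface generic if we change queue servers later.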

This week at work I got the chance to address the growing number of batch-oriented CLI scripts for our main web application. They weren't quite unmanageable yet, but they were heading in that direction. There was too much common code, especially around bootstrapping the application and parsing options. The location of the scripts didn't really make sense either... ./bin/bar.php, ./cron/foo.php, etc. So I decided to carve out some time and clean it up.

The goals were pretty straightforward:

Everything must use the application's model layer. This is mostly so the built-in caching stays consistent, but also to enforce that all data access goes through the same code.
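One way to factor out the common bootstrapping and option parsing is a single harness that dispatches to small task classes. The sketch below is hypothetical (class names, the option letters, and the commented-out model call are all assumptions for illustration), but it shows the shape: each task declares its options, and the shared entry point handles everything else.

```php
<?php
// Hypothetical sketch of a consolidated CLI harness; names are
// illustrative, not the real application's.

abstract class CliTask
{
    // The getopt()-style option string this task accepts.
    abstract public function optionString();

    // Run the task; returns a shell exit code.
    abstract public function run(array $opts);
}

class SetUserStatusTask extends CliTask
{
    public function optionString()
    {
        return 'u:s:'; // -u <user id> -s <status bits>
    }

    public function run(array $opts)
    {
        if (!isset($opts['u'], $opts['s'])) {
            fwrite(STDERR, "usage: console set-user-status -u <id> -s <bits>\n");
            return 1;
        }
        // All writes go through the model layer so the transparent
        // cache stays consistent (hypothetical call):
        // UserModel::setStatusBits((int) $opts['u'], (int) $opts['s']);
        return 0;
    }
}

// The shared entry point would bootstrap the application once, pick
// the task by name, then do something like:
//   $task = new SetUserStatusTask();
//   exit($task->run(getopt($task->optionString())));
```

Each new batch job then becomes one small class instead of another standalone script with its own copy of the bootstrap code.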

When debugging problems in PHP, most of the time it's easiest to just add var_dump($foo); exit; in the middle of your script and see the contents of $foo right in your browser. But if you have to do much more than that, this approach gets cumbersome pretty quickly. For harder-to-track-down problems I've recently been using step debugging, which lets me examine the state of things all the way through the execution of a request, line by line, or skip ahead to breakpoints. The process also gives you more insight into everything else happening in a request, which is useful when your application relies on frameworks or other third-party code.
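The usual way to get step debugging in PHP is the Xdebug extension. As a minimal sketch using Xdebug 2-era setting names (the extension path and host are assumptions for your environment; Xdebug 3 renamed these settings), the php.ini side looks something like:

```ini
; Load the extension (path depends on your install).
zend_extension=/usr/lib/php/modules/xdebug.so

; Xdebug 2 remote-debugging settings; Xdebug 3 renamed these to
; xdebug.mode=debug, xdebug.client_host, and xdebug.client_port.
xdebug.remote_enable=1
xdebug.remote_host=localhost
xdebug.remote_port=9000

; Only start a session when the IDE sends the trigger (cookie or GET
; parameter), so ordinary requests aren't slowed down.
xdebug.remote_autostart=0
```

Your IDE or editor then listens on that port, and breakpoints are set there rather than in the code itself.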