Skynet - A Ruby Map/Reduce Framework

When Geni decided to build a Newsfeed system, it was obvious that we would need a distributed system to take the large volume of user-generated content on the site, decide who would be interested in each piece of content, and finally integrate that content into each user’s Newsfeed. We developed Skynet as a general-purpose distributed computing framework to meet the needs of our Newsfeed system, and we have since started using it for other computationally expensive or simply asynchronous tasks. We use Skynet to send emails, handle large database migrations, build reverse indexes, invalidate caches, and run various other tasks that are triggered by user activity and need to run asynchronously.
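The Newsfeed problem maps naturally onto map/reduce: a map step decides who should see each piece of content, and a reduce step gathers those decisions into per-user feeds. The following is a minimal, single-process sketch in plain Ruby. The `ContentItem` struct, the `followers` hash, and the helper names are hypothetical; this is not Geni's code or Skynet's API.

```ruby
# Hypothetical data shapes for illustration; not Geni's schema or Skynet's API.
ContentItem = Struct.new(:id, :author, :text)

# Map: for one piece of content, emit a [user, item_id] pair for every user
# who should see it. `followers` maps an author to the users following them.
def map_content(item, followers)
  followers.fetch(item.author, []).map { |user| [user, item.id] }
end

# Reduce: group the emitted pairs by user to build each user's feed.
def reduce_feeds(pairs)
  pairs.group_by(&:first).transform_values { |user_pairs| user_pairs.map(&:last) }
end

followers = { "alice" => ["bob", "carol"], "bob" => ["carol"] }
items = [
  ContentItem.new(1, "alice", "Added a new photo"),
  ContentItem.new(2, "bob", "Updated the family tree")
]

# Each map_content call is independent, so in a distributed setting every
# call could run on a different worker.
pairs = items.flat_map { |item| map_content(item, followers) }
feeds = reduce_feeds(pairs)
p feeds
```

Because each map call touches only one content item, the expensive part of the work parallelizes cleanly; only the reduce step needs to see all pairs for a given user.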

Before building Skynet, we researched several other solutions. There are a few other Map/Reduce frameworks for Ruby, but none were robust enough to handle our volume. Other open-source solutions like Hadoop were robust enough, but less attractive since they’re not written in Ruby. We also looked into using a message queue system, but decided that a message queue alone didn’t provide enough abstraction: we wanted our programmers to focus on the problems they needed to solve rather than on the details of a distributed system. We wanted something that would be dead-simple for our engineers, who all know Ruby, but robust enough to run in a production environment under heavy load. So we decided to build it ourselves.

Skynet is an adaptive, self-upgrading, fault-tolerant, and fully distributed system with no single point of failure. It uses a “peer recovery” system where workers watch out for each other. If a worker dies or fails for any reason, another worker will notice and pick up that task. Skynet also has no special “master” servers, only workers, any of which can act as the master for any task at any time. Even master tasks can fail; when they do, other workers pick them up.
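One common way to implement peer recovery is with task leases: a worker that claims a task holds it only until a deadline, and if it dies before finishing, the lease expires and any other worker may claim the task. The single-process Ruby sketch below illustrates that general technique with a fake clock. The names (`TaskBoard`, `claim`, `complete`) and the lease mechanism itself are assumptions for illustration, not Skynet's actual implementation.

```ruby
# Hypothetical lease-based task board illustrating peer recovery.
# Not Skynet's real mechanism; time is passed in explicitly for determinism.
Task = Struct.new(:id, :payload, :owner, :lease_expires_at, :done)

class TaskBoard
  def initialize(lease_seconds)
    @lease_seconds = lease_seconds
    @tasks = []
  end

  def add(id, payload)
    @tasks << Task.new(id, payload, nil, nil, false)
  end

  # Claim the first unfinished task that is either unowned or whose owner's
  # lease has expired (the owner is presumed dead). Returns nil if none.
  def claim(worker, now)
    task = @tasks.find do |t|
      !t.done && (t.owner.nil? || t.lease_expires_at <= now)
    end
    return nil unless task

    task.owner = worker
    task.lease_expires_at = now + @lease_seconds
    task
  end

  # Only the current lease holder may mark a task finished, so a worker that
  # was presumed dead cannot overwrite the recovering worker's result.
  def complete(task, worker)
    task.done = true if task.owner == worker
  end
end

board = TaskBoard.new(30)
board.add(1, "send email")

first = board.claim("worker-a", 0)        # worker-a takes the task, then crashes
nobody = board.claim("worker-b", 10)      # lease still valid: nothing to take
recovered = board.claim("worker-b", 31)   # lease expired: worker-b recovers it
board.complete(recovered, "worker-b")
```

In this sketch a coordinating “master” task would follow exactly the same rule, since it is just another entry on the board.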

Skynet is a Map/Reduce framework with a simple interface, but much of Skynet’s power and robustness comes from the fact that it is built on top of a message queue. At the heart of Skynet is a plugin-based message queue architecture, so it is simple to adapt Skynet to your computing environment and needs. Currently, there are two message queue implementations available: one built on Rinda, Ruby’s TupleSpace library, and one built on MySQL. An Apache ActiveMQ plugin is also in the works, and hopefully the Ruby community will add more as they implement adapters to support their environments.
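Layering map/reduce on top of a queue means the job code only talks to an adapter, so swapping the backend should not change the job. Below is a minimal in-process sketch of that layering, assuming a hypothetical four-method adapter interface (`write_task`, `take_task`, `write_result`, `take_results`) and an in-memory backend. Skynet’s actual plugin interface is not shown here.

```ruby
# Hypothetical adapter interface for illustration; not Skynet's real plugin API.
# Any backend (a TupleSpace or a MySQL table, for example) implementing these
# four methods could be swapped in without changing the job code below.
class InMemoryQueue
  def initialize
    @tasks = []
    @results = []
  end

  def write_task(task)
    @tasks.push(task)
  end

  # Returns nil when no work is left.
  def take_task
    @tasks.shift
  end

  def write_result(result)
    @results.push(result)
  end

  # Hands back all results collected so far and clears them.
  def take_results
    collected = @results
    @results = []
    collected
  end
end

# Worker loop: pull map tasks off the queue, run them, push results back.
def run_worker(queue)
  while (task = queue.take_task)
    queue.write_result(task[:map].call(task[:data]))
  end
end

# Job driver: enqueue one map task per input, let workers drain the queue,
# then reduce the partial results.
def map_reduce(queue, inputs, map_fn, reduce_fn)
  inputs.each { |data| queue.write_task({ data: data, map: map_fn }) }
  run_worker(queue) # in a real cluster, separate worker processes would do this
  reduce_fn.call(queue.take_results)
end

# Word count: each map task counts one line; the reduce step merges the counts.
count_words = ->(line) { line.split.each_with_object(Hash.new(0)) { |w, h| h[w] += 1 } }
merge_counts = ->(partials) { partials.reduce({}) { |acc, h| acc.merge(h) { |_word, a, b| a + b } } }

word_counts = map_reduce(InMemoryQueue.new, ["the cat", "the dog"], count_words, merge_counts)
p word_counts
```

A real out-of-process backend would also need tasks that can be serialized between processes (for example, a class name plus arguments) rather than Ruby lambdas.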

At its core, Skynet was designed with Ruby programmers in mind. It is not only extremely robust, but also very easy to set up and use. One example is Skynet’s ActiveRecord helpers, which make it easy to distribute processing of large database tables with a simple ActiveRecord-like interface. For example:

In this simple example, all ActiveRecord objects created more than three days ago are distributed asynchronously to workers, which call :some_method on each object. Skynet does this without instantiating any objects or retrieving their IDs in the caller’s process, so you can safely do this with a table of any size. Geni uses this ActiveRecord plugin to write complex migrations that involve a Ruby component alongside the database work. Skynet is also well suited to work with Rails as a background processor, where user activity in a Rails app triggers Skynet jobs that run asynchronously outside of Rails.
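The abstract does not say how Skynet avoids loading IDs in the caller. One common approach, assumed here purely for illustration, is to ask the database only for MIN(id) and MAX(id) and split that range into fixed-size batches, each of which becomes one worker task. The helper names `id_batches` and `batch_sql` are hypothetical.

```ruby
# Hypothetical sketch: split a table's primary-key space into batches using
# only MIN(id) and MAX(id), so the caller never loads individual rows or IDs.
# Skynet's real partitioning strategy may differ.
def id_batches(min_id, max_id, batch_size)
  return [] if min_id.nil? || max_id.nil? # empty table

  (min_id..max_id).step(batch_size).map do |lo|
    lo..[lo + batch_size - 1, max_id].min
  end
end

# Each batch becomes one task; a worker would run a query like this,
# instantiate only the rows in its range, and call the requested method.
# Values are interpolated only for brevity; real code should bind parameters.
def batch_sql(table, range, condition)
  "SELECT * FROM #{table} WHERE id BETWEEN #{range.first} AND #{range.last} AND (#{condition})"
end

# Assume SELECT MIN(id), MAX(id) FROM users returned 1 and 2500.
batches = id_batches(1, 2500, 1000)
batches.each { |range| puts batch_sql("users", range, "created_at < '2008-01-01'") }
```

Because IDs can have gaps, some batches may match few or no rows; that costs a little wasted work but keeps the caller’s memory use constant regardless of table size.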

Skynet is already being used heavily in production at Geni and can be downloaded for free from RubyForge.org under the MIT license (http://skynet.rubyforge.org).

In this session, the designer of Skynet will describe and demonstrate the system, preparing anyone to start using Skynet to power their distributed computing tasks.

This talk will focus on:

1. What is a map/reduce framework?

2. What is Skynet, and how does it implement map/reduce?

3. How to set up a Skynet cluster

4. The various interfaces to Skynet, from simple to complex

By the end of this talk, anyone should be able to quickly download, install, and try out Skynet.

Adam Pisoni

Geni.com

Adam Pisoni has been building large-scale web applications for over 10 years. He served as CTO of Cnation through the 90s and then as Architect and Director of Web Development for Shopzilla.com. He’s currently working as a Sr. Software Engineer at Geni.com, a family social networking startup. In his spare time he can be found backpacking and rock climbing in the Eastern Sierras or snowboarding on Mammoth Mountain.