[Bernardo Heynemann] r³ – A quick demo of usage

My new map/reduce engine project, r³, got a lot of attention last week, and before that on Twitter, Facebook and even Hacker News.

So I decided to write a sample project demoing the usage of r³.

The problem

I had to find an interesting, yet simple problem to show in this demo. Since I am a huge fan of GitHub, I decided to show each committer’s percentage of commits in a given repository.
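To make the calculation concrete, here is a minimal plain-Python sketch of it as a map step and a reduce step. It does not use r³'s own classes and is not the r3-gh code. The function names are hypothetical, and it assumes each commit is a dict shaped like the GitHub API's commit JSON, with the name under `commit.committer.name` (whether the demo counts committers or authors is an assumption here).

```python
from collections import Counter


def map_commits(commits):
    """Map step: count commits per committer within one chunk of commits."""
    # Assumes GitHub-style commit dicts: commit -> committer -> name.
    return Counter(c["commit"]["committer"]["name"] for c in commits)


def reduce_counts(partial_counts):
    """Reduce step: merge the per-chunk counters into one total per committer."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def to_percentages(counts):
    """Turn absolute commit counts into each committer's share of the total."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}
```

Each chunk can be mapped independently, which is what lets a map/reduce engine spread the work across several mappers before a single reduce merges the partial counts.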

GitHub has a VERY nice API that you can use to retrieve a myriad of information on your own repositories or on other people’s repositories (provided they are public).

You just have to access https://api.github.com/repos/mirrors/linux/commits?per_page=100&top=master to get the first 100 commits in the Linux kernel repository. The response comes with a Link header that specifies where the next 100 commits can be found.
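As a rough illustration of how that header can be read, the sketch below pulls the `rel="next"` URL out of a Link header value. It assumes the usual web-linking layout of `<url>; rel="name"` entries separated by commas. The function name `next_page_url` is hypothetical and not part of r³ or r3-gh.

```python
import re

# One entry in a Link header looks like: <https://example/page2>; rel="next"
LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header):
    """Return the URL tagged rel="next" in a Link header, or None if absent."""
    if not link_header:
        return None
    for match in LINK_ENTRY.finditer(link_header):
        url, rel = match.groups()
        if rel == "next":
            return url
    return None
```

When the function returns `None`, there is no further page to fetch.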

The Input Stream

Cool! So my map/reduce operation should operate on top of all commits for a given project. That means that in my input stream I just need to capture all those commits and return them.

I just built a simple crawler that keeps looking for the next page of commits until it can’t find one.

To save myself some time and bandwidth, it also stores those commits in a temp folder as a means of caching them.
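A crawler along those lines could look roughly like the sketch below. This is not the actual r3-gh code: `crawl_commits` and `cached_fetch` are hypothetical names, and `fetch` is an assumed callable that takes a URL and returns a `(body_text, link_header)` pair, so the HTTP library is left open. Each page is cached in a file named after a hash of its URL.

```python
import hashlib
import json
import os
import re


def cached_fetch(url, fetch, cache_dir):
    """Return (body, link_header) for url, reading from cache_dir when possible."""
    os.makedirs(cache_dir, exist_ok=True)
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".json"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path) as cached:
            page = json.load(cached)
        return page["body"], page["link"]
    body, link = fetch(url)  # assumed callable that performs the HTTP request
    with open(path, "w") as cached:
        json.dump({"body": body, "link": link}, cached)
    return body, link


def crawl_commits(first_url, fetch, cache_dir):
    """Follow rel="next" links from first_url and collect every commit."""
    commits = []
    url = first_url
    while url:
        body, link = cached_fetch(url, fetch, cache_dir)
        commits.extend(json.loads(body))  # each page is a JSON list of commits
        match = re.search(r'<([^>]+)>\s*;\s*rel="next"', link or "")
        url = match.group(1) if match else None  # stop when no next page
    return commits
```

Passing `fetch` in as a parameter keeps the crawl loop easy to test offline. On a second run over the same repository, every page comes from the cache and no request is made.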

WARNING: The make run command will install some python packages. If you don’t want them to be installed system-wide, create a virtualenv before running the command.

Interesting Trivia

I ran r3-gh against some famous repositories and got some interesting information. Be advised that the number of commits does not reflect code committed and/or effort spent, since some people commit more often than others. This is meant simply as trivia and as a way of demoing r³.

That said, let’s take a look at the rails repository (total of 25974 commits):

Now let’s see how django is distributed among committers (total of 12403 commits):

It’s worth noting that I excluded every committer that had less than 1% of the commits (less than 0.5% for the Linux kernel), so the percentages shown do not add up to 100%.
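The cut-off described here can be sketched as a small filter over a name-to-percentage mapping. The function name and the sample figures in the usage lines are made up for illustration. The default threshold of 1.0 mirrors the 1% rule, and passing 0.5 would mirror the Linux kernel case.

```python
def drop_small_committers(percentages, threshold=1.0):
    """Keep committers at or above threshold percent, largest share first."""
    kept = {name: pct for name, pct in percentages.items() if pct >= threshold}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# Made-up shares, only to show the effect of the cut-off.
sample = {"alice": 60.0, "bob": 39.2, "carol": 0.8}
visible = drop_small_committers(sample)
shown_total = sum(pct for _, pct in visible)  # less than 100 once carol is dropped
```

Because the dropped committers still count toward the total number of commits, the remaining shares stay correct individually but no longer sum to 100.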

Conclusion

It is pretty simple to get r³ to do some cool calculations for us. I got the whole sample working in a very short amount of time. It took me more time to write this post than to make r³ calculate the committer percentages.

Hope you guys come up with some interesting stuff to calculate as well.