Hadoop Has Arrived: Hadoop Summit 2009

Hadoop is the leading open-source project for MapReduce computation and supporting infrastructure (such as HDFS, the Hadoop Distributed File System, which is based on the Google File System (GFS) design). The 2008 Hadoop Summit saw about 150 attendees; 2009 had literally five times that number. I am not a Hadoop expert, but as a Cassandra developer I’m interested in meeting people who work with large datasets, and there was no better place for that than the Hadoop Summit.

Hadoop Summit videos are not out yet, but should be soon. My favorite talks were the ones on Amazon Elastic MapReduce, Pig, and Hive. (At The Rackspace Cloud, we compete with Amazon, but I have to give them credit for their talk!) Pig and Hive are both projects that offer a higher-level language for writing MapReduce jobs, with slightly different approaches. We use Pig internally.

I should also mention that the first 500 people to register for the Hadoop Summit were given a free copy of Hadoop: The Definitive Guide. I would recommend it to anyone looking for an introduction to both using and administering Hadoop.

The NoSQL conference the next day featured an overview of a half-dozen of the most interesting open-source distributed databases, plus CouchDB, which is targeting scaling down to mobile devices rather than out to hundreds of servers in your datacenter. NoSQL videos are up, and of course I have to point out the comment calling the Cassandra presentation (by Avinash Lakshman of Facebook) “hands-down the most interesting.” Besides ours, I would recommend Todd’s overview as well as the Voldemort and HBase talks. Yes, there are cases where I would use one of those instead of Cassandra, but that’s a subject for another post! (In the meantime, Toby Negrin from Yahoo posted some notes on each.)

Want to know more about Hadoop at Rackspace? Be sure to check out this video interview from building43.com.

If you want to try out Hadoop or a distributed database but don’t have a cluster of your own, visit our Cloud Servers page for more information.

About the Author

This is a post written and contributed by Angela Bartels.

Angela runs integrated marketing campaigns for Rackspace. She started at Rackspace in 2003 and has worked in Linux support, account management, sales, and product marketing, and now lives in marketing. She left Rackspace in 2005 to work for PEER 1 Hosting but returned in 2009 because she was interested in the cloud computing movement (and has always been a Racker at heart). Angela is a strong believer in the power of storytelling.