Twitter open sources Scalding

The social networking giant makes its Scala for Cascading API available to all – jumping on the Hadoop bandwagon?

Scala developers were all aflutter over an announcement from
Twitter’s development team. Why? Well, they’ve just open sourced
Scalding, a Scala library built on top of Cascading that lets
developers write jobs to analyse reams and reams of data.

For those unaware, Cascading is a thin Java library that sits on
top of Apache Hadoop’s MapReduce layer and takes care of much of
the difficult, tedious plumbing involved in writing MapReduce jobs.
Scalding, in turn, has two major components:

A DSL that makes MapReduce computations look very similar to
Scala’s collection API

A wrapper for Cascading that makes it simpler to define jobs and
tests, and to describe data sources on the Hadoop Distributed File
System (HDFS) or on local disk
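To give a feel for what “very similar to Scala’s collection API” means, here is a minimal word count written purely with the standard collections. It is a sketch under stated assumptions, not Scalding code: the object and method names are hypothetical, and it runs in memory rather than on Hadoop. The flatMap step plays the role of the map phase, and groupBy plus the count play the role of the shuffle and reduce, which is the shape Scalding’s DSL is meant to preserve when the data lives on HDFS.

```scala
// Hypothetical, local sketch: a word count in the style of Scala's collection API.
// This is NOT Scalding; it uses only the standard library and runs in memory.
object CollectionStyleWordCount {

  // Count how often each whitespace-separated word appears across all lines.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split("\\s+").toSeq) // "map" phase: one record per word
      .filter(_.nonEmpty)                         // drop empty tokens left by leading whitespace
      .groupBy(identity)                          // "shuffle": bucket identical words together
      .map { case (word, occurrences) => word -> occurrences.size } // "reduce": count each bucket

  def main(args: Array[String]): Unit = {
    val counts = wordCount(Seq("big data is big", "Scala is fun"))
    // Print one "word<TAB>count" line per word, sorted by word for stable output.
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

Because the pipeline is just method calls on a collection, a developer who already knows flatMap and groupBy can read it without learning a separate query language, which is the appeal the DSL is aiming for.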

As the entire ecosystem gets excited about Hadoop’s potential to
dominate Big Data, it is surely a good thing that this Scala
adaptation of Cascading is now available to the masses: it should
make developing, regression and integration testing, and deploying
enterprise applications simpler for Scala enthusiasts.

Twitter’s Oscar Boykin brought up Scalding’s supposed simplicity,
a rare claim to make about Scala at the moment. He wrote:

In comparison to languages such as Apache
Pig that separate the query language from the user
defined functionality, with Scalding everything is integrated into
one language. In most cases, one file will describe your
job.
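Boykin’s point about keeping everything in one language can be illustrated, again only locally, with a hypothetical sketch in which the custom logic and the query share a single Scala file. In Pig, a custom function like the normalizeHandle helper below would typically be written in Java, packaged separately and pulled in with a REGISTER statement; here it is an ordinary method called directly from the pipeline. The Tweet record, its fields and every name in the block are illustrative assumptions, and plain in-memory collections stand in for a Hadoop data source.

```scala
// Hypothetical sketch: a "user-defined function" and the query that uses it
// living side by side in one Scala file. Plain collections stand in for HDFS data.
object OneFileJobSketch {

  // Illustrative record type; the field names are assumptions, not a real schema.
  final case class Tweet(user: String, text: String)

  // The "UDF": an ordinary Scala method, with no separate packaging or registration.
  def normalizeHandle(user: String): String =
    user.trim.stripPrefix("@").toLowerCase

  // The "query": number of tweets per normalized user, most active first,
  // ties broken alphabetically by user name.
  def tweetsPerUser(tweets: Seq[Tweet]): Seq[(String, Int)] =
    tweets
      .groupBy(t => normalizeHandle(t.user))         // group using the UDF directly
      .map { case (user, ts) => user -> ts.size }    // count tweets in each group
      .toSeq
      .sortBy { case (user, n) => (-n, user) }       // descending count, then name

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      Tweet("@Scalding", "now open source"),
      Tweet("scalding", "try the DSL"),
      Tweet("@hadoop", "big data")
    )
    tweetsPerUser(sample).foreach { case (user, n) => println(s"$user\t$n") }
  }
}
```

In Scalding itself, the equivalent job would be a class extending Job, reading from a source such as TextLine and writing to a sink such as Tsv; that needs the Scalding library on the classpath, so it is not reproduced here.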

Twitter’s use of Scala is well documented: as it grew, the company
moved from Ruby on Rails to a mix of Java and Scala. Given the
amount of data the social networking site has to deal with on a
daily basis, you’d think it would be a good judge of how to
simplify anything to do with MapReduce.

Why not try Scalding now? It’s available on GitHub, and
you can follow the project on Twitter via
@scalding. Embrace
the Big Data revolution now; everyone else is.