Pentaho opens its "Big Data" Kettle

Pentaho has announced that it is placing its Pentaho Kettle project under the Apache 2.0 Licence. Kettle is an ETL (Extract, Transform and Load) engine, and also the name of the project of which that engine is the core component.

Kettle, the project, includes PHD (Pentaho Hadoop Node Distribution), which is the Kettle engine packaged for running on a Hadoop cluster, where transformations can be run as a map task, a reduce task or combined, to take advantage of Hadoop's clustering. PHD is destined to "become unnecessary", though, as Kettle is set to evolve to use the Hadoop distributed cache. Kettle is able to operate as an ETL engine with Cassandra, Hadoop HDFS, Hadoop MapReduce, Hadapt, HBase, Hive, HPCC Systems and MongoDB.
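To make the idea of running a transformation as a map task and a reduce task more concrete, here is a minimal sketch in plain Python. It is not Kettle or Hadoop code: the function names, the field names (`country`, `amount`) and the in-memory shuffle are all assumptions made for illustration, standing in for work that a real cluster distributes across nodes.

```python
from collections import defaultdict


def transform_row(row):
    """Hypothetical row-level transformation: normalise the key, parse the amount."""
    return row["country"].strip().upper(), float(row["amount"])


def map_task(rows):
    """Map phase: apply the transformation to every input row independently."""
    for row in rows:
        yield transform_row(row)


def shuffle(pairs):
    """Group mapped values by key (a cluster does this between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce phase: aggregate all values that share a key."""
    return key, sum(values)


def run_job(rows):
    """Run map, shuffle and reduce in sequence on a single machine."""
    groups = shuffle(map_task(rows))
    return dict(reduce_task(key, values) for key, values in groups.items())


# Illustrative input rows (made-up data).
rows = [
    {"country": " de ", "amount": "10.5"},
    {"country": "uk", "amount": "4"},
    {"country": "DE", "amount": "2.5"},
]
totals = run_job(rows)
print(totals)
```

Because each row is transformed independently in the map phase, that work can be spread over many nodes; only the aggregation in the reduce phase needs rows with the same key to be brought together.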

The Kettle project also includes Spoon, a visual desktop for designing ETL transformations and jobs; Pan and Kitchen, command line programs for executing transformations and jobs respectively; and Carte, a web server for executing transformations and jobs driven by XML. There is also a report designer and data integration server in the Kettle project. The combination of platform and tools should, says Pentaho, allow a developer to extract and load relational data in parallel into Hadoop HDFS without writing a single line of code. Kettle, Map/Reduce and the tools are introduced in a video from Pentaho:

Executing Kettle transforms in the MapReduce cluster
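As a rough illustration of how a transformation designed in Spoon might then be run from the command line with Pan, the following Python sketch assembles a Pan invocation. The `-file`, `-level` and `-param:` options follow Pan's commonly documented command line, but should be checked against the installed Kettle version; the helper name `build_pan_command`, the file path and the parameter name are hypothetical.

```python
import shlex


def build_pan_command(ktr_path, level="Basic", params=None, launcher="pan.sh"):
    """Build the argument list for a Pan run of one transformation file.

    Assumes Pan's -file, -level and -param:NAME=VALUE options; the launcher
    script name and all paths are placeholders, not values from Pentaho.
    """
    args = [launcher, f"-file={ktr_path}", f"-level={level}"]
    # Sort parameters so the generated command is deterministic.
    for name, value in sorted((params or {}).items()):
        args.append(f"-param:{name}={value}")
    return args


# Hypothetical transformation file and named parameter.
cmd = build_pan_command(
    "/etl/load_sales.ktr",
    params={"TARGET_DIR": "/user/etl/sales"},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` on a machine where Kettle is installed. Kitchen takes a job file in the same way, so a similar helper would work for jobs.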

More information on Pentaho Kettle is available from the home page of the Pentaho Big Data Community where further videos and how-tos are provided, along with downloads of preview builds of the Kettle 4.3 client, Hadoop Node Distribution and Report Designer (all available for Windows, Mac and Linux). A release candidate of Kettle 4.3 is due towards the end of February and a stable release by the end of March.