Sessions on Pig and MapReduce at Strata 2012, Tuesday 28th February

This tutorial provides a solid foundation for those seeking to understand large-scale data processing with MapReduce, Hadoop, and its associated ecosystem. It is intended for those who are new to Hadoop and want to understand where Hadoop is appropriate and how it fits with existing systems.

The agenda will include:

The rationale for Hadoop

Understanding the Hadoop Distributed File System (HDFS) and MapReduce

Common Hadoop use cases including recommendation engines, ETL, time-series analysis and more

How Hadoop integrates with other systems such as relational databases and data warehouses

Overview of the other components in a typical Hadoop “stack” such as these Apache projects: Hive, Pig, HBase, Sqoop, Flume and Oozie

This tutorial will explain how to leverage a Hadoop cluster for data analysis using Java MapReduce, Apache Hive and Apache Pig. Participants are recommended to have experience with at least one programming language. Topics include:

Why are Hadoop and MapReduce needed?

Writing a Java MapReduce program

Common algorithms implemented on Hadoop, such as indexing, classification, joining data sets and graph processing
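To give a feel for the "Writing a Java MapReduce program" topic above, here is a minimal conceptual sketch of the MapReduce word-count pattern in plain Java. It is not a real Hadoop job (a real one would extend Mapper and Reducer from the org.apache.hadoop.mapreduce API and run on a cluster); the shuffle phase is simulated here with an in-memory grouping so the map and reduce steps are visible in isolation.

```java
import java.util.*;
import java.util.stream.*;

// Conceptual word-count sketch: map emits (word, 1) pairs, a simulated
// shuffle groups values by key, and reduce sums each group. In Hadoop,
// the framework performs the shuffle between distributed map and reduce tasks.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Reduce phase: sum all the values emitted for a single key.
    static int reduce(String word, List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    // Driver: run map over every line, group pairs by key (the "shuffle"),
    // then apply reduce to each group.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                shuffled.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                        .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        shuffled.forEach((word, counts) -> result.put(word, reduce(word, counts)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or not to be")));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```

The same map/shuffle/reduce decomposition is what the tutorial's Hive and Pig examples compile down to behind the scenes.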