
Data-Intensive Text Processing with MapReduce

Posted on April 8th, 2012

“Data-Intensive Text Processing with MapReduce”, written by Jimmy Lin and Chris Dyer, is available for free in PDF format. The book focuses on MapReduce algorithm design, with an emphasis on text processing algorithms common in natural language processing, information retrieval, and machine learning.

Description

MapReduce is a programming model for expressing distributed computations on massive amounts of data and an execution framework for large-scale data processing on clusters of commodity servers. It was originally developed by Google and built on well-known principles in parallel and distributed processing dating back several decades.
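To make the programming model concrete, here is a minimal single-process sketch of the classic word-count example. It only simulates the three phases (map, shuffle/group by key, reduce) in plain Python. It is not Hadoop's API, and the names `map_fn`, `reduce_fn` and `run_mapreduce` are hypothetical. A real framework would run many mappers and reducers in parallel across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_id: int, text: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical mapper: emit an intermediate (word, 1) pair for every token.
    for word in text.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Hypothetical reducer: sum all partial counts emitted for one word.
    return word, sum(counts)


def run_mapreduce(docs: Dict[int, str]) -> Dict[str, int]:
    # Map phase, with the shuffle simulated by grouping values under their key.
    intermediate: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in docs.items():
        for key, value in map_fn(doc_id, text):
            intermediate[key].append(value)
    # Reduce phase: visit keys in sorted order, mimicking the framework's sort step.
    return dict(reduce_fn(k, vs) for k, vs in sorted(intermediate.items()))


docs = {1: "the quick brown fox", 2: "the lazy dog", 3: "the fox"}
print(run_mapreduce(docs))
# {'brown': 1, 'dog': 1, 'fox': 2, 'lazy': 1, 'quick': 1, 'the': 3}
```

The programmer only writes the mapper and the reducer. Distributing the input, moving intermediate pairs between machines, and handling failures are left to the execution framework.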

MapReduce has since enjoyed widespread adoption through Hadoop, an open-source implementation whose development was led by Yahoo and which is now an Apache project. Today, a vibrant software ecosystem has sprung up around Hadoop, with significant activity in both industry and academia.