Big Data Analytics with R and Hadoop

If you’re an R developer looking to harness the power of big data analytics with Hadoop, then this book tells you everything you need to integrate the two. You’ll end up capable of building a data analytics engine with huge potential.

Vignesh Prajapati, November 2013

Book Details

About This Book

Write Hadoop MapReduce within R

Learn data analytics with R and the Hadoop platform

Handle HDFS data within R

Understand Hadoop streaming with R

Encode and enrich datasets into R

Who This Book Is For

This book is ideal for R developers who are looking for a way to perform big data analytics with Hadoop. It is also aimed at those who know Hadoop and want to build intelligent applications on big data with R packages. A basic knowledge of R is helpful.

Table of Contents

Chapter 1: Getting Ready to Use R and Hadoop

Installing R

Installing RStudio

Understanding the features of R language

Installing Hadoop

Understanding Hadoop features

Learning the HDFS and MapReduce architecture

Understanding Hadoop subprojects

Summary

Chapter 2: Writing Hadoop MapReduce Programs

Understanding the basics of MapReduce

Introducing Hadoop MapReduce

Understanding the Hadoop MapReduce fundamentals

Writing a Hadoop MapReduce example

Learning the different ways to write Hadoop MapReduce in R

Summary

Chapter 3: Integrating R and Hadoop

Introducing RHIPE

Introducing RHadoop

Summary

Chapter 4: Using Hadoop Streaming with R

Understanding the basics of Hadoop streaming

Understanding how to run Hadoop streaming with R

Exploring the HadoopStreaming R package

Summary

Chapter 5: Learning Data Analytics with R and Hadoop

Understanding the data analytics project life cycle

Understanding data analytics problems

Summary

Chapter 6: Understanding Big Data Analysis with Machine Learning

Introduction to machine learning

Supervised machine-learning algorithms

Unsupervised machine learning algorithm

Recommendation algorithms

Summary

Chapter 7: Importing and Exporting Data from Various DBs

Learning about data files as database

Understanding MySQL

Understanding Excel

Understanding MongoDB

Understanding SQLite

Understanding PostgreSQL

Understanding Hive

Understanding HBase

Summary

What You Will Learn

Integrate R and Hadoop via RHIPE, RHadoop, and Hadoop streaming

Develop and run a MapReduce application that runs with R and Hadoop

Handle HDFS data from within R using RHIPE and RHadoop

Run Hadoop streaming and MapReduce with R

Import and export from various data sources to R

In Detail

Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New methods of working with big data, such as Hadoop and MapReduce, offer alternatives to traditional data warehousing.

Big Data Analytics with R and Hadoop focuses on techniques for integrating R and Hadoop using tools such as RHIPE and RHadoop. With these, you can build a powerful data analytics engine that runs analytics algorithms over large datasets in a scalable manner, combining the data analytics operations of R with the MapReduce and HDFS components of Hadoop.

You will start with the installation and configuration of R and Hadoop. Next, you will work through various practical data analytics examples with R and Hadoop. Finally, you will learn how to import data into R from various data sources and export it back. Big Data Analytics with R and Hadoop also gives you an accessible introduction to the R and Hadoop connectors RHIPE, RHadoop, and Hadoop streaming.
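
For readers new to the MapReduce model mentioned above, the following is a minimal, single-machine sketch of its three steps (map, shuffle, reduce) applied to a word count. It is written in plain Python purely for illustration and is not taken from the book, which uses R with RHIPE, RHadoop, and Hadoop streaming. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and no Hadoop API is involved.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, imitating what a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: collapse all counts for one word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big analytics"]))
```

On a real cluster, the map and reduce steps would run in parallel across many nodes over data stored in HDFS; this sketch only shows the shape of the computation.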

Authors

Vignesh Prajapati

Vignesh Prajapati, from India, is a Big Data enthusiast, a Pingax (www.pingax.com) consultant, and a software professional at Enjay. He is an experienced ML data engineer who works with machine learning and Big Data technologies such as R, Hadoop, Mahout, Pig, Hive, and related Hadoop components to analyze datasets and derive insights through data analytics cycles.
He earned his B.E. from Gujarat Technological University in 2012 and started his career as a Data Engineer at Tatvic. His professional experience includes developing various data analytics algorithms for Google Analytics data sources to add economic value to products. To put ML into action, he implemented several analytical apps in collaboration with the Google Analytics and Google Prediction API services. He also contributes to the R community by developing the RGoogleAnalytics R library as an open source Google project, and he writes articles on data-driven technologies.
Vignesh is not limited to a single domain; he has also developed various interactive apps using Google APIs such as the Google Analytics API, Realtime API, Google Prediction API, Google Chart API, and Translate API on the Java and PHP platforms. He is highly interested in the development of open source technologies.
Vignesh has also reviewed the Apache Mahout Cookbook for Packt Publishing. That book provides a fresh, scope-oriented approach to the Mahout world for beginners as well as advanced users. It is designed to make users aware of the different possible machine learning applications, strategies, and algorithms for building intelligent Big Data applications.
