Hadoop Training

Hadoop Training is delivered by subject matter experts who currently manage Hadoop landscapes in real-time environments. The course and its contents are developed by subject matter experts and solution architects who deliver solutions to Fortune 500 companies and top-notch MNCs around the globe.

“Demonstrate your expertise with the most sought after technical skills. Big data success requires professionals who can prove their mastery with the tools and techniques of the Hadoop stack”

Hadoop Training Syllabus at Forscher Education

Module 1: Apache Hadoop

Introduction to Big Data & Hadoop Fundamentals

Dimensions of Big data

Types of data generation

Apache ecosystem & its projects

Hadoop distributors

HDFS core concepts

Modes of Hadoop deployment

HDFS Flow architecture

MapReduce MRv1 vs. MRv2 (YARN) architecture

Types of Data compression techniques

Rack topology

HDFS utility commands

Minimum hardware requirements for a cluster & property file changes
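
To make the HDFS core concepts above more concrete, the following Python sketch estimates how many blocks a file occupies and how much raw cluster storage it consumes. It assumes the common defaults of a 128 MB block size and a replication factor of 3; the function name and the sample file size are illustrative only and are not part of any Hadoop API.

```python
import math

BLOCK_SIZE_MB = 128   # assumed dfs.blocksize (common default in Hadoop 2 and later)
REPLICATION = 3       # assumed dfs.replication (common default)

def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (number of blocks, raw storage in MB) for a single file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The last block only uses as much disk as the data it holds, so raw
    # storage is file size times replication, not blocks times block size.
    raw_mb = file_size_mb * replication
    return blocks, raw_mb

blocks, raw_mb = hdfs_footprint(1000)
print(blocks, raw_mb)
```

Changing the block size or replication arguments shows how these two properties trade storage overhead against parallelism and fault tolerance.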

Module 2 : MapReduce Framework

Goal : In this module, you will understand the Hadoop MapReduce framework and how MapReduce works on data stored in HDFS. You will learn concepts such as input splits, the Combiner and the Partitioner, and see demos of MapReduce on different data sets.

Objectives – Upon completing this Module, you should be able to understand that MapReduce processes jobs using the batch processing technique.

MapReduce jobs can be written in Java.

Hadoop ships with a hadoop-examples jar file, which administrators and programmers normally use to test MapReduce applications.
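
The roles of input splits, the Combiner and the Partitioner can be illustrated with a small word count simulated in plain Python. This is only a sketch of the data flow, not Hadoop code: the function names are hypothetical, each inner list stands in for one input split, and CRC32 is used as a stand-in for the hash a real partitioner would apply.

```python
from collections import defaultdict
import zlib

def mapper(line):
    """Map phase: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def combiner(pairs):
    """Combiner: pre-aggregate counts on the map side to reduce shuffle traffic."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def partitioner(key, num_reducers):
    """Partitioner: choose a reducer for a key (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer(word, counts):
    """Reduce phase: sum all partial counts for one word."""
    return word, sum(counts)

def run_job(splits, num_reducers=2):
    # One {key: [values]} bucket per reducer, filled during the shuffle.
    shuffled = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:  # each split is processed by one map task
        mapped = [pair for line in split for pair in mapper(line)]
        for word, count in combiner(mapped):
            shuffled[partitioner(word, num_reducers)][word].append(count)
    result = {}
    for bucket in shuffled:
        for word in sorted(bucket):  # keys reach a reducer in sorted order
            key, total = reducer(word, bucket[word])
            result[key] = total
    return result

splits = [["big data big"], ["data hadoop"]]
counts = run_job(splits)
print(counts)
```

In the first split the combiner collapses the two occurrences of "big" into a single pair before the shuffle, which is the saving a Combiner is meant to provide.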

Module 3 : Apache Hive

Goal : This module will help you understand Hive concepts, Hive data types, loading and querying data in Hive, running Hive scripts, and Hive UDFs.

Objectives – Upon completing this Module, you should be able to understand that Hive is a data warehouse system that projects a structured, table-like format onto data stored in Hadoop so that it can be managed and queried.

The various components of Hive architecture are metastore, driver, execution engine, and so on.

Metastore is a component that stores the system catalog and metadata about tables, columns, partitions, and so on.

Hive installation starts with locating the latest version of the tar file and downloading it to an Ubuntu system using the wget command.

While programming in Hive, use the show tables command to list the tables in the current database.

Topics:

Introduction to Hive & features

Hive architecture flow

Types of Hive tables

DML/DDL commands explanation

Partitioning logic

Bucketing logic

Hive script execution in shell & HUE
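
The partitioning and bucketing logic listed above can be sketched in Python. Partitioning places rows in one directory per partition value, while bucketing hashes a column and takes the result modulo the number of buckets. The table, column names and rows below are made up for illustration, and CRC32 is only a stand-in for Hive's own hash function.

```python
import zlib

def partition_dir(table, partition_col, value):
    """Partitioning: each distinct partition value gets its own directory."""
    return f"{table}/{partition_col}={value}"

def bucket_index(key, num_buckets):
    """Bucketing: hash the bucketing column, then take it modulo the bucket count."""
    # CRC32 is a stand-in here; Hive applies its own hash function.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets

def layout(table, rows, partition_col, bucket_col, num_buckets):
    """Group rows by (partition directory, bucket number)."""
    files = {}
    for row in rows:
        directory = partition_dir(table, partition_col, row[partition_col])
        bucket = bucket_index(row[bucket_col], num_buckets)
        files.setdefault((directory, bucket), []).append(row)
    return files

# Hypothetical sample rows for a table named "sales".
rows = [
    {"order_id": 1, "country": "IN", "amount": 40},
    {"order_id": 2, "country": "MY", "amount": 15},
    {"order_id": 3, "country": "IN", "amount": 22},
]
files = layout("sales", rows, "country", "order_id", num_buckets=4)
for (directory, bucket), members in sorted(files.items()):
    print(directory, bucket, [r["order_id"] for r in members])
```

A query that filters on the partition column only needs to read the matching directory, which is the main benefit of partitioning.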

Module 4 : Apache Pig

Goal : In this module, you will learn about Pig, the types of use cases where Pig fits, the tight coupling between Pig and MapReduce, Pig Latin scripting, Pig running modes, Pig UDFs, Pig Streaming, and testing Pig scripts. The module includes a demo on a healthcare dataset.

Objectives – Upon completing this Module, you should be able to understand that Pig is a high-level data flow scripting platform with two major components: the runtime engine and the Pig Latin language.

Pig runs in two execution modes: Local mode and MapReduce mode. Pig scripts can be run in two ways: Interactive mode and Batch mode.

The Pig engine can be installed by downloading a release from one of the mirrors linked on the website: pig.apache.org.

Topics:

Introduction to Pig concepts

Pig modes of execution/storage concepts

Pig program logics explanation

Pig basic commands

Pig script execution in shell/HUE
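
A typical Pig data flow (load, filter, group, aggregate) can be mimicked in Python to show what each step does to the data. The sketch below is not Pig itself: the helper functions are hypothetical, the tiny healthcare-style dataset is invented, and the Pig Latin statements in the comments are only an approximate equivalent of each step.

```python
from collections import defaultdict

# Invented sample data: patient id, department, treatment cost.
RAW = """1,cardiology,250.0
2,radiology,80.0
3,cardiology,120.0
4,radiology,300.0"""

def load(text):
    # Roughly: records = LOAD 'patients.csv' USING PigStorage(',')
    #                    AS (id:int, dept:chararray, cost:double);
    rows = []
    for line in text.splitlines():
        pid, dept, cost = line.split(",")
        rows.append((int(pid), dept, float(cost)))
    return rows

def filter_by(rows, predicate):
    # Roughly: costly = FILTER records BY cost > 100.0;
    return [row for row in rows if predicate(row)]

def group_by(rows, key):
    # Roughly: by_dept = GROUP costly BY dept;
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    return groups

def foreach_sum(groups, field):
    # Roughly: totals = FOREACH by_dept GENERATE group, SUM(costly.cost);
    return {name: sum(field(row) for row in rows) for name, rows in groups.items()}

records = load(RAW)
costly = filter_by(records, lambda row: row[2] > 100.0)
by_dept = group_by(costly, lambda row: row[1])
totals = foreach_sum(by_dept, lambda row: row[2])
print(totals)
```

Each Python variable corresponds to one relation in the data flow, which mirrors how a Pig Latin script names the result of every step.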

Module 5 : Apache HBase

Goal : This module will cover advanced HBase concepts. We will see demos on bulk loading and filters. You will also learn what ZooKeeper is all about, how it helps in monitoring a cluster, and why HBase uses ZooKeeper.

Objectives – Upon completing this Module, you should be able to understand that HBase has two types of nodes: Master and RegionServer. Only one Master node is active at a time, but multiple RegionServers can run at the same time.

The data model of HBase comprises tables whose rows are sorted by row key. The column families should be defined at the time of table creation.

There are eight steps that should be followed for installation of HBase.

Some of the commands related to the HBase shell are create, drop, list, count, get, and scan.
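
The HBase data model described above (rows kept sorted by row key, column families fixed at table creation) can be imitated with a small in-memory class. This is a toy sketch for intuition only: the class and its methods are hypothetical and merely borrow the names of the shell commands, and it ignores regions, versions and timestamps.

```python
import bisect

class MiniTable:
    """Toy table: rows sorted by row key, column families fixed at creation."""

    def __init__(self, name, column_families):
        self.name = name
        self.families = set(column_families)
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row_key, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_row="", stop_row=None):
        """Yield rows in key order from start_row up to (not including) stop_row."""
        index = bisect.bisect_left(self._keys, start_row)
        for key in self._keys[index:]:
            if stop_row is not None and key >= stop_row:
                break
            yield key, dict(self._rows[key])

    def count(self):
        return len(self._keys)

table = MiniTable("users", ["info"])
table.put("row3", "info:name", "Chitra")
table.put("row1", "info:name", "Arun")
table.put("row2", "info:name", "Bala")
print([key for key, _ in table.scan()])
print(table.get("row1"), table.count())
```

Because rows are stored in key order, a scan over a key range is a contiguous read, which is why row key design matters so much in HBase.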

Topics:

Introduction to HBase concepts

Introduction to NoSQL/CAP theorem concepts

HBase design/architecture flow

HBase table commands

Hive + HBase integration module/jars deployment

HBase execution in shell/HUE

Module 6 : Apache Sqoop

Goal : Sqoop is an Apache Hadoop ecosystem project that imports data from and exports data to relational databases. Some reasons to use Sqoop are as follows:

SQL servers are deployed worldwide

Nightly processing is done on SQL servers

It allows a selected part of the data to be moved from a traditional SQL database to Hadoop

Transferring data using scripts is inefficient and time-consuming

To handle large data sets through the Hadoop ecosystem

To bring processed data from Hadoop to the applications

Objectives – Upon completing this Module, you should be able to understand that Sqoop is a tool designed to transfer data between Hadoop and relational databases (RDBs) such as MySQL, MS SQL Server, and PostgreSQL.

Sqoop can import data from an RDB, such as MySQL or Oracle, into HDFS.

Topics:

Introduction to Sqoop concepts

Sqoop internal design/architecture

Sqoop Import statements concepts

Sqoop Export Statements concepts

Quest Data connectors flow

Incremental updating concepts

Creating a database in MySQL for importing to HDFS

Sqoop commands execution in shell/HUE
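
The idea behind incremental imports can be shown without a cluster by using Python's built-in sqlite3 module as a stand-in for the source database. The sketch mirrors what Sqoop's `--incremental append` mode does with `--check-column` and `--last-value`: only rows whose check column exceeds the last imported value are fetched. The table, its columns and the helper function are assumptions made for this example.

```python
import sqlite3

def incremental_import(conn, table, check_column, last_value):
    """Fetch only rows whose check column exceeds the last imported value."""
    # table and check_column are trusted identifiers in this sketch.
    cursor = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    rows = cursor.fetchall()
    columns = [description[0] for description in cursor.description]
    position = columns.index(check_column)
    # Remember the highest value seen so the next run starts after it.
    new_last = max((row[position] for row in rows), default=last_value)
    return rows, new_last

conn = sqlite3.connect(":memory:")  # stands in for the source RDBMS
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
conn.executemany("INSERT INTO orders (item) VALUES (?)", [("disk",), ("cpu",), ("ram",)])

first_batch, last_value = incremental_import(conn, "orders", "id", 0)
conn.execute("INSERT INTO orders (item) VALUES (?)", ("nic",))
second_batch, last_value = incremental_import(conn, "orders", "id", last_value)
print(len(first_batch), second_batch, last_value)
```

The second run returns only the newly inserted row, so the target in Hadoop grows without re-reading data that was already imported.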

Module 7 : Apache Flume

Goal : Apache Flume is a distributed data collection service that takes streams of data from their sources and aggregates them where they need to be processed.

Objectives – Upon completing this Module, you should be able to understand that Apache Flume is a distributed data collection service that moves data from its source and aggregates it at a sink.

Flume provides a reliable and scalable agent model to ingest data into HDFS.

Topics:

Introduction to Flume & features

Flume topology & core concepts

Property file parameters logic
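
The core Flume topology (a source feeds a channel, and a sink drains the channel) can be sketched in Python. This is a simplified model rather than Flume code: the class and function names are hypothetical, the capacity and batch size play the role of values that would normally come from the agent's property file, and a plain list stands in for HDFS.

```python
from collections import deque

class MemoryChannel:
    """Bounded in-memory buffer between a source and a sink."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._events = deque()

    def put(self, event):
        if len(self._events) >= self.capacity:
            return False  # channel full: the source must retry or drop the event
        self._events.append(event)
        return True

    def take(self, batch_size):
        batch = []
        while self._events and len(batch) < batch_size:
            batch.append(self._events.popleft())
        return batch

def run_source(lines, channel):
    """Source: wrap each incoming line as an event and push it to the channel."""
    accepted = 0
    for line in lines:
        if channel.put({"body": line}):
            accepted += 1
    return accepted

def run_sink(channel, store, batch_size=2):
    """Sink: drain the channel in batches into the store (a stand-in for HDFS)."""
    while True:
        batch = channel.take(batch_size)
        if not batch:
            break
        store.extend(event["body"] for event in batch)

channel = MemoryChannel(capacity=3)
accepted = run_source(["log1", "log2", "log3", "log4"], channel)
store = []
run_sink(channel, store)
print(accepted, store)
```

With a capacity of 3, the fourth event is rejected, which shows why channel capacity is one of the parameters worth tuning in the property file.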

Module 8 : Apache HUE

Goal : Hue is a web front end to Apache Hadoop, included in the Cloudera VM.

Objectives – Upon completing this Module, you should be able to understand how to use Hue for Hive, Pig, and Oozie.

Topics:

Introduction to Hue design

Hue architecture flow/user interface

Module 9 : Apache ZooKeeper

Goal : Following are the goals of ZooKeeper:

Serialization applies updates in a defined order, which avoids delays in read and write operations.

Reliability means that once an update is applied by a user, it persists in the cluster.

Atomicity does not allow partial results. Any user update either succeeds or fails.

A simple application programming interface (API) provides an interface for development and implementation.

Objectives – Upon completing this Module, you should be able to understand that ZooKeeper provides a simple and high-performance kernel for building more complex clients.

ZooKeeper has three basic entities—Leader, Follower, and Observer.

A watch is a one-time trigger that notifies a client when the data it is watching changes.

Topics:

Introduction to ZooKeeper concepts

ZooKeeper principles & usage in Hadoop framework

Basics of ZooKeeper
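
Two ZooKeeper ideas from this module, atomic updates and watches, can be illustrated with a toy in-memory store. In ZooKeeper a watch is a one-time trigger: it fires on the next change and must be set again to receive later notifications. The class below is a hypothetical sketch and is not the ZooKeeper client API; it has no leader, followers or network.

```python
class MiniZk:
    """Toy znode store with atomic updates and one-time watches."""

    def __init__(self):
        self._data = {}
        self._watches = {}  # path -> callbacks waiting for the next change

    def create(self, path, value):
        if path in self._data:
            raise KeyError(f"node exists: {path}")
        self._data[path] = value

    def get(self, path, watch=None):
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data[path]

    def set(self, path, value):
        if path not in self._data:
            # Atomicity: the update fails as a whole and nothing is changed.
            raise KeyError(f"no such node: {path}")
        self._data[path] = value
        # Fire each registered watch once, then forget it.
        for callback in self._watches.pop(path, []):
            callback(path)

events = []
zk = MiniZk()
zk.create("/config", "v1")
zk.get("/config", watch=events.append)  # register a watch while reading
zk.set("/config", "v2")                 # triggers the watch
zk.set("/config", "v3")                 # no watch left, so nothing fires
print(events, zk.get("/config"))
```

Only the first update produces a notification, so a client that wants continuous updates has to register a new watch each time it is notified.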

Module 10 : Administration concepts

Goal:

Explain different configurations of the Hadoop cluster

Identify different parameters for performance monitoring and performance tuning

Explain configuration of security parameters in Hadoop.

Objectives – Upon completing this Module, you should be able to understand that Hadoop can be optimized based on the infrastructure and available resources.

Hadoop is open-source software, and the support available for complicated optimization is limited.

Optimization is performed through XML configuration files.

Logs are the best medium through which an administrator can understand a problem and troubleshoot it accordingly.

Hadoop relies on a Kerberos-based security mechanism.

Topics:

Principles of Hadoop administration & its importance

Hadoop admin commands explanation

Balancer concepts

Rolling upgrade mechanism explanation
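
The balancer concept can be sketched with a short Python function. The HDFS balancer treats a DataNode as balanced when its disk utilization is within a threshold (10 percent by default) of the overall cluster utilization. The node names and capacities below are invented, and the function only classifies nodes; it does not model how blocks are actually moved.

```python
def classify_nodes(nodes, threshold_pct=10.0):
    """Label each DataNode as over-utilized, under-utilized or balanced.

    nodes maps a node name to (used, capacity) in the same unit.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    report = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold_pct:
            report[name] = "over"
        elif node_pct < cluster_pct - threshold_pct:
            report[name] = "under"
        else:
            report[name] = "balanced"
    return cluster_pct, report

# Hypothetical cluster: (used GB, capacity GB) per DataNode.
nodes = {"dn1": (900, 1000), "dn2": (500, 1000), "dn3": (100, 1000)}
cluster_pct, report = classify_nodes(nodes)
print(cluster_pct, report)
```

A smaller threshold gives a more evenly filled cluster but means more data has to be moved between nodes.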

Contact Us

Forscher Education is a first-of-a-kind training, research and development center with offices in Chennai, India and Cyberjaya, Malaysia. Forscher takes immense pride in imparting the right knowledge in the areas of virtualization, storage, server systems, cloud, networking-based technologies and applications.