
Instructor

Sumit Pal

The instructor for this course has more than 22 years of experience in various roles at companies ranging from startups to enterprises. He has worked for Microsoft (SQL Server development team), Oracle (OLAP development team) and Verizon (as Director for Big Data Architecture). He currently consults for multiple clients, advising them on their data architectures and big data solutions, and does hands-on coding in Spark, Scala, Java and Python. He is the author of the recently published book SQL on Big Data. He has extensive experience building scalable systems across the stack, from the middle tier and data tier to visualization for analytics applications, using Big Data and NoSQL databases, and has deep expertise in database internals, data warehouses, dimensional modeling, Data Science with Java and Python, and SQL.

Duration: 10h 41m

Course Description

This Big Data online training gives you the background needed to start doing analyst work on Big Data. It covers Big Data basics, Hadoop basics, and tools such as Hive and Pig, which let you load large data sets onto Hadoop, run SQL-like queries over them with Hive, and do analysis and data wrangling with Pig.
This online Big Data training also teaches Machine Learning basics and Data Science using R, and briefly covers Mahout, a recommendation and clustering library for large data sets.
The course includes hands-on exercises with Hadoop, Hive, Pig and R, including examples of using R for Machine Learning and Data Science work.

What am I going to get from this course?

Students will get a good overview of the Big Data landscape and learn the basics of Big Data, Hadoop and HDFS.

Students will learn to use tools such as Hive and Pig, both in theory and hands-on.

Students will learn some R and SparkR (an R package that provides a frontend to the Apache Spark big data processing framework).

Students will learn about Mahout, and about Data Science and where it is used.

Students will learn the basics of some Data Science algorithms, such as Decision Trees, Naive Bayes and clustering algorithms, and do hands-on work with them.

Students will learn about tools and solutions for running R on Hadoop.

Students will learn how to use Hadoop virtual machines on their laptops.

Prerequisites and Target Audience

What will students need to know or do before starting this course?

An interest in data, some SQL knowledge, and general aptitude

Who should take this course? Who should not?

The course is open to anyone who wants to learn about Big Data tools and technologies, or who is interested in Data Science, its algorithms and where they are used.

It will be useful for Business Analysts and Managers alike, and for anyone interested in working with big data.

Curriculum

Module 1: Big Data Analytics Overview

01:02:40

Lecture 1
Introduction

08:14

Introduction to the course and contents

Lecture 2
How Big Data Affects Our Daily Life

22:39

Lecture 3
Big Data Analytics Overview

16:43

Discusses the state of practice in analytics and the disruption under way
How Big Data is displacing traditional analytics

Lecture 4
Big Data Analytics Across Verticals

15:04

Discusses the use of Big Data in different verticals and in the newly evolving fields of IoT and cybersecurity, and why Big Data is essential to them

Module 2: Big Data Analytics with Hadoop

01:40:06

Lecture 5
What is Hadoop?

18:49

Motivation for Hadoop and Distributed Data Processing, new Architectures and History of Hadoop

Lecture 6
Hadoop - Key Platform Components and Architecture

15:21

Here we cover how Hadoop evolved into what it is today, its main components, and the reason Hadoop exists

Lecture 7
Hadoop Cluster

23:33

This lecture covers the details of a Hadoop cluster and why data splitting and data compression are so essential for Hadoop

Lecture 8
HDFS and Map Reduce Architecture

15:58

This section covers in detail the two major components that make up Hadoop, HDFS and MapReduce, and their internals

Lecture 9
Hadoop Ecosystem

12:41

In this section we cover the Hadoop ecosystem, deployment architectures and the major Hadoop vendors, as well as when, where and how to use Hadoop deployments

Lecture 10
Installation Hands-on and Resources Download

13:44

Module 3: Hive

01:27:17

Lecture 11
Hive Overview

15:14

In this section we discuss how Hive fits into the overall Hadoop architecture, what Hive is, and what it is not

Lecture 12
Hive Architecture

15:55

In this section we discuss Hive architecture as well as basic Hive commands: how to create tables, data types, and support for complex data types

Lecture 13
How to connect Tableau to Hive

14:42

In this section we see a basic demo of how to set up Tableau to connect to a Hive installation on your laptop or VM

Lecture 14
Hive Tables, Partitions and Data Formats

15:19

In this section we discuss Hive Tables, Data Formats and how to do data partitioning in Hive for better performance and scalability

Lecture 15
Hive deeper details

12:36

In this section we cover more capabilities of Hive: functions and joins, other Hive queries, building UDFs (user-defined functions), and importing and exporting data

Lecture 16
Hive Hands On Video

13:31

Hands-on video showing how to work with Hive, with a walkthrough of a sample example

Module 4: Pig

56:50

Lecture 17
Pig Overview

14:16

In this section we discuss how Pig fits into the Hadoop ecosystem and give an introduction to Pig:
How Pig works
What Pig is
What Pig is not

Lecture 18
Pig Data Types and Operators

15:13

In this section we cover the data types, operators and commands available in Pig and their usage, with examples. This is the meat of Pig.

Lecture 19
Pig Hands On

07:00

In this video we show how to start using Pig, with some sample examples

Lecture 20
Deeper Into Pig - Some Advanced Things on Pig

20:21

More operators and advanced concepts in Pig

Module 5: Introduction to R

50:13

Lecture 21
What is R?

18:42

In this section we learn about the basics of the R Programming Language and the Data Exploration Capabilities of R

Lecture 22
Data Ingestion and Manipulation with R

13:01

In this section we learn the capabilities of R for doing basic Data Ingestion / Reading and Manipulation

Lecture 23
Data Visualization with R

18:30

Here we learn how to do some basic data visualization with R

Module 6: R with Big Data (Hadoop and Spark)

46:45

Lecture 24
R with Big Data - 1

15:11

Here we cover how R and Big Data Technologies have evolved and adapted for processing large data sets using R language constructs but with Map Reduce and Spark as the underlying engines to run R code

Lecture 25
R with Big Data - 2

20:01

Continuing the previous lecture, we cover how R and Big Data technologies have evolved and adapted to process large data sets using R language constructs, with MapReduce and Spark as the underlying engines that run the R code

Lecture 26
R with SparkR

11:33

See working examples of using R on Spark

Module 7: Fundamentals of Machine Learning

01:21:16

Lecture 27
Basics of Machine Learning

13:33

What Machine Learning and Data Science are, and where they are used

Lecture 28
Road to Data Science

13:37

In this section we discuss the skills and capabilities needed to become a data scientist
We discuss the life cycle of Data Science projects
Everyday uses of Data Science algorithms

Lecture 29
Basic Concepts and Terminology and their meaning

15:18

In this section we discuss basic Data Science concepts, such as bias and variance, and why understanding them is important before going further in this field

Lecture 30
Basic Concepts and Terminology and their meaning - Part 2

10:52

This lecture continues the previous one, discussing more of the fundamental concepts and terminology of Machine Learning and Data Science and how they help us build the right algorithms

Lecture 31
Classification and Regression

09:42

In this section we look at the basics of Classification and Regression Algorithms

Lecture 32
Naive Bayes and Decision Trees

18:14

In this section we look at two of the most commonly used algorithms in Data Science: Naive Bayes and Decision Trees

Module 8: Installation and Hands On Exercise

58:31

Lecture 33
Installation (RECAP)

13:44

In this section we install and set up the VM.
The zip file contains the following files:
- Install.txt: start here and follow the instructions (tested on Windows 7 and Windows 10 laptops)
- Vagrant_README.md: Install.txt refers you to this file; follow the steps it describes for installation and setup
- VagrantNotes.txt: explains how to copy files from your laptop to the VM
The following 2 files are used when setting up connectivity to Hive from Tableau:
- TableauConnectToHive.png
- HortonworksHiveODBC64.msi

Lecture 34
Hands On working session with Hadoop and HDFS

09:34

Try this only after the VM has been installed and is working, and you are comfortable working with it.
See the zip file: it contains some very basic Hadoop and HDFS commands to help you get used to Hadoop, along with a sample data set (a text file) to use with those commands.

Lecture 35
Hands on Working session and Exercises with Hive (RECAP)

13:31

This lecture contains the resources to work with Hive Examples.
The zip file has all the examples and code for you to try and play with Hive and learn the Commands and Querying capabilities of Hive
The HivePigData.zip file - has all the data you need to do the exercises

Lecture 36
Connecting Tableau to Hive (RECAP)

14:42

In this lecture we see a demo of how to connect Tableau to Hive (connectivity only; visualizing data in Tableau is not part of the course).
The zip file has the ODBC driver for connecting to Hive from Tableau and a screenshot of the setup; watch the video for this lecture to do the setup.
ODBC Driver: HortonworksHiveODBC64.msi
Tableau to Hive Connection Setup: TableauConnectToHive.png

Lecture 37
Hands on Exercise with Pig (RECAP)

07:00

In this lecture we do some sample exercises to learn Pig in more depth.
The zip file has the code and exercises to do with Pig.
The HivePigData.zip file has the data sets we will be using.
Watch the video for a sample walkthrough of working with Pig.

Resource 1
Code and Data Sets

This is not a lecture per se, but the R code and data sets for Module 10, which covers cluster analysis, decision trees, descriptive statistics and a little probability.

Resource 2
DataSets to Download for Module 5

Resource 3
DataSets to Download for Module 5 - SparkR

Module 9: Apache Mahout Introduction

27:01

Lecture 38
Mahout Basics

11:18

This lecture discusses the basics of Mahout: where it started, where it is going, and the capabilities and algorithms it has built in for large-scale data science.

Lecture 39
Mahout Demo for Recommendation

15:43

This section shows how to run Mahout's recommendation engine out of the box, plus one configurable example, developed by the instructor, that runs Mahout's recommendation engine.

Module 10: Data Analysis and Statistical Methods

01:10:20

Lecture 40
Cluster Analysis Part 1

10:13

This section walks through the different ways of doing Clustering of data using out of the box algorithms in R

Lecture 41
Cluster Analysis - Part 2

12:28

Continuing Part 1, this section walks through more ways of clustering data using out-of-the-box algorithms in R

Lecture 42
Statistical Method - Part 1

08:31

This section covers descriptive statistics for data analysis using R

Lecture 43
Statistical Method - Part 2

08:15

This section covers the basics of probability theory for data analysis using R

Lecture 44
Statistical Method - Part 3

07:26

This section covers inferential statistics for data analysis using R

Lecture 45
Decision Tree - Part 1

13:54

This section covers building decision trees in R, with sample examples and demos

Reviews

10 Reviews

Ben J

January, 2017

Excellent course with the right content in terms of coverage and the right amount of depth to get started and up and running. The trainer had done a lot of hard work building the right slides and the right content for this extensive subject area.

Martin R

January, 2017

Good bang for the buck - gets the trainee up to speed with the Big Data Analyst Skills in a short but comprehensive course. The course has a good balance of hands on and theoretical content

Sanjay M

February, 2017

One of the best courses on Big Data. I have been searching for something like this for a while. I have taken many other courses before, but the way Sumit takes you on the journey of Big Data is quite unique. He starts off with the big picture and explains each and every aspect of Hadoop with hands-on exercises. Highly recommended. A must for anyone who wants to learn Hadoop the right way.

Donald S

May, 2017

The course also helped me to learn big data processing framework and algorithms broadly. Particularly the use of Hadoop virtual machine was useful as I could easily do it on my laptop. Sure, the course can prepare us for Cloudera's business analyst certification.

Victor L

May, 2017

Before I enrolled for the course, I was not a confident big data analyst. However, the course has changed me a lot and made me more confident with a broad spectrum of big data technologies learning. I could gain excellent knowledge in the basics of big data, Hadoop, Pig, and Hive, apart from Machine learning and SQL. The exercises were so good that I gained practical knowledge in the work of data science. In addition, one can learn much more on the usage of big data in different verticals and newly evolving field of IOT and Cybersecurity.

Sue B

May, 2017

Overall it is a very good course even for the experienced ones as they can brush up their knowledge with the developments in big data.

Pavithra S

July, 2017

I was uncomfortable with big data analytics. After enrolling in this course of study, I was a changed person with more confidence studying big data methodologies.

Peter B

July, 2017

The tests were so valuable that I had sound wisdom in the practice of data technique. Notably, the Hadoop virtual machine was helpful as I could comfortably deal with it on my computer.

Nishad D

July, 2017

I picked up great understanding in the rudiments of Hadoop, Machine learning, Hive and Pig, besides big data and SQL. The curriculum also pushed me to get up to speed on big data processing scheme and algorithms broadly.

Thomas B

July, 2017

The program prepares you for Cloudera's business analyst certification. Additionally, you get a deeper understanding of the management of big data in several verticals and newly expanding area of IOT and security. All in all, it is a rather useful course for skilled people as they can brush up on their familiarity with improvements in big data.
