20775 Performing Data Engineering on Microsoft HD Insight

COURSE OVERVIEW

The main purpose of the course is to give students the ability to plan and implement big data workflows on HDInsight.

AUDIENCE AND PREREQUISITES

The primary audience for this course is data engineers, data architects, data scientists, and data developers who plan to implement big data engineering workflows on HDInsight.

In addition to their professional experience, students who attend this course should have:

Programming experience using R, and familiarity with common R packages.

Knowledge of common statistical methods and data analysis best practices.

Basic knowledge of the Microsoft Windows operating system and its core functionality.

Working knowledge of relational databases.

*The course cost listed does not include the cost of courseware or the exam. The course is subject to a minimum enrollment to run. If the minimum enrollment is not met, the course may run virtually as a Virtual Instructor-Led Training (VILT) class, or it may run as a 4-day class (Bootcamp Style). For more information, please contact learn@vtec.org or call 207-775-0244.

COURSE TOPICS:

Module 1: Getting Started with HDInsight

What is Big Data?

Introduction to Hadoop

Working with MapReduce Function

Introducing HDInsight

Lab: Working with HDInsight

Module 11: Implementing Streaming Solutions with Kafka and HBase

Building and Deploying a Kafka Cluster

Publishing, Consuming, and Processing Data Using the Kafka Cluster

Using HBase to Store and Query Data

Lab: Implementing Streaming Solutions with Kafka and HBase