Data Science at Scale using Spark and Hadoop training is a 3-day instructor-led class in which you will learn how data scientists use data to solve problems and which tools and techniques they rely on. Through in-class simulations, participants apply data science methods to real-world challenges in different industries and prepare for data scientist roles in the field.

Upon completion of the Data Science at Scale using Spark and Hadoop Training course, attendees are encouraged to continue their study and register for the Cloudera Certified Professional: Data Scientist (CCP-DS) exam.

Customize It:

With onsite training, courses can be scheduled on a date that is convenient for you. Because they are held at your location, you don’t incur travel costs and students won’t be away from home. Onsite classes can also be tailored to meet your needs: you might shorten a 5-day class into a 3-day class, combine portions of several related courses into a single course, or have the instructor vary the emphasis of topics depending on your staff’s and site’s requirements.

Prerequisites:

Proficiency in a scripting language (Python is strongly preferred; Perl or Ruby is sufficient)
Basic knowledge of Apache Hadoop
Experience working in Linux environments

What You Will Learn:

How to identify potential business use cases where data science can provide impactful results
How to obtain, clean and combine disparate data sources to create a coherent picture for analysis
What statistical methods to leverage for data exploration that will provide critical insight into your data
Where and when to leverage Hadoop streaming and Apache Spark for data science pipelines
What machine learning technique to use for a particular data science project
How to implement and manage recommenders using Spark’s MLlib, and how to set up and evaluate data experiments
What pitfalls to avoid when deploying new analytics projects to production at scale

Course Content:

Module 1: Data Science Overview

What Is Data Science?
The Growing Need for Data Science
The Role of a Data Scientist