HDP Analyst: Data Science Training

<h3>About Training</h3><p>This course provides instruction in the processes and practice of data science, including machine learning and natural language processing. It covers tools and programming languages (Python, IPython, Mahout, Pig, NumPy, pandas, SciPy, and scikit-learn), the Natural Language Toolkit (NLTK), and Spark MLlib.</p><br /><h3>What You'll Learn</h3><p>At the completion of the course, students will be able to:</p><ul><li>Recognize use cases for data science</li><li>Describe the architecture of Hadoop and YARN</li><li>Describe the differences between supervised and unsupervised learning</li><li>List the six machine learning tasks</li><li>Use Mahout to run a machine learning algorithm on Hadoop</li><li>Describe the data science life cycle</li><li>Use Pig to transform and prepare data on Hadoop</li><li>Write a Python script</li><li>Use NumPy to analyze big data</li><li>Use the data structure classes in the pandas library</li><li>Write a Python script that invokes SciPy machine learning routines</li><li>Describe options for running Python code on a Hadoop cluster</li><li>Write a Pig User-Defined Function (UDF) in Python</li><li>Use Pig streaming on Hadoop with a Python script</li><li>Write a Python script that invokes scikit-learn</li><li>Use the k-nearest neighbor algorithm to predict values</li><li>Run a machine learning algorithm on a distributed data set</li><li>Describe use cases for Natural Language Processing (NLP)</li><li>Perform sentence segmentation on a large body of text</li><li>Perform part-of-speech tagging</li><li>Use the Natural Language Toolkit (NLTK)</li><li>Describe the components of a Spark application</li><li>Write a Spark application in Python</li><li>Run machine learning algorithms using Spark MLlib</li><li>Take data science into production</li></ul><br /><h3>Who Should Attend</h3><p>Architects, software developers, analysts, and data scientists who need to apply data science and machine learning on Hadoop.</p><br />
<h3>Outline</h3><ul><li>Setting Up a Development Environment</li><li>Using HDFS Commands</li><li>Using Mahout for Machine Learning</li><li>Getting Started with Pig</li><li>Exploring Data with Pig</li><li>Using the IPython Notebook</li><li>Data Analysis with Python</li><li>Interpolating Data Points</li><li>Defining a Pig UDF in Python</li><li>Streaming Python with Pig</li><li>K-Nearest Neighbor and K-Means Clustering</li><li>Using NLTK for Natural Language Processing</li><li>Classifying Text Using Naive Bayes</li><li>Spark Programming and Spark MLlib</li></ul><br />

Training Details

Training Time: 3 Days

Capacity: 12

Prerequisites: Students must have experience with at least one programming or scripting language, knowledge of statistics and/or mathematics, and a basic understanding of big data and Hadoop principles. Students new to Hadoop are encouraged to attend the HDP Overview: Apache Hadoop Essentials course.

Related Trainings

This course is designed for developers who want to create custom YARN applications for Apache Hadoop. It will include: the YARN architecture, YARN development steps, writing a YARN client and ApplicationMaster, and launching Containers. The course us ...

This course provides a technical overview of Apache Hadoop. It includes high-level information about concepts, architecture, operation, and uses of the Hortonworks Data Platform (HDP) and the Hadoop ecosystem. The course provides an optional primer f ...

Cloudera University’s one-day Python training course will teach you the key language concepts and programming techniques you need so that you can concentrate on the subjects covered in Cloudera’s developer courses without also having to learn a compl ...

Hadoop Data Platform capabilities evolve continuously through the power of open community innovation. Meanwhile, your business requirements are also changing at a fast pace, with more data applications and increased workloads being added to your clus ...

This training course is designed for developers who need to create applications to analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark. Topics include: Essential understanding of HDP a ...