Cognitive Class

Accessing Hadoop Data Using Hive

Writing MapReduce programs to analyze Big Data can get complex. In this Accessing Hadoop Data Using Hive course, you will get a solid foundation in using Apache Hive, a tool that can make querying your data much easier. You will learn how to query, summarize, and analyze large data sets stored in Hadoop-compatible file systems.

About This Course

In this Apache Hive course, you'll learn how to make querying your data much easier. First created at Facebook, Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad-hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems.

Learn how to write MapReduce programs to analyze your Big Data

Learn Hive QL, a language that provides a mechanism to project structure onto this data and query it.

Learn how to use Hive for Data Warehousing tasks on your Big Data projects.

Course Syllabus

Module 1 - Introduction to Hive

Describe what Hive is, what it’s used for and how it compares to other similar technologies

Describe the Hive architecture

Describe the main components of Hive

List interesting ways others are using Hive

Module 2 - Hive DDL

Create databases and tables in Hive using a variety of data types

Run a variety of DDL commands

Use Partitioning to improve performance of Hive queries

Create Managed and External tables in Hive

Module 3 - Hive DML

Load data into Hive

Export data out of Hive

Run a variety of Hive QL DML queries

Module 4 - Hive Operators and Functions

Use a variety of Hive Operators in your queries

Utilize Hive’s Built-in Functions

Explain ways to extend Hive functionality

General information

This Hive course is free.

It is self-paced.

It can be taken at any time.

It can be audited as many times as you wish.

Labs can be performed on the Cloud or on a 64-bit system. If you use a 64-bit system, you can either install the required software (Linux only) or use the supplied VMware image. More details are provided in the section "Labs setup".

Recommended skills prior to taking this course

Requirements

Course Staff

Aaron Ritchie

Aaron Ritchie has worked in the Information Management division of IBM for over 8 years and has held a variety of roles within the Center of Excellence and Education groups. Aaron has worked as an IT Specialist, Learning Developer, and Project Manager. He is certified in multiple IBM products and enjoys working with an assortment of open-source technologies. Aaron holds a Bachelor of Science in Computer Science degree from Clarkson University and a Master of Science in Information Technology degree from WPI.

Daniel Tran

Daniel Tran is an IBM Co-op Student working as a Technical Curriculum Developer in Toronto, Ontario. He develops courses to improve the education of customers who seek knowledge in the Big Data field. He has also reworked previously developed courses, updating them to be compatible with the newest software releases, and has worked at the forefront of recreating courses on a newly developed cloud environment. He has worked with various components that deal with Big Data, including Hadoop, Pig, Hive, HBase, MapReduce & YARN, Sqoop, Oozie, and Phoenix. He has also worked on separate courses involving Machine Learning. Daniel is from the University of Alberta, where he has completed his third year of traditional Computer Engineering Co-op.