This course is for novice programmers or business people who would like to understand the core tools used to wrangle and analyze big data. With no prior experience required, you will have the opportunity to walk through hands-on examples with Hadoop and Spark, two of the most common frameworks in the industry. You will be comfortable explaining the specific components and basic processes of the Hadoop architecture, software stack, and execution environment. In the assignments you will be guided through how data scientists apply important concepts and techniques, such as Map-Reduce, to solve fundamental problems in big data. You'll feel empowered to have conversations about big data and the data analysis process.

YN

Easy intro into big data architecture with minimal previous development requirements. Relevant to anyone who wants to grasp the basic concepts of how data flows, is processed, and is analyzed in Hadoop.

SW

Jan 19, 2017

★★★★★

This is a great introductory course for an entry-level Hadoop learner. I hope more content can be added to this course. This course overlaps with other big data courses offered by UCSD.

From the lesson

Introduction to Map/Reduce

This module will introduce Map/Reduce concepts and practice. You will learn about the big idea of Map/Reduce and you will learn how to design, implement, and execute tasks in the map/reduce framework. You will also learn the trade-offs in map/reduce and how that motivates other tools.

Taught by:

Natasha Balac, Ph.D.

Interdisciplinary Center for Data Science

Paul Rodriguez

Research Programmer

Andrea Zonca

HPC Applications Specialist

Transcript

Hi. Welcome to Introduction to Map/Reduce. My name is Paul Rodriguez. I work here at SDSC helping folks with different kinds of data analysis problems. In this module, you will learn the concepts behind the Map/Reduce framework and strategies for using Map/Reduce. You will also go through the details of some Map/Reduce examples, as well as their execution in Hadoop. In a previous module, you learned about the architecture of Hadoop, and in a previous course, you learned about the challenges of big data. So this module will start putting these things together.

In this first lecture, I want to set up the context and motivate the need for Map/Reduce. Let's recall what the problem is. Imagine you have a large amount of data, and let's suppose the data is growing. It's perhaps unstructured, and you somehow want to process this data. Well, having big data means you're going to need lots of hard drives, and maybe that data is already spread out among many hard drives. You can imagine that your company or your project collects Internet data. You need to process that data, but you don't want to worry about all the details of parallelizing and communicating between processes, and all the potentially messy details that that entails. In fact, this is like the problem that Google faced with Internet searching, and they developed an approach to solve it: bring computation to the data. Moreover, as we will see, they wanted to make it easy to develop code without worrying about all the messy details.

Before we go further, let's be clear on something. If you have a lot of data spread out over many disks, and if your data is transactional, meaning that maybe you have a lot of customer records, and those records get retrieved, updated, processed for billing, and so on, you might want to go with a traditional database scheme, where you have a database management system, you build indices, set schemas, and organize the data into tables.
But if you need to make a lot of sweeps through the data and perform some relatively simple processing, then it would be better to have a system that helps you apply functions to pieces of the data that are spread out, and then organize the output. That, in a nutshell, is the Map/Reduce framework. It is a layer of software that helps you bring computation to the data and organize the output. The next video will get into the framework in detail.
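The idea of applying a function to spread-out pieces of data and then organizing the output can be illustrated with the classic word-count example. Below is a minimal single-process sketch in Python that simulates the three phases (map, shuffle, reduce); the function names are illustrative, and this is not Hadoop itself, which would run the mappers and reducers in parallel across nodes:

```python
from collections import defaultdict

def map_phase(chunk):
    """Mapper: emit a (word, 1) pair for each word in one chunk of text."""
    for word in chunk.lower().split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does automatically between the map and reduce phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all counts for one word into a total."""
    return (key, sum(values))

# Simulate data "spread out" across several chunks (e.g. HDFS blocks).
chunks = ["the quick brown fox", "the lazy dog", "the fox"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Note that each mapper only ever sees its own chunk, and each reducer only ever sees the values for one key; that independence is what lets the real framework parallelize both phases without the programmer managing any communication.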