Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.

Apache Spark with Scala By Example

Advance your Spark skills and become more valuable, confident, and productive

4.1
(86 ratings)


Prior programming or scripting experience in at least one programming language is preferred, but not required.

If you are training for a new career or looking to advance your career

You are curious how and when the Apache Spark ecosystem might be beneficial for your operations or product development efforts

Description

Understanding how to manipulate, deploy and leverage Apache Spark is quickly becoming essential for data engineers, architects, and data scientists. So, it's time for you to stay ahead of the crowd by learning Spark with Scala from an industry veteran and nice guy.

This course is designed to give you the core principles needed to understand Apache Spark and build your confidence through hands-on experiences.

In this course, you’ll be guided through a wide range of core Apache Spark concepts using Scala source code examples; all of which are designed to give you fundamental, working knowledge. Each section carefully builds upon previous sections, so your learning is reinforced along every step of the way.

All of the source code is conveniently available for download, so you can run and modify it yourself.

Here are just a few of the concepts this course will teach you using more than 50 hands-on examples:

Learn the fundamentals and run examples of Spark's Resilient Distributed Datasets, Actions and Transformations through Scala

Let's show and describe the structure of this Apache Spark with Scala course from a high level.

What Apache Spark topics will be covered?

Why is it structured this way?

What are the course activities and resources?

After watching this video, you'll know how each section in this course builds upon the previous ones. So, as we progress through Spark Core and Spark SQL, you'll see how these beginning sections remain relevant when learning Spark Streaming and Spark MLlib.

Download, review and run the source code. Customize the source code and re-run. The way to build confidence is through doing.

Participate in the course discussion boards. Through discussion and collaboration, you'll have the opportunity to teach others and ask questions. This will strengthen your Spark with Scala skills.

A note for Windows users.

Where and how to download the course source code.

How to Succeed in this Course

01:46

Provides a link to download all the source code used in this Apache Spark with Scala course.

Course Source Code

00:04


Introducing the Apache Spark Fundamentals

3 Lectures
13:34

Before we jump into Spark with Scala examples, let's present a high-level overview of the key concepts you need to know. These fundamentals will be used throughout the rest of this Spark with Scala course.

We're going to be running many examples in this next section. I don't expect you to follow every detail. Rather, I just want you to experience loading external data and running some simple examples of Spark Transformations and Actions.

To begin the course, let's run some Spark code with Scala from the shell.

I don't expect you to follow all the details of this code. I just want to get us motivated to continue our Spark learning adventure.

In this example, we'll get a glimpse into Spark core concepts such as Resilient Distributed Datasets, Transformations, Actions and Spark drivers from a Scala perspective. Again, I'll fill in all the details of this Scala code in later lectures.

Let's run some Apache Spark code!

06:21

Before moving to more advanced examples, we need to ensure the Apache Spark fundamentals are understood. This quiz will ensure you're ready to proceed.

[Milestone] Quiz - Spark Core Fundamentals

3 questions


Preparing your Spark environment

4 Lectures
07:29

In this section of the Spark with Scala course, we'll set up and verify your Spark with Scala environment. With your own environment in place, you can choose to run the course examples and experiment with the Scala Spark API.

Walk through all the steps required to set up Apache Spark on your machine.

Download and Install Spark

03:22

We need sample data to run Scala examples in the Spark Console. This lecture will prepare the Apache Spark environment for loading data and confirm the Spark console is working.

[Milestone] Prepare Sample Data Source and Confirm Console

03:03

Reference links used in this section of the Spark with Scala course

Setup Resources

00:12


Deeper Dive into Spark Actions and Transformations

6 Lectures
24:46

There are two kinds of Spark functions: Transformations and Actions. Transformations transform an existing RDD into a new, different one. Actions are functions used against RDDs to produce a value.
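To make the distinction concrete, here is a small sketch of both kinds of functions, assuming a spark-shell session where `sc` is the usual SparkContext:

```scala
// Transformations are lazy: defining them does no work yet.
val nums    = sc.parallelize(1 to 10)       // an RDD[Int]
val doubled = nums.map(_ * 2)               // transformation: returns a new RDD
val evens   = doubled.filter(_ % 4 == 0)    // transformation: still nothing computed

// Actions trigger evaluation and return a value to the driver.
val total = evens.reduce(_ + _)             // action: 4 + 8 + 12 + 16 + 20 = 60
val first = evens.first()                   // action: 4
```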

In this section of the Apache Spark with Scala course, we'll go over a variety of Spark Transformation and Action functions.

This should build your confidence and understanding of how you can apply these functions to your use cases. It will also add to the foundation we build upon in your journey of learning Apache Spark with Scala.

We're going to break Apache Spark transformations into groups. In this video, we'll cover some common Spark transformations which produce RDDs. These include map, flatMap, filter, etc.

We're going to use a CSV dataset of baby names in New York. As we progress through transformations and actions in this Apache Spark with Scala course, we'll determine more and more results for this sample data set.

So, let's begin with some commonly used Spark transformations.
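As a hedged preview of what Part 1 covers, a sketch like the following shows map, flatMap and filter against the baby names file (the file name and column order are assumptions about the dataset):

```scala
// Assumes baby_names.csv sits in the working directory with a header row:
// Year,First Name,County,Sex,Count
val babyNames = sc.textFile("baby_names.csv")
val rows   = babyNames.filter(line => !line.startsWith("Year")) // filter: drop the header
val fields = rows.map(line => line.split(","))                  // map: one Array per row
val names  = fields.map(f => f(1))                              // map: just the first-name column
val tokens = babyNames.flatMap(line => line.split(","))         // flatMap: flatten every field into one RDD
```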

Transformations Part 1

07:48

In part 2 of Spark Transformations, we'll discover spark transformations used when we need to combine, compare and contrast elements in two RDDs. This is something we often have to do when working with datasets. Spark helps compare RDDs through transformation functions union, intersection, distinct, etc.
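A sketch of these set-style transformations, assuming a spark-shell session (result ordering from a cluster is not guaranteed):

```scala
val one = sc.parallelize(Seq("a", "b", "c", "c"))
val two = sc.parallelize(Seq("c", "d"))

one.union(two).collect()        // keeps duplicates: a, b, c, c, c, d
one.intersection(two).collect() // elements present in both: c
one.distinct().collect()        // duplicates removed: a, b, c
one.subtract(two).collect()     // in one but not in two: a, b
```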

Transformations Part 2

01:49

In part 3 of our focus on Spark Transformation functions, we're going to work with the "key" functions, including groupByKey, reduceByKey, aggregateByKey, and sortByKey.

All of these transformations work with key/value pair RDDs, so we will cover the creation of PairRDDs as well.

We'll continue to use the baby_names.csv file used in Part 1 and Part 2 of Spark Transformations.
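A small sketch of the key-based transformations on a hand-built pair RDD (the data here is illustrative, not the baby names file):

```scala
// A pair RDD of (county, count) -- illustrative values
val pairs = sc.parallelize(Seq(("KINGS", 10), ("KINGS", 5), ("QUEENS", 7)))

pairs.reduceByKey(_ + _).collect()            // totals per key: (KINGS,15), (QUEENS,7)
pairs.groupByKey().mapValues(_.sum).collect() // same totals, but shuffles every value first
pairs.sortByKey().collect()                   // ordered by key
pairs.aggregateByKey(0)(math.max, math.max).collect() // max per key: (KINGS,10), (QUEENS,7)
```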

Transformations Part 3

06:30

Test and confirm your knowledge of Spark Transformations.

[Milestone] Transformation Quiz

3 questions

Run and review common Spark actions. You have already seen many Spark action examples before this lecture, so we will go quickly to review.

Spark Actions produce values back to the Spark Driver program. Also, recall that Action functions called against an RDD cause a previously lazy RDD to be evaluated. So, in the real world when working with large datasets, we need to be careful when triggering RDDs to be evaluated through Spark actions.
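A sketch of commonly used actions, assuming a spark-shell session:

```scala
val rdd = sc.parallelize(1 to 5).map(_ * 10) // lazy: no job has run yet

rdd.count()       // 5 -- this action triggers evaluation
rdd.first()       // 10
rdd.take(2)       // Array(10, 20)
rdd.reduce(_ + _) // 150
rdd.collect()     // Array(10, 20, 30, 40, 50) -- pulls everything to the driver; be careful on large datasets
```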

This video shows commonly used Spark Actions.

Actions

06:13

Test and confirm knowledge of Spark Actions.

[Milestone] Actions Quiz

2 questions

Links to conveniently download the Spark source code examples presented in this section of the course. Also, links to the latest programming guides for Spark Transformations and Actions are included.

Transformations and Actions Source Code and Programming Guides

00:13


Utilizing Clusters with Apache Spark

7 Lectures
26:31

Clusters allow Spark to process huge volumes of data by distributing the workload across multiple nodes. This is also referred to as "running in parallel" or "horizontal scaling".

A cluster manager is required to run Spark on a cluster. Spark supports 3 types of cluster managers: Apache YARN, Apache Mesos, and an internal cluster manager distributed with Spark called Standalone.

Set up, compile and package a Scala Spark program using `sbt`. `sbt` is short for "simple build tool" and is most often used in Scala-based projects.

This is an easy example to ensure you're ready for the more advanced builds and cluster deploys later in this Apache Spark with Scala course.
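As a sketch of the kind of build definition involved, a minimal `build.sbt` for packaging a Spark driver program might look like this (the project name and version numbers are illustrative; match them to your own Spark and Scala versions):

```scala
// build.sbt -- minimal sketch; versions are illustrative
name := "spark-sample"

version := "1.0"

scalaVersion := "2.10.6"  // match the Scala version your Spark distribution was built with

// "provided" keeps Spark itself out of the packaged jar, since the cluster supplies it
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
```

Running `sbt package` then produces a jar under `target/` that can be handed to `spark-submit`.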

[Milestone] Deploy a Scala Program to a Cluster

06:34

Let's configure an Apache Spark cluster running on two instances of Amazon EC2.

Create an Amazon EC2 Based Cluster Part 1

05:54

Before the EC2 cluster is ready to use from a locally running shell, we need to open port 7077.

Create an Amazon EC2 Based Cluster Part 2

02:54

Review key takeaways from this section on Spark running in a cluster and deploying a Scala based Spark program to the cluster.

[Milestone] Cluster Section Recap

02:52

To reinforce the key takeaways from the Cluster section of the course

Cluster Section Quiz

4 questions

Convenient link to download all source code used in this section

Cluster Section Resources

00:12


Spark SQL

6 Lectures
31:42

Spark SQL background, key concepts and high-level examples of CSV, JSON and MySQL (JDBC) data sources. This lecture lays the groundwork for the next lectures in this course section. It provides overview examples and common patterns of Spark SQL from a Scala perspective.

Spark SQL uses a type of Resilient Distributed Dataset called DataFrames, which are composed of Row objects accompanied by a schema. The schema describes the data types of each column. A DataFrame may be considered similar to a table in a traditional relational database.

Methodology

We’re going to use the baby names dataset and the spark-csv package available from Spark Packages to make our lives easier. The spark-csv package is described as a “library for parsing and querying CSV data with Apache Spark, for Spark SQL and DataFrames.” This library is compatible with Spark 1.3 and above.
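A hedged sketch of what loading the CSV could look like in a Spark 1.x shell (the package version and column names are assumptions based on the dataset described above):

```scala
// Launched with the package on the classpath, e.g. (version illustrative):
//   spark-shell --packages com.databricks:spark-csv_2.10:1.4.0
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")      // first line holds column names
  .option("inferSchema", "true") // detect numeric columns such as Count
  .load("baby_names.csv")

df.registerTempTable("names")
sqlContext.sql("SELECT County, SUM(Count) AS total FROM names GROUP BY County").show()
```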

Spark SQL with CSV source

06:05

Let's load a JSON input source into Spark SQL’s SQLContext. This Spark SQL JSON with Scala portion of the course has two parts. The first part shows examples of JSON input sources with a specific structure. The second part warns you of something you might not expect when using Spark SQL with a JSON data source.

Methodology

We are going to use two JSON inputs. We’ll start with a simple, trivial example and then move to an analysis of a more realistic JSON example.
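A sketch of the trivial case (the file name and fields are illustrative); note that Spark SQL 1.x expects one JSON object per line, which is exactly the kind of surprise the second part of this lecture deals with:

```scala
// people.json -- one JSON object per line:
//   {"name":"Ada","age":36}
//   {"name":"Grace","age":45}
val people = sqlContext.read.json("people.json")
people.printSchema()               // the schema is inferred from the data
people.registerTempTable("people")
sqlContext.sql("SELECT name FROM people WHERE age > 40").show()
```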

Spark SQL with JSON source

08:56

Now that we have Spark SQL experience with CSV and JSON, connecting to and using a MySQL database will be easy. So, let’s cover how to use Spark SQL with Scala and a MySQL database input data source.

Overview

We’re going to load data into a database. Then, we’re going to fire up spark-shell with a command line argument to specify the JDBC driver needed to connect to the JDBC data source. We’ll make sure we can authenticate and then start running some queries.
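A hedged sketch of the JDBC read (the connector jar path, database name, table name and credentials are all illustrative, not the course's exact settings):

```scala
// Launched with the driver jar on the classpath, e.g. (path and version illustrative):
//   spark-shell --driver-class-path mysql-connector-java-5.1.38-bin.jar
val df = sqlContext.read.format("jdbc").options(Map(
  "url"     -> "jdbc:mysql://localhost:3306/sparkdb?user=spark&password=secret", // illustrative database and credentials
  "dbtable" -> "baby_names",
  "driver"  -> "com.mysql.jdbc.Driver"
)).load()

df.registerTempTable("baby_names")
sqlContext.sql("SELECT COUNT(*) FROM baby_names").show()
```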

Spark SQL with MySQL (JDBC) source

05:38

Earlier in the course, we performed a simple deploy to an Apache Spark Cluster. Let's build upon the simple example and deploy our Spark SQL code examples.

Deploying the Spark SQL examples introduces a new challenge. How do we deploy when our application uses 3rd party libraries such as CSV parsing and JDBC drivers?

[Milestone] Spark SQL Deploying to a Spark Cluster

07:43

Links to download Spark SQL code examples and videos on setting up MySQL

Spark Streaming

Present an overview of the lessons contained in this Spark Streaming section. For some of you, you may be able to skip the first two examples and move to a more complex Spark Streaming custom application.

Spark Streaming Overview

00:52

To ensure your environment is ready for more complex Spark Streaming examples, let's run through a trivial example. This is a word count example which streams from the netcat utility found on Linux and Mac. For Windows users, check https://nmap.org/ncat/, which may be used to run this example.
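A sketch of the word count example described above (host, port and batch interval are assumptions; start `nc -lk 9999` in another terminal first):

```scala
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Assumes spark-shell's `sc` as the SparkContext.
val ssc   = new StreamingContext(sc, Seconds(5))     // 5-second micro-batches
val lines = ssc.socketTextStream("localhost", 9999)  // stream of text lines from netcat

val counts = lines.flatMap(_.split(" "))
                  .map(word => (word, 1))
                  .reduceByKey(_ + _)
counts.print() // prints each batch's word counts

ssc.start()
ssc.awaitTermination()
```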

Spark Streaming Example Part 1

02:53

Let's continue to take one step at a time as we are learning Spark Streaming. In this example, we will build and deploy a spark streaming application to a Spark cluster.

Spark Streaming Example Part 2

07:07

This video demonstrates our custom Spark Streaming application and how you can configure Slack to stream your own channel content.

I think it's important to show you a running example of a Spark Streaming application.

Spark Streaming Application - Streaming from Slack

05:44

Spark Streaming example code review. Answers the questions: how do I write my own custom receiver, and how did the Slack Spark Streaming example work?

Spark Streaming Custom Example Code Review

10:11

Our Spark Streaming with Slack program contains 3rd party libraries. As we've seen previously in the course, we can use the sbt-assembly plugin to make "fat jars" for Spark Driver programs using 3rd party libraries.

But, what happens when things do not deploy according to plan?

In this video, we'll cover three advanced issues when deploying to a Spark Cluster and how to address them.

1) What happens if your Spark Driver program is compiled to Scala 2.11, but you are deploying to Spark compiled to Scala 2.10?

2) What happens if your 3rd party library conflicts with your Spark Cluster?

3) What to do if your Spark Cluster uses a jar which is older and incompatible with a jar needed by your driver program?
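As a hedged sketch of the kind of `sbt-assembly` configuration these issues lead to (the plugin version and shading rule are illustrative, not the course's exact settings):

```scala
// project/plugins.sbt -- version illustrative
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")

// build.sbt additions:
// mark Spark "provided" so the fat jar doesn't bundle a second, possibly conflicting copy
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"

// rename (shade) a conflicting 3rd party package inside the fat jar -- package name illustrative
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.apache.http.**" -> "shaded.http.@1").inAll
)
```

Shading renames a library's classes inside your jar so your driver's version and the cluster's older version can coexist.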

[Advanced] Spark Streaming Deploy to Cluster Introduction

01:24


[Milestone] Advanced Spark Deploy Troubleshooting and Tactics

02:54

A list of resources used in this Spark Streaming section of the Apache Spark with Scala course.

Spark Streaming Resources

00:14


Spark Machine Learning

5 Lectures
23:00

Machine Learning is an exciting and growing topic of interest these days. Let's start this section on Spark MLlib with a background on Machine Learning.

Afterwards, we'll have a foundation of machine learning concepts when we run demos and review source code in later videos in this Spark MLlib section.

A suggested list of free resources for machine learning and Spark MLlib.

Spark Machine Learning (MLlib) Resources

00:13


Conclusion and Suggested Next Steps

2 Lectures
01:52

Conclusion of version 2 of the Apache Spark with Scala course. We review the content of version 2 of this course, suggest next steps, and ask for your ideas for version 3 of the Apache Spark with Scala course.

Todd has an extensive and proven track record in software development leadership and building solutions for the world's largest brands and Silicon Valley startups.

His courses are taught using the same skills used in his consulting and mentoring projects. Todd believes the only way to gain confidence and become productive is to be hands-on through examples. Each new subject builds upon previous examples or presentations, so each step also re-emphasizes a prior topic.