Delivering this course:

Tzach is a system architect specializing in scaling systems and development teams, a Java and Scala developer, Functional Programming enthusiast, Big Data practitioner and StackOverflow reputation junkie. Tzach has spent the last 10+ years at Kenshoo, helping to scale the company's systems from the wild startup days to its current enterprise-grade set of solutions, serving as a developer, team lead, tech lead and Chief Architect.

Scalable data processing with Apache Spark

Scalable data processing with Apache Spark introduces you to the popular, open-source processing framework that has taken over the Big Data landscape. From basic concepts all the way to configuration and operations, you will learn how to model data processing algorithms using Spark's APIs, how to monitor, analyze and optimize Spark's performance, how to build and deploy Spark applications, and how to use Spark's various APIs (RDD, SQL, DataFrame and Dataset).

Objectives

This course teaches you how to:

Get started with writing Apache Spark applications

Set up a local Apache Spark environment

Implement data processing algorithms using Spark's core (RDD) API

Use Spark's Web UI to monitor and analyze Spark jobs

Optimize Spark jobs using caching and broadcasting

Use Accumulators to optimize cumulative aggregations

Understand Apache Spark's resiliency and error handling

Implement Spark jobs using the SQL, DataFrame and Dataset APIs

Use various data formats and data storage engines with Spark
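As a taste of what the objectives above cover, here is a minimal sketch of a Spark job in Scala that touches the core (RDD) API, caching, a broadcast variable and an accumulator. The object name and file paths are illustrative, not part of the course material:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical word-count job combining several techniques from the
// objectives: RDD API, caching, broadcasting and an accumulator.
object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("word-count-sketch")
      .master("local[*]")          // local environment, as set up in the course
      .getOrCreate()
    val sc = spark.sparkContext

    val stopWords = sc.broadcast(Set("a", "an", "the")) // ship read-only data once per executor
    val skipped   = sc.longAccumulator("skipped words") // cumulative aggregation across tasks

    val lines = sc.textFile("data/input.txt")           // illustrative input path
    val words = lines.flatMap(_.split("\\s+")).cache()  // reused RDD, so cache it

    val counts = words
      .filter { w =>
        val keep = !stopWords.value.contains(w.toLowerCase)
        if (!keep) skipped.add(1)
        keep
      }
      .map(w => (w, 1L))
      .reduceByKey(_ + _)

    counts.saveAsTextFile("data/output")                // illustrative output path
    println(s"Skipped ${skipped.value} stop words")
    spark.stop()
  }
}
```

The course develops each of these pieces in depth, including when caching pays off and how accumulator semantics interact with Spark's task retries.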

Intended Audience

This course is intended for individuals responsible for designing and implementing solutions with Apache Spark, including Solutions Architects, SysOps Administrators, Data Scientists and Data Engineers.

Prerequisites

We recommend that attendees of this course have the following prerequisites:

Proficiency in at least one of the following programming languages: Java 8 (including lambdas), Scala, Python