Description

Big Data has been around for many years as the answer to the challenges posed by today's massive datasets. The initial technologies were disruptive compared to legacy stacks, but they are now showing their age; in particular, their poor usability is slowing their adoption in the broader market. Meanwhile, data science has come to be understood as the discipline underpinning sound data management and processing. This, however, brings new problems to the table, shifting the requirements from batch ETL toward recurrent or stream processing. Apache Spark has emerged with a new, disruptive model that lets businesses of all kinds work easily with distributed technologies and process their Big (or Fast) Data.

Hence, this course covers in depth the concepts underlying the Apache Spark project. Although its model is simpler than that of other technologies, it remains essential to grasp the ideas and features in Apache Spark that allow a business to unleash the power of its infrastructure and/or its data.

The course focuses on explanation through concrete, reproducible examples run interactively from the Spark Notebook. Not only will Spark Core be covered extensively, but also the streaming and machine learning layers that are part of the overall project. While Spark is an important piece of a modern architecture, it cannot cover the whole pipeline on its own; this seminar therefore also tackles the Spark ecosystem, including integration with the Apache projects Kafka, Cassandra, and Mesos.
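To give a flavour of such interactive examples, here is a minimal sketch of a classic Spark word count as it might appear in a notebook cell. It assumes a pre-defined `sparkContext`, as the Spark Notebook provides; the input strings are made up purely for illustration:

```scala
// Illustrative sketch only: a minimal Spark word count of the kind
// run interactively in a Spark Notebook cell, where a SparkContext
// named `sparkContext` is already available.
val lines = sparkContext.parallelize(Seq(
  "big data with apache spark",
  "fast data with spark streaming"
))

val counts = lines
  .flatMap(_.split("\\s+"))   // split each line into words
  .map(word => (word, 1))     // pair each word with a count of 1
  .reduceByKey(_ + _)         // sum the counts per word

counts.collect().foreach(println)
```

The same few lines run unchanged on a laptop or on a full cluster, which is exactly the usability point the course builds on.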