«Spark for noobs», Paul Lysak

An introduction to Apache Spark for those who might have heard of it but never tried it, and for those who have mastered some basic tutorials but are still unsure how to use it in a real project. It covers basic Spark concepts, the execution model, an overview of approaches to running Spark applications with a special focus on Amazon EMR, and some not-so-complex tricks and performance considerations that are best taken into account from the beginning.