Even so, if you're puzzling through Spark's many complexities and capabilities, you may want to turn to books that offer a true guided tour of the material.

We've assembled a survey of the best of the books currently on the market — from introductions for novices to deep-dive explorations for veterans:

Learning Spark: Lightning-Fast Big Data Analysis, by Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia. This definitive guide comes from a core team of Spark insiders and is designed to get you up and running fast. Learn to express parallel jobs quickly and to set up everything from simple batch jobs to stream processing and machine learning.

Getting Started With Apache Spark, by Jim Scott. A friendly, free, online introduction for newcomers. Scott offers step-by-step instructions that take users from installation to core capabilities (RDDs, DataFrames, Spark SQL, Spark Streaming, and the machine learning library, MLlib). He ends with real-world production use cases.

Spark Technology Center

The Apache Software Foundation has no affiliation with and does not endorse or review the materials provided on this website, which is managed by IBM. Apache®, Apache Spark™, and Spark™ are trademarks of the Apache Software Foundation in the United States and/or other countries.