Getting Started with Spark on OS X

Installation and first steps

September 12, 2015

With impressive performance results and intuitive support for streaming data,
Apache Spark is one of the hottest discussion topics across the big data community
and start-up lofts around the globe.
Whether Spark will end up replacing Hadoop or whether the two will continue to coexist
is up for debate.
But this much is certain: Spark is well worth a good look. From a developer’s
point of view it is especially tempting, because it comes with an invaluable practical
feature: interactive Python and Scala shells!

While Spark obviously thrives in large-scale cluster deployments,
possibly on top of Apache Mesos, a local installation is a cheap and easy
way to explore all key features of the Spark framework. Here’s how to get started
with Spark on OS X - it just takes a few minutes.

Installation

The best way to install the latest version of Apache Spark
on OS X and to keep it up to date is via Homebrew.

brew install apache-spark

The above command installs the latest version of Apache Spark
on your Mac. At the time of writing, this was version 1.5.
If you don’t have Java installed on your system, the installation
will abort and print instructions on how to install
the latest Oracle JDK.
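
If you would rather find out about a missing JDK before running the install, a quick check along these lines can help. This is only a sketch: java_status is a hypothetical helper name, and it assumes that java -version exits with a non-zero status when no usable JDK is present.

```shell
# java_status is a hypothetical helper for this sketch, not part of Homebrew or Spark.
# Assumption: `java -version` exits non-zero when no usable JDK is installed.
java_status() {
  if java -version >/dev/null 2>&1; then
    # java -version writes to stderr, so merge it into stdout before taking the first line
    echo "Java found: $(java -version 2>&1 | head -n 1)"
  else
    echo "No working Java found; install a JDK before installing Spark"
  fi
}

java_status
```

If the second message appears, install a JDK first and then run the brew command again.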

After the installation has completed, you’ll find your Spark
installation in /usr/local/Cellar/apache-spark/1.5.0. All relevant
paths were added automatically to your environment by Homebrew.
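
To confirm where Spark ended up and that its launchers are reachable from your shell, something like the following may be useful. It is a rough sketch, not an official verification step: check_spark_tools is a hypothetical helper name, and the version directory on your machine may differ from the 1.5.0 mentioned above.

```shell
# Ask Homebrew where the formula lives and which version is installed.
# Skipped quietly if brew itself is not on the PATH.
if command -v brew >/dev/null 2>&1; then
  brew --prefix apache-spark          # stable path that points at the current version
  brew list --versions apache-spark   # prints the installed version(s), if any
fi

# check_spark_tools is a hypothetical helper for this sketch.
# It reports, for each Spark launcher, whether the shell can find it.
check_spark_tools() {
  for tool in spark-shell pyspark spark-submit; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool -> $(command -v "$tool")"
    else
      echo "$tool not found on PATH"
    fi
  done
}

check_spark_tools
```

If any launcher is reported as missing, opening a new terminal window is a reasonable first thing to try.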

Next, you can change Spark’s log level to something a little less
verbose. First, copy the log4j template file: