
Running Flask on Spark 2

@Tim Hughes · Apr 23, 2018 · 3 min read

The following is a quickstart for running Flask on Spark.

Most of the example tutorials I have found are for running a batch of Spark jobs
on a Spark cluster and returning a result. I was interested in long-running
tasks and seeing if I could build a web app that ran on Spark. I thought it
would be possible, but I didn’t think it would be this easy. Please note that this
same procedure will work for lots of Python scripts, and I am interested to see
what else I can load into Spark.

Prerequisites:

Java runtime

Python 3 (it will probably run in Python 2 with minor changes)

Apache Spark (instructions below)

If you haven’t installed Spark, grab the latest build from
https://spark.apache.org/downloads.html and untar it in a directory somewhere.
This doesn’t need to be anywhere special; I just used ~/Downloads/.
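For example, with the 2.3.0 / Hadoop 2.7 build used in the rest of this post:

cd ~/Downloads
tar -xzf spark-2.3.0-bin-hadoop2.7.tgz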

Change into the root of the extracted Spark directory:

cd ~/Downloads/spark-2.3.0-bin-hadoop2.7/

Create the following file, start_standalone.sh, to start Spark. This is optional but it helps me. Set
JAVA_HOME correctly for your system; the trick below works for me.
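
A minimal sketch, assuming the sbin/ scripts that ship with Spark and a
JAVA_HOME derived from whichever java is on the PATH:

#!/usr/bin/env bash
# start_standalone.sh: start a one-node Spark standalone cluster.

# Derive JAVA_HOME from the java binary on the PATH (the trick mentioned above).
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))

# Start the master (UI on http://localhost:8080/) and attach one worker to it.
./sbin/start-master.sh
./sbin/start-slave.sh spark://$(hostname):7077

Make it executable with chmod +x start_standalone.sh.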

You should also make sure you are running the same version of Python when you
start Spark as when you do spark-submit. The best way to do that is to quickly
create a virtualenv and activate it. Install Flask in there while we are at it:

python3 -m venv env
source env/bin/activate
pip install flask

Now we can start our cluster (of one):

./start_standalone.sh

You should now be able to access the Spark UI at http://localhost:8080/ and see
that it has 1 worker attached.

Next we write our Python Flask script. I have created two routes, one of which
calculates Pi using the example from the Spark source code
(examples/src/main/python/pi.py) with a few tweaks to remove the command-line
arguments and change them to GET parameters.
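
A minimal sketch of that script (the route names and the default partition
count here are my assumptions; the Pi calculation follows pi.py):

from operator import add
from random import random

from flask import Flask, request
from pyspark.sql import SparkSession

app = Flask(__name__)
spark = SparkSession.builder.appName("FlaskOnSpark").getOrCreate()

@app.route("/")
def index():
    return "Flask is running on Spark!"

@app.route("/pi")
def pi():
    # Read the partition count from a GET parameter instead of sys.argv.
    partitions = int(request.args.get("partitions", 2))
    n = 100000 * partitions

    def f(_):
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    count = spark.sparkContext.parallelize(range(1, n + 1), partitions).map(f).reduce(add)
    return "Pi is roughly %f\n" % (4.0 * count / n)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

Save it as, say, app.py, and hand it to the cluster with spark-submit:

./bin/spark-submit --master spark://$(hostname):7077 app.py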