You can also run Spark programs using spark-submit without entering the interactive shell by supplying "submit" as the command to run the image. Arguments after "submit" are passed to spark-submit. For example, to run the same example as above, type:
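A minimal sketch, assuming the stock SparkPi example bundled with the Spark distribution inside the image (the class name and jar path below follow the standard Spark 1.5.1 binary layout and are an assumption, not taken from this document):

$ docker run --rm cloudsuite/spark submit --class org.apache.spark.examples.SparkPi /opt/spark-1.5.1/lib/spark-examples-1.5.1-hadoop2.6.0.jar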

Notice that the path to the jar is a path inside the container. You can pass jars in the host filesystem as arguments if you map the directory where they reside as a Docker volume.
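For instance, a host directory containing your application jar can be mounted into the container with -v and then referenced by its in-container path (the host path, mount point, class name, and jar name here are hypothetical):

$ docker run --rm -v /home/user/jars:/jars cloudsuite/spark submit --class com.example.MyApp /jars/myapp.jar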

Finally, you can also start an interactive Spark shell with:

$ docker run -it --rm cloudsuite/spark shell

Again, this is just a shortcut for starting a container and running /opt/spark-1.5.1/bin/spark-shell. Try running a simple parallelized count:

scala> sc.parallelize(1 to 1000).count()

Multi-Container

Usually, we want to run Spark with multiple workers to parallelize a job. In Docker it is typical to run a single process per container. Here we show how to start a number of workers and a single Spark master that acts as a coordinator (cluster manager), and then submit a job.

First, create a new Docker network; let's call it spark-net. If we are running on a single machine, a bridge network will do.

$ docker network create --driver bridge spark-net

We need to create this new network because Docker does not support automatic service discovery on the default bridge network.
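On a user-defined network, containers can reach each other by container name. As a sketch of what that buys us, a master started with a known name can be addressed by that name from a worker container (the "master" and "worker" image commands and the spark:// master URL below are assumptions about the cloudsuite/spark image, not confirmed by this section):

$ docker run -d --net spark-net --name spark-master cloudsuite/spark master
$ docker run -d --net spark-net --name spark-worker-01 cloudsuite/spark worker spark://spark-master:7077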

For a multi-node setup, where Docker containers run on multiple physical machines, all commands remain the same if Docker Swarm is used as the cluster manager. The only difference is that the new network needs to be an overlay network instead of a bridge network.
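Concretely, the network-creation step above changes only in its driver (same network name, overlay driver):

$ docker network create --driver overlay spark-net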