Deployment of Containerized Spark

The Hadoop distributed cache mechanism ensures that the base Spark and Hadoop
libraries, along with the related configuration, all of which are installed on the
gateway hosts, are distributed automatically to all the Spark hosts in the cluster.
YARN then automatically mounts these base libraries into the Docker containers in
which the Spark executors run.

In addition, any binaries and other files that the user explicitly includes at the
time of application submission (through --files, --jars, and similar options) are
also made available through the distributed cache.
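
As an illustration of such a submission, the following Python sketch assembles a
spark-submit command line that requests the Docker runtime and ships user files and
JARs. It is a minimal sketch under stated assumptions, not a definitive procedure:
the helper name build_spark_submit, the image name, and the file names are
hypothetical, and the environment variables YARN_CONTAINER_RUNTIME_TYPE and
YARN_CONTAINER_RUNTIME_DOCKER_IMAGE assume a cluster where the YARN Docker runtime
is already enabled.

```python
import shlex


def build_spark_submit(app, image, files=(), jars=()):
    """Return a spark-submit argument list for a Docker-on-YARN run.

    build_spark_submit is a hypothetical helper written for this example.
    """
    argv = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]

    # Ask YARN to launch the application master and the executors in
    # Docker containers based on the given image.
    for prefix in ("spark.yarn.appMasterEnv", "spark.executorEnv"):
        argv += ["--conf", f"{prefix}.YARN_CONTAINER_RUNTIME_TYPE=docker"]
        argv += ["--conf", f"{prefix}.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE={image}"]

    # Files and JARs named here are the ones the user includes explicitly
    # at submission time; both options take comma-separated lists.
    if files:
        argv += ["--files", ",".join(files)]
    if jars:
        argv += ["--jars", ",".join(jars)]

    argv.append(app)
    return argv


if __name__ == "__main__":
    # All names below are placeholders chosen for illustration.
    cmd = build_spark_submit(
        "my_app.py",
        image="registry.example.com/spark-runtime:latest",
        files=["lookup.csv", "app.conf"],
        jars=["extra-udfs.jar"],
    )
    print(shlex.join(cmd))
```

Running the script prints the assembled command without executing it, so the
options can be reviewed before they are used on a real cluster.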

The following diagram outlines how containerized Spark is deployed on YARN: