Next, I created a new Dockerfile based on sequenceiq/hadoop-docker, which uses CentOS as its Linux distribution. The Dockerfile updates packages, overwrites the yarn-site.xml file to bypass YARN's virtual memory limits, copies scripts to the Docker container, downloads and installs Scala, and sets a few environment variables. Contents:

I provided a simple script (install_scala.sh) to download and install Scala on the Docker container. NOTE: the sequenceiq/hadoop-docker container is built on Java 7, so I did not use the latest stable Scala (2.12), which requires Java 8.
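
For readers who want a starting point, here is a minimal sketch of what a script like install_scala.sh could look like. It is not the original script. The Scala version (2.11.8, a 2.11.x release that runs on Java 7), the download URL, the /usr/local install prefix, the `install` argument, and the availability of curl in the image are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of install_scala.sh; not the original script.
# Assumptions: Scala 2.11.8, the Lightbend download URL, /usr/local as the
# install prefix, and curl being available in the image.
set -eu

SCALA_VERSION="${SCALA_VERSION:-2.11.8}"  # assumed 2.11.x release (runs on Java 7)
INSTALL_DIR="${INSTALL_DIR:-/usr/local}"  # assumed install prefix

# Build the download URL for a given Scala version.
scala_url() {
  echo "https://downloads.lightbend.com/scala/$1/scala-$1.tgz"
}

# Download the tarball, unpack it under the prefix, and symlink a stable path.
# Arguments: $1 = Scala version, $2 = install prefix.
install_scala() {
  curl -fsSL "$(scala_url "$1")" -o "/tmp/scala-$1.tgz"
  tar -xzf "/tmp/scala-$1.tgz" -C "$2"
  ln -sfn "$2/scala-$1" "$2/scala"
  rm -f "/tmp/scala-$1.tgz"
}

# Install only when explicitly asked, so the functions can be inspected
# or reused without triggering a download.
if [ "${1:-}" = "install" ]; then
  install_scala "$SCALA_VERSION" "$INSTALL_DIR"
fi
```

With this layout, running `./install_scala.sh install` would unpack Scala to /usr/local/scala-2.11.8 and point /usr/local/scala at it, so an environment variable such as SCALA_HOME could reference a path that does not change between versions.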