Motivation

I want to use Spark with Jupyter and Python in a simple way, so that I can start getting results quickly. I am a data scientist, and I want to spend my time working with the data.

I am going to use Docker containers, because I do not want to create and configure a VM for Spark, Anaconda and so on. I also want to learn more about Docker, so I am going to follow some tutorials to create my cool work environment.

The -p 8888:8888 makes the container’s port 8888 accessible to the host (i.e., your local computer) on port 8888. This will allow us to connect to the Jupyter Notebook server since it listens on port 8888.

The -v /home/raf/Documents/spark-docker option maps our spark-docker folder to the container’s /home/raf/Documents/spark-docker working directory (i.e., the directory the Jupyter notebook will run from). This makes the notebooks we create accessible in our spark-docker folder on our local computer. It also lets us make additional files, such as data sources (e.g., CSV, Excel), accessible to our Jupyter notebooks.

The --name spark2 gives the container the name spark2, which allows us to refer to the container by name instead of ID in the future.

To stop the container: docker stop spark2

To delete it: docker rm spark2

The final part of the command, jupyter/pyspark-notebook, tells Docker we want to run the container from the jupyter/pyspark-notebook image.
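Putting the pieces described above together, the full command might look like the sketch below. The host folder, port, container name and image come from the text. The container-side path /home/jovyan/work is an assumption (it is the notebook folder of the default user in the docker-stacks images), and start_notebook is just a hypothetical helper name, so adjust both to your setup.

```shell
#!/bin/sh
# Host folder taken from the text above.
HOST_DIR="/home/raf/Documents/spark-docker"
# ASSUMPTION: notebook folder inside the container; change it if your image differs.
CONTAINER_DIR="/home/jovyan/work"

# Hypothetical helper: publish port 8888, mount the host folder,
# name the container spark2 and run it from the jupyter/pyspark-notebook image.
start_notebook() {
    docker run \
        -p 8888:8888 \
        -v "$HOST_DIR:$CONTAINER_DIR" \
        --name spark2 \
        jupyter/pyspark-notebook
}

# Uncomment to start the container (needs Docker installed and running):
# start_notebook
```

Keeping the two paths in variables makes it easy to point the same command at another project folder.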

References

The image with some documentation: https://github.com/jupyter/docker-stacks/tree/master/pyspark-notebook

Annex A: sudo usermod mystery

Apparently you should be careful about who is given access to particular capabilities. One way to control that in Ubuntu is by creating different groups.

usermod - modify a user account

-a, --append
Add the user to the supplementary group(s). Use only with the -G option.

-G, --groups GROUP1[,GROUP2,...[,GROUPN]]]
A list of supplementary groups which the user is also a member of. Each group is separated from the next by a comma, with no intervening whitespace. The groups are subject to the same restrictions as the group given with the -g option. If the user is currently a member of a group which is not listed, the user will be removed from the group. This behaviour can be changed via the -a option, which appends the user to the current supplementary group list.

Takeaway: sudo usermod -aG docker raf appends (-a) the user raf (my user) to the supplementary group docker (-G). sudo is needed because only an administrator (root) can make this kind of change.
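To see whether the usermod command is still needed, you can check the group membership first. The following is a minimal sketch, assuming a Linux system with the standard id, tr and grep tools; in_group is a hypothetical helper name, not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if user $1 belongs to group $2.
in_group() {
    # id -nG prints the user's group names separated by spaces;
    # put one name per line and look for an exact match.
    id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
}

if in_group "$(id -un)" docker; then
    echo "already in the docker group"
else
    echo "not in the docker group yet; run: sudo usermod -aG docker $(id -un)"
fi
```

Note that a new group membership is only picked up by new login sessions, so after running usermod you need to log out and back in before the check succeeds.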

Specific references

Annex B: Your containers

You can see which images you already have with docker images. If you do not have what you want yet, you can search with docker search key-word, which gives you a list of matching images. You can also search for the images on Google, where you may find more details, such as the version of the software and some useful information and advice.
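The check described above can be sketched as a small script. have_image and ensure_image are hypothetical helper names, and the docker pull step is my own addition as the natural follow-up once you have found the image you want.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an image with the given repository
# name is already stored locally (docker images -q prints only image IDs).
have_image() {
    [ -n "$(docker images -q "$1" 2>/dev/null)" ]
}

# Hypothetical helper: download the image only when it is missing.
ensure_image() {
    if have_image "$1"; then
        echo "$1 is already available locally"
    else
        docker pull "$1"
    fi
}

# Examples (need Docker installed and running):
# ensure_image jupyter/pyspark-notebook
# docker search --limit 5 pyspark
```

The --limit option of docker search keeps the result list short, which is handy when the key-word is as common as pyspark.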