External Libraries & Spark

If you need third-party (pip-installed) libraries on the cluster, it's possible, but it takes a little work. Your home directory is shared across all of the cluster nodes, but the Spark executors run as a different user than you do, so anything you install there has to be readable by that user.

Make sure everything is installed in your home directory. You can do that by installing the modules you need like this:

pip3 install --user --force-reinstall --ignore-installed pygpx
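To check where a --user install ended up and whether another user could get to it, here is a minimal sketch using only Python's standard library. It locates the per-user site-packages directory with site.getusersitepackages() and lists any directory between your home directory and it whose permissions would stop a different user. It assumes the executor user is neither you nor in your group, so only the "other" permission bits matter. unreachable_dirs is a hypothetical helper name, and it does not inspect the individual files inside site-packages.

```python
import site
import stat
from pathlib import Path


def unreachable_dirs(target: Path, home: Path) -> list[Path]:
    """Return directories from home down to target that 'other' users can't use.

    Intermediate directories only need the execute (search) bit so another
    user can pass through them; the target directory also needs the read bit
    so its contents can be listed. Files inside target are not checked.
    """
    target = target.resolve()
    home = home.resolve()
    if target != home and home not in target.parents:
        raise ValueError(f"{target} is not inside {home}")

    # Collect the chain target -> ... -> home.
    chain = [target]
    while chain[-1] != home:
        chain.append(chain[-1].parent)

    blocked = []
    for d in reversed(chain):  # walk from home down to target
        needed = stat.S_IXOTH
        if d == target:
            needed |= stat.S_IROTH
        if (d.stat().st_mode & needed) != needed:
            blocked.append(d)
    return blocked


if __name__ == "__main__":
    # Assumption: packages were installed with `pip3 install --user`, so they
    # live in the per-user site-packages directory under your home directory.
    user_site = Path(site.getusersitepackages())
    if not user_site.exists():
        print(f"{user_site} does not exist; nothing installed with --user yet?")
    else:
        problems = unreachable_dirs(user_site, Path.home())
        for d in problems:
            print(f"not accessible to other users: {d}")
        if not problems:
            print(f"{user_site} is reachable by other users")
```

Run it as yourself on a cluster node: any directory it prints is one the executors can't get through. If PYTHONUSERBASE points somewhere outside your home directory, the helper raises ValueError rather than guessing which directories matter.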

Make sure your home directory, and the packages installed in it, are readable by the executor processes (which don't run as your user ID):