Re: Submit spark job from outside cluster

Hi, if you are using a cluster deployed by Cloudera Manager with parcels, add the new host to the list of hosts and then deploy the YARN and Spark Gateway roles on that node. This triggers CM to distribute the parcels to the edge node and "activate" them.

After that, the following commands should be on your PATH: spark-submit and spark-shell (or spark2-submit and spark2-shell if you deployed SPARK2_ON_YARN).
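A quick way to confirm the gateway deployment worked is to look the commands up on PATH. This is only a rough sketch; the function name find_spark_clients is made up for illustration, and it only checks that the executables resolve, not that they are configured correctly.

```python
import shutil


def find_spark_clients(names=("spark-submit", "spark-shell",
                              "spark2-submit", "spark2-shell")):
    """Map each command name to its resolved path, or None if it is not on PATH."""
    # shutil.which performs the same lookup the shell would do via PATH.
    return {name: shutil.which(name) for name in names}
```

Calling find_spark_clients() on the edge node should return a path for at least one of the submit/shell pairs; a None for all four suggests the parcels were not activated on that host.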

If you are using Kerberos, make sure the Kerberos client libraries are installed and that you have a valid krb5.conf file. Also make sure you have a valid ticket in your credential cache.
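To check for a valid ticket from a script, something like the following might do. It assumes the MIT Kerberos klist, whose -s flag runs silently and exits with status 0 only when the credential cache is readable and not expired; other Kerberos implementations may behave differently. The function name has_valid_ticket is hypothetical.

```python
import subprocess


def has_valid_ticket(klist_cmd="klist"):
    """Return True if `klist -s` exits 0 (cache readable and ticket not expired)."""
    try:
        # Assumption: MIT Kerberos klist, where -s means "silent, report via exit status".
        return subprocess.run([klist_cmd, "-s"]).returncode == 0
    except OSError:
        # klist is not installed or cannot be executed, so there is no usable ticket.
        return False
```

If this returns False, run kinit (with a password or a keytab) before trying to submit.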

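If you are submitting to a kerberized cluster from a Docker container, the easiest way is to mount a keytab file and /etc/krb5.conf into the container. Then set the principal and keytab using spark.yarn.principal and spark.yarn.keytab, respectively (on Spark 3.x these properties are named spark.kerberos.principal and spark.kerberos.keytab).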
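As a sketch of how the principal and keytab settings end up on the command line, here is a small helper that assembles the spark-submit arguments. The helper name build_submit_command and the example values (my_job.py, user@EXAMPLE.COM, /etc/security/user.keytab) are placeholders, not anything from your cluster; use spark2-submit as submit_bin if that is what you deployed.

```python
def build_submit_command(app, principal, keytab, submit_bin="spark-submit"):
    """Build a spark-submit argument list for a kerberized YARN cluster."""
    return [
        submit_bin,
        "--master", "yarn",
        # Property names as given in the post above.
        "--conf", f"spark.yarn.principal={principal}",
        "--conf", f"spark.yarn.keytab={keytab}",
        app,  # path to the application (hypothetical here)
    ]


# Example with placeholder values; pass the list to subprocess.run to actually submit.
cmd = build_submit_command("my_job.py", "user@EXAMPLE.COM", "/etc/security/user.keytab")
print(" ".join(cmd))
```

The keytab path has to be the path as seen inside the container, i.e. wherever you mounted it.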

For ports, 8032 on the YARN ResourceManager (the default yarn.resourcemanager.address port; the ResourceManager takes the place of the Spark master when running on YARN) definitely needs to be open to traffic from the Docker node. I am not sure this is the complete list of ports. Could another user verify?
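To test from the Docker node whether the ResourceManager port is reachable, a plain TCP connect is enough. This is a minimal sketch; port_is_open is a made-up name and the hostname in the usage note is a placeholder. It only tells you that something accepts connections on that port, not that the other ports a job needs are open.

```python
import socket


def port_is_open(host, port=8032, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

For example, port_is_open("resourcemanager.example.com") run inside the container should return True once the firewall allows traffic to 8032.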