Development Workflow

Developing and deploying Spark applications can be challenging at first. This section walks you through the development and deployment workflow.

When developing a Spark application that uses external dependencies, developers typically face two challenges:

How to efficiently and quickly develop locally.

How to reliably deploy to a production cluster.

Apart from the development work itself, dependency management is usually the biggest hurdle. This section guides you through it and helps you get set up.

The problem is typically approached in one of two ways: you can either add the dependencies to the classpath when you submit the application, or you can create a single large jar that contains all dependencies.

Adding the Connector to the Executor's classpath

If you want to manage the dependency directly on the worker, the actual project setup is quite simple. You don't need to set up shadowing, and a plain build.sbt that declares Spark and the connector as dependencies is enough.
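The setup described above can be sketched as a build.sbt like the following. This is a minimal sketch, not the official project template: the project name is hypothetical, and all version numbers are placeholder assumptions that you need to align with the Scala and Spark versions of your cluster and with a connector release that supports them.

```scala
// build.sbt (sketch; every version number below is a placeholder assumption)
name := "my-spark-app" // hypothetical project name
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.12.18" // must match the Scala version of your Spark build

val sparkVersion = "3.3.0" // assumption: align with the Spark version on the cluster

libraryDependencies ++= Seq(
  // The cluster supplies Spark at runtime, so it is only needed for compiling.
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  // Needed to compile against the connector; at runtime it is added on submit.
  // Assumption: pick the connector release that matches your Spark version.
  "com.couchbase.client" %% "spark-connector" % "3.3.0"
)
```

Because `sbt package` only packages your own classes, none of these dependencies end up in the resulting jar. They have to be supplied when the application is submitted.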

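If the application does not set a master itself and you run it outside of spark-submit (for example, from your IDE), it fails with an exception like this:

Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration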

There are different ways to handle this, but they all boil down to conditionally setting the system property for the master. One useful approach is to keep a runnable class in the test directory (or somewhere else outside the main sources) that sets the property and then runs the application. This has further benefits, which are discussed in the next section.
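The runner approach described above might look like the following sketch. The names `MyApp` and `LocalRunner` are hypothetical, and `local[*]` is just one reasonable choice for a local master. The sketch relies on the fact that Spark's configuration picks up JVM system properties whose names start with `spark.`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical application (src/main/scala). It never hard-codes a master,
// so spark-submit stays in control of where it runs.
object MyApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("MyApp").getOrCreate()
    try {
      // Placeholder workload; replace with the real job.
      println(spark.range(5).count())
    } finally {
      spark.stop()
    }
  }
}

// Hypothetical local runner (src/test/scala), so it is not part of the
// jar that is built for the cluster.
object LocalRunner {
  def main(args: Array[String]): Unit = {
    // Only set a master if none was provided, e.g. via -Dspark.master=...
    if (System.getProperty("spark.master") == null) {
      System.setProperty("spark.master", "local[*]")
    }
    MyApp.main(args)
  }
}
```

Running `LocalRunner` from the IDE starts the application against a local master, while the packaged `MyApp` remains free of any environment-specific settings.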

You can then build your application with sbt package on the command line and submit it to Spark with spark-submit. Because Couchbase distributes the connector via Maven, you can include it by passing its Maven coordinates (in the form groupId:artifactId:version) to the --packages flag.

If your environment does not have access to the internet, you can use the --jars argument instead and get the assembly with all the dependencies from the Download and API Reference page.

Deploying a jar with dependencies included

The previous example shows how to add the connector as a dependency during the submit phase. If more than one dependency needs to be managed, this quickly becomes cumbersome, because you also need to keep your build.sbt in sync with your command-line parameters.

The alternative is to create a "fat jar" with batteries included that ships everything in the first place. This requires more setup on the project side but saves you some work later on.
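One common way to build such a fat jar with sbt is the sbt-assembly plugin. The sketch below assumes that the plugin is enabled in project/plugins.sbt with a line such as `addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "<version>")`. As before, the project name is hypothetical and the version numbers are placeholder assumptions.

```scala
// build.sbt (sketch; assumes sbt-assembly is enabled in project/plugins.sbt)
name := "my-spark-app" // hypothetical project name
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.12.18" // assumption: match your Spark build

val sparkVersion = "3.3.0" // assumption: align with the cluster

libraryDependencies ++= Seq(
  // "provided" keeps Spark itself out of the fat jar; the cluster already ships it.
  "org.apache.spark" %% "spark-core" % sparkVersion % "provided",
  "org.apache.spark" %% "spark-sql" % sparkVersion % "provided",
  // Default scope, so the connector and its transitive dependencies are bundled.
  "com.couchbase.client" %% "spark-connector" % "3.3.0" // assumption: matching release
)

// File name of the jar produced by `sbt assembly`.
assembly / assemblyJarName := s"${name.value}-assembly-${version.value}.jar"
```

Running `sbt assembly` then produces a single jar that can be handed to spark-submit without --packages or --jars. If the build reports duplicate files from different dependencies, an `assemblyMergeStrategy` setting may be needed; the right strategy depends on the dependencies involved.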