Spark Connector 2.0

Compatibility

Every version of the Couchbase Spark connector is compiled against a specific Spark target. The following table lists the compatible versions:

Table 1. Couchbase Spark connector compatibility

Couchbase Spark connector version    Apache Spark target version
2.0.x                                2.0.x
1.2.x                                1.6.x
1.1.x                                1.5.x
1.0.x                                1.4.x

Other version combinations may work as long as the internal Spark APIs do not break between minor versions. The table above lists the combinations that Couchbase tests and supports.
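As an illustration only, the tested combinations from Table 1 can be encoded as a small lookup. This is a minimal sketch in plain Scala; the `ConnectorCompatibility` object and its method names are hypothetical and are not part of the connector.

```scala
object ConnectorCompatibility {
  // Connector release line -> Spark release line, transcribed from Table 1.
  val tested: Map[String, String] = Map(
    "2.0" -> "2.0",
    "1.2" -> "1.6",
    "1.1" -> "1.5",
    "1.0" -> "1.4"
  )

  // Reduce "major.minor.patch" (or "major.minor") to "major.minor".
  private def releaseLine(version: String): Option[String] =
    version.split('.') match {
      case Array(major, minor, _*) => Some(s"$major.$minor")
      case _                       => None
    }

  // Spark release line that a given connector version is tested against, if any.
  def sparkTargetFor(connectorVersion: String): Option[String] =
    releaseLine(connectorVersion).flatMap(tested.get)

  // True only when the pair matches a row of Table 1.
  def isTestedCombination(connectorVersion: String, sparkVersion: String): Boolean =
    (for {
      target <- sparkTargetFor(connectorVersion)
      spark  <- releaseLine(sparkVersion)
    } yield target == spark).getOrElse(false)
}
```

A `false` result means only that the pair is not listed in the table, not that it cannot work.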

Contributing

Couchbase welcomes community contributions to the Spark connector. The Spark connector source code is available on GitHub and includes instructions for contributing.

Download and API Reference

All production-ready Couchbase Spark connector artifacts can be downloaded from Maven Central. Prerelease versions are available through the Couchbase Maven repository. The connector is also available through alternative channels such as Spark Packages.

Getting Started

To get started with the Couchbase Spark connector quickly, learn how to add the connector to your Spark project and run simple queries.
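As a rough sketch of what adding the connector to an sbt project can look like: the fragment below assumes the 2.0.0 connector release on Maven Central, built for Scala 2.11, together with Spark 2.0.0 as the matching target from the compatibility table. The exact version numbers are assumptions; adjust them to the release you actually use.

```scala
// build.sbt (sketch; all version numbers are assumptions)
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  // Spark 2.0.x is the tested target for connector 2.0.x
  "org.apache.spark" %% "spark-core" % "2.0.0",
  "org.apache.spark" %% "spark-sql"  % "2.0.0",
  // Couchbase Spark connector; %% appends the Scala binary version (_2.11)
  "com.couchbase.client" %% "spark-connector" % "2.0.0"
)
```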

Development Workflow

Developing and deploying Spark applications can be challenging at first. This section guides you through the development and deployment workflow.

Working With RDDs

Spark operates on resilient distributed datasets (RDDs). When you need to extract data out of Couchbase, the Couchbase Spark connector creates RDDs for you. You can create and persist RDDs by using key-value pairs, views, or N1QL.

Spark SQL Integration

Spark SQL integration depends on N1QL, which is available in Couchbase Server 4.0 and later. To use Spark SQL queries, you need to create and persist DataFrames/Datasets via the Spark SQL DataFrame/Dataset API.

Spark Streaming Integration

The Couchbase Spark connector works with Spark Streaming by using the Couchbase Server replication protocol (called DCP) to receive mutations on the server side as they happen and provide them to you in the form of a DStream. The DStream is the primary format used by Spark Streaming. You can create and persist DStreams.

Structured Streaming Integration

Couchbase integrates with Spark Structured Streaming as both a source and a sink, making it possible to query incoming data in a structured and efficient manner.

Java API

In addition to the primary Scala API, the connector provides convenience APIs when accessed from Java.

Using the Spark Shell

The interactive Spark shell can be used together with the Couchbase connector for quick and easy data exploration.

Release Notes

Release notes for version 2.0 of the Spark connector.