Bigtable I/O

The documentation on this page applies only to the Dataflow SDK 1.x for
Java.

The Dataflow SDK 2.x for Java and the Dataflow SDK for Python are based on Apache Beam.
See the documentation for those SDKs.

The Dataflow SDKs provide an API for reading data from, and writing data to,
Google Cloud Bigtable. The BigtableIO source lets you read a
PCollection of Bigtable Row objects from a given table, and the
BigtableIO sink lets you write a PCollection of row mutations to a table.

The Dataflow Bigtable I/O connector is considered experimental and may change in
backwards-incompatible ways in future versions of the Dataflow SDK for Java.

You can also use the Dataflow HBase Connector,
provided as part of the Bigtable HBase Client, to read from and write to Bigtable in your
pipeline.

Setting Bigtable Options

When you read from or write to Bigtable, you'll need to provide a table ID and a set of Bigtable
options. These options contain information necessary to identify the target Bigtable cluster,
including:

Project ID

Cluster ID

Zone ID

The easiest way to provide these options is to construct them using
BigtableOptions.Builder, a nested class of
com.google.cloud.bigtable.config.BigtableOptions:
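
A minimal sketch; the project, cluster, and zone IDs are placeholders to
replace with your own values:

    import com.google.cloud.bigtable.config.BigtableOptions;

    // Placeholder IDs; substitute your own project, cluster, and zone.
    BigtableOptions.Builder optionsBuilder =
        new BigtableOptions.Builder()
            .setProjectId("my-project")
            .setClusterId("my-cluster")
            .setZoneId("us-central1-b");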

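Reading from Bigtable

To read from Bigtable, apply the BigtableIO.read() transform to your
Pipeline object, specifying the table ID and BigtableOptions using
.withTableId and .withBigtableOptions, respectively. The result is a
PCollection of Bigtable Row objects. A sketch, assuming a Pipeline named p,
the optionsBuilder constructed above, and a placeholder table name:

    import com.google.bigtable.v1.Row;
    import com.google.cloud.dataflow.sdk.io.bigtable.BigtableIO;
    import com.google.cloud.dataflow.sdk.values.PCollection;

    // Read every row of the table into a PCollection<Row>.
    PCollection<Row> btRows = p.apply(
        BigtableIO.read()
            .withBigtableOptions(optionsBuilder)
            .withTableId("my-table"));
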
If you want to scan a subset of the rows in the specified Bigtable, you can provide a Bigtable
RowFilter object. If you provide a
RowFilter, BigtableIO.read() will return only the Rows that
match the filter:
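
A sketch reusing optionsBuilder and the placeholder table from above; the
filter shown here is an illustrative choice that keeps only the most recent
cell in each column:

    import com.google.bigtable.v1.RowFilter;

    // Keep only the latest version of each column (illustrative filter).
    RowFilter filter =
        RowFilter.newBuilder()
            .setCellsPerColumnLimitFilter(1)
            .build();

    PCollection<Row> filteredRows = p.apply(
        BigtableIO.read()
            .withBigtableOptions(optionsBuilder)
            .withTableId("my-table")
            .withRowFilter(filter));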

Writing to Bigtable

To write to Bigtable, apply the BigtableIO.write() transform to the
PCollection containing your output data. You'll need to specify the table ID and
BigtableOptions using .withTableId and
.withBigtableOptions, respectively.
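
A minimal sketch, assuming a PCollection named mutations that is formatted
as described in the next section:

    // 'mutations' is a PCollection<KV<ByteString, Iterable<Mutation>>>;
    // see "Formatting Bigtable Output Data" below.
    mutations.apply(
        BigtableIO.write()
            .withBigtableOptions(optionsBuilder)
            .withTableId("my-table"));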

Formatting Bigtable Output Data

The BigtableIO data sink performs each write operation as a set of row mutations to
the target Bigtable. As such, you must format your output data as a
PCollection<KV<ByteString, Iterable<Mutation>>>. Each element
in the PCollection must contain the following (see the sketch after this list):

The key of the row to be written, as a ByteString.

An Iterable of Mutation objects that represent a series of
idempotent row mutation operations.
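
For example, a DoFn along these lines can produce correctly formatted
elements from an input PCollection<String> named input. The column family
("cf"), qualifier, and row-key scheme are placeholders, and the fixed
timestamp keeps each SetCell idempotent if the bundle is retried:

    import java.util.Arrays;
    import com.google.bigtable.v1.Mutation;
    import com.google.protobuf.ByteString;
    import com.google.cloud.dataflow.sdk.transforms.DoFn;
    import com.google.cloud.dataflow.sdk.transforms.ParDo;
    import com.google.cloud.dataflow.sdk.values.KV;
    import com.google.cloud.dataflow.sdk.values.PCollection;

    static class ToBigtableWrite
        extends DoFn<String, KV<ByteString, Iterable<Mutation>>> {
      @Override
      public void processElement(ProcessContext c) {
        // One SetCell mutation that writes the element into column cf:value.
        Mutation setCell = Mutation.newBuilder()
            .setSetCell(Mutation.SetCell.newBuilder()
                .setFamilyName("cf")
                .setColumnQualifier(ByteString.copyFromUtf8("value"))
                .setValue(ByteString.copyFromUtf8(c.element()))
                // A fixed timestamp makes the mutation idempotent;
                // a server-assigned timestamp (-1) would not be.
                .setTimestampMicros(0))
            .build();
        c.output(KV.of(
            ByteString.copyFromUtf8("key-" + c.element()),
            (Iterable<Mutation>) Arrays.asList(setCell)));
      }
    }

    // Usage: convert the input strings into Bigtable writes.
    PCollection<KV<ByteString, Iterable<Mutation>>> mutations =
        input.apply(ParDo.of(new ToBigtableWrite()));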