Method Detail

split

Splits the table based on keys that belong to tablets, known as "regions" in the HBase API.
The current implementation uses the HBase RegionLocator interface, which internally calls
BigtableDataClient.sampleRowKeys(com.google.bigtable.repackaged.com.google.bigtable.v2.SampleRowKeysRequest).
A CloudBigtableIO.SourceWithKeys may correspond to a single region or to a portion of
a region.

If a split is smaller than a single region, the split is calculated on the assumption
that the data is distributed evenly between the region's startKey and stopKey. That
assumption may not hold for a given start/stop key combination.
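
To illustrate the even-distribution assumption, the following is a minimal sketch of how a key range could be divided into equally spaced boundaries. It is not the connector's actual code: the class EvenKeySplitSketch and its methods are hypothetical, keys are treated as fixed-width unsigned integers (shorter keys are right-padded with zero bytes), and the HBase convention of an empty stopKey meaning "end of table" is not handled.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only; not part of CloudBigtableIO.
final class EvenKeySplitSketch {

    /**
     * Returns numSplits + 1 boundary keys, evenly spaced from startKey to stopKey.
     * Assumes rows are uniformly distributed over the key space, which real data
     * often is not.
     */
    static List<byte[]> splitEvenly(byte[] startKey, byte[] stopKey, int numSplits) {
        if (numSplits < 1) {
            throw new IllegalArgumentException("numSplits must be at least 1");
        }
        // Pad both keys to the same width so they compare as unsigned integers.
        int width = Math.max(startKey.length, stopKey.length);
        BigInteger lo = new BigInteger(1, Arrays.copyOf(startKey, width));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(stopKey, width));
        BigInteger span = hi.subtract(lo);
        if (span.signum() < 0) {
            throw new IllegalArgumentException("startKey must not sort after stopKey");
        }

        List<byte[]> boundaries = new ArrayList<>();
        BigInteger parts = BigInteger.valueOf(numSplits);
        for (int i = 0; i <= numSplits; i++) {
            // Linear interpolation: lo + span * i / numSplits.
            BigInteger point = lo.add(span.multiply(BigInteger.valueOf(i)).divide(parts));
            boundaries.add(toFixedWidth(point, width));
        }
        return boundaries;
    }

    /** Converts a non-negative value to exactly width bytes, big-endian. */
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte or be shorter
        byte[] out = new byte[width];
        int count = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - count, out, width - count, count);
        return out;
    }
}
```

For example, dividing the single-byte range from 0x00 to 0x40 into four parts gives the boundaries 0x00, 0x10, 0x20, 0x30 and 0x40. If most rows in the region fall under one of those sub-ranges, the resulting splits will be uneven in size even though the boundaries are evenly spaced.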

getEstimatedSizeBytes

public long getEstimatedSizeBytes(org.apache.beam.sdk.options.PipelineOptions options)
throws IOException

Gets an estimated size based on data returned from
BigtableDataClient.sampleRowKeys(com.google.bigtable.repackaged.com.google.bigtable.v2.SampleRowKeysRequest).
The estimate does not take into account any Scan set on the
CloudBigtableScanConfiguration. When a Scan is set, the estimate will therefore be
larger than what the CloudBigtableIO.Reader will actually read.

getSampleRowKeys

Performs a call to get sample row keys from
BigtableDataClient.sampleRowKeys(com.google.bigtable.repackaged.com.google.bigtable.v2.SampleRowKeysRequest)
if they are not yet cached. The sample row keys give information about tablet key boundaries
and estimated sizes.
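
As a rough illustration of how sizes can be derived from sample row keys, the sketch below assumes each sample pairs a row key with the approximate number of bytes stored in all rows that precede it, as the offset_bytes field of the Bigtable SampleRowKeysResponse does. The class SampleRowKeySizesSketch and its Sample type are hypothetical stand-ins, not the connector's own classes, and the sketch does not claim to mirror how getEstimatedSizeBytes is implemented.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of CloudBigtableIO.
final class SampleRowKeySizesSketch {

    /** Stand-in for one sample: a boundary row key and the bytes stored before it. */
    static final class Sample {
        final String rowKey;
        final long offsetBytes;

        Sample(String rowKey, long offsetBytes) {
            this.rowKey = rowKey;
            this.offsetBytes = offsetBytes;
        }
    }

    /**
     * Estimates the size of each key range between consecutive samples as the
     * difference between adjacent offsets. Samples must be in row key order.
     */
    static List<Long> rangeSizes(List<Sample> samples) {
        List<Long> sizes = new ArrayList<>();
        long previousOffset = 0L;
        for (Sample sample : samples) {
            sizes.add(sample.offsetBytes - previousOffset);
            previousOffset = sample.offsetBytes;
        }
        return sizes;
    }

    /** The last offset approximates the size of everything covered by the samples. */
    static long estimatedTotalBytes(List<Sample> samples) {
        return samples.isEmpty() ? 0L : samples.get(samples.size() - 1).offsetBytes;
    }
}
```

With samples at offsets 100, 250 and 400, the per-range estimates are 100, 150 and 150 bytes and the total estimate is 400 bytes. Because the offsets describe whole key ranges, an estimate built this way cannot reflect any filtering applied later by a Scan.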