Google BigQuery

The Google BigQuery origin executes a query job
and reads the result from Google BigQuery.

The origin submits the query that you define, and then Google BigQuery runs the query as
an interactive query. When the query is complete, the origin reads the query results to
generate records. The origin runs the query once and then the pipeline stops when it
finishes reading all query results. If you start the pipeline again, the origin submits
the query again.

When you configure the origin, you define the query to run using valid BigQuery standard
SQL or legacy SQL syntax. By default, BigQuery writes all query results to a temporary,
cached results table. You can choose to disable retrieving cached results and force
BigQuery to compute the query result.
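To make these options concrete, the following Python sketch builds a query job request body in the shape that the BigQuery REST jobs API accepts. The field names useLegacySql, useQueryCache, and priority come from that API. The helper name build_query_job is hypothetical, and the origin does not necessarily build its request this way.

```python
import json


def build_query_job(query, use_legacy_sql=False, use_cache=True):
    """Build a BigQuery jobs.insert request body for an interactive query.

    Hypothetical helper for illustration only; the origin's actual
    request may differ.
    """
    if not query or not query.strip():
        raise ValueError("query must not be empty")
    return {
        "configuration": {
            "query": {
                "query": query,
                # False selects standard SQL, True selects legacy SQL.
                "useLegacySql": use_legacy_sql,
                # False forces BigQuery to compute the result instead of
                # returning a cached one.
                "useQueryCache": use_cache,
                # Interactive queries start as soon as possible.
                "priority": "INTERACTIVE",
            }
        }
    }


if __name__ == "__main__":
    body = build_query_job("SELECT 1 AS x", use_cache=False)
    print(json.dumps(body, indent=2))
```

Setting use_cache to False corresponds to disabling the retrieval of cached results, so that BigQuery recomputes the query.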

You also define the project and credentials provider to use to connect to Google
BigQuery. The origin can retrieve credentials from the Google Application Default
Credentials or from a Google Cloud service account credentials file.

The origin can generate events for an event stream. For
more information about dataflow triggers and the event framework, see Dataflow Triggers Overview.

Credentials

When the Google BigQuery origin executes
a query job and reads the result from Google BigQuery, it must pass credentials to
Google BigQuery. Configure the origin to retrieve the credentials from the Google
Application Default Credentials or from a Google Cloud service account credentials
file.

Default Credentials Provider

When configured to use the Google Application
Default Credentials, the origin checks for the credentials file defined in the
GOOGLE_APPLICATION_CREDENTIALS environment variable. If the
environment variable doesn't exist and Data Collector is
running on a virtual machine (VM) in Google Cloud Platform (GCP), the origin uses the
built-in service account associated with the virtual machine instance.
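The lookup order described above can be sketched in Python as follows. This is a minimal illustration, not the origin's implementation: the function name, the running_on_gcp_vm flag, and the returned labels are hypothetical, and raising an error when neither source is available is an assumption.

```python
import os


def resolve_default_credentials(env=None, running_on_gcp_vm=False):
    """Return a (source, detail) tuple describing which credentials apply.

    Mirrors the documented lookup order. All names here are hypothetical.
    """
    env = os.environ if env is None else env

    # 1. Prefer the file named by GOOGLE_APPLICATION_CREDENTIALS.
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if path:
        return ("credentials_file", path)

    # 2. Otherwise, on a GCP virtual machine, use the built-in
    #    service account associated with the instance.
    if running_on_gcp_vm:
        return ("vm_service_account", None)

    # Assumption: with neither source available, the lookup fails.
    raise RuntimeError("No default credentials available")


if __name__ == "__main__":
    print(resolve_default_credentials({}, running_on_gcp_vm=True))
```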

On the Credentials tab for the stage,
select Default Credentials Provider
for the credentials provider.

Service Account Credentials File (JSON)

When configured to use the Google Cloud service account credentials file, the origin
checks for the file defined in the origin properties.

Complete the following steps to use the service account credentials file:

1. Generate a service account credentials file in JSON format.

   Use the Google Cloud Platform Console or the gcloud command-line tool to
   generate and download the credentials file. For more information, see
   "Generating a service account credential" in the Google Cloud Platform
   documentation.

2. Store the generated credentials file on the Data Collector machine.

   As a best practice, store the file in the Data Collector resources
   directory, $SDC_RESOURCES.

3. On the Credentials tab for the stage, select Service Account Credentials
   File for the credentials provider and enter the path to the credentials
   file.
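Before entering the path in the stage, it can help to confirm that the stored file looks like a service account key. The Python sketch below is one possible check, assuming the usual key file fields (type, project_id, private_key, client_email). The function name and the choice of required fields are illustrative and are not part of Data Collector.

```python
import json
from pathlib import Path

# Fields that a Google Cloud service account key file is expected to contain.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_credentials_file(path):
    """Return a list of problems found in the file (empty if none)."""
    file = Path(path)
    if not file.is_file():
        return [f"file not found: {file}"]
    try:
        data = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = [
        f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data
    ]
    if data.get("type") != "service_account":
        problems.append("type is not 'service_account'")
    return problems
```

An empty list means that the file parsed as JSON and contains the expected fields. The check does not verify that the key is valid or that the account has access to BigQuery.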

BigQuery Data Types

The following table lists the data types that the
Google BigQuery origin supports and the Data Collector
data types that the origin converts them to:

BigQuery Data Type    Data Collector Data Type
------------------    ------------------------
Boolean               Boolean
Bytes                 Byte Array
Date                  Date
Datetime              Datetime
Float                 Double
Integer               Long
Numeric               Decimal
String                String
Time                  Datetime
Timestamp             Datetime
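For reference, the same mapping can be written as a small Python lookup table. This is only an illustrative sketch: the dictionary name, the function name, and the choice to normalize type names to upper case are assumptions, and the entries simply restate the supported conversions listed in this section.

```python
# BigQuery data type -> Data Collector data type, as listed in this section.
BIGQUERY_TO_SDC = {
    "BOOLEAN": "Boolean",
    "BYTES": "Byte Array",
    "DATE": "Date",
    "DATETIME": "Datetime",
    "FLOAT": "Double",
    "INTEGER": "Long",
    "NUMERIC": "Decimal",
    "STRING": "String",
    "TIME": "Datetime",
    "TIMESTAMP": "Datetime",
}


def sdc_type(bigquery_type):
    """Return the Data Collector type for a supported BigQuery type name."""
    key = bigquery_type.strip().upper()
    if key not in BIGQUERY_TO_SDC:
        raise ValueError(f"unsupported BigQuery data type: {bigquery_type}")
    return BIGQUERY_TO_SDC[key]
```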

Datetime Conversion

In Google BigQuery, the Datetime, Time, and Timestamp data types have microsecond
precision, but the corresponding Datetime data type in Data Collector has
millisecond precision. The conversion between data types results in some precision loss.

To preserve the precision that would otherwise be lost during data type conversion, the
Google BigQuery origin generates the bq.fullValue field attribute, which stores a string
containing the original value with microsecond precision. You can use the
record:fieldAttribute or record:fieldAttributeOrDefault functions to access the
information in the attribute.

Generated Field Attribute    Description
-------------------------    -----------
bq.fullValue                 Provides the original precision for Datetime, Time, and Timestamp fields.
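The effect of the conversion can be illustrated with a short Python sketch. It assumes that the sub-millisecond digits are truncated rather than rounded, which this section does not state, and it assumes an input string in the form YYYY-MM-DD HH:MM:SS.ffffff. The function name is hypothetical, and the returned dictionary only mimics the bq.fullValue field attribute.

```python
from datetime import datetime


def to_millisecond_precision(value):
    """Return (datetime with millisecond precision, field attributes).

    Assumption: digits below one millisecond are truncated, not rounded.
    """
    parsed = datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")
    # Drop the last three microsecond digits to keep milliseconds only.
    truncated = parsed.replace(microsecond=(parsed.microsecond // 1000) * 1000)
    # Keep the original string so that the full precision is recoverable.
    attributes = {"bq.fullValue": value}
    return truncated, attributes


if __name__ == "__main__":
    converted, attrs = to_millisecond_precision("2023-05-01 12:34:56.123456")
    print(converted.isoformat(timespec="microseconds"))
    print(attrs["bq.fullValue"])
```

The converted value carries only millisecond precision, while the attribute still holds the original string with all six fractional digits.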