Quota requirements

The quota requirements for export jobs, by GCP service, are
as follows:

Cloud Spanner: No additional nodes are
required to export a database, though you might need to add more nodes
so that your job finishes in a reasonable amount of time. See
Optimizing jobs for more details.

Cloud Storage: You must create a bucket for
your exported files if you do not already have one. You can do this in the
GCP Console, either through the Cloud Storage page or
while creating your export through the Cloud Spanner page. You do
not need to set a size for your bucket.
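
If you prefer to create the bucket programmatically, here is a minimal sketch using the Cloud Storage Python client library; the project ID and bucket name are placeholders:

```python
from google.cloud import storage

# Create a bucket to hold the exported files. The project ID and bucket
# name are placeholders; bucket names must be globally unique.
client = storage.Client(project="my-project")
bucket = client.create_bucket("my-spanner-export-bucket")
print(f"Created bucket {bucket.name}")
```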

Cloud Dataflow: Export jobs are subject to the same CPU,
disk usage, and IP address
Compute Engine quotas as
other Cloud Dataflow jobs.

Compute Engine: Before running your export job, you must set up
initial quotas for
Compute Engine, which Cloud Dataflow uses. These quotas
represent the maximum number of resources that you allow
Cloud Dataflow to scale your job to. Recommended starting values are:

CPUs: 200

In-use IP addresses: 200

Standard persistent disk: 50 TB

Generally, you do not have to make any other adjustments.
Cloud Dataflow provides autoscaling so that you only pay for the
actual resources used during the export. If your job can make use of
more resources, the Cloud Dataflow UI displays a warning icon. The
job should finish even if there is a warning icon.
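
As a rough way to check where you stand, the following sketch reads a region's current quota limits through the Compute Engine API and compares them with the recommended starting values above; the project ID and region are placeholders, and 50 TB is expressed as GB:

```python
from googleapiclient import discovery

# Compare a region's Compute Engine quota limits with the recommended
# starting values. The project ID and region are placeholders.
compute = discovery.build("compute", "v1")
region = compute.regions().get(project="my-project", region="us-central1").execute()

recommended = {"CPUS": 200, "IN_USE_ADDRESSES": 200, "DISKS_TOTAL_GB": 50 * 1024}
for quota in region.get("quotas", []):
    metric = quota["metric"]
    if metric in recommended:
        print(f"{metric}: limit={quota['limit']}, "
              f"recommended at least {recommended[metric]}")
```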

Cloud IAM requirements

To export a database, you also need to have Cloud IAM roles with
sufficient permissions to use all of the services involved in an export
job. For information on granting roles and permissions, see
Applying IAM roles.

To export a database, you need the following roles:

At the GCP project level:

Cloud Spanner Viewer

Dataflow Admin

Storage Admin

At the Cloud Spanner database or instance level, or at the
GCP project level:

Cloud Spanner Reader
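
As a sketch, the project-level roles above can also be granted through the Resource Manager API; the project ID and member are placeholders, and the role IDs shown are the standard IDs for the display names in the list:

```python
from googleapiclient import discovery

# Grant one of the project-level roles listed above (Dataflow Admin).
# The project ID and member are placeholders; repeat with
# roles/spanner.viewer and roles/storage.admin for the other roles.
crm = discovery.build("cloudresourcemanager", "v1")
project_id = "my-project"

policy = crm.projects().getIamPolicy(resource=project_id, body={}).execute()
policy.setdefault("bindings", []).append({
    "role": "roles/dataflow.admin",
    "members": ["user:alice@example.com"],
})
crm.projects().setIamPolicy(resource=project_id, body={"policy": policy}).execute()
```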

Exporting a database

After you satisfy the quota and IAM requirements described above, you can
export an existing Cloud Spanner database.

Note: It is safe to export a database that is in use. However, any changes you
make after Cloud Dataflow creates a Cloud Spanner transaction are not
captured by the export, because all of the exported data is from the same
database snapshot.
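
The export runs as a Cloud Dataflow job, so you can also start one programmatically. The sketch below assumes the export uses the public Cloud_Spanner_to_GCS_Avro Dataflow template and its instanceId, databaseId, and outputDir parameters; all IDs and paths are placeholders:

```python
from googleapiclient import discovery

# Launch the export as a Dataflow template job. Template name and
# parameter names are assumptions; IDs and paths are placeholders.
dataflow = discovery.build("dataflow", "v1b3")
response = dataflow.projects().locations().templates().launch(
    projectId="my-project",
    location="us-central1",
    gcsPath="gs://dataflow-templates/latest/Cloud_Spanner_to_GCS_Avro",
    body={
        "jobName": "export-my-database",
        "parameters": {
            "instanceId": "my-instance",
            "databaseId": "my-database",
            "outputDir": "gs://my-spanner-export-bucket",
        },
    },
).execute()
print(response["job"]["id"])
```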

If your export job fails, delete the folder and files that the job created to
avoid Cloud Storage charges for them. See Viewing your
export for information on how to find the folder.
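
A minimal sketch of that cleanup with the Cloud Storage client library; the bucket name and folder prefix are placeholders:

```python
from google.cloud import storage

# Delete everything under the failed export's folder to stop incurring
# storage charges. The bucket name and folder prefix are placeholders.
client = storage.Client(project="my-project")
bucket = client.bucket("my-spanner-export-bucket")
folder_prefix = "my-instance-my-database-TIMESTAMP/"
for blob in bucket.list_blobs(prefix=folder_prefix):
    blob.delete()
```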

Viewing your export in Cloud Storage

To view the folder that contains your exported database in the
GCP Console, navigate to the Cloud Storage browser and click
on the bucket you previously selected.

The bucket now contains a folder with the exported database inside. The folder
name begins with your instance's ID, database name, and the timestamp of your
export job. The folder contains:

A spanner-export.json file

A TableName-manifest.json file for each table in the
database you exported.

One or more TableName.avro-#####-of-#####
files. The first number in the extension
.avro-#####-of-##### represents the index
of the Avro file, starting at zero, and the second represents the number of
Avro files generated for each table.

For example, Songs.avro-00001-of-00002 is the second of two files that
contain the data for the Songs table.
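
To verify the folder contents programmatically, the following sketch lists the exported files and classifies them by the naming patterns described above; the bucket and folder names are placeholders:

```python
from google.cloud import storage

# List the contents of an export folder and summarize them by type.
# The bucket name and folder prefix are placeholders.
client = storage.Client(project="my-project")
bucket = client.bucket("my-spanner-export-bucket")
for blob in bucket.list_blobs(prefix="my-instance-my-database-TIMESTAMP/"):
    name = blob.name.rsplit("/", 1)[-1]
    if name == "spanner-export.json":
        kind = "export manifest"
    elif name.endswith("-manifest.json"):
        kind = "table manifest"
    elif ".avro-" in name:
        kind = "Avro data file"
    else:
        kind = "other"
    print(f"{kind}: {name}")
```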

Choosing a region for your export job

The best region for your export job depends on whether your
Cloud Spanner instance uses a regional or multi-region configuration. To avoid
network egress charges, choose a region that overlaps with your
Cloud Spanner instance's location.

Regional instance configurations

If your Cloud Spanner instance configuration is regional, choose the
same region for your export job to take advantage of free egress within the same
region.

If the same region is not available, charges will apply. Refer to the
Cloud Spanner network egress pricing to choose a region that will incur
the lowest network egress charges.

Multi-region instance configurations

If your Cloud Spanner instance configuration is multi-region, choose one
of the regions that make up the multi-region configuration to take advantage of
free egress within the same region.

If an overlapping region is not available, egress charges will apply. Refer to
the Cloud Spanner network egress pricing to choose a region that will
incur the lowest network egress charges.
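
In either case, you can look up which configuration, and therefore which location, your instance uses. Here is a minimal sketch with the Cloud Spanner client library; the project and instance IDs are placeholders:

```python
from google.cloud import spanner

# Look up the instance's configuration to see which region or
# multi-region it uses, so you can pick an overlapping region for
# the export job. Project and instance IDs are placeholders.
client = spanner.Client(project="my-project")
instance = client.instance("my-instance")
instance.reload()
# Prints e.g. "projects/my-project/instanceConfigs/regional-us-central1"
print(instance.configuration_name)
```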

Viewing or troubleshooting jobs in the Cloud Dataflow UI

After you start an export job, you can view details of the job, including
logs, in the Cloud Dataflow section of the GCP Console.

If your job runs slowly and the Cloud Dataflow UI displays the quota warning
icon described above, increasing the quotas for
CPUs, in-use IP addresses, and standard persistent disk might shorten the
run time of the job, but you might incur more Compute Engine
charges.

Check the Cloud Spanner CPU utilization: If you see that the CPU
utilization for the instance is over 65%, you can
increase the number of nodes in that instance. The extra nodes add more
Cloud Spanner resources and the job should speed up, but you incur more
Cloud Spanner charges.
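
You can also read this metric outside the console. The following sketch queries the built-in spanner.googleapis.com/instance/cpu/utilization metric through the Cloud Monitoring client library; the project and instance IDs are placeholders:

```python
import time
from google.cloud import monitoring_v3

# Read the last 10 minutes of Cloud Spanner CPU utilization from
# Cloud Monitoring. The project and instance IDs are placeholders.
client = monitoring_v3.MetricServiceClient()
now = int(time.time())
results = client.list_time_series(
    request={
        "name": "projects/my-project",
        "filter": (
            'metric.type="spanner.googleapis.com/instance/cpu/utilization" '
            'AND resource.labels.instance_id="my-instance"'
        ),
        "interval": {
            "end_time": {"seconds": now},
            "start_time": {"seconds": now - 600},
        },
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print(point.value.double_value)
```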

Factors affecting export job performance

Several factors influence the time it takes to complete an export job.

Cloud Spanner database size: Processing more data takes more time
and resources.

Cloud Spanner database schema (including indexes): The number of tables,
the size of the rows, and the number of secondary indexes influence the time it
takes to run an export job.

Data location: Data is transferred between Cloud Spanner and
Cloud Storage using Cloud Dataflow. Ideally all three
components are located in the same region. If the components are not in the same
region, moving the data across regions slows the job down.

Number of Cloud Dataflow workers: By using autoscaling,
Cloud Dataflow chooses the number of workers for the job depending on
the amount of work that needs to be done. The number of workers will, however,
be capped by the quotas for CPUs, in-use IP addresses, and standard persistent
disk. The Cloud Dataflow UI displays a warning icon if it encounters
quota caps. In this situation, progress is slower, but the job should still
complete.

Existing load on Cloud Spanner: An export job typically
places a light load on a Cloud Spanner instance. If the
instance already has a substantial existing load, the export job runs
more slowly.

Number of Cloud Spanner nodes: If the CPU utilization for the instance
is over 65%, then the job runs more slowly.
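
A minimal sketch of adding a node with the Cloud Spanner client library; the project and instance IDs are placeholders, and the extra node increases your Cloud Spanner charges:

```python
from google.cloud import spanner

# Add a node to the instance to give the export job more
# Cloud Spanner resources. Project and instance IDs are placeholders.
client = spanner.Client(project="my-project")
instance = client.instance("my-instance")
instance.reload()              # fetch the current node count
instance.node_count += 1
operation = instance.update()  # returns a long-running operation
operation.result()             # block until the update completes
```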