Backing Up and Restoring

This page describes how to back up and restore the data your application stores
in Google Cloud Datastore.

Before you begin

If you haven't already, create a storage bucket for your project.
Optionally, check that the App Engine default service account for the project
has access to the bucket through the bucket's Access Control List (ACL). This
access may already be set by default, and you can override it if needed.

Notice that a backup name is supplied and that it includes a datestamp.

You must change this value if you make more than one backup per day because
a backup is not made if a backup of the same name already exists.
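If you need more than one backup per day, one way to keep names unique is to add the time of day to the datestamp. The following is a minimal sketch, not part of Datastore Admin: the function name make_backup_name and the name format are assumptions chosen for illustration.

```python
from datetime import datetime, timezone


def make_backup_name(prefix, when=None, include_time=True):
    """Build a backup name from a prefix and a date (and optionally a time).

    Hypothetical helper: the 'prefix_YYYY-MM-DD_HHMM' layout is an assumption,
    not a format required by Datastore Admin.
    """
    # Default to the current UTC time when no timestamp is supplied.
    when = when or datetime.now(timezone.utc)
    # Including hours and minutes keeps names distinct within one day.
    fmt = "%Y-%m-%d_%H%M" if include_time else "%Y-%m-%d"
    return "{}_{}".format(prefix, when.strftime(fmt))
```

Two backups started at different minutes of the same day then get different names, so the second one is not skipped because of a name clash.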

Notice that the default queue is used for the backup job; you can use this
in most cases.

If you use a non-default queue for backup/restore, you can only specify the
target ah-builtin-python-bundle in queue.yaml. You cannot use any other
targets.
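The target restriction can be checked before you pick a queue. The sketch below assumes that you have already parsed queue.yaml into a list of dictionaries, one per queue entry, with an optional "target" key; the helper name queue_usable_for_backup is hypothetical.

```python
ALLOWED_TARGET = "ah-builtin-python-bundle"


def queue_usable_for_backup(queue_entry):
    """Return True if a parsed queue.yaml entry can run backup/restore jobs.

    Assumption: queue_entry is a dict such as {"name": "backups", "rate": "5/s"}
    with an optional "target" key. A queue qualifies when it has no target
    or when its target is ah-builtin-python-bundle.
    """
    target = queue_entry.get("target")
    return target is None or target == ALLOWED_TARGET
```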

Select Google Cloud Storage as the backup storage location.

When you choose Cloud Storage, you are prompted for the bucket name
where the backups are to be stored, in the format [BUCKET_NAME]. You can
optionally specify the bucket name suffixed with a directory structure, such
as [BUCKET_NAME]/backups/foo. If those folders don't already
exist, they will be created.

Note: Alternatively, you can prefix the bucket name with /gs/,
for example, /gs/[BUCKET_NAME].
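Because the location can be written with or without the /gs/ prefix and with or without a folder suffix, scripts that record backup locations may want to reduce both forms to one. This is a minimal sketch under that assumption; normalize_bucket_path is a hypothetical helper and does not call any Cloud Storage API.

```python
def normalize_bucket_path(value):
    """Split a backup location into (bucket, folder).

    Accepts '[BUCKET_NAME]', '[BUCKET_NAME]/backups/foo', or either form
    prefixed with '/gs/'. The folder is '' when none is given.
    """
    path = value.strip()
    # Drop the optional /gs/ prefix.
    if path.startswith("/gs/"):
        path = path[len("/gs/"):]
    # Ignore stray leading or trailing slashes.
    path = path.strip("/")
    if not path:
        raise ValueError("a bucket name is required")
    bucket, _, folder = path.partition("/")
    return bucket, folder
```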

Start the backup jobs by clicking Backup Entities. Notice that a job
status page is displayed.

Click Back to Datastore Admin to see the backup status.

After the backup is complete, if you disabled Cloud Datastore
writes, re-enable them.

Aborting a backup

If backup jobs are currently running, they appear in a Pending Backups list
in the Cloud Datastore Admin screen.

In the Pending Backups section, select the backup in the list and click
Abort.

When you abort a backup job, App Engine attempts to delete backup data that has
been saved up to that point. However, in some cases, some files can remain after
the abort. You can locate these files in the location you chose for your backups
in Google Cloud Storage and safely delete them after the abort completes. The
names of such files start with the following pattern:
datastore_backup_[BUCKET_NAME].

Note: If you abort or delete a backup, App Engine does not delete files from
Google Cloud Storage. You must manually delete backup files from Google Cloud
Storage.
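To decide which files to delete manually, you can filter the object names in your backup location by the datastore_backup_ prefix. The sketch below assumes you have already obtained the list of object names from the bucket with whatever Cloud Storage tool or client you use; the helper name find_leftover_backup_files is hypothetical, and the code only filters names and deletes nothing.

```python
LEFTOVER_PREFIX = "datastore_backup_"


def find_leftover_backup_files(object_names, folder=""):
    """Return the object names that look like backup files, sorted.

    object_names: names as listed from the bucket (assumed input).
    folder: optional folder inside the bucket where backups were stored.
    """
    folder = folder.strip("/")
    # Backup files sit directly under the chosen folder, if any.
    prefix = (folder + "/" if folder else "") + LEFTOVER_PREFIX
    return sorted(name for name in object_names if name.startswith(prefix))
```

Review the returned names before deleting anything, and wait until the abort has completed.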

Finding information about a backup

You might want to find out details about a backup, such as which entity kinds
it contains, where it was saved in Google Cloud Storage, and its starting and
ending time. To display this backup information:

In the list of available backups, select the backup that you want to restore from.

Click Restore.

In the advisory page that is displayed, notice the list of entities with
checkboxes. By default, all of the entities will be restored. Uncheck the
checkbox next to each entity that you don't want to restore.

Also in the advisory page, notice that the default queue, with its
pre-configured performance settings, is used for the restore job. Change this
to another queue that you have configured differently if you need different
queue performance characteristics, making sure the queue chosen does not have
any target specified in queue.yaml other than ah-builtin-python-bundle.

Start the restore by clicking Restore. Notice that a job status page is
displayed.

Note: The permissions set in the previous steps are not retroactive to
existing backups, so the target application is not able to access those
earlier backups. The target application can access only backups made after
it was given permissions.

(Recommended) Disable Cloud Datastore writes for your target
application to avoid conflicts between the restored data and any new data
written to Cloud Datastore.

In the text box next to the button labelled Import Backup Information,
specify the source application's bucket containing the backup, in the format
/gs/[BUCKET_NAME].
Alternatively, supply the file handle for a specific backup. To view the
file handle for a backup, open the Admin page for the source
application, select the backup, and click Info. The file handle appears
next to the label Handle.

Click Import Backup Information.
The resulting selection page shows the available backups for the bucket you
specified, unless you specified a backup by its handle. Select the desired
backup and click one of the following:

Add to backup list if you want this backup to be retained
in the list of available backups for your application.

Restore from backup if you want to restore from this backup
but do not want the backup displayed in the list of available backups
for your application.

In the advisory page that is displayed, notice the list of entities with
checkboxes. By default, all of the entities will be restored. Uncheck the
checkbox next to each entity that you don't want to restore.

Also in the advisory page, notice that the
default queue, with its pre-configured performance settings, is used for the
restore job. Change this to another queue that you have configured
differently if you need different queue performance characteristics.

Start the restore by clicking Restore. Notice that a job status page is
displayed.

After the restore is complete, if you disabled Cloud Datastore
writes, re-enable them.

Viewing resource usage

Very frequent backups often lead to higher costs. When you run a
Cloud Datastore Admin job, you are actually running an underlying
MapReduce job. MapReduce jobs increase frontend instance hours, in addition
to the cost of storage operations and storage usage.

Use the pulldown menus at the top of the page to select the default service
and the ah-builtin-python-bundle version.

Backup and restore considerations

The backup and restore feature is intended to help you recover from accidental
deletion of data or to enable you to export data. You can back up all entities
or only selected kinds of entities, and you can restore from one of these
backups when you need to.

Backups are saved to Cloud Storage.

Note that the backup does not contain any indexes. When you restore, the
required indexes are automatically rebuilt using the index definitions you
uploaded with your application.

You can also use backup files to export your data to other Google Cloud Platform
services, such as BigQuery.

Restores do not assign new IDs to entities. A restore uses the IDs that existed
at backup time and overwrites any existing entity with the same ID. During a
restore, the IDs are reserved as the entities are being restored. This should
prevent ID collisions with new entities if writes are enabled while a restore is
running. New entities added since the backup are retained.
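The overwrite-and-retain behavior can be pictured with a toy model in which plain dictionaries keyed by entity ID stand in for Cloud Datastore. This is an illustration only, not the Datastore API; simulate_restore is a hypothetical name.

```python
def simulate_restore(current, backup):
    """Model a restore over dicts that map entity ID to entity data.

    Entities in the backup overwrite current entities with the same ID.
    Entities that exist only in the current data are kept.
    """
    restored = dict(current)   # start from the data as it is now
    restored.update(backup)    # backup values win on matching IDs
    return restored


current = {1: "edited after backup", 3: "created after backup"}
backup = {1: "value at backup time", 2: "deleted after backup"}
result = simulate_restore(current, backup)
```

In this model, entity 1 reverts to its backed-up value, entity 2 comes back, and entity 3, which was added after the backup, is retained.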

You can restore all data from a backup or you can restore specific entity kinds
from the backup. You can also use this feature to restore a backup
of one application's data to another application, provided that you use
Cloud Storage for your backups.

Note: The backup process does not include values stored in Blobstore or
Cloud Storage. Also, the Cloud Datastore BlobInfo records that
correspond to Blobstore values are not included in the backup.
Cloud Datastore entity properties of the type BlobKey are backed up,
even though they rely on the corresponding Blobstore values having their
original keys. Cloud Datastore property values of type Blob are
unrelated to the Blobstore, and are included in the backup.