Scheduled Backups

Jacob Butcher, Doug Anderson
April 16, 2012

Alpha

This is an Alpha release of
Datastore Administration. This feature
might be changed in backward-incompatible ways and is not
recommended for production use. It is not subject to any SLA or deprecation
policy.

Introduction

Note: Use of this feature is limited to backups
started from the application's cron or task queue.

You can run scheduled backups for your application using the App Engine Cron
service. To do this for Python or Go apps, specify backup cron jobs in cron.yaml. For Java apps, specify
the backup cron job in cron.xml.
Currently there is no way to specify a scheduled backup programmatically.

Important: The ability to store backups using Blobstore will be removed soon, so
we strongly recommend that you switch to using Cloud Storage for backups; see
Datastore Backup to Blobstore Turndown for details. You will still be able to
restore from backups in Blobstore. The instructions on this page describe making
scheduled backups using Cloud Storage only.

In your application directory, if you don't already have one, create a
cron.yaml file for a Python or Go app or a cron.xml
file for a Java app.

Add the backup cron entries. These specify the backup schedule, the set of
entities to back up, and the storage to be used for the backups, as described in
Specifying Backups in a Cron File.
Here are some examples:
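For instance, a cron.yaml entry for a backup of two entity kinds every 12 hours
could be generated with the short Python sketch below. It is a local helper
script, not part of any App Engine API. The handler path
/_ah/datastore_admin/backup.create is an assumption (this page does not spell it
out), and the backup name, kind names, and bucket name are hypothetical values
chosen for illustration; substitute your own.

```python
from urllib.parse import urlencode

# Assumed Datastore Admin handler path; this page does not state it.
BACKUP_HANDLER = "/_ah/datastore_admin/backup.create"


def backup_url(kinds, bucket, name=None, queue=None, namespace=None):
    """Build the cron `url` value from the fields described on this page."""
    if not kinds:
        raise ValueError("at least one entity kind is required")
    params = []
    if name:
        params.append(("name", name))
    # The kind field is repeated once per entity kind.
    params.extend(("kind", kind) for kind in kinds)
    if queue:
        params.append(("queue", queue))
    if namespace:
        params.append(("namespace", namespace))
    params.append(("filesystem", "gs"))
    params.append(("gs_bucket_name", bucket))
    return BACKUP_HANDLER + "?" + urlencode(params)


def cron_yaml_entry(description, url, schedule):
    """Render one cron.yaml job; values are inserted without YAML quoting."""
    return (
        "cron:\n"
        "- description: {}\n"
        "  url: {}\n"
        "  schedule: {}\n"
        "  target: ah-builtin-python-bundle\n"
    ).format(description, url, schedule)


if __name__ == "__main__":
    # Hypothetical kinds, bucket, and backup name.
    url = backup_url(["LogTitle", "EventLog"], "example-bucket",
                     name="BackupToCloud")
    print(cron_yaml_entry("My Daily Backup", url, "every 12 hours"))
```

Building the url from a list of pairs keeps each repeated kind as its own query
argument, which a plain dictionary could not express.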

Deploy this file with your app. (You can verify the cron job you just
deployed by clicking Cron Jobs in the left navigation pane.)

The backups will occur on the schedule you specified. While a backup is running,
it shows up in the Pending Backups list. After the backup is
complete, you can view it and use it in the list of available backups within the
Datastore Admin tab.

Specifying Backups in a Cron File

These are the fields to include in your cron file to perform scheduled backups:

description

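This is the title that appears in the Cron Job list. It can be anything you wish.

url

This field is required. It identifies the request that starts the backup and
carries the backup parameters described below as query arguments.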

name is an optional prefix that is prepended to the backup
name. It helps you identify your backups. If not supplied, the default "cron-"
will be used.

The kind field can appear one or more times. Each value
specifies an entity kind that you wish to back up. You must specify at least one
entity kind. In the Datastore Admin Console, the default is that all entity
kinds are backed up. With a cron backup, there is no such default: if you don't
specify a kind, it doesn't get backed up.

queue is optional. It specifies the task queue to be used. If
not supplied, the default task queue is used.

filesystem specifies the storage to be used for backups. Specify the value
"gs", which means that Google Cloud Storage will be used.

gs_bucket_name is required. It specifies the Cloud Storage bucket name used for
backup storage.

namespace is optional. When provided, only entities from the
selected namespace are included in the backup.

Note: The url cannot be longer than 2000 characters. As
shown in the cron.xml Java example above, you must use the entity
"&amp;" to separate fields rather than a bare ampersand character
("&"), because XML treats a bare ampersand as the start of an entity reference.
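The two constraints in this note can be checked before deploying. The following
Python sketch is one way to do so under stated assumptions: the helper name
cron_xml_url_element is hypothetical, the 2000-character limit is applied to the
unescaped url, and the sample url (including its handler path and bucket name)
is illustrative only.

```python
from xml.sax.saxutils import escape

MAX_URL_LENGTH = 2000  # limit stated in the note above


def cron_xml_url_element(url):
    """Return a <url> element for cron.xml, with '&' escaped as '&amp;'."""
    # Assumption: the limit applies to the url before XML escaping.
    if len(url) > MAX_URL_LENGTH:
        raise ValueError(
            "backup url is {} characters; the limit is {}".format(
                len(url), MAX_URL_LENGTH))
    return "<url>{}</url>".format(escape(url))


if __name__ == "__main__":
    # Hypothetical url; the handler path and bucket name are assumptions.
    sample = ("/_ah/datastore_admin/backup.create"
              "?kind=LogTitle&filesystem=gs&gs_bucket_name=example-bucket")
    print(cron_xml_url_element(sample))
```

In cron.yaml no escaping is needed, so only the length check applies there.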

schedule

This field is required. It defines the recurring schedule on which the
backup runs. For complete details, see the Schedule Format documentation for
Python or Java.

target

This is required. It identifies the app version on which the cron backup job is
run. You must use the value ah-builtin-python-bundle because
that is the version of your app that contains the Datastore Admin features that
the cron job needs to execute. Keep in mind that the cron backup job runs
against this version of your app, so you incur costs while the job is
running. (The ah-builtin-python-bundle version of your app is enabled when you
enable Datastore Admin for your app.)

Very frequent backups often lead to higher costs.
When you run a Datastore Admin job, you are actually running underlying MapReduce jobs.
MapReduce jobs increase frontend instance hours, in addition to Storage operations and Storage usage.
To keep an eye on your resource usage, click the Dashboard link under Main in the left navigation pane.
At the top of the page, select ah-builtin-python-bundle from the Version drop-down menu.

Troubleshooting

When the scheduled backup runs, App Engine performs a GET using the backup
url. If the GET succeeds, it results in HTTP status code 200. If it
fails, it results in HTTP status code 400. You can look at the logs to determine
whether a backup succeeded or failed by doing the following:

In the Admin Console for your application, click Logs in
the left navigation pane, under Main.

Locate the version pulldown menu, which is immediately to the right of the
application pulldown. The app pulldown should be showing the name of your app;
the version pulldown is most likely showing the number 1.

In the version pulldown, select ah-builtin-python-bundle to
display the logs.

Locate your backup job in the log to determine whether it succeeded or
failed. If there was a failure, in addition to the status code 400, there will
be an error message to help you determine the cause of the error.
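The status-code rule above can be written down as a small Python sketch. The
function name is hypothetical, and the status codes are assumed to be read by
hand from the log viewer; nothing here queries the logs.

```python
def backup_request_outcome(status_code):
    """Map the HTTP status of the backup GET to an outcome, per this page."""
    if status_code == 200:
        return "succeeded"
    if status_code == 400:
        return "failed; check the accompanying error message in the log"
    # Any other code is not covered by this page.
    return "unexpected status; inspect the log entry"


if __name__ == "__main__":
    # Hypothetical status codes copied from log entries.
    for status in (200, 400):
        print(status, backup_request_outcome(status))
```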