Telemetry spoofs Chrome's User-Agent field, and user_agent_type tells it whether to use a desktop, mobile, or tablet user agent. We generally only use one recording for all platforms.

The archive_data_file contains metadata about which pages are stored in which archive files. You only need to specify its location; the file itself is generated when you record the page set.

Note that, by convention, a page set file is named after its class, lowercased and underscored. For example, MyPageSet should be stored as my_page_set.py.
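The naming convention can be sketched as a small helper. This is only an illustration: page_set_file_name is a hypothetical function, not part of Telemetry, and how it splits acronyms or digits is an assumption rather than a documented rule.

```python
import re


def page_set_file_name(class_name):
    """Return the conventional file name for a page set class name.

    Hypothetical helper, not a Telemetry API.
    """
    # Insert an underscore at each boundary where an uppercase letter
    # follows a lowercase letter or a digit, then lowercase the result.
    underscored = re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', class_name)
    return underscored.lower() + '.py'


print(page_set_file_name('MyPageSet'))  # my_page_set.py
```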

Choosing a bucket

Telemetry has three Cloud Storage buckets you can put page sets in.

page_set.PUBLIC_BUCKET == 'chromium-telemetry'

page_set.PARTNER_BUCKET == 'chrome-partner-telemetry'

page_set.INTERNAL_BUCKET == 'chrome-telemetry'

Google wants to avoid legal issues with distributing third-party content, so to be safe, most recordings of websites on the public web go in PARTNER_BUCKET, which is accessible by Googlers and whitelisted Google partners. Recordings of Google properties on the public web can go in PUBLIC_BUCKET, and recordings of unreleased or internal Google websites go in INTERNAL_BUCKET.
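The rule of thumb above can be written out as a short decision function. This is a sketch under stated assumptions: choose_bucket and its two boolean parameters are hypothetical, and the constants are redefined locally with the values listed above so that the snippet runs without Telemetry.

```python
# Bucket names as listed above. In Telemetry these are the page_set constants;
# they are redefined here only to keep the sketch self-contained.
PUBLIC_BUCKET = 'chromium-telemetry'
PARTNER_BUCKET = 'chrome-partner-telemetry'
INTERNAL_BUCKET = 'chrome-telemetry'


def choose_bucket(is_google_property, is_on_public_web):
    """Pick a bucket for a recording (hypothetical helper)."""
    if not is_on_public_web:
        # Unreleased or internal websites stay in the internal bucket.
        return INTERNAL_BUCKET
    if is_google_property:
        # Google properties on the public web can be shared publicly.
        return PUBLIC_BUCKET
    # Third-party content on the public web: the safe default.
    return PARTNER_BUCKET


print(choose_bucket(is_google_property=False, is_on_public_web=True))
```

When in doubt, the text suggests treating PARTNER_BUCKET as the safe choice for anything that is not clearly a Google property.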

Record a page set

Use the record_wpr script to record a page set. Your command will look something like this:

The recording produces the following files:

A .wpr file containing the recorded data. This file is hidden from git status, which we'll explain next.

A .wpr.sha1 file containing the SHA1 hash of the .wpr file.

A .json file containing metadata about which .wpr files store which URLs.
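To check that a local .wpr file matches its .wpr.sha1 companion, you can compute the hash yourself. The snippet below is a minimal sketch using Python's standard hashlib; sha1_of_file is a hypothetical helper, and the assumption that the .wpr.sha1 file holds the plain hex digest is not confirmed by this document.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA1 digest of the file at path (hypothetical helper)."""
    digest = hashlib.sha1()
    with open(path, 'rb') as f:
        # Read in chunks so that large recordings need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing the returned string with the contents of the .wpr.sha1 file, under the assumption above, would tell you whether the recording changed since the hash was written.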

Upload the recording to Cloud Storage

To avoid bloating everyone's Chromium checkouts, we don't commit the large .wpr files to source control. Instead, we upload them to Cloud Storage and download them as needed. If you just want to use your recording locally, you can skip this step.