YouTube Reporting API - Data Model

Important: An upcoming policy change affects the length of time that you will be able to retrieve YouTube Reporting API reports. After the change, daily API reports and backfill reports will be available for 60 days (instead of 180 days) from the time that they are generated. Historical data reports will be available for 30 days from the time they are generated.

This change is currently planned to go into effect in July 2018, and the new policy applies globally to all reports and reporting jobs. See the API's revision history for complete details about this change.

The YouTube Reporting API supports predefined reports that contain a comprehensive set of YouTube Analytics data for a channel or content owner. These reports allow you to download the bulk data sets that you can query with the YouTube Analytics API or in the Analytics section of the Creator Studio.

Overview

Fields in these reports are classified as either dimensions or metrics:

Dimensions are common criteria that are used to aggregate data, such as the date on which an action occurred or the country where the users were located. In a report, each row of data has a unique combination of dimension values.

Metrics are individual measurements, such as views, likes, and average_view_duration_seconds. After retrieving and importing a report, an application can perform many different calculations by grouping rows on common dimension values.
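As a rough illustration of such a calculation, the sketch below sums one metric across all rows that share a dimension value. The sample rows and the sum_metric_by_dimension helper are invented for this example, and the column names (date, country_code, video_id, views, likes) are assumptions about what a report's header row might contain.

```python
import csv
import io
from collections import defaultdict

# Hypothetical report excerpt; a real report contains more columns.
SAMPLE_REPORT = """date,country_code,video_id,views,likes
20150901,US,video_a,120,7
20150901,US,video_b,30,1
20150901,CA,video_a,15,0
"""


def sum_metric_by_dimension(report_text, dimension, metric):
    """Sum one metric over all rows that share a value of one dimension."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(report_text)):
        totals[row[dimension]] += int(row[metric])
    return dict(totals)


views_by_country = sum_metric_by_dimension(SAMPLE_REPORT, "country_code", "views")
print(views_by_country)
```

Because every row has a unique combination of dimension values, grouping on a single dimension in this way does not count any row twice.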

Retrieving YouTube Analytics reports

Step 1: Retrieve authorization credentials

All YouTube Reporting API requests must be authorized. The Authorization guide explains how to use the OAuth 2.0 protocol to retrieve authorization tokens.

YouTube Reporting API requests use the following authorization scopes:

Step 2: Identify the report to retrieve

Call the API's reportTypes.list method to retrieve a list of reports that can be generated for the channel or content owner. The method returns a list of report IDs and names. Capture the id property value for the reports that you want to have generated. For example, the ID of the basic user activity report for channels is channel_basic_a1.
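A minimal sketch of capturing those IDs follows. The response dictionary is a hand-written stand-in rather than real API output: its shape (a reportTypes list whose entries carry id and name) is an assumption, and the second entry is entirely hypothetical.

```python
# Hand-written stand-in for a parsed reportTypes.list response (assumed shape).
sample_response = {
    "reportTypes": [
        {"id": "channel_basic_a1", "name": "User activity"},
        {"id": "channel_example_a1", "name": "Example report"},  # hypothetical entry
    ]
}


def report_type_ids(response):
    """Map each report type's id to its display name."""
    return {entry["id"]: entry["name"] for entry in response.get("reportTypes", [])}


available = report_type_ids(sample_response)
print(sorted(available))
```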

Step 3: Create a reporting job

YouTube does not begin to generate your report until you create a reporting job for that report. (As such, reports are only generated for the channels and content owners that actually want to retrieve them.)

To create a reporting job, call the API's jobs.create method. Set the following values in the request body:

Set the reportTypeId property's value to the report ID that you retrieved in step 2.

Set the name property's value to the name that you want to associate with the report.
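The request body described above might be assembled as in the following sketch. The build_job_body helper and the job name are illustrative only. The helper insists on both values because this step sets both, not because the API is known to reject a missing one.

```python
import json


def build_job_body(report_type_id, name):
    """Build the JSON request body for a jobs.create call."""
    # Assumption of this sketch: treat both properties as mandatory.
    if not report_type_id or not name:
        raise ValueError("reportTypeId and name are both expected")
    return {"reportTypeId": report_type_id, "name": name}


body = build_job_body("channel_basic_a1", "Daily user activity")
print(json.dumps(body, indent=2))
```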

The API response to the jobs.create method contains a Job resource, which specifies the ID that uniquely identifies the job. You can start retrieving the report within 48 hours of the time that the job is created, and the first available report will be for the day that you scheduled the job.

For example, if you schedule a job on September 1, 2015, then the report for September 1, 2015, will be ready on September 3, 2015. The report for September 2, 2015, will be posted on September 4, 2015, and so forth.
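The schedule in that example amounts to adding two days to the day a report covers. The sketch below encodes only that arithmetic. The two-day delay is read from the example above and the expected_post_date helper is hypothetical, so treat the result as an estimate rather than a guarantee.

```python
from datetime import date, timedelta

# Assumption taken from the example above: a report is posted about
# two days after the day whose data it contains.
ASSUMED_DELAY = timedelta(days=2)


def expected_post_date(report_day):
    """Estimate the day on which the report for report_day becomes available."""
    return report_day + ASSUMED_DELAY


for day in (date(2015, 9, 1), date(2015, 9, 2)):
    print(day.isoformat(), "->", expected_post_date(day).isoformat())
```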

Step 4: Retrieve the job ID

Note: If your application stored the job ID returned in step 3, then you can skip this step.

Call the jobs.list method to retrieve a list of scheduled jobs. The reportTypeId property in each returned Job resource identifies the type of report that the job generates. Your application needs the id property value from the same resource in the following step.
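Matching jobs by report type could look like the sketch below. The response dictionary is a hand-written stand-in with an assumed shape (a jobs list whose entries carry id, reportTypeId and name), and the job IDs and the second report type are made up.

```python
# Hand-written stand-in for a parsed jobs.list response (assumed shape).
sample_jobs_response = {
    "jobs": [
        {"id": "job-001", "reportTypeId": "channel_basic_a1", "name": "Daily user activity"},
        {"id": "job-002", "reportTypeId": "channel_example_a1", "name": "Other job"},
    ]
}


def find_job_ids(response, report_type_id):
    """Return the IDs of all jobs that generate the given report type."""
    return [
        job["id"]
        for job in response.get("jobs", [])
        if job.get("reportTypeId") == report_type_id
    ]


print(find_job_ids(sample_jobs_response, "channel_basic_a1"))
```

The helper returns a list because nothing in this guide rules out several jobs sharing one report type.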

Step 5: Retrieve the report's download URL

Call the jobs.reports.list method to retrieve a list of reports created for the job. In the request, set the jobId parameter to the job ID of the report that you want to retrieve.

Tip: Use the createdAfter parameter to indicate that the API should only return reports created after a specified time. This parameter can be used to ensure that the API only returns reports that you have not already processed.

The API response contains a list of Report resources for that job. Each resource refers to a report that contains data for a unique 24-hour period. Note that YouTube does generate downloadable reports for days on which no data was available. Those reports contain a header row but do not contain additional data.

The resource's startTime and endTime properties identify the time period that the report's data covers.

The resource's downloadUrl property identifies the URL from which the report can be fetched.

The resource's createTime property specifies the date and time when the report was generated. Your application should store this value and use it to determine whether previously downloaded reports have changed.
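The sketch below shows one way to form the request URL with the createdAfter parameter and to pull the four properties described above out of a response. The base URL is an assumption that should be checked against the API reference, both helper names are hypothetical, and the sample response is hand-written.

```python
from urllib.parse import quote, urlencode

# Assumed REST base URL; verify it against the API reference before use.
BASE_URL = "https://youtubereporting.googleapis.com/v1"


def reports_list_url(job_id, created_after=None):
    """Build a jobs.reports.list URL, optionally filtered by createdAfter."""
    url = f"{BASE_URL}/jobs/{quote(job_id, safe='')}/reports"
    if created_after:
        url += "?" + urlencode({"createdAfter": created_after})
    return url


def summarize_reports(response):
    """Return (startTime, endTime, createTime, downloadUrl) for each report."""
    return [
        (r["startTime"], r["endTime"], r["createTime"], r["downloadUrl"])
        for r in response.get("reports", [])
    ]


sample_response = {  # hand-written stand-in, not real API output
    "reports": [
        {
            "id": "report-1",
            "startTime": "2015-09-01T07:00:00Z",
            "endTime": "2015-09-02T07:00:00Z",
            "createTime": "2015-09-03T10:00:00Z",
            "downloadUrl": "https://example.invalid/report-1",
        }
    ]
}

print(reports_list_url("job-001", "2015-09-03T00:00:00Z"))
print(summarize_reports(sample_response))
```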

Step 6: Download the report

Send an HTTP GET request to the downloadUrl obtained in step 5 to retrieve the report.

You can reduce the bandwidth needed to download reports by enabling gzip compression on download requests. While your application will need additional CPU time to uncompress API responses, the benefit of consuming fewer network resources usually outweighs that cost.

To receive a gzip-encoded response, set the Accept-Encoding HTTP request header to gzip as shown in the following example:

Accept-Encoding: gzip
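A minimal sketch of such a download using only the Python standard library follows. It is not the official client sample. The Bearer authorization header, the UTF-8 decoding, and all three function names are assumptions of this example, and download_report is defined but not called because it performs a real network request.

```python
import gzip
import urllib.request


def build_download_request(download_url, access_token):
    """Create a GET request that asks for a gzip-encoded response."""
    return urllib.request.Request(
        download_url,
        headers={
            "Authorization": f"Bearer {access_token}",  # assumed OAuth 2.0 bearer token
            "Accept-Encoding": "gzip",
        },
    )


def decode_body(raw_bytes, content_encoding):
    """Return the report text, decompressing only if the server used gzip."""
    if (content_encoding or "").lower() == "gzip":
        raw_bytes = gzip.decompress(raw_bytes)
    return raw_bytes.decode("utf-8")  # assumes the CSV is UTF-8 encoded


def download_report(download_url, access_token):
    """Fetch one report over the network and return its text."""
    request = build_download_request(download_url, access_token)
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read(), response.headers.get("Content-Encoding"))
```

Checking the Content-Encoding response header before decompressing keeps the code working if the server returns an uncompressed body.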

Processing reports

Best practices

Applications that use the YouTube Reporting API should always follow these practices:

Use a report's header row to determine the ordering of the report's columns. For example, do not assume that views will be the first metric returned in a report just because it is the first metric listed in a report description. Instead, use the report's header row to determine which column contains that data.

Keep a record of the reports you have downloaded to avoid repeatedly processing the same report. The following list suggests a couple of ways to do that.

When calling the jobs.reports.list method, use the createdAfter parameter to only retrieve reports created after a certain date. (Omit the createdAfter parameter the first time you retrieve reports.)

Each time you retrieve and successfully process reports, store the timestamp corresponding to the date and time when the newest of those reports was created. Then, update the createdAfter parameter value on each successive call to the jobs.reports.list method to ensure that you are only retrieving new reports, including new reports with backfilled data, each time you call the API.

As a safeguard, before retrieving a report, also check to ensure that the report's ID is not already listed in your database.

Store the ID for each report that you have downloaded and processed. You can also store additional information like the date and time when each report was generated or the report's startTime and endTime, which together identify the period for which the report contains data. Note that each job will likely have many reports since each report contains data for a 24-hour period.

Use the report ID to identify reports that you still need to download and import. However, if two new reports have the same startTime and endTime property values, only import the report with the newer createTime value.

Reports contain IDs associated with YouTube resources, and you can use the YouTube Data API to retrieve additional metadata for those resources. As noted in the YouTube API Services Developer Policies (sections III.E.4.b through III.E.4.d), API clients must either delete or refresh stored resource metadata from that API after 30 days.
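The record-keeping practices above could be combined as in the following sketch: skip report IDs that were already processed, keep only the newest report for each time period, and remember the newest createTime for the next createdAfter value. The function names and sample reports are hypothetical, and the timestamp parser assumes UTC values that end in Z.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse a UTC timestamp such as 2018-06-03T10:00:00.5Z (assumed format)."""
    base, _, fraction = value.rstrip("Z").partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    micros = int((fraction + "000000")[:6]) if fraction else 0
    return parsed.replace(microsecond=micros)


def select_reports_to_import(reports, processed_ids):
    """Pick unprocessed reports, keeping only the newest report per time period."""
    newest = {}
    for report in reports:
        if report["id"] in processed_ids:
            continue
        period = (report["startTime"], report["endTime"])
        current = newest.get(period)
        if current is None or parse_timestamp(report["createTime"]) > parse_timestamp(
            current["createTime"]
        ):
            newest[period] = report
    return list(newest.values())


def next_created_after(reports, previous=None):
    """Return the newest createTime seen, for use as the next createdAfter value."""
    times = [report["createTime"] for report in reports]
    if previous:
        times.append(previous)
    return max(times, key=parse_timestamp) if times else None


DAY1 = ("2018-06-01T07:00:00Z", "2018-06-02T07:00:00Z")
DAY2 = ("2018-06-02T07:00:00Z", "2018-06-03T07:00:00Z")
SAMPLE_REPORTS = [  # hand-written stand-ins for Report resources
    {"id": "r1", "startTime": DAY1[0], "endTime": DAY1[1], "createTime": "2018-06-03T10:00:00.5Z"},
    {"id": "r2", "startTime": DAY1[0], "endTime": DAY1[1], "createTime": "2018-06-05T10:00:00Z"},
    {"id": "r3", "startTime": DAY2[0], "endTime": DAY2[1], "createTime": "2018-06-04T10:00:00Z"},
]

to_import = select_reports_to_import(SAMPLE_REPORTS, processed_ids={"r3"})
print([report["id"] for report in to_import])
print(next_created_after(SAMPLE_REPORTS))
```

Timestamps are parsed before comparison because plain string comparison can give the wrong order when the number of fractional-second digits differs.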

Report characteristics

API reports are versioned .csv (comma-separated values) files that have the following characteristics:

Each report contains data for a unique 24-hour period lasting from 12:00 a.m. through 11:59 p.m. Pacific time. As such, in any given report, the date dimension value is always the same.

Reports are updated daily.

YouTube does generate downloadable reports for days on which no data was available. Those reports will contain a header row but will not contain additional data.

Important: An upcoming policy change affects the length of time that you will be able to retrieve YouTube Reporting API reports. The change is currently scheduled to go into effect in July 2018, and it applies globally to all reports and reporting jobs.

Prior to the change, API reports will be available for up to 180 days from the time that they are generated.

After the change, API reports will be available for 60 days from the time that they are generated with the exception of historical data generated for new reporting jobs. Reports that are already more than 60 days old will no longer be accessible when the policy change becomes effective.

After the change, reports containing historical data will be available for 30 days from the time that they are generated. Reports that contain historical data and are already more than 30 days old will no longer be accessible when the policy change becomes effective.

Report data is not filtered. As such, a channel report contains all data for a channel's videos or playlists with the exception noted in the following paragraph related to deleted resources. Similarly, a content owner report contains all data for the content owner's channels (videos, playlists, ad performance, etc.) with the same exception.

Although report data is not filtered, reports that contain data for a time period on or after June 1, 2018, will not contain any references to YouTube resources that were deleted at least 30 days prior to the date the report was generated.

Report data is not sorted.

Reports omit rows that do not have any metrics. For example, if a video has no views in Albania on a particular day, that day's report will not contain rows for Albania.

Reports do not contain rows that provide summary data for metrics, such as the total number of views for all of a channel's videos. You can calculate those total values as the sum of the values in the report, but that sum might not include metrics for deleted videos, as noted above. You can also use the YouTube Analytics API to retrieve total counts. The YouTube Analytics API does return total values that include metrics for deleted resources even though those resources are not explicitly referenced in API responses.
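Two of these characteristics affect how a report is read: totals have to be computed by the application, and a missing row means zero. The sketch below illustrates both, locating columns by header name rather than by position. The sample rows, column names, and the load_views helper are invented for the example.

```python
import csv
import io

# Hypothetical report excerpt; a real report contains more columns.
SAMPLE_REPORT = """date,video_id,country_code,views
20180601,video_a,US,40
20180601,video_a,CA,10
20180601,video_b,US,5
"""


def load_views(report_text):
    """Index views by (video_id, country_code), finding columns by header name."""
    reader = csv.DictReader(io.StringIO(report_text))
    return {(row["video_id"], row["country_code"]): int(row["views"]) for row in reader}


views = load_views(SAMPLE_REPORT)

# No summary row exists, so the total is the sum of the individual rows.
total_views = sum(views.values())

# A combination with no metrics has no row at all; treat it as zero.
albania_views = views.get(("video_a", "AL"), 0)

print(total_views, albania_views)
```

A total computed this way covers only the resources that appear in the report, so it can be lower than the total returned by the YouTube Analytics API when deleted resources are involved.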

Backfill data

Backfill data refers to a data set that replaces a previously delivered set. When a backfill data report is available, your application should retrieve the new report and update your stored data to match the revised data set. For example, your application could delete the previous data for the time period covered in the report and then import the new data set.
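The delete-then-import approach might be sketched as follows with an in-memory SQLite database. The daily_views table, its columns, and the replace_period helper are assumptions of this example, not part of the API.

```python
import sqlite3


def replace_period(conn, report_date, rows):
    """Replace all stored rows for one report date with rows from a newer report."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.execute("DELETE FROM daily_views WHERE date = ?", (report_date,))
        conn.executemany(
            "INSERT INTO daily_views (date, video_id, views) VALUES (?, ?, ?)",
            [(report_date, video_id, views) for video_id, views in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_views (date TEXT, video_id TEXT, views INTEGER)")

# Import the original report, then a backfill report for the same day.
replace_period(conn, "20180601", [("video_a", 40), ("video_b", 5)])
replace_period(conn, "20180601", [("video_a", 42)])

stored = conn.execute("SELECT video_id, views FROM daily_views ORDER BY video_id").fetchall()
print(stored)
```

Running the delete and the inserts in one transaction means a failed import leaves the previous data for that period in place.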

If YouTube has backfill data, it generates a new report with a new report ID. In that case, the report's startTime and endTime property values will match the start and end times of a report that was previously available and that you might have previously downloaded.

Backfill reports that contain data for a time period on or after June 1, 2018, will not contain any references to YouTube resources that were deleted at least 30 days prior to the date the report was generated.

Historical data

When you schedule a new reporting job, YouTube generates historical reports covering a time period prior to when you created the job. Thus, in this documentation, historical data refers to a report that contains data for a time period before the reporting job was scheduled.

Important: An upcoming policy change affects the length of time for which that historical report data is generated. The change is currently scheduled to go into effect in July 2018, and it applies globally to all reports and reporting jobs.

Prior to the policy change, when you schedule a new reporting job, YouTube will generate reports covering the 180-day period prior to the time that you created the job.

After the policy change, when you schedule a new reporting job, YouTube will generate reports from that day forward and covering the 30-day period prior to the time that you created the job.

Historical reports are posted as soon as they are available. Typically, all of the historical data is posted for a job within a couple of days. As explained in the Report characteristics section, after a policy change scheduled to go into effect in July 2018, reports containing historical data will be available for 30 days from the time that they are generated. Reports that contain non-historical data will be available for 60 days after the policy change.
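The retention periods described above can be expressed as simple date arithmetic. In the sketch below, a report counts as historical when the period it covers ended on or before the job's creation time. That rule is one reading of the definition in this section, and both helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Retention periods described for the planned policy change.
HISTORICAL_RETENTION = timedelta(days=30)
STANDARD_RETENTION = timedelta(days=60)


def is_historical(report_end, job_created):
    """Assumed rule: the report's period ended before the job existed."""
    return report_end <= job_created


def available_until(report_created, report_end, job_created):
    """Estimate when a report stops being retrievable."""
    if is_historical(report_end, job_created):
        return report_created + HISTORICAL_RETENTION
    return report_created + STANDARD_RETENTION


job_created = datetime(2018, 8, 1, 12, tzinfo=timezone.utc)
generated = datetime(2018, 8, 2, tzinfo=timezone.utc)
period_end = datetime(2018, 7, 15, tzinfo=timezone.utc)
print(available_until(generated, period_end, job_created).date())
```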

Data anonymization

To ensure the anonymity of YouTube viewers, values for some dimensions are returned only if a metric in the same row meets a certain threshold.

For example, in the video traffic source report for channels, each row contains a number of dimensions, including traffic_source_type and traffic_source_detail. Each row also contains various metrics, including views. In rows that describe traffic that originated from a YouTube search, the traffic_source_detail dimension identifies the search term that led to the traffic.

In this example, the following rules apply:

The traffic source report identifies the query term (traffic_source_detail) only if it led to at least a certain number of views of a particular video on a particular day. In this case, views is the metric, video_id is the aggregating dimension, and traffic_source_detail is the anonymized dimension.

The report includes an additional row that aggregates metrics for all traffic_source_detail values that do not meet the view count threshold. That row reports the total number of views associated with those query terms but does not identify the terms themselves.

The following tables illustrate these rules. The first table contains a hypothetical set of raw data that YouTube would use to generate a traffic source report, and the second table contains the report itself. In this example, the view count threshold is 10, meaning the report only identifies a search term if it led to at least 10 views of a particular video on a particular day. (Actual thresholds are subject to change.)

Raw YouTube search traffic data for a video

Assume that the data below describes YouTube search traffic to a particular video on a particular day.

search term          views   estimated minutes watched
gangnam style        100     200
psy                  15      25
psy gangnam          9       15
oppa gangnam         5       8
horse riding dance   2       5

Sample traffic source report

The following table shows an excerpt from the traffic source report that YouTube would generate for the raw data in the preceding section. (The actual report would contain more dimensions and metrics.) In this example, the report identifies search terms only if they led to at least 10 views. Actual thresholds are subject to change.

In the report's third row, the trafficSourceDetail dimension value is NULL. The views and estimatedMinutesWatched metrics contain the combined views and minutes watched for the three search terms that generated fewer than 10 views.

trafficSourceDetail   views   estimatedMinutesWatched
gangnam style         100     200
psy                   15      25
NULL                  16      28
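The aggregation in this example can be reproduced with a short sketch. The anonymize function is only an illustration of the rule described above, not YouTube's implementation. It uses the example threshold of 10 and represents the NULL dimension value as None.

```python
THRESHOLD = 10  # example threshold from the text; actual thresholds may differ

# (search term, views, estimated minutes watched) from the raw data above.
RAW_ROWS = [
    ("gangnam style", 100, 200),
    ("psy", 15, 25),
    ("psy gangnam", 9, 15),
    ("oppa gangnam", 5, 8),
    ("horse riding dance", 2, 5),
]


def anonymize(rows, threshold):
    """Keep terms at or above the threshold; fold the rest into one unnamed row."""
    report = []
    hidden_views = 0
    hidden_minutes = 0
    for term, views, minutes in rows:
        if views >= threshold:
            report.append((term, views, minutes))
        else:
            hidden_views += views
            hidden_minutes += minutes
    if hidden_views or hidden_minutes:
        report.append((None, hidden_views, hidden_minutes))
    return report


report_rows = anonymize(RAW_ROWS, THRESHOLD)
for term, views, minutes in report_rows:
    print("NULL" if term is None else term, views, minutes)
```

The three terms below the threshold contribute 9 + 5 + 2 = 16 views and 15 + 8 + 5 = 28 minutes, which matches the NULL row in the sample report.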

Dimensions subject to anonymization

The following table identifies dimension values that are anonymized if associated metric values do not meet a certain threshold. In each case, the metric's value is aggregated over another dimension. For example, if the metric is views and the aggregating dimension is video_id, then the dimension value is anonymized unless the video was viewed a certain number of times.

The code sample calls the jobs.list method to retrieve a list of reporting jobs. It then calls the reports.list method with the jobId parameter set to a specific job ID to retrieve reports created by that job. Finally, the sample prints out the download URL for each report.
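A sketch along those lines follows. It is not the original sample. It assumes that service is a client object like the one google-api-python-client builds for this API (for example via build("youtubereporting", "v1", credentials=...)) and leaves its construction to the caller. The function names are hypothetical, and paging through long result lists is omitted.

```python
def list_reporting_jobs(service):
    """Return the Job resources from a jobs.list call."""
    response = service.jobs().list().execute()
    return response.get("jobs", [])


def list_reports(service, job_id):
    """Return the Report resources created by one job."""
    response = service.jobs().reports().list(jobId=job_id).execute()
    return response.get("reports", [])


def print_download_urls(service, job_id):
    """Print and return the download URL of each report for the job."""
    urls = [report["downloadUrl"] for report in list_reports(service, job_id)]
    for url in urls:
        print(url)
    return urls
```

A caller would typically pick a job ID from the list_reporting_jobs result and pass it to print_download_urls.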