Google BigQuery is designed to house some truly monstrous datasets, sometimes hosting tables billions of rows. It is therefore no surprise that Google has implemented the handy capability of partitioned tables, which allow otherwise daunting datasets to be broken up into smaller, more manageable chunks without losing performance or scalability.

What is a Partitioned Table?

In the past, it was common for database administrators using BigQuery to split large datasets into smaller tables that were divided by date and/or time. For example, each day a table may be generated where the table name is suffixed with the current date (e.g. books_20160724, books_20160725, etc). While this method is functional, it creates a lot of headache to not only manage all these split tables, but queries rapidly become quite complex as the potential date ranges increase and crossover.

To resolve this, Google BigQuery introduced the partitioned table, which is essentially just a normal table, except that it is automatically partitioned for each date.

The _PARTITIONTIME Psuedo Column

Functionally, a partitioned table acts just like any other table, except it contains a special _PARTITIONTIME psuedo column which contains the date-based timestamp representing when a particular record was loaded. This field uses UTC time and is the number of microseconds since the unix epoch.

Creating a Partitioned Table

To create a partitioned table, you must issue a call to the Tables.insert API method. This can be done either through the API or through the command-line tool.

The critical point here is that you do not need to specify a schema for the partitioned table at this time, but instead are effectively telling BigQuery to create a special partitioned table, and then await the insertion of data at a later time which will give it a proper schema.

Loading Data in a Partitioned Table

Loading data into the partitioned table is no different than loading data into any other table in BigQuery. For the purposes of this example, we’re just using the WebUI and grabbing some data from the [bigquery-public-data:samples.github_timeline] dataset and setting our Destination Table to the previously created bookstore-1382:exports.partition table.

Querying a Partitioned Table

Now that we have loaded some data into our partitioned table, we can make use of the “automagical” nature when querying our table.

Performing a basic query with no regard for the _PARTITIONTIME is standard fare, but often we’ll want to retrieve just the data that we loaded into our table for a particular date or daterange. To do this, we simply add a WHERE clause and compare the _PARTITIONTIME value to the daterange in question.

For example, here we’re retrieving only records that were loaded today (which, therefore, exist in the partition for today’s date):

The options are limitless for querying the partitioned data. Note that if we want to actually SELECT the _PARTITIONTIME value (as seen in the above example), since it’s a psuedo column we must assign it to an alias name before output (e.g. SELECT _PARTITIONTIME pt).