BigQuery + Colaboratory setup in 5 mins

For data analysis and exploration, Jupyter/IPython notebooks have often been the tool of choice, for the ease with which they let you share work and explain your thought process. If your data resides mainly in BigQuery, a couple of steps are required to set up authentication so you can access your data directly from the notebook. This can sometimes get a little confusing and messy.

With Colaboratory, connecting Google Cloud Platform products to a notebook in the cloud that teammates can access and collaborate on in real time is now really easy. Colaboratory is basically Jupyter notebooks on Google Drive with a couple more cool features (free TPUs, anyone?). And just like other Google Drive products, it is free! This post is about how you can connect BigQuery to Colab in a few minutes and dive straight into the data.

1. Add the Colaboratory add-on to your Google Drive

If you have not added Colab to Google Drive before, clicking on New > More will lead you to this set of dropdown options. Click on Connect more apps to connect Colab.

Search for Colaboratory and click on + CONNECT.

Add Colaboratory to Google Drive.

If you go back to New > More, you will now see Colaboratory as an option. Click on it to start a new notebook.

You should see Colaboratory as an option on the More dropdown.

This is what it will look like:

Your first Colab notebook!

2. Make sure BigQuery API is Enabled

Open your Google Cloud Platform console. Make sure you are on the correct project (active project is shown beside ‘Google Cloud Platform’ on the top left). From the search bar at the top center of the page, search for BigQuery API to go to the BigQuery API page.

Search for bigquery api from the cloud console.

If you see the option to 'DISABLE API', the BigQuery API is already enabled and you do not need to take any action.

On the other hand, if you see the option to 'ENABLE', the BigQuery API is not enabled on your project yet. Just click on ENABLE and wait a few moments until everything has loaded.

BigQuery API is not enabled on your project yet. Click on ENABLE to enable it.
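Alternatively, if you prefer the command line, the same API can be enabled with the gcloud CLI (a sketch, assuming gcloud is installed and authenticated; the project id is a placeholder):

```
# Enable the BigQuery API for your project (replace the placeholder id)
gcloud services enable bigquery.googleapis.com --project your-project-id
```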

3. Google Cloud Platform authentication
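In your new notebook, the first step is to run an authentication cell. Using Colab's built-in google.colab.auth helper, it looks like this (this only works inside a Colab runtime):

```
# Authenticate this notebook against your Google account.
from google.colab import auth
auth.authenticate_user()
```

You can run this cell with any of the shortcuts listed below.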

There are three ways to run a cell:

1. control + enter OR command + enter OR clicking on the arrow to the left of the cell runs the cell only.

2. shift + enter runs the cell and moves the cursor to the next cell, if there is one. If there isn't a next cell, it creates a new cell and moves the cursor there.

3. option + enter runs the cell, inserts one cell directly below it, and moves the cursor to that new cell.

Soon you should see a prompt like the one below:

Click on the link to open the authentication page.

Click on the link to open the authentication page, which should look like the screenshot below. Your signed-in Google accounts will show up on that page too. Click on the account associated with your Google Cloud Platform project.

Your Google accounts will show up here. Click on the one associated with your Google Cloud Platform project, if you have multiple Google accounts.

The next page will inform you of the permissions that will be granted to the notebook. Click on Allow once you’ve viewed them.

Check the permissions that will be granted to the Google Cloud SDK.

Finally you’ll arrive at this page below where there is a string of characters. Click on the copy icon to the right, then navigate back to your notebook.

Copy the code (hidden here) and return to your notebook.

Paste the code into the 'Enter verification code:' field and hit Enter. You're now authenticated!

Authentication success!

4. Run queries and create dataframes with magics

The easiest way to create dataframes from BigQuery queries is via magics, which, as the name suggests, are magic! Using the example from the documentation page:

Start your cell with '%%bigquery', change 'yourprojectid' to your project's id, and 'df' to the name of the variable you want to contain your dataframe. Enter your query, run the cell, and you have your dataframe! Really simple, not much to explain here.
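For reference, such a cell might look like the following (the project id is a placeholder, and the query uses one of BigQuery's public datasets; in some environments you may first need to load the extension with '%load_ext google.cloud.bigquery'):

```
%%bigquery df --project yourprojectid
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY total DESC
LIMIT 10
```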

Personally I needed more parameters than that, such as being able to toggle between legacy and standard SQL, and showing job details. Some of that is available in magics too, just by adding arguments to the command. To use legacy SQL, add the flag '--use_legacy_sql'. By default, the job id and running time are cleared when the job is complete; to keep both visible, add the flag '--verbose':
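Putting those flags together, a cell might look like this (again with a placeholder project id; note that legacy SQL uses the older [project:dataset.table] table syntax):

```
%%bigquery df --project yourprojectid --use_legacy_sql --verbose
SELECT word, word_count
FROM [bigquery-public-data:samples.shakespeare]
LIMIT 5
```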

5. Get job details via Google Cloud Python Client for BigQuery

Sometimes you might want more details about the job, such as the amount of data processed and billed. For the full range of details related to the job you’ll need the BigQuery Python Client. Here’s a code snippet you can use to get the amount of data processed and billed for a specified job.
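The snippet would be along these lines with the google-cloud-bigquery client library (a sketch; the project id and job id are placeholders, and this assumes you have already authenticated as in step 3):

```
from google.cloud import bigquery

# Create a client for your project (reuses the credentials
# from the authentication step above).
client = bigquery.Client(project="yourprojectid")

# Look up a completed query job by its id.
job = client.get_job("your-job-id")

# Bytes scanned vs. bytes you are actually billed for.
print("Processed: {} bytes".format(job.total_bytes_processed))
print("Billed:    {} bytes".format(job.total_bytes_billed))
```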

If you are unsure of your project id, here are two ways to find it:

1. From the cloud console (https://console.cloud.google.com/), at the top left, to the right of 'Google Cloud Platform', you'll see the active project's name. Click on it and a pop-up will appear. Within that pop-up you'll find a list of project names (possibly just one) with their corresponding project ids. Select the id of the project you are using BigQuery with.

2. From the BigQuery UI (https://console.cloud.google.com/bigquery), in the left sidebar, below the 'Search for your tables and datasets' search bar, your project ids are listed (if you are in the new UI). If you are in the old UI, the project name appears below the search bar; hover over it and you'll see your project id.