Downloading BigQuery data to pandas using the BigQuery Storage API

The BigQuery Storage API provides fast access to data stored in
BigQuery. Use the BigQuery Storage API to
download data stored in BigQuery for use in analytics
tools such as the pandas library for
Python.

Objectives

In this tutorial you:

Download query results to a pandas DataFrame by using the
BigQuery Storage API from the IPython magics for BigQuery in
a Jupyter notebook.

Download query results to a pandas DataFrame by using the
BigQuery client library for Python.

Download BigQuery table data to a pandas DataFrame by using the
BigQuery client library for Python.

Download BigQuery table data to a pandas DataFrame by using the
BigQuery Storage API client library for Python.
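As a sketch of the first objective: in a Jupyter notebook, the BigQuery cell magic can download query results with the BigQuery Storage API. This assumes the google-cloud-bigquery package is installed and the magics are loaded with %load_ext google.cloud.bigquery; the query and the DataFrame name df are illustrative.

```
%%bigquery df --use_bqstorage_api
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name
ORDER BY total DESC
LIMIT 10
```

After the cell runs, the results are available in the notebook as the pandas DataFrame df.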

Costs

BigQuery is a paid product and you will incur
BigQuery usage costs for the queries you run. The first 1 TB of
query data processed per month is free. For more information, see the
BigQuery Pricing page.

The BigQuery Storage API is also a paid product, and you will incur usage costs for the
table data you scan when downloading a DataFrame. For more information, see the
BigQuery Pricing page.

Before you begin

Before you begin this tutorial, use the Google Cloud Platform Console to create or select
a project and enable billing.

Note: If you don't plan to keep the
resources you create in this tutorial, create a new project instead of
selecting an existing project. After you finish, you can delete the
project, removing all resources associated with the project and
tutorial.
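The code examples in this tutorial assume two client objects, bqclient and bqstorageclient. A minimal sketch of creating them with shared credentials follows; it assumes the google-cloud-bigquery and google-cloud-bigquery-storage packages are installed and that Application Default Credentials are available (for example, via gcloud auth application-default login).

```python
import google.auth
from google.cloud import bigquery
from google.cloud import bigquery_storage_v1beta1

# Explicitly create a credentials object so that both clients share the
# same credentials and project.
credentials, your_project_id = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)

# Make clients.
bqclient = bigquery.Client(
    credentials=credentials,
    project=your_project_id,
)
bqstorageclient = bigquery_storage_v1beta1.BigQueryStorageClient(
    credentials=credentials
)
```

Sharing one credentials object avoids fetching credentials twice and keeps both clients pointed at the same project.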

Download query results using the BigQuery client library

# Download query results.
query_string = """
SELECT
  CONCAT(
    'https://stackoverflow.com/questions/',
    CAST(id AS STRING)) AS url,
  view_count
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE tags LIKE '%google-bigquery%'
ORDER BY view_count DESC
"""

dataframe = (
    bqclient.query(query_string)
    .result()
    # Note: The BigQuery Storage API cannot be used to download small query
    # results, but as of google-cloud-bigquery version 1.11.1, the
    # to_dataframe method falls back to the tabledata.list API when the
    # BigQuery Storage API fails to read the query results.
    .to_dataframe(bqstorage_client=bqstorageclient)
)
print(dataframe.head())

Download table data using the BigQuery client library

Download all rows in a table by using the list_rows method, which returns a
RowIterator object. Download rows by using the BigQuery Storage API by calling
the to_dataframe method with the bqstorage_client argument.
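A minimal sketch of this pattern follows. It reuses the bqclient and bqstorageclient objects created earlier; the public table and the column names are illustrative, so substitute any table you have access to.

```python
from google.cloud import bigquery

# Download rows from a table. Selecting only the needed columns reduces
# the amount of table data scanned.
table = bigquery.TableReference.from_string(
    "bigquery-public-data.utility_us.country_code_iso"
)
rows = bqclient.list_rows(
    table,
    selected_fields=[
        bigquery.SchemaField("country_name", "STRING"),
        bigquery.SchemaField("fips_code", "STRING"),
    ],
)
dataframe = rows.to_dataframe(bqstorage_client=bqstorageclient)
print(dataframe.head())
```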

Download table data using the BigQuery Storage API client library

Use the BigQuery Storage API client library directly for fine-grained control
over filters and parallelism. When only simple row filters are needed, a
BigQuery Storage API read session may be used in place of a query.

If there are any streams on the session, begin reading rows from one of them by
using the read_rows method. Call the to_dataframe method on the reader to write
the entire stream to a pandas DataFrame. For better performance, read from
multiple streams in parallel, but this code example reads from only a single
stream for simplicity.
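A sketch of the whole flow with the v1beta1 Storage API client follows. It assumes the bqstorageclient and your_project_id objects created earlier; the public table, the selected columns, and the row restriction are illustrative.

```python
from google.cloud import bigquery_storage_v1beta1

table_ref = bigquery_storage_v1beta1.types.TableReference()
table_ref.project_id = "bigquery-public-data"
table_ref.dataset_id = "new_york_trees"
table_ref.table_id = "tree_species"

# Select columns and filter rows with simple read options instead of a query.
read_options = bigquery_storage_v1beta1.types.TableReadOptions()
read_options.selected_fields.append("species_common_name")
read_options.selected_fields.append("fall_color")
read_options.row_restriction = 'fall_color = "red"'

parent = "projects/{}".format(your_project_id)
session = bqstorageclient.create_read_session(
    table_ref,
    parent,
    read_options=read_options,
    requested_streams=1,
)

# An empty table has no streams on the session.
if session.streams:
    # This example reads from only a single stream; read from multiple
    # streams in parallel for better performance.
    stream = session.streams[0]
    position = bigquery_storage_v1beta1.types.StreamPosition(stream=stream)
    reader = bqstorageclient.read_rows(position)
    dataframe = reader.to_dataframe(session)
    print(dataframe.head())
```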

Cleaning up

To avoid incurring charges to your Google Cloud Platform account for
the resources used in this tutorial:

Delete your project. You didn't create any BigQuery resources in
this tutorial, but deleting your project removes any other resources that you
created.
Caution: Deleting a project has the following effects:

Everything in the project is deleted. If you used an existing project for
this tutorial, when you delete it, you also delete any other work you've done in the project.

Custom project IDs are lost.
When you created this project, you might have created a custom project ID that you want to use in
the future. To preserve the URLs that use the project ID, such as an appspot.com
URL, delete selected resources inside the project instead of deleting the whole project.

If you plan to explore multiple tutorials and quickstarts, reusing projects can help you avoid
exceeding project quota limits.