I’ve been using Google’s Timeline feature to track my locations since 2011
when it was called Google Latitude. When I first signed up, I thought it was an
interesting way to share your location with family and friends, but I pretty much
forgot about it until recently.

I decided that it might be a good time to take
the information that Google has been gathering for the past seven years and
start a new project to see what I could learn about myself.

The first issue that I ran into was the size of the GeoJSON file that Google
lets you export.

My exported file was around 400 MB once unzipped, and I didn't want to load
the entire file into R, so I thought this would be a great use case for
Google Cloud Platform.

I could easily store the file in Cloud Storage and then load the data into
BigQuery, Google's data warehouse, to analyze with R.

Writing SQL feels similar to using the dplyr package in R. I also get the added benefit of
offloading all of the processing to BigQuery, which I can use at no cost since I'm on
GCP's free tier.

Today I will share the bash script that I've written to automate extracting,
uploading, and storing the information.

Let's take a look at the script. Before you begin, be sure to create a folder
for the project and download your Takeout file into it.

The script uses jq to parse the JSON file and save it as a newline-delimited file, which is the format BigQuery requires for JSON imports.
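The conversion step can be sketched like this. The `locations` key and the sample record below are assumptions based on the usual layout of a Takeout location export, so check them against your own file:

```shell
# Stand-in for the real Takeout export (mine was ~400 MB);
# the top-level "locations" array matches the usual Takeout layout.
printf '%s' '{"locations":[{"timestampMs":"1"},{"timestampMs":"2"}]}' > "Location History.json"

# Convert the single JSON object into newline-delimited JSON:
# jq -c emits one compact record per line, the format BigQuery expects.
jq -c '.locations[]' "Location History.json" > locations.ndjson
```

With a real export, you would point jq at your downloaded file instead of the sample.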

It then uploads the file to Google Cloud Storage for long-term storage. This step uses the gsutil tool from the Google Cloud SDK to load the data into a storage bucket. The combined flag -mq keeps the process quiet (-q) and lets you upload multiple files in parallel (-m).
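A minimal sketch of the upload step, assuming a bucket named my-timeline-bucket already exists in your project:

```shell
# Copy the newline-delimited file into a Cloud Storage bucket.
# -m parallelizes across files; -q suppresses progress output.
# gs://my-timeline-bucket is a placeholder -- substitute your own bucket.
gsutil -mq cp locations.ndjson gs://my-timeline-bucket/
```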

Finally, the script loads the file from Cloud Storage into BigQuery for analysis. For bq, I'm using two flags: --source_format=NEWLINE_DELIMITED_JSON and --autodetect. The autodetect feature will attempt to infer the schema from the JSON file and set the field names for you, which I find super helpful, especially when you have a lot of key-value pairs. The other flag identifies the type of file you're loading: in this case, the newline-delimited file we created in step two.
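The load step might look something like this; the timeline dataset, locations table, and bucket path are all placeholders for your own names:

```shell
# Load the file from Cloud Storage into a BigQuery table,
# letting BigQuery infer the schema from the JSON keys (--autodetect)
# and telling it the input is newline-delimited JSON.
bq load --source_format=NEWLINE_DELIMITED_JSON --autodetect \
    timeline.locations gs://my-timeline-bucket/locations.ndjson
```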

This script is written for a Mac, but could easily be adapted for other
systems. It also uses the Google Cloud SDK, so I'm going to assume you already
have that set up locally.

In the next post, we’ll dive into the data and take an initial look at things through
an exploratory data analysis using R.