Data Scientist in Van Wyck, SC | Taking it one day at a time

Analyze Your Google Location History: Time & Distance

In my last post, I did an exploratory data analysis of my Google location
history.

One of things that I wanted to calculate and measure is the difference between
different coordinates. For example, how far is it between
48.1265044, 11.6593258 and 40.555608, -105.098397? It turns out that
answering that question is a bit harder than I originally guessed. As you know
the earth isn’t flat (some people would have you believe otherwise) so measuring
the distance between two points when using latitude and longitude will have to
be done using the Haversine Formula.

I’m not going to pretend to understand the trigonometry that goes into
calculating this, so the best way that I can explain it is measuring
“as the crow flies.”

While this is good for calculating a fixed point, I want to calculate the
distance between each previous point. For this, I’ll use LAG() and OVER()
and then rewrite the
Haversine Formula to
properly calculate the distance (I think). I also changed 111.045 to 69.0 to
convert kilometers to miles.

Here’s my example:

Since there are 1,770,882 observations, I have no freaking clue how long it
would take R to run through something like this, with BigQuery, the query
is done in 13.3 seconds. That’s blazing fast!

Now that we have calculated distances between observations, let’s import it into
R and do some analysis.

How many miles did I travel between 2011 and 2018?

654,572.32 miles. Is that even possible?

How many miles did I travel by type?

activityType

distance

n

NA

333929.56

1065401

still

213682.44

546932

in vehicle

55706.42

33148

tilting

32686.28

65019

unknown

10505.78

35268

on foot

6513.16

15490

on bicycle

1268.23

866

exiting vehicle

280.45

963

Judging by the distance and the count of observations lets dive
deeper into NA and still which comprise the largest number of observations.

Originally I thought the discrepancy could be from the GPS in my phone trying
to locate itself by triangulation. It could be that Google’s algorithm wasn’t
able to properly figure out what I was doing. Here’s an example from one night
during one hour:

activityType

n

distance

in vehicle

6

6.25

tilting

5

1.38

NA

16

0.94

unknown

8

0.46

still

16

0.19

on foot

2

0.05

Other questions:

What was the most popular activity by hour of day?

How did my movements change over time?

What days of the week were most popular for activities?

Time is of the essence

I’m also intrigued by the idea of “time” with all of this information.
I think that it’s fascinating that Google is able to harness this information
to do things like:

Tell you when to leave for work and arrive on time based on traffic patterns

Tell you the most popular visited times for restaurants.

Tell you your average commute times at any given time.

Suggest what you should listen to based on your current location.

Additionally, I’ve used some simple CASE statements to augment the provided
information to give us seasons based on dates and part of day based on hour.

Since BigQuery only gives us time in UTC format, I’ll need to do some
additional work to clean up timestamps so they’re in the proper timezone.
I’m working on a way of doing this in R with googleway
which uses multiple Google maps API’s.