Becoming a Better Runner using Data

This summer I ran a lot, almost every day. It was great to get my blood pumping first thing in the morning and feel energized for the rest of the day. Toward the end of the summer, however, I started to suspect that my form might be unhealthy: depending on the run, my right knee would sometimes be more sore than my left.

So, at the beginning of the fall semester, much to my disappointment, I stopped running until I could figure out what was causing the pain.

My doctor recently confirmed that my right leg is longer than my left. This is not abnormal, but combined with my flat feet, it was most likely what was causing the pain. If I want to run again, I must be especially conscious of my form. Good form will minimize the potentially damaging impact on my knee and help ensure I don’t feel that same pain. That raises the question: what factors lead to a deterioration in my form?

That brings me to this post. I almost always log my runs with Runkeeper. It’s a great app that provides a lot of statistics. It also lets you export the raw data. This is an absolute gold mine for data enthusiasts like myself.

With my renewed motivation to start running again and all this data, I decided to take a deeper look at how I was running before I stopped. Below, I’ll show how I tried to determine what might be affecting my form and what changes I can make to maintain healthier form.

Notice that I import a module called custom_utils. This post is part of a larger project to keep track of my health, and I found myself reusing the same functions often. The file can be found here.

Here I’m just loading the keys I’ll need to use the APIs later.

import json

# load the API keys used later in the post
with open('keys.json', 'r') as f:
    keys = json.load(f)

Runkeeper Data

Simply go to https://runkeeper.com/exportData to get a copy of your data. I downloaded the “activity data”. It includes a summary .csv along with .gpx files for each run. I’ll only be using the summary file.
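Turning the summary file into a DataFrame with the `dow` and `avg_pace_secs` columns used below might look something like this. It is a sketch under assumptions: the column names (`Date`, `Type`, `Average Pace`), the `Running` activity label and the `mm:ss` pace format are guesses at the export’s layout, so check them against the header of your own file. `prepare_runs`, `load_runs` and `pace_to_secs` are hypothetical helpers, not code from this post.

```python
import pandas as pd

# ASSUMED column names and activity label; verify against your own export.
DATE_COL = "Date"
TYPE_COL = "Type"
PACE_COL = "Average Pace"  # assumed to be an "mm:ss" string
RUN_LABEL = "Running"


def pace_to_secs(pace):
    """Convert an 'mm:ss' pace string to seconds; return None if it doesn't parse."""
    try:
        minutes, seconds = str(pace).split(":")
        return int(minutes) * 60 + int(seconds)
    except ValueError:
        return None


def prepare_runs(df):
    """Keep only runs and add 'avg_pace_secs' and 'dow' columns (hypothetical helper)."""
    runs = df[df[TYPE_COL] == RUN_LABEL].copy()
    runs[DATE_COL] = pd.to_datetime(runs[DATE_COL])
    runs["avg_pace_secs"] = runs[PACE_COL].map(pace_to_secs)
    # pandas convention: Monday=0 ... Sunday=6
    runs["dow"] = runs[DATE_COL].dt.dayofweek
    return runs


def load_runs(path):
    """Read the summary CSV from disk and prepare it."""
    return prepare_runs(pd.read_csv(path))
```

The `dow` column uses pandas’ Monday=0 convention; the `dow_to_str` mapping used in the plotting code isn’t shown, so its convention may differ.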

# runkeeper_runs (with 'dow' and 'avg_pace_secs' columns) and dow_to_str
# are assumed to have been built earlier from the summary CSV
import plotly.graph_objs as go
from plotly import tools
from plotly.offline import iplot

days = sorted(runkeeper_runs['dow'].unique())

data = []
for dow in days:
    # count how many runs had each average pace (in seconds)
    pace_vals = runkeeper_runs[runkeeper_runs['dow'] == dow]['avg_pace_secs'].value_counts()
    data.append(go.Bar(x=pace_vals.index / 60,  # seconds -> minutes
                       y=pace_vals.values,
                       name=str(dow)))

# one panel per day of the week in a 4x2 grid (the last cell stays empty)
fig = tools.make_subplots(rows=4, cols=2,
                          subplot_titles=[dow_to_str[day] for day in days])
for i, trace in enumerate(data):
    fig.append_trace(trace, i // 2 + 1, i % 2 + 1)

# zoom every panel in on 7-10 minute paces and 0-3 runs
for i in range(len(days)):
    fig['layout']['xaxis' + str(i + 1)].update(range=[7, 10])
    fig['layout']['yaxis' + str(i + 1)].update(range=[0, 3])

fig.layout.update(height=1000, title='Pace Distribution by Day of Week')
iplot(fig, filename='dow pace')
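The histograms are easier to compare alongside a small numeric summary. A minimal sketch, assuming `runkeeper_runs` has the `dow` and `avg_pace_secs` columns used above; `pace_summary_by_dow` is a hypothetical helper, and summarizing by count, mean and standard deviation is my choice rather than part of the original analysis.

```python
import pandas as pd


def pace_summary_by_dow(runs):
    """Count, mean and standard deviation of average pace (minutes) per day of week."""
    pace_mins = runs["avg_pace_secs"] / 60
    summary = pace_mins.groupby(runs["dow"]).agg(["count", "mean", "std"])
    return summary.round(2)


# usage: pace_summary_by_dow(runkeeper_runs)
```

With only a few runs on some days, the standard deviation is noisy (and undefined for a day with a single run), so small differences between days shouldn’t be read as meaningful.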

Observations

Honestly, my paces are a lot less sporadic than I thought they would be. It seems my sweet spot for distance is between two and three miles. Also, I had no idea that I’ve never run in the afternoon. I thought I would’ve done it at least once in the past two years.

Beyond that interesting finding, none of the factors I looked at seems to have much effect on my average pace. To dig deeper, we can examine my speed throughout each run.

Intra-run Data

The .gpx files exported by Runkeeper record latitude, longitude, elevation, and a timestamp for each point along the run. From the coordinates and timestamps, we can compute speed.
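One way to do that, sketched with only the standard library. It assumes the track points are `trkpt` elements with `lat`/`lon` attributes and `time` (plus optional `ele`) children, as in the GPX schema; the helper names are hypothetical. The haversine formula treats the Earth as a sphere, which is accurate enough for the short gaps between consecutive GPS fixes.

```python
import math
import xml.etree.ElementTree as ET
from datetime import datetime

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)


def _local(tag):
    """Strip an XML namespace: '{ns}trkpt' -> 'trkpt'."""
    return tag.rsplit("}", 1)[-1]


def parse_gpx_points(gpx_text):
    """Return (timestamp, lat, lon, elevation) tuples for every <trkpt>."""
    root = ET.fromstring(gpx_text)
    points = []
    for el in root.iter():
        if _local(el.tag) != "trkpt":
            continue
        fields = {_local(child.tag): child.text for child in el}
        # assumes every point has a <time>; 'Z' isn't parsed by older fromisoformat
        ts = datetime.fromisoformat(fields["time"].replace("Z", "+00:00"))
        ele = float(fields["ele"]) if fields.get("ele") else None
        points.append((ts, float(el.get("lat")), float(el.get("lon")), ele))
    return points


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def segment_speeds(points):
    """Speed (m/s) between consecutive points, tagged with the later timestamp."""
    speeds = []
    for (t0, la0, lo0, _), (t1, la1, lo1, _) in zip(points, points[1:]):
        dt = (t1 - t0).total_seconds()
        if dt > 0:  # skip duplicate or out-of-order timestamps
            speeds.append((t1, haversine_m(la0, lo0, la1, lo1) / dt))
    return speeds
```

GPS fixes are noisy, so per-segment speeds jump around; averaging over a handful of neighboring points would likely give a smoother curve.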

Using the coordinates, we can also retrieve the location of each run. I’ll use a modified version of a function I got from this post.
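That function isn’t reproduced here. As a rough illustration of the general idea, the sketch below reverse-geocodes a coordinate with Google’s Geocoding API. The choice of service is an assumption (the post doesn’t say which one the function uses), the helper names are hypothetical, and the API key would come from the `keys` dictionary loaded earlier under whatever name you stored it.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: Google's Geocoding API; other reverse geocoders work similarly.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def reverse_geocode_url(lat, lon, api_key):
    """Build a reverse-geocoding request URL for one coordinate."""
    query = urllib.parse.urlencode({"latlng": f"{lat},{lon}", "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def locality_from_response(response):
    """Return the first 'locality' (city/town) name in a parsed response, or None."""
    if response.get("status") != "OK":
        return None
    for result in response.get("results", []):
        for comp in result.get("address_components", []):
            if "locality" in comp.get("types", []):
                return comp.get("long_name")
    return None


def run_location(lat, lon, api_key):
    """Hypothetical helper: look up the town for one coordinate (needs network access)."""
    with urllib.request.urlopen(reverse_geocode_url(lat, lon, api_key)) as resp:
        return locality_from_response(json.load(resp))
```

Passing each run’s first track point to `run_location` is one simple way to label runs by town; the network call is kept separate from the response parsing so the parsing can be checked offline.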