Pandas and Bandwidth in Python Tutorial: Plotting the Results of Internet Speed Tests

Prologue: We all hate the cable company.

For the past few months we’ve had to reboot the cable modem every week or so. Nerd that I am, I went out and bought a new cable modem. I hooked it up like a champ, and then, BANG! It got much worse. It was slower, plus now the modem needed rebooting several times a day.

Since my wife was communicating mostly in spit and dirty looks, I called the cable company. They sent out a guy, and we (of course I hung around, “helping”) discovered that when I gave up cable TV, they put a gadget on my line that was hilariously incompatible with the new modem. The cable guy fixed everything, and I nearly fainted at how much faster it was.

I randomly sampled the bandwidth by bringing up a browser and going to a speed test page, but then I realized hey! I’m a programmer! I can automate that task. What follows is How I Did It.

The Task

My goal was to set up some scripts to run the speed test on a regular basis (every 30 minutes), log the data, and make a graph of the data available on-demand via a web browser.

The Python Toolbox

The primary tool I used is Python. It’s a powerful, but easy-to-use, language with plenty of modules available for data manipulation. In these examples, I’m using Python 2, but the Python 3 version would be very similar.

Python Modules

The following modules from the standard library:

os launch an external program

logging create a log file and add log entries

These modules from PyPI, the Python Package Index (https://pypi.python.org/pypi), which is an online repository of over 50,000 modules available for Python programmers:

pandas data analysis library

matplotlib data plotting library

web.py web development library

While these extra modules can be installed individually from PyPI (I had already installed the Anaconda bundle, which adds about 200 additional modules to the standard library.) The Anaconda modules are selected to be of use to scientists, statisticians, and engineers. See more at https://store.continuum.io/cshop/anaconda/.

speedtest-cli

speedtest-cli is a command-line tool (written in Python, although that was not why I chose it) for running Internet speed tests from the command line.

Logging The Data

To log the data, I needed to run speedtest-cli and capture its output, then write the data to a file. The output when using the –simple option consists of 3 lines:

Ping: 121.955 ms
Download: 11.77 Mbits/s
Upload: 1.07 Mbits/s

While the fancier subprocess module is suggested for running external programs, I used the simple popen() function in the os module. It opens an external program on a pipe, so you can read its standard output in the same way you’d read a text file.

To parse the data, I just used a string function to check the beginning of the line for each label, and split the line into fields, converting the middle field into a floating point number.

To write the data to a file, I used Python’s built-in logging module, because it’s simple to use and I could use a template string to automatically create custom-formatted timestamps. I added the script to crontab, which runs it twice an hour.

Parsing the Log File in Python

At this point I had a nice log file. The next step is to parse the file and plot the data.

Pandas is a really fantastic tool for parsing data. It can read data from many different sources, including flat files, CSV, SQL databases, HTML Tables and HDF5 files. It reads the data into a DataFrame, which is a 2-dimensional array something like a spreadsheet. You can select data by row and column, and there are plenty of built-in functions for indexing and analyzing the data. Reading my log file into a dataframe takes just one function call, then I can use Python “slice” notation to grab the last 48 entries, to give me 24 hours of data.

One of the things I really love about Pandas is the read_csv() function. It lets you tweak every aspect of reading in data from a file. Seriously, it has 54 possible parameters! Fortunately, you only have to specify the ones you need.

In my case, I told read_csv that to use one or more spaces, that there is no header line, to use a specified list of column names, to parse the first two columns into a single timestamp, and if a line has ‘TEST FAILED”, to set those columns to NaN (not a number).

Plotting the Data in Python

The primary tool for plotting data in the Python world is the matplotlib module. It has an object-oriented API that lets you control every possible aspect of the plot. The basic object is a figure, which is a single image. The figure can contain one or more axes, which are the coordinates for plotting. Each pair of axes can plot one or more datasets. Visit http://matplotlib.org/gallery.html to see matplotlib’s capabilities.

For simple plots, there is a MATLAB-like interface that uses default objects “behind the scenes.” This interface is implemented by the matplotlib.pyplot function, which is usually aliased to just plt.

Creating a plot is as simple as creating two arrays, one for x values and one for y values, then calling matplotlib.pyplot.plot(xvalues, yvalues). The fun comes in tweaking the endless variations of color, font, shapes, position, labels, ticks, and everything else that makes up a plot.

I only wanted to plot download speed against the timestamps, so I used the column headings from my Pandas dataframe to select just that data:

plt.plot(last_24['timestamp'],last_24['download'], 'b-')

This results in the following figure:

There are so many things wrong with this plot. It needs a title and labels for the X and Y axes. The labels on the X ticks are overlapping. Most important of all, it gives the impression that my internet speed drops to almost nothing, when really it’s only dropping a few Mbps. Overall, I changed the following:

Added a title

Added labels for the axes

Changed the Y axis scale to be 0-20Mbps.

Rotated the X axis tick labels to be at an angle

Added a background grid

There is no end to the time you can spend tweaking and tuning the appearance of the data. I had to stop myself so that I could get this post in only a little past deadline. I really wanted to add an annotation showing the slowest speed on the graph, but my programmer OCD kicked in, and I couldn’t get it to look just right. Adding basic annotations is pretty easy, but my labels were running off the figure if the lowest point was on the far left or far right.

Serving Up The Data

The graph looks good, so what’s left is to serve the graph image up on a web server so that I can access it any time. There are many web application frameworks for Python. One of the simplest, yet most powerful is web.py (http://webpy.org/). You can render HTML pages or data very easily, and it has a built-in HTTP server.

I chose to just serve the saved graph image directly, so no template is needed for this application.

URL mapping consists of a tuple (or any array-like object) containing (pattern, class, …). Repeat the pattern-class pair for each URL in your application. I have only one, but you can have as many as you need. The pattern is a regular expression that matches the actual URL. Any part of the URL that needs to be passed to the handler is marked as a group. This allows you to do things like:

To show the graph, I just hard-coded the URL ‘/bandwidth’. If I was more ambitious I could have added a fancier URL that lets you pick the starting date and time, such as ‘/bandwidth/2015/02/01/1000′:

(
'/bandwidth/(\d{4}/\d{2}/\d{2}/\d{4}/', 'bandwidth_by_date',
)

The request handler is a class that contains a GET method. If the URL contains any groups, they are passed to the GET method as parameters. The GET method returns the content. It defaults to returning HTML, but you can return any kind of data, by adding an HTTP header. In this case, I’m return a PNG file, so I generated the appropriate header. To return the image, I just read the image into memory and returned it as a string of bytes.

The rest of the app is boilerplate for starting up the built-in web server. For production, web.py apps can be integrated with apache, nginx, gunicorn, or other web servers.

Conclusion

Python is my go-to language for quickly analyzing and displaying data. It takes very little code to create a professional app, and the code remains readable (and thus maintainable) as the app scales to handle more data and more features.

hi
thanks for sharing this great info , i really was using other bad tools to test my internet speed but with your great tools i was amazed , they are very easy to use and very accurate
thanks again for this great article !

By HotCobra March 20, 2016 - 7:55 pm

Great article. However, I’m unable to get it to create the actual PNG on my Raspberry PI 2 (no monitor) and I’m doing this from Putty. Although I added the matplotlib.use(‘AGG’) param, I get
“AttributeError: ‘AutoLocator’ object has no attribute ‘maxticks'”