Plot or Not? Voting Results

During the SciPy conference last June, Adrian Price-Whelan and I got to commiserating about the ugly default styles of Matplotlib plots — something that also came up at the Matplotlib town-hall meeting. Because Python’s main plotting library was designed as an alternative to Matlab, it inherited much of its appearance and API from that language. This was originally an asset for Matplotlib, as it provided an obvious path for users to migrate away from Matlab. Unfortunately, as Matplotlib has matured, its core visual and programmatic design has calcified. People who want to create nice plots in Python are forced to write lots of tweaking code.

Users who push for changes to the default appearance of plots usually face several challenges, including:

– While most agree that Matplotlib would benefit from a better style, there is less consensus on what a replacement style should look like.

– Some are skeptical about the subjectivity of plot aesthetics, and think that “improvements” to plot styles are really just pleasing some users at the cost of others.

As Adrian and I discussed this, we wondered what it would take to integrate substantial stylistic changes into Matplotlib itself. We realized there’s very little data on what kind of Matplotlib plots people actually like. So we decided to collect some. During the SciPy sprints, we put together Plot or Not?, which randomly showed visitors the same Matplotlib plot rendered with two different styles, and asked which one they preferred. People liked it: the site crashed (I apparently don’t know how to make websites with actual traffic), we acquired about 14,000 votes, somebody suggested the name was misogynous, we triggered a good discussion on the Matplotlib developer mailing list, and we promised to share the voting results soon. Then I remembered I had to finish my thesis, and the data sat on a server somewhere for 6 months.

As luck would have it, I found some time to dig into the votes this weekend. You can explore the results at this page, which shows a scatter plot of each Plot or Not image as a function of the fraction of votes it received (X axis) and the margin of victory/defeat (Y axis). Clicking on a point will show you the voting breakdown for a given face-off.
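The post doesn't spell out how the two axes are computed, so here is a minimal sketch under stated assumptions: each vote is logged as a hypothetical `(left_image, right_image, chosen_image)` tuple, the X value is the share of a face-off's votes an image won, and the Y value is taken to be votes won minus votes lost. Neither the log format nor that margin definition comes from the Plot or Not? code, and the image names are made up.

```python
from collections import Counter

# Hypothetical vote log: (left_image, right_image, chosen_image).
# This format is an assumption; the post does not describe how votes were stored.
vote_log = [
    ("cam_style.png", "default.png", "cam_style.png"),
    ("default.png", "cam_style.png", "cam_style.png"),
    ("cam_style.png", "default.png", "cam_style.png"),
    ("default.png", "cam_style.png", "default.png"),
    ("cam_style.png", "default.png", "cam_style.png"),
    ("default.png", "cam_style.png", "cam_style.png"),
    ("set1.png", "pastel1.png", "set1.png"),
    ("pastel1.png", "set1.png", "set1.png"),
]


def tally(log):
    """Count wins keyed by (unordered pair of images, chosen image)."""
    wins = Counter()
    for left, right, chosen in log:
        if chosen not in (left, right):
            raise ValueError(f"{chosen!r} was not one of the two images shown")
        # frozenset makes the face-off key independent of left/right placement.
        wins[(frozenset((left, right)), chosen)] += 1
    return wins


def face_off_stats(wins, image, opponent):
    """Return (fraction of votes won, assumed margin = won - lost) for `image`."""
    pair = frozenset((image, opponent))
    won = wins[(pair, image)]
    lost = wins[(pair, opponent)]
    if won + lost == 0:
        raise ValueError("no votes recorded for this face-off")
    return won / (won + lost), won - lost


wins = tally(vote_log)
print(face_off_stats(wins, "cam_style.png", "default.png"))
```

In the sample log one style wins five of six votes, a 5:1 result, which gives a fraction of about 0.83 and a margin of +4; asking from the other image's side gives about 0.17 and -4, so each face-off contributes a mirrored pair of points.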

The dataset, it turns out, has a lot of interesting information about what kinds of plots people like. You should explore for yourself, but here are some of the biggest themes I noticed:

People largely share the same aesthetic preferences

Yes, aesthetics have a large subjective component. However, most people agreed on which plots they preferred. This suggests that there are stylistic changes one could make to Matplotlib that would be a net improvement, despite the subjectivity.

Legibility is the most important factor

If you look at the most lopsided face-offs, many of them pit a plot with easily seen lines against one whose lines are too thin, too transparent, or too light.

Cam Davidson-Pilon’s Style is Very Good

For some of the plots we generated, we used the settings Cam Davidson-Pilon used in his online book. These were consistently selected as the favorite, often by margins of 5:1 or more.

We also used the style from Huy Nguyen’s blog post, which emulates GGPlot — it’s very similar, though it uses a thinner font and linewidth. People slightly preferred Cam’s style in head-to-head comparisons — probably because the lines are easier to see.

People like the dark Color Brewer colors (but not the pastel ones)

Many of the plots in Plot or Not? used colors from colorbrewer2.org. People liked line plots and histograms that used the Set1 and Dark2 color palettes. Likewise, people often preferred filled contour plots that used the diverging Color Brewer palettes.
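Matplotlib ships the qualitative Color Brewer palettes, including Set1 and Dark2, as listed colormaps, so one way to try them is to pull out their discrete colors and install them as an axes' color cycle. This is a minimal sketch, assuming a Matplotlib recent enough to have `Axes.set_prop_cycle` (1.5 or later) and the `cycler` package it depends on; the sine curves are placeholder data.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

# Dark2 is a ListedColormap, so .colors holds its discrete RGB tuples.
dark2 = plt.get_cmap("Dark2").colors

x = np.linspace(0, 2 * np.pi, 200)
fig, ax = plt.subplots()
# Each new line drawn on this axes takes the next Dark2 color in turn.
ax.set_prop_cycle(cycler(color=dark2))
for phase in range(4):
    ax.plot(x, np.sin(x + phase), linewidth=2)
# Call fig.savefig(...) to write the figure out.
```

Swapping `"Dark2"` for `"Set1"` gives the other palette people favored.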

However, people did not like plots drawn with pastel Color Brewer tables (Pastel, Accent, Paired2, Paired3). These are both harder to see and feel a bit… “Easter-y” (this is a highly scientific adjective). Unpopular colors for contour plots included Accent, Prism, HSV, and gist_stern. All of these palettes cycle through several hues. It is hard to encode scale with hue, and people preferred palettes restricted to one or two hues. In fairness, some of these multi-hue palettes would have looked better on images that encode more than ~5 values at a time. Still, the advice from visualization experts seems to be to stick to one- or two-hue colormaps. The two-hue maps are best suited to cases where you want to call attention to outliers with both large and small values.
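To illustrate the one- versus two-hue advice, the sketch below draws a nonnegative field with the single-hue `Blues` map and a signed field with the two-hue diverging `RdBu_r` map. Pinning the contour levels symmetrically about zero is a choice made here, not something the post prescribes, so that the pale middle of the diverging map corresponds to zero and both the large positive and large negative lobes stand out. The two test fields are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

y, x = np.mgrid[-3:3:200j, -3:3:200j]
magnitude = np.exp(-(x**2 + y**2))   # nonnegative everywhere
signed = x * np.exp(-(x**2 + y**2))  # positive and negative lobes

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 3.5))

# Single-hue sequential map: darker blue simply means a larger value.
cs_left = left.contourf(x, y, magnitude, levels=10, cmap="Blues")

# Two-hue diverging map with levels symmetric about zero, so zero lands on
# the neutral middle color and both extremes get saturated hues.
limit = np.abs(signed).max()
levels = np.linspace(-limit, limit, 11)
cs_right = right.contourf(x, y, signed, levels=levels, cmap="RdBu_r")

fig.colorbar(cs_left, ax=left)
fig.colorbar(cs_right, ax=right)
```

Feeding `signed` to a multi-hue map like `hsv` instead makes it much harder to read off which regions are large, small, or near zero.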

The default Matplotlib colors are almost never preferred

Unlike the Color Brewer colors — which are designed for legibility and coherence — the default Matplotlib color set is pretty arbitrary (blue, green, red, cyan, magenta, yellow, black). These colors don’t work well together, and it shows in the votes. In the few instances where a Matplotlib default was preferred, the other plot usually had hard-to-see lines.

An easy improvement

There are a lot of ways one could consider changing styles in Matplotlib. The votes from Plot or Not? suggest a few obvious improvements:

– Use the Set1 or Dark2 Color Brewer palettes for the default line style

– Use a single-hue colormap like ‘gray’ for the default color map.

– Increase the default linewidth from 1 to 2
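Taken together, the three suggestions amount to a handful of rcParams. This is a minimal sketch, assuming a current Matplotlib: key names have changed since this post (the color cycle was `axes.color_cycle` in 1.4-era releases and is `axes.prop_cycle` now), and passing a dict to `plt.style.context` keeps the change scoped to the `with` block. The variable name `plot_or_not_style` is just illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
from cycler import cycler

# The three suggestions above, expressed as rcParams (current key names).
plot_or_not_style = {
    "axes.prop_cycle": cycler(color=plt.get_cmap("Set1").colors),  # Set1 lines
    "image.cmap": "gray",      # single-hue default colormap
    "lines.linewidth": 2.0,    # up from the default of 1 at the time
}

# Settings apply only inside the with-block; rcParams are restored afterwards.
with plt.style.context(plot_or_not_style):
    fig, (ax, ax_img) = plt.subplots(1, 2)
    ax.plot([0, 1, 2], [0, 1, 4])
    ax.plot([0, 1, 2], [4, 1, 0])
    im = ax_img.imshow(np.arange(16).reshape(4, 4))
```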

While the Matplotlib devs are still resistant to changing any defaults, there are some improvements that you will start to see in Matplotlib v1.4. This includes a “style.use” function which will let you easily select style sheets by name or filepath/url. For example, to use the style changes advocated for in this blog post, you could write

```python
from matplotlib.style import use
use('http://plotornot.chrisbeaumont.org/matplotlibrc')
```
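If you would rather not depend on a remote file, the same mechanism works with a local stylesheet, which is just a text file in matplotlibrc format. This is a minimal sketch: the file name `plotornot.mplstyle` and its contents are illustrative, meant to mirror the recommendations above rather than reproduce the hosted file; the hex codes are the first five Set1 colors, and the `axes.prop_cycle` syntax assumes Matplotlib 1.5 or later.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Stylesheet contents (illustrative). Hex colors omit the leading '#',
# because '#' starts a comment in matplotlibrc-format files.
STYLE_TEXT = """\
lines.linewidth : 2.0
image.cmap      : gray
axes.prop_cycle : cycler('color', ['e41a1c', '377eb8', '4daf4a', '984ea3', 'ff7f00'])
"""

# Write the hypothetical stylesheet to a scratch directory.
style_dir = tempfile.mkdtemp()
style_path = os.path.join(style_dir, "plotornot.mplstyle")
with open(style_path, "w") as fh:
    fh.write(STYLE_TEXT)

# style.use accepts a style name or a file path; this changes global rcParams.
plt.style.use(style_path)
```

Using `plt.style.context(style_path)` in a `with` statement applies the same sheet temporarily instead of globally.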

My hope is that Matplotlib will start to build some nice stylesheets that ship with the library. Eventually, I would also love to see a new option for my matplotlibrc file that specifies ‘default_style: latest’ — this would indicate that I am “opting-in” to whatever the Matplotlib developers deem to be the best default style. This style could then incrementally improve with each release, without breaking any legacy code.

In the meantime, the 6 months since SciPy have seen a lot of progress on viz libraries that either build on top of Matplotlib (seaborn, prettyplotlib, a ggplot clone, mpld3, glue) or offer alternatives to it (vincent, bokeh). I’m excited about all of these projects, but also hope that Matplotlib is able to keep evolving to stay modern (I haven’t talked at all about Matplotlib’s API, but I would love to see that improve as well). Matplotlib has solved a lot of problems and remains by far the most mature plotting library in Python. Even incremental improvements to Matplotlib can have a big effect on the Python community.