Plotting in Python – introduction

It is often said that a picture is worth a thousand words. This is especially true when it comes to data: charts make the properties of the data immediately apparent.

Let’s start with a simple chart, the bar graph: a chart with rectangular bars whose lengths are proportional to the values they represent. Bar graphs are often used to compare data that fits into categories (histograms, sometimes confused with bar graphs, are used for continuous data).

As an example I will use the World Heritage sites, specifically the list of the 10 countries with the highest number of sites.
The list is short (just 10 items) and can be represented in Python by a dictionary, also known as an associative array (basically a collection of unique key-value pairs):
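The listing itself is not reproduced here, so the following is only a minimal sketch of what such a dictionary could look like. The name `sites` and the counts are placeholders chosen for illustration, not official UNESCO figures:

```python
# Top 10 countries by number of World Heritage sites.
# NOTE: the counts are illustrative placeholders, not official figures.
sites = {
    'Italy': 49, 'China': 45, 'Spain': 44, 'France': 38, 'Germany': 38,
    'Mexico': 32, 'India': 30, 'United Kingdom': 28, 'Russia': 25,
    'United States': 21,
}

print(len(sites))       # number of countries in the list
print(sites['Italy'])   # look up the value stored under a key
```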

As you can see, the key is a string (the name of the country) and the value is the number of sites in that country.
Python dictionaries are not sorted by value (and before Python 3.7 they did not even guarantee a stable order when printed), so the items (countries on the x-axis and sites on the y-axis) first need to be sorted to get a better chart.
Note: the standard library module collections provides an OrderedDict data structure that could be used too, but it would be overkill for this simple example.
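Here is a minimal sketch of the sorting step and of a first, bare bar chart. It assumes that matplotlib is installed and imported under the alias `pypl`, as in the snippets further down. The dictionary `sites` holds illustrative placeholder counts, and the helper name `sorted_by_sites` is made up for this example:

```python
import matplotlib.pyplot as pypl

# illustrative placeholder counts, not official figures
sites = {'Mexico': 32, 'Italy': 49, 'Spain': 44, 'China': 45, 'India': 30,
         'France': 38, 'Germany': 38, 'Russia': 25, 'United Kingdom': 28,
         'United States': 21}


def sorted_by_sites(data):
    """Return (countries, counts) sorted by count, largest first."""
    items = sorted(data.items(), key=lambda kv: kv[1], reverse=True)
    countries = [country for country, _ in items]
    counts = [count for _, count in items]
    return countries, counts


countries, nSites = sorted_by_sites(sites)

# one bar per country, placed at x = 0, 1, 2, ...
bars = pypl.bar(range(len(nSites)), nSites)
```

A call to pypl.show() afterwards displays the chart; it is left out of the sketch so that it can also run without a display.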

Note that the bars are displayed separately (unlike in a histogram).
That’s nice, but the chart looks quite bare. Let’s add more detail, such as labels for the bars (which country each one stands for), some more space between the bars and the axes, and a grid:
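Here is a sketch of how these additions could look, assuming matplotlib is imported as `pypl` and the data is already sorted. The counts are illustrative placeholders, and the rotation angle, axis limits and grid styling are arbitrary choices for this example:

```python
import matplotlib.pyplot as pypl

# illustrative placeholder data, already sorted by number of sites
countries = ['Italy', 'China', 'Spain', 'France', 'Germany',
             'Mexico', 'India', 'United Kingdom', 'Russia', 'United States']
nSites = [49, 45, 44, 38, 38, 32, 30, 28, 25, 21]

x = list(range(len(nSites)))
bars = pypl.bar(x, nSites)

# label each bar with its country, slightly slanted for readability
pypl.xticks(x, countries, rotation=30, ha='right')

# leave some room around the bars and start the y-axis at 0
pypl.xlim(-0.8, len(nSites) - 0.2)
pypl.ylim(0, max(nSites) + 5)

# light horizontal grid lines make the values easier to read off
pypl.grid(axis='y', linestyle=':', linewidth=0.8)

pypl.tight_layout()  # keep the slanted labels inside the figure
```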

Note that the labels are slightly slanted (parameter “rotation”) so that they are easier to read, and that the y-axis starts at 0: starting at a value above zero truncates the bars and doesn’t accurately reflect the full value.
The “grid” API has many more parameters, such as line style, width and colour, that you can experiment with.
Here is the new outcome:

Step 2. Now with labels, grid and borders. Nicer!

Now it is nicer but still not very descriptive. You can add a chart title and labels for the axes with the following APIs. You can also add arbitrary lines and text to the chart (for example to show the target or the average):

# add a red dashed line and a label for the mean
# (nSitesMean holds the mean number of sites)
pypl.hlines(nSitesMean, -0.3, 10, color='red', linestyles='dashed',
            label=r'$\mu (top10)$')  # raw string, so \mu reaches mathtext intact
pypl.legend()  # display the label of the mean line in a legend
# define the plot labels
pypl.title('Distribution of the WHC sites by country - Top 10',
           fontsize=18)  # chart title
pypl.ylabel('Number of WHC sites')  # y axis title
pypl.show()

And here is the final result:

Step 3 – With titles and legend

There are many more APIs that can be used to make even fancier charts. They are outside the scope of this small tutorial, but to give a couple of examples, you can annotate parts of the chart or change the colour of one of the bars:

# change the colour of a single bar (the very first) and annotate it
bars[0].set_color('green')
pypl.annotate('Country with largest number of WHC sites', xy=(0.5, 50),
              xytext=(2, 49), color='green',
              arrowprops=dict(facecolor='green', shrink=0.2))
pypl.text(7, 36, r'$\mu=35$', color='red')  # text for the mean

Here is the chart with annotations and colours, arguably a bit overdone:

Version A – a bit exaggerated

Or you can strip the chart down to the bare minimum while keeping it readable:
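What counts as the minimum is a matter of taste, so the sketch below is just one possible interpretation, not necessarily the chart originally shown here. It assumes matplotlib imported as `pypl` and uses illustrative placeholder counts; the frame, the y-axis and the tick marks are dropped and each value is written directly above its bar:

```python
import matplotlib.pyplot as pypl

# illustrative placeholder data, already sorted by number of sites
countries = ['Italy', 'China', 'Spain', 'France', 'Germany',
             'Mexico', 'India', 'United Kingdom', 'Russia', 'United States']
nSites = [49, 45, 44, 38, 38, 32, 30, 28, 25, 21]

fig, ax = pypl.subplots()
x = list(range(len(nSites)))
bars = ax.bar(x, nSites, color='grey')

# keep only the category labels; drop frame, y-axis and tick marks
ax.set_xticks(x)
ax.set_xticklabels(countries, rotation=30, ha='right')
for side in ('top', 'right', 'left'):
    ax.spines[side].set_visible(False)
ax.yaxis.set_visible(False)
ax.tick_params(axis='x', length=0)

# write each value directly above its bar instead of using a y-axis
for bar, value in zip(bars, nSites):
    ax.text(bar.get_x() + bar.get_width() / 2, value + 0.5, str(value),
            ha='center', va='bottom')

fig.tight_layout()
```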
