Author: Pisarev Ivan, ODS Slack: pisarev_i

Tutorial

Customizing some details in Matplotlib

Matplotlib is great for visualization because you don't have to customize anything if you don't need to: everything works fine out of the box.
At the same time, you can customize almost any part of a plot as you wish, and the tuning options are very broad.

A good source of information about Matplotlib's capabilities is the Gallery and the tutorials on the project website. The gallery has an example for almost any need: imagine what you want to visualize and how, and you will most likely find an implementation of it there.
In this tutorial, we will not retell the User's Guide.
We will just create some plots and adjust them to convey the idea better.

Backends determine the output format, and there are two types of them: user interface (interactive) backends and hardcopy (non-interactive) backends that produce image files. You can list the available backends.
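Here is a minimal sketch for listing them. It assumes Matplotlib 3.9 or newer exposes `matplotlib.backends.backend_registry`, and that older versions keep the same lists in `matplotlib.rcsetup`. The code tries the newer API first and falls back to the older one.

```python
import matplotlib

try:
    # Matplotlib >= 3.9: the backend registry
    from matplotlib.backends import BackendFilter, backend_registry
    interactive = backend_registry.list_builtin(BackendFilter.INTERACTIVE)
    non_interactive = backend_registry.list_builtin(BackendFilter.NON_INTERACTIVE)
except ImportError:
    # Older Matplotlib: plain lists of names in rcsetup
    from matplotlib import rcsetup
    interactive = rcsetup.interactive_bk
    non_interactive = rcsetup.non_interactive_bk

print('Current backend:', matplotlib.get_backend())
print('Interactive:', interactive)
print('Non-interactive (hardcopy):', non_interactive)
```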

If you want more interactive capabilities in your plots (such as zooming in and out), you can choose an appropriate backend:

matplotlib.use('nbagg')

before you call

import matplotlib.pyplot as plt

We will not do that.
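The opposite case is a hardcopy backend. The sketch below is meant for a standalone script rather than this notebook, because it switches the backend. It selects Agg before pyplot is imported and writes the figure to an in-memory PNG. The plotted values are arbitrary placeholders.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: renders images, opens no windows
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])  # arbitrary placeholder data

buffer = io.BytesIO()
fig.savefig(buffer, format='png')  # a file name such as 'plot.png' works too
png_bytes = buffer.getvalue()
plt.close(fig)
```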


import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator, MultipleLocator
import seaborn as sns
%matplotlib inline

What you really need is to choose a style. Appendix 1 lists the available styles.
A style defines many parameters of a chart. If you cannot live without a grid and a gray background, set your favorite style first.
I will do so here.
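Here is a short sketch of working with styles. `plt.style.available` lists the style names. `plt.style.context` applies a style only inside a `with` block and restores the previous settings afterwards, while `plt.style.use` applies it globally. The 'ggplot' style is only an example, chosen because it turns the grid on. It is not necessarily the style used in the rest of this tutorial.

```python
import matplotlib.pyplot as plt

print(plt.style.available)  # names accepted by plt.style.use / plt.style.context

# Apply a style temporarily: rcParams are restored when the block exits
with plt.style.context('ggplot'):
    grid_inside = plt.rcParams['axes.grid']  # ggplot switches the grid on
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])  # arbitrary placeholder data
```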

Our first idea would be to demonstrate the growth of some values over the years.


# Number of unique athletes per Games, one line per season
plt.plot(dfAthlete[dfAthlete['Season'] == 'Summer'].groupby('Year')['ID'].nunique(), 'ro-')
plt.plot(dfAthlete[dfAthlete['Season'] == 'Winter'].groupby('Year')['ID'].nunique(), 'bo-')
plt.xlabel('Years', fontsize=14)
plt.ylabel('Athletes', fontsize=14)
plt.title('The number of Athletes', fontsize=16);

Obviously, the total number of athletes taking part in the Games grows over time.
Autoscaling has done a good job, but we want better. Let's see what we can do with the grid.

fig, ax = plt.subplots()
ax.plot(dfAthlete[dfAthlete['Season'] == 'Summer'].groupby('Year')['ID'].nunique(), 'ro-')
ax.plot(dfAthlete[dfAthlete['Season'] == 'Winter'].groupby('Year')['ID'].nunique(), 'bo-')
# Hand-made minor ticks every 4 years, with a dotted grid on them
ax.xaxis.set_ticks(np.arange(1896, 2020, 4), minor=True)
ax.grid(True, which='minor', linestyle='dotted')
ax.set_xlabel('Years', fontsize=14)
ax.set_ylabel('Athletes', fontsize=14)
ax.set_title('The number of Athletes', fontsize=16);

But there is a better way to place ticks.
The MultipleLocator class is exactly what we need in this case.
Do not invent your own ways of placing ticks until you have looked into matplotlib.ticker. It has useful locators for many cases, and for dates you can use the locators from matplotlib.dates.
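To see what these locators do without drawing anything, we can call their `tick_values(vmin, vmax)` method directly. The ranges below (1896 to 2016 for years, 0 to 12000 for counts) are only illustrative. A locator may also return one tick just outside the range on each side, but Matplotlib draws only the ticks inside the view limits.

```python
import numpy as np
from matplotlib.ticker import MaxNLocator, MultipleLocator

# MultipleLocator(base): a tick at every integer multiple of `base`
minor_ticks = MultipleLocator(4).tick_values(1896, 2016)
major_ticks = MultipleLocator(24).tick_values(1896, 2016)

# MaxNLocator(nbins): "nice" tick values with at most nbins intervals
# inside the view limits, i.e. at most nbins + 1 visible ticks
y_ticks = MaxNLocator(3).tick_values(0, 12000)
visible_y = [t for t in y_ticks if 0 <= t <= 12000]

print(major_ticks)
print(visible_y)
```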
So, let's:

Use MultipleLocator for the minor ticks (every 4 years) and the major ticks (every 24 years).

Use MaxNLocator for the y-axis ticks. MaxNLocator(3) allows at most 3 intervals, i.e. no more than 4 ticks.

Add a legend, increase the line width, marker size and label size, and add padding around the data: 5% in the x-direction and 10% in the y-direction.
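The same settings will be applied to several plots, so this sketch wraps them into one function. `plot_by_season` is a hypothetical helper, not a Matplotlib API. It assumes a DataFrame with `Season`, `Year` and the counted column, as `dfAthlete` has. The tiny DataFrame in the usage part is made-up data for illustration only.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MaxNLocator, MultipleLocator


def plot_by_season(ax, df, column, ylabel):
    """Hypothetical helper: plot the number of unique `column` values per Year,
    one line per Season, with the tick, legend and padding settings above."""
    for season, color in [('Summer', 'r'), ('Winter', 'b')]:
        counts = df[df['Season'] == season].groupby('Year')[column].nunique()
        ax.plot(counts, color=color, linewidth=4, marker='o', markersize=8,
                markerfacecolor='w', markeredgecolor=color, label=season)
    ax.xaxis.set_major_locator(MultipleLocator(24))  # major tick every 24 years
    ax.xaxis.set_minor_locator(MultipleLocator(4))   # minor tick every 4 years
    ax.yaxis.set_major_locator(MaxNLocator(3))       # at most 3 intervals on y
    ax.grid(True, which='minor', linestyle='dotted')
    ax.margins(x=0.05, y=0.1)                        # 5% x / 10% y padding
    ax.legend(loc='center left', frameon=True, fontsize=12)
    ax.set_xlabel('Years', fontsize=14)
    ax.set_ylabel(ylabel, fontsize=14)
    return ax


# Made-up example data, only to show the call
toy = pd.DataFrame({
    'Year':   [1896, 1896, 1900, 1900, 1924, 1924],
    'Season': ['Summer', 'Summer', 'Summer', 'Summer', 'Winter', 'Winter'],
    'ID':     [1, 2, 1, 3, 4, 5],
})
fig, ax = plt.subplots()
plot_by_season(ax, toy, 'ID', 'Athletes')
```

With the real data, the same call would be `plot_by_season(ax, dfAthlete, 'ID', 'Athletes')`, or `'Event'` for the number of events.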

Now we can add the same plot for the number of events and put the two plots together.
We want both plots to share one x-axis, so we will use

plt.subplots(nrows=2,ncols=1,sharex=True)
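Here is a quick way to check what `sharex=True` means. The two axes keep the same x-limits, so changing one changes the other. By default, only the bottom axes shows x tick labels. The limits below are arbitrary.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
axes[0].set_xlim(1896, 2016)      # change the limits on the top axes only...
shared_xlim = axes[1].get_xlim()  # ...and the bottom axes follows
print(shared_xlim)
```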


fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True)
summer = dfAthlete[dfAthlete['Season'] == 'Summer']
winter = dfAthlete[dfAthlete['Season'] == 'Winter']
line_style = dict(linewidth=4, marker='o', markersize=8, markerfacecolor='w')

# Top plot: unique athletes per Games
axes[0].plot(summer.groupby('Year')['ID'].nunique(), color='r', markeredgecolor='r', label='Summer', **line_style)
axes[0].plot(winter.groupby('Year')['ID'].nunique(), color='b', markeredgecolor='b', label='Winter', **line_style)
axes[0].grid(True, which='minor', linestyle='dotted')
axes[0].yaxis.set_major_locator(MaxNLocator(3))
axes[0].margins(x=0.05, y=0.1)
axes[0].legend(loc='center left', frameon=True, fontsize=12)
axes[0].set_xlabel('Years', fontsize=14)
axes[0].set_ylabel('Athletes', fontsize=14)
axes[0].set_title('The number of Athletes', fontsize=16)

# Bottom plot: unique events per Games
axes[1].plot(summer.groupby('Year')['Event'].nunique(), color='r', markeredgecolor='r', label='Summer', **line_style)
axes[1].plot(winter.groupby('Year')['Event'].nunique(), color='b', markeredgecolor='b', label='Winter', **line_style)
# The x-axis is shared, so these locators also apply to the top plot
axes[1].xaxis.set_major_locator(MultipleLocator(24))
axes[1].xaxis.set_minor_locator(MultipleLocator(4))
axes[1].yaxis.set_major_locator(MaxNLocator(3))
axes[1].grid(True, which='minor', linestyle='dotted')
axes[1].margins(x=0.05, y=0.1)
axes[1].legend(loc='center left', frameon=True, fontsize=12)
axes[1].set_xlabel('Years', fontsize=14)
axes[1].set_ylabel('Events', fontsize=14)
axes[1].set_title('The number of Events', fontsize=16)
plt.show()

Oh, something is wrong with the labels and titles: they overlap.
In this case, the possible solutions are:

If you want Matplotlib to adjust the subplot parameters automatically so that the subplots fit into the figure area, use fig.tight_layout().

You can increase the space between subplots with fig.subplots_adjust(hspace=XX).

We will simply increase the figure size (figsize).
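Here are the three options side by side, sketched on empty axes with placeholder titles and labels. `hspace` is the gap between rows as a fraction of the average axes height, and 0.4 is just an example value.

```python
import matplotlib.pyplot as plt

# Option 1: let Matplotlib fit labels and titles automatically
fig1, axes1 = plt.subplots(nrows=2, ncols=1, sharex=True)
for ax in axes1:
    ax.set_title('Title')
    ax.set_xlabel('Years')
fig1.tight_layout()

# Option 2: set the vertical gap between subplots explicitly
fig2, axes2 = plt.subplots(nrows=2, ncols=1, sharex=True)
fig2.subplots_adjust(hspace=0.4)

# Option 3: make the whole figure larger (width, height in inches)
fig3, axes3 = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(10, 8))
```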

Now let's set the location of the shared x ticks. For example, we can place them between the two plots.
We can do it like this:

fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(10, 8))
summer = dfAthlete[dfAthlete['Season'] == 'Summer']
winter = dfAthlete[dfAthlete['Season'] == 'Winter']
line_style = dict(linewidth=4, marker='o', markersize=8, markerfacecolor='w')

axes[0].plot(summer.groupby('Year')['ID'].nunique(), color='r', markeredgecolor='r', label='Summer', **line_style)
axes[0].plot(winter.groupby('Year')['ID'].nunique(), color='b', markeredgecolor='b', label='Winter', **line_style)
axes[0].grid(True, which='minor', linestyle='dotted')
axes[0].yaxis.set_major_locator(MaxNLocator(3))
axes[0].margins(x=0.05, y=0.1)
axes[0].set_xlabel('Years', fontsize=14)
axes[0].set_ylabel('Athletes', fontsize=14)
axes[0].set_title('The number of athletes and events is growing', fontsize=16)  # Common title

axes[1].plot(summer.groupby('Year')['Event'].nunique(), color='r', markeredgecolor='r', label='Summer', **line_style)
axes[1].plot(winter.groupby('Year')['Event'].nunique(), color='b', markeredgecolor='b', label='Winter', **line_style)
axes[1].xaxis.set_major_locator(MultipleLocator(24))
axes[1].xaxis.set_minor_locator(MultipleLocator(4))
axes[1].yaxis.set_major_locator(MaxNLocator(3))
axes[1].grid(True, which='minor', linestyle='dotted')
axes[1].margins(x=0.05, y=0.1)
axes[1].legend(loc='upper left', frameon=True, fontsize=12)  # Common legend
axes[1].set_xlabel('')                # Hide x-label
axes[1].xaxis.tick_top()              # Move ticks to top
axes[1].tick_params(axis='x', pad=5)  # Increase space to plot
axes[1].set_ylabel('Events', fontsize=14)
fig.subplots_adjust(hspace=0.1)       # Reduce space between plots
plt.show()
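Here are the key steps of that trick, isolated on empty axes. With `sharex=True`, only the bottom axes carries x tick labels, so moving its ticks to its top edge puts them between the two plots. `get_ticks_position()` reports where the ticks ended up.

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(10, 8))
axes[1].xaxis.tick_top()              # ticks and labels to the top edge of the bottom axes
axes[1].tick_params(axis='x', pad=5)  # gap between the labels and the plot, in points
fig.subplots_adjust(hspace=0.1)       # shrink the gap between the two axes

position = axes[1].xaxis.get_ticks_position()
print(position)
```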

We are going to add two more plots in the same style, so we may get tired of setting the same linewidth, markersize, etc. every time.
If we plan to customize more parameters and reuse this style, we can save a <style-name>.mplstyle file to mpl_configdir/stylelib (mpl_configdir is the directory returned by matplotlib.get_configdir()) with something like:

lines.linewidth : 3
lines.markersize : 6
lines.marker : o
lines.markerfacecolor : white

Then apply your style with plt.style.use('<style-name>') whenever you need it.
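The sketch below shows the same idea without touching the config directory. It writes the settings above to a temporary `olympics.mplstyle` file (a hypothetical name) and applies it with `plt.style.context`, which also accepts a path to a style file. If you save the file into the `stylelib` folder of `matplotlib.get_configdir()` instead, you can refer to it by name after calling `plt.style.reload_library()` or restarting the session.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

# The same settings as the .mplstyle lines above (note: values are unquoted)
STYLE_TEXT = """\
lines.linewidth : 3
lines.markersize : 6
lines.marker : o
lines.markerfacecolor : white
"""

with tempfile.TemporaryDirectory() as tmp:
    style_path = Path(tmp) / 'olympics.mplstyle'  # hypothetical style name
    style_path.write_text(STYLE_TEXT)

    # Apply the style temporarily; rcParams are restored afterwards
    with plt.style.context(style_path):
        applied = {key: plt.rcParams[key] for key in
                   ['lines.linewidth', 'lines.markersize',
                    'lines.marker', 'lines.markerfacecolor']}
        fig, ax = plt.subplots()
        line, = ax.plot([1, 2, 3])  # picks up width 3, 'o' markers, white faces
```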

We can set the colormap center manually if we know the boundary separating the "losers" (we will consider as losers everyone who does not reach 0.16 medals per athlete). We can also rotate the y tick labels and increase the space between them and the plot.
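Here is a Matplotlib-only sketch of both ideas. seaborn's `heatmap` exposes the first one as its `center` argument. This sketch uses `matplotlib.colors.TwoSlopeNorm` instead, which maps `vcenter` to the middle of a diverging colormap. The values and team names are hypothetical, and only the 0.16 boundary comes from the text. The 45-degree rotation and the 10-point pad are example values.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm

# Hypothetical medals-per-athlete values (rows: teams, columns: editions)
values = np.array([[0.05, 0.10, 0.30],
                   [0.20, 0.16, 0.02]])

# Put the middle of the diverging colormap exactly at the 0.16 boundary
norm = TwoSlopeNorm(vmin=values.min(), vcenter=0.16, vmax=values.max())

fig, ax = plt.subplots()
im = ax.imshow(values, cmap='RdBu', norm=norm)
ax.set_yticks([0, 1])
ax.set_yticklabels(['Team A', 'Team B'])            # hypothetical labels
ax.tick_params(axis='y', labelrotation=45, pad=10)  # rotate labels, move them away
fig.colorbar(im, ax=ax)
```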