Thomas Lecocq @ the Royal Observatory of Belgium


Matplotlib & Datetimes – Tutorial 03: Grouping Sparse Data

New tutorial, more advanced this time! Let's say we have a number of observations, like occurrences of earthquakes or visitors connecting to a web server. These observations don't occur every second; they are sparse on the time axis. To prepare an example, I've created a set of random datetimes like this:
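A minimal sketch of such a generator, assuming the timestamps are drawn uniformly over a one-hour window. N, the hypothetical START timestamp and WINDOW are illustrative values, not the author's, and the list is sorted here because the grouping step later needs chronological order:

```python
import datetime
import random
import time

N = 500                # number of random observations (illustrative)
START = 1_300_000_000  # hypothetical start of the window, in Unix seconds
WINDOW = 3600          # hypothetical window length, in seconds

# Random float timestamps, sorted so they can later be grouped in order.
raw = sorted(START + random.random() * WINDOW for _ in range(N))

times = []
for t in raw:
    # time.gmtime() only keeps whole seconds...
    st = time.gmtime(t)
    # ...so the fractional part is added back as microseconds
    # (int() truncates, so this always stays below 1_000_000).
    micro = int((t - int(t)) * 1e6)
    times.append(datetime.datetime(*st[:6], microsecond=micro))
```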

Here, N is the number of points we want. We also need to loop over the original times array to add the microseconds, because time.gmtime only keeps whole seconds. This times array can be plotted using:
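A sketch of such a plot, using a hypothetical stand-in for the times list (50 observations spread irregularly over about ten minutes). The author's own plotting call isn't shown, so this simply relies on Matplotlib accepting datetime.datetime objects on the X axis:

```python
import datetime
import random

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in for the `times` list: sparse, sorted observations.
t0 = datetime.datetime(2011, 3, 17, 12, 0, 0)
times = sorted(t0 + datetime.timedelta(seconds=random.uniform(0, 600)) for _ in range(50))

# Random Y values, only there to spread the points out vertically.
ys = [random.random() for _ in times]

fig, ax = plt.subplots()
ax.plot(times, ys, "o")  # datetimes are converted to dates on the X axis
fig.autofmt_xdate()      # tilt the date labels so they don't overlap
```

In an interactive session, drop the Agg line and call plt.show() to display the figure.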

Note that we add random values on the Y axis just to make a nice plot!

So, now we have an array of datetime.datetime objects, and we want to count the occurrences within a given time span, let's say 5 seconds. First, to make things easier, we convert the array to a numpy array. The binning is done with the powerful itertools.groupby function. It takes an iterable (here, our times array) and an optional key: a function applied to each element to compute the value it is grouped on. Note that groupby only groups consecutive elements with the same key, so the times must be in chronological order.

itertools.groupby(iterable[, key])
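To see what groupby does with a key, here is a small sketch on integer seconds with a hypothetical 5-second key function. It also shows why the input must be sorted: groupby only merges runs of consecutive elements that share a key.

```python
from itertools import groupby

def five_second_bin(t):
    # Key function: index of the 5-second bin that t falls into.
    return int(t) // 5

unsorted_ts = [1, 3, 7, 2, 8]
sorted_ts = sorted(unsorted_ts)

# On unsorted input, bins 0 and 1 each show up twice.
unsorted_groups = [(k, list(g)) for k, g in groupby(unsorted_ts, key=five_second_bin)]
# → [(0, [1, 3]), (1, [7]), (0, [2]), (1, [8])]

# Once sorted, each bin appears exactly once.
sorted_groups = [(k, list(g)) for k, g in groupby(sorted_ts, key=five_second_bin)]
# → [(0, [1, 2, 3]), (1, [7, 8])]
```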

In our case, the key is a function that returns the timestamp int-divided by the time span. For example, occurrences at 1000.02, 1001.12 and 1004.66 all get the same key, 200, because int(1000.02) // 5, int(1001.12) // 5 and int(1004.66) // 5 all equal 200.
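A sketch of such a key function for datetime.datetime objects, assuming naive datetimes expressed in UTC (as produced from time.gmtime) and a hypothetical EPOCH constant. Note the floor division //: in Python 3, int(1001.12) / 5 would give 200.2 rather than 200.

```python
import datetime

BIN = 5                                # bin width in seconds (the "time-span")
EPOCH = datetime.datetime(1970, 1, 1)  # naive UTC epoch, matching time.gmtime()

def to_seconds(dt):
    # Seconds since the epoch, keeping the microseconds as a fraction.
    return (dt - EPOCH).total_seconds()

def group(dt):
    # Key for groupby: the timestamp int-divided by the bin width.
    return int(to_seconds(dt)) // BIN

# The three example occurrences from the text, as datetimes.
examples = [EPOCH + datetime.timedelta(seconds=s) for s in (1000.02, 1001.12, 1004.66)]
keys = [group(dt) for dt in examples]
# → [200, 200, 200]
```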

This code first defines the binning (the “time-span”) and the group function that returns the key groupby uses to group the data. groupby yields pairs (d, g): the key and the elements matching that key. Then, in the long one-liner, we build an array of datetime.datetime objects, multiplying each key d back by the binning value to get the left edge of the bin, each paired with the length of list(g), i.e. the number of occurrences for that key. Finally, the grouped_dates array is zipped to get two arrays that are easy to plot. The rest is super simple:
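Putting the pieces together, here is a sketch of the whole binning step followed by a bar plot. It assumes a hypothetical set of 40 sorted observations over one minute, a 5-second BIN, and the naive UTC EPOCH used for the key. The author's exact one-liner is not reproduced here; this is one way to write it:

```python
import datetime
import random
from itertools import groupby

import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

BIN = 5                                # bin width in seconds (illustrative)
EPOCH = datetime.datetime(1970, 1, 1)  # naive UTC epoch

def group(dt):
    # Index of the BIN-second bin that dt falls into.
    return int((dt - EPOCH).total_seconds()) // BIN

# Hypothetical sparse observations; groupby needs them sorted.
t0 = datetime.datetime(2011, 3, 17, 12, 0, 0)
times = np.array(sorted(t0 + datetime.timedelta(seconds=random.uniform(0, 60))
                        for _ in range(40)))

# For each (key d, group g): left edge of the bin and number of occurrences.
grouped_dates = [
    (EPOCH + datetime.timedelta(seconds=d * BIN), len(list(g)))
    for d, g in groupby(times, key=group)
]

# Unzip into two sequences that are easy to plot.
bin_starts, counts = zip(*grouped_dates)

fig, ax = plt.subplots()
# Bars start at the left edge of each bin and are BIN seconds wide.
ax.bar(bin_starts, counts, width=datetime.timedelta(seconds=BIN), align="edge")
fig.autofmt_xdate()
```

Empty bins simply produce no pair, which is what makes this approach convenient for sparse data.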

I have four columns, and the first column is a time at 5-minute intervals (e.g. [09:00, 09:05, 09:10… 15:00]). Another column holds other data as Y values. How can I make the first column the X axis, at 5-minute intervals, and another column the Y values? Which library should I learn? Thank you in advance…