A time series is a set of data points collected over a given period of time. Examples of time series are stock ticker data, sensor data and netflow data. Generally one wants to perform some sort of analysis on a time series, and/or use past performance to forecast future performance, or use past performance to detect anomalies in new data.

The CLML.time-series system contains functionality to manipulate and analyze time series data. CLML.time-series has a definite opinion on what a time series is. We will see that after we load some data.

Let's get started by loading the systems necessary for this tutorial and creating a namespace to work in.


```lisp
(ql:quickload '(:clml.utility       ; Need clml.utility.data to get data from the net
                :clml.hjs           ; Need clml.hjs.read-data to poke around the raw dataset
                :clml.time-series   ; Need Time Series package obviously
                :iolib
                :clml.extras.eazy-gnuplot
                :eazy-gnuplot))     ; Lisp interface to gnuplot, used for the plots
```
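A minimal sketch of such a namespace, assuming a package name of our own choosing: `ts-tutorial` is hypothetical and not part of CLML. It only uses `common-lisp`, so CLML symbols would be referenced with their package prefixes.

```lisp
;; Hypothetical tutorial package; the name TS-TUTORIAL is our own choice.
(defpackage #:ts-tutorial
  (:use #:common-lisp))

;; Make it the current package for the rest of the session.
(in-package #:ts-tutorial)
```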

CLML's main unit of currency for working with data is the dataset. Datasets are a hierarchy of classes that contain data points and metadata. They are similar to data frames in R or pandas DataFrames in Python.

- numeric-and-category-dataset: a dataset containing a mixture of numeric and categorical data

Most relevant to this tutorial:

- time-series-dataset: a dataset containing time-series data
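The dataset classes are related through CLOS inheritance. The sketch below mimics that idea with hypothetical class names: `sketch-dataset` and its subclasses are not CLML's classes, and the slots are illustrative only. It shows how a specialized dataset inherits from a general one and how a generic function can dispatch on the most specific class.

```lisp
;; Hypothetical, simplified stand-ins for a dataset class hierarchy.
;; These are NOT CLML's class definitions.
(defclass sketch-dataset ()
  ((dimensions :initarg :dimensions :initform nil :accessor sketch-dimensions
               :documentation "Column metadata (names, types).")))

(defclass sketch-numeric-and-category-dataset (sketch-dataset)
  ((points :initarg :points :initform nil :accessor sketch-points
           :documentation "Rows mixing numbers and category strings.")))

(defclass sketch-time-series-dataset (sketch-dataset)
  ((frequency :initarg :frequency :accessor sketch-frequency)
   (start     :initarg :start     :accessor sketch-start)
   (points    :initarg :points    :initform nil :accessor sketch-points)))

;; Methods are selected by the most specific class of the argument.
(defgeneric describe-kind (dataset))
(defmethod describe-kind ((d sketch-dataset)) "generic dataset")
(defmethod describe-kind ((d sketch-time-series-dataset)) "time-series dataset")
```

Because `sketch-numeric-and-category-dataset` has no method of its own, `describe-kind` falls back to the `sketch-dataset` method for it.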

Datasets can be created directly or by reading them from a file. The supported data formats are CSV and SEXP.
In this case, the read-data-from-file function reads a dataset from a file obtained with the fetch function, which downloads and caches a file from a location on the local file system or a URL.

Let's take a look at the data; it appears to come from a hit counter.
head-points gives us the first 5 rows of a dataset (if we wanted all the rows, we would use dataset-points).

This is where CLML.time-series's definite opinions about time series come in. Time series in CLML.time-series are discrete, and they have a regular frequency. (This implies that time-series data must have a reading at each period; however, CLML.time-series does support missing values, which will be covered in a later part of this series.) The representation of frequency is important, especially when comparing time-series points at regular intervals. The FREQUENCY slot specifies the number of data points per cycle. The START slot indicates the starting time index and frequency interval. The measurements are contained in the POINTS slot and are represented as a vector of ts-point objects. Also note that the slot accessor prefix for time-series datasets is ts-.
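Because the frequency is regular, the time index of every point follows from START and FREQUENCY alone, with no timestamps stored. The function below is a hypothetical helper, not part of CLML.time-series, and it assumes frequency intervals are numbered from 1 up to the frequency (the same convention as R's `ts(start = c(time, period))`), with observations counted from 0.

```lisp
;;; Hypothetical helper, NOT part of CLML.time-series.
;;; Assumes frequency intervals are numbered 1..FREQUENCY and K counts
;;; observations from 0.
(defun nth-time-index (k start-time start-interval frequency)
  "Return (TIME . INTERVAL) for the K-th observation of a regular series."
  (multiple-value-bind (carry interval)
      ;; Offset of observation K from the beginning of START-TIME's cycle,
      ;; split into whole cycles (CARRY) and position within the cycle.
      (floor (+ (1- start-interval) k) frequency)
    (cons (+ start-time carry) (1+ interval))))
```

With a frequency of 4 and a start of time 18, interval 3, the first four observations map to (18 . 3), (18 . 4), (19 . 1) and (19 . 2).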

In fact, if you look at the raw dataset above, you will see that the dataset we just created contains no time specifiers (there are labels, but they are not used in computations). This can be very important if your time series has literally astronomical ranges. Some time-series libraries and databases encode the index as seconds or milliseconds from some fixed point in time, which limits the times the series can represent to the range of the datatype used to encode the index. To be fair, CLML.time-series is in effect doing the same thing; however, its time index is relative, and time indices can range from 0 to most-positive-fixnum (~4.6e18 in 64-bit SBCL). Since a data point is defined by its time index and its frequency interval, and the frequency interval can also range from 0 to most-positive-fixnum, the number of theoretically possible data points in a time series is most-positive-fixnum squared (about 2.1e37 in 64-bit SBCL).
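The arithmetic above can be checked directly at the REPL, because Common Lisp integers have arbitrary precision and the product of two fixnums is computed exactly as a bignum. Both function names are hypothetical helpers for this illustration; the second one shows, for contrast, how many years a signed millisecond timestamp can span.

```lisp
;; Rough count used in the text: MOST-POSITIVE-FIXNUM squared.
;; The product is exact; it simply becomes a bignum.
(defun max-theoretical-points ()
  (* most-positive-fixnum most-positive-fixnum))

;; For contrast: years covered (in one direction from the epoch) by a
;; signed BITS-bit count of milliseconds.
(defun ms-timestamp-span-years (bits)
  (/ (expt 2 (1- bits))
     (* 1000 60 60 24 365.25d0)))
```

On 64-bit SBCL, `(max-theoretical-points)` is about 2.1e37, while `(ms-timestamp-span-years 64)` is roughly 2.9e8 years.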

Let's look at the points in the dataset to see how they are represented.

The ts-point class encodes each measurement, maintaining the time and frequency interval, a label (which is just a string), and the actual measurements. The measurements in the pos slot are stored in a vector of arbitrary length. Looking back at In [8], where we gave the time series a start time of 18 and a start frequency interval of 3, we can see by examining the ts-points how this is actually represented. Also note that the accessor prefix of ts-point is ts-s-.
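To make that layout concrete, here is a plain CLOS sketch of a point carrying the same four pieces of information. The class `sketch-ts-point`, its `sketch-p-` readers and the constructor `make-sketch-ts-point` are hypothetical, not CLML's definitions; only the shape (time index, frequency interval, label, and a vector of measurements) follows the description above.

```lisp
;; Hypothetical stand-in for a time-series point; NOT CLML's ts-point class.
(defclass sketch-ts-point ()
  ((time-index :initarg :time  :reader sketch-p-time
               :documentation "Time index, e.g. the cycle number.")
   (freq       :initarg :freq  :reader sketch-p-freq
               :documentation "Frequency interval within the cycle.")
   (label      :initarg :label :reader sketch-p-label
               :documentation "Free-form string, not used in computations.")
   (pos        :initarg :pos   :reader sketch-p-pos
               :documentation "Vector of measurements, one per series.")))

(defun make-sketch-ts-point (time freq label &rest measurements)
  "Build a point whose POS slot holds MEASUREMENTS as a vector."
  (make-instance 'sketch-ts-point
                 :time time :freq freq :label label
                 :pos (coerce measurements 'vector)))
```

Storing the measurements as a vector lets one point hold several parallel series without changing the class.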

Time-series datasets can also be created programmatically.
Some examples are:

Now let's look at the whole dataset. Since each ts-point has a label, our x axis would be overwhelmed with labels, so we use the :xtic-interval keyword to specify that labels should only be displayed every 500 points.
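The effect of :xtic-interval can be pictured as keeping only every 500th label. The helper below is a hypothetical illustration of that selection; it is not how CLML or eazy-gnuplot implement the option.

```lisp
;; Hypothetical illustration: keep only every INTERVAL-th label, paired
;; with its position, the way an x-axis tic interval thins out labels.
(defun every-nth-label (tick-labels interval)
  (loop for label across tick-labels
        for i from 0
        when (zerop (mod i interval))
          collect (cons i label)))
```

For a series of 1200 points and an interval of 500, only the labels at positions 0, 500 and 1000 survive.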

I would like to thank Frederic Peschanski, the creator of fishbowl, which provides Common Lisp support for IPython. I would also like to thank Masataro Asai, the creator of eazy-gnuplot. I would like to thank the creators of IPython and Project Jupyter, a truly cross-platform mechanism for the presentation of code and content. Finally, I would like to thank GitHub for [providing the ability to view notebooks inside GitHub repositories](http://blog.jupyter.org/2015/05/07/rendering-notebooks-on-github/).