Catégories

SAX: Piecewise Aggregate Approximation

Problem: We have a series of n numbers that we want to divide into w slots. We want to compute the mean of each slot, how do we proceed when n is not divisible by w ? This is called a Piecewise Aggregate Approximation (PAA).

The natural way would be to consider each slot to be the size of the following equation (n//w is the floor division) \(\frac{n}{w} = n // w + \frac{n \% w}{w}\).
We start from index 0, we add n//w points, then we add a proportion of the next point, corresponding to (n%w)/w. For the second slot, we take the rest of the proportion of the previous point, we add n//w point, then the rest of the proportion of the last point so that the size of the slot is n/w. We keep doing this process until we reach the end of the series.
The issue with this strategy is that is quite difficult and inelegant to code.

There is a more elegant way. If we multiply n by w, we can consider that we repeat each point w times. What is the point of doing this ? We saw that the proportion of point we need to add to the current slot is a quantity that we will divide by w (it is (n%w)/w). Considering we repeat each point w times, we will be able to deal with a number of points that is an integer.

Now the slot does not have a size of n/w, but n. We sum each element of the slot, then we divide by n at the end: we will indeed get the mean of the slot. For example, the second slot in the picture above has a mean of (s[i] corresponds to the ith element of the serie):
\[\frac{\frac{2}{3}s[2] +\frac{2}{3}s[3]}{\frac{4}{3}}\]

Considering the new representation, the mean is:
\[\frac{2s[2] +2s[3]}{4}\]

NB: Note however that this method has the disadvantage of increasing a lot the number of iterations. If you consider a long timeseries, it may be too long. If it is the case, you can choose to simply consider slot to have
size n // w. If w is not divisble by w, we will have w + 1 slot, the last one having less points (you need to be aware of that).