Outlier vs Change-Point

Anomaly detectors are generally categorized into outlier detectors and change-point detectors. Outliers are spiky, "local" data points that suddenly appear in a series of otherwise normal samples; Local Outlier Factor (LOF) is one algorithm for detecting them. Change-points, on the other hand, indicate a "global" change, on a wider scale, in the characteristics of the data points.
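To make the distinction concrete, here is a minimal Python sketch on a toy series. It is only an illustration and is not part of Hivemall: the series, the helper names `find_outliers` and `find_change_point`, and their parameters are all hypothetical. The outlier check compares each point with the median of its neighbourhood, while the change-point check looks for the index where the mean before and the mean after differ the most.

```python
from statistics import mean, median

# Hypothetical toy series: flat at 10, one spike, then a lasting level shift.
series = [10.0] * 20
series[5] = 30.0            # outlier: one spiky point, its neighbours stay normal
series[12:] = [20.0] * 8    # change-point: the level moves at index 12 and stays there

def find_outliers(xs, half=2, tol=5.0):
    """Flag points that sit far from the median of their local neighbourhood."""
    return [i for i in range(half, len(xs) - half)
            if abs(xs[i] - median(xs[i - half:i + half + 1])) > tol]

def find_change_point(xs, k=4):
    """Return the index where the mean of the k points before it differs most
    from the mean of the k points starting at it."""
    return max(range(k, len(xs) - k + 1),
               key=lambda i: abs(mean(xs[i:i + k]) - mean(xs[i - k:i])))

outliers = find_outliers(series)      # local view: isolated spikes
change = find_change_point(series)    # global view: a lasting shift
print(outliers, change)
```

The spike is picked up only by the local check, because the neighbourhood median ignores a single extreme value; the level shift is picked up only by the before/after comparison, because every point after the shift looks normal relative to its neighbours.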

This page focuses specifically on change-point detection. More concretely, the following sections show how to detect change-points on Hivemall using a technique called Singular Spectrum Transformation (SST).
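For intuition about what SST computes, the following Python sketch shows a simplified form of the general idea. It should not be read as Hivemall's implementation. It assumes that only the single leading singular vector is used on each side, and that the vector is approximated by power iteration. The names `leading_direction` and `sst_scores` and the parameters `w` (window length), `n` and `m` (number of past and future windows) and `g` (lag) are chosen here for illustration.

```python
import math

def leading_direction(columns, iters=50):
    """Approximate the top left singular vector of the matrix whose columns
    are given, using power iteration on H * H^T."""
    w = len(columns[0])
    u = [1.0 / math.sqrt(w)] * w           # simple deterministic start vector
    for _ in range(iters):
        v = [0.0] * w
        for col in columns:
            dot = sum(c * x for c, x in zip(col, u))
            for i in range(w):
                v[i] += dot * col[i]
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0.0:                     # degenerate window, e.g. all zeros
            break
        u = [x / norm for x in v]
    return u

def sst_scores(series, w=8, n=8, m=8, g=0):
    """Return (t, score) pairs. The score is near 0 when the windows just
    before t and just after t share the same dominant shape, and grows as
    the two dominant shapes diverge."""
    scores = []
    for t in range(w + n - 1, len(series) - w - m - g + 2):
        past = [series[t - j - w:t - j] for j in range(n)]
        future = [series[t + g + j:t + g + j + w] for j in range(m)]
        u = leading_direction(past)
        b = leading_direction(future)
        cos = sum(x * y for x, y in zip(u, b))
        scores.append((t, max(0.0, 1.0 - cos * cos)))
    return scores

# Hypothetical series with a level shift at index 60.
series = [1.0] * 60 + [3.0] * 60
scores = sst_scores(series)
peak_t, peak = max(scores, key=lambda p: p[1])
print(peak_t, peak)
```

In this sketch the score stays at zero while every window lies on one side of the shift, and rises only for positions whose windows straddle it. Thresholding such a score is what turns it into a yes/no change-point decision.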

Load data into the table

Next, load the .t file generated earlier into the table:

$ hadoop fs -put twitter.t /dataset/twitter/timeseries

The timeseries table in the twitter database should now look like this:

| num | value   |
|-----|---------|
| 1   | 182.478 |
| 2   | 176.231 |
| 3   | 183.917 |
| 4   | 177.798 |
| 5   | 165.469 |
| ... | ...     |

Change-Point Detection using SST

We are now ready to detect change-points. The sst() UDF takes a double value as its first argument, and options can be passed as a string in the second argument.

The following query detects change-points in the value column of the timeseries table. The option "-threshold 0.005" means that a data point is flagged as a change-point if its score is greater than 0.005.