The graphs above show 20 time series, with t = 1, 2, 3, 4, of
measurements of temperature and rain, respectively. Each series represents a
place where the measurement was taken. On the left the trajectories
are almost linear, while on the right there is a peak at time 2 and a
dip at time 3. (By the way, the data are completely invented!)

Now we insert an outlier into one of the temperature series and another
into one of the rain series (fig. 2).

A visual inspection of the time series in figure 2 makes the two
outliers easy to spot.

We can apply the “washer” methodology by running in R the code in the
file “esempio.R” (esempio.R). Any version of R after 2.8.1 should work,
and earlier versions perhaps do too.

A newer and faster version of the "washer" R function is available here: (esempio2). Its last example shows how to use "washer" on a single time series.

Graph 2

The data are stored in the data.frame “dati” in the “long” structure
typical of relational databases: the phenomenon is in the first column,
time in the second (an ordered sequence of numbers), the zone in the
third, and the values in the last column.
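
As a rough illustration of this layout, the sketch below builds a small data.frame in the long structure. The column names (phenomenon, time, zone, value), the zone labels and all the numbers are made up for the example; only the column order follows the description above.

```r
# Hypothetical toy data in "long" structure: one row per (phenomenon, time, zone).
# Column names and values are invented for illustration only.
dati <- data.frame(
  phenomenon = rep(c("Temperature", "Rain"), each = 6),
  time       = rep(rep(1:3, each = 2), times = 2),
  zone       = rep(c("a1", "a2"), times = 6),
  value      = c(20.1, 21.0, 20.9, 21.7, 21.8, 22.5,   # Temperature
                  5.2,  5.5,  9.8, 10.1,  6.0,  6.3),  # Rain
  stringsAsFactors = FALSE
)

# Sort by phenomenon, then time, then zone, so that time is an ordered sequence.
dati <- dati[order(dati$phenomenon, dati$time, dati$zone), ]
head(dati)
```

Each series is identified by the pair (phenomenon, zone), so adding a new place or a new phenomenon only adds rows, never columns.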

In general, then, we can analyze any number of phenomena
(p = 1, ..., P; with P = 1, 2, ...), with time series longer than
two periods (t = 1, ..., T; with T ≥ 3) and, finally, with a
sufficiently large number of time series (i = 1, ..., n; with n ≥ 20-25).

The data set {y_{p,i,t}} must contain only positive values (if some are
negative, you must shift the whole data set!). It is analyzed by means of
a measure of linearity of three consecutive values
(y_{p,i,t-1}, y_{p,i,t}, y_{p,i,t+1}), computed on a rolling basis that
starts at t = 2 and ends at t = T-1. Missing values are handled by
dropping the triplet (y_{p,i,t-1}, y_{p,i,t}, y_{p,i,t+1}) if at least
one of its three values is missing.
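
The rolling scheme and the treatment of missing values can be sketched for a single series as follows. The helper name rolling_triplets and the numbers are hypothetical; this is only an illustration of the idea, not the code of the washer function.

```r
# Build the rolling triplets (y[t-1], y[t], y[t+1]) for t = 2, ..., T-1
# of a single series y, then drop every triplet that contains a missing value.
# `rolling_triplets` is a hypothetical helper, not part of the washer code.
rolling_triplets <- function(y) {
  n_times <- length(y)
  stopifnot(n_times >= 3)              # at least three periods are needed
  t_mid <- 2:(n_times - 1)             # central time of each triplet
  trip <- cbind(t      = t_mid,
                y_prev = y[t_mid - 1],
                y_mid  = y[t_mid],
                y_next = y[t_mid + 1])
  # keep only the rows where all three values are observed
  trip[complete.cases(trip), , drop = FALSE]
}

y <- c(5.5, 6.3, NA, 5.9, 6.1, 6.4)    # invented values, one of them missing
rolling_triplets(y)
```

Here the single missing value at time 3 removes the three triplets centered at t = 2, 3 and 4, so only the triplet centered at t = 5 survives.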

For a fixed p and a fixed t = 2, we have n measures of linearity, one
for each i = 1, ..., n, with S_i = y_{i1} + y_{i2} + y_{i3}:

This AV index measures the linearity of the three points, or rather how far they depart from a straight line.
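
To make the idea of a three-point linearity measure concrete, here is a minimal sketch. It assumes the index is the second difference 2*y2 - y1 - y3, scaled by the sum S = y1 + y2 + y3 and multiplied by 100. The sign convention agrees with the behavior described for peaks and dips, but the scaling is an assumption: the one used by washer.AV() may well differ. The function name av_sketch is hypothetical.

```r
# Hypothetical sketch of a three-point linearity index.
# Assumption: the index is the second difference 2*y2 - y1 - y3,
# scaled by the sum S = y1 + y2 + y3 and multiplied by 100.
# The scaling used by washer.AV() itself may differ.
av_sketch <- function(y1, y2, y3) {
  S <- y1 + y2 + y3
  100 * (2 * y2 - y1 - y3) / S
}

av_sketch(1, 2, 3)   # three points on a straight line: the index is 0
av_sketch(1, 3, 1)   # peak in the middle: positive index
av_sketch(3, 1, 3)   # dip in the middle: negative index
```

Dividing by S makes the index relative to the level of the series, which is one reason the values need to be positive.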

The R function washer.AV() returns the output shown in table 1
(first 5 rows and rows 21 to 25):

If you look at graph 2, you will notice that at time 2 (t.2 = 2) the
three points form a peak, while at time 3 (t.2 = 3) they form a dip.
This behavior produces a positive AV value in the first case and a
negative one in the second (table 1).

The method works because the time series tend to behave in the same way
in terms of linearity or non-linearity. This tendency is generally
independent of whether the overall trend slopes upward or downward.

The next step looks at the distribution of the AV values, so that
outliers can be searched for in one-dimensional data. The non-parametric
test used is that of Sprent:
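
A minimal sketch of such a test is given here, assuming the commonly quoted form of Sprent's rule: the absolute deviation from the median, divided by the median absolute deviation (MAD), is compared with a cutoff of 5. The function name sprent_sketch, the cutoff, the use of R's default mad() scaling (constant = 1.4826) and the data are all assumptions made for illustration; the washer code may differ in these details.

```r
# Hypothetical sketch of a Sprent-type test on one-dimensional values.
# Assumptions: the statistic is |x - median(x)| / mad(x), with R's default
# mad() scaling (constant = 1.4826), and values above `cutoff` are flagged.
sprent_sketch <- function(x, cutoff = 5) {
  stat <- abs(x - median(x)) / mad(x)
  data.frame(x = x, stat = stat, outlier = stat > cutoff)
}

av <- c(-2.1, -1.8, -2.4, -1.9, -2.2, -2.0, 30.0)   # invented AV values
sprent_sketch(av)
```

Because both the median and the MAD are robust, a single extreme value barely moves them, so it stands out clearly in the statistic.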

Row 18 corresponds to the points (5.5; 6.3; 17.0), where the outlier is
the last of the three points. In this case test.AV is less sensitive,
while in row 38, which corresponds to the three points (6.3; 17.0; 5.9),
the test reaches its full sensitivity (test.AV = 24.2).

The value 17.0 is too large because the other series do not behave
“like that”. If all the other rain series had a peak at time 3, the
anomaly of time series “a18” would disappear altogether.

Last but not least, the output contains a measure of the tendency of
the time series to behave “like that” with respect to one another: it's
madindex!

The rule of thumb is the following:

If madindex is lower than 50, the series behave consistently enough
with one another for outliers to be detected. If madindex is greater
than 50, the AV values are not very informative for outlier detection.