Friday, January 14, 2011

Seminar on Time Series Novelty Detection

Just got this email in my inbox. Feel free to stop by and see my talk. There is no mention of middleware or open source, I promise :)

GRADUATE STUDENT SEMINAR

JONATHAN S. ANSTEY WILL GIVE A TALK ON

“TIME SERIES NOVELTY DETECTION WITH APPLICATION TO PRODUCTION SENSOR SYSTEMS”

WEDNESDAY, JANUARY 19, 2010 9:00 A.M. EN-4002

MR. ANSTEY IS A GRADUATE STUDENT IN THE M.ENG. PROGRAM UNDER THE SUPERVISION OF DR. D. PETERS

ALL INTERESTED ARE WELCOME

I'm not sure why this is all in CAPS either :) For those interested, the full abstract of the work is:

Modern fiber manufacturing plants rely heavily on the use of automation. Automated facilities use sensors to measure fiber state and react to data patterns, which correspond to physical events. Many patterns can be predefined either by careful analysis or by domain experts. Instances of these patterns can then be discovered through techniques such as pattern recognition. However, pattern recognition will fail to detect events that have not been predefined, potentially causing expensive production errors. A solution to this dilemma, novelty detection, allows for the identification of interesting data patterns embedded in otherwise normal data. In this thesis we investigate some of the aspects of implementing novelty detection in a fiber manufacturing system. Specifically, we empirically evaluate the effectiveness of currently available feature extraction and novelty detection techniques on data from a real fiber manufacturing system.

Our results show that piecewise linear approximation (PLA) methods produce the highest quality features for fiber property datasets. Motivated by this fact, we introduced a new PLA algorithm called improved bottom up segmentation (IBUS). This new algorithm produced the highest quality features and considerably more data reduction than all currently available feature extraction techniques for our application.
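For readers who have not met piecewise linear approximation before, here is a minimal sketch of the classic bottom-up segmentation idea that PLA methods of this family build on. It is not IBUS, whose details are not given in this post. The function names, the least-squares error measure, and the `max_error` threshold are all assumptions made for illustration.

```python
def fit_error(ys, start, end):
    """Sum of squared residuals of a least-squares line fitted to ys[start:end]."""
    n = end - start
    if n <= 2:
        return 0.0  # one or two points are always fitted exactly by a line
    xs = range(start, end)
    mean_x = sum(xs) / n
    mean_y = sum(ys[start:end]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ys[x] - mean_y) for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((ys[x] - (slope * x + intercept)) ** 2 for x in xs)


def bottom_up_segments(ys, max_error):
    """Return half-open (start, end) index pairs, one per linear segment.

    Hypothetical helper: starts from the finest segmentation and keeps
    merging the cheapest adjacent pair until the cheapest merge would
    exceed max_error.
    """
    segments = [(i, min(i + 2, len(ys))) for i in range(0, len(ys), 2)]
    while len(segments) > 1:
        # Cost of merging each segment with its right-hand neighbour.
        costs = [
            fit_error(ys, segments[i][0], segments[i + 1][1])
            for i in range(len(segments) - 1)
        ]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_error:
            break
        segments[best:best + 2] = [(segments[best][0], segments[best + 1][1])]
    return segments


if __name__ == "__main__":
    ramp_then_flat = [0, 1, 2, 3, 4, 5, 5, 5, 5, 5, 5, 5]
    print(bottom_up_segments(ramp_then_flat, max_error=0.1))
```

On the toy series, a ramp followed by a plateau, the twelve points collapse into two segments. That is the kind of data reduction the abstract refers to: each segment can be stored as a slope and an intercept instead of raw samples.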

Further empirical results from several leading time series novelty detection techniques revealed two conclusions. A simple Euclidean distance based technique is the best overall when no feature extraction is used. However, when feature extraction is used the Tarzan technique performs best.
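The abstract does not say which Euclidean distance based technique was evaluated, so the following is only a sketch of one common form: score each sliding window of new data by its distance to the nearest window in a reference series that is assumed to be normal. The names `windows`, `euclidean`, and `novelty_scores`, and the window width, are hypothetical.

```python
import math


def windows(series, width):
    """All contiguous subsequences of the given width."""
    return [series[i:i + width] for i in range(len(series) - width + 1)]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def novelty_scores(reference, test, width):
    """Distance from each test window to its nearest reference window.

    A large score means nothing similar was seen in the reference data.
    """
    ref_windows = windows(reference, width)
    if not ref_windows:
        raise ValueError("reference series is shorter than the window width")
    return [
        min(euclidean(w, r) for r in ref_windows)
        for w in windows(test, width)
    ]


if __name__ == "__main__":
    normal = [0, 1, 0, 1, 0, 1, 0, 1]
    observed = [0, 1, 0, 5, 0, 1]  # the 5 is an injected spike
    print(novelty_scores(normal, observed, width=3))
```

Every window that overlaps the spike gets a high score while the first window scores zero. In practice a threshold on the score would have to be chosen from the data, and this brute-force search is quadratic in the number of windows.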

The Tarzan method defines novel patterns as those whose frequency is higher or lower than expected. It's really cool stuff, using concepts from string processing like suffix trees and some Markov chain theory for calculating the expected frequencies of patterns that have not been seen before.
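To give a feel for the idea, here is a heavily simplified sketch. It assumes the time series has already been discretized into a string of symbols, uses plain dictionaries instead of suffix trees, and falls back to a first-order Markov chain estimate for unseen patterns. None of that is the exact estimator from the paper, and the function names are made up for illustration.

```python
from collections import Counter


def substring_counts(text, length):
    """Count every substring of the given length."""
    return Counter(text[i:i + length] for i in range(len(text) - length + 1))


def markov_expected(pattern, reference, n_positions):
    """First-order Markov estimate of the pattern's count over n_positions windows."""
    singles = Counter(reference)
    pairs = substring_counts(reference, 2)
    prob = singles[pattern[0]] / len(reference)
    for a, b in zip(pattern, pattern[1:]):
        # Transition probability P(b | a); zero if `a` never occurs.
        prob *= (pairs[a + b] / singles[a]) if singles[a] else 0.0
    return prob * n_positions


def surprise_scores(reference, test, length):
    """Observed minus expected count for each pattern found in `test`."""
    ref_counts = substring_counts(reference, length)
    test_counts = substring_counts(test, length)
    n_ref = len(reference) - length + 1
    n_test = len(test) - length + 1
    scores = {}
    for pattern, observed in test_counts.items():
        if pattern in ref_counts:
            # Seen before: scale the reference count to the test length.
            expected = ref_counts[pattern] * n_test / n_ref
        else:
            # Never seen: estimate from shorter statistics instead.
            expected = markov_expected(pattern, reference, n_test)
        scores[pattern] = observed - expected
    return scores


if __name__ == "__main__":
    reference = "ab" * 20
    test = "abababbbabab"
    for pattern, score in sorted(surprise_scores(reference, test, 3).items()):
        print(pattern, score)
```

Positive scores mark patterns that show up more often than the reference data suggests, negative scores mark patterns that show up less often. Both directions count as surprising under the definition above.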

Read up more on it in this paper: Keogh, E., Lonardi, S. and Chiu, W. (2002). Finding Surprising Patterns in a Time Series Database in Linear Time and Space. In the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. July 23-26, 2002. Edmonton, Alberta, Canada. pp. 550-556.