Anomaly Detection: The Big Data Whack-a-Mole

Wikipedia defines anomaly detection as the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset. Anomalies in operational data can indicate urgent problems, while anomalies in business data might also reflect positive events (like a spike in sales due to a new promotion).

There is plenty of established science for detecting anomalies in traditional data sets. However, as the volume, variety, and velocity of big data increase, especially for time series data, anomaly detection faces an entirely new set of challenges.

At the root of such challenges: the fact that today’s data are always in flux.

Data sets themselves are dynamic, to be sure. But the problem is worse than that.

The very nature of anomalies is also constantly in flux – and thus, any tools you might use to find them must be able to deal with such change.

We’ve just taken the anomaly detection Whack-a-Mole game to the next level. The moles aren’t simply jumping up and down. They’re multiplying – and every new mole is different.

Looks like we’re gonna need a better hammer.

The Anomaly Detection Squeeze

If you try to build your own anomaly detection tool, you’ll quickly find that you have to navigate between two undesirable extremes: detecting only obvious anomalies at one end, and detecting numerous false positives at the other. Unfortunately, both ends of this spectrum present Whack-a-Mole issues.

Just what it means for an anomaly to be obvious continues to shift, as technologies get better at recognizing increasingly subtle anomalies within increasingly noisy and unpredictable data sets.

Other challenges stem from the data sets themselves. Gaps in a time series, for example, can throw off an entire anomaly detection algorithm. In other cases, some of the data are unreliable, for reasons ranging from miscalibrated instruments to poor data entry practices.

Seasonality also presents a common but subtly complex set of challenges. Most tools understand daily and weekly patterns. Some annual patterns are also straightforward, like the Black Friday and Cyber Monday spikes all retailers lust after.

In many other cases, however, seasonal patterns are far more irregular. They may depend on business decisions, such as when to hold clearance sales, or on notoriously difficult big data problems, such as understanding patterns in weather data.
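To make the seasonality problem concrete, here is a minimal sketch of one common idea: judge each point against the values seen at the same phase of previous cycles (the same hour yesterday, the same hour last week) rather than against the series as a whole. This is an illustration under stated assumptions, not Anodot's method: the function name `seasonal_anomalies`, its parameters, and the toy data are all hypothetical, and it assumes a regularly sampled series with a single, known period.

```python
from statistics import median


def seasonal_anomalies(series, period, cycles=3, k=3.0, min_scale=1.0):
    """Return indices of points that deviate from the same phase in past cycles.

    Hypothetical helper for illustration only.
    series:    regularly sampled values (assumed to have no gaps)
    period:    samples per seasonal cycle (e.g. 24 for hourly data, daily cycle)
    cycles:    how many previous cycles form the baseline
    k:         how many robust spreads away a point must be to be flagged
    min_scale: floor on the spread, in the series' own units, so a perfectly
               flat history does not turn every small wobble into an alert
    """
    flagged = []
    for i in range(period * cycles, len(series)):
        # Values observed at this same phase in each of the previous cycles.
        history = [series[i - period * c] for c in range(1, cycles + 1)]
        center = median(history)
        # Median absolute deviation, scaled to be comparable to a standard
        # deviation; a single odd past value cannot inflate it much.
        mad = median([abs(h - center) for h in history])
        scale = max(1.4826 * mad, min_scale)
        if abs(series[i] - center) / scale > k:
            flagged.append(i)
    return flagged


# Toy data: five "days" of four samples each, with a peak in the second slot.
series = [10, 50, 30, 5,
          12, 48, 31, 6,
          11, 52, 29, 4,
          10, 49, 30, 5,
          11, 51, 30, 30]
print(seasonal_anomalies(series, period=4))  # [19]
```

The flagged value of 30 sits well below the daily peak of roughly 50, so a single fixed threshold set above peak traffic would never fire on it; it only looks wrong relative to what normally happens in that slot. Real data rarely has one clean, known period, which is exactly where the arbitrary seasonal patterns described above make this approach harder.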

In any case, detecting obvious anomalies is nothing more than table stakes – and the minimum bet keeps going up. Anomaly detection tools must keep getting better at catching less obvious anomalies.

Similarly, the battle to reduce false positives continues to rage unabated. The more varied and dynamic the data sets become, the more careful the detection algorithm must be.

The good old days when you'd simply set a threshold and flag any data point that exceeded it as an alert are long gone.
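A toy comparison shows why. The sketch below contrasts a fixed threshold with a simple rolling baseline that flags points far from the mean of the preceding few samples (a rolling z-score). It is a minimal illustration, assuming one regularly sampled metric; the function names, the window of 5, the cutoff `k=3.0`, and the data are hypothetical choices, and this is not presented as how any particular product works.

```python
from statistics import mean, stdev


def static_threshold_alerts(series, threshold):
    """The old approach: alert on any point above a fixed value."""
    return [i for i, x in enumerate(series) if x > threshold]


def rolling_zscore_alerts(series, window=5, k=3.0, min_std=1e-9):
    """Alert when a point is more than k standard deviations away from the
    mean of the preceding `window` points (hypothetical helper)."""
    alerts = []
    for i in range(window, len(series)):
        recent = series[i - window:i]
        mu = mean(recent)
        # Floor the spread so a perfectly flat window cannot divide by zero.
        sigma = max(stdev(recent), min_std)
        if abs(series[i] - mu) / sigma > k:
            alerts.append(i)
    return alerts


# A slowly growing metric with ordinary noise and one sudden drop at index 14.
series = [100, 104, 99, 103, 101, 105, 102, 106, 104, 108,
          105, 109, 107, 111, 60, 110, 113]
print(static_threshold_alerts(series, threshold=108))  # [11, 13, 15, 16]
print(rolling_zscore_alerts(series))                    # [14]
```

The fixed threshold fires four times on nothing more than normal growth and misses the one real problem, the drop to 60. The rolling baseline catches the drop, but it has weaknesses of its own: after the outlier enters the window, the inflated standard deviation makes the detector less sensitive for the next few samples, and it knows nothing about seasonality. Each fix tends to surface the next mole.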

Open Source Tools or Data Scientists to the Rescue?

If you're struggling to solve the anomaly detection problem, you have two basic options: build or buy. The build option usually begins with an open source tool.

Open source tools such as Twitter's AnomalyDetection package or Weka from the University of Waikato in New Zealand still typically require custom development on your part, because they are closer to do-it-yourself kits of algorithms and components than to ready-to-use applications.

So you take an available open source tool – or if you have a particular excess of chutzpah, start from scratch – and assign your crack team of data scientists to synthesize the perfect hammer for whacking all your moles.

Just one problem: good data scientists are virtually impossible to find, unless you’re a Google or a Facebook. And even if you’re lucky enough to hire one, they may or may not have the skills or predilection to work on that anomaly detection challenge you’ve been struggling with.

Remember, good data scientists have their pick of employers and their pick of interesting projects. Whacking your moles is unlikely to be high on their list.

Anodot is the Answer

Anomaly detection is so tough and such a dynamic challenge that the only practical way to address it is to find a vendor who has already invested the numerous person-years it takes to build such a tool. Anodot is just such a company.

Anodot automatically learns your data's normal behavior and then identifies deviations from that behavior in real time, even across vast quantities of time series data. This lets it detect subtle anomalies within many different data patterns, at any level of granularity.

Anodot is thus able to automatically discover anomalies in vast amounts of data and turn them into business insights. Anomaly detection, after all, isn’t a carnival game. It is the key to squeezing the most value out of the flood of data organizations deal with every day.