9.
(the rest of this talk…)
Data Scientist @ Endgame
Time Series Anomaly Detection

10.
Problem:
Highlight when recorded metrics deviate from
normal patterns.
For example, a high number of connections might
indicate a brute-force attack.
For example, a large volume of outgoing data might
indicate an exfiltration event.

11.
Solution:
Build a system that can track and store
historical records of any metric. Develop an
algorithm that will detect irregular behavior
with minimal false positives.

14.
kairos
A Python interface to backend storage databases
(Redis in my case; others are available), tailored for
time series storage.
Takes care of expiring data and different types of time
series (series, histogram, count, gauge, set).
Open sourced by Agora Games.
https://github.com/agoragames/kairos

16.
kafka-python
A Python interface to Apache Kafka, where Kafka is
publish-subscribe messaging rethought as a
distributed commit log.
Allows me to subscribe to events as they arrive, in
real time.
https://github.com/mumrah/kafka-python

18.
pyspark
A Python interface to Apache Spark, where Spark is a
fast and general engine for large scale data
processing.
Allows me to backfill historical data into the time
series when I add or modify metrics.
http://spark.apache.org/

24.
classification
Both naïve models left a lot to be desired. Two simple
classifications would help us treat different types of
time series appropriately:
Does this metric show a weekly pattern (i.e., different
behavior on weekends versus weekdays)?
Does this metric show a daily pattern?

25.
classification
Fit a sine curve to the weekday and weekend periods.
Use the ratio of the levels of those fits to determine
whether weekdays should be treated separately from weekends.
weekly
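
The weekly check above might be sketched as follows. This is only an illustration under stated assumptions, not the code used in the talk: the sine is assumed to have a one-day period, "level" is read as the fitted mean offset, the samples are assumed to be evenly spaced and to cover whole days (at least three samples per day), and the function names and the threshold of 2.0 are hypothetical.

```python
import math


def fit_daily_sine(values, samples_per_day):
    """Least-squares fit of level + a*sin + b*cos with a one-day period.

    Assumes evenly spaced samples covering a whole number of days, so the
    fit reduces to simple projections. Returns (level, amplitude).
    """
    n = len(values)
    if samples_per_day < 3 or n == 0 or n % samples_per_day:
        raise ValueError("need whole days with at least 3 samples per day")
    w = 2 * math.pi / samples_per_day
    level = sum(values) / n
    a = 2 / n * sum(v * math.sin(w * i) for i, v in enumerate(values))
    b = 2 / n * sum(v * math.cos(w * i) for i, v in enumerate(values))
    return level, math.hypot(a, b)


def split_weekdays_from_weekends(weekday_values, weekend_values,
                                 samples_per_day, threshold=2.0):
    """True if the fitted levels differ by at least `threshold` (hypothetical)."""
    weekday_level, _ = fit_daily_sine(weekday_values, samples_per_day)
    weekend_level, _ = fit_daily_sine(weekend_values, samples_per_day)
    low, high = sorted((abs(weekday_level), abs(weekend_level)))
    if low == 0:
        return high > 0
    return high / low >= threshold


# Synthetic hourly data: five busy weekdays and two quiet weekend days.
weekdays = [100 + 40 * math.sin(2 * math.pi * h / 24) for h in range(24 * 5)]
weekend = [20 + 5 * math.sin(2 * math.pi * h / 24) for h in range(24 * 2)]
print(split_weekdays_from_weekends(weekdays, weekend, samples_per_day=24))
```

A metric whose weekday and weekend levels are similar would return False here and could be modeled as a single series.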

29.
classification
Take a Fourier transform of the time series and inspect
the bins associated with a frequency of one cycle per day.
Use the ratio of those bins to the first bin (the constant,
or DC, component) to classify the time series.
daily
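
A minimal sketch of the daily check described above, using only the standard library. The details are assumptions rather than the talk's implementation: the series is evenly sampled and trimmed to whole days, the "bins around the day frequency" are taken to be the day bin and its two neighbours, and the function names and the threshold of 0.1 are hypothetical.

```python
import cmath
import math


def dft_bin(values, k):
    """Return the k-th discrete Fourier transform coefficient of values."""
    n = len(values)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n)
               for i, v in enumerate(values))


def daily_ratio(values, samples_per_day):
    """Ratio of the magnitude near the one-cycle-per-day bin to the DC bin."""
    days = len(values) // samples_per_day
    if days < 1:
        raise ValueError("need at least one whole day of samples")
    values = values[:days * samples_per_day]  # trim to whole days
    n = len(values)
    day_bin = days  # one cycle per day means `days` cycles over the series
    neighbours = [k for k in (day_bin - 1, day_bin, day_bin + 1)
                  if 1 <= k <= n // 2]
    day_magnitude = sum(abs(dft_bin(values, k)) for k in neighbours)
    dc = abs(dft_bin(values, 0))
    return day_magnitude / dc if dc else float("inf")


def has_daily_pattern(values, samples_per_day, threshold=0.1):
    """Classify the series; the threshold is a hypothetical tuning value."""
    return daily_ratio(values, samples_per_day) > threshold


# Synthetic hourly data over five days with a clear daily cycle.
hourly = [100 + 50 * math.sin(2 * math.pi * h / 24) for h in range(24 * 5)]
print(has_daily_pattern(hourly, samples_per_day=24))
print(has_daily_pattern([100.0] * 120, samples_per_day=24))
```

Dividing by the DC component makes the score independent of the metric's overall scale, so one threshold can be shared across metrics of very different magnitudes.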

30.
classification
Time series on weekdays, shown with a strong daily pattern.
Fourier transform with the bins around the day frequency highlighted.
daily

31.
classification
Time series on weekends, shown with no daily pattern.
Fourier transform with the bins around the day frequency highlighted.
daily

42.
arima
I am currently investigating using ARIMA
(autoregressive integrated moving average) models to
make better predictions.
I’m not convinced that this level of detail is necessary
for the analysis I’m doing, but I wanted to highlight
another cool scientific computing library that’s
available.