
The collection and use of Big Data has become an important part of modern business practice. The Internet of Things (IoT) movement promises new opportunities for businesses interested in the intersection of people and technology. It is also fraught with pitfalls for practitioners and researchers who struggle to make sense of an increasing cacophony of signals. How should they poll and collect data from millions of signals in a way that is manageable, scalable, and statistically valid? How should they analyze and predict using these data? This presentation discusses these challenges with applied examples from monitoring and managing one of the world's largest computers.

13.
Prediction Adoption Model (actual)
26-Jul-16, presented by Ryan Kirk at StampedeCon 2016
(Chart: sophistication rising over time through four stages)
Stage I: CHECK THIS OUT
1. It runs
2. Results are promising
Stage II: OH NO, OH NO, OH NO!
3. It works but it's terrible
4. It will never scale
Stage III: HAHA, IT WORKED!
5. I surprise myself sometimes
6. I found a shortcut to scale it
Stage IV: I NEVER SAID IT WOULD…
7. How do I prove it is still working?
8. There is no way to apply it to this scenario

14.
Stage I: INTRODUCTION
1. Design
► What should we measure?
► What are the core business processes?
► What is the unit of analysis?
► What are our research questions/hypotheses?
2. Measure
► Do we push or pull?
► How often should we measure?
► How long do we need the data?
► How do we represent the data schema?
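The deck leaves the schema question open. One common answer, sketched here with hypothetical field names (the talk does not show its actual schema), is a self-describing point keyed by source, metric, and time:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class MetricPoint:
    """One measurement: the unit of analysis is a (source, metric, time) triple."""
    source: str      # which host/device emitted the signal
    metric: str      # e.g. "cpu_util"
    timestamp: float # Unix epoch seconds
    value: float
    tags: Dict[str, str] = field(default_factory=dict)  # free-form dimensions

point = MetricPoint(source="host-042", metric="cpu_util",
                    timestamp=1469577600.0, value=0.73,
                    tags={"dc": "us-east"})
```

Keeping dimensions in a free-form `tags` map lets new signals arrive without schema migrations, at the cost of weaker validation.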

15.
Stage II: GROWTH
3. Describe
► Which metrics relate to our outcomes of interest?
► What is the typical value of each metric?
► How do you visualize each metric?
4. Detect
► What do we expect to happen?
► Which values/events are unexpected?
► When should we alert?
► How will we scale our analysis?
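The slide asks which values are unexpected and when to alert. A minimal sketch of one common answer (not necessarily the approach used in the talk) is a z-score test against recent history; `is_unexpected` and the 3-sigma threshold are illustrative choices:

```python
import statistics

def is_unexpected(history, value, threshold=3.0):
    """Flag a value whose z-score against recent history exceeds the threshold."""
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return value != mean  # flat history: anything different is surprising
    return abs(value - mean) / stdev > threshold

baseline = [10.0, 11.0, 9.5, 10.5, 10.2, 9.8, 10.1, 10.4]
print(is_unexpected(baseline, 10.3))  # False: within ~3 sigma of history
print(is_unexpected(baseline, 25.0))  # True: far outside, would alert
```

Scaling this per-metric test to millions of signals usually means maintaining the running mean and variance incrementally rather than recomputing over full history.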

16.
Stage III: MATURITY
5. Predict
► Are there patterns?
► Are there more complex relationships?
► What is going to happen?
► How do we get training data?
6. Act
► What actions should we take?
► How can we incorporate new outcomes into the current model?
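"What is going to happen?" can be answered cheaply before any complex model exists. As a hypothetical baseline (not the talk's method), simple exponential smoothing forecasts the next value from a recency-weighted blend of the series; `forecast_next` and `alpha` are illustrative names:

```python
def forecast_next(series, alpha=0.5):
    """Exponential smoothing: the next-step forecast blends each observation,
    weighting recent points most heavily."""
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

history = [100, 102, 101, 105, 107]
print(forecast_next(history))  # -> 105.0
```

A baseline like this also sets the bar any more complex model must beat before it is worth operating.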

17.
Stage IV: DECLINE
7. Feedback
► Is my model primarily basing its decisions upon its previous decisions?
► Can I separate the model from its parameters?
► Can I still evaluate accuracy?
8. Obsolescence
► Are my business scenarios still grounded?
► Do my model assumptions still hold?
► Does it still scale?
► Is the intervention still needed?
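The feedback question (a model basing decisions on its own previous decisions) is often handled by holding out a random control slice of events that the model never touches, so accuracy can still be evaluated on untreated outcomes. This sketch is an assumption about technique, not the talk's method; `decide` and `holdout_rate` are hypothetical names:

```python
import random

def decide(event, model_action, holdout_rate=0.05, rng=random):
    """Route a small random slice of events past the model so its accuracy
    can still be measured on outcomes it did not influence."""
    if rng.random() < holdout_rate:
        return ("control", None)          # observe the untouched outcome
    return ("treated", model_action(event))

rng = random.Random(42)  # seeded for a reproducible split
labels = [decide({"id": i}, lambda e: "remediate", rng=rng)[0]
          for i in range(1000)]
control_share = labels.count("control") / len(labels)
```

The control slice doubles as fresh labeled data, which also helps answer whether the intervention is still needed.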

27.
14. COLLECTION
► Agreement of signals
► Cacophony of signals
► How often should we measure?
► We have no labeled training data
► An approach we can build upon in the future
MEASURE

28.
13. SAMPLING
Shannon-Nyquist Paradox
► The more often you measure something, the more it varies
► Bias related to time and variability
► E.g., "the temperature yesterday was 68 degrees"
MEASURE (cont.)
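The "paradox" is a sampling-rate effect: a single daily reading can make a varying signal look constant. A sketch, assuming a simple sinusoidal daily temperature cycle (an illustration, not data from the talk), shows how variation appears as the sampling interval shrinks:

```python
import math

# Hypothetical daily temperature cycle: mean 68 degrees, +/- 10 degree swing over 24 h.
def temp(hour):
    return 68 + 10 * math.sin(2 * math.pi * hour / 24)

def observed_range(sample_every_hours, days=1):
    """Range (max - min) of the signal as seen at a given sampling interval."""
    samples = [temp(h) for h in range(0, 24 * days, sample_every_hours)]
    return max(samples) - min(samples)

print(observed_range(24))  # one reading per day: signal looks constant
print(observed_range(1))   # hourly readings reveal ~20 degrees of swing
```

Per Nyquist, recovering a cycle requires sampling at more than twice its frequency; below that, the variation is either hidden (as here) or aliased into spurious slow trends.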

33.
7. FEEDBACK LOOP
► We have also built a search engine for time-series data that allows us to build cool-looking graphs in real time
► We basically do all of this to empower Slack alerts
► Allows tags to propagate forwards
PREDICT
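"Tags propagate forwards" could mean several things; one hedged reading is that a human-applied tag on an event signature is inherited by matching later events in the stream. `propagate_tags` and the signatures below are illustrative, not from the talk:

```python
def propagate_tags(events, tagged):
    """Carry a human-applied tag forward: any later event matching the
    signature of an already-tagged event inherits its tag."""
    known = dict(tagged)            # signature -> tag, seeded by humans
    out = []
    for sig, payload in events:     # events arrive in time order
        out.append((sig, payload, known.get(sig)))
    return out

seed = {"disk_full": "sev2"}
stream = [("disk_full", "host-1"), ("net_flap", "host-2"), ("disk_full", "host-9")]
print(propagate_tags(stream, seed))
```

Each propagated tag is effectively a free training label, which is one way to chip away at the "no labeled training data" problem raised earlier in the deck.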