Track: Sequential Data: Natural Language, Time Series, and Sound

Location: Cyril Magnin I

Day of week: Wednesday

Techniques, practices, and approaches for working with time series and sequential data. Expect topics including image recognition, NLP/NLU, preprocessing, and the algorithms that crunch this kind of data.

A time series is a series of data points indexed (or listed) in time order. Most commonly, a time series is a sequence taken at successive, equally spaced points in time. Sequential data covers problems where the ordering of the data matters; image processing and NLP/NLU fall in this space. The Grokking Time Series and Sequential Data track looks at the role of time series and sequential data in modern application development.
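As a minimal illustration of the definition above (plain Python, no libraries; the values are made up), an equally spaced time series is just a sequence of observations paired with successive timestamps:

```python
from datetime import datetime, timedelta

# A time series: data points indexed in time order,
# sampled at equally spaced intervals (here, hourly).
start = datetime(2019, 1, 1)
values = [12.1, 12.4, 13.0, 12.8, 12.5]  # hypothetical sensor readings
series = [(start + timedelta(hours=i), v) for i, v in enumerate(values)]

# Ordering matters: each observation is tied to its position in time.
for ts, v in series:
    print(ts.isoformat(), v)
```

The constant spacing is what distinguishes a classic time series from general sequential data, where only the ordering is guaranteed.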

Track Host: Jendrik Jördening

Data Scientist @Nooxit

Jendrik is Head of Data Science at a stealth startup. He formerly worked at Aurubis and Akka Germany on data science and deep learning in the fields of Industry 4.0 and autonomous machines.

At the same time, he took part in the Udacity Self-Driving Car Nanodegree, participating with a group of other Udacity students in the Self-Racing Cars event at the Thunderhill racetrack in California.

The field of NLP is often not the most approachable: it ranges from linguistics to cutting-edge deep learning, so it can be hard to find tools that let you build a practical product in a reasonable timeframe.

In this talk, we will cover concrete examples of how to build practical applications using NLP. In the real world, most gains come from improvements to the pipeline, not necessarily the model. For this reason, we will dive into data visualization and labelling, as well as model validation.
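In the spirit of pipeline-first improvements, a sketch of the kind of check that often precedes any modeling: inspecting the label distribution of a dataset and carving out a validation split. This is a generic illustration with invented examples, not code from the talk.

```python
from collections import Counter

# Hypothetical labelled NLP dataset (text, label) pairs.
examples = [
    ("great product, works well", "positive"),
    ("terrible support experience", "negative"),
    ("does what it says", "positive"),
    ("arrived broken", "negative"),
    ("absolutely love it", "positive"),
]

# Pipeline check: is the label distribution balanced enough to train on?
label_counts = Counter(label for _, label in examples)
print(label_counts)

# Hold out a slice for model validation before touching any model code.
cut = int(len(examples) * 0.8)
train, valid = examples[:cut], examples[cut:]
print(len(train), len(valid))
```

Checks like this surface labelling problems early, which is typically where the real gains come from.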

We will walk through code and plots, and leave ample room for practical questions.

Forecasting is a common data science task that helps organizations with capacity planning, goal setting, and anomaly detection. Despite its importance, there are serious challenges associated with producing reliable, high-quality forecasts, especially when there is a wide variety of time series and analysts with expertise in time series modeling are relatively rare. To address these challenges, we describe a practical, modular approach to forecasting "at scale" based on a flexible curve-fitting procedure that produces high-quality forecasts across a wide variety of business time series.
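The "forecasting as curve fitting" framing can be sketched in a few lines: model the series as a trend plus a seasonal component and fit both by least squares. This is a toy illustration of the general idea on synthetic data, not the actual procedure the talk describes.

```python
import numpy as np

# Synthetic history: linear trend + weekly seasonality + noise.
rng = np.random.default_rng(0)
t = np.arange(140)  # e.g. 140 days of history
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 1, t.size)

def basis(t):
    """Design matrix: intercept, linear trend, one weekly Fourier pair."""
    return np.column_stack([
        np.ones_like(t, dtype=float),
        t.astype(float),
        np.sin(2 * np.pi * t / 7),
        np.cos(2 * np.pi * t / 7),
    ])

# Fit the curve by ordinary least squares.
beta, *_ = np.linalg.lstsq(basis(t), y, rcond=None)

# Forecast: evaluate the same basis functions over future time points.
t_future = np.arange(140, 168)
forecast = basis(t_future) @ beta
```

Because the forecast is just the fitted curve extended forward, analysts can tune interpretable components (trend, seasonality) instead of hand-crafting a full statistical model per series.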

Uber’s Marketplace is the algorithmic brains and decision engine behind our ride-sharing services. Marketplace Forecasting builds and deploys spatio-temporal models and forecasts to enable hyperlocal decision making. To model the physical world requires us to reimagine how we look at the basic problem of forecasting.

We will discuss the challenges of modeling the influence of external signals, such as global news events and holidays in our Marketplace. In the majority of cases, there is limited historical data, and in cases where cities have just launched, there is no data at all. We will briefly cover how different techniques ranging from linear to deep learning models, generalized embeddings and cutting edge AI to help us forecast the future states of the Marketplace and even predict the onset of extreme events before they occur!

At PayPal, achieving four nines of availability is the norm. In pursuit of additional nines, each exponentially harder to reach, the company has recently embarked on applying deep learning to forecasting datacenter metrics. Seq2Seq networks are ripe for application to this difficult problem, but little has been shared with the open community.

Aashish Sheshadri shines a light on how PayPal applies Seq2Seq networks to forecasting CPU and memory metrics at scale. Forecasting enables alerting flows to get a head start, reducing MTTD, augmenting auto-remediation, and consequently aiding MTTR. In doing so, Aashish describes the ecosystem and tooling that enable developers at PayPal to experiment with, build, and train ML models while creating reusable, reproducible, and shareable work in the Jupyter and Kubernetes ecosystems.
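Before any Seq2Seq model can be trained on a metric trace, the series has to be framed as supervised (encoder input, decoder target) windows. This is a generic sketch of that framing step on a synthetic trace, not PayPal's actual pipeline:

```python
import numpy as np

def make_windows(series, enc_len=24, dec_len=6):
    """Slide over a 1-D series: enc_len past points -> dec_len future points."""
    X, Y = [], []
    for i in range(len(series) - enc_len - dec_len + 1):
        X.append(series[i : i + enc_len])           # encoder input
        Y.append(series[i + enc_len : i + enc_len + dec_len])  # decoder target
    return np.array(X), np.array(Y)

# Stand-in for a CPU utilization trace (window sizes are illustrative).
cpu = np.sin(np.linspace(0, 20, 200))
X, Y = make_windows(cpu)
print(X.shape, Y.shape)
```

The encoder consumes each 24-step history window and the decoder learns to emit the next 6 steps; at serving time the same windowing is applied to live metrics.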

Is deep learning alchemy? No! But it heavily relies on tips and tricks, a set of common wisdom that probably works for similar problems. In this talk, I'll introduce what the audio/music research communities have discovered while playing with deep learning for audio classification and regression: how to prepare and preprocess the audio data, how to design the networks (or choose which one to steal from), and what we can expect as a result.
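A common first preprocessing step in this space is converting raw audio into a log-magnitude spectrogram before it reaches a network. A hedged NumPy sketch of that step is below (libraries like librosa or torchaudio provide tuned, mel-scaled versions; this hand-rolled framed FFT is just for illustration):

```python
import numpy as np

def log_spectrogram(signal, n_fft=512, hop=128):
    """Log-magnitude spectrogram via a Hann-windowed, framed FFT."""
    window = np.hanning(n_fft)
    frames = [
        signal[start : start + n_fft] * window
        for start in range(0, len(signal) - n_fft + 1, hop)
    ]
    spec = np.abs(np.fft.rfft(np.array(frames), axis=1))  # (frames, freq bins)
    return np.log1p(spec)  # compress dynamic range, as networks prefer

# One second of a pure 440 Hz tone at 16 kHz as a toy input.
sr = 16000
t = np.arange(sr) / sr
audio = np.sin(2 * np.pi * 440 * t)
S = log_spectrogram(audio)
print(S.shape)
```

The resulting 2-D time-frequency representation is what typically gets fed into convolutional or recurrent architectures for audio classification.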