In recent years, deep learning models have become some of the most popular choices in machine learning for a variety of problems, in large part because they greatly reduce (or eliminate) the need for manual feature engineering.

In turn, TensorFlow has quickly become one of the most popular frameworks for training these deep learning models. TensorFlow's symbolic execution engine makes it easy to define an arbitrary loss function--whether for a deep model or something more traditional--and then call the optimizer of your choice to minimize it using gradient descent. In this way, the barrier to deep learning has never been lower!

One of the biggest impediments to actually using deep learning in practice, however, is the need for large hand-labeled training sets. An increasingly common alternative is to use weaker forms of supervision, i.e., to generate training set labels programmatically or heuristically. Such labels are often noisy and can conflict with one another.

Whichever way you label your training set, however, there is some process that you follow. The core idea in data programming (NIPS 2016) is that by modeling this training set creation process, you can improve the quality of the resulting labels. Right now, we're working on using this to power a new information extraction framework, Snorkel, but the concept of data programming is much more general.

In this tutorial, we'll walk through a simple toy example with synthetic data, showing how you can use data programming with TensorFlow to train arbitrary models like neural networks with only weak supervision. We'll walk through the three high-level steps of data programming:

1. Creating a noisy training set by writing labeling functions

2. Modeling this training set to denoise it

3. Training a noise-aware discriminative model
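
To make the first of these steps concrete, here is a minimal sketch of what labeling functions might look like on binary feature vectors. Everything in it is hypothetical: the function names, the features they inspect, the toy matrix `X_toy`, and the convention that a labeling function returns 0 to abstain are assumptions made for illustration, not the definitions used later in the tutorial.

```python
import numpy as np

# Hypothetical labeling functions: each inspects one binary feature of x and
# votes +1 or -1, or abstains with 0 (the abstain value is an assumption).
def lf_feature_0(x):
    return 1 if x[0] == 1 else 0

def lf_feature_1(x):
    return -1 if x[1] == 1 else 0

def lf_feature_2(x):
    return 1 if x[2] == 0 else -1

labeling_functions = [lf_feature_0, lf_feature_1, lf_feature_2]

def apply_labeling_functions(X, lfs):
    """Return an n x m matrix of votes, one column per labeling function."""
    return np.array([[lf(x) for lf in lfs] for x in X])

# Toy input: three data points with three binary features each
X_toy = np.array([[1, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]])

L = apply_labeling_functions(X_toy, labeling_functions)
print(L)
```

In this toy example the first data point receives both +1 and -1 votes, which is exactly the kind of conflicting signal that the second step is meant to resolve.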

Note that, for the most part, this is a tutorial on data programming, not deep learning. In fact, we won't use any neural networks or "deep" models in this tutorial--but everything here extends easily to such models within TensorFlow! As you'll see below, step 3 can just as easily use a neural network: you simply apply a different loss function after the top layer.
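
As a rough illustration of what a "different loss function" for step 3 could look like, the sketch below computes a logistic loss in expectation over probabilistic labels rather than against hard labels. It is one plausible form of a noise-aware loss, written in NumPy so that it does not depend on a particular TensorFlow version. The function name, the linear model, and the input `p_pos` (an assumed estimate, coming out of step 2, of the probability that each label is +1) are all assumptions for illustration.

```python
import numpy as np

def noise_aware_logistic_loss(w, X, p_pos):
    """Expected logistic loss under probabilistic labels.

    w     : weight vector of a linear model (assumed here for simplicity)
    X     : n x d feature matrix
    p_pos : length-n vector, assumed probability that each label is +1
    """
    z = X @ w                             # model scores, one per data point
    loss_if_pos = np.logaddexp(0.0, -z)   # log(1 + exp(-z)): loss when y = +1
    loss_if_neg = np.logaddexp(0.0, z)    # log(1 + exp(z)):  loss when y = -1
    # Weight each possible label's loss by how likely that label is
    return np.mean(p_pos * loss_if_pos + (1.0 - p_pos) * loss_if_neg)

# Toy usage with hypothetical values
X_demo = np.array([[1, 0, 1], [0, 1, 1]])
p_demo = np.array([0.9, 0.2])
w_demo = np.array([0.5, -0.5, 0.1])
print(noise_aware_logistic_loss(w_demo, X_demo, p_demo))
```

When every entry of `p_pos` is exactly 0 or 1, this reduces to the ordinary logistic loss on hard labels. For a neural network, the score `z` would be the output of the top layer instead of `X @ w`.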

The goal of this tutorial is to go through a simple but end-to-end example of data programming, along with enough math to understand the basics, such as which objectives we are optimizing and how they tie together. If you are comfortable setting up and training machine learning models in a framework like TensorFlow, this tutorial should set you up to try out data programming with your favorite models! Other resources:

For a more detailed treatment, especially on the theory side, see our NIPS 2016 paper

To see how we use this technique on real information extraction problems, check out Snorkel, in particular the intro tutorial

Here we'll load the necessary libraries and generate some synthetic data, which we'll store as an $n \times d$ matrix $X_s$, where each row represents a data point $x \in \{0,1\}^d$.

We'll also generate a vector of ground-truth labels $Y_s \in \{-1,1\}^n$; we'll henceforth consider all but a small set of these labels unseen, as the whole point of our approach is to make do without labeled training data!
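
A minimal sketch of this setup, under stated assumptions, might look like the following. The sizes `n` and `d`, the random seed, the linear rule used to produce the ground-truth labels, and the size of the small labeled set (`n_dev`) are all hypothetical choices made for illustration. Only the shapes and value ranges of $X_s$ and $Y_s$ come from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n, d = 1000, 50  # hypothetical sizes: n data points, d binary features

# X_s: n x d matrix; each row is a data point x in {0,1}^d
X_s = rng.integers(0, 2, size=(n, d))

# Assumed generative story (not taken from the tutorial): the label is the
# sign of a random linear score over the centered features.
w_true = rng.normal(size=d)
scores = (X_s - 0.5) @ w_true
Y_s = np.where(scores >= 0, 1, -1)  # Y_s in {-1,1}^n

# Keep a small set of labels visible; treat all the others as unseen
n_dev = 50
dev_idx = rng.choice(n, size=n_dev, replace=False)
Y_dev = Y_s[dev_idx]

print(X_s.shape, Y_s.shape, Y_dev.shape)
```

Only `Y_dev` would be used directly during development. The remaining entries of `Y_s` would serve purely for evaluating how well the weakly supervised model does.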