Better deep learning tools available today

Several months ago we wrote about updates to the two most popular deep learning frameworks, TensorFlow and PyTorch, that make them easier to use and deploy. Those updates are exciting and necessary, but I’m here to tell you that I still hate working with them. I don’t need to tell you why; I’ll just show you.

Let’s start with a simple image classification example from the official PyTorch tutorials. After creating your model, loading your data, and making sure everything is placed on the right device (CPU or GPU), you write up a short training loop in Python.
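The tutorial's own code is not reproduced here. As a rough sketch of what such a loop tends to look like, assuming a toy linear model and random tensors standing in for real images (the `train` function, the model, and the data below are all made up for illustration):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, loader, loss_fn, optimizer, device, epochs=2):
    """Run a bare-bones training loop and return the mean loss per epoch."""
    model.to(device)
    history = []
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for inputs, labels in loader:
            # Every batch has to be moved to the same device as the model.
            inputs, labels = inputs.to(device), labels.to(device)
            # Gradients accumulate by default, so they must be cleared each step.
            optimizer.zero_grad()
            outputs = model(inputs)
            loss = loss_fn(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        history.append(running_loss / len(loader))
    return history


# Hypothetical stand-ins for a real model and dataset.
torch.manual_seed(0)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
images = torch.randn(64, 3, 8, 8)
labels = torch.randint(0, 10, (64,))
loader = DataLoader(TensorDataset(images, labels), batch_size=16)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

history = train(model, loader, nn.CrossEntropyLoss(), optimizer, device)
print(history)
```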

This is the simplest possible training loop, and it’s still too much to deal with. Imagine having to copy and paste this into every new project you build. Imagine how many hours you’ll spend debugging poor model performance because you forgot optimizer.zero_grad()! (Seriously.) And that’s not to mention all the things you need to actually perform effective experimentation that aren’t included here:

model save/restore

logging (not printing to the console)

training visualization

model evaluation

custom training options

multiple GPU scaling

learning rate scheduling

early stopping

…
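To give a sense of how much even one of these items adds, here is a minimal sketch of early stopping in plain Python. The `EarlyStopping` class and the validation losses are hypothetical, written only for this illustration; a real project would compute the losses from a held-out set and save a checkpoint whenever the loss improves.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, val_loss, epoch):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0  # this is also where a checkpoint would be saved
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses, one per epoch.
val_losses = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.50]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(loss, epoch):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_epoch, stopper.best)
```

Note that the loop stops before it ever sees the final, better loss. Choosing `patience` is one more decision you have to make and test yourself.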

The great thing about PyTorch is you can do complex things (train crazy neural nets) using the low-level building blocks (simple math on tensors). You can customize everything. But the truth is, the abstractions are too low-level. Just because I can write detailed, customized training loops that incorporate ideas from the latest academic research doesn’t mean I want to! Working with raw PyTorch is just too difficult most of the time.

What about other frameworks? You’ll find exactly the same problems if you go check out MXNet’s Gluon framework. TensorFlow is making an effort with Keras and the Estimator API, but there are still many pieces that you need to cobble together in just the right order, and a lot of code that will get duplicated between projects.

If you’ve ever built more than one deep learning project, you’ve almost certainly noticed yourself writing code in one project that is highly similar (if not identical) to code you wrote in another. You may have even thought to yourself (I did) that it shouldn’t be that hard to create a library or module that does these common bits for you. But doing that turns out to be quite difficult: you have to find abstractions that apply to a sufficiently general class of problems without limiting the techniques you can use or the types of problems you can solve. Reader, I am happy to share with you that someone else has been thinking about these problems and has done the hard parts for us!

Use AllenNLP instead

AllenNLP, an open source project from the Allen Institute for AI, is one of a few libraries that are beginning to address the problems I mentioned above. Despite its name, you should be excited about AllenNLP even if you don’t care much about NLP. To get you sufficiently excited, let’s look at how you’d do a simple NLP task using cool deep learning models. First, you need a simple configuration file like this one. Next, you need to train it like this:

allennlp train path/to/experiment.jsonnet -s save/model/here

That’s it! Seriously. With just this, the best model (as evaluated on a validation set) is saved and packaged for future use, along with the full configuration used to train it. During training, all relevant information is logged to the console, and you can visualize the entire process with a simple command:

tensorboard --logdir save/model/here

Everything is taken care of for you, and it’s easy to extend and customize. If you want to use a fancier model, just change the encoder type in the configuration file.
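The original configuration is not reproduced here. As a rough illustration of that kind of change, the sketch below treats a config as a Python dictionary and swaps one encoder name for another. The surrounding fields, the sizes, and the `with_encoder_type` helper are assumptions made up for this example; `lstm` and `gru` serve only as example encoder names. In practice you would edit the `type` value in the file by hand.

```python
import copy
import json

# Illustrative fragment only: field names and sizes are assumptions,
# not the configuration used in this post.
base_config = {
    "model": {
        "encoder": {"type": "lstm", "input_size": 100, "hidden_size": 50},
    },
}


def with_encoder_type(config, encoder_type):
    """Return a copy of `config` with only the encoder type changed."""
    new_config = copy.deepcopy(config)
    new_config["model"]["encoder"]["type"] = encoder_type
    return new_config


gru_config = with_encoder_type(base_config, "gru")

# Jsonnet is a superset of JSON, so JSON output is also valid Jsonnet.
print(json.dumps(gru_config, indent=2))
```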

Of course, there are times when you’ll want or need to do something that’s not built into the library. AllenNLP is built on PyTorch and was made to be easy to extend. You can just add your own PyTorch code, and use API hooks to make your models available as config options.
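The general idea behind exposing your own classes as config options is a name-to-class registry. The sketch below is a toy version in plain Python, not AllenNLP's implementation; `ToyRegistry`, `from_config`, and `MyFancyEncoder` are hypothetical names used only to show the pattern.

```python
class ToyRegistry:
    """Toy registry that maps string names to subclasses."""

    _registry = {}

    @classmethod
    def register(cls, name):
        """Class decorator that files a subclass under `name`."""
        def decorator(subclass):
            if name in cls._registry:
                raise ValueError(f"{name!r} is already registered")
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def from_config(cls, config):
        """Build an object from a dict: "type" picks the class, the rest are arguments."""
        params = dict(config)  # copy so the caller's dict is left untouched
        subclass = cls._registry[params.pop("type")]
        return subclass(**params)


class Encoder(ToyRegistry):
    _registry = {}  # each base class keeps its own namespace of names


@Encoder.register("my_fancy_encoder")
class MyFancyEncoder(Encoder):
    def __init__(self, hidden_size):
        self.hidden_size = hidden_size


# A config entry can now refer to the custom class by its registered name.
encoder = Encoder.from_config({"type": "my_fancy_encoder", "hidden_size": 128})
print(type(encoder).__name__, encoder.hidden_size)
```

The appeal of this design is that custom code and built-in components are selected the same way, by a string in a config file, so experiments stay declarative even when the model is not.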

What if you aren’t ready to do deep learning? Consider a simpler use case like classifying text. You can use AllenNLP’s elmo command to get a vector representation of text via:

allennlp elmo path/to/sentences.txt save/vectors/here.hdf5

With one command and no deep learning knowledge, you can numericalize your text and take the result to whatever prediction library you like, scikit-learn for instance. These new tools don’t just make deep learning practitioners’ lives easier; they also help democratize access to state-of-the-art methods.
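As a sketch of that last step, the example below fits a scikit-learn classifier on fixed-length vectors. It assumes you have already loaded the saved vectors and reduced each sentence to a single vector; since the layout of the output file is not described here, random clusters stand in for real sentence vectors, and the dimensions and labels are made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_per_class, dim = 100, 32

# Stand-ins for sentence vectors: two synthetic clusters, one per class.
vectors = np.vstack([
    rng.normal(loc=-1.0, size=(n_per_class, dim)),
    rng.normal(loc=1.0, size=(n_per_class, dim)),
])
labels = np.array([0] * n_per_class + [1] * n_per_class)

# Hold out a quarter of the examples to measure accuracy.
X_train, X_test, y_train, y_test = train_test_split(
    vectors, labels, test_size=0.25, random_state=0, stratify=labels
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```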

Additionally, most of what I love about AllenNLP is not specific to NLP. Most of the things you need to do to train an NLP model are the same if you’re training a computer vision model or something else: logging, checkpointing, visualizing, scaling, etc. If you’re doing computer vision, you’ll have to write some custom models, but the rest of the tooling is still exceptionally convenient. And I have a hunch that it’s only a matter of time until this library extends to cover other fields.

The others

I mentioned that AllenNLP was one of a few new tools that are exciting in this space. The fast.ai library, also built on PyTorch, shares many of the same goals but is not limited to NLP. In the words of the fast.ai team:

“Unfortunately, Pytorch was a long way from being a good option for part one of the course, which is designed to be accessible to people with no machine learning background. It did not have anything like the clear simple API of Keras for training models. Every project required dozens of lines of code just to implement the basics of training a neural network.”

The Ludwig library, recently released by Uber and built on TensorFlow, provides code-free model building for non-experts. In Uber’s words:

“Ludwig is unique in its ability to help make deep learning easier to understand for non-experts and enable faster model improvement iteration cycles for experienced machine learning developers and researchers alike.”

Clearly, there is a real need here. If we want to use deep learning to get real value in production contexts, we just cannot afford the massive technical debt we incur by copying and pasting the same for-loops into all our projects. We need to move deep learning past isolated research projects and into reusable assets for creating business value. We are finally starting to see tools that not only give us the right abstractions, but are mature enough to invest in. Go check them out!

Victor Dibia will also be giving a talk called “Data2Vis: Automatic Generation of Data Visualizations Using Sequence-to-Sequence Recurrent Neural Network” at the NVIDIA GTC 2019 in San Jose, CA on March 21st.

Mike Lee Williams will be speaking on Federated Learning at the Strata Data Conference in San Francisco on March 27th.