HYLA Blog

Combining Machine Learning with Cloud and Network Architecture

HYLA TechTalk: This blog is part of an ongoing series providing insights into the technology we develop and the data we utilize across the analytics, trade-in, insurance, and processing solutions used by carriers, OEMs, retailers, and teams throughout HYLA.

Deep learning is an integral part of the supply chain management process here at HYLA Mobile. As an emerging technology, it can be very difficult to get working correctly, and moving from a working concept to a production system tends to be harder still. At HYLA, we get a lot of questions about how we make machine learning work for us, so if you’re training a machine learning algorithm for use in an application, read on for some useful information.

The two primary tools we use at HYLA Mobile are TensorBoard and TensorFlow™. Depending on your machine learning background, you might know about and use these tools already.

TensorFlow, championed by Google in its infancy, is an open-source software library specialized for the numerical computations involved in machine learning. Image recognition is a common application (and the one HYLA uses TensorFlow for). At a high level, TensorFlow achieves image recognition by training a neural network on a set of labeled images until it can classify new images with reasonable accuracy.

TensorBoard is a tool which visualizes the calculations involved in TensorFlow. Using TensorBoard after using TensorFlow is analogous to using a graphing app after using graph paper – TensorBoard makes it easier to understand TensorFlow computations, identify errors, and eventually create a faster and more accurate algorithm as a result.
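As a concrete illustration, here is roughly how a training script feeds data to TensorBoard using the current TensorFlow 2 summary API. This is a hedged sketch, not HYLA's actual code: the log directory name and the stand-in loss value are illustrative.

```python
# Sketch: writing metrics that TensorBoard can visualize.
# Requires TensorFlow 2.x; "logs/run1" is an illustrative path.
import tensorflow as tf

writer = tf.summary.create_file_writer("logs/run1")
with writer.as_default():
    for step in range(100):
        loss = 1.0 / (step + 1)  # stand-in for a real training loss
        tf.summary.scalar("loss", loss, step=step)
writer.flush()
# Then inspect the curves with:  tensorboard --logdir logs
```

TensorBoard reads the event files this produces and plots the loss curve over training steps, which is how dips, plateaus, and divergence become visible at a glance.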

Our workflow involving these tools is simple to describe, but a bit more difficult in practice. Again, at a high level:

We start by writing Python code that defines and trains a deep learning model. As a rule of thumb, it takes about a week to get anything useful out of a neural net because of the large amounts of data required by these algorithms.

We use TensorBoard to understand how the model is performing during and after training. This helps to identify overfitting, which happens when a neural network fits its training data too closely and starts misclassifying new, unseen images, producing false positives and false negatives.

We then create a test environment, also in Python, that tests the new model against a test set that was classified by a human ahead of time.

We compare this model to the last best model we had. If it performs better, it may end up going into production.

Putting the model into production simply involves copying it to the TensorFlow server and restarting the process, pointing it to the new model.
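The test-and-compare steps above can be sketched in a few lines of Python. Everything here is illustrative rather than HYLA's actual code: `predict_fn` stands in for any trained model, and the toy "images" and labels stand in for a human-classified test set.

```python
# Minimal sketch of "test against a human-labeled set, then compare
# to the last best model." All names and data are illustrative.

def accuracy(predict_fn, test_set):
    """Fraction of human-assigned labels the model reproduces."""
    correct = sum(1 for image, label in test_set if predict_fn(image) == label)
    return correct / len(test_set)

def should_promote(candidate_acc, best_acc):
    # Only ship a model that beats the current production model.
    return candidate_acc > best_acc

# Toy test set: (image, label) pairs with dummy integer "images".
test_set = [(0, "puppy"), (1, "no_puppy"), (2, "puppy"), (3, "no_puppy")]
old_model = lambda image: "puppy"                                  # always guesses puppy
new_model = lambda image: "puppy" if image % 2 == 0 else "no_puppy"

print(should_promote(accuracy(new_model, test_set),
                     accuracy(old_model, test_set)))  # True: 1.0 > 0.5
```

The key design point is that the comparison is made on data neither model trained on, so a candidate cannot win promotion by memorizing its training set.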

This is a pretty simplified explanation of the process, and if you’re already using machine learning in your applications, you’re probably doing something similar. The nuts and bolts of the process are a bit more complicated, however, and they involve various trade-offs based on the strengths and weaknesses of the tools and systems that we use. Let’s take a deeper dive.

Right at the beginning, a neural network is trained by making guesses. It’s given a set of images that have been pre-classified by a human (for example, “images that contain a puppy” versus “images that do not contain a puppy”). The system guesses which images are which, and refines itself using its wrong answers in a process known as backpropagation.
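A toy illustration of this guess-and-refine loop, using a single logistic "neuron" in plain Python rather than a full network. Real networks apply the same idea across millions of weights; the data and learning rate here are made up for demonstration.

```python
# Toy version of "guess, then learn from wrong answers":
# one logistic neuron learning to separate negative from positive numbers.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Pre-classified data: inputs below 0 are label 0, above 0 are label 1.
data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

w, b, lr = 0.0, 0.0, 0.5
for epoch in range(200):
    for x, y in data:
        guess = sigmoid(w * x + b)  # forward pass: make a guess
        error = guess - y           # how wrong the guess was
        w -= lr * error * x         # backward pass: nudge the weight
        b -= lr * error             # in the direction that fixes the error

print(sigmoid(w * 2.0 + b) > 0.5)  # True: +2.0 is now classified as label 1
```

The update rule is the cross-entropy gradient for a logistic unit, which is the simplest possible instance of backpropagation: each wrong guess moves the weights slightly toward the correct answer.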

As one might expect, the actual math involved here is quite difficult, but TensorFlow isolates it within its libraries. In other words, our programmers no longer need to write the equations underpinning machine learning from scratch; they simply select the operations they need and define their parameters.

One nuance of TensorFlow, however, is that its calculations run much faster on graphics processing units (GPUs) than on traditional CPUs. Compounding the problem, the fastest GPUs are hard to find on AWS, our preferred cloud computing platform. GPUs are also a bit finicky to work with: their limited onboard memory, if not accounted for, will cap the size and power of the neural network you’re attempting to train.
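A back-of-envelope sketch of that memory constraint: before training, it is worth checking whether a batch of activations will even fit on the card. The layer sizes and the 11 GB budget below are illustrative assumptions, and real usage also includes weights, gradients, and optimizer state, so actual headroom is smaller than this estimate suggests.

```python
# Rough estimate: does a batch of forward activations fit in GPU memory?
# All sizes are illustrative, not HYLA's actual network.
BYTES_PER_FLOAT32 = 4

def rough_batch_bytes(batch_size, activation_floats_per_image):
    """Forward activations, roughly doubled to account for the backward pass."""
    return batch_size * activation_floats_per_image * BYTES_PER_FLOAT32 * 2

# Feature-map sizes for a VGG-style stack on a 224x224 RGB image.
floats_per_image = (224*224*64) + (112*112*128) + (56*56*256) + (28*28*512)

gpu_budget = 11 * 1024**3  # hypothetical 11 GB card

for batch in (64, 128, 256):
    need = rough_batch_bytes(batch, floats_per_image)
    print(batch, "fits" if need <= gpu_budget else "exceeds GPU memory")
```

Estimates like this are why batch size is usually the first knob turned when training hits an out-of-memory error: memory use scales linearly with it.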

So, given this constraint, how does HYLA push the results of its machine learning calculations to a cloud environment where there are slower GPUs?

Our solution has been to custom-build an on-premises server cluster, powered by the latest in GPU processors, which only handles the training side of the neural net. It turns out that training the neural net requires much more processing power than running the neural net – and it’s possible to run the inference servers without any connection at all to the training servers.

Once again, here’s the high-level workflow:

In the HYLA data center, a GPU cluster begins training a neural network.

After about a week, the training is done. At this point, the neural network is exported in a compressed form: a series of files, somewhat like a virtual machine image.

This series of files is copied into AWS following testing and validation.

Once on AWS, TensorFlow Serving loads that model.

Images from a web server and/or mobile phones are sent to TensorFlow Serving and processed in the cloud.
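To make that last step concrete, here is a sketch of the kind of request a client builds for TensorFlow Serving's REST API (the `/v1/models/<name>:predict` endpoint). The host, model name "grading", and input tensor key "image_bytes" are assumptions that depend on the deployed model's signature, and the request is only constructed here, not actually sent.

```python
# Sketch: building a predict request for TensorFlow Serving's REST API.
# Host, model name, and tensor key are illustrative assumptions.
import base64
import json

def predict_request(image_bytes, model_name="grading"):
    url = f"http://tf-serving.example.com:8501/v1/models/{model_name}:predict"
    # TensorFlow Serving accepts binary data base64-encoded under a "b64" key.
    payload = json.dumps({
        "instances": [
            {"image_bytes": {"b64": base64.b64encode(image_bytes).decode("ascii")}}
        ]
    })
    return url, payload

url, body = predict_request(b"\x89PNG fake image bytes")
print(url)
```

A client would POST `body` to `url` with an HTTP library and read the model's classifications from the JSON response; the serving layer never needs any connection back to the training cluster.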

New software tools such as TensorFlow make machine learning accessible for a much broader spectrum of companies. The areas where machine learning combines with network architecture, however, are still full of rough edges. At HYLA, we were able to solve a complex network architecture problem by using a home-built white box machine for our training architecture, while doing the actual image recognition in the cloud. For more information about cloud architecture and how to solve challenges there, check out our previous post, “Using Amazon Web Services to Deploy Your Web Application.”