Google’s machine-learning cloud pipeline explained

When Google first told the world about its Tensor Processing Unit, the strategy behind it seemed clear enough: Speed machine learning at scale by throwing custom hardware at the problem. Use commodity GPUs to train machine-learning models; use custom TPUs to deploy those trained models.

The new generation of Google’s TPUs is designed to handle both of those duties, training and deploying, on the same chip. That new generation is also faster, both on its own and when scaled out with others in what’s called a “TPU pod.”

But faster machine learning isn’t the only benefit from such a design. The TPU, especially in this new form, constitutes another piece of what amounts to Google building an end-to-end machine-learning pipeline, covering everything from intake of data to deployment of the trained model.

Machine learning: A pipeline runs through it

One of the largest obstacles to using machine learning right now is how tough it can be to put together a full pipeline for the data—intake, normalization, model training, model and deployment. The pieces are still highly disparate and uncoordinated. Companies like Baidu have hinted at wanting to create a single, unified, unpack-and-go solution, but so far that’s just a notion.

The most likely place for such a solution to emerge is in the cloud. As time goes by, much more of the data collected for machine learning (and everything else, really) lives there by default. So does the hardware needed to produce actionable results from it. Give people a single end-to-end, in-the-cloud workflow for machine learning, one with only a few knobs on it by default, and they’ll be happy to build on top of it.

Already mostly realized, Google’s vision is that each phase of the pipeline can be executed in the cloud, as close as possible to the data, for the best possible speed. With TPUs, Google’s also seeks to provide many of the phases with custom hardware acceleration that can be scaled out on demand.

The new TPUs are meant to boost pipeline acceleration in several ways. One speedup comes from being able to gang multiple TPUs. Another comes from being able to train and deploy models from the same slab of silicon. With the latter, it’s easier to incrementally retrain models as new data comes in, because the data doesn’t have to be moved around as much.

But are you willing to lock yourself into TensorFlow?

There’s one possible downside to Google’s vision: that the performance boost provided by TPUs works only if you use the right kind of machine-learning framework with it. And that means Google’s own TensorFlow.

It’s not that TensorFlow is a bad framework; in fact, it’s quite good. But it’s only one framework of many, each suited to different needs and use cases. So TPUs’ limitation of supporting just TensorFlow means you have to use it, regardless of its fit, if you want to squeeze maximum performance out of Google’s ML cloud. Another framework might be more convenient to use for a particular job, but it might not train or serve predictions as quickly because it’ll be consigned to running only on GPUs.

None of this also rules out the possibility that Google could introduce other hardware, such as customer-reprogrammable FPGAs, to allow frameworks not directly sponsored by Google to also have an edge.

But for most people, the inconvenience of being able to use TPUs to accelerate only certain things will be far outweighed by the convenience of having a managed, cloud-based everything-in-one-place pipeline for machine-learning work. So, like it or not, prepare to use TensorFlow.

Copyright 2017 IDG Communications. ABN 14 001 592 650. All rights reserved. Reproduction in whole or in part in any form or medium without express written permission of IDG Communications is prohibited.