
Google today introduced TensorFlow Lite 1.0, its framework for developers deploying AI models on mobile and IoT devices. Improvements include selective registration, as well as quantization during and after training for faster, smaller models. Quantization has shrunk some models to a quarter of their original size.
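The size savings come from storing weights in 8 bits instead of 32. The following is a minimal, illustrative sketch of the idea behind post-training quantization (not TensorFlow Lite's actual implementation): float32 weights are mapped to int8 with a scale factor, cutting storage from 4 bytes to 1 byte per weight.

```python
import numpy as np

def quantize(weights):
    # Symmetric mapping: the largest-magnitude weight maps to +/-127.
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate float weights for computation.
    return q.astype(np.float32) * scale

w = np.array([-0.5, 0.1, 0.25, 0.49], dtype=np.float32)
q, scale = quantize(w)
print(w.nbytes // q.nbytes)  # 4 -- the 4x compression mentioned above
```

Real toolchains also quantize activations and calibrate ranges on sample data, but the storage arithmetic is the same.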

“We are going to fully support it. We’re not going to break things and make sure we guarantee its compatibility. I think a lot of people who deploy this on phones want those guarantees,” TensorFlow engineering director Rajat Monga told VentureBeat in a phone interview.

The TensorFlow Lite team at Google also shared its roadmap today for shrinking and speeding up AI models for edge deployment. Plans include model acceleration, especially for Android developers using neural nets, as well as a Keras-based connection pruning kit and additional quantization enhancements.

Other changes on the way:

Support for control flow, which is essential to the operation of models like recurrent neural networks

Expand coverage of GPU delegate operations and finalize the API to make it generally available

A TensorFlow 2.0 model converter for producing Lite models, designed to help developers better understand what goes wrong in the conversion process and how to fix it

TensorFlow Lite is deployed on more than two billion devices today, TensorFlow Lite engineer Raziel Alvarez said onstage at the TensorFlow Dev Summit, held at Google offices in Sunnyvale, California.

TensorFlow Lite is increasingly making TensorFlow Mobile obsolete, except for developers who use it for on-device training; a solution for that is in the works, Alvarez said.

A variety of techniques are being explored to reduce the size of AI models and optimize them for mobile devices, such as quantization and delegates (a structured way to hand parts of a model's graph to different hardware to speed up inference).
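The delegate idea can be sketched in a few lines: an interpreter walks the model's ops and hands each one to an accelerator backend when that backend claims support, falling back to the CPU otherwise. This is a toy illustration of the concept only, not TensorFlow Lite's API; all names here are made up.

```python
def cpu_kernel(op, x):
    # Reference CPU implementations of two toy ops.
    kernels = {"relu": lambda v: [max(0.0, e) for e in v],
               "scale2": lambda v: [2.0 * e for e in v]}
    return kernels[op](x)

class FakeGpuDelegate:
    # A delegate advertises the subset of ops it can accelerate.
    supported = {"scale2"}

    def run(self, op, x):
        return [2.0 * e for e in x]  # pretend this ran on the GPU

def execute(ops, x, delegate=None):
    placement = []
    for op in ops:
        if delegate is not None and op in delegate.supported:
            x = delegate.run(op, x)
            placement.append("gpu")
        else:
            x = cpu_kernel(op, x)  # fall back to the CPU kernel
            placement.append("cpu")
    return x, placement

out, where = execute(["scale2", "relu"], [-1.0, 3.0], FakeGpuDelegate())
print(out, where)  # [0.0, 6.0] ['gpu', 'cpu']
```

The real GPU delegate partitions whole subgraphs rather than single ops, which is why expanding its op coverage (one of the roadmap items above) matters for speed.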

Mobile GPU acceleration with delegates for a number of devices was made available in developer preview in January; it can make model inference 2 to 7 times faster than floating-point execution on a CPU. Edge TPU delegates can run models up to 64 times faster than a floating-point CPU.

In the future, Google plans to make GPU delegates generally available, expand coverage, and finalize APIs.

Above: TensorFlow Lite speeds

Image Credit: Khari Johnson / VentureBeat

A number of native Google apps and services use TensorFlow Lite, including Gboard, Google Photos, AutoML, and Nest. When Google Assistant needs to respond to queries offline, all computation for its CPU models is now carried out by Lite.

Lite can also run on devices like the Raspberry Pi and the new $150 Coral Dev Board, which was also introduced earlier today.

Also making their debut today: the alpha release of TensorFlow 2.0, for a simplified user experience; TensorFlow.js 1.0; and the 0.2 release of Swift for TensorFlow, for developers who write code in Apple's programming language Swift.