
Google's TensorFlow 1.4 machine learning library makes the formerly contributed Dataset API for working with data sources an official part of the library, but watch out for breakage caused by the update.

Serdar Yegulalp Nov 16th 2017

TensorFlow, Google’s contribution to the world of machine learning and data science, is a general framework for quickly developing neural networks. Despite being relatively new, TensorFlow has already found wide adoption as a common platform for such work, thanks to its powerful abstractions and ease of use.

TensorFlow 1.4 API additions

TensorFlow Keras API

The biggest changes in TensorFlow 1.4 involve two key additions to the core TensorFlow API. The first, tf.keras, lets users work with the Keras API, a high-level neural network library that predates TensorFlow and can run on top of it. With tf.keras, software written for Keras can be moved to TensorFlow, either keeping the Keras interface permanently or as a first step toward reworking the software to use TensorFlow natively.

TensorFlow Dataset API

Another addition to the core TensorFlow APIs is tf.data, the Dataset API, originally available as a contributed API but now officially supported. The Dataset API provides abstractions for creating and reusing input pipelines: potentially complex datasets gathered from one or more sources, with each element transformed as needed. Datasets are read through iterators, and some kinds of iterators can be re-initialized with different parameters before each pass through the data, which helps if you’re making multiple training passes through a dataset and need different behavior on each one.
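The idea of a reusable, re-initializable pipeline can be sketched without TensorFlow. The plain-Python example below is not the tf.data API: the Pipeline and InitializableIterator classes and their from_sources() and initialize() methods are hypothetical, although map(), batch(), and make_initializable_iterator() are named after their tf.data counterparts. Each stage is recorded lazily, and re-initializing the iterator with a new parameter replays the whole pipeline for the next pass, roughly as running an initializable iterator’s initializer with a different feed does in TensorFlow 1.4.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List

# A source is any function that takes the iterator's initialization
# parameters as keyword arguments and returns a fresh iterable.
SourceFn = Callable[..., Iterable[Any]]


class Pipeline:
    """Hypothetical, plain-Python stand-in for a tf.data-style input pipeline."""

    def __init__(self, source_fn: SourceFn) -> None:
        self._source_fn = source_fn

    @staticmethod
    def from_sources(*source_fns: SourceFn) -> "Pipeline":
        # Read each source in turn, passing all of them the same parameters.
        def combined(**params: Any) -> Iterator[Any]:
            for fn in source_fns:
                yield from fn(**params)
        return Pipeline(combined)

    def map(self, fn: Callable[[Any], Any]) -> "Pipeline":
        # Lazily transform each element; this pipeline is left unchanged.
        parent = self._source_fn

        def mapped(**params: Any) -> Iterator[Any]:
            return (fn(x) for x in parent(**params))
        return Pipeline(mapped)

    def batch(self, size: int) -> "Pipeline":
        # Group consecutive elements into lists of at most `size` items.
        parent = self._source_fn

        def batched(**params: Any) -> Iterator[List[Any]]:
            it = iter(parent(**params))
            while chunk := list(islice(it, size)):
                yield chunk
        return Pipeline(batched)

    def make_initializable_iterator(self) -> "InitializableIterator":
        return InitializableIterator(self._source_fn)


class InitializableIterator:
    """Replays the pipeline from the start whenever initialize() is called."""

    def __init__(self, source_fn: SourceFn) -> None:
        self._source_fn = source_fn
        self._it: Iterator[Any] = iter(())  # empty until initialized

    def initialize(self, **params: Any) -> None:
        # Rebuild every stage with this pass's parameters.
        self._it = iter(self._source_fn(**params))

    def __iter__(self) -> Iterator[Any]:
        return self._it


def evens(limit: int) -> range:
    return range(0, limit, 2)


def odds(limit: int) -> range:
    return range(1, limit, 2)


pipeline = Pipeline.from_sources(evens, odds).map(lambda x: x * 10).batch(3)
iterator = pipeline.make_initializable_iterator()

iterator.initialize(limit=6)
print(list(iterator))  # [[0, 20, 40], [10, 30, 50]]

iterator.initialize(limit=4)  # second pass with a different parameter
print(list(iterator))  # [[0, 20, 10], [30]]
```

Every transformation returns a new pipeline instead of modifying the existing one, so a single pipeline definition can be shared by several iterators and replayed as often as needed.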

TensorFlow Dataset API compatibility issue

If you have been using the contributed version of the data API (tf.contrib.data) from earlier TensorFlow releases, be warned that the official tf.data API isn’t fully backward compatible. The number of changes isn’t large, but most of them affect strategic, commonly used functions, so there’s a fair chance existing code will break.

For example, if you previously used tf.contrib.data.rejection_resample() or tf.contrib.data.Iterator.from_dataset(), both have changed. The former has a new signature: instead of taking a dataset as an argument, it now returns a function to pass to Dataset.apply(). The latter has been removed entirely and replaced by the Dataset.make_initializable_iterator() method. TensorFlow’s documentation has further details on migrating from tf.contrib.data to the official tf.data library.
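The calling-convention change behind rejection_resample() can also be sketched in plain Python. In tf.data, Dataset.apply() accepts a function that takes one dataset and returns another, and that kind of function is what the 1.4 version of rejection_resample() hands back instead of taking the dataset directly. The sketch below is not TensorFlow code: the Dataset class and the drop_multiples_old() and drop_multiples() transformations are hypothetical, and they simply filter elements rather than perform rejection resampling.

```python
from typing import Callable, Iterable, List


class Dataset:
    """Hypothetical, eager stand-in for a dataset (not TensorFlow code)."""

    def __init__(self, elements: Iterable[int]) -> None:
        self.elements: List[int] = list(elements)

    def map(self, fn: Callable[[int], int]) -> "Dataset":
        return Dataset(fn(x) for x in self.elements)

    def apply(self, transformation: Callable[["Dataset"], "Dataset"]) -> "Dataset":
        # Hand the whole dataset to a user-supplied function, so custom
        # transformations can sit in the middle of a method chain.
        return transformation(self)


# New style: configure the transformation first and get back a
# Dataset -> Dataset function to pass to apply().
def drop_multiples(n: int) -> Callable[[Dataset], Dataset]:
    def _apply_fn(dataset: Dataset) -> Dataset:
        return Dataset(x for x in dataset.elements if x % n != 0)
    return _apply_fn


# Old style: the dataset is the transformation's first argument.
def drop_multiples_old(dataset: Dataset, n: int) -> Dataset:
    return drop_multiples(n)(dataset)


ds = Dataset(range(1, 10))

# The old call has to wrap the chain from the inside out...
old = drop_multiples_old(ds.map(lambda x: x + 1), 3).map(lambda x: x * 2)
# ...while the new one reads left to right.
new = ds.map(lambda x: x + 1).apply(drop_multiples(3)).map(lambda x: x * 2)

print(new.elements)  # [4, 8, 10, 14, 16, 20]
print(old.elements == new.elements)  # True
```

Both styles produce the same result; the returned-function style simply lets a custom step slot into a chain of dataset methods, which is why existing calls that pass the dataset in directly need to be rewritten.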

TensorFlow Estimator simplified

Many of the other additions build on TensorFlow’s reputation for convenience. A new train_and_evaluate function (tf.estimator.train_and_evaluate) provides a simple way to run a model built with TensorFlow’s Estimator API, a high-level interface that wraps model training, evaluation, and prediction, in a distributed fashion across a cluster. Also, tfdbg, TensorFlow’s built-in debugger, now lets you evaluate arbitrary Python and NumPy expressions from its command-line interface, for quick-and-dirty inspection.

TensorFlow CUDA and cuDNN support

For GPU acceleration, TensorFlow 1.4’s prebuilt binaries are built against CUDA 8, Nvidia’s GPU computing platform, and cuDNN 6, its GPU-accelerated library of deep neural network primitives. These aren’t the most recent versions, but TensorFlow’s developers state in the release notes, “We anticipate releasing TensorFlow 1.5 with CUDA 9 and CuDNN 7.”