PaddlePaddle’s New API Simplifies Deep Learning Programs

In September, we open sourced PaddlePaddle, the deep learning framework that has powered a range of Baidu products since its inception four years ago. To make the platform easier for the community to use, we have made several updates since then, including adopting the Kubernetes cluster management system.

Using the new API, PaddlePaddle programs now require fewer lines of code, as shown in the example below. The figure compares a convolutional network program written in the old API (left) with one written in the new API (right).

This significant simplification is the result of three key improvements:

1. A New Conceptual Model

New research requires a flexible way to describe innovative deep learning algorithms. For example, a GAN model contains two networks whose layers share some parameters, and during training we need to fix some parameters while updating others. With the old API, users had to reach into very low-level, often undocumented, APIs for this kind of flexibility. With the new API, the illustrative GAN example takes only a few lines of code.
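The training pattern described above can be sketched in plain Python. This is a toy illustration of the concept only, not PaddlePaddle's API: two hypothetical "networks" share a parameter, and each update step fixes one network's parameters while adjusting the other's.

```python
# Toy gradient steps over a flat parameter dict. The names (shared_w, gen_w,
# disc_w) and values are made up for illustration.
params = {"shared_w": 0.5, "gen_w": 0.1, "disc_w": 0.9}

def step(params, trainable, grads, lr=0.1):
    """Apply a gradient step only to the parameters named in `trainable`;
    all other parameters are left fixed."""
    return {
        name: value - lr * grads.get(name, 0.0) if name in trainable else value
        for name, value in params.items()
    }

# Discriminator step: update disc_w and the shared parameter, freeze gen_w.
params = step(params, trainable={"disc_w", "shared_w"},
              grads={"disc_w": 1.0, "shared_w": 0.5, "gen_w": 2.0})

# Generator step: update gen_w and the shared parameter, freeze disc_w.
params = step(params, trainable={"gen_w", "shared_w"},
              grads={"gen_w": 1.0, "shared_w": 0.5, "disc_w": 2.0})
```

Note that `gen_w`'s gradient in the first step is simply ignored, which is the "fix some parameters while updating others" behavior a GAN needs.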

2. Higher-Level APIs

PaddlePaddle was created to support distributed training, but the old API exposes many details that users must learn before writing a distributed program. Also, while PaddlePaddle can run a training loop pre-defined in C++ code, doing so prevents PaddlePaddle programs from running inside Jupyter Notebook, an ideal environment for documenting work. The new API provides higher-level functions like `train`, `test`, and `infer`. For example, `train` can run a local training job today, and will be able to run a distributed job on a Kubernetes cluster.
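The design idea behind a higher-level `train` API can be sketched in a few lines of plain Python. This is a hypothetical minimal trainer, not PaddlePaddle's actual implementation: the loop lives inside the library, and user code only supplies data and a callback, which is what lets the whole program run inside a notebook.

```python
# Minimal sketch of a higher-level training API. `update` stands in for one
# optimization step; in a real framework it would run forward/backward passes.
class Trainer:
    def __init__(self, update):
        self.update = update  # callable: batch -> cost (assumed for the sketch)

    def train(self, reader, num_passes, event_handler=None):
        """Run the training loop internally; report progress via the callback."""
        for pass_id in range(num_passes):
            for batch_id, batch in enumerate(reader()):
                cost = self.update(batch)
                if event_handler:
                    event_handler(pass_id, batch_id, cost)

# Usage: the loop itself is hidden; the user only observes progress.
costs = []
trainer = Trainer(update=lambda batch: sum(batch) / len(batch))
trainer.train(reader=lambda: iter([[1, 2, 3], [4, 5, 6]]),
              num_passes=2,
              event_handler=lambda p, b, c: costs.append(c))
```

Because the loop is owned by the library, the same user program could later be dispatched to a cluster without changes, which is the point of hiding distribution details behind `train`.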

3. Compositional Data Bricks

Data loading in industrial AI applications is far from trivial and usually requires a lot of source code. Our new API provides the compositional concepts of reader, reader-creator, and reader-decorator, which enable the reuse of data operations. For example, we can define a reader-creator, `impressions()`, that reads a search engine's impression log stream from Kafka in a few lines of Python code. We can also define another reader-creator, `clicks()`, for reading the click stream. We can then buffer and shuffle using predefined reader-decorators, and even compose/join data streams.
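The three concepts compose naturally because a reader is just a function that yields items. Below is a self-contained plain-Python sketch of the pattern; the Kafka sources are replaced by hypothetical in-memory lists, and the decorator names are illustrative rather than PaddlePaddle's own.

```python
import itertools
import random

def impressions():                       # reader-creator: returns a reader
    def reader():                        # reader: a function yielding items
        for item in ["imp-1", "imp-2", "imp-3"]:
            yield item
    return reader

def clicks():                            # another reader-creator
    def reader():
        for item in ["clk-1", "clk-2", "clk-3"]:
            yield item
    return reader

def shuffled(reader, buf_size, seed=0):  # reader-decorator: wraps a reader
    def new_reader():
        buf = list(itertools.islice(reader(), buf_size))
        random.Random(seed).shuffle(buf)  # shuffle within a bounded buffer
        for item in buf:
            yield item
    return new_reader

def composed(*readers):                  # compose/join several streams
    def new_reader():
        for items in zip(*(r() for r in readers)):
            yield items
    return new_reader

# Join the shuffled impression stream with the click stream, pairwise.
joined = composed(shuffled(impressions(), buf_size=3), clicks())
pairs = list(joined())
```

Because every decorator takes a reader and returns a reader, any pipeline of buffering, shuffling, and joining can be built by plain function composition, with no special framework machinery.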