
Design Doc: Concurrent Programming with Fluid

With PaddlePaddle Fluid, users describe a program rather than a model. The program is a ProgramDesc protobuf message. TensorFlow/MxNet/Caffe2 applications generate protobuf messages too, but their messages represent the model -- a graph of operators -- rather than the program that trains/uses the model.

Many know that when we program TensorFlow, we can specify the device on which each operator runs. This allows us to create concurrent/parallel AI applications. An interesting question is: how does a ProgramDesc represent a concurrent program?

The answer relies on the fact that a ProgramDesc is similar to an abstract syntax tree (AST) that describes a program. So users can write a concurrent program just as they would in any concurrent programming language, e.g., Go.
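For reference, the kind of concurrency a Go programmer would write by hand looks like the following. This is a plain Go sketch using goroutines and channels, not a Fluid program; the function name concurrentAnswer is purely illustrative:

```go
package main

import "fmt"

// concurrentAnswer computes a value on a separate goroutine and receives it
// over a channel -- the style of concurrency Go programmers write by hand.
func concurrentAnswer() int {
	ch := make(chan int)
	go func() { ch <- 6 * 7 }() // runs concurrently with the receiver
	return <-ch                 // the channel synchronizes the two goroutines
}

func main() {
	fmt.Println(concurrentAnswer()) // prints 42
}
```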

Please be aware that Fluid's Go binding provides the default main function, which calls the paddlepaddle function, which, in this case, is defined in the above program and creates the following ProgramDesc message.

Then, the default main function calls fluid.run(), which creates an instance of the class Executor and calls Executor.Run(block[0]), where block[0] is the first and only block defined in the above ProgramDesc message.

The default main function is defined as follows:

```go
func main() {
	paddlepaddle()
	fluid.run()
}
```

The Concurrent Version

By parallelizing the above program, we could support a very big tensor X by splitting it into small pieces {x_1, x_2, ...} and sending each piece to a worker process/node for parallel multiplication.
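The split-and-multiply idea can be sketched in plain Go, with one goroutine standing in for each worker process/node. The function parallelScale and the use of a scalar weight w are illustrative assumptions, not Fluid's API:

```go
package main

import (
	"fmt"
	"sync"
)

// parallelScale splits x into nPieces chunks, scales each chunk by w in its
// own goroutine (standing in for a worker process/node), and gathers the
// results in order.
func parallelScale(x []float64, w float64, nPieces int) []float64 {
	out := make([]float64, len(x))
	var wg sync.WaitGroup
	chunk := (len(x) + nPieces - 1) / nPieces
	for start := 0; start < len(x); start += chunk {
		end := start + chunk
		if end > len(x) {
			end = len(x)
		}
		wg.Add(1)
		go func(lo, hi int) { // one "worker" per piece x_i
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = w * x[i]
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(parallelScale([]float64{1, 2, 3, 4, 5}, 10, 2)) // prints [10 20 30 40 50]
}
```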

In this case, we can write a transpiler that takes a ProgramDesc message representing the above example program and outputs two ProgramDesc messages: one for running on the master process/node, and the other for the worker processes/nodes.

When executed, the program on the master:

1. creates len(L) scopes, each for the concurrent running of the sub-block (block 1 in this case), and initializes a variable named "index" in each scope to an integer value in the range [0, len(L)-1], and
2. creates len(L) threads by calling into the ThreadPool singleton, where each thread
   1. creates an Executor instance, and
   2. calls Executor.Run(block), where block is block 1 as explained above.

Please be aware that block 1 is a sub-block of block 0, so ops in block 1 could refer to variables defined in block 0.
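The master-side steps above can be sketched in Go. Scope, Block, Executor, and masterRun here are illustrative Go stand-ins under the assumption that a scope is a name-to-value map, not Fluid's actual C++ classes:

```go
package main

import (
	"fmt"
	"sync"
)

// Scope is a stand-in for Fluid's scope: a map from variable names to values.
type Scope map[string]int

// Block stands in for sub-block 1; its run function plays the role of the ops.
type Block struct {
	run func(s Scope) int
}

type Executor struct{}

func (Executor) Run(b *Block, s Scope) int { return b.run(s) }

// masterRun creates lenL scopes, initializes "index" in each to 0..lenL-1,
// and launches one goroutine (green thread) per scope; each goroutine creates
// its own Executor and runs the sub-block in its scope.
func masterRun(lenL int, block1 *Block) []int {
	results := make([]int, lenL)
	var wg sync.WaitGroup
	for i := 0; i < lenL; i++ {
		scope := Scope{"index": i} // one scope per concurrent run
		wg.Add(1)
		go func(i int, s Scope) { // a thread from the (conceptual) ThreadPool
			defer wg.Done()
			var e Executor // each thread creates its own Executor
			results[i] = e.Run(block1, s)
		}(i, scope)
	}
	wg.Wait()
	return results
}

func main() {
	// The stand-in sub-block squares the "index" variable from its scope.
	block1 := &Block{run: func(s Scope) int { return s["index"] * s["index"] }}
	fmt.Println(masterRun(3, block1)) // prints [0 1 4]
}
```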

On the worker side, for each incoming request, the program creates an Executor instance and calls Executor.Run(block), where the block is generated by running the lambda specified as the second parameter of fluid.listen_and_do.
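A minimal sketch of this worker-side loop, with requests arriving on a channel instead of over the network; listenAndDo and the Executor type here are hypothetical stand-ins, not Fluid's actual implementation:

```go
package main

import "fmt"

type Executor struct{}

// Run applies the block (produced by the user-supplied lambda) to a request.
func (Executor) Run(block func(int) int, req int) int { return block(req) }

// listenAndDo mimics the ListenAndDo loop: for every incoming request it
// creates a fresh Executor and runs the block on that request.
func listenAndDo(requests <-chan int, lambda func(int) int, replies chan<- int) {
	for req := range requests {
		var e Executor // a new Executor per request
		replies <- e.Run(lambda, req)
	}
	close(replies)
}

func main() {
	requests := make(chan int, 3)
	replies := make(chan int, 3)
	for _, r := range []int{1, 2, 3} {
		requests <- r
	}
	close(requests)
	listenAndDo(requests, func(x int) int { return x + 100 }, replies)
	for r := range replies {
		fmt.Println(r) // prints 101, 102, 103 in order
	}
}
```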

Summarization

From the above example, we see that:

Fluid enables the imperative programming paradigm by:

letting users describe a program, but not a model (a sequence of layers, or a graph of operators), and

calling the fluid.run function that runs the program implicitly.

The program is described as a ProgramDesc protobuf message.

Function Executor.Run takes a block, instead of a ProgramDesc, as its parameter.

fluid.run calls Executor.Run to run the first block in the ProgramDesc message.

Executor.Run's implementation is extremely simple -- it neither plans the execution nor creates threads; instead, it runs on the current thread and executes intrinsics'/operators' Run methods sequentially, in the order they appear in the Block.ops array.
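These sequential semantics can be sketched as follows; Op, traceOp, Block, and Executor are illustrative Go types, not Fluid's actual C++ definitions:

```go
package main

import (
	"fmt"
	"strings"
)

// Op is a stand-in for an intrinsic/operator with a Run method.
type Op interface{ Run() }

// traceOp records its name when run, so we can observe execution order.
type traceOp struct {
	name  string
	trace *[]string
}

func (t traceOp) Run() { *t.trace = append(*t.trace, t.name) }

type Block struct{ ops []Op }

type Executor struct{}

// Run executes the ops on the current thread, strictly in the order they
// appear in Block.ops -- no planning, no thread creation.
func (Executor) Run(b *Block) {
	for _, op := range b.ops {
		op.Run()
	}
}

func runDemo() string {
	var trace []string
	b := &Block{ops: []Op{
		traceOp{"mul", &trace},
		traceOp{"add", &trace},
		traceOp{"print", &trace},
	}}
	var e Executor
	e.Run(b)
	return strings.Join(trace, ",")
}

func main() {
	fmt.Println(runDemo()) // prints mul,add,print
}
```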

Intrinsics/operators' Run method might create threads. For example, the ListenAndDo operator creates a thread to handle each incoming request.

Threads are not necessarily OS threads; instead, they could be green threads managed by the ThreadPool. Multiple green threads might run on the same OS thread. An example of green threads is Go's goroutines.
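The goroutine case can be demonstrated directly: with runtime.GOMAXPROCS(1), the many goroutines below are green threads multiplexed by the Go runtime onto a single OS thread's worth of parallelism. The function sumWithGoroutines is just a demo name:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// sumWithGoroutines launches n goroutines -- green threads scheduled by the
// Go runtime -- while GOMAXPROCS(1) restricts execution to one OS thread's
// worth of parallelism, so all n green threads share a single OS thread.
func sumWithGoroutines(n int) int {
	runtime.GOMAXPROCS(1)
	var (
		wg  sync.WaitGroup
		mu  sync.Mutex
		sum int
	)
	for i := 1; i <= n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			mu.Lock()
			sum += i
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return sum
}

func main() {
	fmt.Println(sumWithGoroutines(1000)) // prints 500500
}
```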