The tensorlayer.cli.train module provides the tltrain subcommand.
It helps the user bootstrap a TensorFlow/TensorLayer program for distributed training
using multiple GPUs or CPUs on a single machine.

You first need to set CUDA_VISIBLE_DEVICES
to tell tltrain which GPUs are available. If CUDA_VISIBLE_DEVICES is not set,
tltrain does its best to discover all available GPUs.
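The selection logic described above can be sketched in Python. This is an illustrative sketch, not tltrain's actual implementation; the `/proc`-based discovery fallback is an assumption.

```python
import os

def visible_gpus():
    """Return the list of GPU ids a trainer may use.

    Honour CUDA_VISIBLE_DEVICES when it is set; otherwise fall back to
    discovery (sketched here by probing /proc/driver/nvidia/gpus, which
    is an assumption about the environment, not tltrain's code).
    """
    env = os.environ.get('CUDA_VISIBLE_DEVICES')
    if env is not None:
        # An empty string means "no GPUs", matching how CUDA interprets it.
        return [int(i) for i in env.split(',') if i.strip() != '']
    gpu_dir = '/proc/driver/nvidia/gpus'
    if os.path.isdir(gpu_dir):
        return list(range(len(os.listdir(gpu_dir))))
    return []
```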

In distributed training, each TensorFlow process needs a TF_CONFIG environment variable
that describes the cluster, and a master daemon to
monitor all trainers. tltrain manages
both of these tasks automatically.
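To make the TF_CONFIG variable concrete, the sketch below builds one for a single process. The JSON layout (a cluster spec plus this process's task) is the standard TensorFlow format; the host addresses are illustrative placeholders.

```python
import json
import os

def make_tf_config(ps_hosts, worker_hosts, task_type, task_index):
    """Build the TF_CONFIG value for one process in the cluster."""
    return json.dumps({
        'cluster': {'ps': ps_hosts, 'worker': worker_hosts},
        'task': {'type': task_type, 'index': task_index},
    })

# Hypothetical one-ps, two-worker cluster on localhost;
# this is what tltrain-style tooling exports for each process.
os.environ['TF_CONFIG'] = make_tf_config(
    ps_hosts=['localhost:2222'],
    worker_hosts=['localhost:2223', 'localhost:2224'],
    task_type='worker',
    task_index=0,
)
```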

A parallel training program requires multiple parameter servers
through which the parallel trainers exchange intermediate gradients.
The best number of parameter servers is often proportional to the
size of your model as well as to the number of available CPUs.
You can control the number of parameter servers with the -p parameter.
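The rule of thumb above can be expressed as a simple sizing heuristic. This is purely illustrative and is not tltrain's actual policy; the per-ps capacity of 512 MB and the cap of 8 are assumed numbers.

```python
import os

def suggest_num_ps(model_size_mb, max_ps=8):
    """Illustrative heuristic only -- not tltrain's actual policy.

    Scale the parameter-server count with model size, capped by the
    number of CPUs, reflecting the rule of thumb described above.
    """
    cpus = os.cpu_count() or 1
    # Assume (hypothetically) one parameter server per ~512 MB of parameters.
    by_size = max(1, model_size_mb // 512)
    return min(by_size, cpus, max_ps)
```

The result would then be passed to tltrain via -p.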

If you have a single computer with many CPU cores, you can use the -c parameter
to enable CPU-only parallel training.
We do not support GPU-CPU co-training because GPUs and
CPUs run at different speeds; using them together in training would
create stragglers.
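Putting the flags together, a typical session might look like the following. The script name and core count are placeholders, and the tltrain invocations are commented out since they require the tool to be installed.

```shell
# Hypothetical usage (train.py and the counts are placeholders):
export CUDA_VISIBLE_DEVICES=0,1    # GPU mode: expose two GPUs to tltrain
# tltrain -p 2 train.py            # GPU training with 2 parameter servers
# tltrain -c 8 train.py            # CPU-only parallel training, 8 trainers
echo "GPUs exposed to tltrain: ${CUDA_VISIBLE_DEVICES}"
```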