This method requires a model path saved by Paddle in the form model/pass-%5d. Models from the M-th pass to the (N-1)-th pass are tested. For example, M=12 and N=14 will test model/pass-00012 and model/pass-00013.

Sparse Training

Sparse training is usually used to accelerate computation when the input is sparse, high-dimensional data. For example, the dictionary dimension of the input data may be 1 million, while each sample contains only a few words. In Paddle, sparse matrix multiplication is used in forward propagation, and sparse updating is performed on the weights after backward propagation.
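The benefit can be illustrated with a small, framework-independent sketch (plain Python, not Paddle internals): for a binary sparse input, the forward matrix product reduces to summing a few weight rows, and the weight update only touches those same rows.

```python
# Conceptual sketch of sparse forward/update (not Paddle code).

def dense_forward(weight, x):
    """Full dense product: O(dim * out) work even if x is mostly zeros."""
    out_dim = len(weight[0])
    return [sum(x[i] * weight[i][j] for i in range(len(weight)))
            for j in range(out_dim)]

def sparse_forward(weight, active_ids):
    """For a binary sparse input, only sum the active rows: O(k * out)."""
    out_dim = len(weight[0])
    out = [0.0] * out_dim
    for i in active_ids:
        for j in range(out_dim):
            out[j] += weight[i][j]
    return out

def sparse_update(weight, active_ids, grad, lr=0.1):
    """Sparse updating: only the rows of the active ids receive gradient."""
    for i in active_ids:
        for j in range(len(grad)):
            weight[i][j] -= lr * grad[j]
```

With a 1M-row weight matrix and a sample containing a handful of words, `sparse_forward` and `sparse_update` touch only those few rows instead of the whole matrix.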

1) Local training

You need to set sparse_update=True in the network config. Check the network config documentation for more details.
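As a rough sketch only, sparse_update is typically attached to a layer's parameter attribute in the v1-style Python config; the exact helper names and placement here (fc_layer, ParamAttr) are assumptions, so check the network config documentation for the authoritative form:

```python
# Illustrative v1-style config sketch; ParamAttr placement is an assumption.
from paddle.trainer_config_helpers import *

# 1M-dimensional sparse input, e.g. a bag of word ids.
words = data_layer(name="words", size=1000000)

# Enable sparse (row-wise) updating for this layer's weight matrix.
proj = fc_layer(input=words, size=128,
                param_attr=ParamAttr(sparse_update=True))
```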

2) Cluster training

Add the following argument for cluster training of a sparse model. You also need to set sparse_remote_update=True in the network config. Check the network config documentation for more details.

--ports_num_for_sparse=1 #(default: 0)
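For illustration, a cluster launch might then look like the following; every flag and value other than --ports_num_for_sparse is a placeholder and should be replaced with your actual job settings:

```shell
# Hypothetical launch command; config path and counts are placeholders.
paddle train \
    --config=trainer_config.py \
    --local=0 \
    --trainer_count=4 \
    --ports_num_for_sparse=1
```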

parallel_nn

parallel_nn can be set to compute layers with a mixture of GPUs and CPUs. That is to say, you can deploy the network so that a GPU computes some layers while a CPU computes others. The other use is to place different layers on different GPUs, which can reduce per-GPU memory usage or accelerate some layers through parallel computation.

If you want to use these features, you need to specify the device ID in the network config (denote it as deviceId) and add the command line argument:

- default_device(0): set the default device ID to 0. This means that, except for layers with device=-1, all layers use a GPU; the specific GPU used for each layer depends on trainer_count and gpu_id (0 by default). Here, layers fc1 and fc2 are computed on the GPU.

- device=-1: use the CPU for layer fc3.

- trainer_count:

  - trainer_count=1: if gpu_id is not set, use the first GPU to compute layers fc1 and fc2; otherwise use the GPU specified by gpu_id.

  - trainer_count>1: use trainer_count GPUs to compute one layer with data parallelism. For example, trainer_count=2 means that GPUs 0 and 1 compute layers fc1 and fc2 using data parallelism.
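The fc1/fc2/fc3 placement described above might be expressed in a v1-style config as sketched below; the helper names, layer sizes, and the ExtraAttr(device=-1) spelling are illustrative assumptions, not the definitive API:

```python
# Illustrative v1-style config sketch (names and sizes are assumptions).
from paddle.trainer_config_helpers import *

default_device(0)  # default all layers to device 0 (a GPU)

data = data_layer(name="input", size=784)

# fc1 and fc2 inherit the default device, so they run on the GPU.
fc1 = fc_layer(input=data, size=256)
fc2 = fc_layer(input=fc1, size=256)

# device=-1 pins fc3 to the CPU.
fc3 = fc_layer(input=fc2, size=10, layer_attr=ExtraAttr(device=-1))
```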