In this tutorial, we introduce the basics of the algorithm and show how to train a target detection deep neural network (DNN) on your own dataset. The trained model is the input to the Plumber compiler (covered in the next tutorial), which parses and converts the model. The output of Plumber can then be accelerated by hardware.

Target detection is a classic problem in machine learning. The goal is to locate and classify objects in a picture. The position of each object is represented by the coordinates of the upper-left and lower-right corners of a rectangular bounding box. Taking face detection as an example: given an input picture, the result is the position of the faces in the picture and the classification of each object (the category is 1 if the object is a face, otherwise 0, meaning non-face).
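As a minimal sketch of the representation described above, a single detection can be stored as the two corner coordinates plus a category. The `Detection` class and its field names are illustrative assumptions and are not part of the tutorial's code.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure for illustration)."""
    xmin: float    # x of the upper-left corner
    ymin: float    # y of the upper-left corner
    xmax: float    # x of the lower-right corner
    ymax: float    # y of the lower-right corner
    category: int  # 1 = face, 0 = non-face

    def width(self) -> float:
        return self.xmax - self.xmin

    def height(self) -> float:
        return self.ymax - self.ymin


# Example: a face whose box spans (40, 30) to (120, 140) in pixel coordinates.
face = Detection(xmin=40, ymin=30, xmax=120, ymax=140, category=1)
print(face.width(), face.height())
```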

The result of a face detection is as follows:

Deep learning is well established in target detection. The mainstream frameworks fall into two families: One-Shot Detectors and Two-Shot Detectors. One-Shot detectors are generally much faster than Two-Shot detectors while reaching comparable accuracy, which makes them more practical for industrial applications.
We use the classic Single Shot MultiBox Detector (SSD) in this tutorial. For the underlying principle, refer to the following link: https://arxiv.org/abs/1512.02325

0. Preparation for DNN algorithm training:

A Host/PC/Server with GPU.
Deep learning algorithms usually take a long time to train (several hours or days), and training on a CPU takes much longer, so we recommend using a GPU.

Data annotation and training.
Data is the basis of deep learning. Deep learning is usually regarded as a black-box algorithm: the network repeatedly adjusts its parameters until it produces the desired output. For object detection problems, the user needs to provide the location and the classification of every object to be detected in each picture.

Install the GPU-based version of the Docker image (if you have a GPU):

sudo docker run --runtime=nvidia --name plumber -dti brucvv/plumber

Install the CPU-based version of the Docker image (if you don't have a GPU):

sudo docker run --name plumber -dti brucvv/plumber:cpu_1.2

1. Data preparation

This step converts the original input images and annotations into the record data format of the TensorFlow framework.

We assume the images and the data annotation file (img_label.txt) are in the same folder. In this example, the folder is /app/imagetxt/; it contains 19 images, each labeled with a face position.
The data annotation file format is as follows:

Users can define their own data types and write corresponding data conversion code.

Note: the data annotation file img_label.txt labels only one face position per image, so the algorithm trained in this tutorial detects only one face in an image. As an exercise, users can add further face positions to the annotation file and retrain the algorithm to detect multiple faces in one image.

The most important step here is to specify the network model (model_name); the model definition files are in the nets folder.

Note:

In general, the more data and the fewer annotation errors, the better the training results. In this example, we provide only 19 images for training; they serve only to verify the overall process. Please use the training images themselves to check that the results are correct.

It usually takes hours to train the model. To decide whether to proceed with the subsequent steps, observe the loss and check whether the corresponding ckpt file has been saved in train_dir.
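The check for saved checkpoints can be scripted. The following is a minimal sketch that assumes the checkpoints follow the common TensorFlow 1.x naming `model.ckpt-<step>.index`; the prefix `model.ckpt` and the helper name `latest_checkpoint_step` are assumptions, so adjust them to whatever your train_dir actually contains.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def latest_checkpoint_step(train_dir: str) -> Optional[int]:
    """Return the highest step that has a checkpoint index file, or None."""
    # Assumed naming scheme: model.ckpt-<step>.index
    pattern = re.compile(r"^model\.ckpt-(\d+)\.index$")
    directory = Path(train_dir)
    if not directory.is_dir():
        return None
    steps = []
    for path in directory.iterdir():
        match = pattern.match(path.name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


# Demonstration on a temporary directory with made-up checkpoint files.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("model.ckpt-100.index", "model.ckpt-250.index", "checkpoint"):
        (Path(demo_dir) / name).touch()
    demo_step = latest_checkpoint_step(demo_dir)
print(demo_step)
```

A return value of None means that no checkpoint has been written yet, so the subsequent steps should wait.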

As training progresses, the loss gradually decreases. When the loss shows only small fluctuations over a long period and its overall trend stays flat, we can consider the model training complete (converged).
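One way to make this convergence criterion concrete is to compare the mean loss over the two most recent windows of training steps. This is only a sketch: the function name, the window size and the tolerance are assumptions chosen for illustration, not values used by the tutorial's training script.

```python
from typing import Sequence


def has_converged(losses: Sequence[float], window: int = 100,
                  tolerance: float = 0.01) -> bool:
    """Return True when the mean loss of the last two windows differs
    by less than `tolerance` in relative terms."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    previous = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    if previous == 0:
        return recent == 0
    return abs(previous - recent) / abs(previous) < tolerance


# A loss that is still falling steadily has not converged.
falling = [10.0 - 0.04 * i for i in range(200)]
# A loss that only fluctuates slightly around a constant value has.
flat = [5.0 + 0.01 * (-1) ** i for i in range(200)]
print(has_converged(falling), has_converged(flat))
```

Averaging over a window hides the step-to-step fluctuations, so only a change in the overall trend affects the result.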

Note:

In this example, since the number of images is small, training converges very easily and the loss drops to a very low value. In a typical two-class training run (with more than 10k training samples), training can be regarded as essentially complete when the loss reaches a value of 4 to 6.

Note: the number of classifications = 1 + the total number of object categories to identify. For example, if the training samples contain both pedestrian and vehicle objects, the number of classifications = 3.
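The extra class is the non-object (background) category. As a small illustration, the sketch below builds a label map in which id 0 is reserved for background; the helper name `build_label_map` and the name "background" are assumptions made for this example.

```python
def build_label_map(object_categories):
    """Map class names to integer ids; id 0 is reserved for background."""
    label_map = {"background": 0}
    for name in object_categories:
        if name not in label_map:
            label_map[name] = len(label_map)
    return label_map


label_map = build_label_map(["pedestrian", "vehicle"])
num_classes = len(label_map)  # 1 background + 2 object categories
print(label_map, num_classes)
```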

4. Generate post-processing parameter

Generate the post-processing parameter file (the Anchor parameters in SSD) required for subsequent inference on the board. For the concept of Anchor, refer to the Faster R-CNN algorithm. Example:
Run under the /app/detection path.
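Separately from the tutorial's own parameter-generation script, the idea behind anchors (called default boxes in the SSD paper) can be illustrated as follows. The sketch follows the scale and box-shape formulas from the SSD paper; the function names and the chosen aspect ratios are assumptions, and the parameter file produced by the tutorial's tooling may use a different layout.

```python
import math


def ssd_scales(num_maps, s_min=0.2, s_max=0.9):
    """Scale of the default boxes for each feature map (SSD paper)."""
    if num_maps == 1:
        return [s_min]
    return [s_min + (s_max - s_min) * k / (num_maps - 1)
            for k in range(num_maps)]


def default_boxes(fmap_size, scale, next_scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Return (cx, cy, w, h) boxes, normalized to [0, 1], for one square
    feature map of fmap_size x fmap_size cells."""
    shapes = [(scale * math.sqrt(r), scale / math.sqrt(r))
              for r in aspect_ratios]
    # Extra square box with scale sqrt(s_k * s_{k+1}).
    extra = math.sqrt(scale * next_scale)
    shapes.append((extra, extra))
    boxes = []
    for row in range(fmap_size):
        for col in range(fmap_size):
            # Box centers sit in the middle of each feature map cell.
            cx = (col + 0.5) / fmap_size
            cy = (row + 0.5) / fmap_size
            for w, h in shapes:
                boxes.append((cx, cy, w, h))
    return boxes


scales = ssd_scales(num_maps=6)
boxes = default_boxes(fmap_size=3, scale=scales[0], next_scale=scales[1])
print(len(boxes))  # 3 * 3 cells * 4 boxes per cell = 36
```

During inference, the network predicts offsets relative to these fixed boxes, which is why the same anchor parameters must be available to the post-processing stage.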

5. Export the inference model

Freeze the trained model into a .pb file, which is suitable for deployment on the inference device. In TensorFlow, a frozen .pb file holds the structure and variable names of the model network and also stores the values of all variables.

The final output of this chapter is the input to the subsequent compilation step with Plumber. One goal of compiling the trained model is to remove the training-only nodes from the TensorFlow graph.