2019.09.09 LFFD has been ported to NCNN (link) and MNN (link) by SyGoing; great thanks to SyGoing.

2019.09.10 face_detection: important bug fix: the vibration offset must have the random shift subtracted in the data iterator. This bug may result in lower accuracy, inaccurate bbox prediction, and bbox vibration in the test phase. We will upgrade v1 and v2 as soon as possible (the upgraded models should have higher accuracy and be more stable).
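The fix can be sketched as follows. This is an illustrative snippet with hypothetical names (`jittered_crop_offset` is not part of the LFFD code base); the actual fix lives in the project's data iterator:

```python
import random

def jittered_crop_offset(bbox_center, crop_center, max_shift):
    """Illustrative sketch: when the crop window is jittered ("vibrated")
    by a random shift, that shift must also be subtracted when computing
    the regression offset. The buggy version measured the offset against
    the un-shifted crop center, biasing every training target."""
    shift = random.uniform(-max_shift, max_shift)
    shifted_crop_center = crop_center + shift
    # Buggy: offset = bbox_center - crop_center   (ignores the shift)
    offset = bbox_center - shifted_crop_center    # Fixed: shift subtracted
    return offset, shift
```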

In practice, we have deployed it on cloud and edge devices (such as the NVIDIA Jetson series and ARM-based embedded systems). The overall performance of LFFD is robust enough to support our applications.

In fact, our method is a general detection framework that is applicable to single-class detection, such as face detection, pedestrian detection, head detection, vehicle detection, and so on. In general, an object class whose average ratio of the longer side to the shorter side is less than 5 is appropriate for our framework.
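The aspect-ratio rule of thumb is easy to check for a candidate object class. The helper below is a hypothetical sketch for illustration, not part of the LFFD code:

```python
def suitable_for_lffd(box_sizes):
    """Return True when the average ratio of the longer side to the
    shorter side across the given (width, height) boxes is below 5,
    the rule of thumb stated above for applying the framework."""
    ratios = [max(w, h) / min(w, h) for w, h in box_sizes]
    return sum(ratios) / len(ratios) < 5

# Faces are roughly square, so a face dataset passes the check:
print(suitable_for_lffd([(32, 40), (50, 55), (20, 28)]))  # True
# Very elongated objects (e.g. lane markings) do not:
print(suitable_for_lffd([(10, 100), (5, 60)]))  # False
```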

Several practical advantages:

- Large scale coverage, easy to extend to larger scales by adding more layers with little extra latency.
- Detects small objects (as small as 10 pixels) in images of extremely large resolution (8K or even larger) in a single inference pass.
- A simple backbone built from very common operators makes it easy to deploy anywhere.

Accuracy and Latency

We train LFFD on the train set of the WIDER FACE benchmark. All methods are evaluated on the val/test sets under the SIO schema (please refer to the paper for details).

Accuracy on the val set of WIDER FACE (values in parentheses are results from the original papers):

| Method | Easy Set | Medium Set | Hard Set |
|---|---|---|---|
| DSFD | 0.949 (0.966) | 0.936 (0.957) | 0.850 (0.904) |
| PyramidBox | 0.937 (0.961) | 0.927 (0.950) | 0.867 (0.889) |
| S3FD | 0.923 (0.937) | 0.907 (0.924) | 0.822 (0.852) |
| SSH | 0.921 (0.931) | 0.907 (0.921) | 0.702 (0.845) |
| FaceBoxes | 0.840 | 0.766 | 0.395 |
| FaceBoxes3.2× | 0.798 | 0.802 | 0.715 |
| LFFD | 0.910 | 0.881 | 0.780 |

Accuracy on the test set of WIDER FACE (values in parentheses are results from the original papers):

We report the latency of inference only (for NVIDIA hardware, data transfer is included), excluding pre-processing and post-processing. The batch size is set to 1 for all evaluations.
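This measurement protocol can be sketched with generic timing code (hypothetical names; the reported numbers were produced with MXNet and CUDA on the listed hardware):

```python
import time

def measure_latency_ms(infer, image, warmup=10, runs=100):
    """Time inference only, batch size 1, averaged over many runs.
    Pre-/post-processing stays outside the timed region; the initial
    warm-up calls exclude one-time initialization cost."""
    for _ in range(warmup):
        infer(image)
    start = time.perf_counter()
    for _ in range(runs):
        infer(image)
    latency = (time.perf_counter() - start) / runs * 1000.0
    return latency, 1000.0 / latency  # per-frame ms and equivalent FPS
```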

Latency on NVIDIA GTX TITAN Xp (MXNet+CUDA 9.0+CUDNN7.1):

| Resolution-> | 640×480 | 1280×720 | 1920×1080 | 3840×2160 |
|---|---|---|---|---|
| DSFD | 78.08ms (12.81 FPS) | 187.78ms (5.33 FPS) | 392.82ms (2.55 FPS) | 1562.50ms (0.64 FPS) |
| PyramidBox | 50.51ms (19.08 FPS) | 143.34ms (6.98 FPS) | 331.93ms (3.01 FPS) | 1344.07ms (0.74 FPS) |
| S3FD | 21.75ms (45.95 FPS) | 55.73ms (17.94 FPS) | 119.53ms (8.37 FPS) | 471.31ms (2.21 FPS) |
| SSH | 22.44ms (44.47 FPS) | 55.29ms (18.09 FPS) | 118.43ms (8.44 FPS) | 463.10ms (2.16 FPS) |
| FaceBoxes3.2× | 6.80ms (147.00 FPS) | 12.96ms (77.19 FPS) | 25.37ms (39.41 FPS) | 111.98ms (8.93 FPS) |
| LFFD | 7.60ms (131.40 FPS) | 16.37ms (61.07 FPS) | 31.27ms (31.98 FPS) | 87.79ms (11.39 FPS) |

Latency on NVIDIA TX2 (MXNet+CUDA 9.0+CUDNN7.1) presented in the paper:

| Resolution-> | 160×120 | 320×240 | 640×480 |
|---|---|---|---|
| FaceBoxes3.2× | 11.20ms (89.29 FPS) | 19.62ms (50.97 FPS) | 72.74ms (13.75 FPS) |
| LFFD | 7.30ms (136.99 FPS) | 19.64ms (50.92 FPS) | 64.70ms (15.46 FPS) |

Latency on Raspberry Pi 3 Model B+ (ncnn) presented in the paper:

| Resolution-> | 160×120 | 320×240 | 640×480 |
|---|---|---|---|
| FaceBoxes3.2× | 167.20ms (5.98 FPS) | 686.19ms (1.46 FPS) | 3232.26ms (0.31 FPS) |
| LFFD | 118.45ms (8.44 FPS) | 409.19ms (2.44 FPS) | 4114.15ms (0.24 FPS) |

On the NVIDIA platform, TensorRT is the best choice for inference, so we conduct additional latency evaluations using TensorRT (the latency drops dramatically!). As for ARM-based platforms, we plan to use MNN and Tengine for latency evaluation. Details can be found in the sub-project face_detection.