Person detection - The Fastest Pedestrian Detector in the West

Original paper

Key idea of the approach

The authors point out that the speed problems of current state-of-the-art person detectors stem mainly from the need to construct an image pyramid and to compute features at every level of that pyramid, which is time-consuming.

Nevertheless, the image pyramid is necessary to find small persons (the original image is upscaled) or large persons (the original image is downscaled), while the detection window keeps a fixed size.
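To get a feeling for the cost, the following minimal sketch lists the scales of a densely sampled pyramid. The image size, the window size and the choice of 8 scales per octave are illustrative assumptions, and `pyramid_scales` is a hypothetical helper, not code from the paper.

```python
def pyramid_scales(img_h, img_w, win_h, win_w,
                   scales_per_octave=8, max_upsample=1.0):
    """Return the scale factors of a dense image pyramid, largest first.

    The detection window keeps a fixed size (win_h x win_w); the image is
    rescaled until the window no longer fits into it.
    """
    scales = []
    i = 0
    while True:
        s = max_upsample * 2 ** (-i / scales_per_octave)
        if img_h * s < win_h or img_w * s < win_w:
            break  # rescaled image is smaller than the detection window
        scales.append(s)
        i += 1
    return scales


# Illustrative numbers only: a 640x480 image and a 64x128 window.
scales = pyramid_scales(480, 640, 128, 64)
print(len(scales), "levels, each needing its own feature computation")
```

Every entry in `scales` corresponds to one resized copy of the image on which features would have to be recomputed from scratch; allowing upscaling (`max_upsample=2.0`) adds one more octave of levels.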

To get around this, the authors examined how gradient magnitudes and orientations change when an image is up- or downsampled.

They found that, given a feature (e.g. HOG, histograms of oriented gradients) computed at one scale, the corresponding feature at higher and lower scales can be approximated by re-weighting it:

[…] our key insight is that for a broad family of features, including gradient histograms, the feature responses computed at a single scale can be used to approximate features responses at nearby scales.

[…] given gradients computed at one scale, is it possible to approximate gradient histograms at a different scale? If so, then we can avoid computing gradients over a finely sampled image pyramid.
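The intuition behind the re-weighting can be checked on a toy example. When an image is upsampled by a factor k, each gradient becomes roughly k times weaker, but there are k*k times as many pixels, so the summed gradient magnitude should grow by about k. The sketch below verifies this on a synthetic ramp image, which is an assumption chosen because bilinear interpolation reproduces it exactly; on real images the relation only holds approximately. Both helper functions are hypothetical and written for illustration.

```python
import math


def upsample_bilinear(img, k):
    """Upsample a 2-D list-of-lists image by an integer factor k (corner-aligned)."""
    h, w = len(img), len(img[0])
    out_h, out_w = (h - 1) * k + 1, (w - 1) * k + 1
    out = []
    for Y in range(out_h):
        y0 = min(Y // k, h - 2)
        fy = Y / k - y0
        row = []
        for X in range(out_w):
            x0 = min(X // k, w - 2)
            fx = X / k - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
            bot = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out


def gradient_magnitude_sum(img):
    """Sum of central-difference gradient magnitudes over all interior pixels."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2
            gy = (img[y + 1][x] - img[y - 1][x]) / 2
            total += math.hypot(gx, gy)
    return total


# Synthetic ramp image (an assumption; real images behave only approximately so).
k = 2
image = [[x + 2.0 * y for x in range(33)] for y in range(33)]
ratio = gradient_magnitude_sum(upsample_bilinear(image, k)) / gradient_magnitude_sum(image)
print(f"ratio of summed gradient magnitudes: {ratio:.2f} (upsampling factor k = {k})")
```

The printed ratio is close to k; the small deviation comes from the border pixels that are excluded from the sum.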

Since this approximation gradually degrades as the scale difference grows, it is only reliable for nearby scales. We can therefore construct a sparse image pyramid, compute features only at its levels, and approximate the features at the scales in between from the nearest computed level, thereby saving time.
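The bookkeeping of such a sparse pyramid might look as follows. This is a sketch under stated assumptions: features are really computed only at power-of-two scales (one level per octave), and the re-weighting factor is modelled as a power of the scale ratio. The name `plan_sparse_pyramid` and the `exponent` parameter are hypothetical; the exponent's sign and magnitude depend on the feature type and its normalisation and would have to be fitted on data.

```python
import math


def plan_sparse_pyramid(scales, exponent):
    """For every target scale, pick the nearest really computed scale
    (assumed here: powers of two) and the factor used to re-weight
    the features computed there.

    Returns a list of (target_scale, real_scale, weight) tuples.
    """
    plan = []
    for s in scales:
        octave = round(math.log2(s))  # nearest power of two in log-scale
        s_real = 2.0 ** octave
        ratio = s / s_real            # how far the target is from the real level
        weight = ratio ** exponent    # assumed power-law re-weighting
        plan.append((s, s_real, weight))
    return plan


# 16 target scales with 8 scales per octave; the exponent is purely illustrative.
targets = [2 ** (-i / 8) for i in range(16)]
plan = plan_sparse_pyramid(targets, exponent=1.0)
real_levels = sorted({s_real for _, s_real, _ in plan}, reverse=True)
print(len(targets), "target scales served by", len(real_levels), "computed levels")
```

In this setup the expensive feature computation runs only on the few levels in `real_levels`; all other scales reuse those features, multiplied by their `weight`.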

Dollár et al. did exactly this with their own integral channel features person detector, which is described in a previous paper: