Getting Started with R-CNN, Fast R-CNN, and Faster R-CNN

Object detection is the process of finding and classifying objects
in an image. One deep learning approach, regions with convolutional neural networks (R-CNN),
combines rectangular region proposals with convolutional neural network features. R-CNN is a
two-stage detection algorithm. The first stage identifies a subset of regions in an image
that might contain an object. The second stage classifies the object in each region.

Object Detection Using R-CNN Algorithms

Models for object detection using regions with CNNs are based on the following three processes:

Find regions in the image that might contain an object. These regions are
called region proposals.

Extract CNN features from the region proposals.

Classify the objects using the extracted features.

There are three variants of an R-CNN. Each variant attempts to optimize,
speed up, or enhance the results of one or more of these processes.
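
The three processes above can be sketched as a small pipeline. This is a minimal illustration and not the implementation of any particular detector: the names `detect`, `sliding_windows`, `mean_intensity`, and `bright_or_not` are hypothetical, and the toy stand-ins replace a real proposal algorithm, a CNN, and a trained classifier.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max is exclusive
Image = List[List[float]]        # grayscale image as rows of pixel values


def detect(
    image: Image,
    propose: Callable[[Image], Sequence[Box]],
    extract: Callable[[Image, Box], List[float]],
    classify: Callable[[List[float]], Tuple[str, float]],
    min_score: float = 0.5,
) -> List[Tuple[Box, str, float]]:
    """Run the three processes in order and keep confident detections."""
    detections = []
    for box in propose(image):             # 1. find region proposals
        features = extract(image, box)     # 2. extract features per proposal
        label, score = classify(features)  # 3. classify from the features
        if score >= min_score:
            detections.append((box, label, score))
    return detections


# Toy stand-ins (assumptions for illustration only).
def sliding_windows(image: Image, size: int = 2) -> List[Box]:
    """Propose non-overlapping square windows instead of learned proposals."""
    height, width = len(image), len(image[0])
    return [(x, y, x + size, y + size)
            for y in range(0, height - size + 1, size)
            for x in range(0, width - size + 1, size)]


def mean_intensity(image: Image, box: Box) -> List[float]:
    """Use the mean pixel value as a one-element 'feature vector'."""
    x0, y0, x1, y1 = box
    pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return [sum(pixels) / len(pixels)]


def bright_or_not(features: List[float]) -> Tuple[str, float]:
    """Score a region as 'bright' using its mean intensity as the score."""
    return ("bright", features[0])
```

Each R-CNN variant can be read as swapping in a faster or better component for one of the three callables while the overall flow stays the same.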

R-CNN

The R-CNN detector [2] first generates region proposals using an algorithm such as Edge Boxes [1]. The proposal regions are cropped out of the image and resized. Then, the CNN
classifies the cropped and resized regions. Finally, the region proposal bounding
boxes are refined by a support vector machine (SVM) that is trained using CNN
features.
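
The crop-and-resize step can be illustrated with a short sketch. It assumes a grayscale image stored as a list of rows and uses nearest-neighbor sampling; the function name `crop_and_resize` and the box convention (exclusive maximum coordinates) are choices made for this example, not details given by the detector description.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), max is exclusive


def crop_and_resize(image: List[List[float]], box: Box,
                    out_h: int, out_w: int) -> List[List[float]]:
    """Crop `box` from `image` and resize it to out_h x out_w.

    Nearest-neighbor sampling is an assumption made for brevity.
    """
    x0, y0, x1, y1 = box
    crop_h, crop_w = y1 - y0, x1 - x0
    if crop_h <= 0 or crop_w <= 0:
        raise ValueError("box must have positive width and height")

    resized = []
    for i in range(out_h):
        # Map each output row back to a source row inside the crop.
        src_y = y0 + (i * crop_h) // out_h
        row = []
        for j in range(out_w):
            # Map each output column back to a source column inside the crop.
            src_x = x0 + (j * crop_w) // out_w
            row.append(image[src_y][src_x])
        resized.append(row)
    return resized
```

Because every proposal is resized to the same fixed size, the CNN receives inputs of one shape regardless of how large or small each proposal was in the original image.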

Fast R-CNN

Like the R-CNN detector, the Fast R-CNN detector [3] uses an algorithm such as Edge Boxes to generate region proposals.
Unlike the R-CNN detector, which crops and resizes region proposals, the Fast R-CNN
detector processes the entire image. Whereas an R-CNN detector must classify each
region, Fast R-CNN pools CNN features corresponding to each region proposal. Fast
R-CNN is more efficient than R-CNN, because in the Fast R-CNN detector, the
computations for overlapping regions are shared.
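
Pooling features for each region proposal can be sketched as max pooling over a fixed grid of bins. In this sketch, the feature map is computed once for the whole image and every proposal reads from it. The function name `roi_max_pool`, the single-channel feature map, and the bin-boundary rule (floor for the start, ceiling for the end) are assumptions made for illustration.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # ROI in feature-map cells, max is exclusive


def roi_max_pool(feature_map: List[List[float]], roi: Box,
                 out_h: int, out_w: int) -> List[List[float]]:
    """Max-pool the cells inside `roi` into a fixed out_h x out_w grid."""
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    if roi_h <= 0 or roi_w <= 0:
        raise ValueError("roi must have positive width and height")

    pooled = []
    for i in range(out_h):
        # Bin rows: floor for the start, ceiling for the end (never empty).
        ys = y0 + (i * roi_h) // out_h
        ye = y0 + -(-((i + 1) * roi_h) // out_h)
        row = []
        for j in range(out_w):
            xs = x0 + (j * roi_w) // out_w
            xe = x0 + -(-((j + 1) * roi_w) // out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# One shared feature map (toy values), reused by two overlapping proposals.
shared_features = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pooled_a = roi_max_pool(shared_features, (0, 0, 4, 4), 2, 2)
pooled_b = roi_max_pool(shared_features, (1, 1, 4, 4), 2, 2)
```

Both proposals produce a 2-by-2 output even though they cover different areas, and neither requires a second pass of the CNN over the image.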

Faster R-CNN

The Faster R-CNN detector [4] adds a region proposal network (RPN) to generate region proposals
directly in the network instead of using an external algorithm like Edge Boxes. The
RPN uses anchor boxes for object detection. Generating
region proposals in the network is faster and better tuned to your data.
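
The anchor boxes that the RPN relies on can be illustrated by tiling a fixed set of box shapes over every cell of a feature map. The sketch below is one common way to do this, not a description of a specific implementation: the function name `generate_anchors`, the `stride` parameter, and the definition of aspect ratio as width divided by height are assumptions.

```python
import math
from typing import List, Sequence, Tuple

FloatBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def generate_anchors(feat_h: int, feat_w: int, stride: int,
                     scales: Sequence[float],
                     aspect_ratios: Sequence[float]) -> List[FloatBox]:
    """Tile anchor boxes, in image coordinates, over a feature-map grid.

    `stride` is the assumed number of image pixels per feature-map cell.
    Each anchor has area scale**2 and width/height equal to the ratio.
    """
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            # Center of this feature-map cell in image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in aspect_ratios:
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors
```

For example, `generate_anchors(2, 3, 16, [32, 64], [0.5, 1.0, 2.0])` yields 2 × 3 × 2 × 3 = 36 anchors. The RPN then scores each anchor and adjusts its position, so the set of candidate shapes can be chosen to suit the objects in your data.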

Comparison of R-CNN Object Detectors

This family of object detectors uses region proposals to detect objects within images.
The number of proposed regions dictates the time it takes to detect objects in an image.
The Fast R-CNN and Faster R-CNN detectors are designed to improve detection speed
when the number of regions is large.

The Fast R-CNN model builds on the basic R-CNN model. A box regression layer
is added to improve on the position of the object in the image by learning a set
of box offsets. An ROI pooling layer is inserted into the network to pool CNN
features for each region proposal.
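
Applying learned box offsets to a proposal can be sketched as follows. The text does not specify how the offsets are parameterized, so this example assumes the widely used form in which `dx` and `dy` shift the box center in units of the box width and height, and `dw` and `dh` scale the width and height on a log scale. The function name `apply_offsets` is hypothetical.

```python
import math
from typing import Tuple

FloatBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def apply_offsets(box: FloatBox,
                  offsets: Tuple[float, float, float, float]) -> FloatBox:
    """Refine `box` with offsets (dx, dy, dw, dh).

    Assumed parameterization: center shifts are relative to box size,
    and size changes are exponentials of dw and dh.
    """
    x0, y0, x1, y1 = box
    dx, dy, dw, dh = offsets

    # Convert corners to center and size.
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + w / 2, y0 + h / 2

    # Shift the center and rescale the size.
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)

    # Convert back to corners.
    return (new_cx - new_w / 2, new_cy - new_h / 2,
            new_cx + new_w / 2, new_cy + new_h / 2)
```

With all offsets equal to zero the box is unchanged, so the regression layer only needs to learn corrections to proposals that are already roughly in place.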

Label Training Data for Deep Learning

You can use the Image Labeler, Video Labeler, or Ground Truth Labeler (available in Automated
Driving Toolbox™) apps to interactively label pixels and export label data for training. You
can also use the apps to label rectangular regions of interest (ROIs) for object detection,
scene labels for image classification, and pixels for semantic segmentation.