In this example, we do detection using a pure Caffe edition of the R-CNN model for ImageNet. The R-CNN detector outputs class scores for the 200 detection classes of ILSVRC13. Keep in mind that these are raw one-vs-all SVM scores, so they are not probabilistically calibrated or exactly comparable across classes. Note that this off-the-shelf model is provided simply for convenience, and is not the full R-CNN model.

Let's run detection on an image of a bicyclist riding a fish bike in the desert (from the ImageNet challenge—no joke).

Selective Search is the region proposer used by R-CNN. The selective_search_ijcv_with_python Python module takes care of extracting proposals through the selective search MATLAB implementation. To install it, download the module and name its directory selective_search_ijcv_with_python, run the demo in MATLAB to compile the necessary functions, then add it to your PYTHONPATH for importing. (If you have your own region proposals prepared, or would rather not bother with this step, detect.py accepts a list of images and bounding boxes as CSV.)
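If you go the CSV route, the window file lists one proposal per line: the image filename followed by the box coordinates. A minimal sketch of writing such a file (the filenames, coordinates, and the exact column order used here are assumptions for illustration; check ./detect.py --help for the format detect.py actually expects):

```python
import csv

# Hypothetical proposals: one (filename, ymin, xmin, ymax, xmax) row per window.
# The column order is an assumption; verify against ./detect.py --help.
windows = [
    ("images/fish-bike.jpg", 50, 30, 400, 500),
    ("images/fish-bike.jpg", 10, 10, 200, 250),
]

with open("det_windows.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for row in windows:
        writer.writerow(row)
```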

Run ./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13 to get the Caffe R-CNN ImageNet model.

With that done, we'll call the bundled detect.py to generate the region proposals and run the network. For an explanation of the arguments, do ./detect.py --help.
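A typical invocation looks like the following sketch. The relative paths assume you are running from Caffe's examples directory and have the GPU build; adjust the paths, and drop --gpu for CPU-only mode, to match your setup:

```shell
mkdir -p _temp
echo `pwd`/images/fish-bike.jpg > _temp/det_input.txt
../python/detect.py \
    --crop_mode=selective_search \
    --pretrained_model=../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel \
    --model_def=../models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt \
    --gpu \
    --raw_scale=255 \
    _temp/det_input.txt _temp/det_output.h5
```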

1570 regions were proposed with the R-CNN configuration of selective search. The number of proposals will vary from image to image depending on content and size -- selective search isn't scale invariant.

In general, detect.py is most efficient when running on a lot of images: it first extracts window proposals for all of them, batches the windows for efficient GPU processing, and then outputs the results.
Simply list one image per line in the images_file, and it will process all of them.
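Building the images_file is just that simple format: one path per line. A minimal sketch (the filenames and output path here are placeholders):

```python
# Placeholder image paths; list as many images as you like.
image_paths = [
    "images/fish-bike.jpg",
    "images/cat.jpg",
]

# Write one path per line, as detect.py expects for its images_file input.
with open("det_input.txt", "w") as f:
    f.write("\n".join(image_paths) + "\n")
```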

Although this guide gives an example of R-CNN ImageNet detection, detect.py is clever enough to adapt to different Caffe models’ input dimensions, batch size, and output categories. You can switch the model definition and pretrained model as desired. Refer to python detect.py --help for the parameters to describe your data set. There's no need for hardcoding.

Anyway, let's now load the ILSVRC13 detection class names and make a DataFrame of the predictions. Note you'll need the auxiliary ilsvrc12 data fetched by data/ilsvrc12/get_ilsvrc_aux.sh.
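The gist of that step, sketched with synthetic scores standing in for the real network output (in the actual example, labels_df comes from the ILSVRC13 aux data and the score vectors from detect.py's HDF5 output):

```python
import numpy as np
import pandas as pd

# Fake stand-ins for the real labels and per-window score vectors.
labels_df = pd.DataFrame({"name": ["person", "bicycle", "fish"]})
scores = np.array([[2.5, -0.3, 0.1],    # window 0: strong person score
                   [-1.0, 1.8, -0.2]])  # window 1: strong bicycle score

# One row per proposed window, one column per detection class.
predictions_df = pd.DataFrame(scores, columns=labels_df["name"])

# Index of the window with the highest score for a given class.
print(predictions_df["person"].argmax())
```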

The top detections are in fact a person and bicycle.
Picking good localizations is a work in progress; we pick the top-scoring person and bicycle detections.
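One common refinement beyond taking the single top-scoring window is non-maximum suppression over each class's windows. This is an illustrative greedy NMS sketch, not the notebook's own implementation; the 0.3 overlap threshold is an arbitrary choice:

```python
import numpy as np

def nms(boxes, scores, overlap=0.3):
    """Greedy non-maximum suppression.

    boxes: (N, 4) array of [xmin, ymin, xmax, ymax]; scores: (N,).
    Returns indices of kept boxes, highest score first.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with the remaining boxes.
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # Drop boxes that overlap the kept box too much.
        order = order[1:][iou <= overlap]
    return keep
```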

In [6]:

# Find, print, and display the top detections: person and bicycle.
i = predictions_df['person'].argmax()
j = predictions_df['bicycle'].argmax()

# Show top predictions for top detection.
f = pd.Series(df['prediction'].iloc[i], index=labels_df['name'])
print('Top detection:')
print(f.sort_values(ascending=False)[:5])
print('')

# Show top predictions for second-best detection.
f = pd.Series(df['prediction'].iloc[j], index=labels_df['name'])
print('Second-best detection:')
print(f.sort_values(ascending=False)[:5])

# Show top detection in red, second-best top detection in blue.
im = plt.imread('images/fish-bike.jpg')
plt.imshow(im)
currentAxis = plt.gca()

det = df.iloc[i]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='r', linewidth=5))

det = df.iloc[j]
coords = (det['xmin'], det['ymin']), det['xmax'] - det['xmin'], det['ymax'] - det['ymin']
currentAxis.add_patch(plt.Rectangle(*coords, fill=False, edgecolor='b', linewidth=5))