Faster R-CNNs are likely the most “heard of” method for object detection using deep learning; however, the technique can be difficult to understand (especially for beginners in deep learning), hard to implement, and challenging to train.

Furthermore, even with the “faster” R-CNN implementation (where the “R” stands for “Region”), the algorithm can be quite slow, on the order of 7 FPS.

If we are looking for pure speed then we tend to use YOLO as this algorithm is much faster, capable of processing 40-90 FPS on a Titan X GPU. The super fast variant of YOLO can even get up to 155 FPS.

The problem with YOLO is that its accuracy leaves much to be desired.

SSDs, originally developed by Google, are a balance between the two. The algorithm is more straightforward (and I would argue better explained in the original seminal paper) than Faster R-CNNs.

We also enjoy a much higher FPS throughput than Faster R-CNN, at 22-46 FPS depending on which variant of the network we use. SSDs also tend to be more accurate than YOLO. To learn more about SSDs, please refer to Liu et al.

When building object detection networks we normally use an existing network architecture, such as VGG or ResNet, and then use it inside the object detection pipeline. The problem is that these network architectures can be very large, on the order of 200-500MB.

Network architectures such as these are unsuitable for resource constrained devices due to their sheer size and resulting number of computations.

Instead, we can use MobileNets (Howard et al., 2017), proposed by Google researchers. We call these networks “MobileNets” because they are designed for resource constrained devices such as your smartphone. MobileNets differ from traditional CNNs through the usage of depthwise separable convolution (Figure 2 above).

The general idea behind depthwise separable convolution is to split convolution into two stages:

A 3×3 depthwise convolution.

Followed by a 1×1 pointwise convolution.

This allows us to actually reduce the number of parameters in our network.
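To make the savings concrete, here is a minimal sketch in plain Python arithmetic (the kernel size and channel counts below are arbitrary values chosen for illustration, not taken from the MobileNet paper):

# compare parameter counts for a standard 3x3 convolution versus a
# depthwise separable convolution (3x3 depthwise + 1x1 pointwise)
k, c_in, c_out = 3, 32, 64         # kernel size and channel counts (illustrative)
standard = k * k * c_in * c_out    # 18,432 weights
depthwise = k * k * c_in           # 288 weights (one 3x3 filter per input channel)
pointwise = c_in * c_out           # 2,048 weights
separable = depthwise + pointwise  # 2,336 weights total
print("standard: {}, separable: {} ({:.1f}x fewer)".format(
	standard, separable, standard / separable))

For this configuration the separable version uses roughly 8x fewer weights, which is where MobileNet's size and speed advantage comes from.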

The problem is that we sacrifice accuracy: MobileNets are normally not as accurate as their larger counterparts…

Again, example files for the first three arguments are included in the “Downloads” section of this blog post. I urge you to start there while also supplying some query images of your own.

Next, let’s initialize class labels and bounding box colors:

18	# initialize the list of class labels MobileNet SSD was trained to
19	# detect, then generate a set of bounding box colors for each class
20	CLASSES = ["background", "aeroplane", "bicycle", "bird", "boat",
21		"bottle", "bus", "car", "cat", "chair", "cow", "diningtable",
22		"dog", "horse", "motorbike", "person", "pottedplant", "sheep",
23		"sofa", "train", "tvmonitor"]
24	COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

Lines 20-23 build a list called CLASSES containing our labels. This is followed by a list, COLORS, which contains corresponding random colors for bounding boxes (Line 24).

Now we need to load our model:

26	# load our serialized model from disk
27	print("[INFO] loading model...")
28	net = cv2.dnn.readNetFromCaffe(args["prototxt"], args["model"])

The above lines are self-explanatory: we simply print a message and load our model (Lines 27 and 28).

Next, we will load our query image and prepare our blob, which we will feed forward through the network:

30	# load the input image and construct an input blob for the image
31	# by resizing to a fixed 300x300 pixels and then normalizing it
32	# (note: normalization is done via the authors of the MobileNet SSD
33	# implementation)
34	image = cv2.imread(args["image"])
35	(h, w) = image.shape[:2]
36	blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 0.007843,
37		(300, 300), 127.5)

Taking note of the comment in this block, we load our image (Line 34), extract the height and width (Line 35), and construct a 300×300 pixel blob from our image (Lines 36 and 37).

Now we’re ready to do the heavy lifting — we’ll pass this blob through the neural network:

38	# pass the blob through the network and obtain the detections and
39	# predictions
40	print("[INFO] computing object detections...")
41	net.setInput(blob)
42	detections = net.forward()

On Lines 41 and 42 we set the input to the network and compute the forward pass for the input, storing the result as detections. Computing the forward pass and associated detections could take a while depending on your model and input size, but for this example it will be relatively quick on most CPUs.

Let’s loop through our detections and determine what and where the objects are in the image:

44	# loop over the detections
45	for i in np.arange(0, detections.shape[2]):
46		# extract the confidence (i.e., probability) associated with the
47		# prediction
48		confidence = detections[0, 0, i, 2]
49	
50		# filter out weak detections by ensuring the `confidence` is
51		# greater than the minimum confidence
52		if confidence > args["confidence"]:
53			# extract the index of the class label from the `detections`,
54			# then compute the (x, y)-coordinates of the bounding box for
55			# the object
56			idx = int(detections[0, 0, i, 1])
57			box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
58			(startX, startY, endX, endY) = box.astype("int")
59	
60			# display the prediction
61			label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
62			print("[INFO] {}".format(label))
63			cv2.rectangle(image, (startX, startY), (endX, endY),
64				COLORS[idx], 2)
65			y = startY - 15 if startY - 15 > 15 else startY + 15
66			cv2.putText(image, label, (startX, y),
67				cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)

We start by looping over our detections, keeping in mind that multiple objects can be detected in a single image. We also apply a check to the confidence (i.e., probability) associated with each detection. If the confidence is high enough (i.e. above the threshold), then we’ll display the prediction in the terminal as well as draw the prediction on the image with text and a colored bounding box. Let’s break it down line-by-line:

Looping through our detections, we first extract the confidence value (Line 48).

If the confidence is above our minimum threshold (Line 52), we extract the class label index (Line 56) and compute the bounding box around the detected object (Line 57).

Then, we extract the (x, y)-coordinates of the box (Line 58), which we will use shortly for drawing a rectangle and displaying text.

Next, we build a text label containing the CLASS name and the confidence (Line 61).

Using the label, we print it to the terminal (Line 62), followed by drawing a colored rectangle around the object using our previously extracted (x, y)-coordinates (Lines 63 and 64).

In general, we want the label to be displayed above the rectangle, but if there isn’t room, we’ll display it just below the top of the rectangle (Line 65).

Finally, we overlay the colored text onto the image using the y-value that we just calculated (Lines 66 and 67).

The only remaining step is to display the result:

69	# show the output image
70	cv2.imshow("Output", image)
71	cv2.waitKey(0)

We display the resulting output image to the screen until a key is pressed (Lines 70 and 71).

OpenCV and deep learning object detection results

To download the code + pre-trained network + example images, be sure to use the “Downloads” section at the bottom of this blog post.

From there, unzip the archive and execute the following command:

$ python deep_learning_object_detection.py \
	--prototxt MobileNetSSD_deploy.prototxt.txt \
	--model MobileNetSSD_deploy.caffemodel --image images/example_01.jpg
[INFO] loading model...
[INFO] computing object detections...
[INFO] car: 99.78%
[INFO] car: 99.25%

Figure 3: Two Toyotas on the highway recognized with near-100% confidence using OpenCV, deep learning, and object detection.

Our first result shows cars recognized and detected with near-100% confidence.

In this example we detect an airplane using deep learning-based object detection:

Figure 6: Deep learning + OpenCV are able to correctly detect a beer bottle in an input image.

Followed by another horse image which also contains a dog, car, and person:

$ python deep_learning_object_detection.py \
	--prototxt MobileNetSSD_deploy.prototxt.txt \
	--model MobileNetSSD_deploy.caffemodel --image images/example_05.jpg
[INFO] loading model...
[INFO] computing object detections...
[INFO] car: 99.87%
[INFO] dog: 94.88%
[INFO] horse: 99.97%
[INFO] person: 99.88%

Figure 7: Several objects in this image including a car, dog, horse, and person are all recognized.

Finally, a picture of me and Jemma, the family beagle:

$ python deep_learning_object_detection.py \
	--prototxt MobileNetSSD_deploy.prototxt.txt \
	--model MobileNetSSD_deploy.caffemodel --image images/example_06.jpg
[INFO] loading model...
[INFO] computing object detections...
[INFO] dog: 95.88%
[INFO] person: 99.95%

Figure 8: Me and the family beagle are correctly recognized as a “person” and a “dog” via deep learning, object detection, and OpenCV. The TV monitor is not recognized.

Unfortunately the TV monitor isn’t recognized in this image, which is likely due to (1) me blocking it and (2) poor contrast around the TV. That being said, we have demonstrated excellent object detection results using OpenCV’s dnn module.

Summary

In today’s blog post we learned how to perform object detection using deep learning and OpenCV.

Specifically, we used both MobileNets + Single Shot Detectors along with OpenCV 3.3’s brand new (totally overhauled)
dnn module to detect objects in images.

As a computer vision and deep learning community we owe a lot to Aleksandr Rybnikov, the main contributor to the dnn module, for making deep learning so accessible from within the OpenCV library. You can find Aleksandr’s original OpenCV example script here; I have modified it for the purposes of this blog post.

In a future blog post I’ll be demonstrating how we can modify today’s tutorial to work with real-time video streams, thus enabling us to perform deep learning-based object detection on videos. We’ll be sure to leverage efficient frame I/O to increase the FPS throughout our pipeline as well.

To be notified when future blog posts (such as the real-time object detection tutorial) are published here on PyImageSearch, simply enter your email address in the form below.

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!

I would start by giving the first post in the series a read. You do not train the models with OpenCV’s dnn module. They are instead trained using tools like Caffe, TensorFlow, or PyTorch. This particular example demonstrates how to load a pre-trained Caffe network.

The dnn module has been totally re-done in OpenCV 3.3. Many Caffe models will work with it out-of-the-box. I would suggest taking a look at the Caffe Model Zoo for more pre-trained networks.

Hi Adrian,
how long does it take to do a forward pass through the provided network?
Is it faster than TensorFlow-based networks of the same architecture?
Is there a tutorial inside your books that covers fast recognition and detection using CNNs, ideally in real time, with networks like YOLO?

Is there any way to make this work with OpenCV 3.2? I am trying to make this work with ROS (Robot Operating System), but it only incorporates OpenCV 3.2. Am I in SOL “don’t go there” territory, or is there a way?

Hey David — I wish I had better news for you. The dnn module was completely and entirely overhauled in OpenCV 3.3. Without OpenCV 3.3 you will not have the new dnn module and therefore you cannot apply object detection with deep learning and OpenCV.

Nice tutorial. Can I please have the video implementation of the object detection method? The challenge I am facing is the model using up all my resources for inference, and I am sure this method goes a long way toward ensuring efficient resource usage during inference.

I’m still trying to understand how an image classifier could be incorporated into a larger network for finding bounding boxes. I thought about searching a tree of cropped images, but that would be iterative and slow.

It looks like this article took the black-box approach. How to detect objects? Make a call to an object detector. That’s easy, but how does the object detector work?

How can an object classifier like VGG16 be used for detection without iteration?

Traditional object detection is accomplished using a sliding window and an image pyramid, like in Histogram of Oriented Gradients. Deep learning-based object detectors do end-to-end object detection. The actual inner workings of how SSD/Faster R-CNN work are outside the context of this post, but the gist is that you can divide an image into a grid, classify each grid, and then adjust the anchors of the grid to better fit the object. This is a huge simplification but it should help point you in the right direction.

The “person” class is the 14th index in CLASSES and therefore the returned detections as well. You can remove the for loop that loops over the detections and then just check the probability associated with the person class:
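A minimal sketch of that idea (reusing the detections array and CLASSES list from the post; looking the index up with CLASSES.index avoids hard-coding it):

# only report detections that belong to the "person" class
PERSON_IDX = CLASSES.index("person")

for i in np.arange(0, detections.shape[2]):
	confidence = detections[0, 0, i, 2]
	idx = int(detections[0, 0, i, 1])

	# skip anything that is not a person or is too weak
	if idx != PERSON_IDX or confidence < args["confidence"]:
		continue

	print("[INFO] person: {:.2f}%".format(confidence * 100))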

It didn’t work. The detections return only the shapes that were detected. If I had only 2 shapes in my image, the for loop would repeat twice, so the iteration would be over 0 and 1 and not the whole CLASSES list. So, your answer is wrong. I’ve tried it. But I can’t find a way of detecting only human shapes.

That is exactly what I tried, but it’s 15 for “person”. You said in another comment that you’d be sharing the video implementation on Monday. I already did that following the instructions here and others about video, but it takes around 17 s between frames (between processing one frame and the next). Do you know what I could do to decrease this time?

Hi Barbara — unfortunately without knowing more about your setup I’m not sure what the issue is. I would kindly ask you to please wait until the video tutorial is released on Monday, September 18th. There are additional optimizations that you may not be considering such as reducing frame size, using threading to speedup the frames per second rate, etc.

I modified this algorithm to find only people, but there are many false positives.
Is it possible to integrate it with a face search? I just want to know if there is a person in the picture, not its position in the picture, recognition, or something else.

Are you trying to detect the presence of a face in an image? Simple Haar cascades or HOG + Linear SVM detectors could easily accomplish this. Take a look at this blog post as well as Practical Python and OpenCV for help with face detection.

If you’re trying to actually recognize the face in an image you should use face recognition algorithms such as Eigenfaces, Fisherfaces, LBPs for face recognition, or even deep learning-based techniques. The PyImageSearch Gurus course covers face recognition techniques.

Hi, Adrian. Have you tried the original TensorFlow model to compare with the Caffe version? Do you plan to do such tests and show on your blog how to use pre-trained models with different network architectures? Thanks a lot for your great posts. It encourages me even more to buy your books, and I hope I will!

I personally haven’t benchmarked the original TensorFlow model against the Caffe one; however, the authors of the TensorFlow implementation did benchmark them. They share their benchmarks here and note the differences in implementation.

Thanks a lot, Adrian. And I have just watched your new real-time object detection video on YouTube. Oh, man, stop blowing my mind! Hahaha. I can’t wait to see the blog post. And thank you for always answering our questions. You must be a super organized person to do that on such a busy schedule. Cheers.

Hi Adrian,
I tried to combine this code with your previous code which uses GoogLeNet, but found out that the forward procedure doesn’t support localization.
If I don’t care about the computation time and would like to have many more classes with localization, what should I do?
Thx,
G

Unfortunately in that case you would need to train your own custom object detector on the actual ImageNet dataset so you can localize the 1,000 specific categories rather than the 20 that this network was trained on.

Have you had a chance to look at the neural network on a stick from Movidius? (developer dot movidius dot com/ ) Do you believe it holds promise for this sort of application, where small and fast computation is more the need than the crunching power of, say, the Nvidia Tesla machines?

Hi, Adrian. Maybe it’s something worth giving a try. The stick is not that expensive and appears to increase the frame rate substantially on a Pi 2 or 3. I’m waiting for your post about real-time object detection on a Pi, but I’m afraid that it doesn’t work so well. I have seen these two videos (https://www.youtube.com/watch?time_continue=4&v=f39NFuZAj6s ; https://www.youtube.com/watch?v=41E5hni786Y) and I’m wondering how it would be using such pre-trained Caffe models running on the Movidius NCS with a Raspberry Pi and OpenCV. It would be awesome! Have you ever thought about exploring it?

I’ve mentioned the Movidius in a handful of comments in other blog posts. The success of the Movidius is going to depend a lot on Intel’s documentation which is not something they are known for. I’ll likely play around with it in the future, but it’s primarily used for deploying pre-trained networks rather than training them. Again, it’s something that I need to give more thought to.

You can use pre-trained models to detect objects in images; however, these pre-trained models must be object detectors. The GoogLeNet model is not an object detector. It’s an image classifier. The version of GoogLeNet you supplied cannot be used for object detection (just image classification).

Interestingly, running your code on my machine gives different object detection results than yours. For instance, on example 3 I can only detect the horse and one potted plant. On example 5 I get the same detections, plus the dog is also detected as a cat (with a higher probability) and the model is able to capture the person in the back, on the left side near the fence.

Is this variation expected? I would have expected that the dnn model would behave the same on the same image for all repetitions of the experiment.

There will be a very tiny bit of variation depending on your version of OpenCV, optimization libraries, system dependencies, etc.; however, I would not expect results to vary as much as you are seeing. What OS and versions of libraries are you running?

Hi Peter, thanks for the comment. I’m honestly not sure what the problem is here. I have not run into this issue personally and I’m not sure what the problem/solution is. I will continue to look into it.

Thanks for writing wonderful tutorials. What is the best place to learn about all the functions inside the OpenCV module and the TensorFlow deep learning modules? I feel I should brush up on these things first so I can better understand your code.

What does it mean when the blob is forward passed through the network in the line net.forward()?

In this line,

confidence = detections[0, 0, i, 2]

What do these 4 parameters (0, 0, i, 2) mean, and how does this extract the confidence of the detected object?

In this line,

idx = int(detections[0, 0, i, 1])

What does the 1 signify in detections[]?

In this line,

box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])

What are you doing by multiplying the detections with a NumPy array? Why do you take the 4th argument of detections[] as 3:7, and what does this mean? Why do you pass [w, h, w, h] to the NumPy array, passing the width and height twice?

The detections object is a multi-dimensional NumPy array. The third dimension of detections.shape gives us the number of actual detections. We can then extract the confidence for the i-th detection via detections[0, 0, i, 2]. The slice 3:7 gives us the bounding box coordinates of the object that was detected. We need to multiply these coordinates by the image width and height as they were relatively scaled by the SSD.

Take a look at the detections NumPy array and play around with it. If you’re new to NumPy, take the time to educate yourself on how array slices work and how vector multiplies work. This will help you learn more.
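For example, a quick way to inspect the layout yourself (assuming the net and blob from the post; the number of rows will vary per image):

# the detections array has shape (1, 1, N, 7): one row per detection,
# laid out as [batch_id, class_id, confidence, startX, startY, endX, endY]
# with the box coordinates scaled to the range [0, 1]
detections = net.forward()
print(detections.shape)     # e.g. (1, 1, 100, 7)
print(detections[0, 0, 0])  # the 7 values of the first detection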

Just to make sure I’m understanding what is going on here: SSD is an object detector that sits on top of an image classifier (in this case MobileNet). So, technically, one could switch to a more accurate (but slower) image classifier such as Inception, and this would improve the detection results of SSD. Is this correct? I guess I can look at your other posts about using GoogLeNet and change a few lines in this example to swap MobileNet with GoogLeNet in OpenCV?

Also, have you come across any implementations or blog posts that discuss playing around with various image classifiers + SSD in Keras to perform object detection?

Thanks once again for your blog posts. They have saved me hours and hours of time and the hair on my head.

This is a bit incorrect. In the SSD architecture, the bounding boxes and confidences for multiple categories are predicted directly within a single network. We can modify an existing network architecture to fit the SSD framework and then train it to recognize objects, but they are not hot swappable.

For example, the base of the network could be VGG or ResNet through the final pooling layers. We then convert the FC layers to CONV layers. Additional layers are then used to perform the object detection. The loss function then minimizes over correct classifications and detections. A complete review of the SSD framework is outside the scope of this post, but I will be covering it in detail inside Deep Learning for Computer Vision with Python.

There are one or two implementations I’ve seen of SSDs in Keras and mxnet, but from what I understand they are a bit buggy.

Will the ImageNet Bundle of “Deep Learning for Computer Vision with Python” cover code (at least to some extent) to play around with object detectors and image classifiers, like I asked in my first post? There’s plenty of stuff on the net to train image classifiers but not much if one wants to couple object detection with everything. Cheers. (Oh, and when will the review of SSD and everything related be available for reading and exploring in your book?)

Yes, you are absolutely correct. The ImageNet Bundle of Deep Learning for Computer Vision with Python will demonstrate how to train your own custom object detectors using deep learning. From there I’ll also demonstrate how to create a custom image processing pipeline that will enable you to take an input image and obtain the output predictions + detections using your classifier.

Secondly, I will be reviewing SSD inside the ImageNet Bundle. I won’t be demonstrating how to implement it, but I will be discussing how it works and demonstrating how to use it.

Hi Jason, thanks for the comment. I’ve seen a handful of readers run into this problem. Unfortunately I have not been able to replicate it. It would be a big help to me and the rest of the PyImageSearch community if anyone could help replicate this error.

I guess I have found the solution; at least it worked for me. Sometimes downloaded files are blocked by the computer, so you have to open the properties of the model file and prototxt file and check UNBLOCK at the bottom right. Hopefully it will work. Thanks again.

What algorithm did you use to detect objects in the image? Can you please share links to the research paper, or other code for object detection from images in which I can train my own images? I know it will be covered in your book, but for now I need a reference as part of my project...
So I would be glad if you could share a GitHub link for the object detection code with a train.py file.

I want to play with this code on my PC, which runs Windows 7 64-bit. I don’t yet have OpenCV installed on my machine, and I don’t know which configuration (working environment) I need in order to run this code. I don’t even know how to install OpenCV on my PC so that this code will run. Please help...

Hi Aniket — if you are interested in studying computer vision and deep learning I would recommend that you use either Linux or macOS. Windows is not recommended for deep learning or computer vision. I demonstrate how to configure Ubuntu for deep learning and macOS for deep learning.

I’ll actually be doing a tutorial that details every parameter of the cv2.dnn.blobFromImage in the next few weeks. In the meantime, 127.5 is the mean subtraction value and 0.007843 is your normalization factor.

OK, is this a special function you are using? I am currently using OpenCV 3.3 from August this year. Actually, I do not yet understand how the normalization factor fits into the current API. There is the mean value which gets subtracted from each color channel, parameters for the target size of the image, and finally a boolean flag to swap the red and blue channels.

You are correct. The mean value is computed across the training set and then subtracted from each channel of the image. You can also optionally supply a 3-tuple if you have different RGB values (which in most cases you do). Once you perform the mean subtraction you multiply by the scaling value.
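Per pixel, that preprocessing reduces to the arithmetic below (a sketch using the values from this post; blobFromImage additionally reorders the result into NCHW order, which is omitted here):

import cv2

image = cv2.imread("images/example_01.jpg")
resized = cv2.resize(image, (300, 300)).astype("float32")

# subtract the mean, then multiply by the scale factor;
# 0.007843 is roughly 1 / 127.5, so values end up in [-1, 1]
manual = (resized - 127.5) * 0.007843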

I am trying to convert the Python code to C++; I think you have already done it. Is it possible for you to share it or give some direction on it?
I am trying to detect just one object. I am able to run the C++ example provided by OpenCV but want to add the rectangle around the object. I am not so good at Python, so I am unable to understand much of it.

Hello,
I was trying to replicate your results for example 3. In my case only the horse and potted plants were getting detected, and not the person. I had to either remove the mean (127.5) from blobFromImage or resize to 400×400 to get the person detected. Do you know why?

Thank you for sharing the additional details, Nihit! Unfortunately I’m not sure what the exact issue is here. I wish I could help more, but without physical access to your machine to diagnose any library issues, I’m not sure what the problem may be.

Hi! Thanks for the clear tutorial; it really makes a difference in trying to figure this stuff out!
This is what I don’t get about how the dnn works (I’m a newbie with object detection :D):
how does the model go through the blob to get the location? I mean, if the object recognition model is (presumably) trained with the object nicely framed in the middle of the image, how does the detection model find a small or partially covered object like a baseball glove? Does it somehow divide the image into segments?

The model is not trained with images that have the objects nicely framed in the center of the image. Instead, images are provided along with bounding box annotations that indicate where in the image each object is. The SSD then learns patterns in the input images that correspond to the class labels while simultaneously adjusting the predicted bounding boxes.

If you’re new to computer vision and object detection be sure to read this post on the fundamentals on more traditional object detectors.

I just downloaded the source and the images were supposed to be in the .ZIP but I don’t see them. Not sure if I fumbled, but downloaded a second time and still not in the .ZIP. Not sure what I am doing wrong.

Hi Adrian,
Thanks for the great tutorial.
I have used it for object detection, and it works like a charm on my laptop!!

I tried to replicate the same thing on a Jetson TX1, on which OpenCV4Tegra was preinstalled, but while compiling OpenCV 3.3 with make -j4, space issues arise.
It seems I do not have enough space on the TX1.

Can you suggest what are the possible options that might get me out of this problem,
Thanks in Advance.

I would suggest using an external SD card, or better yet, external drive.

Be sure to download the OpenCV repos to your SD card/external drive and do the compile there. This will ensure you have the additional space during the compile. After the compile has finished run sudo make install which will copy the compiled files to their appropriate locations.

I really enjoyed your tutorial because it gave me a good start with this interesting topic. So, one question regarding object detection: is there an approach that will tell me if a general object is in my image or not? Let’s say we have a background that stays the same and there is an object in the image. I do not know what the object is, only that an object is there. At the moment I tend to solve this problem with “classic” computer vision. Is there a deep learning approach? Maybe check if no object is matched (with a certain probability)?

Are you trying to recognize the object and label it? Or just say “yes, there is an object here” or “no, there is no object”? If it’s the latter, deep learning is overkill. Simple motion detection/background subtraction is more than enough.

Hi Adrian. Thanks very much for this awesome tutorial. I have one concern here, though. You only take one image for detection, but this is not efficient. If I have multiple images or a video file, I could read a bunch of images/frames and try to detect them all at once. That would be much faster. This is a huge problem that I’m facing right now with R-CNN: I can test one image, but I could not find any solution for how to do batch testing. It would be really great if you could also do a post about it.

Amazing post! It really inspired me to work on my computer vision project. I am a complete beginner and was really worried about it. Thank you! I will try this and move forward with my project. We are going to fine-tune VGG16 with the Google ref dataset... which machine learning library do you suggest we use?

Are you trying to perform object detection or image classification? Keep in mind that VGG16 cannot be directly used for object detection. You would need to fit it into a deep learning + object detection framework, such as SSD.

Nice example. I looked at the script by Aleksandr Rybnikov you mentioned in the post and tried to adapt your example to use it with the pre-trained TensorFlow model.
I adapted it to the 90 classes, used the ssd_mobilenet_v1_coco.pbtxt from opencv_extra, and downloaded ssd_mobilenet_v1_coco_11_06_2017.tar.gz to get the frozen_inference_graph.pb.
At first I used the graph.pbtxt included in the tar file, but that doesn’t work with OpenCV 3.1.1 and your script. So I tried the ssd_mobilenet_v1_coco.pbtxt from opencv_extra. This sort of works (doesn’t give errors) but the object recognition results are not good.
Is there a way to generate an OpenCV 3.1.1-compatible *.pbtxt to work with your script, or doesn’t it work this way?

I built OpenCV 3.3 on a Raspberry Pi3 following your Raspbian Stretch instructions and downloaded this sample code. Everything seems to work except my results don’t quite match the results shown in this blog.

Seems I should get the same results with the same code and test images, but it appears I don’t. The “boxes” drawn on my images seem better located than those in your example, except for the cat, which is not really there and probably is drawn over by the box for the dog.

I set up virtual environments for Python 3 and Python 2.7, and my results are the same in the Python 3 and Python 2.7 environments, but different from yours.

Hi Adrian!
I was trying this with an input image containing a bat and a ball. Since these classes aren’t part of the trained classes, I was expecting the classifier not to classify my image into any class. However, it classified the bat and the ball as ‘Aeroplane’ and ‘Bottle’.
Is there any way the classifier can avoid classifying an image containing untrained objects and instead output a message saying that it was not able to detect any classes?

There are “background” classes (i.e., “not interesting objects” or “not an object at all”) that are used when training some object detectors; however, these only work in some contexts. I would suggest upping the minimum probability used to filter out weak predictions.

Hi! I was not able to detect the ‘background’ class, even when testing it against a ‘white background’ image! Could you provide me with some idea of an image where the background class can be detected?
Secondly, I wanted to ask if the training file is available for this? I wanted to train some classes on my own.
Thirdly, is there any portal where datasets of multiple images are easily available that can be used to test this?
I am hoping to receive your guidance at the earliest. Thank you so much 🙂

Dear Adrian!
Thank you for a kind example. I am new to neural networks and I am wondering: how much speedup can one achieve if the object detector is trained only for, e.g., aeroplanes, in comparison to this case where the detector is trained for 20 classes? Is that 2 times, 3 times, certainly not 20 times? What is your rough estimate?

What about the size of the training file? Should this be reduced 20 times?

The number of classes a network has to recognize does not change the size of the weights in the network (within reason). What changes the size of the network and associated weight file is the depth and number of parameters. You can use the same architecture with 20 classes or 2 classes and the output model would be almost identical in size. Again, it’s the depth and type of architecture. I discuss this more in my book, Deep Learning for Computer Vision with Python.

Hello,
Thank you for providing us these useful and important things.
I have a question about the dataset we use for training. I want to create a dataset which consists of luggage: handbags, backpacks, suitcases, etc.
Does it matter that there are different types of objects in the dataset, given that I want to combine all these types into one luggage class? Will it affect my accuracy?

I would run a test using different classes and all of them combined. Handbags, backpacks, and suitcases can vary quite dramatically but without seeing your particular dataset my gut tells me that you should be using separate classes.

I am using the above code to get distance values from rectified stereo left and right images. I detect the same object in both the left and right images using cv2.dnn.blobFromImage. Then, from the difference in the horizontal pixel locations, I am finding the distance.

But the blob returns different vertical pixel values for the same object. As the images are rectified, we should get the same value, right? Do you know why this happens?

Also, the estimated distance is erroneous. Is it due to the resize or scaling that we apply during the cv2.dnn.blobFromImage function?

Please take a look at this blog post where I discuss performing object detection in real-time using deep learning. Instead of supplying the index of the webcam to cv2.VideoCapture you can pass in a file path. If you’re new to using OpenCV for video processing I would suggest reading through my introductory book, Practical Python and OpenCV. I hope that helps!

I don’t know if this question will be answered or if anyone will know how to answer it, but I am getting a video feed from a TurtleBot Kinect camera. How would I go about showing the feed with the rectangles if I am using the TurtleBot’s Kinect camera?

Thanks for real for your unlimited support of the community.
I need to use SSD person detection in a transport vehicle with people in their sitting positions.
Do I need to train the person detector, or is the pre-trained Caffe model enough?
Is there any method of counting the bounding boxes after person detection?
And is Caffe limited in terms of YouTube tutorials?

When building a production-level system you should always train or fine-tune on images that represent what the CNN will be used to detect in real-world scenarios. I would suggest fine-tuning on your own dataset if at all possible.

Hi Adrian,
Thanks for your help in learning MobileNet and SSD using the dnn module. This blog is the first of its kind and very unique. However, I wanted to know if we can extend the number of classes beyond 20, say to 100? If so, can you please give guidance on how to do that?

1. Gather example images of the additional objects you want to recognize (including any images the network was originally trained on if you wanted to continue to utilize those classes).
2. And then either re-train the network from scratch or fine-tune it

Hi Adrian,
Thank you! This was a beautiful guide. I have one question, though: what are the 4 values in detections[] on line 42? You use it in the loop like this: detections[0, 0, i, 2]. What is 0, 0, i, 2?

What an excellent blog. My pics are 640×480, and I see much more accurate results (detecting objects as opposed to sometimes not detecting anything at all) when I modify the source code to not resize to 300×300 (lines 36-37), but to put 640×480 there. Is this to be expected, and why or why not? Of course I should invest the time to learn exactly what it is I’m doing, but my time for these things unfortunately is limited ;(

So keep in mind that if your images are not resized to 300×300 pixels, then OpenCV will just take the center crop of your 640×480 image and then process it. Perhaps the center of your image contains higher-resolution objects that you are trying to detect, and using the center crop helps enable this?
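As a rough sketch of the experiment described in the comment above (building the blob at the image's native resolution instead of 300×300; whether the network tolerates a non-300×300 input gracefully depends on the architecture):

# build the blob at the image's native resolution instead of 300x300
(h, w) = image.shape[:2]  # e.g. 480, 640
blob = cv2.dnn.blobFromImage(image, 0.007843, (w, h), 127.5)
net.setInput(blob)
detections = net.forward()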

Great post as always!
I am currently preparing a dataset to train SSD on it in order to localize my own objects.
What is the best way to prepare data for the training and validation part:
– is it to make annotations (class_id + bounding box) for each object in the images I have
– or crop my images to isolate my objects alone in smaller images, and then put them in a folder which represents their class?

Would one of these techniques make a difference during training?
I am asking this question because I noticed that for classifiers the second method is used, while for detectors the first one is used.
But I couldn’t find anywhere whether annotations were a rule for detectors or just a convention.

As for the test images, I perfectly understand the use of annotations.
Thanks in advance for any support you could provide me

You should always make annotations of the class ID + bounding boxes of each object in an image and save the annotations to a separate file (I recommend a simple CSV or JSON file). You can always use this information to later crop out bounding boxes and save the ROIs individually if you wish. The reverse is not true.

Since SSDs and Faster R-CNNs have a concept of hard-negatives (where they take a non-annotated ROI region and see if the network incorrectly classifies it) you’ll want to supply the entire image to the network, not just a small crop of the ROI.
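As an illustration, a minimal annotation file could be written like the sketch below (the file name, field names, and box coordinates are all hypothetical, chosen only to show the format):

import csv

# one row per annotated object: image file, class label, and box corners
rows = [
	("images/example_01.jpg", "car", 74, 189, 392, 340),   # hypothetical boxes
	("images/example_01.jpg", "car", 410, 175, 610, 320),
]

with open("annotations.csv", "w") as f:
	writer = csv.writer(f)
	writer.writerow(["image", "label", "startX", "startY", "endX", "endY"])
	writer.writerows(rows)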

Adrian,
I just wanted to say that I am deeply impressed by your diligence and sincerity in your blog posts.
I have immense respect for you.
I am an applied-maths guy and was looking to catch up on recent developments in CV, and your posts arrived just at the right time.

OpenCV’s “dnn” module works a bit better with Caffe models right now. I’m sure in future releases of OpenCV the TensorFlow model loading will become more robust, but for the time being OpenCV supports loading Caffe models a bit better.

You would need to apply either (1) transfer learning via feature extraction or fine-tuning or (2) train your own custom network from scratch. I discuss how to perform all of these techniques inside Deep Learning for Computer Vision with Python.

The class labels (21 labels) used for initialization at the beginning of the code in this post are those used during training. That’s the reason why you chose only 21 labels in the post. Am I right?

There are more than 21 objects in the COCO dataset. Why do we only choose 21 of them as labels? I mean, we could set, say, 100 labels during training; of course that would require more training time.

Great intro. I didn’t read the code part because I was looking for reasoning regarding training new classes (classes outside of PASCAL VOC, or whichever dataset the pretrained weights were trained on). I look forward to reading more of your articles.

Thanks Matt, I’m glad you enjoyed the post. If you’re interested in training your own object detectors on your own custom classes and datasets be sure to take a look at Deep Learning for Computer Vision with Python where I discuss it in detail (including code as well).

The path to your input prototxt file is incorrect. Make sure you use the “Downloads” section of this post to download the source code and then double-check your paths to the input .prototxt and .caffemodel files.

Hi Adrian, I have been following your posts, great stuff. I am saving money to buy one of your bundles. By the way, have you looked into the Keras RetinaNet implementation? I would like to hear your thoughts.

You cannot directly add more classes to the pre-trained model. You would need to either train the model from scratch or apply transfer learning via fine-tuning. I discuss how to train your own custom deep learning object detectors, including how to recognize different types of vehicles, inside Deep Learning for Computer Vision with Python.

Hi Adrian,
Your blogs have helped me understand the code easily and I thank you for that.
What if I want to reduce the number of classes for detection?
I have tried doing that and have been facing errors with idx out of range.
Here are the changes I’ve made:
CLASSES = [“bicycle”,”bus”, “car”, “motorbike”, “person”]
And executing this I get the following error:
Traceback (most recent call last):
File “real_time_object_detection.py”, line 67, in
label = “{}: {:.2f}%”.format(CLASSES[idx],confidence * 100)
IndexError: list index out of range

What is the difference between training a convolutional neural network for classification and one for object detection?

I know that when you train a CNN for classification you need a big dataset of images containing the objects that we want the network to learn to recognize, but for object detection how do you train the CNN? (For example with SSD; I know it would be different if we train a YOLO network.)
The paper for SSD says “ground truth information needs to be assigned to specific outputs in the fixed set of detector outputs”. (What does it mean that ground truth information needs to be assigned?)
“Once this assignment is determined, the loss function and back propagation are applied end-to-end.” (This is the normal training for a CNN.)

“Training also involves choosing the set of default boxes and scales for detection as well as the hard negative mining and data augmentation strategies.” (How do we apply this?)

For me, an object detector is one which can detect an object no matter what that object is, but it seems that a CNN for object detection can only recognize objects it was trained for. (For example, if we train an SSD to detect dogs, we train the model with a dataset of dogs.)
If that is the case, I don’t see why you’d have 2 CNNs to detect objects (one for classification and another one for object detection).

From what I understood in your post, once you are ready you have 2 models: one for object classification and another for object detection.
How do you combine both models to work together?

1. A classification network will give you a class label of what the image contains.
2. An object detection network will give you multiple class labels AND bounding boxes that indicate where in the image each object is.

Keep in mind that it’s impossible for a machine learning model to recognize classes or objects it was not trained on. It has to be trained on the classes to recognize them.

If you’re interested in learning more about classification, object detection, and deep learning, I would suggest taking a look at Deep Learning for Computer Vision with Python where I discuss the techniques in detail (and with source code to help solidify the concepts).

So, what you are saying is that for object detection there is only one neural network that will produce both the class label and the bounding boxes? I just need one big dataset, and with it I can train my neural network for object detection?

Or is an object detection network formed from 2 different networks, one for the class label and the other for the bounding boxes?

You normally start with what we call a “base network”. This network is typically, but not always, pre-trained on an existing dataset for classification. We then modify the network architecture, remove some layers, add new special ones, and transform it into an object detection network. We then train the entire modified network end to end to perform detection.

Hello,
I have downloaded OpenCV 3.3 and also the code that was mailed to me.
The problem is I don’t know how to run it.
I am new to this and I have no clue how to go about executing this code, or whether I require any other software.
It would be really helpful if someone gave me the steps to execute it.
Please...

First of all, love your work. And I especially love this tutorial for making ML easily understandable and usable with OpenCV.

Just wanted to let you know that the MobileNet-SSD object detection model trained in TensorFlow, found by following the information in opencv > dnn > samples > “mobilenet_ssd_accuracy.py”, has a lot higher accuracy (or more detections, if accuracy isn’t the right word here).
It detected the TV in the background of your last picture and detected relatively small people in a picture that the Caffe model provided here didn’t, with roughly the same prediction time.

Your blogs have helped me understand the code easily and I thank you for that. If I want to detect fruits on a tree, specifically fruits like apple, mango, strawberry, watermelon, orange, and pineapple, then what should I use?
Actually, I have detected on-tree fruits on the basis of color, but that is not very accurate. Is there any way to detect and identify on-tree fruit?

It is certainly possible to detect various fruits in an image/video; however, you will need to train your own custom object detector. I would suggest taking a look at Deep Learning for Computer Vision with Python where I provide detailed instructions (including code) on how to train your own object detectors. After going through the book I am confident that you will be able to train your fruit detector 🙂

Hey, can you explain to me the different parameters used in the layers in the prototxt file, and how the image is processed from one layer to the other, i.e., what are the inputs and outputs of the hidden layers?
How do we decide the number of layers?
Also, how does the entire process go?
Please help me out.
Thank you.

Hey Bhavitha — explaining the entire process of how an image/volume is transformed layer-by-layer by a network is far too detailed to cover in a blog post comment, especially when you consider the different types of layers (convolution, activation, batch normalization, pooling, etc.).

The gist is that an image is inputted to the network. A total of K convolutions are applied, resulting in an MxNxK volume. We then pass through a non-linear activation (ReLU) and optionally a batch normalization (sometimes the order of activation and BN is swapped). Max pooling can be used to reduce volume size, or convolutions can be used as well if their strides are large enough.

This process repeats, reducing the size of the volume and increasing the depth as it passes through the network.

Eventually we use a fully-connected layer(s) to obtain the final predictions.
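As a toy illustration of that progression (a Keras sketch of the CONV => RELU => BN => POOL pattern; the layer sizes are arbitrary and this is not the MobileNet SSD architecture from this post):

from keras.models import Sequential
from keras.layers import (Conv2D, Activation, BatchNormalization,
	MaxPooling2D, Flatten, Dense)

# two CONV => RELU => BN => POOL blocks: the spatial size shrinks
# while the depth grows, then an FC layer makes the final predictions
model = Sequential([
	Conv2D(32, (3, 3), padding="same", input_shape=(64, 64, 3)),
	Activation("relu"),
	BatchNormalization(),
	MaxPooling2D(pool_size=(2, 2)),
	Conv2D(64, (3, 3), padding="same"),
	Activation("relu"),
	BatchNormalization(),
	MaxPooling2D(pool_size=(2, 2)),
	Flatten(),
	Dense(10, activation="softmax"),
])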

If you’re interested in learning more about CNNs, including:

– How they work
– The parameters used for each layer
– How to piece together the building blocks to build your own CNN architectures

…then be sure to take a look at Deep Learning for Computer Vision with Python.

A CNN is used for image classification. A CNN is also used as a base network in the SSD framework. When saying “MobileNet + SSD” we’re saying that MobileNet is the base network and SSD is the object detection framework.

Hi Adrian,
Thanks a lot for this wonderful tutorial. I was trying to detect human hands with the Caffe model obtained from “http://vision.soic.indiana.edu/projects/lending-a-hand/”. If you get time, please tell me how to fix it. I tried both your tutorial as well as the OpenCV dnn samples (C++).

In your program (deep learning object detection.py) you have detected 20 objects, but I chose to detect other electronic objects such as resistors, diodes, microcontrollers, etc. I would like you to show how to add these objects.

I am running object detection on an RPi 3 with a RasPiCam (Raspberry Pi camera connected via a CSI cable) and I am getting the following error. I tried debugging the NoneType error but no luck. Can you please help with this?

To start, take a look at my reply to “Ajeya B Jois December 29, 2017”. My reply discusses “NoneType” errors and how to resolve them. Additionally, the post you commented on does not include real-time object detection — perhaps you meant this post?

I want to add a new class but it does not work. An example of the class I want to add is a ladder. What’s wrong with this? Does a ladder/stairs class not exist? The output even shows that the picture is a chair.

Are you using the pre-trained network in this blog post? Keep in mind that the network was never trained on a “ladder” or “chair” class. You would need to either train the network from scratch or apply fine-tuning. This appears to be a common misconception with this post so I’ll make sure to write a follow up tutorial in early May.

Hi Adrian!!
Do you have any idea how to run the SSD detector faster, i.e., how to increase FPS? I implemented an SSD (Single Shot MultiBox Detector) person detector and added a dlib tracker to it, but it is very, very slow, to the extent that it cannot be used for real-time applications. HOG detectors work well with the dlib tracker and are fast (sufficient for real-time apps).
Thank you in advance

Thanks for the inspiration, Adrian. Well, I have followed almost all your blog posts and successfully applied some of them. For this blog in particular, I have investigated several related papers, starting from R-CNN and all the way to YOLO.
But I have been stuck at the very first stage of all of those methods: how to prepare my own custom dataset, including annotations, so I can use it. It seems all those methods use public datasets which are already annotated.

This method does indeed apply NMS internally. In your particular image it sounds like the network is localizing the person as two objects. This could be due to an odd angle of the person in the image, the input resolution, or image quality.

Thanks for the post, it is really useful.
How do I get this program to detect other objects?
And how do I train on the objects to help the program detect them in the video frame?
Could you please tell me how to do it?

For example, you have 2 pictures of one area, but a couple of people or a car are present in the first picture. In the other picture there is, for example, a bicycle or a cat. The background remains the same (large buildings, trees), but only some moving objects have changed. I want to compare the backgrounds of these images to understand whether the pictures are the same or different.

Second of all, I have a question. From the description above, I understand that

–prototxt : The path to the Caffe prototxt file.
–model : The path to the pre-trained model.

I have installed Caffe successfully. I have OpenCV version 3.4.1 and I am using python 3.5

So my question is:

Does MobileNetSSD_deploy.prototxt.txt get installed when one installs Caffe? I could not find it in the “Caffe” (the installed) folder.

Also, how do I train the model?

For example, I want to train on images with a different set of objects (not the ones mentioned above) and would like to have fewer neural network layers (since I do not have complicated images to train on). How do I do that?

You’ll want to take a look at the Caffe library. As I mentioned in my previous comment, I discuss how to train networks using Caffe inside the PyImageSearch Gurus course. That said, you might want to take a look at Keras along with the TensorFlow Object Detection API to train your own custom object detectors as well.

I’ve had such good results with your realtime Movidius examples and MobileNetSSD as a starting point for adding AI “person detection” to video security systems, that I decided to use this as a starting point for a stand-alone Raspberry Pi3 & PiCamera module AI enhanced video security system.

Basically I just replaced image = cv2.imread(args[“image”]) with image = vs.read() in the loop after starting the camera with vs = VideoStream(usePiCamera=True, resolution=DISPLAY_DIMS, framerate=8).start()

It works wonderfully for monitoring intrusions, and the one frame every ~2 seconds is very useful in many situations.

A minor problem is the latency to detection. I can walk through the FOV, sit down at my monitor, and then watch the frames where I walk through be processed. I’ve reduced the framerate from the default to 8 and it made no noticeable difference. Is there an option to use VideoStream from your imutils without the threading? Or should I just switch to using the picamera module directly? I’m using your latest imutils v0.4.6.

But the big problem is that after it runs for a while, it seems net.forward() stops detecting the new image and returns the detection from the previous detection, sometimes just for the next frame or two, other times continuously until I move in front of the camera again or move it to apparently force another detection. I’ve tried setting detections=0 before the net.forward() call and it makes no difference. Something weird seems to be going on in the OpenCV dnn module. I have two nice example images to illustrate the problem if you tell me where to upload them. (I could put them on OneDrive and post a link, if that is allowed here.)

Basically you can see a person walking briskly enter the right side of the frame and get a good detection and then the next image ~2 seconds later about to exit the left side of the frame but the detection box is drawn from the previous detection on the right side!

If you don’t want to use threading you should just use the picamera module directly. As for your second question, I answered that in my reply to your other comment. In the future please keep all comments related to the project on the same post. It gets too confusing to jump back and forth, and furthermore, other readers cannot learn from your comments either.

I have a question. In your post you mentioned that this example is based on a combination of the MobileNet architecture and the Single Shot Detector (SSD) framework. If I understood you right, this example only suits the COCO dataset and was pre-trained on it.

What if I want to use this network for my own purposes? Do I need to gather my own database and train the network on it? If yes, what will the requirements for the images be, and where do I find them? And how do I train it? Use Caffe, right?

You are correct that you would need to gather your own dataset but you don’t have to use Caffe (other deep learning frameworks can be used as well). I actually cover how to train your own custom object detectors inside Deep Learning for Computer Vision with Python. You should also read this guide on the fundamentals of deep learning object detection.

If I have a picture with multiple persons and dogs in it, how do I come to know the percentage of women and the percentage of dogs in the picture? The percentage obtained from this technique is the confidence, which is not what I’m looking for.
Please help!

Total objects = the # of objects detected in an image with probability greater than the minimum confidence
Percentage of women = # of women detected / total objects
Percent of dogs = # of dogs detected / total objects
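In code, that bookkeeping could look like the rough sketch below (reusing the detections array and CLASSES list from the post):

# count confident detections per class, then report percentages
counts = {}
total = 0

for i in np.arange(0, detections.shape[2]):
	confidence = detections[0, 0, i, 2]
	if confidence > args["confidence"]:
		idx = int(detections[0, 0, i, 1])
		counts[CLASSES[idx]] = counts.get(CLASSES[idx], 0) + 1
		total += 1

for (label, n) in counts.items():
	print("{}: {:.1f}% of detected objects".format(label, 100.0 * n / total))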

Also, I tried using the command prompt the same way you did, but it didn’t seem to work. Am I supposed to make any changes to the program? My current directory is the same one where the models and images are stored. By the way, I am using the Windows cmd.

The post you are referring to uses Haar cascades, whose parameters can be very hard to tune on an image-by-image basis. You should consider using a deep learning detector (like the one covered in this post).

Hey Adrian, thanks for this wonderful article and for so many comments; I went through each of them and spawned multiple tabs. I have three questions below.
1) Are you aware of any training dataset that consists of primitive geometrical shapes, viz. squares, circles, rectangles, semi-circles, quadrilaterals, polygons, etc., where the shapes are just wire-frames and not solid shapes filled with some color?

2) If such a dataset exists, then can deep learning, as in this article, be applied to recognize multiple shapes of different sizes stacked together in a drawing as in https://imgur.com/a/5yw1b2m ? If yes, can their sizes be extracted too using some technique? For some reason I don’t want to use OpenCV image processing functions but want to apply deep learning to this problem.

3) If such a dataset DOES NOT exist, then is the following strategy feasible/practical?
a) Collect all the primitive and non-primitive shapes of different sizes occurring in many such drawings, put them into a dataset, and annotate them manually.
b) Train a model through some popular technique on this dataset to learn the features.
c) Use this model, with deep learning combined with OpenCV, to detect shapes.

I deeply appreciate you taking the time to read through this and providing any advice, pointers, or existing work. Many thanks.

Off the top of my head, sorry, I do not know of such a dataset. But you could easily create one yourself using OpenCV’s built-in drawing functions: loop over random selections of shapes, sizes, colors, etc., and create your dataset that way. From there you can train your own model, as sketched below.
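A rough sketch of that idea, where every size, range, and file path is purely illustrative:

# generate random wire-frame shapes with OpenCV's drawing functions
# and write them to disk as a small labeled dataset
import os
import random
import numpy as np
import cv2

os.makedirs("shapes", exist_ok=True)

for i in range(1000):
    img = np.full((128, 128, 3), 255, dtype=np.uint8)  # white canvas
    label = random.choice(["rectangle", "circle"])
    if label == "rectangle":
        x, y = random.randint(5, 60), random.randint(5, 60)
        w, h = random.randint(20, 60), random.randint(20, 60)
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 0), 2)  # outline only
    else:
        center = (random.randint(30, 98), random.randint(30, 98))
        cv2.circle(img, center, random.randint(10, 28), (0, 0, 0), 2)
    cv2.imwrite("shapes/{}_{:04d}.png".format(label, i), img)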

Thanks a lot Adrian, it saved me time exploring for such a dataset. Actually, there exist many 2D shape datasets, but they are very big and contain many different things, so they are probably ill-suited to my problem. So what I understand from your advice is that I should generate all the shapes with different parameters and create a dataset of all the shapes occurring in my drawings. After that I can train a model on this dataset and do object recognition using deep learning? Is deep learning the only solution if I want an AI-based solution? Thanks a lot.

Your understanding is correct. Deep learning is certainly not the only solution but you will need to leverage machine learning to some extent. I think you would be a great fit for the PyImageSearch Gurus course where I discuss image classification in detail. I have no doubt you would be able to solve your project after working through the course.

Hi,
Thanks a lot for this amazing tutorial. It’s really very helpful. I am able to execute this code on Windows and get good results, but when I execute the same code on a Raspberry Pi, I get the following error:

Thanks a lot for your tutorial, it worked perfectly for me.
I have a couple of questions though.
1) I tried the TensorFlow framework itself to implement the same task (object detection), using their example: https://github.com/tensorflow/models/tree/master/research/object_detection
It worked well but much, much slower, and, what was more important to me, it consumed a tremendous amount of memory (around 1GB to process just one image). How would you explain that OpenCV does the job so much better (faster and less memory-hungry)?
2) Do I understand correctly that I can feed cv2.dnn any other supported model from other frameworks like TensorFlow?

Which model did you use from the TensorFlow Object Detection API? Keep in mind that the architecture you used will have a very different memory footprint. Secondly, the cv2.dnn module does support a number of different frameworks but you’ll need to check the documentation depending on which specific one you want to use.
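For example, OpenCV can load a frozen TensorFlow graph directly; both file names below are placeholders for your own model files, and not every architecture is supported:

# load a TensorFlow model into the cv2.dnn module
import cv2
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb", "graph.pbtxt")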

If you’re new to Python and programming in general, that’s okay, but this is a more advanced guide and it does assume you know the fundamentals. I would suggest investing some time into learning the basics of Python before trying to run these more advanced examples.

This is what I am getting when I check for the version. Does this mean that the OpenCV version is 2.4.9.1? Is that different from my installed OpenCV version, or are they the same? Because the folder shows “opencv3.4.2”.

It sounds like you’re actually not using OpenCV 3.4.2; you’re using OpenCV 2.4.9.1. That is likely a Python path issue. Make sure you have followed one of my install tutorials to ensure your system is configured properly.
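A quick way to see which OpenCV your interpreter is actually importing:

# print the version and file path of the cv2 binding in use;
# the path usually reveals which install is on your Python path
import cv2
print(cv2.__version__)
print(cv2.__file__)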

Your install of OpenCV does not have the “highgui” module. I assume you pip installed OpenCV? If so, you didn’t have the proper GUI library pre-installed. Make sure you refer to one of my OpenCV install tutorials to help you configure your machine properly.

I don’t have any specific tutorials on x-ray images and semantic segmentation. I am considering doing more semantic segmentation posts and perhaps even additional chapters in the future though! Be sure to signup for the PyImageSearch Newsletter to be notified when any new chapters or posts are published 🙂

Hi Adrian, thank you for the tutorials. I am a beginner and your tutorials are of great help. I do have some questions.

1) I tried the dnn MobileNet SSD, using the 20 classes trained by chuanqi (same as yours). It works. But are there any other pre-trained models that I can use? I am actually trying to detect boxes, but sadly the 20 classes do not include boxes.

2) Inception V3’s model does have cartons. But is it strictly for image classification? Can we use the model for object detection? If so, how can we do that?

1. What kinds of boxes? 2D representations that are squares? Or actual packing boxes?

2. You would want to apply transfer learning via fine-tuning with the Inception V3 model as your “base” or “backbone” network. I discuss how to apply transfer learning for object detection inside Deep Learning for Computer Vision with Python.
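As a very rough sketch of the first step (in Keras, one of several frameworks you could use), loading Inception V3 as a frozen backbone and attaching a new head might start like this; the new layers and class count are purely illustrative:

# load Inception V3 without its classification head, freeze it,
# and attach new trainable layers for fine-tuning
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(299, 299, 3))
for layer in base.layers:
    layer.trainable = False  # freeze the backbone weights

x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation="relu")(x)
output = Dense(2, activation="softmax")(x)  # e.g. "carton" vs. "background"
model = Model(inputs=base.input, outputs=output)

Keep in mind this sketch only builds a classifier on the backbone; a full detector would additionally need a detection framework (SSD, Faster R-CNN, etc.) on top of it.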

I would suggest starting by reading my gentle guide to object detection. From there you’ll have a good understanding of how these algorithms work, including resources to start training your own models.

I downloaded the code and it works really well. However, I obtain slightly different detection results than the ones you showed. For instance, one potted plant and the person are missing in my detections in the file example_03.jpg (a horse jumping over a hurdle). I also get different bounding boxes in the first image of two cars on the highway.

My question is: was the model retrained in the meantime? What I find surprising is that it seems significantly less accurate than the YOLO network you presented recently.
