Thanks for more than two lakh views. All about openCV, Image Processing converging towards Biometric face recognition. Use the Easy Navigation button on the top bar to view all the posts at a glance related to openCV. I kept this blog small so that anyone can complete going through all posts and acquaint himself with openCV.

Thursday, August 18, 2011

Creating a haar cascade classifier aka haar training

In the previous posts, I used haar cascade xml files to detect faces, eyes, etc. In this post, I am going to show you how to create your own haar cascade classifier xml files. It took me a total of 16 hours to do it. I hope you can do it even sooner by following this post.

Note: The steps below are only for Linux OpenCV users. If you are a Windows user, use this link

For most of what follows, you will need these Linux executable files. Here's the link for them.

Before I start, remember two important definitions

Positive images : These images contain the object to be detected
Negative images : These images can contain anything except the object to be detected

It's better to explain with an example. So I will give you a step-by-step procedure to make a classifier that detects a pen. You can use the same procedure for any object you want to experiment with.

First of all, I took photographs of three of my pens, along with some background. The pics looked like the one below

I took a total of 7 photographs (I didn't care to count how many I took of each of the three pens) with my 2MP camera phone and loaded them onto my computer. Then I cropped each one of them using Image Clipper, excluding the background and keeping only the pen's image. Image Clipper can be downloaded from the link below

The one thing that I hated about Image Clipper is that, even though it let me work fast, it required me to install a lot of other libraries. It would have been much simpler to crop with GIMP, since there were only 7 images. It is up to you to follow the way you find easier. The cropped image looks like the one below

From here on the procedure is simple. As I told you at the start of the post, after downloading the necessary Linux executables, keep them in a separate folder named "Haar_Training", where you will carry out your experiments. Now inside that folder, create three folders and fill them with the necessary data, as explained below

1. Positive_Images : In this folder, I kept all my seven cropped images
2. Negative_Images : In this folder, I kept some other images (99 in total). These can be any images that you have, except that they should be of a similar genre and should not contain the cropped object, i.e., the pen, anywhere
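3. samples : Keep it empty (the commands in this post refer to this folder in lowercase, and folder names are case-sensitive on Linux)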

Now, your Haar_Training folder looks like this

Now, navigate into the Haar_Training folder through the terminal. In the folder named "Positive_Images", the images are in png format and in "Negative_Images", they are in ppm format. So accordingly, I collected the paths of those files in two text files named positives.dat and negatives.dat respectively, using the below two commands
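
The listing step can also be sketched in Python. This is only an illustration, assuming the folder names and image formats described above and that it is run from inside the Haar_Training folder; the helper name write_image_list is hypothetical and is not one of the downloaded executables.

```python
from pathlib import Path


def write_image_list(folder, pattern, list_file):
    """Write one image path per line to list_file and return the paths."""
    paths = sorted(str(p) for p in Path(folder).glob(pattern))
    Path(list_file).write_text("".join(path + "\n" for path in paths))
    return paths


if __name__ == "__main__":
    # Folder names and extensions follow the post; adjust them to your data.
    positives = write_image_list("Positive_Images", "*.png", "positives.dat")
    negatives = write_image_list("Negative_Images", "*.ppm", "negatives.dat")
    print(len(positives), "positive and", len(negatives), "negative paths written")
```

If either count printed at the end is zero, the folder name or the file extension does not match what is on disk, and the later steps will have nothing to work with.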

The next command, perl createtrainsamples.pl, generates the training samples. Its important arguments are
positives.dat - Contains the list of positive image paths
negatives.dat - Contains the list of negative image paths
samples - The folder that is used to store the data of training samples
250 - No. of training samples
-w 160 -h 20 - These two set the width and height of the training window, in the pen's proportions. The usual value for a face is -w 20 -h 20. But since a pen is long and thin, unlike a face, I made it 160 and 20
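
To make the argument order concrete, here is a minimal Python sketch that assembles the sample-generation command from the values listed above. The option string is the one quoted in the comments under this post, except that I write -maxzangle with a leading dash, which is how opencv_createsamples names that option. The function name build_createsamples_command is hypothetical.

```python
import shlex


def build_createsamples_command(num_samples=250, width=160, height=20):
    """Return the argument list for the perl sample-generation script."""
    # Options handed through to opencv_createsamples as one quoted string.
    inner = (
        "./opencv_createsamples -bgcolor 0 -bgthresh 0 "
        "-maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 "
        f"-w {width} -h {height}"
    )
    return ["perl", "createtrainsamples.pl", "positives.dat", "negatives.dat",
            "samples", str(num_samples), inner]


if __name__ == "__main__":
    cmd = build_createsamples_command()
    # Print a shell line to copy; use subprocess.run(cmd, check=True) to run it.
    print(" ".join(shlex.quote(part) for part in cmd))
```

Keeping the command as a list avoids quoting mistakes around the inner option string, which is easy to get wrong when typing it by hand.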

The next two commands use the data in the samples folder to create unified training data, which is used for haar training
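
The first of these two steps just lists the generated .vec files (the commands quoted in the comments are find samples/ -name '*.vec' > samples.dat followed by ./mergevec samples.dat samples.vec). A rough Python equivalent of the listing step, which also warns when the samples folder is empty, could look like this; list_vec_files is a hypothetical helper name.

```python
from pathlib import Path


def list_vec_files(samples_dir="samples", list_file="samples.dat"):
    """Write the paths of all .vec files under samples_dir to list_file."""
    vec_paths = sorted(str(p) for p in Path(samples_dir).rglob("*.vec"))
    Path(list_file).write_text("".join(p + "\n" for p in vec_paths))
    return vec_paths


if __name__ == "__main__":
    found = list_vec_files()
    if not found:
        print("No .vec files found; re-check the sample generation step.")
    else:
        print(f"{len(found)} .vec files listed; next: ./mergevec samples.dat samples.vec")
```

An empty samples.dat is worth catching here, because merging an empty list cannot produce usable training data.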

The opencv_haartraining command then creates the desired haarcascade xml file in the same Haar_Training folder, keeping all the metadata that is formed in the meantime in the haarcascade folder, which is created automatically. The arguments of this command to be noted are

haarcascade : The folder in which the metadata is kept
samples.vec : The unified training data that we created just before this command
negatives.dat : The list of negative image paths
20 - No. of stages; the more, the better, but training takes much longer
99 - No. of negative images
2048 - The amount of RAM, in MB, that can be used
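
As a sketch of how these values map onto a command line, the snippet below builds an opencv_haartraining call in Python. The option names (-data, -vec, -bg, -nstages, -npos, -nneg, -w, -h, -mem) are options of opencv_haartraining, but the exact set used here and the -npos value of 250 are my assumptions, not a copy of the original command. The function name is hypothetical.

```python
import shlex


def build_haartraining_command(stages=20, num_pos=250, num_neg=99,
                               width=160, height=20, mem_mb=2048):
    """Return an argument list for opencv_haartraining (assumed option set)."""
    return [
        "opencv_haartraining",
        "-data", "haarcascade",    # folder for intermediate stage data
        "-vec", "samples.vec",     # merged positive samples
        "-bg", "negatives.dat",    # list of negative image paths
        "-nstages", str(stages),
        "-npos", str(num_pos),     # assumption: matches the 250 samples
        "-nneg", str(num_neg),
        "-w", str(width), "-h", str(height),
        "-mem", str(mem_mb),       # memory limit in MB
    ]


if __name__ == "__main__":
    print(" ".join(shlex.quote(part) for part in build_haartraining_command()))
```

The -w and -h values must be the same ones used when the samples were created, otherwise the training tool cannot read the .vec file correctly.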

It took me five hours for this process to complete and generate my final pen detector. If you are a bit skeptical about the final outcome, instead of waiting for five hours, at any time while the above command is running, you can issue the below command to generate an intermediate haar cascade xml file from the data available.

./convert_cascade --size="160x20" haarcascade haarcascade-inter.xml

Now finally, after five hours, I got the haar cascade xml file. I have shown you the way to use haar cascade xml files in the previous posts. I experimented on some pens of my own. Below is the YouTube video of it.

Update (16 March, 2012): On the request of people, I am sharing the haar classifier file for pen detection. Below is the link for it.

Please note that I used very few sample images, fewer than 100, to create it, while in reality we use at least 5000-10000 images to create a robust classifier. So, it may give false positives in some cases

Update (25 March, 2012): The link to the Linux executable files that I gave earlier loads slowly or sometimes does not work at all. So here's the new link for it

That will be a very big database to share or send. Creating those databases shouldn't be a problem for you. For positive images, I just took photographs of my pen and cropped them, so that only the pen is visible. For negative images, take any kind of images - any image you want as long as the image doesn't have pen in it.

hi dileep, i followed your steps for creating an xml for mobile phone detection. does the size of positive image affects the speed of training? i used 50*75 but the training seems to be stuck at stage 4. i tried it several times but the process get stuck either at stage3 or at stage 4 pls help..

tjv, Good that you tried to create one by yourself. Did you try to create an intermediate classifier at that stage using the last of the commands that I mentioned? If yes, how is the performance using it?

Size of positive samples shouldn't be a problem. I used 157*1096 sized ones. And the training time depends solely on the number of training samples. How many are you using? 250?

Did you change the -w and -h? These attributes set the width and height in the above said commands. Since a pen's width is very high compared to its height, I took '-w 160 -h 20'. This is the smallest size of object that I can find in a given scene. If its size is greater than this, I can find it, but if it's less, I fail. So as per the mobile phone's size, set the dimensions to the minimum level, but make sure you don't make them so small that you lose robustness.

My final classifier is around 25 KB.

Also, try changing the various parameters in the 'opencv_haartraining' command, like reducing the number of stages, increasing splits, etc.

Hello Mario, It is exactly similar to the face detection code http://opencvuser.blogspot.in/2011/06/face-detector.html except that where you refer to the haarcascade xml file (char filename[]="haarcascade_frontalface_alt.xml";), you refer to the haar cascade file that I mentioned in this post.

When you run the command "perl createtrainsamples.pl positives.dat negatives.dat samples 250 ......" as stated in the above post, it should generate ".vec" files in the samples folder. Or is there some problem with perl in mac ? .

I have a createtrainsamples.pl in my working directory but entering the command does not quite seem to output anything to my samples folder. im using windows and using cygwin terminal for all the linux commands. i've been reading your tutorial and http://note.sonots.com/SciSoftware/haartraining.html#tbc650a5 tutorial. hope you could help me with this.

Dileep, I would like to thank you for posting this. It is exactly what I need for my project.

However, I am having issues concerning the "perl createtrainsamples.pl positives.dat ..." (it is createtestsamples.pl right ? just to make sure):

The command works and it is generating the samples with the distortions and all, but the problem is that it does not generate the .vec files. --> Vec file name: (NULL). Here is the output of one of the files:

Thanks for the post Dileep, but I have an odd error. When trying to execute the mergevec, it says error while loading shared libraries: libml.so.2.1:cannot open shared object file: No such file or directory. I already chmodded mergevec with 777, is there anything else I should do?

Hi Dileep .. Can u tell me the working of OCR i have to implement it using opencv libraries but dont know from where to start .. tell me how can find a letter from an image and how can i make segments of every letter .. ??

@Koji I too followed those notes. Try with the createtrainsamples.pl in the link I have provided.

@Jad Thanks for pointing it out. Actually it's what I have given, createtrainsamples only. But I forgot to keep that script in the link. Now everything is set right. Try again with the createtrainsamples.pl provided in the link to executables to get the .vec files

@Alex Try chmodding the whole directory where you are doing the haar training. Please try again for the samples folder (perl program) with the executable from the updated link of executables. I have updated it.

@Hiposai Can you elaborate and paste the error here?

@Moiz Optical Character Recognition or OCR is a very interesting field. I haven't worked on it, but one of my friends did research on extracting and recognizing the numbers from a number plate. You can read his thesis here http://ethesis.nitrkl.ac.in/3352/.

He worked with openCV in windows and he may be the one to help you better. Best of luck for your project.

Hi Dileep, I'm using OpenCV for a face detection application. I have a performance issue with face detection, and I want to train my own haar classifier to see if I can bring down the execution time. Do you have any idea as to what the minimum number of positive samples should be for decent accuracy? Also, do you have any approximation about time taken for training haar classifier vs. time taken for training lbp classifier? P.S - I'm an alumni of NIT Rkl. :P

For decent accuracy, at least 5000 positive samples would do. But it's simply a waste of time to make your own classifier when there is already one. Don't try to reinvent the wheel. In the opencv installation folder, if you search, there is a folder exclusively for haar classifiers. In that there are around 3-4 exclusively for face. Try with them. I don't know about lbp, but to train a decent haar classifier, it would take 15-20 days if you leave your computer running

Dileep, I would like to thank you for your great efforts. But please... I already downloaded the code from your given link http://opencvuser.blogspot.in/2011/06/face-detector.html. When I try it, it builds successfully but gives me the error message
Assertion failed: cascade && Membuffer && capture line 46
I'm working on Win XP and VS 2010. Please help me, as I'm a beginner in the AR field

Hi Dileep, thanks very much for this article and it's really helpful. Just one small question, is there any width and height ratio requirments in the cropped images? I mean, do they have to be with the same width and height ratio when be cropped?

I'm trying to create my own classifier. Firstly, I would like to thank you for this great tutorial.

I am facing a problem and I would be very grateful if you could help me. The compiled mergevec which you provided did not work on my 64-bit Ubuntu. I downloaded the source and tried to compile it within the opencv haartraining source, but got an error:

fatal error: cvhaartraining.h: file not found

Any idea how to solve it?

Or do you by chance have a mergevec compiled for 64-bit that you could send me?

Sorry, that I don't have one for a 64-bit ubuntu version for the same. The problem you are facing is clear that the header file cvhaartraining.h is missing. Google for that header file and keep it in the same folder where you are doing the training and try again

i have a problem . I run this command "perl createtrainsamples.pl positives.dat negatives.dat samples 250 "./opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 maxzangle 0.5 -maxidev 40 -w 160 -h 20", but "createtrainsamples" is empty and no .vec file can be found. And i have a question about format of negative and positive images, can i use jpg format?

hi Dileep i run all commands mention by you. postive.dat, negative.dat, sample.dat file is also made, but when i run this command:
find samples/ -name '*.vec' > samples.dat
./mergevec samples.dat samples.vec
on terminal i got error
bash: ./mergevec: Permission denied
please help on it. Thanks in Advance

Hi Abhi, Run "chmod 777 *" in the folder where you are running the tests. Let me know if the issue gets resolved.

Hi huy, I would not use jpg format. I haven't tested it, but even if it works, jpg uses lossy compression, which alters the original pixel values. So, better to stick with .bmp or .png

Are you referring to createtrainsamples.pl as being empty?

Ronald, Please try in a 32-bit Ubuntu that has a proper perl and opencv 2.1 installation. Many people have reported problems with other variants.

hi Dileep, i still get the same error. As i am new to linux os, i do not know much about its commands. i paste here the command line o/p's, please help me on this
root@nitin:/media/New Volume/Haar_Training# find samples/ -name '*.vec' > samples.dat
find: `samples/': No such file or directory
root@nitin:/media/New Volume/Haar_Training# ./mergevec samples.dat samples.vec
bash: ./mergevec: Permission denied

It's pretty clear that there is no "samples" folder at all in the directory.

In the first step, where I suggested creating three folders, did you create "samples" as one of the three? Then, when you use the "createtrainsamples..." command, the samples folder gets populated. Make sure both of these are done before moving on to your next step

Hi Dileep, i m trying to create unified training data. i executed these 2 commands:
find samples/ -name '*.vec' > samples.dat
./mergevec samples.dat samples.vec
but i m getting the following error
./mergevec: error while loading shared libraries: libml.so.2.1: cannot open shared object file: No such file or directory
and libml.so.2.1 is not in usr/local/lib. i also tried to set the path in bashrc file. any solution?

Hey Dileep, I have installed OpenCV 2.4.1 from the following link. http://www.samontab.com/web/2012/06/installing-opencv-2-4-1-ubuntu-12-04-lts/ All the samples are running fine. So i guess the problem is not with installation. Because I am getting the same error.

and if i keep my positive images size to 24*24 than it becomes very small and i m unable to mark the object from it in objectmarker tool. so i have kept the positive images size to 320*240. Pls tell me what should be my -w and -h values.(i m detecting cell phone in positive images)

There is no 24*24 in the command I have mentioned. It's 160*20. That is the minimum size of the pen that I want to be detected in a given large image. One more thing to note here is that it is the size of the pen, not of the whole positive image (sorry that I said it's the positive image size in my previous comment).

Now let's go to your cell phone, for which you gave 320*240 as width and height. If that is the cell phone's width and height, and it is the minimum size you want to detect, you can keep it as "perl createtrainsamples.pl positives.dat negatives.dat samples 250 "./opencv_createsamples -bgcolor 0 -bgthresh 0 -maxxangle 1.1 -maxyangle 1.1 -maxzangle 0.5 -maxidev 40 -w 320 -h 240"". But if the whole image in which the cell phone is present is some 300*200 photo, then it's not possible to detect the cell phone. The whole image in which you are detecting the cell phone should be greater than 320*240 in size.
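
The size rule in this reply can be written as a tiny check. This is just a sketch of the arithmetic, using the sizes mentioned in this thread; the function name can_contain_window is hypothetical.

```python
def can_contain_window(scene_size, window_size):
    """True if a scene of (width, height) can hold the training window."""
    scene_w, scene_h = scene_size
    win_w, win_h = window_size
    return scene_w >= win_w and scene_h >= win_h


if __name__ == "__main__":
    print(can_contain_window((300, 200), (320, 240)))  # False: scene smaller than window
    print(can_contain_window((640, 480), (320, 240)))  # True
    print(can_contain_window((640, 480), (160, 20)))   # True: the pen window from the post
```

The detector slides a window of at least the -w by -h size over the scene, so an object that appears smaller than that window, or a scene smaller than the window, cannot produce a match.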

First of all you have to understand the difference between the two kinds of images. Positive images are images of the pen and negative images are images of scenes without a pen. So putting the positive images on the negative images gives us scenes with a pen inside them.

Those 250 are thus generated scenes that contain the pen. These are needed to train our classifier

Hi dear Friend's, I'm doing my project in FPGA based Face recognition using Haar Classifier Algorithm. i want to know how to set the threshold value's in cascade(PIPELINE scheme) method.

After getting the two or three Haar-Like Feature, to add that feature value's. that result is Final haar feature classifier resultant value, then that resultant value compare with the Feature Threshold vale.

And also the accumulator value also compare with the Stage threshold value.

I have some doubt setting the threshold value's.

what is meant by feature threshold value, then how to set that feature threshold value? And

What is meant by Stage threshold value? How to set that stage threshold value's ?

'libcvaux' is a file related to opencv. So, I guess the problem is with opencv installation or may be due to compatibility issues. Can you try in open cv 2.1 and Ubuntu; If you don't have, can you try in one of your friends and check the result.

One thing I can suggest is to have a look at one of my previous posts of openCV installation, where I have given some checks to know whether it is properly installed or not

Hi, I have another problem now. I want to generate the xml file with 170 positive images and 100 negative images. After typing "opencv_haartraining -data haarcascade -vec samples.vec...." in the console there is always written "killed". Can you help please? Thanks

Hi Dileep, this is a fantastic tutorial! Trying to get this to recognize flowers (specifically, daffodils). However, I'm having the same problem that some others were having earlier in the comments. I'm using a Mac, and I'm having trouble with this step:

I was able to make the samples.dat file, but not the samples.vec, and I don't have access to linux as the other person did to solve their same problem. I'm very new at using terminal, and I'm not sure how to change the shell out of bash as you suggested. I've got the positives.dat and negatives.dat files, though.

I'm trying to make this work, but I got stuck when I try to execute the "./mergevec samples.dat samples.vec" part. This is my error: ./mergevec: No such file or directory. I tried with chmod 777 and with sudo, and it still doesn't work. Can you help me?

Nope, I did not try to evaluate the performance, as it will be low anyway. You need to use at least 4000 positive images to construct a good classifier. Since I used only 7, I know the performance will be low and hence did not try evaluating.

Hello Dileep: I wanted to ask that when do you come to know that the training is complete? i waited for like 6 hours now and still i feel the training is not yet completed.. the number of positive sample i have given is 570 and negative samples is 1320..The height and width are 24 and 24 respectively..Please help me

You can generate an intermediate classifier as I have written in the post and check the results. I had 99 negative and an equal amount of positive and it took me 6 hours (mine is a 2.1 GHz, 3 GB RAM computer). For your size, better to take some highly configured computer and leave it running over the weekend.

Please try with openCV 2.1. There is some problem with newer versions with shared libraries. Earlier also, many people posted the same. Just for this one step, execute it in one of your friend's computers who have openCV 2.1 installed.

and 1. i tried in another computer with the same version but i didnt get any error like this. any how then fine but

when i run this command it crates only the +ve number of images only but not 250 images Ex.. in +ve image folder there are 10 images, it creates only 10 ".vec" files in samples folder after run this command even i gave 250.why so..? or is there any mistake ..?

Hey Kumar ,very useful blog, I have a amateur level in OPENCV but I have a problem ,i have to detect patterns using haar classifer. The problem is, I'm using windows 7 and in OPENCV/apps/haartraining i found the c files of create samples , but i dont knw how to use them, they havent been compiling since i have been working.can u tell how createsample.cpp can be used in windows?

Hi, I guess you missed out something in the samples.vec creation. Please follow the steps carefully. Also please make sure that the positives.dat and negatives.dat files are intact and have correct paths to the exact number of images they represent

Hi, I have a problem. When I use [perl createtrainsamples.pl positive.dat negative.dat Samples 300] it appears that the operation goes well but I can not find any files in my Samples folder. Please help me!

Hey Dileep, thanks for the tutorial. I have followed the steps you have said. At the end, when I execute the following command on the terminal, it threw me an error initially saying that "permission is denied". Later on I changed the permissions using chmod. Later on, when I execute the same command, it is showing up as something like this. Could you help me in this regard... "cannot open shared object file: No such file or directory" when it is already present

Hello I need help with my homework about Haar-cascade detection in open CV, is anybody good with paython programming to help me. and interested in some tutoring remotely please let me know at megrobelly@hotmail.com for more details. thanks

Hi Rafa, From the output, you can see samples/crop_000003b.png.vec, samples/crop_000002a.png.vec. It means it is trying to create the .vec files within the samples folder, but it is not able to do so. Can you check whether the samples folder has all Read/Write permissions? Also, can you rerun the script and check for error messages again? Also, make sure you are giving the folder name correctly as "samples". Is it the exact same name as the folder present in the directory, or are you missing "S", something like that?
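
The checks suggested in this reply (folder exists, exact name including letter case, read/write permissions) can be automated with a few lines of Python. This is a minimal sketch using only the standard library; check_samples_dir is a hypothetical helper name.

```python
import os


def check_samples_dir(path="samples"):
    """Return a list of human-readable problems with the samples folder."""
    full = os.path.abspath(path)
    parent, name = os.path.dirname(full), os.path.basename(full)
    if not os.path.isdir(path):
        # Look for a folder whose name differs only in letter case.
        lookalikes = [entry for entry in os.listdir(parent)
                      if entry.lower() == name.lower() and entry != name]
        if lookalikes:
            return [f"'{name}' not found, but '{lookalikes[0]}' exists (check the case)"]
        return [f"'{name}' does not exist"]
    problems = []
    if not os.access(path, os.R_OK):
        problems.append(f"'{name}' is not readable")
    if not os.access(path, os.W_OK):
        problems.append(f"'{name}' is not writable")
    return problems


if __name__ == "__main__":
    issues = check_samples_dir("samples")
    print("samples folder looks fine" if not issues else "\n".join(issues))
```

On Linux, "Samples" and "samples" are two different folders, so the case check is worth running before generating samples again.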

i checked, permissions are ok, same thing with folder name, i tried again and again with no sucess.however i used haartraining in windows using cygwin and following ( http://note.sonots.com/SciSoftware/haartraining.html ) , i used their binaries for windows , it worked ( generate the xml ) , but with the ./haartraining , it works only with -w 24 -h 24 or less , when i try to use -w 70 -h 134 for example , it runs about 1 minute after stopping with error " OpenCV gui error handler insufficient memory " , any ideas about the origin of the problem ??

hi this is my second post, i read your post here and im really interesting. Im doing a human detector in python but i have a problen with this when the human is not in front position... i know that i have to do my xml file with people in an other position but, could it be posible that you have any xml file for human detection? if yes, could you sen me it? Good job, and thanks!

Dileep, First of all thanks for posting this tutorial. I have followed your tutorial to do exactly your example for the pen and created the haarclassifier file. I am using OpenCV to detect the image. I dont know why but there is always a flickering rectangular box on the image even with no pen, as if it is detecting a pen, unlike on your post. The haarclassifer ended with an error of 0.0003625 after 20 stages. Can you 1) Share with me your OpenCV file you used to detect the pen. and 2) Any thoughts on why I am getting this box. I tried changing the background to a completely white one by putting a piece of paper in front of the camera, and then this rectangular box does not show up. 3) The box does not get created if I hold the pen vertical-How can this be changed so that no matter what angle the pen is held at it recognizes it. Thanks for your prompt response.

Hi Sandeep, It is expected. I will answer your points one by one.
1. It will be the same as yours
2. This is because the sample size is small, so it cannot be accurate. Further, you are using one pen only. For greater accuracy, I recommend using at least 10,000 positive/negative images.
3. You can create another xml file for the vertical angle and put it in the loop, so that when you hold it vertical, the vertical xml file gets triggered. But it may not be possible for all angles. Because, if you hold it at 45 degrees, you would have to draw a parallelogram rather than a rectangle around it, which is quite complex.

Hi again. My problem started when I ran the perl createsamples.pl. It ran for a bit but nothing was created in the samples folder. Initially I thought it was nothing so I ran the next step which is the merge.vec command but then I got an error where it stated

I have gotten “SAMPLES.TXT” only the same numbers of positive_ samples (200 photos). It is not the numbers of “samples 1000”. Is it correct?
7- What is the best computer configuration to run this application?
8- Must the parameters “-mem 2048” limit the memory used? If yes. Why my computers have stopped run code?
Best regards Fabiano.

Hi Dileep Kumar, I just began with opencv. After creating a project in vs10 and linking with the opencv lib, I tried the code below: http://docs.opencv.org/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html#cascade-classifier But when I debug, the console has "--(!)Error loading". What is wrong? Can you help me please?

Is there a file called mergevec in the folder you are executing the command? if not can you copy mergevec to the folder and try. And one more thing is that, check whether samples.dat got created with the previous command.

Or you can just open mergevec file and search for the error string. You can check the code yourself, why it is failing. This helps you in solving majority of the script problems yourself.

Dileep, I noticed an error: when I run createtrainsamples.pl, is not creating the .vec files in the Samples folder. I generated the files positives.dat and negatives.dat. And I'm doing to detect a pen too, so I left the same size. I can not find the error

Hi Dileep, It seems like createtrainsamples.pl creates samples only from the positive images and not from the negative images. Is that expected? In addition, could you please explain what the number of training samples (250) stands for? How do I know what number to give as input?

Hi Everyone, I see many people have problems with sample creation. You can try checking the below two posts. I followed these when developing. These may help. Please bear with me, as I am too busy with other tasks.

All of you who have a problem with any .exe file, try to find out whether it is 32 bit or 64 bit..

command : "file ./"
The above command will tell you the version of the file. The main issue is with your operating system. Most people are trying to run a 32 bit .exe file on a 64 bit machine, which does not work unless the matching 32 bit libraries are installed.

Now the solution is: don't use the 32 bit .exe files. You can find 64 bit .exe files on your file system; go to "/usr/local/bin", then copy all the exe files from that path into the directory where you are working, replacing the old ones.
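
The same 32 bit versus 64 bit check can be done without the file command by reading the ELF header of a Linux executable. This is a small sketch relying only on the ELF format, where byte 4 (EI_CLASS) is 1 for 32 bit and 2 for 64 bit; the function name elf_bitness is hypothetical.

```python
import sys


def elf_bitness(path):
    """Return 32 or 64 for an ELF executable, or None if it is not ELF."""
    with open(path, "rb") as handle:
        header = handle.read(5)
    if len(header) < 5 or header[:4] != b"\x7fELF":
        return None
    # Byte 4 (EI_CLASS) is 1 for 32-bit and 2 for 64-bit binaries.
    return {1: 32, 2: 64}.get(header[4])


if __name__ == "__main__":
    # Usage: python check_bits.py ./mergevec ./opencv_createsamples
    for name in sys.argv[1:]:
        print(name, elf_bitness(name))
```

A result of None for a file such as createtrainsamples.pl is expected, since that is a perl script and not a compiled binary.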

Hello Dileep Kumar , I am Gabriel, I living in mexico, today I am working in one project, I need detect a railcars of the train, my question for you is? the positive pictures must be a gray color or BGR ?

Hi Dileep, Actually, I want to ask you something. It's not related to this post, but I think you can help me out! I have made a cpp file using opencv for eyes detection and tracking but I am unable to compile it and run in terminal. I compiled it simply using g++ command in ubuntu terminal and put the opencv folder in the same directory, but I guess it's not the correct way. I run the available sample codes in opencv, all of them worked pretty well but I am unable to run this file.cpp. Can u kindly tell me how should I do? I hope I am able to explain you my problem well enough! Thanks in advance!!

Hi Kriti, What is the compilation error it's giving? Are you sure you installed all the supporting packages? Also, check this post to see if it can be useful to solve the issue - http://opencvuser.blogspot.in/2011/06/installed-opencv-so-whats-next.html

Hi, Please create the samples yourself. It is very easy and you can customise it for the object you want. For the pen detector, you can straight away use the XML file for creating a detector. All the links in the post are working fine, I rechecked just now. Please try to do some R&D, as not everything can be spoon-fed. When I wrote this post, I had to refer to 5 different websites and do my own research. Now it is much simpler for you.