Monday, 20 May 2013

In my last entry I hypothesized that the under-performance of my first detector, compared with the one documented in the original paper, could be explained by the difference in the number of features used. To verify this, I trained a classifier with only 10K features and ran tests to compare both detectors. The results are illustrated in the next graph.

As the graph shows, the classifier trained with 10K features scores ~4 points below the one trained with 15K features at the reference value of 0.0001 FPPW (false positives per window). With these results in mind, it seems reasonable to expect that if the classifier were trained with 30K features, as in the original paper, the detector would achieve results close to the ones documented in the publication.

Tuesday, 14 May 2013

I've finally finished my first implementation of the pedestrian detection algorithm. Results for the INRIA dataset are shown in the following graph.

At the reference value of 0.0001 false positives per window, my detector correctly labels ~79.5% of the pedestrian windows. This is around 10 points below the original paper, and I offer two explanations for this.

First and foremost, I used a pool of 15000 features for learning and classification, whereas the original paper used 30000.

Secondly, the original paper uses an optimized boosted cascade for decision-making. This type of classification not only speeds up the algorithm by several orders of magnitude, but also leads to slightly better detection performance, since it is designed to reject most non-pedestrian windows in the first steps of the cascade.

Not being able to implement this type of classification myself for lack of time, I resorted to extracting fewer features to speed up the algorithm, at the cost of detection performance.
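The early-rejection idea behind a cascade can be sketched in a few lines. This is only an illustration of the principle, not the cascade from the paper: the Stump and CascadeResult types, the function name and the single rejectBelow threshold are all hypothetical, and only the C++ standard library is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One weak classifier: a threshold test on a single feature value.
struct Stump {
    std::size_t feature;  // index into the window's feature vector
    double threshold;     // decision threshold on that feature
    double weight;        // vote strength assigned by boosting
};

struct CascadeResult {
    bool accepted;          // true if the score never fell below rejectBelow
    std::size_t evaluated;  // number of weak classifiers actually computed
};

// Accumulates weighted votes and stops as soon as the running score drops
// below rejectBelow, so clearly negative windows cost only a few evaluations.
CascadeResult classifyWithEarlyExit(const std::vector<double>& features,
                                    const std::vector<Stump>& stumps,
                                    double rejectBelow) {
    double score = 0.0;
    for (std::size_t i = 0; i < stumps.size(); ++i) {
        const Stump& s = stumps[i];
        assert(s.feature < features.size());
        score += (features[s.feature] > s.threshold) ? s.weight : -s.weight;
        if (score < rejectBelow) {
            return {false, i + 1};  // rejected early, remaining stumps skipped
        }
    }
    return {true, stumps.size()};
}
```

Since the vast majority of windows in an image contain no pedestrian, stopping after a handful of weak classifiers for most of them is where the speed-up comes from.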

Given these explanations, I think the method is validated and the next step is to test it on our own dataset.

Tuesday, 30 April 2013

I've come across some roadblocks along the way when it comes to training a classifier, so that part of the work is on hold right now.

Meanwhile I've been drafting the outline of my thesis, and this is my first draft.

1 – Introduction

Introducing the problem, motivations and objectives of the present work.

1.1 – Problems

Problems normally associated with visual human detection.

1.2 – State of the Art

Brief explanation of some methods already developed for human detection, possibly referring to any real application that might already be implemented (not sure if any). Also a brief overview of the evolution of visual object detection algorithms in general.

1.3 – Solution

State my approach to solve the problem and why it was chosen, rather than any other.

2 – Experimental setup

Detailed explanation of the experimental platform implemented in ROS for the development of the present work. Also stating and explaining the main software tools used for writing the code (OpenCV). Possibly point out that this application is to be implemented in the ATLASCAR, thus illustrating the setup at run-time. This chapter will probably be divided into sub-topics.

3 – Integral Channel Features

A compact explanation of the algorithm.

3.1 – Channels

What a channel of an image is, which ones were computed and how.

3.2 – Integral Images

What an integral image is, what they are for, how they are computed, why they are useful for this work.

3.3 – Features

What a feature is, how they are computed, how many and why. Illustration of the random mechanism constructed for obtaining random parameters for feature harvesting.

3.4 – “The whole picture” (not sure of the name yet, but it seems to me an important sub-topic)

An explanation of the architecture of the code, meaning how the image is being processed; a flowchart of some sort will probably come in handy.

4 – Machine Learning Method

Brief explanation of what a ML method is, and why it is absolutely necessary for these detection problems.

4.1 – AdaBoost

What AdaBoost is, and why it is ideal for the present work.

4.2 – Training a classifier

Explain all the steps necessary for successfully training a classifier.

5 – Experiments and Results

Explain how the results were acquired, and what makes this method a valid confirmation of the results.

Saturday, 27 April 2013

When I first looked at the results of my first classifier I wasn't too excited, since I was detecting not only pedestrians, but also trees, poles, traffic signs, etc. Bootstrapping is a critical operation for enhancing the performance of the detector: it consists of running the algorithm on negative images and feeding the resulting false positives back to the classifier as negative training examples.
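One bootstrapping round can be sketched as below. This is a hedged illustration rather than my actual code: FeatureVector, mineHardNegatives and the score callback are hypothetical names, and the classifier is abstracted as any function that returns a confidence for a window.

```cpp
#include <cassert>
#include <functional>
#include <vector>

typedef std::vector<double> FeatureVector;  // features of one detection window

// Runs the current classifier over windows taken from negative-only images
// and returns every window it wrongly scores as a pedestrian. These "hard
// negatives" are then appended to the negative training set before retraining.
std::vector<FeatureVector> mineHardNegatives(
        const std::vector<FeatureVector>& negativeWindows,
        const std::function<double(const FeatureVector&)>& score,
        double threshold) {
    assert(static_cast<bool>(score));  // a classifier must be supplied
    std::vector<FeatureVector> hardNegatives;
    for (const FeatureVector& window : negativeWindows) {
        if (score(window) >= threshold) {  // false positive on a negative image
            hardNegatives.push_back(window);
        }
    }
    return hardNegatives;
}
```

The round can be repeated: retrain with the enlarged negative set, mine again, and stop when few new false positives appear.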

After that I have to find a way to evaluate the performance of my algorithm. In detection problems this is usually done with ROC-style curves, which here plot the miss rate against the false positive rate.

The INRIA dataset has a training set and a testing set, each with positive-only images and negative-only images. So I am going to use the training set for training my classifier (bootstrapping and all), and then I will run the final classifier on the test set at different thresholds, measuring the miss rate on the positive-only images and the false positive rate on the negative-only images.
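The evaluation described above boils down to counting, for each threshold, how many positive windows fall below it and how many negative windows reach it. A minimal sketch, assuming the classifier scores are already collected in two vectors (CurvePoint, evaluateAt and sweepThresholds are illustrative names, not part of my code):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct CurvePoint {
    double threshold;
    double missRate;           // fraction of pedestrian windows rejected
    double falsePositiveRate;  // fraction of negative windows accepted
};

// One point of the curve: positives scoring below the threshold are misses,
// negatives scoring at or above it are false positives.
CurvePoint evaluateAt(const std::vector<double>& positiveScores,
                      const std::vector<double>& negativeScores,
                      double threshold) {
    assert(!positiveScores.empty() && !negativeScores.empty());
    std::size_t missed = 0;
    for (double s : positiveScores) {
        if (s < threshold) ++missed;
    }
    std::size_t falsePositives = 0;
    for (double s : negativeScores) {
        if (s >= threshold) ++falsePositives;
    }
    return {threshold,
            static_cast<double>(missed) / positiveScores.size(),
            static_cast<double>(falsePositives) / negativeScores.size()};
}

// Sweeping the threshold traces the whole curve.
std::vector<CurvePoint> sweepThresholds(const std::vector<double>& positiveScores,
                                        const std::vector<double>& negativeScores,
                                        const std::vector<double>& thresholds) {
    std::vector<CurvePoint> curve;
    for (double t : thresholds) {
        curve.push_back(evaluateAt(positiveScores, negativeScores, t));
    }
    return curve;
}
```

Raising the threshold trades false positives for misses, which is why a single operating point says little about a detector.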

Monday, 15 April 2013

A positive classification of a Detection Window [DW] outputs its coordinates, as well as the scale at which the detection was made. Since the same object will be detected several times at different scales, the raw output has to be processed carefully.

This is a representation of the unprocessed data. Looking at it, it is immediately clear that the same objects are being detected multiple times at different scales, so the first step was to transform all the rectangles to the original scale:

The final step is to group the rectangles according to the distance between them. This is a clustering problem, and a fairly advanced procedure. Fortunately OpenCV already has a function that does this for us, and using the standard parameters this is the result:

This image in particular was chosen for this entry because of the good results it provided. This is not the case for the whole dataset, since my algorithm is prone to identifying bike wheels, trees, and other objects as pedestrians, so my work is far from finished.
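The first step, mapping each detection back to the original image, can be sketched as follows. I am assuming here that "scale" is the factor by which the image was resized before the window was found (scaled size = original size x scale); the Rect and Detection structs are stand-ins written for this example, not OpenCV types.

```cpp
#include <cassert>
#include <cmath>

struct Rect {
    int x, y, width, height;
};

struct Detection {
    Rect window;   // coordinates in the resized image
    double scale;  // resize factor applied to the original image (assumption)
};

// Divides every coordinate by the resize factor, rounding to whole pixels,
// so that detections made at different scales share one coordinate frame.
Rect toOriginalScale(const Detection& d) {
    assert(d.scale > 0.0);
    return {static_cast<int>(std::lround(d.window.x / d.scale)),
            static_cast<int>(std::lround(d.window.y / d.scale)),
            static_cast<int>(std::lround(d.window.width / d.scale)),
            static_cast<int>(std::lround(d.window.height / d.scale))};
}
```

Once all rectangles live in the same frame, overlapping ones can be clustered; in OpenCV that is presumably a job for cv::groupRectangles, although other grouping strategies would work too.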

Wednesday, 10 April 2013

The final step of the algorithm is the classification of Detection Windows [DW] as Pedestrian or Not Pedestrian.

In this entry I'll explain the steps I take for training and testing a classifier. Note that I'm not yet very well acquainted with all the concepts and different parameters on this matter.

In my problem, there is a DW to be classified, and thousands of features that describe it. In such problems AdaBoost is normally a preferred approach, since it is a Machine Learning method that combines a large number of weak classifiers (each based on a feature) into a strong classifier. This is exactly the approach that ChnFtrs proposes, and fortunately OpenCV has an implementation of it.
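To give an idea of how boosting builds a strong classifier, here is a sketch of a single round of the textbook discrete AdaBoost update. It is a simplification for illustration only: I train the REAL variant through OpenCV, and the function name and the +1/-1 label convention are my own assumptions here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One round of discrete AdaBoost. labels and predictions hold +1 (pedestrian)
// or -1 (not pedestrian); weights must sum to 1 on entry. Returns the vote
// strength (alpha) of the weak classifier and re-weights the samples in
// place, so that the next weak classifier focuses on the current mistakes.
double boostRound(std::vector<double>& weights,
                  const std::vector<int>& labels,
                  const std::vector<int>& predictions) {
    assert(weights.size() == labels.size());
    assert(labels.size() == predictions.size());
    double error = 0.0;  // weighted error of this weak classifier
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (predictions[i] != labels[i]) error += weights[i];
    }
    assert(error > 0.0 && error < 0.5);  // must beat random guessing
    const double alpha = 0.5 * std::log((1.0 - error) / error);
    double total = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        const bool correct = (predictions[i] == labels[i]);
        weights[i] *= std::exp(correct ? -alpha : alpha);
        total += weights[i];
    }
    for (double& w : weights) w /= total;  // renormalise to sum to 1
    return alpha;
}
```

The strong classifier is then the sign of the alpha-weighted sum of all the weak classifiers' votes.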

Step 1: Introducing the data to the algorithm.

For this I need to prepare a .csv file with the category in the first column, followed by the features. For example, if I were using 3 features my file could look like this:

N,1000,1020,900
P,2000,1200,300
P,3300,1235,1000
N,1432,1587,5587
...

The file is generated by running the code on multiple images and writing the results to a file.
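Writing one row of that file is straightforward. A small sketch, where toCsvRow is a hypothetical helper and the feature values are assumed to be integer sums:

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Formats one training sample as "<label>,<f1>,<f2>,...", with the label
// being 'P' (pedestrian) or 'N' (not pedestrian) as in the example above.
std::string toCsvRow(char label, const std::vector<long long>& features) {
    assert(label == 'P' || label == 'N');
    std::ostringstream row;
    row << label;
    for (long long value : features) {
        row << ',' << value;
    }
    return row.str();
}
```

Each row would then be written to train.csv followed by a newline, one row per detection window.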

Step 2: Opening the file

Supposing that our file is called "train.csv", the code looks like this:

CvMLData cvml;

cvml.read_csv ("train.csv");

CvMLData stands for Computer Vision Machine Learning Data and is a class made just for handling machine learning problems with OpenCV.

The CvBoost::train method has several parameters, some of which I'm not using yet. It is possible to select a subset of the training data to do the training, leaving the rest for testing, which gives an immediate idea of the performance of the classifier. It is also possible to have missing fields in the feature pool and let the algorithm fill them with approximate values.

As it is, I'm training a model of type REAL, with 1000 weak classifiers, using all the data in the file and leaving all other parameters at their defaults.

Step 6: Classifying new samples

In the main code I have to load the classifier and then run the CvBoost::predict method on samples with the same number of columns as the ones used to train it.

In this loop, Test is filled with the feature values and then passed to the predict method.

x holds the predicted class (1 for Ped and 2 for nPed), and this way I was able to do some preliminary testing of my first classifier. On 1300 windows with pedestrians, the classifier failed 6 predictions, which is pretty good for a first try. I also tested the classifier on over 14 million DWs without pedestrians, leading to around 290000 false detections, which in terms of raw accuracy means 98% correct predictions. However, I won't discuss these results yet, since this is hardly the best way to evaluate a classifier.
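To see why raw accuracy is a poor measure here, the counts above can be turned into rates. The sketch below uses the rounded figures from this entry (14 million is a lower bound and 290000 an approximation), and the Summary struct and summarize function are illustrative names only.

```cpp
#include <cassert>

struct Summary {
    double missRate;  // missed pedestrians / pedestrian windows
    double fppw;      // false positives per (negative) window
    double accuracy;  // correct predictions / all windows
};

// Converts raw counts from a test run into the usual detection rates.
Summary summarize(long long positives, long long missed,
                  long long negatives, long long falsePositives) {
    assert(positives > 0 && negatives > 0);
    assert(missed >= 0 && missed <= positives);
    assert(falsePositives >= 0 && falsePositives <= negatives);
    const double correct = static_cast<double>(positives - missed) +
                           static_cast<double>(negatives - falsePositives);
    return {static_cast<double>(missed) / positives,
            static_cast<double>(falsePositives) / negatives,
            correct / static_cast<double>(positives + negatives)};
}
```

With summarize(1300, 6, 14000000, 290000) the miss rate comes out below 0.5% and the accuracy at about 98%, but the false positive rate is about 0.02 per window, roughly two hundred times the 0.0001 reference value used to compare detectors. Accuracy looks good only because negative windows vastly outnumber positive ones.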

Sunday, 7 April 2013

(Note: In this entry, the term detection window [DW] refers to the image that we will classify as pedestrian or not pedestrian. It is a section of a larger image.)

There are two distinct approaches to describing a DW in terms of features. Some researchers opt to hand-design a fine-tuned pool of features, which is tested and refined until it achieves satisfactory results. This is the case of the HOG detector, whose features consist of local sums calculated over a dense overlapping grid in the DW. The other approach is to generate random features, on the assumption that a large enough feature pool will give a good characterization of the scene. This is the case of the Integral Channel Features [ChnFtrs] algorithm.

In the ChnFtrs approach, a feature has 3 parameters that are generated randomly. Given that we compute different channels of our DW, a feature is calculated on a random channel, over a rectangle whose dimensions and position are also random.

In terms of implementation, this is how I've done it:

Create a struct that characterizes one feature

typedef struct {
    int nFtrs;      // number of features to be calculated
    int channelidx; // index of the channel where to calculate the feature
    int width;
    int height;
    int x;          // x position of the top-left corner of the random rectangle
    int y;          // y position of the top-left corner of the random rectangle
} FtrParams;

Create a vector type for these structs to hold all the parameters.

typedef vector<FtrParams> FtrVecParams;

Create a function that initializes this vector with N parameters

For generating random values I use the RNG class of OpenCV. Using the same seed, it is possible to replicate results in the future.
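A function of this kind could look like the sketch below. To keep it self-contained it uses std::mt19937 from the standard library instead of OpenCV's RNG, a trimmed copy of the struct, and a hypothetical makeRandomFeatures signature; the minimum rectangle size and the DW dimensions passed in are assumptions, not values from my code.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Trimmed copy of the struct above, so that this example compiles on its own.
struct FtrParams {
    int channelidx;  // index of the channel where to calculate the feature
    int width;
    int height;
    int x;           // top-left corner of the random rectangle
    int y;
};
typedef std::vector<FtrParams> FtrVecParams;

// Draws n random rectangles that fit inside a dwWidth x dwHeight window,
// each on a random channel. The same seed reproduces the same pool on a
// given standard library implementation.
FtrVecParams makeRandomFeatures(int n, int nChannels, int dwWidth, int dwHeight,
                                int minSize, unsigned seed) {
    assert(n >= 0 && nChannels > 0);
    assert(minSize > 0 && minSize <= dwWidth && minSize <= dwHeight);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pickChannel(0, nChannels - 1);
    std::uniform_int_distribution<int> pickWidth(minSize, dwWidth);
    std::uniform_int_distribution<int> pickHeight(minSize, dwHeight);
    FtrVecParams params;
    params.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        FtrParams p;
        p.channelidx = pickChannel(rng);
        p.width = pickWidth(rng);
        p.height = pickHeight(rng);
        // The position is drawn last so the rectangle always stays inside the DW.
        p.x = std::uniform_int_distribution<int>(0, dwWidth - p.width)(rng);
        p.y = std::uniform_int_distribution<int>(0, dwHeight - p.height)(rng);
        params.push_back(p);
    }
    return params;
}
```

For example, makeRandomFeatures(15000, 10, 64, 128, 5, 42u) would produce a pool of 15000 features over 10 channels for a 64x128 window.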

Wednesday, 27 March 2013

After computing the channels, they're merged into one matrix with 10 layers (number of channels I am computing). This allows me to use the full potential of openCV in future steps. OpenCV is an open source library for computer vision applications.

The features are nothing more than simple local sums calculated over rectangular areas in the image. The fastest way to do this is by using the integral image trick introduced by Viola and Jones in their face detection work. This allows a local sum to be computed with just four lookups and three additions or subtractions, whatever the size of the rectangle, making the features extremely fast to compute.

"To calculate the summation of the pixels in the black box, you take
the corresponding box in the integral. You sum as follows: (Bottom right
+ top left – top right – bottom left).

So for the 3,5,4,1 box, the calculations would go like this: (30+0-17-0 = 13). For the 4,1 box, it would be (0+15-10-0 = 5)." - Taken from here.

OpenCV provides a function to obtain the integral of an image, and just by doing:

integral (MergedChannels (source), IntegralChns (destination));

I get a 10-layer matrix with the integral channel images in the correct order. Awesome, right?
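For a single channel, the integral image trick can be written out by hand as below. This is a standard-library sketch of the idea, not what OpenCV does internally; IntegralImage, buildIntegral and rectSum are names made up for this example, and the extra row and column of zeros is a common convention that avoids special cases at the borders.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Integral image with one extra row and column of zeros: at(x, y) holds the
// sum of all source pixels strictly above and to the left of (x, y).
struct IntegralImage {
    int width;                    // width of the source image
    int height;                   // height of the source image
    std::vector<long long> data;  // (width + 1) * (height + 1) entries

    long long at(int x, int y) const {
        return data[static_cast<std::size_t>(y) * (width + 1) + x];
    }
};

// Builds the integral image of a row-major, single-channel image in one pass.
IntegralImage buildIntegral(const std::vector<int>& pixels, int width, int height) {
    assert(width > 0 && height > 0);
    assert(pixels.size() == static_cast<std::size_t>(width) * height);
    IntegralImage ii{width, height,
                     std::vector<long long>(
                         static_cast<std::size_t>(width + 1) * (height + 1), 0)};
    for (int y = 0; y < height; ++y) {
        long long rowSum = 0;  // running sum of the current row
        for (int x = 0; x < width; ++x) {
            rowSum += pixels[static_cast<std::size_t>(y) * width + x];
            ii.data[static_cast<std::size_t>(y + 1) * (width + 1) + (x + 1)] =
                ii.at(x + 1, y) + rowSum;
        }
    }
    return ii;
}

// Sum of the w x h rectangle whose top-left pixel is (x, y):
// bottom right + top left - top right - bottom left.
long long rectSum(const IntegralImage& ii, int x, int y, int w, int h) {
    assert(w > 0 && h > 0);
    assert(x >= 0 && y >= 0 && x + w <= ii.width && y + h <= ii.height);
    return ii.at(x + w, y + h) + ii.at(x, y) - ii.at(x + w, y) - ii.at(x, y + h);
}
```

Whatever the size of the rectangle, rectSum touches only four entries, which is what makes evaluating thousands of features per window affordable.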

OpenCV also allows all the elements of a multichannel matrix at a given coordinate to be accessed at once. This comes in handy when it's time to calculate the features: all the channels can be handled at the same time, and the features are stored naturally in a coherent manner.

This analysis has to be done not only over the whole image with a sliding window, but also on the same image at several different scales. This leads to tens of thousands of sums being calculated per image, and each 640x480 image takes about half a second to process (this depends on the size of the sliding window, the number of scales and other parameters).
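How quickly the work piles up can be estimated by counting the windows visited. The sketch below assumes a pyramid that only shrinks the image, by a fixed factor per level, and a fixed stride; countWindows and all its parameters are hypothetical and not the values I actually use.

```cpp
#include <cassert>

// Counts the sliding-window positions over an image pyramid. At each level
// the image is shrunk by scaleStep until the window no longer fits.
long long countWindows(int imageWidth, int imageHeight,
                       int windowWidth, int windowHeight,
                       int stride, double scaleStep) {
    assert(windowWidth > 0 && windowHeight > 0);
    assert(stride > 0 && scaleStep > 1.0);
    long long total = 0;
    for (double scale = 1.0; ; scale *= scaleStep) {
        const int w = static_cast<int>(imageWidth / scale);
        const int h = static_cast<int>(imageHeight / scale);
        if (w < windowWidth || h < windowHeight) break;  // window does not fit
        const long long columns = (w - windowWidth) / stride + 1;
        const long long rows = (h - windowHeight) / stride + 1;
        total += columns * rows;
    }
    return total;
}
```

Multiplying the result by the number of features per window gives the number of local sums needed for one image.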

Now onto the final step of the algorithm, training and testing a machine learning method.

In the paper, many channels were computed and tested for performance, but once that work is done there is no need to repeat it. I only compute the ones that achieved the best results: the gradient magnitude, the gradient histogram channels with 6 orientation bins, and the LUV colour channels.
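The gradient channels can be illustrated with the simplified sketch below. It is not my OpenCV implementation: it assumes central differences, unsigned orientation folded into [0, pi), and hard assignment of each pixel to a single bin, and the names are made up. Together with the three LUV channels (not shown), the magnitude channel and the six histogram channels give the 10 channels used.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct GradientChannels {
    std::vector<float> magnitude;               // 1 channel
    std::vector<std::vector<float>> histogram;  // nBins channels
};

// Computes the gradient magnitude channel and nBins orientation channels of
// a row-major grayscale image. Border pixels are left at zero.
GradientChannels computeGradientChannels(const std::vector<float>& gray,
                                         int width, int height, int nBins = 6) {
    assert(width > 0 && height > 0 && nBins > 0);
    assert(gray.size() == static_cast<std::size_t>(width) * height);
    const double pi = std::acos(-1.0);
    const std::size_t n = gray.size();
    const std::size_t rowStep = static_cast<std::size_t>(width);
    GradientChannels out{
        std::vector<float>(n, 0.0f),
        std::vector<std::vector<float>>(static_cast<std::size_t>(nBins),
                                        std::vector<float>(n, 0.0f))};
    for (int y = 1; y + 1 < height; ++y) {
        for (int x = 1; x + 1 < width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * rowStep + x;
            const float dx = gray[i + 1] - gray[i - 1];              // horizontal
            const float dy = gray[i + rowStep] - gray[i - rowStep];  // vertical
            const float mag = std::sqrt(dx * dx + dy * dy);
            double angle = std::atan2(dy, dx);  // in [-pi, pi]
            if (angle < 0.0) angle += pi;       // unsigned orientation
            // An angle of exactly pi is the same orientation as 0, hence the modulo.
            const int bin = static_cast<int>(angle / pi * nBins) % nBins;
            out.magnitude[i] = mag;
            out.histogram[static_cast<std::size_t>(bin)][i] = mag;
        }
    }
    return out;
}
```

Each histogram channel therefore holds the gradient magnitude of the pixels whose orientation falls in that bin, and zero elsewhere.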

Now it's time to extract features from those channels. These features consist of local sums, computed using the integral of the image for speed.

Monday, 18 March 2013

My name is Pedro Silva, I'm 24 years old and I study Mechanical Engineering at the University of Aveiro, Portugal. Over the next few months I will be working on my final project to obtain my master's degree.

The title of the project is "Visual Recognition of Pedestrians for a Driver Assistance System" and my main goal is to create a software application that uses Computer Vision tools to detect pedestrians in images of urban settings. This application is to be tested and (hopefully) validated in the ATLASCAR, an ongoing project of the Mechanical Engineering Department with the goal of creating an autonomous car. This car has participated in several robotics competitions and won many prizes.

In this blog I will be posting regular updates on the development of my work. I will be writing in English so that the content is accessible to anyone, and hopefully I will get some feedback!

Pedestrian Detection is a field that has been the subject of a great deal of research in the past decade, and because of that the development of this technology has been extraordinary. After reviewing the literature I was able to create a work plan.

As a first step, I will implement Integral Channel Features, an algorithm that takes advantage of the wealth of information contained in various channels of an image. The paper describing this method can be found here.

Since this method runs at a slow rate, due to the need to evaluate each image several times at different scales, I will then implement the FPDW (Fastest Pedestrian Detector in the West). This algorithm speeds up the process through a number of simplifications and approximations applied to the first one. It is reported that this method runs at 6 FPS on 640x480 images.

I will be developing the program in the ROS environment (C++), and so far I've assembled a platform for advertising, subscribing, processing and publishing images, which will be the base for all the future work.

The introductory presentation I gave to the laboratory's team can be downloaded here, although it is in Portuguese.