Tuesday, December 20, 2011

Hyperspectral imaging goes beyond what the human eye can see, collecting information from across the electromagnetic spectrum for use in analyzing a particular object or location (seeing if certain mineral deposits are present, for example). The specialized equipment doesn't run cheap, but researchers at the Vienna University of Technology have been able to turn an ordinary consumer DSLR camera into a full-fledged computed tomography image spectrometer.

Starting with a Canon EOS 5D, the team added a frankenlens made of PVC pipe and a diffraction gel combined with a 50mm, 14-40mm, and a +10 diopter macro lens. They were then able to mathematically reconstruct the full range of spectra from the data captured by the camera's imaging sensor, achieving performance comparable to that of commercial imagers: a resolution of 4.89nm in a hyperspectral configuration of 120 x 120 pixels. The downside is exposure time, with the DSLR requiring several seconds to capture the data, while tailor-made devices need mere milliseconds. The team admits that the current system is on the "low-end" of what is possible, but they already have their sights set on a direct-mount version, which will increase the aperture and lower the necessary exposure time — all while costing under $1,000.

Thursday, November 10, 2011

SURF stands for Speeded-Up Robust Features. It is inspired by SIFT, and is intended for applications that require speed but do not need the precision of SIFT. The standard SURF descriptor has 64 dimensions, compared with SIFT's 128.

Wednesday, November 9, 2011

FLANN (the Fast Library for Approximate Nearest Neighbors) is a way to solve nearest neighbor search (NNS), also known as proximity search, similarity search or closest point search: the optimization problem of finding the closest points in a metric space. Donald Knuth, in vol. 3 of The Art of Computer Programming (1973), called it the post-office problem, referring to the application of assigning to a residence the nearest post office.

A simple way to solve the nearest neighbor problem is brute-force linear search, as in the basic k-NN algorithm, but this becomes far too slow for large sets of high-dimensional vectors such as SIFT or SURF keypoint descriptors. An intermediate solution is to use KD-trees.
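As a concrete baseline, the brute-force search can be sketched as follows (plain C++, no library; the 2-D Point type and function name are just for illustration):

```cpp
#include <vector>
#include <cmath>
#include <cassert>

// Brute-force nearest neighbor: a linear scan over all points.
// Fine for a few thousand 2-D points; for large sets of 64- or
// 128-dimensional SIFT/SURF descriptors this scan is the bottleneck
// that KD-trees and FLANN's approximate search are meant to remove.
struct Point { double x, y; };

int nearest(const std::vector<Point>& pts, Point q) {
    int best = -1;
    double best_d2 = 1e300;
    for (std::size_t i = 0; i < pts.size(); i++) {
        double dx = pts[i].x - q.x, dy = pts[i].y - q.y;
        double d2 = dx * dx + dy * dy;  // squared distance: no sqrt needed to compare
        if (d2 < best_d2) { best_d2 = d2; best = (int)i; }
    }
    return best;
}
```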

Tuesday, November 8, 2011

The atan2 function computes the principal value of the arc tangent of y / x, using the signs of both arguments to determine the quadrant of the return value. It produces correct results even when the resulting angle is near π/2 or -π/2 (or x near 0).

The atan2 function is used mostly to convert from rectangular coordinates (x, y) to polar coordinates (r, θ) that must satisfy x = r cos(θ) and y = r sin(θ). In general, the conversion should be computed as r = hypot(x, y) and θ = atan2(y, x).
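A minimal sketch of that conversion (hypot and atan2 are standard C/C++ library functions; the wrapper name is mine):

```cpp
#include <cmath>
#include <cassert>

// Rectangular (x, y) -> polar (r, theta). hypot avoids overflow in
// x*x + y*y, and atan2 uses the signs of both arguments to pick the
// correct quadrant, even when x is 0 or negative.
void to_polar(double x, double y, double *r, double *theta) {
    *r = std::hypot(x, y);
    *theta = std::atan2(y, x);  // NOT atan(y / x): that loses the quadrant
}
```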

Thursday, October 27, 2011

A common application in computer graphics is to work out the distance between two points as √(Δx²+Δy²). However, for performance reasons, the square root operation is a killer, and often very crude approximations are acceptable. So we examine the metrics (1 / √2)*(|x|+|y|), and max(|x|,|y|):

Notice the octagonal intersection of the regions covered by these metrics, which fits tightly around the unit circle of the ordinary distance metric. Intersecting the two regions corresponds to taking the larger of the two metrics, so the metric that corresponds to this octagon is simply:

octagon(x,y) = max ((1 / √2)*(|x|+|y|), max (|x|, |y|))

With a little more work we can bound the distance metric between the following pair of octagonal metrics:

octagon(x,y) ≤ √(x²+y²) ≤ √(4-2*√2) * octagon(x,y)

Where √(4-2*√2) = 1/cos(22.5°) ≈ 1.0824 (equivalently, in squared form, 1/(4-2*√2) ≈ 0.8536), which is not that far from 1. So octagon(x,y) itself is a crude approximation of the distance metric, good to within about 8%, computed without a square root.
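As a sketch (function name is mine): taking the max of the two component metrics gives the metric whose unit ball is exactly the octagonal intersection, and it approximates the true distance from below to within about 8%:

```cpp
#include <cmath>
#include <cassert>
#include <algorithm>

// Octagonal approximation to sqrt(x^2 + y^2): the max of the diamond
// metric (|x|+|y|)/sqrt(2) and the square metric max(|x|,|y|).
// Never overestimates the distance, and underestimates by at most
// a factor of cos(22.5 deg) ~ 0.9239 (worst case ~7.6% low).
double octagon(double x, double y) {
    double ax = std::fabs(x), ay = std::fabs(y);
    return std::max((ax + ay) * 0.7071067811865476, std::max(ax, ay));
}
```

The test below checks the bound octagon ≤ dist ≤ 1.0824·octagon all the way around a circle.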

However, it should be pointed out that often the distance is only required for comparison purposes. For example, in the classical Mandelbrot set calculation (z ← z² + c), the magnitude of a complex number is typically compared against a boundary radius of 2. In these cases, one can simply drop the square root by squaring both sides of the comparison (since distances are always non-negative). That is to say, test x² + y² > 4 rather than √(x² + y²) > 2.
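For example, a minimal escape-time iteration that never takes a square root (function name is mine):

```cpp
#include <cassert>

// z <- z^2 + c over complex numbers stored as (x, y) pairs.
// The escape test compares |z|^2 against 2^2 = 4, avoiding sqrt.
int mandelbrot_iters(double cx, double cy, int max_iter) {
    double x = 0.0, y = 0.0;
    for (int i = 0; i < max_iter; i++) {
        if (x * x + y * y > 4.0) return i;  // |z| > 2, in squared form
        double xn = x * x - y * y + cx;     // real part of z^2 + c
        y = 2.0 * x * y + cy;               // imaginary part
        x = xn;
    }
    return max_iter;  // never escaped within the iteration budget
}
```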

Monday, October 10, 2011

Description

As the name suggests, the CORDIC (COordinate Rotation DIgital Computer) algorithm was developed for rotating coordinates; it originated as a piece of hardware for doing real-time navigational computations in the 1950s. CORDIC reaches its result through a sequence of successive approximations, and the nice part is that it does so by adding/subtracting and shifting only. Suppose we want to rotate a point (X,Y) by an angle (Z). The coordinates for the new point (Xnew, Ynew) are:

Xnew = X * cos(Z) - Y * sin(Z)
Ynew = Y * cos(Z) + X * sin(Z)

Or rewritten:

Xnew / cos(Z) = X - Y * tan(Z)
Ynew / cos(Z) = Y + X * tan(Z)

It is possible to break the angle into small pieces, such that the tangents of these pieces are always a power of 2, i.e. tan(a_n) = 1/2^n. Choosing at each step n the direction d = +1 or -1 that drives the remaining angle Z toward zero results in the following equations:

Xnew = X - d * Y / 2^n
Ynew = Y + d * X / 2^n
Znew = Z - d * atan(1/2^n)

The atan(1/2^n) values have to be pre-computed, because the algorithm uses them to approximate the angle. Each step also stretches the vector by a factor P(n) = 1/cos(atan(1/2^n)); this factor can be eliminated from the equations by pre-computing its final result. If we multiply all the P(n)'s together we get the aggregate constant, whose reciprocal K = Π cos(atan(1/2^n)) ≈ 0.60725.
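The iteration can be sketched in floating point for clarity (a hardware version would use fixed-point, shifts, and an atan lookup table; names are mine):

```cpp
#include <cmath>
#include <cassert>

// CORDIC rotation mode. Each step rotates by +/- atan(1/2^n),
// steering the residual angle z toward zero. The aggregate scale
// constant K is divided out by starting from x = K, so the final
// vector (x, y) is (cos(angle), sin(angle)).
// Converges for |angle| <= sum of atan(1/2^n) ~ 1.7433 rad.
void cordic_sincos(double angle, int iters, double *c, double *s) {
    double k = 1.0;  // K = product of cos(atan(2^-n)) ~ 0.60725
    for (int n = 0; n < iters; n++)
        k *= std::cos(std::atan(std::ldexp(1.0, -n)));
    double x = k, y = 0.0, z = angle, pow2 = 1.0;
    for (int n = 0; n < iters; n++) {
        double d = (z >= 0.0) ? 1.0 : -1.0;  // rotate toward z = 0
        double xn = x - d * y * pow2;        // a shift-and-add in hardware
        y = y + d * x * pow2;
        x = xn;
        z -= d * std::atan(pow2);            // a table lookup in hardware
        pow2 *= 0.5;                         // tan of the next micro-angle
    }
    *c = x;  // cos(angle)
    *s = y;  // sin(angle)
}
```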

The command make check is optional. What it does is compile and run the SystemC example programs to verify that the build works. I strongly suggest that you run it.

Step 4: Tell your compiler where to find SystemC

Since we did not install SystemC in a standard location, we need to tell the compiler explicitly where to look for the headers and libraries. We do this with an environment variable.

$ export SYSTEMC=/usr/local/systemc/

This, however, will disappear on the next login. To permanently add it to your environment, alter ~/.profile or ~/.bash_profile if it exists. For system-wide changes, edit /etc/environment (adding a line such as SYSTEMC="/usr/local/systemc/"). To compile a SystemC program, point the compiler at the headers and the library with an invocation along these lines (the lib subdirectory name, e.g. lib-linux64, depends on your platform):

$ g++ -I$SYSTEMC/include -L$SYSTEMC/lib-linux64 main.cpp -lsystemc

Monday, August 22, 2011

Begin copied blog post.
It gives a working example of choosing the various modules in a recognition pipeline for human figures (pedestrians).

Much simplified summary

It uses Histograms of Gradient Orientations as a descriptor in a 'dense' setting. Meaning that it does not detect keypoints the way SIFT detectors do (sparse). Each feature vector is computed from a window (64x128) placed across an input image. Each vector element is a histogram of gradient orientations (9 bins from 0-180 degrees; the + and - directions count as the same). The histogram is collected within a cell of pixels (8x8). The contrasts are locally normalized over a block of 2x2 cells (16x16 pixels). Normalization is an important enhancement. The block moves in 8-pixel steps, half the block size, meaning that each cell contributes to 4 different normalization blocks. A linear SVM is trained to classify whether a window is a human figure or not. The output of a trained linear SVM is a set of coefficients, one for each element of the feature vector.
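Doing the arithmetic on this layout (64x128 window, 8x8 cells, 2x2-cell blocks stepping one cell at a time, 9 bins per cell) gives the feature-vector length; a small sketch, with the function name mine:

```cpp
#include <cassert>

// Number of block positions per axis is (cells per axis) - (block size - 1),
// since the block steps one cell (8 px) at a time.
int hog_descriptor_length(int win_w, int win_h, int cell,
                          int block_cells, int bins) {
    int blocks_x = win_w / cell - (block_cells - 1);   // 64/8  - 1 = 7
    int blocks_y = win_h / cell - (block_cells - 1);   // 128/8 - 1 = 15
    return blocks_x * blocks_y * block_cells * block_cells * bins;
}
```

For the default layout this yields 7 * 15 * 4 * 9 = 3780 elements, so the trained linear SVM has one coefficient per element.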

I presume "linear SVM" means the kernel is linear, with no projection to a higher dimension. The paper by Hsu et al. suggests that a linear kernel is enough when the feature dimension is already high.

OpenCV implementation (hog.cpp, objdetect.hpp)

The HOGDescriptor class is not found in the API documentation. Here are some notable points, judging by the source code and the sample program (people_detect.cpp):

Comes with a default human detector. The file comment says it is "compatible with the INRIA Object Detection and Localization toolkit". I presume this is a trained linear SVM classifier represented as a vector of coefficients;

No need to call SVM code. The HOGDescriptor::detect() function simply uses the coefficients to compute a weighted sum over the input feature vector. If the sum is greater than the user-specified 'hitThreshold' (default 0), the window is classified as a human figure.

'hitThreshold' argument could be negative.

'winStride' argument (default 8x8) - controls how the window is slid across the input image.

'scale0' controls how much down-sampling is applied to the input image before each call to 'detect()'. This is repeated 'nlevels' times (default 64). All levels could be done in parallel.

Sample (people_detect.cpp)

Uses the built-in trained coefficients.

It actually needs to eliminate duplicate rectangles from the results of detectMultiScale(). Presumably because matching happens at multiple scales?

detect() returns a list of detected points; the detection size is the detector window size.

Observations

With the GrabCut BSDS300 test images - only able to detect one human figure (89072.jpg). The rest could be either too small, too big, or obscured. Interestingly, it detected a few long narrow upright trees as human figures. It takes about 2 seconds to process each picture.

With the GrabCut Data_GT test images - able to detect human figures in 3 images: tennis.jpg, bool.jpg (left), person5.jpg (right), but _not_ person7.jpg. An interesting false positive is in grave.jpg: the cut-off tombstone at the right edge is detected. Most pictures took about 4.5 seconds to process.

MIT Pedestrian Database (64x128 pedestrian shots):

The default HOG detector window (feature-vector) is the same size as the test images.

Recognized 72 out of 925 images with detectMultiScale() using default parameters. Takes about 15 ms for each image.

Recognized 595 out of 925 images with detect() using default parameters. Takes about 3 ms for each image.

Monday, July 4, 2011

The latest SVN version of OpenCV contains an (undocumented) implementation of HOG-based pedestrian detection. It even comes with a pre-trained detector and a python wrapper. The algorithm works as follows:

First you calculate the gradients. They tested various ways of doing this and concluded that a simple [-1,0,1] filter was best. After this calculation you will have a direction and a magnitude for each pixel.

Divide the angles of direction into bins (notice that you can divide either 180 or 360 degrees). This is just a way to gather gradient directions into groups; a bin can be, for example, all the angles from 0 to 30 degrees.

Divide the image into cells. Each pixel in a cell adds a vote to a histogram of orientations, based on the angle division from step 2. Two really cool things to note here:

You can avoid aliasing by interpolating votes between neighboring bins

The magnitude of the gradient controls the way the vote is counted in the histogram

Note that each cell is a histogram that contains the “amount” of all gradient directions in that cell.

Create a way to group adjacent cell histograms and call it a block. Each block (group of cells) is "normalized". The paper suggests something like v/sqrt(|v|^2 + e^2), where v is the vector formed by concatenating the adjacent cell histograms of the block, and |v| is its L2 norm.

Now move through the image in block steps. Each block you create is "normalized". The way you move through the image allows cells to be in more than one block (though this is not strictly necessary).

For each block in the image you will get a “normalized” vector. All these vectors placed one after another is the HOG.
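The steps above can be sketched end-to-end on a single 16x16 patch (one 2x2-cell block). This is a simplified toy version, without the vote interpolation of step 3, and all names are mine:

```cpp
#include <cmath>
#include <vector>
#include <cassert>

// Toy HOG for one 16x16 patch: [-1,0,1] gradients, 9 orientation bins
// over 0-180 degrees, 8x8 cells, and one 2x2-cell block normalized as
// v / sqrt(|v|^2 + e^2). Border pixels are skipped for simplicity.
std::vector<double> hog_block(const std::vector<std::vector<double> >& img) {
    const int bins = 9, cell = 8;
    const double PI = 3.141592653589793;
    double hist[2][2][9] = {};
    for (int y = 1; y < 15; y++) {
        for (int x = 1; x < 15; x++) {
            double gx = img[y][x + 1] - img[y][x - 1];  // [-1,0,1] filter
            double gy = img[y + 1][x] - img[y - 1][x];
            double mag = std::sqrt(gx * gx + gy * gy);
            double ang = std::atan2(gy, gx) * 180.0 / PI;
            if (ang < 0) ang += 180.0;                  // fold +/- directions together
            int b = std::min(int(ang / 20.0), bins - 1);
            hist[y / cell][x / cell][b] += mag;         // magnitude-weighted vote
        }
    }
    std::vector<double> v;  // concatenate the 4 cell histograms: 36 elements
    for (int cy = 0; cy < 2; cy++)
        for (int cx = 0; cx < 2; cx++)
            for (int b = 0; b < bins; b++) v.push_back(hist[cy][cx][b]);
    double norm2 = 0.0, e = 1e-3;                      // v / sqrt(|v|^2 + e^2)
    for (std::size_t i = 0; i < v.size(); i++) norm2 += v[i] * v[i];
    for (std::size_t i = 0; i < v.size(); i++) v[i] /= std::sqrt(norm2 + e * e);
    return v;
}
```

A vertical step edge, for instance, puts all its votes into the horizontal-gradient bin of each cell, and the resulting 36-vector comes out with (nearly) unit norm.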

Comments:

Awesome idea: they used 1239 pedestrian images. The SVM was trained with the 1239 originals plus their left-right reflections. This is so cool on so many levels. Of course!! A pedestrian is still a pedestrian in the reflected image, and this little trick gives the SVM double the information with no additional storage overhead.

They created negative training images from a database of images which did not contain any pedestrians, basically by randomly sampling windows from those non-pedestrian images. They then ran the resulting classifier over the non-pedestrian images looking for false positives, and added those false positives back into the training set (hard-negative mining).

A word on scale: To make detection happen they had to move a detection window through the image and run the classifier on each ROI. They did this for various scales of the image. We might not have to be so strict with this as all the flowers are going to be within a small range from the camera. Whereas pedestrians can be very close or very far from the camera. The point is that the pedestrian range is much larger.

Wednesday, April 27, 2011

In the Sonic Gesture project by Gijs Molinaar, hand gestures are detected by vision and converted into sound. The source code is derived from OpenCV code by Saurabh Goyal. A link to the source code is here.

This is a follow-up post to an earlier post on calculating HOG feature vectors for object detection using OpenCV. Here I describe how a support vector machine (SVM) can be trained on a dataset containing positive and negative examples of the object to be detected. The code has been commented for easier understanding of how it works:

/*This function takes in a the path and names of
64x128 pixel images, the size of the cell to be
used for calculation of hog features(which should
be 8x8 pixels, some modifications will have to be
done in the code for a different cell size, which
could be easily done once the reader understands
how the code works), a default block size of 2x2
cells has been considered and the window size
parameter should be 64x128 pixels (appropriate
modifications can be easily done for other say
64x80 pixel window size). All the training images
are expected to be stored at the same location and
the names of all the images are expected to be in
sequential order like a1.jpg, a2.jpg, a3.jpg ..
and so on or a(1).jpg, a(2).jpg, a(3).jpg ... The
explanation of all the parameters below will make
clear the usage of the function. The synopsis of
the function is as follows :
prefix : it should be the path of the images, along
with the prefix in the image name for
example if the present working directory is
/home/saurabh/hog/ and the images are in
/home/saurabh/hog/images/positive/ and are
named like pos1.jpg, pos2.jpg, pos3.jpg ....,
then the prefix parameter would be
"images/positive/pos" or if the images are
named like pos(1).jpg, pos(2).jpg,
pos(3).jpg ... instead, the prefix parameter
would be "images/positive/pos("
suffix : it is the part of the name of the image
files after the number for example for the
above examples it would be ".jpg" or ").jpg"
cell : it should be CvSize(8,8), appropriate changes
need to be made for other cell sizes
window : it should be CvSize(64,128), appropriate
changes need to be made for other window sizes
number_samples : it should be equal to the number of
training images, for example if the
training images are pos1.jpg, pos2.jpg
..... pos1216.jpg, then it should be
1216
start_index : it should be the start index of the images'
names for example for the above case it
should be 1 or if the images were named
like pos1000.jpg, pos1001.jpg, pos1002.jpg
.... pos2216.jpg, then it should be 1000
end_index : it should be the end index of the images'
name for example for the above cases it
should be 1216 or 2216
savexml : if you want to store the extracted features,
then you can pass to it the name of an xml
file to which they should be saved
normalization : the normalization scheme to be used for
computing the hog features, any of the
opencv schemes could be passed or -1
could be passed if no normalization is
to be done */
CvMat *train_64x128(char *prefix, char *suffix, CvSize cell,
CvSize window, int number_samples, int start_index,
int end_index, char *savexml = NULL, int canny = 0,
int block = 1, int normalization = 4)
{
char filename[50] = "\0", number[8];
int prefix_length;
prefix_length = strlen(prefix);
int bins = 9;
/* A default block size of 2x2 cells is considered */
int block_width = 2, block_height = 2;
/* Calculation of the length of a feature vector for
an image (64x128 pixels)*/
int feature_vector_length;
feature_vector_length = (((window.width -
cell.width * block_width) / cell.width) +
1) *
(((window.height - cell.height * block_height)
/ cell.height) + 1) * 36;
/* Matrix to store the feature vectors for
all(number_samples) the training samples */
CvMat *training = cvCreateMat(number_samples,
feature_vector_length, CV_32FC1);
CvMat row;
CvMat *img_feature_vector;
IplImage **integrals;
int i = 0, j = 0;
printf("Beginning to extract HoG features from positive images\n");
strcat(filename, prefix);
/* Loop to calculate hog features for each image one by one */
for (i = start_index; i <= end_index; i++) {
cvtInt(number, i);
strcat(filename, number);
strcat(filename, suffix);
IplImage *img = cvLoadImage(filename);
/* Calculation of the integral histogram for
fast calculation of hog features*/
integrals = calculateIntegralHOG(img);
cvGetRow(training, &row, j);
img_feature_vector
= calculateHOG_window(integrals, cvRect(0, 0,
window.width,
window.height),
normalization);
cvCopy(img_feature_vector, &row);
j++;
printf("%s\n", filename);
filename[prefix_length] = '\0';
for (int k = 0; k < 9; k++) {
cvReleaseImage(&integrals[k]);
}
/* Release the image to avoid leaking memory on every iteration */
cvReleaseImage(&img);
}
if (savexml != NULL) {
cvSave(savexml, training);
}
return training;
}
/* This function is almost the same as
train_64x128(...), except the fact that it can
take as input images of bigger sizes and
generate multiple samples out of a single
image.
It takes 2 more parameters than
train_64x128(...), horizontal_scans and
vertical_scans to determine how many samples
are to be generated from the image. It
generates horizontal_scans x vertical_scans
number of samples. The meaning of rest of the
parameters is same.
For example for a window size of
64x128 pixels, if a 320x240 pixel image is
given input with horizontal_scans = 5 and
vertical scans = 2, then it will generate 10
samples by considering windows in the image
with (x,y,width,height) as (0,0,64,128),
(64,0,64,128), (128,0,64,128), .....,
(0,112,64,128), (64,112,64,128) .....
(256,112,64,128)
The function takes non-overlapping windows
from the image except the last row and last
column, which could overlap with the second
last row or second last column. So the values
of horizontal_scans and vertical_scans passed
should be such that it is possible to perform
that many scans in a non-overlapping fashion
on the given image. For example horizontal_scans
= 5 and vertical_scans = 3 cannot be passed for
a 320x240 pixel image as that many vertical scans
are not possible for an image of height 240
pixels and window of height 128 pixels. */
CvMat *train_large(char *prefix, char *suffix,
CvSize cell, CvSize window, int number_images,
int horizontal_scans, int vertical_scans,
int start_index, int end_index,
char *savexml = NULL, int normalization = 4)
{
char filename[50] = "\0", number[8];
int prefix_length;
prefix_length = strlen(prefix);
int bins = 9;
/* A default block size of 2x2 cells is considered */
int block_width = 2, block_height = 2;
/* Calculation of the length of a feature vector for
an image (64x128 pixels)*/
int feature_vector_length;
feature_vector_length = (((window.width -
cell.width * block_width) / cell.width) +
1) *
(((window.height - cell.height * block_height)
/ cell.height) + 1) * 36;
/* Matrix to store the feature vectors for
all(number_samples) the training samples */
CvMat *training = cvCreateMat(number_images
* horizontal_scans * vertical_scans,
feature_vector_length, CV_32FC1);
CvMat row;
CvMat *img_feature_vector;
IplImage **integrals;
int i = 0, j = 0;
strcat(filename, prefix);
printf("Beginning to extract HoG features from negative images\n");
/* Loop to calculate hog features for each
image one by one */
for (i = start_index; i <= end_index; i++) {
cvtInt(number, i);
strcat(filename, number);
strcat(filename, suffix);
IplImage *img = cvLoadImage(filename);
integrals = calculateIntegralHOG(img);
for (int l = 0; l < vertical_scans - 1; l++) {
for (int k = 0; k < horizontal_scans - 1; k++) {
cvGetRow(training, &row, j);
img_feature_vector =
calculateHOG_window(integrals,
cvRect(window.width * k,
window.height *
l, window.width,
window.height),
normalization);
cvCopy(img_feature_vector, &row);
j++;
}
cvGetRow(training, &row, j);
img_feature_vector =
calculateHOG_window(integrals,
cvRect(img->width -
window.width,
window.height * l,
window.width,
window.height),
normalization);
cvCopy(img_feature_vector, &row);
j++;
}
for (int k = 0; k < horizontal_scans - 1; k++) {
cvGetRow(training, &row, j);
img_feature_vector =
calculateHOG_window(integrals,
cvRect(window.width * k,
img->height -
window.height,
window.width,
window.height),
normalization);
cvCopy(img_feature_vector, &row);
j++;
}
cvGetRow(training, &row, j);
img_feature_vector = calculateHOG_window(integrals,
cvRect(img->width -
window.width,
img->height -
window.height,
window.width,
window.height),
normalization);
cvCopy(img_feature_vector, &row);
j++;
printf("%s\n", filename);
filename[prefix_length] = '\0';
for (int k = 0; k < 9; k++) {
cvReleaseImage(&integrals[k]);
}
cvReleaseImage(&img);
}
printf("%d negative samples created \n", training->rows);
if (savexml != NULL) {
cvSave(savexml, training);
printf("Negative samples saved as %s\n", savexml);
}
return training;
}
/* This function trains a linear support vector
machine for object classification. The synopsis is
as follows :
pos_mat : pointer to CvMat containing hog feature
vectors for positive samples. This may be
NULL if the feature vectors are to be read
from an xml file
neg_mat : pointer to CvMat containing hog feature
vectors for negative samples. This may be
NULL if the feature vectors are to be read
from an xml file
savexml : The name of the xml file to which the learnt
svm model should be saved
pos_file: The name of the xml file from which feature
vectors for positive samples are to be read.
It may be NULL if feature vectors are passed
as pos_mat
neg_file: The name of the xml file from which feature
vectors for negative samples are to be read.
It may be NULL if feature vectors are passed
as neg_mat*/
void trainSVM(CvMat * pos_mat, CvMat * neg_mat, char *savexml,
char *pos_file = NULL, char *neg_file = NULL)
{
/* Read the feature vectors for positive samples */
if (pos_file != NULL) {
printf("positive loading...\n");
pos_mat = (CvMat *) cvLoad(pos_file);
printf("positive loaded\n");
}
/* Read the feature vectors for negative samples */
if (neg_file != NULL) {
neg_mat = (CvMat *) cvLoad(neg_file);
printf("negative loaded\n");
}
int n_positive, n_negative;
n_positive = pos_mat->rows;
n_negative = neg_mat->rows;
int feature_vector_length = pos_mat->cols;
int total_samples;
total_samples = n_positive + n_negative;
CvMat *trainData = cvCreateMat(total_samples,
feature_vector_length, CV_32FC1);
CvMat *trainClasses = cvCreateMat(total_samples,
1, CV_32FC1);
CvMat trainData1, trainData2, trainClasses1, trainClasses2;
printf("Number of positive Samples : %d\n", pos_mat->rows);
/*Copy the positive feature vectors to training
data*/
cvGetRows(trainData, &trainData1, 0, n_positive);
cvCopy(pos_mat, &trainData1);
cvReleaseMat(&pos_mat);
/*Copy the negative feature vectors to training
data*/
cvGetRows(trainData, &trainData2, n_positive, total_samples);
cvCopy(neg_mat, &trainData2);
cvReleaseMat(&neg_mat);
printf("Number of negative Samples : %d\n", trainData2.rows);
/*Form the training classes for positive and
negative samples. Positive samples belong to class
1 and negative samples belong to class 2 */
cvGetRows(trainClasses, &trainClasses1, 0, n_positive);
cvSet(&trainClasses1, cvScalar(1));
cvGetRows(trainClasses, &trainClasses2, n_positive, total_samples);
cvSet(&trainClasses2, cvScalar(2));
/* Train a linear support vector machine to learn from
the training data. The parameters may be played with
and experimented with to see their effects */
CvSVM svm(trainData, trainClasses, 0, 0,
CvSVMParams(CvSVM::C_SVC, CvSVM::LINEAR, 0, 0, 0, 2,
0, 0, 0, cvTermCriteria(CV_TERMCRIT_EPS, 0,
0.01)));
printf("SVM Training Complete!!\n");
/*Save the learnt model*/
if (savexml != NULL) {
svm.save(savexml);
}
cvReleaseMat(&trainClasses);
cvReleaseMat(&trainData);
}

I hope the comments were helpful for understanding and using the code. To see how a large collection of files can be renamed into the sequential order required by this implementation, refer here. Another way to read in the images of a dataset would be to store the paths of all files in a text file and then parse that text file. I will follow up this post soon, describing how the learnt model can be used for actual detection of an object in an image.

This is a follow-up post to an earlier post where I described how an integral histogram can be obtained from an image for fast calculation of HOG features. Here I am posting the code showing how this integral histogram can be used to calculate the HOG feature vectors for an image window. I have commented the code for easier understanding of how it works.

I will very soon post on how a support vector machine (SVM) can be trained using the above functions on an object dataset, and how the learned model can be used to detect the corresponding object in an image.

Start of borrowed stuff
Histograms of Oriented Gradients or HOG features, in combination with a support vector machine, have been successfully used for object detection (most popularly pedestrian detection). An integral histogram representation can be used for fast calculation of histograms of oriented gradients over arbitrary rectangular regions of the image. The idea of an integral histogram is analogous to that of an integral image, used by Viola and Jones for fast calculation of Haar features for face detection. Mathematically,

H(x, y, b) = Q(x, y, b) + H(x-1, y, b) + H(x, y-1, b) - H(x-1, y-1, b)

where b represents the bin number of the histogram and Q(x, y, b) is the contribution of the pixel at (x, y) to bin b. This way the calculation of the HOG over any arbitrary rectangle in the image requires just 4*bins array references. For more details on the integral histogram representation, please refer to [Porikli 2005]. The following demonstrates how such an integral histogram can be calculated from an image and used for the calculation of HOG features using the OpenCV computer vision library.

End of borrowed stuff
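The borrowed post's accompanying code is not reproduced above, but the integral-histogram idea can be sketched in a self-contained way (plain bin counts instead of magnitude-weighted bins; all names are mine):

```cpp
#include <vector>
#include <cassert>

// Integral histogram over a map of per-pixel bin indices.
// ih[b][y][x] = number of bin-b pixels in the rectangle [0,x) x [0,y),
// built with the recurrence
// H(x,y,b) = Q(x,y,b) + H(x-1,y,b) + H(x,y-1,b) - H(x-1,y-1,b).
typedef std::vector<std::vector<std::vector<int> > > IntegralHist;

IntegralHist build_integral_hist(const std::vector<std::vector<int> >& binmap,
                                 int bins) {
    int h = (int)binmap.size(), w = (int)binmap[0].size();
    IntegralHist ih(bins, std::vector<std::vector<int> >(
                              h + 1, std::vector<int>(w + 1, 0)));
    for (int b = 0; b < bins; b++)
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                ih[b][y + 1][x + 1] = (binmap[y][x] == b ? 1 : 0)
                    + ih[b][y][x + 1] + ih[b][y + 1][x] - ih[b][y][x];
    return ih;
}

// Histogram count of bin b over [x0,x1) x [y0,y1): 4 array references.
int rect_count(const IntegralHist& ih, int b,
               int x0, int y0, int x1, int y1) {
    return ih[b][y1][x1] - ih[b][y0][x1] - ih[b][y1][x0] + ih[b][y0][x0];
}
```

After the one-time build pass, the histogram of any window costs 4*bins references regardless of the window size, which is exactly what makes dense HOG extraction fast.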
I will describe how the HOG features for pedestrian detection can be obtained using the above framework and how an svm can be trained for such features for pedestrian detection in a later post.