Computer Vision Applications with C# - Part I

Introduction

Computer Vision and programming go hand in hand: one needs programming to turn the theory into something that can be applied to real-world problems. Computer Vision is an exciting field where we try to make sense of images. These images could be static or could be retrieved from videos. Making sense could mean things like tracking an object, modeling the background, pattern recognition, etc. This article is the first of a series of articles that will use C# to educate readers in Computer Vision. Being the first article, it introduces some basic concepts used in Computer Vision. I will refer back to these concepts in upcoming articles, where I will implement a few state-of-the-art algorithms in Computer Vision, covering areas such as object tracking, background modeling, pattern recognition, etc.

Image Understanding

An image is composed of many dots called pixels (picture elements). The more pixels, the higher the resolution of the image. When an image is grabbed by a camera, it is often in RGB (Red Green Blue) format. RGB is one of many colour spaces used in Computer Vision; others include HSV, Lab, XYZ, YIQ, etc. RGB is an additive colour space where we get different colours by mixing red, green, and blue values. In a 24-bit RGB image, the individual values of the R, G, and B components each range from 0 to 255. A 24-bit RGB image can therefore represent 2^24 different colours, i.e., roughly 16.7 million. OK, going back to the image: we humans see objects in there, whereas the computer sees pixels with RGB values ranging from 0 to 255. So, there is an obvious need to build some kind of intelligence into computers so they can make sense of images.

If you want to pursue a career in Computer Vision, you have to understand one thing: Statistics & Probability! Normally, statistics would be used in creating a model, and probability would be used in making sense of the model. So, moving forward, I will try to explain some fundamentals required to understand an image so Computer Vision techniques can be applied to it.

Image Attributes

The very first step in modeling an image is to pick an attribute to be modeled. It does not have to be a single attribute - you would normally use a combination of attributes to make your algorithm robust. Some of the primary attributes include edges, colour, etc. Ideally, the chosen attributes would uniquely identify the object, but in reality that is rarely the case. For example, if using colour, many images would share the same colour distribution. So, we need to find an attribute, or a combination of attributes, that provides a greater degree of uniqueness.

Attribute Modeling with Histogram

Once an attribute is chosen, the next step is to model it. There are many models available in Computer Vision - each with its pros and cons. But in this article, I will concentrate on histogram. The reason for selecting histogram is because it is very popular in Computer Vision, plus it forms the foundation for articles coming up. From elementary statistics, we know that a histogram is nothing more than a frequency distribution. Hence, a colour histogram is the frequency of different colours in the image.

Using normalisation, we can add scale invariance to a histogram. What that means is that the same object at different scales will have identical histograms. Normalisation is achieved by dividing the value of each bin by the sum of the values of all bins, so that the bins sum to 1.
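As a minimal illustration of this step - sketched in Java rather than the article's C#, with names of my own choosing - normalisation might look like:

```java
public class HistogramNormaliser {
    // Divide each bin by the total over all bins so the bins sum to 1.
    // This is what gives the histogram its scale invariance.
    public static float[] normalise(float[] hist) {
        float total = 0f;
        for (float bin : hist) {
            total += bin;
        }
        float[] result = new float[hist.length];
        if (total == 0f) {
            return result; // an empty histogram stays all zeros
        }
        for (int i = 0; i < hist.length; i++) {
            result[i] = hist[i] / total;
        }
        return result;
    }
}
```

After this step, two histograms of the same object at different scales should contain the same proportions, regardless of how many pixels each image had.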

To create a colour histogram, we first need to decide on the number of bins of the histogram. Generally speaking, the more bins you have, the more discriminatory power you get. The flip side is that you need more computational resources. The second decision you need to make is how to implement this colour histogram. Remember that you normally have three colour components, such as Red, Green, and Blue. A popular approach is to use either a 3D array or a single 1D array. Using a 3D array is straightforward, but using a single array for three components requires some thought. In the end, it is a matter of preference - I prefer the latter approach.

For a 16x16x16 bin histogram, we have 256/16 = 16 colour values per bin. So, we define a 16x16x16 3D array to hold the bin counts.
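A sketch of the 3D-array approach (in Java rather than the article's C#, and with hypothetical names, purely as an illustration):

```java
public class Histogram3D {
    static final int BIN_COUNT = 16;              // bins per colour channel
    static final int BIN_WIDTH = 256 / BIN_COUNT; // 16 colour values per bin

    // hist[rBin][gBin][bBin] counts the pixels falling into that colour cell.
    float[][][] hist = new float[BIN_COUNT][BIN_COUNT][BIN_COUNT];

    // Map a 0-255 channel value to its bin index (0-15 here).
    static int getBinIndex(int colourValue) {
        return colourValue / BIN_WIDTH;
    }

    // Record one pixel's colour in the histogram.
    void addPixel(int r, int g, int b) {
        hist[getBinIndex(r)][getBinIndex(g)][getBinIndex(b)] += 1f;
    }
}
```

Calling addPixel for every pixel in the image fills in the colour distribution; normalisation would then be applied over all BIN_COUNT^3 cells.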

As an example, if a pixel's RGB value is (13, 232, 211), then you are dealing with R, G, and B bins 0, 14, and 13. These bin indices are obtained by dividing each colour value by the bin width - 256/16 = 16 values per bin, in our case. You would then increment histogram[0, 14, 13] by 1. If we do that for all the pixels in an image, we end up with the colour histogram of the image, which tells us about the colour distribution in the image.

Again, if the pixel's RGB value is (13, 232, 211), you are dealing with RGB bins 0, 14, and 13. This maps to index 0 + 14 x 16 + 13 x 16 x 16 = 3552 in a 1D array. To create the histogram, you increment the value of this bin by 1.
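The 1D indexing scheme can be sketched as follows (again a Java illustration with names of my own, not the article's C# code):

```java
public class Histogram1DIndex {
    static final int BIN_COUNT = 16; // bins per colour channel

    // Map a 0-255 channel value to its bin index.
    static int getBinIndex(int colourValue) {
        return colourValue / (256 / BIN_COUNT);
    }

    // Fold the three per-channel bin indices into one index
    // for a single array of length BIN_COUNT^3.
    static int flatIndex(int r, int g, int b) {
        int idx1 = getBinIndex(r);
        int idx2 = getBinIndex(g);
        int idx3 = getBinIndex(b);
        return idx1 + idx2 * BIN_COUNT + idx3 * BIN_COUNT * BIN_COUNT;
    }
}
```

The histogram itself is then just `float[] hist = new float[16 * 16 * 16]`, with `hist[flatIndex(r, g, b)] += 1` per pixel.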

Model Matching

Once we have represented an image attribute as a histogram, we often need to perform recognition. So, we can have a source histogram and a candidate histogram, and match the two histograms to see how closely the candidate object resembles the source object. There are many techniques available, such as the Bhattacharyya Coefficient, Earth Mover's Distance, Chi-Squared, Euclidean Distance, etc. In this article, I will describe the Bhattacharyya Coefficient. You can implement your own matching technique, bearing in mind that each matching technique has its pros and cons.

The Bhattacharyya Coefficient works on normalised histograms with an identical number of bins. Given two such histograms p and q, the Bhattacharyya Coefficient is given as: BC(p, q) = sum over all bins i of sqrt(p[i] * q[i]).

If you are like me and get discouraged by mathematical equations, then don't worry - the calculation is straightforward. For each bin, multiply the corresponding values from the two histograms, take the square root of the product, and add up the results.
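Here is a small worked example, sketched in Java with made-up histogram values (the article's original worked-example figures are not reproduced here):

```java
public class Bhattacharyya {
    // Both histograms must be normalised and have the same number of bins.
    // Returns a similarity in [0, 1]: 0 = no overlap, 1 = identical.
    static double coefficient(float[] p, float[] q) {
        double bc = 0.0;
        for (int i = 0; i < p.length; i++) {
            bc += Math.sqrt((double) p[i] * q[i]);
        }
        return bc;
    }
}
```

For example, with p = {0.5, 0.5, 0} and q = {0.5, 0.5, 0}, the result is sqrt(0.25) + sqrt(0.25) + 0 = 1, while against r = {0, 0, 1} the overlap (and hence the coefficient) is 0.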

As we can see, the calculation only requires multiplying corresponding bins and summing the square roots. Furthermore, for identical normalised histograms, the coefficient is 1, since summing sqrt(p[i] * p[i]) over all bins just sums p[i], which is 1 after normalisation. The Bhattacharyya Coefficient ranges from 0 to 1, i.e., from least similar to an exact match.

Using the Code

Step 1: Select a histogram size - the default is 4x4x4.

Step 2: Select an image from the list and click the top "<<" button to see its histogram.

Step 3: Select an image from the list and click the bottom "<<" button to see its histogram.

Step 4: Click the "Find Bhattacharyya Coefficient" button to see the coefficient. For two identical images, it will be 1.

Points of Interest

Native image processing in .NET is slow! Using a Bitmap object with the GetPixel() and SetPixel() methods is not the way to do image processing in .NET. We need to access the pixel data directly using unsafe pointer code. I have used the code by Eric Gunnerson.

I've tried writing a Java version of this (as I know Java and not C#), but my histograms (or more specifically, their data) are nothing like yours.
I think it has to do with how we are both accessing the pixel information.

You are doing something I don't really understand (return (BGRA*)(pBase + y * width + x * sizeof(BGRA));), and in your getBinIndex method you are doing (int idx = (int)(colourValue * (float)binCount / maxValue);). I'm guessing this converts the colourValue to a number 0-255 and then divides it by the number of bins? Something like that.

This gives me values 0-255 for R, G, and B, so I don't need to convert. This difference is creating different histograms. I assume mine are wrong, as my Bhattacharyya Coefficients are erroneous (like 0.78 for two completely different pictures, where yours is 0.34).

Sorry buddy, Java is not my expertise, but I will give it a go. First of all, to create a histogram, you decide upfront the number of bins the histogram will have. Suppose we decide on 16 bins and we have a grey image with each pixel having a value between 0 and 255. This means we have (Number of Possible Colours / Total Number of Bins), i.e., 256/16 = 16 colours per bin. So pixels with values 0 - 15 belong to the first bin, 16 - 31 belong to the second bin, and so on. That is what GetBinIndex returns, i.e., the index of the bin. The other bit of code you've asked about gets a pointer to the pixel at location "x" and "y" - something .NET specific.

Now, the Java code you pointed me to is also getting the pixel value, but passing it to an isWhite function to check if the pixel is white or not. What you need to do is similar to what I have done. Define a float array for the histogram with length equal to the product of the bin counts (binCount1, binCount2, binCount3) for each colour component in the pixel, e.g., float[] hist = new float[4 * 4 * 4]. Read each pixel and pass the value of each colour component to the GetBinIndex function to get the bin indices idx1, idx2, and idx3. Now compute a single bin index: idx = idx1 + idx2 * binCount1 + idx3 * binCount1 * binCount2. Increment the histogram array by 1 at idx: hist[idx] += 1. Don't forget to normalise the histogram once you have iterated through the whole image.
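Since the question is about Java, the steps above might look something like the following sketch - the class and method names are hypothetical, and it assumes the image arrives as a java.awt.image.BufferedImage; the bin mapping mirrors the article's (colourValue * binCount / maxValue) formula:

```java
import java.awt.image.BufferedImage;

public class ColourHistogram {
    static final int BIN_COUNT = 4; // 4 bins per channel, as in the 4x4x4 example

    // Map a 0-255 channel value to a bin index 0..BIN_COUNT-1,
    // mirroring (int)(colourValue * binCount / maxValue) with maxValue = 256.
    static int getBinIndex(int colourValue) {
        return colourValue * BIN_COUNT / 256;
    }

    static float[] build(BufferedImage img) {
        float[] hist = new float[BIN_COUNT * BIN_COUNT * BIN_COUNT];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Fold the three bin indices into a single 1D index.
                int idx = getBinIndex(r)
                        + getBinIndex(g) * BIN_COUNT
                        + getBinIndex(b) * BIN_COUNT * BIN_COUNT;
                hist[idx] += 1f;
            }
        }
        // Normalise so the bins sum to 1.
        float total = img.getWidth() * img.getHeight();
        for (int i = 0; i < hist.length; i++) {
            hist[i] /= total;
        }
        return hist;
    }
}
```

Two such histograms (built with the same BIN_COUNT) can then be fed straight into a Bhattacharyya Coefficient calculation.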

For exactly the same image, you should get a Bhattacharyya Coefficient of 1 - if you are not getting this, then something is wrong with your code. Also, don't forget to double-check your code for calculating the Bhattacharyya Coefficient. Just note that the actual float variable might have some floating-point error, e.g., you may get 0.99999999 or 1.000002, but it is correct. I hope that helps; otherwise, let me know!

Thanks! You are right about the theory bit. I would take it a step further and say it is sometimes hard for students to understand the theory as well. The reference material I have used is my own experience. I have done research in computer vision for two years and know how difficult it is to find easy-to-understand material. This is the whole point of writing this series of articles: to help programmers & students bridge the gap between theory and applications. To give you a heads up, I will be covering object tracking, background modelling & pattern recognition in coming articles.

Excellent article, and I look forward to the rest of the series. One thing I would point out is that you are using pointers in your C# method declarations. Do you plan to update your code samples with the appropriate unsafe and fixed keywords to support your samples?

Thanks mate. The project properties are set to "Allow unsafe code", so you should not have any compilation errors. The image processing code came from Eric Gunnerson, as mentioned in the article. The whole UnsafeBitmap class is marked "unsafe". Once the class is created, the LockBits method is called to lock the whole bitmap in memory. Then I loop through the image to do my processing, and at the end, UnlockBitmap is called to release the lock. After that, the bitmap is marked for garbage collection. So there is no need to use the "fixed" keyword to pin the bitmap.