Introduction

Wouldn't it be great if you could control your computer with your hands? In this article, I will show you a small application I created to control Windows Media Player with hand motion and an ordinary web-cam. As you can see in the picture above, the application creates three "hot-spots" in the web-cam view; you can think of these hot-spots as virtual buttons that get activated when you make a click movement inside them. I've programmed each hot-spot to control a particular button in Windows Media Player: the one at the top (blue) is for play/pause, and the other two are for next/previous songs. Check out this video to see my application in action.

Creating clickable virtual buttons

The criterion for clicking a "hot-spot" is an increase in motion beyond a certain threshold; when the motion level crosses that limit, a click event is raised for that hot-spot, which can be programmed to perform further actions (like controlling the Media Player). Basically, I process each frame from my web-cam to measure the motion level in each hot-spot, and whenever the motion stays above the threshold for 10 to 20 consecutive frames, an event is raised. Pretty simple!
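The frame-counting logic described above can be sketched in a few lines. This is a minimal Python sketch, not the original C# code; the `HotSpot` class, the `update` method, and the default threshold values are all illustrative assumptions:

```python
class HotSpot:
    """A virtual button: raises a click when motion stays high for enough frames.

    Note: this is an illustrative sketch; names and thresholds are assumptions."""

    def __init__(self, motion_threshold=50, frames_required=10):
        self.motion_threshold = motion_threshold  # minimum motion level to count
        self.frames_required = frames_required    # consecutive frames needed to "click"
        self.active_frames = 0                    # frames above threshold so far

    def update(self, motion_level):
        """Feed the motion level measured in this hot-spot for one frame.

        Returns True when a click event should be raised."""
        if motion_level > self.motion_threshold:
            self.active_frames += 1
            if self.active_frames >= self.frames_required:
                self.active_frames = 0   # reset so the button can fire again later
                return True
        else:
            self.active_frames = 0       # motion dropped below threshold: start over
        return False
```

In the real application, each of the three hot-spots would get its own instance, and a `True` return would trigger the corresponding Media Player action (play/pause or next/previous).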

To detect motion, I used a fairly simple algorithm: subtracting two images. Every image is made of three layers: red, green, and blue. Thus, each pixel in an image holds three values corresponding to its RGB components. But what about gray-scale images? Do they have three layers too? Well, yes, but the R, G, and B values are the same for any particular pixel in a gray-scale image. Imagine an image of 2x2 pixels:

{1R 2G 3B, 4R 5G 6B}
{7R 8G 9B, 2R 3G 4B}

The above 2x2 matrix represents a bitmap image where each pixel consists of RGB values. Now, to make a gray-scale version of this image, we take the average of each pixel's RGB values and assign that average back to all three channels. So, our gray-scale image would be:

{2R 2G 2B, 5R 5G 5B}
{8R 8G 8B, 3R 3G 3B}

We can see that a gray-scale image has the same RGB value for each pixel. So, I preferred to convert the web-cam images to gray-scale and then simply subtract each frame from the previous one using matrix algebra.
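In C# this is typically done by reading pixels from a `Bitmap`; as a language-neutral sketch of the same idea, here is the gray-scale conversion and frame subtraction in pure Python, using the article's 2x2 example image (the helper names are assumptions for illustration):

```python
def to_grayscale(frame):
    """Convert a frame of (R, G, B) tuples to a matrix of per-pixel averages."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in frame]

def frame_difference(current, previous):
    """Subtract two gray-scale matrices; large values indicate motion at that pixel."""
    return [[abs(c - p) for c, p in zip(crow, prow)]
            for crow, prow in zip(current, previous)]

# The 2x2 example image from the article:
frame = [[(1, 2, 3), (4, 5, 6)],
         [(7, 8, 9), (2, 3, 4)]]
gray = to_grayscale(frame)   # [[2, 5], [8, 3]]
```

Subtracting the gray-scale matrix of one frame from the next yields a matrix that is near zero everywhere nothing moved, and large wherever something did.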

The above code snippet shows that it is easy to get a matrix of the form byte[,] from a bitmap image. Once you have two such matrices from consecutive web-cam frames, you can subtract them to find a matrix that shows the motion at each pixel. Further, you can make an image from this byte[,] using the code snippet shown below:

This is easy.
If you look at the code, it generates a gray-scale value using GetAverage, which, in turn, sums R + G + B and divides by three.

You can simply use Color.R instead of the average.
But note that by doing so you will be "ignoring" the other components. So, white (255, 255, 255) will be registered the same way as full red (255, 0, 0).

If you want a measure of how near a color is to full red, you can use something like:
int redLevel = color.R - color.G - color.B;

In this case, full red (255, 0, 0) results in 255, while white (255, 255, 255) returns -255.

Yellow (255, 255, 0) returns zero. So, you will only be considering colors that are really close to red (rather than any color that merely contains red).
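The red-level expression above can be checked in a few lines. This Python sketch simply mirrors the C# one-liner:

```python
def red_level(r, g, b):
    """How close a color is to pure red: R minus the other two channels."""
    return r - g - b

print(red_level(255, 0, 0))     # full red -> 255
print(red_level(255, 255, 255)) # white    -> -255
print(red_level(255, 255, 0))   # yellow   -> 0
```

Any color whose green and blue components are small relative to red scores high, which is the behavior the article describes.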