Visual Surveillance Laboratory Part 2

Introduction

In the previous article we described the basic structure of a surveillance system and gave an overview of the code.

This article describes the algorithms used in the example tracking system given in the former article. The intention is to show that simple heuristics work well in constrained environments.

But first let us understand the basics of image manipulation required for our demonstration.

Image manipulation

In order to simplify the model, let us consider only gray level images.

Histogram

Each pixel in an 8 bit image can have one of 256 distinct values (0-255), where 0 is completely black and 255 is completely white. A histogram counts how many pixels have each value. For example:

The above image displays the histogram of the following image:

Just by looking at the histogram one can learn quite a lot about the image. For example, if the peak is closer to 0, the image is dark; on the other hand, if the peak is closer to 255, the image is bright.
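As a sketch, a histogram for a gray level image represented as a plain 2D list could be computed as follows (a real system would use an image library, but the idea is the same):

```python
def histogram(image):
    """Count how many pixels fall into each of the 256 gray levels."""
    counts = [0] * 256
    for row in image:
        for pixel in row:
            counts[pixel] += 1
    return counts

# A tiny 2x3 "image": mostly dark pixels, one bright one.
image = [[0, 0, 10],
         [10, 10, 255]]
hist = histogram(image)
print(hist[0], hist[10], hist[255])  # → 2 3 1
```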

Difference filter

Measuring the difference between two images means taking the value of a pixel from image A, taking the pixel at the same position from image B, and computing the absolute difference between them. If both pixels are equal in value, the difference will be 0, meaning totally black. For instance, a difference filter between the right hand side and the left hand side images

yields:
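In code, the difference filter might look like this (a minimal sketch over 2D lists of pixel values):

```python
def difference(a, b):
    """Per-pixel absolute difference between two equally sized gray images."""
    return [[abs(pa - pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

frame1 = [[10, 200], [30, 30]]
frame2 = [[10, 50], [30, 90]]
print(difference(frame1, frame2))  # → [[0, 150], [0, 60]]
```

Identical regions come out black (0), and the larger the change, the brighter the output pixel.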

Threshold filter

Thresholding is a technique that helps us "delete" unwanted pixels from an image and concentrate only on the ones we want. For each pixel in an image, if the pixel value is above a certain threshold, convert it to 255 (white); otherwise convert it to 0 (black). For example, a threshold of 120
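A threshold filter can be sketched in the same style:

```python
def threshold(image, level):
    """Map pixels above `level` to 255 (white) and the rest to 0 (black)."""
    return [[255 if p > level else 0 for p in row] for row in image]

print(threshold([[10, 150], [119, 121]], 120))  # → [[0, 255], [0, 255]]
```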

This approach treats each changed pixel in the current frame as an indication of a moving object that we want to track. In this sense it is a simple motion detection algorithm. Here are possible problems with this approach:

We want to ignore the moving background, i.e. leaves falling, illumination changes, dirty pixels, etc. Such a method treats every small change as a moving object.

It is computationally heavy: too many moving objects can appear, and it takes time to handle them.

Important objects never have a size of one pixel; it is quite obvious that moving objects of interest are much larger.

Merging close pixels is a big improvement. The connectPixels algorithm is actually known as the "connected components labeling algorithm", which you can read more about in [2].
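As an illustration, a simple 4-connected flood fill version of such a labeling algorithm could look like this (a sketch; the name connect_pixels mirrors the connectPixels routine mentioned above, not its actual implementation):

```python
from collections import deque

def connect_pixels(binary):
    """4-connected components labeling via flood fill.
    `binary` holds 0 (background) and 255 (foreground); returns a label map
    where every connected white region gets its own positive label."""
    h, w = len(binary), len(binary[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if binary[y][x] == 255 and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 255 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels

mask = [[255, 255, 0],
        [0,   0,   0],
        [0,   0, 255]]
print(connect_pixels(mask))  # → [[1, 1, 0], [0, 0, 0], [0, 0, 2]]
```

Two separate white regions receive two different labels, so each blob can later be treated as a single object.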

The algorithm naiveApproach3 is actually similar to the motion segmentation part in the example algorithm code given in the previous article. However, naiveApproach3 requires some additional image manipulation in order to reduce noise.

The above method, called background subtraction, has many variations, but its core stays the same: take two images, analyze the difference between them using a difference filter, and the outcome is your moving object.
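Putting the two filters together gives the core of background subtraction in just a few lines (a sketch; the threshold value of 120 is illustrative):

```python
def detect_motion(prev_frame, cur_frame, level=120):
    """Simplest background subtraction: difference the two frames,
    threshold the result, and what remains white is the moving region."""
    diff = [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(prev_frame, cur_frame)]
    return [[255 if p > level else 0 for p in row] for row in diff]

prev = [[10, 10], [10, 10]]
cur  = [[10, 10], [10, 200]]
print(detect_motion(prev, cur))  # → [[0, 0], [0, 255]]
```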

This approach is a naive one since it assumes that anything that is big enough (i.e. 500 pixels and more) is a person. The real question is whether this assumption is correct. That depends on what the environment actually contains. For example, in a lobby entrance scenario that contains only people, the assumption might be correct; on a highway, it will not be.

There are more sophisticated ways to classify objects. For example, knowing that people are usually taller than they are wide, we can use this distinction.
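As a sketch, such a size-and-shape classifier might look like this (the 500 pixel minimum comes from the text above; the function name and bounding box format are illustrative):

```python
def classify(blob_pixels, bounding_box):
    """Label a blob as a person if it is big enough (>= 500 pixels)
    and taller than it is wide; otherwise label it unknown."""
    x_min, y_min, x_max, y_max = bounding_box
    width, height = x_max - x_min, y_max - y_min
    if blob_pixels >= 500 and height > width:
        return "person"
    return "unknown"

print(classify(800, (10, 10, 40, 100)))  # → person
print(classify(800, (10, 10, 100, 40)))  # → unknown
```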

One problem remains: each new object is given a new id, but there is still no tracking.

Tracking

Tracking is implemented using a very simple but surprisingly successful heuristic [3]: object overlapping.

The assumption behind this idea is that a video shot usually has 10+ frames per second. That means the same object will appear very close to the location it occupied in the previous frame, so if objects overlap across an unbroken sequence of frames, they are the same object.
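The overlap test itself reduces to a standard axis-aligned bounding box intersection check (a sketch; the (x_min, y_min, x_max, y_max) box format is an assumption):

```python
def overlaps(box_a, box_b):
    """True if two axis-aligned bounding boxes intersect; overlapping
    blobs in consecutive frames are treated as the same object."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1

print(overlaps((0, 0, 10, 10), (5, 5, 20, 20)))   # → True
print(overlaps((0, 0, 10, 10), (11, 0, 20, 10)))  # → False
```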

The tracking here has one major problem that this article does not intend to solve: if a person walks behind a tree (the tree occludes him), then the next time he appears he will be considered a totally new object with a new id. The occlusion problem has a major effect on any surveillance system, but in this example we assume a clean view.

The code above is not complete and thus several additions are required:

There are two lists: one contains pending objects, i.e. objects that are not yet being tracked, and the other contains active objects that are being tracked.

A pending object is a moving object that was just discovered.

Before an object is considered an active object, the algorithm waits to see if it keeps moving for at least 4 frames. This allows ignoring junk pixels and small moving objects like leaves.

If an active object has stopped moving for 10 frames, delete it from the list.

Iterate over all blobs, ignoring Unknown blobs:
    if pendingBlobs contains the blob then
        increase the pending blob's frame count by 1
        update the blob's location
        if the pending blob's frame count >= 4 then
            move the blob to activeBlobs and give it an id
    else if activeBlobs contains the blob then
        update the blob's location
        reset its "times not touched" counter to 0
    else
        add the blob to pendingBlobs
Iterate over all pending blobs:
    delete a blob if it was not updated
Iterate over all active blobs:
    delete a blob if it was not updated for 10 frames
Return the blobs in the activeBlobs list.
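The pseudocode above can be sketched in Python as follows (all names here, such as pending, active, frames and idle, are illustrative and not the project's actual identifiers):

```python
def overlaps(a, b):
    """True if two (x_min, y_min, x_max, y_max) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def update(tracker, blobs):
    """Process one frame's blob bounding boxes and return the active list."""
    for entry in tracker["pending"] + tracker["active"]:
        entry["updated"] = False
    for box in blobs:
        pending = next((p for p in tracker["pending"] if overlaps(p["box"], box)), None)
        if pending:
            pending.update(box=box, frames=pending["frames"] + 1, updated=True)
            if pending["frames"] >= 4:          # promote to active, assign an id
                tracker["pending"].remove(pending)
                pending["id"] = tracker["next_id"]
                tracker["next_id"] += 1
                tracker["active"].append(pending)
            continue
        active = next((a for a in tracker["active"] if overlaps(a["box"], box)), None)
        if active:
            active.update(box=box, updated=True)
        else:                                   # a newly discovered blob
            tracker["pending"].append(
                {"box": box, "frames": 1, "idle": 0, "updated": True})
    # Drop pending blobs that vanished, and active blobs idle for 10 frames.
    tracker["pending"] = [p for p in tracker["pending"] if p["updated"]]
    for a in tracker["active"]:
        a["idle"] = 0 if a["updated"] else a["idle"] + 1
    tracker["active"] = [a for a in tracker["active"] if a["idle"] < 10]
    return tracker["active"]

tracker = {"pending": [], "active": [], "next_id": 0}
for _ in range(4):                              # the same box for four frames
    active = update(tracker, [(0, 0, 10, 10)])
print([a["id"] for a in active])  # → [0]
```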

It is possible to divide the code into 3 main parts that correspond to the parts learnt in the previous article. Note that there is no implementation of any environment modeling techniques; for information on the subject, you are welcome to read [4].

Using the code

The code itself is a translation of what we did here; it is written in the SimpleTrackingSystemExample project and compiled as a DLL.

The three parts, motion segmentation, object classification and object tracking, are all implemented as interfaces; the really interesting part of the code is how they are connected together.

You probably all remember the general structure of a visual surveillance system. What we want is to encapsulate this structure while letting the programmer choose the specific details, and this is where the wonderful builder pattern [5] helps us.

Of course you do not have to use BaseImageProcess in your tracking algorithms; all you have to do is implement the IImageProcess interface.
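To illustrate the idea, a builder that wires the three stages together might look like this (class and method names here are hypothetical, not the project's actual API):

```python
# A sketch of the builder pattern applied to the surveillance pipeline:
# the structure is fixed (segment -> classify -> track), but each stage
# is supplied by the programmer.
class TrackingSystemBuilder:
    def __init__(self):
        self._segmenter = None
        self._classifier = None
        self._tracker = None

    def with_motion_segmentation(self, segmenter):
        self._segmenter = segmenter
        return self

    def with_classification(self, classifier):
        self._classifier = classifier
        return self

    def with_tracking(self, tracker):
        self._tracker = tracker
        return self

    def build(self):
        """Return a function that runs a frame through all three stages."""
        def process(frame):
            blobs = self._segmenter(frame)
            labeled = self._classifier(blobs)
            return self._tracker(labeled)
        return process

# Toy stages just to show the wiring.
pipeline = (TrackingSystemBuilder()
            .with_motion_segmentation(lambda frame: ["blob"])
            .with_classification(lambda blobs: [(b, "person") for b in blobs])
            .with_tracking(lambda labeled: labeled)
            .build())
print(pipeline("frame"))  # → [('blob', 'person')]
```

The encapsulated structure never changes, yet any stage can be swapped out without touching the rest of the system.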

What to do next

Test the example algorithms that come with the project on various scenes: check what happens when you use them on people walking from right to left, and check what happens when you try to track leaves falling from a tree.

Test the limits, check where it fails, check where it succeeds.

Conclusion

We learnt different techniques of image manipulation.

We tried to develop simple algorithms which can be used in a visual surveillance system.

Special Thanks

To Anat Kravitz for her help.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.