Introduction

Visual surveillance is an attempt to detect, recognize and track certain objects in image sequences and, more generally, to understand and describe object behaviors [1].

The purpose of this two-part article is to describe a framework for visual surveillance systems, along with sample algorithms, in order to help programmers around the world learn about this subject. This part reviews the basic structure of surveillance systems; the second will demonstrate an algorithm used in such systems.

Although Andrew Kirillov [2] discusses this subject in detail in his article, he presents a narrow view of how a surveillance system should be structured. These articles attempt to provide wider insight into the structure of surveillance systems.

Motivation for Surveillance Systems

Anomaly Detection

The ability to learn what normal behaviors and anomalies are. For example, it is known that at an office building entrance, people usually go straight to the lobby after entering the building. Individuals who head in a different direction can therefore be flagged as suspicious.

Automated Security

Increases the effectiveness of the entire surveillance process by drawing attention only to certain events, instead of requiring someone to watch and analyze several different surveillance cameras, as happens today. Such a system can also decrease costs.

Crowd Flux Statistics

For example, by knowing how many cars are using a certain road, and how many are coming from road A or road B, we can decide which road to widen.

Blood-Squirting Halloween Skull [6]

Just a fun usage found on Coding4Fun; not all reasons have to be dead serious.

General Structure of Visual Surveillance Systems

First the system receives a stream of images and then tries to learn several important facts from it:

Are there any objects (meaning people, cars, suitcases, body limbs and so on)?

What type is each object (i.e., is it a car or a person, etc.)?

What is the current position of each object in x, y coordinates (maybe even z)?

Does an object's position mean something? E.g., is the car at position (10, 50) driving along a valid trajectory?

Usually the first three are considered low-level tasks and the fourth a high-level one.
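As a minimal sketch of these four steps (in Python rather than the project's C#, with toy stand-ins for every stage; none of these functions exist in the article's code):

```python
# Toy stand-ins for the four steps the system must perform on each frame.

def detect_objects(frame):
    # Step 1 (low level): find foreground pixels; non-zero stands in for "not background".
    return [(r, c) for r, row in enumerate(frame) for c, v in enumerate(row) if v]

def classify(pixels):
    # Step 2 (low level): a stand-in classifier -- big blobs are "person", tiny ones "noise".
    return "person" if len(pixels) >= 3 else "noise"

def position(pixels):
    # Step 3 (low level): centroid of the detected pixels as the object's (x, y) position.
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (sum(cols) / len(cols), sum(rows) / len(rows))

def interpret(pos, allowed_region):
    # Step 4 (high level): flag positions outside a permitted region as anomalous.
    (x0, x1), (y0, y1) = allowed_region
    x, y = pos
    return "ok" if x0 <= x <= x1 and y0 <= y <= y1 else "anomaly"

frame = [[0, 0, 0],
         [1, 1, 1],
         [0, 1, 0]]
pixels = detect_objects(frame)
print(classify(pixels), position(pixels), interpret(position(pixels), ((0, 2), (0, 2))))
```

Each function here is a deliberate caricature; the rest of the article describes how real systems implement these stages.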

The general structure below is not mandatory; however, many existing systems follow it.

Camera 1

Usually a hardware device, although we can run visual surveillance systems on saved video files as well. The camera provides a stream of images (also referred to as frames), which is the system's input. It is important to note that although we can learn many important features from a single image, a single image alone is not enough.

Environment Modeling

The philosophy is simple: not all pixels are of interest, such as those that belong to the background of the image. For example, in a scene where two people are walking in front of a tree, the tree is considered background. By modeling and extracting the background, the calculations can be made much more accurate.

Look at the following image:

As you can see, we managed to distinguish between the people (foreground) and their background surroundings.
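One common way to realize such a background model is a per-pixel running average. The sketch below is illustrative Python, not the article's code, and the alpha and threshold values are arbitrary choices:

```python
ALPHA = 0.1    # illustrative: how quickly the background adapts to change
THRESH = 30    # illustrative: per-pixel difference treated as foreground

def update_background(background, frame):
    # Blend the new frame into the background estimate (running average).
    return [[(1 - ALPHA) * b + ALPHA * f for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

def foreground_mask(background, frame):
    # A pixel is foreground if it differs enough from the background estimate.
    return [[abs(f - b) > THRESH for b, f in zip(brow, frow)]
            for brow, frow in zip(background, frame)]

background = [[10, 10], [10, 10]]
frame = [[12, 200], [11, 10]]              # a bright object appears at row 0, col 1
print(foreground_mask(background, frame))  # only that pixel is flagged
background = update_background(background, frame)
```

The running average lets the model absorb slow lighting changes while still flagging sudden differences as foreground.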

Motion Segmentation

Trying to detect regions in the image that correspond to moving objects. Why? Because moving objects are usually what interest us, and focusing only on them saves computational time. However, it is important to remember that sometimes stationary objects are precisely what is of interest, as in the case of a car that stops in front of a house for a whole day, where the lack of movement is what makes it suspicious.
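Frame differencing is one simple way to segment motion. The Python sketch below (not the article's code) marks pixels that changed between consecutive frames and wraps them in a single bounding box; a real system would use connected-component labelling to separate multiple objects:

```python
def moving_pixels(prev, curr, thresh=20):
    # Pixels whose intensity changed noticeably between the two frames.
    return [(r, c) for r in range(len(curr)) for c in range(len(curr[0]))
            if abs(curr[r][c] - prev[r][c]) > thresh]

def bounding_box(pixels):
    # Smallest rectangle containing all moving pixels: (top, left, bottom, right).
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return (min(rows), min(cols), max(rows), max(cols))

prev = [[0] * 4 for _ in range(4)]
curr = [[0] * 4 for _ in range(4)]
curr[1][1] = curr[1][2] = curr[2][1] = 255   # an object moved into view
print(bounding_box(moving_pixels(prev, curr)))
```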

Object Classification

As stated before, classification (determining the type) means assigning each object to a class it belongs to. You will probably want to ignore certain classes of objects, for example birds.
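A toy illustration of this idea (hypothetical thresholds, not the article's method), using nothing but the blob's bounding-box size and aspect ratio:

```python
# Crude heuristic classifier: people are tall, cars are wide, birds are small.
# The thresholds are invented for illustration only.
def classify_blob(width, height):
    area = width * height
    if area < 50:
        return "bird"        # small enough to ignore
    if height > 1.5 * width:
        return "person"
    if width > 1.5 * height:
        return "car"
    return "unknown"

print(classify_blob(20, 60))   # tall blob
print(classify_blob(80, 40))   # wide blob
print(classify_blob(5, 5))     # tiny blob
```

Real systems use shape, texture or motion-based classifiers, but even this caricature shows where "ignore birds" would plug into the pipeline.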

Tracking

It is important to understand that tracking is not motion detection (motion segmentation). Tracking means identifying the same object across different frames. For instance, a person who walks in front of the camera in frame 1 will be identified as the same person in consecutive frames. This provides the trajectory that the person took during the entire scene.

For example look at the following image:

You can see that people are being tracked for the entire scene.

The main problem in tracking is occlusion, where an object is concealed by another object such as a tree, a car, etc.
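The frame-to-frame identity matching described above can be sketched as nearest-neighbour assignment. This illustrative Python version does not handle occlusion or merging, which a real tracker must:

```python
import math

def track(prev_blobs, detections, next_id, max_dist=50.0):
    # prev_blobs: {id: (x, y)} from the last frame; detections: [(x, y)] in this frame.
    new_blobs = {}
    unmatched = dict(prev_blobs)
    for point in detections:
        best = min(unmatched, key=lambda i: math.dist(unmatched[i], point), default=None)
        if best is not None and math.dist(unmatched[best], point) <= max_dist:
            new_blobs[best] = point       # close enough: same object keeps its id
            del unmatched[best]
        else:
            new_blobs[next_id] = point    # no close match: a new object, fresh id
            next_id += 1
    return new_blobs, next_id

blobs, nid = track({}, [(10, 10)], next_id=0)   # frame 1: one new object, id 0
blobs, nid = track(blobs, [(14, 12)], nid)      # frame 2: same object, moved slightly
print(blobs)
```

Because the id survives across frames, collecting each id's positions over time yields exactly the trajectory discussed above.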

Behavior Understanding/Personal Identification

Here we can plug in various learning algorithms, or any other algorithms that manipulate the gathered data.

Code Overview

The main idea is to create a common GUI that supports plug-ins. It displays the final result, while surveillance algorithms are added as external libraries. This mix-and-match architecture of various algorithms enables modularity. Another advantage is educational: students can plug in various algorithms and see how well they operate in various situations.

The code contains four main projects and three utility projects.

Here we shall review how to encapsulate the structure of the visual surveillance system.

Core Project

The core project is, in a sense, a project of interfaces. Most of the interfaces here are helper interfaces. Any external plug-in library must implement the ISurveillanceSystem interface before it can be added to the system. Let us look at this interface.

This interface defines properties and methods that allow the visual environment to query the abilities of the underlying algorithms. Most of these properties correspond directly to a GUI feature, such as enabling or disabling a certain menu or button. For example, HasRuntimeInformation returns true if the algorithm supports showing its inner workings in a GUI manner; if it does, the environment can query RuntimeInformation to retrieve a form which will be displayed. Many of the properties in this interface act in the same way.

An important method is GetImageProcess which returns an object that inherits from the IImageProcess interface.

This interface encapsulates the entire surveillance algorithm and has only one method, which receives an image and returns a collection of blobs. Blobs are objects that were identified in the tracking part; each has a unique id and a position.
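The C# interface itself is not reproduced in the text; as a rough Python approximation of the shape described (one method that maps an image to a collection of blobs, each carrying a unique id and a position):

```python
# Python stand-ins for the article's C# types; names and fields are approximations.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Blob:
    id: int    # unique id assigned by the tracker
    x: int
    y: int

class IImageProcess(ABC):
    @abstractmethod
    def process_frame(self, image):
        """Run the whole surveillance pipeline on one frame; return a list of Blobs."""

class DummyProcess(IImageProcess):
    def process_frame(self, image):
        return [Blob(id=0, x=0, y=0)]   # placeholder result

blobs = DummyProcess().process_frame(image=None)
print(blobs[0].id, blobs[0].x, blobs[0].y)
```

Packing the whole pipeline behind one method is what makes each algorithm swappable from the GUI's point of view.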

Another important interface is the IOutputSystem.

The IOutputSystem allows the results of the IImageProcess object to be output in a different format, for example to a file or perhaps to a learning algorithm.

There is also a delegate which draws the final result of the IImageProcess on screen, for example by drawing a rectangle around all tracked objects.

We are now ready to fully understand the workflow of the surveillance system.

Visual Surveillance Laboratory Project

The laboratory project contains the main user interface and code to load available plug-ins into memory.

The GUI part will not be described here; you can view it yourself. Instead, I will describe the plug-in mechanism.

This project supports two types of plug-ins: tracking systems and output systems. The same scheme is used to load both: a factory class parses an XML file containing each plug-in's disk location and namespace, and then tries to load the plug-ins into memory.

Currently this mechanism is not very robust; there is no error handling here. This is among the things that will be added in the next version of the system.
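The scheme can be sketched as follows in Python, standing in for the C# factory: module and attribute names replace the DLL path and namespace, and the try/except shows the kind of error handling still to be added.

```python
import importlib
import xml.etree.ElementTree as ET

# Hypothetical config: each entry names a module and an attribute to load.
# "json"/"dumps" is a stand-in plug-in; "no_such_module" simulates a bad entry.
CONFIG = """
<plugins>
  <plugin module="json" attr="dumps"/>
  <plugin module="no_such_module" attr="x"/>
</plugins>
"""

def load_plugins(xml_text):
    loaded = []
    for node in ET.fromstring(xml_text).findall("plugin"):
        try:
            module = importlib.import_module(node.get("module"))
            loaded.append(getattr(module, node.get("attr")))
        except (ImportError, AttributeError):
            continue   # a missing or broken plug-in should not bring the system down
    return loaded

print(len(load_plugins(CONFIG)))   # the bad entry is skipped
```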

User Guide

In order to start a surveillance mission, you first need to make sure that the configuration file points to the right location of the various plug-ins. Trying to run the application without changing the config.xml file will result in an empty plug-in list.

You start a surveillance mission by choosing Configure from the menu; the following dialog should appear.

You can see that the configure window contains three parts:

Input

Where the stream of images will come from: either a camera attached to the computer or a regular video file (currently only AVI files are supported). You can download an AVI test file from the project web site at http://code.google.com/p/vsl/ . The movie file is originally located at [5]. This section was built using the AForge.NET framework [3].

Surveillance Systems

Lists the available plug-ins that encapsulate the surveillance algorithms. If a system can be configured, the Configure button will be enabled.

Output Systems

Lists the available plug-ins that output the final result of the surveillance algorithms. If a system can be configured, the Configure button will be enabled. After clicking OK, the main window should display a "Connected" message, which means you are ready to start. Press Start or Stop to start or stop running the algorithm.

Future work

Improve robustness of software.

Debug, debug and debug...

Add more example algorithms.

Add support for recording.

Conclusion

We learned about the basic structure of a surveillance system. It contains five main parts: environment modeling, motion segmentation, classification, tracking and behavior understanding.

We described the basic structure of a plug-in in our system and showed how to load it dynamically.
