A Texture-based Method for Detecting Moving Objects


M. Heikkilä, M. Pietikäinen and J. Heikkilä
Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering
P.O. Box 4500 FIN University of Oulu, Finland

Abstract

The detection of moving objects in video frames plays an important and often critical role in machine vision applications such as human detection and tracking, traffic monitoring, human-machine interfaces and military applications, since it is usually one of the first phases in a system architecture. A common way to detect moving objects is background subtraction, in which moving objects are detected by comparing each video frame against an existing model of the scene background. In this paper, we propose a novel block-based algorithm for background subtraction. The algorithm is based on the Local Binary Pattern (LBP) texture measure. Each image block is modelled as a group of weighted adaptive LBP histograms. The algorithm operates in real time under the assumption of a stationary camera with fixed focal length. It can adapt to inherent changes in the scene background and can also handle multimodal backgrounds.

1 Introduction

Background subtraction is often one of the first tasks in machine vision applications, making it a critical part of the system. The output of background subtraction is an input to a higher-level process that can be, for example, the tracking of an identified object. The performance of background subtraction depends mainly on the technique used to model the scene background. Natural scenes in particular place challenging demands on background modelling, since they are usually dynamic in nature and include illumination changes, swaying vegetation, rippling water, flickering monitors etc. A robust background modelling algorithm should also handle situations where new stationary objects are introduced to or old ones removed from the scene.
Furthermore, the shadows of moving and scene objects can cause problems. Even in a static scene, frame-to-frame changes can occur due to noise and camera jitter. Moreover, the background modelling algorithm should operate in real time.

In this paper, we propose a novel texture-based approach to background subtraction. The goal of the new approach was to address all of the above-mentioned difficulties except the handling of shadows, which turned out to be an extremely difficult problem to solve with background modelling.

BMVC 2004 doi: /c.18.21

2 Related Work

Different kinds of methods for detecting moving objects have been proposed in the literature. The most commonly used cue is pixel intensity. One very popular technique is to model each pixel in a video frame with a Gaussian distribution. This is the underlying model for many background subtraction algorithms. A simple technique is to calculate an average image of the scene, to subtract each new video frame from it and to threshold the result. The adaptive version of this algorithm updates the model parameters (mean and covariance) recursively by using a simple adaptive filter. This approach is used in [16].

The previous model does not work well in dynamic natural environments, since they include repetitive motions like swaying vegetation, rippling water, flickering monitors, camera jitter etc. This means that the scene background is not completely static. By using more than one Gaussian distribution per pixel it is possible to handle such backgrounds [2, 3, 13]. In [13], the pixel intensity was modelled by a mixture of weighted Gaussian distributions. The weights were related to the persistence of the distributions in the model. In [2], the mixture of Gaussians approach was used in a traffic monitoring application. The model for pixel intensity consisted of three Gaussian distributions corresponding to the road, the vehicle and the shadow distributions. Adaptation of the Gaussian mixture models can be achieved using an incremental version of the EM algorithm.

The Gaussian assumption for pixel intensity distribution does not always hold. To deal with the limitations of parametric methods, a non-parametric approach to background modelling was proposed in [1]. The proposed method utilises a general non-parametric kernel density estimation technique for building a statistical representation of the scene background.
The probability density function for pixel intensity is estimated directly from the data without any assumptions about the underlying distributions.

In [5, 11], each image pixel was modelled with a Kalman filter. The proposed method can adapt to changes in illumination, but has problems with complex dynamic backgrounds. This approach was used in the automatic traffic monitoring application presented in [6]. In [17], the dynamic texture (background) is modelled by an ARMA model. A robust Kalman filter algorithm is used to iteratively estimate the intrinsic appearance of the dynamic texture as well as the regions of the foreground objects.

Hidden Markov Models (HMMs) have also been used to model pixel intensity. In this approach, the pixel intensity variations are represented as discrete states corresponding to scene modes. In [12], the approach was used in a traffic monitoring application where the pixel states represented the road, vehicles and shadows. In [14], HMM states corresponded to global intensity modes, each modelled with a single Gaussian distribution. The model was capable of handling sudden illumination changes.

Modelling the scene background using edge features has also been studied. In [7], the background model was constructed from the first video frame of the sequence by dividing it into equally sized blocks and calculating an edge histogram for each block. The histograms were constructed using the pixel-specific edge directions as bin indices and incrementing the bins with the corresponding edge magnitudes. In [4], a fusion of edge and intensity information was used for background subtraction.

Motion-based approaches have also been proposed for background subtraction. The algorithm presented in [15] detects salient motion by integrating frame-to-frame optical flow over time. Salient motion is assumed to be motion that tends to move in a consistent direction over time. The saliency measure used is directly related to the distance over which a point has traveled with a consistent direction.

Region-based algorithms usually divide an image into blocks and calculate block-specific features. Change detection is achieved via block matching. In [8], the block correlation is measured using the NVD (Normalised Vector Distance) measure. In [7], an edge histogram calculated over the block area is used as a feature vector describing the block. The region-based approaches allow only coarse detection of the moving objects unless a multi-scale approach is used.

3 New Approach

3.1 LBP Operator

The proposed texture-based method for background subtraction is based on the Local Binary Pattern (LBP) texture measure. The LBP is a powerful means of texture description. The operator labels the pixels of an image block by thresholding the neighbourhood of each pixel with the center value and considering the result as a binary number (LBP code):

$$ LBP(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad (1) $$

where $g_c$ corresponds to the grey value of the center pixel $(x_c, y_c)$ and $g_p$ to the grey values of the $P$ neighbourhood pixels. The function $s(x)$ is defined as follows:

$$ s(x) = \begin{cases} 1, & x \geq 0 \\ 0, & x < 0. \end{cases} \qquad (2) $$

The original LBP operator worked with the $3 \times 3$ neighbourhood of a pixel [9]. See Figure 1 for an illustration of the operator. The general version of the LBP operator uses a circularly symmetric neighbour set as illustrated in Figure 2 [10]. In this case, $g_p$ corresponds to the grey values of $P$ equally spaced pixels on a circle of radius $R$. The histogram of the LBP codes calculated over an image block can be used as a texture descriptor for the block.
As can be seen from (1), the LBP is invariant to monotonic changes in grey scale, since the code depends only on the signs of the differences $g_p - g_c$.

3.2 The Method

The goal of the proposed method is to identify regions of a video frame that contain moving objects. The method divides each new video frame into equally sized blocks by using a partially overlapping grid structure as illustrated in Figure 3. By using partially overlapping blocks it is possible to extract the shape of the moving object more accurately than with non-overlapping blocks.

Let us consider the feature vectors of a particular image block over time as a block process. Since we use an LBP histogram as the feature vector, the block process is defined as a time series of LBP histograms. We denote the block histogram at time instant $t$ by $x_t$.
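The LBP code in (1) and the block histogram $x_t$ can be sketched in a few lines of Python with NumPy. This is only an illustration for the original 3 × 3 operator, not the authors' implementation: the function names `lbp_codes` and `block_histogram`, the ordering of the neighbours in `OFFSETS`, and the choice to skip border pixels are all assumptions made here.

```python
import numpy as np

# Offsets (dy, dx) of the 8 neighbours in a 3x3 window. The ordering, and
# therefore which neighbour contributes which power of two, is an assumption.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(gray):
    """Return the 3x3 LBP code, as in (1), of every interior pixel."""
    gray = np.asarray(gray, dtype=np.int32)
    h, w = gray.shape
    center = gray[1:h - 1, 1:w - 1]
    codes = np.zeros_like(center)
    for p, (dy, dx) in enumerate(OFFSETS):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # s(g_p - g_c) is 1 when the neighbour is >= the center, else 0.
        codes += (neighbour >= center).astype(np.int32) << p
    return codes


def block_histogram(codes, n_bins=256):
    """Normalised histogram of the LBP codes of one block."""
    hist = np.bincount(codes.ravel(), minlength=n_bins).astype(np.float64)
    return hist / hist.sum()


# A random 12x12 block stands in for real image data.
rng = np.random.default_rng(0)
block = rng.integers(0, 256, size=(12, 12))
codes = lbp_codes(block)
x_t = block_histogram(codes)

# A strictly increasing grey-scale change leaves every code unchanged.
brighter = 2 * block + 10
same_codes = np.array_equal(codes, lbp_codes(brighter))
```

The last two lines check the invariance property on one example: a strictly increasing mapping preserves the sign of every difference $g_p - g_c$, so the codes, and hence the histogram, do not change.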

Figure 1: Example for calculating the original LBP code (steps: threshold, multiply; resulting LBP code: 143).

Figure 2: Circularly symmetric neighbour sets for different values of P and R (P = 4, R = 1; P = 8, R = 2; P = 12, R = 3). The values of neighbours that do not fall exactly on pixels are estimated by bilinear interpolation.

As mentioned in the previous section, LBP is invariant to monotonic changes in grey scale. This makes it robust against illumination changes. Since the LBP histogram does not contain information about the positions where the individual LBP codes were calculated, it also supports the modelling of multi-modal backgrounds. LBP is computationally very fast, which is an important property from the practical implementation point of view.

Furthermore, we use more than one histogram to model each block. The history of each block, $\{x_1, \ldots, x_t\}$, is modelled by a group of $K$ weighted LBP histograms. We denote the weight of the $k$th histogram at time instant $t$ by $\omega_{k,t}$.

In the following, we explain the background model update procedure for one block, but the procedure is identical for each block. We utilise some ideas proposed by Stauffer and Grimson in [13].

The first thing to do is to compare the new block histogram $x_t$ against the existing $K$ model histograms using a distance measure. We used the histogram intersection as the distance measure in our experiments. The histogram intersection for the normalised histograms $x_1$ and $x_2$ is defined as follows:

$$ H(x_1, x_2) = \sum_i \min(x_{1,i}, x_{2,i}), \qquad (3) $$

where $i$ is the bin index of the histogram. Note that the intersection grows with similarity: it equals 1 for identical normalised histograms and 0 for histograms with no common bins. The user defines the threshold value $T_D$ for the histogram intersection as a method parameter. Notice that it is also possible to use other measures such as Chi-square or Log-likelihood.
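The histogram intersection in (3) is simple enough to check by hand. The sketch below assumes normalised histograms stored as NumPy arrays; the function name `histogram_intersection` and the three toy histograms are chosen here for illustration only.

```python
import numpy as np


def histogram_intersection(x1, x2):
    """Sum of bin-wise minima of two normalised histograms, as in (3)."""
    x1 = np.asarray(x1, dtype=np.float64)
    x2 = np.asarray(x2, dtype=np.float64)
    return float(np.minimum(x1, x2).sum())


# Three toy 4-bin histograms, each summing to one.
a = np.array([0.5, 0.5, 0.0, 0.0])
b = np.array([0.25, 0.25, 0.25, 0.25])
c = np.array([0.0, 0.0, 0.5, 0.5])

h_aa = histogram_intersection(a, a)  # identical histograms -> 1.0
h_ab = histogram_intersection(a, b)  # partial overlap      -> 0.5
h_ac = histogram_intersection(a, c)  # no common bins       -> 0.0
```

Because larger values indicate closer histograms, a comparison against $T_D$ has to treat the intersection as a similarity rather than as a distance in the usual sense.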

Figure 3: Partially overlapping grid structure used by the algorithm.

If none of the model histograms is close enough to the new histogram, the model histogram with the lowest weight is replaced with the new histogram and is given a low initial weight. After this, the weights are normalised so that they sum up to one.

If a model histogram close enough to the new histogram was found, the bins of this histogram are updated as follows:

$$ x_{k,t}[i] = \alpha_b\, x_t[i] + (1 - \alpha_b)\, x_{k,t-1}[i], \quad \alpha_b \in [0, 1], \qquad (4) $$

where $\alpha_b$ is the user-defined learning rate. In addition, the weights $\omega_{k,t}$ are updated as follows:

$$ \omega_{k,t} = (1 - \alpha_w)\, \omega_{k,t-1} + \alpha_w M_{k,t}, \quad \alpha_w \in [0, 1], \qquad (5) $$

where $\alpha_w$ is the user-defined learning rate and $M_{k,t}$ is 1 for the matched histogram and 0 for the others.

Next, we need to decide which of the histograms of the model are most likely produced by the background processes. We use the persistence of the histogram as evidence for this. Because the persistence of the $k$th histogram is directly related to its weight $\omega_{k,t}$, the histograms are sorted in decreasing order according to their weights. As a result, the most probable background histograms are at the top of the list.

As a last phase of the updating procedure, the first $B$ histograms are selected to be the background model as follows:

$$ \omega_{1,t} + \cdots + \omega_{B,t} > T_B, \quad T_B \in [0, 1], \qquad (6) $$

where $T_B$ is the user-defined threshold for the selection. If we are modelling a unimodal background, a small value for $T_B$ should be sufficient. In the case of a multi-modal background, a larger value for $T_B$ is recommended. A small value for $T_B$ selects only the most probable histogram to be the background model, whereas a large value allows the histogram $x_t$ to get more bin configurations since the background model consists of more than one histogram.

Foreground detection is achieved via comparison of the new block histogram $x_t$ against the existing $B$ background histograms selected at the previous time instant.
If a match is not found, the block is considered to belong to the foreground. Otherwise, the block is marked as background.
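The update and detection steps for one block, equations (4) to (6), can be put together as the following sketch. Several details are assumptions made here rather than statements from the text: a histogram counts as a match when its intersection with $x_t$ exceeds $T_D$; when several histograms match, the closest one is updated; the weight update (5) is applied only when a match exists; and the default parameter values, `init_weight` and the function names are hypothetical.

```python
import numpy as np


def intersection(x1, x2):
    """Histogram intersection (3) of two normalised histograms."""
    return float(np.minimum(x1, x2).sum())


def select_background(weights, T_B):
    """Indices of the first B histograms, by decreasing weight, as in (6)."""
    order = np.argsort(-weights)
    cum = np.cumsum(weights[order])
    above = np.nonzero(cum > T_B)[0]
    B = int(above[0]) + 1 if above.size else len(weights)
    return order[:B]


def process_block(hists, weights, x_t, T_D=0.7, T_B=0.8,
                  alpha_b=0.01, alpha_w=0.01, init_weight=0.01):
    """Classify one block and update its model in place.

    hists is a (K, n_bins) float array and weights a (K,) float array.
    Returns True when the block is labelled foreground.
    """
    sims = np.array([intersection(h, x_t) for h in hists])

    # Detection uses the background set implied by the weights as they
    # stood before this update (the "previous time instant").
    background = select_background(weights, T_B)
    is_foreground = not np.any(sims[background] > T_D)

    best = int(np.argmax(sims))
    if sims[best] > T_D:
        # Match found: blend the bins as in (4), adapt the weights as in (5).
        hists[best] = alpha_b * x_t + (1.0 - alpha_b) * hists[best]
        matched = np.zeros_like(weights)
        matched[best] = 1.0
        weights[:] = (1.0 - alpha_w) * weights + alpha_w * matched
    else:
        # No match: replace the lowest-weight histogram and renormalise.
        k = int(np.argmin(weights))
        hists[k] = x_t
        weights[k] = init_weight
        weights /= weights.sum()
    return bool(is_foreground)


# Toy model with K = 3 histograms of 4 bins each.
demo_hists = np.eye(3, 4)
demo_weights = np.array([0.6, 0.3, 0.1])
demo_is_fg = process_block(demo_hists, demo_weights,
                           np.array([0.0, 1.0, 0.0, 0.0]), T_B=0.5)
```

In the toy call, the new histogram matches the second model histogram, but with `T_B=0.5` only the heaviest histogram belongs to the background set, so the block is still labelled foreground while the matched histogram gains weight.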

4 Experiments

The performance of the proposed method was evaluated using several video sequences. Both indoor and outdoor scenes were included.

Figure 5 shows the results for the indoor test sequence where a person is walking in a room. As can be seen, the algorithm can separate the moving object almost perfectly. Because the algorithm is block-based, some false positives occur on the contour areas of the object.

Figure 6 shows the results for the indoor test sequence where a person walks towards the camera. Many adaptive pixel-based methods output a large number of false negatives on the inner areas of the moving object, because the pixel values stay almost the same over time. The proposed method gives good results, because it exploits information gathered over a larger area than a single pixel.

The indoor test sequence illustrated in Figure 7 tests how well the algorithm handles a sudden illumination change. As can be seen, the adaptation is fast. The adaptation speed can be controlled with the learning rate parameters $\alpha_b$ and $\alpha_w$. The larger the learning rates, the faster the adaptation.

Figure 8 shows the results of our algorithm for the outdoor test sequence, which contains relatively small moving objects. The original sequence has been taken from the PETS database (ftp://pets.rdg.ac.uk/). The proposed algorithm successfully handles this situation, but the shape information is not preserved well because of the block-based approach.

The test sequence in Figure 9 contains trees swaying heavily in the wind and rippling water, which make it a very difficult scene from the background modelling point of view. There can also be some camera jitter present because of the wind. Since the method was designed to also handle multi-modal backgrounds, it manages the situation relatively well.

The values of the method parameters used for the test sequences can be found in Table 1.
From the test results we can see that our algorithm preserves the shape of a moving object well in the case of large objects. Because the approach is block-based, very small objects can be detected only coarsely.

We measured the sensitivity of the algorithm to small changes of its parameter values. The most sensitive parameter is the distance threshold $T_D$, which is used during the comparison of the histograms. The selected LBP operator mainly determines the value of $T_D$. In summary, the algorithm proved not to be very sensitive to small changes in its parameter values, which makes the selection process quite easy.

The performance of the proposed algorithm was compared to four algorithms representing state-of-the-art pixel and block-based approaches [7, 8, 13, 5, 11]. We manually labelled five frames from all of the five test sequences and calculated the accumulated sums for the true negatives and the true positives reported by the algorithms for the test sequences. The results are illustrated in Figure 4. The overall performance of our approach seems to be better than the performance of the comparison methods. The new method detected moving objects very accurately (true positives). The proportion of the true negatives was also good. As can be seen from Figures 5-9, most of the false positives occurred on the contour areas of the moving objects. This is to be expected, because our approach is block-based.

We also measured the real-time performance of our algorithm. For the parameter values used in the tests, a frame rate of 28 fps was achieved. We used a standard PC with a 1.8 GHz AMD Athlon XP CPU and 512 MB of memory in our experiments. The image resolution was pixels.
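The accuracy comparison described above accumulates true positives and true negatives over manually labelled frames. A minimal sketch of such bookkeeping follows. The normalisation is an assumption made here: true positives are divided by the number of labelled foreground pixels and true negatives by the number of labelled background pixels, which may differ from how the proportions in Figure 4 were computed. The function name `accumulate_rates` and the toy masks are hypothetical.

```python
import numpy as np


def accumulate_rates(predicted_masks, truth_masks):
    """Accumulate TP and TN over frames and return their proportions.

    Masks are boolean arrays where True marks a foreground pixel.
    """
    tp = tn = n_fg = n_bg = 0
    for pred, truth in zip(predicted_masks, truth_masks):
        pred = np.asarray(pred, dtype=bool)
        truth = np.asarray(truth, dtype=bool)
        tp += int(np.sum(pred & truth))     # foreground detected as foreground
        tn += int(np.sum(~pred & ~truth))   # background detected as background
        n_fg += int(truth.sum())
        n_bg += int((~truth).sum())
    # Assumes at least one foreground and one background pixel overall.
    return tp / n_fg, tn / n_bg


# One toy 2x2 frame: two labelled foreground pixels, one of them detected.
truth = [np.array([[True, True], [False, False]])]
pred = [np.array([[True, False], [False, False]])]
tp_rate, tn_rate = accumulate_rates(pred, truth)
```

For the toy frame this gives a true positive proportion of 0.5 and a true negative proportion of 1.0.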

Table 1: The parameter values used in the tests.

Test Sequence | Block Size | LBP P,R | K | α_b | α_w | T_D | T_B
1 (Figure 5)  |            | LBP 6,  |   |     |     |     |
(Figure 6)    |            | LBP 6,  |   |     |     |     |
(Figure 7)    |            | LBP 6,  |   |     |     |     |
(Figure 8)    |            | LBP 6,  |   |     |     |     |
(Figure 9)    |            | LBP 6,  |   |     |     |     |

Figure 4: The proportions of the correctly classified pixels reported by the algorithms for the test sequences (true positives and true negatives for the new approach, [7], [8], [13] and [5, 11]).

5 Discussion

In this paper, a novel block-based approach for detecting moving objects from video frames was proposed. Since the algorithm uses partially overlapping blocks, the shape information of the moving objects is preserved relatively well. The recent history of each block is modelled by a group of $K$ weighted adaptive LBP histograms. The LBP is invariant to monotonic changes in grey scale, which makes it robust against illumination changes. The LBP also supports the modelling of multi-modal backgrounds. To make the model adaptive, we update it at each discrete time instant.

The proposed algorithm was tested against several test sequences including both indoor and outdoor scenes. It proved to be tolerant to illumination changes and multimodality of the background. Since the algorithm is adaptive, it also manages situations where new stationary objects are introduced to or old ones removed from the scene background. Furthermore, the algorithm operates in real time, which makes it well suited to systems that require real-time performance.

Figure 7: Frames 330, 340, 350, 360 and 370 from an indoor test sequence where the lights are just turned on.

Figure 8: Frames 150, 590, 920, 1000 and 1340 from an outdoor test sequence taken from the PETS database.

Figure 9: Frames 280, 300, 320, 350 and 380 from an outdoor test sequence where two persons are walking in front of heavily swaying trees and rippling water. There can also be some camera jitter present because of the wind.
