EE 475

 

DIGITAL IMAGE PROCESSING

 

TERM PROJECT

Intermediate Report

 

 

TRACKING HUMAN FACES USING MOTION AND BACKGROUND SUBTRACTION

 

 

Sancar ADALI

Hasan AYAZ

Ali Oğuz ÜSTÜN

 

 

 

http://hasanayaz.com/image

1. Abstract

 

Target detection and tracking is one of the most important and fundamental technologies for building real-world computer vision systems such as security and traffic monitoring systems. Objects in the real world are recognized and localized in the image, and this perception is used to trigger a physical event, such as directing the robot that carries the camera towards the object. In this project, we focus on tracking moving faces. Detection relies on the motion of the target and on background subtraction. Motion produces slight differences between subsequent frames, which allow us to perceive the moving region in the image. Using the background subtraction algorithm, we can perceive the object regions in the current frame.

 

2. Introduction

 

Detection of motion is the first stage in many automated surveillance applications. In these systems, the aim is to achieve very high sensitivity in detecting moving objects with the lowest possible false alarm rate. Background subtraction is one of the most commonly used techniques. It relies on the difference between the current frame and the scene background.

 

If we consider the intensity value of a pixel over time in a completely static scene, the pixel intensity can be modeled with a normal distribution N(µ, σ²), where µ is the mean and σ² is the variance. In other words, the observed value is the constant background intensity µ plus zero-mean Gaussian noise N(0, σ²). This normal distribution model for the intensity of a pixel underlies many background subtraction techniques [1], [2].
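The Gaussian pixel model can be made concrete with a short sketch. The Python below is illustrative only and is not part of the project code: it estimates µ and σ² for every pixel from a few frames of a static scene and flags a pixel as foreground when it deviates from µ by more than k standard deviations. The function names, the factor k = 3 and the variance floor `min_var` are assumptions chosen for this example.

```python
from statistics import mean, pvariance


def fit_background_model(frames):
    """Estimate the per-pixel mean and variance from frames of a static scene.

    frames: list of equally sized grayscale images (lists of rows of ints).
    Returns (mu, var), two images with the same size as the input frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    mu = [[0.0] * width for _ in range(height)]
    var = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            samples = [frame[y][x] for frame in frames]
            mu[y][x] = mean(samples)
            var[y][x] = pvariance(samples)
    return mu, var


def is_foreground(pixel, mu, var, k=3.0, min_var=1.0):
    """Return True when the pixel lies more than k sigma away from the mean.

    min_var is an assumed floor that keeps sigma from collapsing to zero
    when the training frames happen to be identical at this pixel.
    """
    sigma = max(var, min_var) ** 0.5
    return abs(pixel - mu) > k * sigma
```

With training values 10, 12 and 11 at one pixel, a new value of 12 stays within the noise band, while a value of 60 is reported as foreground.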

 

Many video segmentation algorithms use change detection as their primary segmentation criterion. The position and shape of the moving object are detected from the difference between two consecutive frames.

 

Several techniques, such as [1], combine motion detection by frame differencing with background subtraction. In this project, we assume that we have an initial image of the empty background, that is, the scene without any moving objects. This assumption simplifies our algorithm by removing the background registration process. It is not a restrictive assumption, since all static cameras should be initialized and calibrated for better performance. A real-world system can therefore work on this basis as long as it updates its initial background image to follow illumination changes.
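The report does not specify how the background image would be updated under illumination changes. One common option, shown here purely as an assumption, is a running average that blends the current frame into the background only at pixels where no object was detected. The function name `update_background` and the blending factor `alpha` are hypothetical.

```python
def update_background(background, frame, foreground_mask, alpha=0.05):
    """Blend the current frame into the background at non-object pixels.

    background, frame: grayscale images as lists of rows of numbers.
    foreground_mask:   same size, True where an object was detected.
    alpha:             assumed blending factor; larger values adapt faster.
    """
    updated = []
    for bg_row, fr_row, mask_row in zip(background, frame, foreground_mask):
        updated.append([
            bg if is_fg else (1.0 - alpha) * bg + alpha * fr
            for bg, fr, is_fg in zip(bg_row, fr_row, mask_row)
        ])
    return updated
```

Pixels covered by an object keep their old background value, so a person who stands still is not absorbed into the background by this update.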

 

The system, whose block diagram is shown in Figure 2.2, takes advantage of two different sources of information. By using the frame difference, it detects motion in the scene. A good surveillance application should also be able to track suspicious objects even when they stop, so by using the background difference, the algorithm can detect static objects in the scene.
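How the two sources of information complement each other can be sketched as a per-pixel decision. The rule below is an assumption made for illustration, not the exact logic of the block diagram: a pixel that changed since the previous frame is labeled as moving, a pixel that only differs from the background is labeled as a static object, and everything else is background. The function name and the label strings are hypothetical.

```python
def classify_pixels(motion_mask, background_mask):
    """Label each pixel from two boolean masks of the same size.

    motion_mask:     True where the current frame differs from the previous one.
    background_mask: True where the current frame differs from the background.
    """
    labels = []
    for motion_row, background_row in zip(motion_mask, background_mask):
        row = []
        for moving, differs_from_background in zip(motion_row, background_row):
            if moving:
                row.append("moving")
            elif differs_from_background:
                row.append("static")
            else:
                row.append("background")
        labels.append(row)
    return labels
```

Note that frame differencing also responds at the position the object has just left, so a practical system would probably check the background mask before accepting a "moving" pixel as part of the object.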

 

 

[Figure: system block diagram with inputs Current Frame and Previous Frame]

Figure 2.2

 

 

After detecting the moving object, we will check whether it is a face, since the project aims at tracking human faces. In this first part, we deal with the detection of moving and static objects, and leave verifying whether the detected object is a human face to the next part.

 

 

 

3.  Setup & Tools

For the current system we have made several assumptions. First, we focus on portrait-style speaker scenes, where a person stands near the camera and only the upper part of the body is in view. Also, to simplify the problem, we assume that the environment is well illuminated and that the light is directed from behind the camera. This is in fact mandatory, since we are using webcams, which add a great amount of noise to the images they capture. Figure 3.1 below shows a scene of the type described.

Figure 3.1

We used the Microsoft Visual C++ programming environment to write Windows executables. The webcams are a D-Link C300 and a MediaForte PC Vision 300, connected to the PC over USB. We use the Microsoft Video for Windows application interface to reach the camera drivers. The Microsoft Vision SDK, a C++ class library that wraps Video for Windows, is used to grab continuous images from the cameras. Figure 3.2 shows the user interface of the developed program, and Figure 3.3 shows a screenshot of the development environment.

Figure 3.2

The program can save the current image as a bitmap using the button labeled ‘S’, and it switches the display between the processed image and the original image using the toolbar button labeled ‘T’. The program displays timing information obtained from the high-precision multimedia timer available in the Windows operating system. The grabbed frames per second and the processed and displayed frames per second are reported separately. They differ because the program uses two threads, one for image capture and the other for processing. If the processing operations are time-consuming, the displayed frames per second decrease. We can capture 24 fps from the webcams we are using, but this value changes with time and camera characteristics; we sometimes observe 26 fps. Figure 3.4 shows the working environment and the camera we use.
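The two frame rates reported by the program can be understood through a small sketch. The class below is hypothetical and written in Python for brevity, whereas the project itself is a Visual C++ program: it counts the frames seen during the last second, and one instance per thread yields the separate capture and display rates.

```python
from collections import deque


class FrameRateCounter:
    """Frames per second measured over a sliding window of timestamps."""

    def __init__(self, window_seconds=1.0):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def tick(self, now):
        """Record one frame at time `now` (in seconds) and return the rate."""
        self.timestamps.append(now)
        # Drop frames that are a full window old or older.
        while now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        return len(self.timestamps) / self.window_seconds
```

In a live program the timestamps would come from a high-resolution clock such as `time.perf_counter()`. If the capture thread calls `tick` every 0.25 s and the processing thread only every 0.5 s, the first counter settles at 4 fps and the second at 2 fps, which mirrors the gap between grabbed and displayed frame rates described above.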

Figure 3.3

Figure 3.4

4. Processing Methods

 

We have mainly used the background subtraction algorithm and thresholding described in [1]. However, for this first step, we assume that the background image has been registered initially, so we do not need to perform background registration. Since we also assume that the illumination is strong, thresholding with proper values gives the required results.

First, we compute the absolute difference between subsequent grabbed frames. Then, we threshold the difference images to remove noise. A sample is shown in Figure 3.5, where the original image and the difference image at that instant are shown. To suppress the shadow, the threshold value is chosen high. We can see the edges of the moving object, in this case a moving face. Next, we compute the difference between the grabbed frame and the background image saved initially. This gives us the regions that probably belong to the object. Figure 3.6 shows a sample of the difference between the current frame and the initial background image. Note that our algorithm has also detected the shadow of the moving head. Figure 3.7 shows another sequence with a face and its shadow.

 

Figure 3.5

 

Figure 3.6

Figure 3.7
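The frame-difference and background-difference steps described in this section can be sketched as follows. This is a minimal illustration on grayscale images stored as lists of rows, not the project's C++ code. The function names and both threshold levels are assumptions, since the report does not state the values used.

```python
def abs_diff(image_a, image_b):
    """Per-pixel absolute difference of two equally sized grayscale images."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]


def threshold(image, level):
    """Boolean mask that is True where the pixel value exceeds the level."""
    return [[value > level for value in row] for row in image]


def detect(current, previous, background, motion_level=40, background_level=40):
    """Return (motion_mask, object_mask) for the current frame.

    motion_mask: pixels that changed since the previous frame.
    object_mask: pixels that differ from the stored background image.
    Both levels are assumed values that would be tuned to the camera noise.
    """
    motion_mask = threshold(abs_diff(current, previous), motion_level)
    object_mask = threshold(abs_diff(current, background), background_level)
    return motion_mask, object_mask
```

For a bright object that moves one pixel to the right, the motion mask marks both its old and its new position, while the object mask marks only the new one. Small differences caused by camera noise fall below both levels and are ignored.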

 

To eliminate the shadow, we have implemented absolute thresholding, after which the image is inverted. The result can be seen in Figure 3.8. The algorithm now runs at 24-25 fps and displays the face in real time.

 

                                               Figure 3.8
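The report gives neither the threshold value nor the exact order of operations for the shadow removal step, so the sketch below is only one plausible reading. It assumes that a shadow changes the pixel intensity less than the face does, applies a fixed high threshold to the background difference so that the weaker shadow response is discarded, and then inverts the binary image for display. The function name and the level of 100 are hypothetical.

```python
def threshold_and_invert(difference, level=100):
    """Binarize a difference image with a fixed level, then invert it.

    difference: grayscale difference image (lists of rows, values 0..255).
    Pixels above the level become 0 (dark) and all others become 255 (white)
    in the returned image, i.e. the object appears dark on a white field.
    """
    inverted = []
    for row in difference:
        binary_row = [255 if value > level else 0 for value in row]
        inverted.append([255 - value for value in binary_row])
    return inverted
```

With difference values of 5 (noise), 40 (shadow) and 180 (face), only the last pixel survives the threshold and appears dark in the inverted output.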

 

 

5. Future Work

 

Thus far, we have mainly dealt with setting up the hardware and software. After creating the working environment, we used detection algorithms that extract moving objects from scenes with cluttered backgrounds.

 

Next, we plan to write the algorithms needed to localize the object in the scene and the artificial intelligence to track that moving object. Moreover, we will add a pattern matching algorithm to check whether the detected object is a human face. Currently, the system works at 22-24 frames per second, which can be considered real time. We will try to maintain that rate by optimizing upcoming additions to the algorithm, so as to preserve the real-time property of the system.

 

 

 

 

6. References

 

[1]. Shao-Yi Chien, Shyh-Yih Ma, Liang-Gee Chen, “Efficient Moving Object Segmentation Algorithm Using Background Subtraction”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 7, pages 577-585

 

[2]. Ahmed Elgammal, David Harwood, Larry Davis, “Non-parametric Model for Background Subtraction”, Computer Vision Lab., University of Maryland, College Park, MD 20742

 

[3]. Kenneth M. Dawson-Howe, “Active Surveillance Using Dynamic Background Subtraction”, Dept. of Computer Science, Trinity College, Dublin, Ireland

 

[4]. N. Paragios and G. Tziritas, “Detection and Location of Moving Objects Using Deterministic Relaxation Algorithms”, Institute of Computer Science - FORTH, and Department of Computer Science, University of Crete, P.O. Box 1470, Heraklion, Greece

 

[5]. Yaakov Tsaig and Amir Averbuch, “Automatic Segmentation of Moving Objects in Video Sequences: A Region Labelling Approach”, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 7, pages 597-612

 

[6]. Takashi Matsuyama and Norimichi Ukita, “Real-Time Multitarget Tracking by Cooperative Distributed Vision System”, Proceedings of the IEEE, Vol. 90, No. 7, pages 1136-1149