Abstract
We present a system that segments, detects, and tracks multiple people in a cluttered scene using multiple synchronized cameras located far from each other. The system improves on existing systems in several ways: (1) We do not assume that a foreground connected component belongs to only one object; rather, we segment the views taking into account color models for the objects and the background. This helps us not only to separate foreground regions belonging to different objects, but also to obtain better background regions than traditional background subtraction methods, because the segmentation also uses foreground color models. (2) The system is fully automatic and requires no manual input or initialization of any kind. (3) Instead of making decisions about object detection and tracking from a single view or camera pair, we collect evidence from each pair and combine it to reach a final decision. This yields better detection and tracking than traditional systems.
Several innovations help us tackle the problem. The first is a region-based stereo algorithm that can find 3D points inside an object if we know the regions belonging to the object in two views. No exact point matching is required. This is especially useful in wide-baseline camera systems, where exact point matching is very difficult due to self-occlusion and a substantial change in viewpoint. The second contribution is a scheme for setting priors for use in segmenting a view using Bayesian classification. The scheme, which assumes knowledge of the approximate shape and location of objects, dynamically assigns priors for different objects at each pixel so that occlusion information is encoded in the priors. The third contribution is a scheme for combining evidence gathered from different camera pairs using occlusion analysis so as to obtain a globally optimal detection and tracking of objects.
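The per-pixel Bayesian classification in the second contribution can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say which color model is used or how the priors are computed, so the axis-aligned Gaussian color models, the class labels, and the prior values below are all hypothetical.

```python
import math


def gaussian_likelihood(color, mean, var):
    """Likelihood of an RGB color under an axis-aligned Gaussian (assumed model)."""
    p = 1.0
    for c, m, v in zip(color, mean, var):
        p *= math.exp(-((c - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
    return p


def classify_pixel(color, models, priors):
    """Return (best_label, posteriors), with posterior proportional to prior * likelihood.

    models: label -> (mean_rgb, var_rgb); priors: label -> prior at this pixel.
    The priors are per pixel, so an occluding object can be favored where its
    expected silhouette falls (how they are derived is not modeled here).
    """
    scores = {
        label: priors[label] * gaussian_likelihood(color, mean, var)
        for label, (mean, var) in models.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        # No model explains the color numerically; fall back to the priors.
        prior_sum = sum(priors[label] for label in models)
        posteriors = {label: priors[label] / prior_sum for label in models}
    else:
        posteriors = {label: s / total for label, s in scores.items()}
    best = max(posteriors, key=posteriors.get)
    return best, posteriors


# Hypothetical color models: a gray background and two people in red and blue.
models = {
    "background": ((100, 100, 100), (400.0, 400.0, 400.0)),
    "person_a": ((200, 30, 30), (400.0, 400.0, 400.0)),
    "person_b": ((30, 30, 200), (400.0, 400.0, 400.0)),
}
# Hypothetical priors at a pixel where person_b is expected to occlude person_a.
priors = {"background": 0.1, "person_a": 0.1, "person_b": 0.8}

label, posteriors = classify_pixel((190, 40, 40), models, priors)
print(label)
```

In this sketch a strongly red pixel is still assigned to person_a even though the prior favors person_b, because the color likelihood dominates. When two objects have similar color models, the priors decide instead, which is where encoding occlusion information in them matters.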
The system has been tested with different densities of people in the scene, which helps us determine the number of cameras required for a particular density of people.
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mittal, A., Davis, L.S. (2002). M2Tracker: A Multi-View Approach to Segmenting and Tracking People in a Cluttered Scene Using Region-Based Stereo. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds) Computer Vision — ECCV 2002. ECCV 2002. Lecture Notes in Computer Science, vol 2350. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47969-4_2
DOI: https://doi.org/10.1007/3-540-47969-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43745-1
Online ISBN: 978-3-540-47969-7
eBook Packages: Springer Book Archive