Abstract
Service robots must be able to robustly follow and interact with humans. In this paper, we propose a very fast multi-people tracking algorithm designed for mobile service robots. Our approach exploits RGB-D data and runs in real time at a very high frame rate on a standard laptop, without requiring a GPU implementation. It also features a novel depth-based sub-clustering method that allows people to be detected even within groups or when standing close to walls. Moreover, to limit drift and track ID switches, we propose an online learning appearance classifier based on a three-term joint likelihood. We compared the performance of our system with that of several state-of-the-art tracking algorithms on two public datasets, acquired with three static Kinects and a moving stereo pair, respectively. To validate the 3D accuracy of our system, we created a new dataset in which RGB-D data are acquired by a moving robot. We made this dataset publicly available: it is not only annotated by hand, but the ground-truth positions of the people and the robot are also recorded with a motion capture system, so that tracking accuracy and precision can be evaluated in 3D coordinates. Experimental results on these datasets show that, even without a GPU, our approach achieves state-of-the-art accuracy and superior speed.
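The three-term joint likelihood mentioned above combines complementary cues when associating new detections with existing tracks. The snippet below is a minimal sketch, assuming the three terms are a motion term based on the Mahalanobis distance from the track's predicted position, a color-appearance score from the online-learned classifier, and the people detector's confidence, combined log-linearly; the function name, data layout, and weights are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def joint_likelihood(detection, track, w_motion=1.0, w_color=1.0, w_conf=1.0):
    """Hypothetical three-term joint log-likelihood for detection-to-track
    association: motion + color appearance + detector confidence."""
    # (i) Motion term: Gaussian log-likelihood of the detected position given
    # the track's predicted position and innovation covariance S.
    innovation = detection["position"] - track["predicted_position"]
    S_inv = np.linalg.inv(track["innovation_covariance"])
    mahalanobis_sq = float(innovation @ S_inv @ innovation)
    log_motion = -0.5 * mahalanobis_sq

    # (ii) Appearance term: score in (0, 1] returned by an online-learned
    # classifier (e.g. online boosting on color features) for this detection.
    log_color = np.log(max(track["appearance_score"](detection["patch"]), 1e-6))

    # (iii) Detector confidence term: normalized confidence of the people detector.
    log_conf = np.log(max(detection["confidence"], 1e-6))

    # Weighted log-linear combination of the three terms.
    return w_motion * log_motion + w_color * log_color + w_conf * log_conf
```

In a tracker, such a score would be computed for every detection-track pair and fed to a data-association step (e.g. global nearest neighbor), with low-likelihood pairs rejected.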
Notes
Contained in Piotr Dollár's MATLAB toolbox: http://vision.ucsd.edu/~pdollar/toolbox.
Bayes++ - http://bayesclasses.sourceforge.net.
Both computers had 4GB DDR3 memory.
This is the resolution used for most of the tests reported in this paper.
References
Bajracharya, M., Moghaddam, B., Howard, A., Brennan, S., & Matthies, L. H. (2009). A fast stereo-based system for detecting and tracking pedestrians from a moving vehicle. International Journal of Robotics Research, 28(11–12), 1466–1485.
Basso, F., Munaro, M., Michieletto, S., Pagello, E., & Menegatti, E. (2012). In IAS-12 (pp. 265–276). Jeju Island, Korea.
Bellotto, N., & Hu, H. (2010). Computationally efficient solutions for tracking people with a mobile robot: An experimental evaluation of Bayesian filters. Autonomous Robots, 28, 425–438.
Bernardin, K., & Stiefelhagen, R. (2008). Evaluating multiple object tracking performance: The CLEAR MOT metrics. Journal of Image Video Processing, 2008, 1:1–1:10.
Bradski, G. (2000). The OpenCV Library. Dr. Dobb’s Journal of Software Tools.
Breitenstein, M. D., Reichlin, F., Leibe, B., Koller-Meier, E., & Gool, L. V. (2009). Robust tracking-by-detection using a detector confidence particle filter. 12th International Conference on Computer Vision, 1, 1515–1522.
Carballo, A., Ohya, A., & Yuta, S. (2011). Reliable people detection using range and intensity data from multiple layers of laser range finders on a mobile robot. International Journal of Social Robotics, 3(2), 167–186.
Choi, W., Pantofaru, C., & Savarese, S. (2011). Detecting and tracking people using an RGB-D camera via multiple detector fusion. ICCV Workshops, 2011, 1076–1083.
Choi, W., Pantofaru, C., & Savarese, S. (2012). A general framework for tracking multiple people from a moving camera. Pattern Analysis and Machine Intelligence (PAMI), 35(7), 1577–1591.
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. Computer Vision and Pattern Recognition, 1, 886–893.
Dollár, P., Wojek, C., Schiele, B., & Perona, P. (2009). Pedestrian detection: A benchmark. Computer Vision and Pattern Recognition, 2009, 304–311.
Ess, A., Leibe, B., Schindler, K., & Van Gool, L. (2009). Moving obstacle detection in highly dynamic scenes. International Conference on Robotics and Automation, 4451–4458.
Ess, A., Leibe, B., Schindler, K., & Van Gool, L. (2008). A mobile vision system for robust multi-person tracking. Computer Vision and Pattern Recognition, 2008, 1–8.
Everingham, M., Van Gool, L., Williams, C. K., Winn, J., & Zisserman, A. (2010). The PASCAL Visual Object Classes (VOC) challenge. International Journal of Computer Vision, 88, 303–338.
Felzenszwalb, P., Girshick, R., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part based models. Pattern Analysis and Machine Intelligence (PAMI), 32(9), 1627–1645.
Geiger, A., Lenz, P., & Urtasun, R. (2012). Are we ready for autonomous driving? The KITTI vision benchmark suite. In CVPR 2012 (pp. 3354–3361). Providence, USA.
Grabner, H., & Bischof, H. (2006). On-line boosting and vision. In CVPR, Vol. 1, pp. 260–267. IEEE Computer Society
Janoch, A., Karayev, S., Jia, Y., Barron, J., Fritz, M., Saenko, K., et al. (2011). A category-level 3-D object dataset: Putting the kinect to work. In ICCV workshop on consumer depth cameras in computer vision.
Kim, W., Yibing, W., Ovsiannikov, I., Lee, S., Park, Y., Chung, C., et al. (2012). A 1.5Mpixel RGBZ CMOS image sensor for simultaneous color and range image capture. In ISSCC 2012, San Francisco, USA, pp. 392–394.
Konstantinova, P., Udvarev, A., & Semerdjiev, T. (2003). A study of a target tracking algorithm using global nearest neighbor approach. In CompSysTec 2003: e-Learning, pp. 290–295. ACM
Koppula, H. S., Anand, A., Joachims, T., & Saxena, A. (2011). Semantic labeling of 3D point clouds for indoor scenes. Advances in Neural Information Processing Systems, 244–252.
Lai, K., Bo, L., Ren, X., & Fox, D. (2011). A large-scale hierarchical multi-view RGB-D object dataset. International Conference on Robotics and Automation, 2011, 1817–1824.
Luber, M., Spinello, L., & Arras, K. O. (2011). People tracking in RGB-D data with on-line boosted target models. Intelligent Robots and Systems, 2011, 3844–3849.
Martin, C., Schaffernicht, E., Scheidig, A., & Gross, H.-M. (2006). Multi-modal sensor fusion using a probabilistic aggregation scheme for people detection and tracking. Robotics and Autonomous Systems, 54(9), 721–728.
Mitzel, D., & Leibe, B. (2011). Real-time multi-person tracking with detector assisted structure propagation. ICCV Workshops, 2011, 974–981.
Mozos, O., Kurazume, R., & Hasegawa, T. (2010). Multi-part people detection using 2D range data. International Journal of Social Robotics, 2, 31–40.
Munaro, M., Basso, F., & Menegatti, E. (2012). Tracking people within groups with RGB-D data. In IROS 2012 (pp. 2101–2107). Algarve, Portugal.
Munaro, M., Basso, F., Michieletto, S., Pagello, E., & Menegatti, E. (2013). A software architecture for RGB-D people tracking based on ros framework for a mobile robot. Frontiers of Intelligent Autonomous Systems, 466, 53–68.
Navarro-Serment, L. E., Mertz, C., & Hebert, M. (2009). Pedestrian detection and tracking using three-dimensional ladar data. The International Journal of Robotics Research, 103–112.
Pandey, G., McBride, J. R., & Eustice, R. M. (2011). Ford campus vision and lidar data set. International Journal of Robotics Research, 30(13), 1543–1552.
Pantofaru, C. (2010). The Moving People, Moving Platform Dataset. http://bags.willowgarage.com/downloads/people_dataset/.
Quigley, M., Gerkey, B., Conley, K., Faust, J., Foote, T., Leibs, J., et al. (2009). ROS: An open-source robot operating system. In ICRA Workshop on Open Source Software.
Rusu, R. B., & Cousins, S. (2011). 3D is here: Point Cloud Library (PCL). In ICRA 2011, Shanghai, China, May 9–13, pp. 1–4.
Satake, J., & Miura, J. (2009). Robust stereo-based person detection and tracking for a person following robot. Workshop on people detection and tracking (ICRA 2009).
Silberman, N., & Fergus, R. (2011). Indoor scene segmentation using a structured light sensor. In ICCV 2011 Workshop on 3D Representation and Recognition, pp. 601–608.
Spinello, L., Arras, K. O., Triebel, R., & Siegwart, R. (2010). A layered approach to people detection in 3D range data. In AAAI’10, Physically Grounded AI (PGAI) Track, Atlanta, USA.
Spinello, L., Luber, M., & Arras, K. O. (2011). Tracking people in 3D using a bottom-up top-down people detector. In ICRA 2011 (pp. 1304–1310). Shanghai, China.
Spinello, L., & Arras, K. O. (2011). People detection in RGB-D data. Intelligent Robots and Systems, 2011, 3838–3843.
Sturm, J., Engelhard, N., Endres, F., Burgard, W., & Cremers, D. (2012). A benchmark for the evaluation of RGB-D SLAM systems. Intelligent Robots and Systems, 2012, 573–580.
Sung, J., Ponce, C., Selman, B., & Saxena, A. (2012). Unstructured human activity detection from RGBD images. IEEE International Conference on Robotics and Automation, 2012, 842–849.
Xing, J., Ai, H., & Lao, S. (2009). Multi-object tracking through occlusions by local tracklets filtering and global tracklets association with detection responses. Computer Vision and Pattern Recognition, 1200–1207.
Zhang, L., Li, Y., & Nevatia, R. (2008). Global data association for multi-object tracking using network flows. Computer Vision and Pattern Recognition, 1–8.
Zhang, H., & Parker, L. E. (2011). 4-dimensional local spatio-temporal features for human activity recognition. Intelligent Robots and Systems, 2011, 2044–2049.
Acknowledgments
We wish to thank the Bioengineering of Movement Laboratory of the University of Padova for providing the motion capture facility, and in particular Martina Negretto and Annamaria Guiotto for their help with the data acquisition, as well as all the people who took part in the KTP Dataset acquisition. We also wish to thank Filippo Basso and Stefano Michieletto, co-authors of the previous publications related to this work, and Mauro Antonello for his advice on the disparity computation for the ETH dataset.
Cite this article
Munaro, M., Menegatti, E. Fast RGB-D people tracking for service robots. Auton Robot 37, 227–242 (2014). https://doi.org/10.1007/s10514-014-9385-0