Abstract
The performance of pedestrian multiple-object tracking (MOT) based on the tracking-by-detection framework is highly sensitive to detection quality, and suffers in particular from missed or inaccurate detections caused by occlusion. Existing methods that aim to alleviate this problem still perform poorly in scenarios with frequent heavy occlusion. In this study, a novel online pedestrian MOT method is proposed for severely occluded targets. First, a regression network refines the predicted position of each target to obtain a precise bounding box and a visibility score. Based on the visibility scores and the overlaps between the refined bounding boxes, considered globally, heavily occluded targets are categorised into two types: (1) targets occluded by a non-pedestrian object and (2) targets occluded by other pedestrians. The two types are then handled differently, which reduces the number of false negatives (FNs) and false positives (FPs). Finally, to improve the precision of the prediction, a motion model that combines the Kalman filter with camera motion compensation is developed. Tracking results on three widely used pedestrian MOT benchmark datasets demonstrate the state-of-the-art performance of the proposed method.
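The two-way categorisation of occluded targets can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the decision rule, so the thresholds `vis_thresh` and `iou_thresh`, the function `classify_tracks`, and the rule itself (low visibility plus overlap with another track means occlusion by a pedestrian, low visibility without overlap means occlusion by a non-pedestrian object) are assumptions made for illustration only.

```python
from enum import Enum
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


class OcclusionType(Enum):
    VISIBLE = "visible"
    BY_OBJECT = "occluded_by_non_pedestrian"
    BY_PEDESTRIAN = "occluded_by_pedestrian"


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classify_tracks(
    boxes: List[Box],
    visibility: List[float],
    vis_thresh: float = 0.5,  # assumed value, not from the paper
    iou_thresh: float = 0.3,  # assumed value, not from the paper
) -> List[OcclusionType]:
    """Label each refined track box using all boxes in the frame."""
    labels = []
    for i, (box, vis) in enumerate(zip(boxes, visibility)):
        if vis >= vis_thresh:
            labels.append(OcclusionType.VISIBLE)
            continue
        # A low-visibility track that overlaps another track is assumed
        # to be hidden by that pedestrian; otherwise by a scene object.
        overlaps_other = any(
            iou(box, other) >= iou_thresh
            for j, other in enumerate(boxes)
            if j != i
        )
        labels.append(
            OcclusionType.BY_PEDESTRIAN if overlaps_other
            else OcclusionType.BY_OBJECT
        )
    return labels


boxes = [(0, 0, 10, 20), (2, 0, 12, 20), (100, 0, 110, 20)]
visibility = [0.9, 0.2, 0.1]
labels = classify_tracks(boxes, visibility)
```

In this toy frame the second box overlaps the first and has low visibility, so it is labelled as occluded by a pedestrian, while the isolated third box is labelled as occluded by a non-pedestrian object. A tracker could then treat the two groups differently, for example by keeping or suspending their predicted boxes, though the specific handling is described in the paper body rather than in the abstract.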
Acknowledgements
This study was supported by the Graduate Innovation Foundation of Jiangsu Province under Grant No. KYLX16_0781, the Natural Science Foundation of Jiangsu Province under Grant No. BK20181340, the 111 Project under Grant No. B12018, and the PAPD of Jiangsu Higher Education Institutions. We would like to thank Editage (www.editage.cn) for English language editing.
About this article
Cite this article
Yang, J., Ge, H., Yang, J. et al. Online Pedestrian Multiple-Object Tracking with Prediction Refinement and Track Classification. Neural Process Lett 54, 4893–4919 (2022). https://doi.org/10.1007/s11063-022-10840-7
DOI: https://doi.org/10.1007/s11063-022-10840-7