Abstract
Multi-object tracking in complex scenarios remains challenging because objects move irregularly and look alike. Traditional methods often approximate an object's motion direction from bounding box information alone, which accumulates noise and causes incorrect associations. Moreover, without depth information, these methods can fail to distinguish foreground from background objects under the camera's perspective projection. To address these limitations, we propose Pose Intersection over Union (P-IoU), a method that predicts the true motion direction of objects by incorporating body pose information, specifically the motion of the human torso. Based on P-IoU, we propose PoseTracker, a novel approach that combines bounding box IoU and P-IoU during association to improve tracking performance. By exploiting the relative stability of the human torso and the confidence of keypoints, our method captures genuine motion cues and reduces identity switches caused by irregular movements. Experiments on the DanceTrack and MOT17 datasets demonstrate that PoseTracker outperforms existing methods. Our method highlights the importance of accurate motion prediction for data association in MOT and provides a new perspective on the challenges posed by irregular object motion.
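The abstract does not state how P-IoU is computed or how it is weighted against bounding box IoU, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes COCO-style keypoint ordering (shoulders at indices 5 and 6, hips at 11 and 12), a torso region taken as the axis-aligned box around confident torso keypoints, a hypothetical confidence threshold `min_conf`, and a hypothetical linear blending factor `weight`. The names `torso_box` and `combined_similarity` are illustrative.

```python
import numpy as np

# Assumption: COCO-style indices for the torso (shoulders 5, 6 and hips 11, 12).
TORSO_IDX = (5, 6, 11, 12)


def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def torso_box(keypoints, min_conf=0.3):
    """Axis-aligned box around confident torso keypoints, or None.

    keypoints: array of shape (K, 3) holding (x, y, confidence).
    min_conf is a hypothetical threshold, not a value from the paper.
    """
    pts = keypoints[list(TORSO_IDX)]
    pts = pts[pts[:, 2] >= min_conf]
    if len(pts) < 3:
        # Too few reliable keypoints to describe the torso.
        return None
    return (pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max())


def combined_similarity(track, det, weight=0.5, min_conf=0.3):
    """Blend box IoU with torso IoU; fall back to box IoU alone.

    The linear blend and its weight are assumptions made for illustration.
    """
    iou = box_iou(track["box"], det["box"])
    t_torso = torso_box(track["keypoints"], min_conf)
    d_torso = torso_box(det["keypoints"], min_conf)
    if t_torso is None or d_torso is None:
        return iou
    return (1.0 - weight) * iou + weight * box_iou(t_torso, d_torso)
```

In a tracker, a score like this would typically fill a track-by-detection similarity matrix that is then passed to an assignment step. Falling back to plain box IoU when keypoints are unreliable is a design choice of this sketch, motivated by the abstract's mention of keypoint confidence.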
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wu, X., Xu, J. (2024). P-IoU: Accurate Motion Prediction Based Data Association for Multi-object Tracking. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14451. Springer, Singapore. https://doi.org/10.1007/978-981-99-8073-4_37
DOI: https://doi.org/10.1007/978-981-99-8073-4_37
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8072-7
Online ISBN: 978-981-99-8073-4
eBook Packages: Computer Science, Computer Science (R0)