P-IoU: Accurate Motion Prediction Based Data Association for Multi-object Tracking

Xinya Wu &
Jinhua Xu

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14451)

Included in the following conference series:

International Conference on Neural Information Processing

Abstract

Multi-object tracking in complex scenarios remains a challenging task due to objects’ irregular motions and indistinguishable appearances. Traditional methods often approximate the motion direction of objects solely based on their bounding box information, leading to cumulative noise and incorrect association. Furthermore, the lack of depth information in these methods can result in failed discrimination between foreground and background objects due to the perspective projection of the camera. To address these limitations, we propose a Pose Intersection over Union (P-IoU) method to predict the true motion direction of objects by incorporating body pose information, specifically the motion of the human torso. Based on P-IoU, we propose PoseTracker, a novel approach that combines bounding box IoU and P-IoU effectively during association to improve tracking performance. Exploiting the relative stability of the human torso and the confidence of keypoints, our method effectively captures the genuine motion cues, reducing identity switches caused by irregular movements. Experiments on the DanceTrack and MOT17 datasets demonstrate that the proposed PoseTracker outperforms existing methods. Our method highlights the importance of accurate motion prediction of objects for data association in MOT and provides a new perspective for addressing the challenges posed by irregular object motion.
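
The abstract does not give the exact formulation of P-IoU, so the following is only a minimal sketch of the general idea of blending a bounding box IoU with a torso-based overlap score weighted by keypoint confidence. The torso keypoint layout (shoulders and hips), the names `torso_box` and `combined_similarity`, the `min_conf` threshold, and the `weight` parameter are all assumptions made for illustration; they are not taken from the paper, and the sketch omits the motion prediction step entirely.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Keypoint = Tuple[float, float, float]    # (x, y, confidence)


def iou(a: Box, b: Box) -> float:
    """Standard intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def torso_box(
    torso: Sequence[Keypoint], min_conf: float = 0.3
) -> Tuple[Optional[Box], float]:
    """Box around the confident torso keypoints and their mean confidence.

    Hypothetical helper: returns (None, 0.0) when fewer than two
    keypoints pass the (assumed) confidence threshold.
    """
    kept = [k for k in torso if k[2] >= min_conf]
    if len(kept) < 2:
        return None, 0.0
    xs = [k[0] for k in kept]
    ys = [k[1] for k in kept]
    box = (min(xs), min(ys), max(xs), max(ys))
    conf = sum(k[2] for k in kept) / len(kept)
    return box, conf


def combined_similarity(
    track_box: Box,
    det_box: Box,
    track_torso: Sequence[Keypoint],
    det_torso: Sequence[Keypoint],
    weight: float = 0.5,
) -> float:
    """Blend box IoU with torso-box IoU (an assumed stand-in for P-IoU).

    The torso term is scaled by the lower of the two mean keypoint
    confidences; without a usable torso the score is plain box IoU.
    """
    box_sim = iou(track_box, det_box)
    t_box, t_conf = torso_box(track_torso)
    d_box, d_conf = torso_box(det_torso)
    if t_box is None or d_box is None:
        return box_sim
    w = weight * min(t_conf, d_conf)
    return (1.0 - w) * box_sim + w * iou(t_box, d_box)
```

In a tracker, a score like this would typically fill a track-by-detection similarity matrix that is then passed to an assignment step. Scaling the torso term by keypoint confidence is one simple way to keep unreliable pose estimates from dominating the match.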

Author information

Authors and Affiliations

School of Computer Science and Technology, East China Normal University, Shanghai, China
Xinya Wu & Jinhua Xu

Corresponding author

Correspondence to Jinhua Xu.

Editor information

Editors and Affiliations

Central South University, Changsha, China
Biao Luo
Chinese Academy of Sciences, Beijing, China
Long Cheng
Zhejiang University, Hangzhou, China
Zheng-Guang Wu
Guangdong University of Technology, Guangzhou, China
Hongyi Li
UNSW Sydney, Sydney, NSW, Australia
Chaojie Li

About this paper

Cite this paper

Wu, X., Xu, J. (2024). P-IoU: Accurate Motion Prediction Based Data Association for Multi-object Tracking. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14451. Springer, Singapore. https://doi.org/10.1007/978-981-99-8073-4_37

DOI: https://doi.org/10.1007/978-981-99-8073-4_37
Published: 15 November 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8072-7
Online ISBN: 978-981-99-8073-4
eBook Packages: Computer Science, Computer Science (R0)
