DOI: 10.1007/978-3-031-54605-1_29

Article

COOLer: Class-Incremental Learning for Appearance-Based Multiple Object Tracking

Published: 19 September 2023

Abstract

Continual learning allows a model to learn multiple tasks sequentially while retaining old knowledge without access to the training data of preceding tasks. This paper extends the scope of continual learning research to class-incremental learning for multiple object tracking (MOT), which is desirable to accommodate the continuously evolving needs of autonomous systems. Previous solutions for continual learning of object detectors do not address the data association stage of appearance-based trackers, leading to catastrophic forgetting of previous classes’ re-identification features. We introduce COOLer, a COntrastive- and cOntinual-Learning-based tracker, which incrementally learns to track new categories while preserving past knowledge by training on a combination of currently available ground-truth labels and pseudo-labels generated by the past tracker. To further encourage the disentanglement of instance representations, we introduce a novel contrastive class-incremental instance representation learning technique. Finally, we propose a practical evaluation protocol for continual learning for MOT and conduct experiments on the BDD100K and SHIFT datasets. Experimental results demonstrate that COOLer continually learns while effectively addressing catastrophic forgetting of both tracking and detection. The project page is available at https://www.vis.xyz/pub/cooler.
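The contrastive instance representation learning mentioned in the abstract builds on the general supervised-contrastive idea: embeddings of detections belonging to the same instance identity are pulled together, while embeddings of other identities are pushed apart. The following is a minimal numpy sketch of a generic SupCon-style loss over instance embeddings; the function name, shapes, and temperature value are illustrative, and COOLer's actual class-incremental formulation differs in its details (see the paper and references [3, 4, 21]).

```python
import numpy as np

def _logsumexp(x, axis):
    # Numerically stable log-sum-exp; -inf entries contribute zero mass.
    m = np.max(x, axis=axis, keepdims=True)
    return m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))

def supervised_contrastive_loss(embeddings, ids, temperature=0.1):
    """Generic SupCon-style loss over instance embeddings.

    embeddings: (N, D) array of per-detection appearance embeddings.
    ids:        length-N instance-identity labels.
    Returns the mean negative log-probability of positive pairs.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature            # pairwise scaled cosine similarity
    n = len(ids)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)  # exclude each anchor from its own softmax
    log_prob = sim - _logsumexp(sim, axis=1)
    ids = np.asarray(ids)
    pos = (ids[:, None] == ids[None, :]) & ~self_mask
    has_pos = pos.any(axis=1)
    # Mean log-probability of the positives for each anchor that has any.
    mean_pos = np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return float(-mean_pos[has_pos].mean())
```

With well-separated embeddings that agree with the identity labels the loss approaches zero, while mismatched labels drive it up, which is the property a tracker's association head relies on when matching detections across frames.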

References

[1] Aharon, N., Orfaig, R., Bobrovsky, B.Z.: BoT-SORT: robust associations multi-pedestrian tracking. arXiv preprint arXiv:2206.14651 (2022)
[2] Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J. Image Video Process. 2008, 1–10 (2008)
[3] Cha, H., Lee, J., Shin, J.: Co2L: contrastive continual learning. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9516–9525 (2021)
[4] Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: International Conference on Machine Learning, pp. 1597–1607. PMLR (2020)
[5] Chen, Z., Liu, B.: Lifelong machine learning. Synth. Lect. Artif. Intell. Mach. Learn. 12(3), 1–207 (2018)
[6] Dave, A., Khurana, T., Tokmakov, P., Schmid, C., Ramanan, D.: TAO: a large-scale benchmark for tracking any object. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.M. (eds.) Computer Vision – ECCV 2020, pp. 436–454. Springer, Cham (2020)
[7] Dendorfer, P., et al.: MOT20: a benchmark for multi object tracking in crowded scenes. arXiv preprint arXiv:2003.09003 (2020)
[8] Du, Y., et al.: StrongSORT: make DeepSORT great again. IEEE Trans. Multimedia 25, 8725–8737 (2023)
[9] Fan, H., Zheng, L., Yan, C., Yang, Y.: Unsupervised person re-identification: clustering and fine-tuning. ACM Trans. Multimedia Comput. Commun. Appl. 14(4), 1–18 (2018)
[10] He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9729–9738 (2020)
[11] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
[12] Joseph, K., Khan, S., Khan, F.S., Balasubramanian, V.N.: Towards open world object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5830–5840 (2021)
[13] Karthik, S., Prabhu, A., Gandhi, V.: Simple unsupervised multi-object tracking. arXiv preprint arXiv:2006.02609 (2020)
[14] Kirkpatrick, J., et al.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 114(13), 3521–3526 (2017)
[15] Li, M., Zhu, X., Gong, S.: Unsupervised tracklet person re-identification. IEEE Trans. Pattern Anal. Mach. Intell. 42(7), 1770–1782 (2019)
[16] Li, Z., Hoiem, D.: Learning without forgetting. IEEE Trans. Pattern Anal. Mach. Intell. 40(12), 2935–2947 (2017)
[17] Liu, Y., Schiele, B., Vedaldi, A., Rupprecht, C.: Continual detection transformer for incremental object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 23799–23808 (2023)
[18] Mai, Z., Li, R., Kim, H., Sanner, S.: Supervised contrastive replay: revisiting the nearest class mean classifier in online class-incremental continual learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3589–3599 (2021)
[19] Mallya, A., Lazebnik, S.: PackNet: adding multiple tasks to a single network by iterative pruning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7765–7773 (2018)
[20] McCloskey, M., Cohen, N.J.: Catastrophic interference in connectionist networks: the sequential learning problem. In: Psychology of Learning and Motivation, vol. 24, pp. 109–165. Elsevier (1989)
[21] Pang, J., et al.: Quasi-dense similarity learning for multiple object tracking. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 164–173 (2021)
[22] Peng, C., Zhao, K., Lovell, B.C.: Faster ILOD: incremental learning for object detectors based on Faster R-CNN. Pattern Recogn. Lett. 140, 109–115 (2020)
[23] Ramanan, D., Forsyth, D.A.: Finding and tracking people from the bottom up. In: 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, p. II. IEEE (2003)
[24] Rebuffi, S.A., Kolesnikov, A., Sperl, G., Lampert, C.H.: iCaRL: incremental classifier and representation learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2001–2010 (2017)
[25] Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015)
[26] Rusu, A.A., et al.: Progressive neural networks. arXiv preprint arXiv:1606.04671 (2016)
[27] Segu, M., Schiele, B., Yu, F.: DARTH: holistic test-time adaptation for multiple object tracking. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)
[28] Shmelkov, K., Schmid, C., Alahari, K.: Incremental learning of object detectors without catastrophic forgetting. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 3400–3409 (2017)
[29] Sun, T., et al.: SHIFT: a synthetic driving dataset for continuous multi-task domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21371–21382 (2022)
[30] Wang, Y.H.: SMILEtrack: similarity learning for multiple object tracking. arXiv preprint arXiv:2211.08824 (2022)
[31] Wojke, N., Bewley, A., Paulus, D.: Simple online and realtime tracking with a deep association metric. In: 2017 IEEE International Conference on Image Processing, pp. 3645–3649. IEEE (2017)
[32] Wu, G., Zhu, X., Gong, S.: Tracklet self-supervised learning for unsupervised person re-identification. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 12362–12369 (2020)
[33] Xie, J., Yan, S., He, X.: General incremental learning with domain-aware categorical representations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 14351–14360 (2022)
[34] Yu, F., et al.: BDD100K: a diverse driving dataset for heterogeneous multitask learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2636–2645 (2020)
[35] Zhang, Y., et al.: ByteTrack: multi-object tracking by associating every detection box. In: European Conference on Computer Vision, pp. 1–21. Springer (2022)
[36] Zheng, K., Chen, C.: Contrast R-CNN for continual learning in object detection. arXiv preprint arXiv:2108.04224 (2021)
[37] Zhou, W., Chang, S., Sosa, N., Hamann, H., Cox, D.: Lifelong object detection. arXiv preprint arXiv:2009.01129 (2020)
[38] Zhu, X., Su, W., Lu, L., Li, B., Wang, X., Dai, J.: Deformable DETR: deformable transformers for end-to-end object detection. arXiv preprint arXiv:2010.04159 (2020)

Cited By

  • SLAck: Semantic, Location, and Appearance Aware Open-Vocabulary Tracking. In: Computer Vision – ECCV 2024, pp. 1–18. DOI: 10.1007/978-3-031-73383-3_1 (published 29 September 2024)


Published In

Pattern Recognition: 45th DAGM German Conference, DAGM GCPR 2023, Heidelberg, Germany, September 19–22, 2023, Proceedings
Sep 2023
647 pages
ISBN:978-3-031-54604-4
DOI:10.1007/978-3-031-54605-1
Editors: Ullrich Köthe, Carsten Rother

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Continual learning
  2. Multiple object tracking
  3. Re-Identification

