research-article

TS-ILM: Class Incremental Learning for Online Action Detection

Authors:

Nyima Tashi

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

Pages 1158 - 1167

https://doi.org/10.1145/3664647.3681456

Published: 28 October 2024

Abstract

Online action detection aims to identify ongoing actions in untrimmed video streams and has extensive real-world applications. In practice, however, video frames arrive sequentially over time and new action categories continually emerge, giving rise to catastrophic forgetting, a problem that remains inadequately explored in this setting. In video understanding, catastrophic forgetting is generally addressed through class-incremental learning. Online action detection, however, relies solely on historical observations and therefore demands stronger temporal modeling from class-incremental learning methods. In this paper, we formulate this task as Class-Incremental Online Action Detection (CIOAD) and propose a novel framework, TS-ILM, to address it. Specifically, TS-ILM consists of two components: a task-level temporal pattern extractor and a temporal-sensitive exemplar selector. The former extracts and stores the temporal patterns of actions in different tasks, allowing the data to be observed comprehensively at the temporal level before it is fed into the backbone. The latter selects a set of frames with the highest causal relevance and the least information redundancy for subsequent replay, enabling the model to learn the temporal information of previous tasks more effectively. We benchmark our approach against state-of-the-art (SoTA) class-incremental learning methods from the image and video domains on the THUMOS'14 and TVSeries datasets, where it outperforms these previous approaches.
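
To make the idea of replay-oriented exemplar selection more concrete, the following is a minimal sketch of a generic greedy "high relevance, low redundancy" frame selector. It is not the selector proposed in the paper: the abstract does not specify how causal relevance or redundancy are measured, so the sketch assumes a per-frame relevance score is already available and uses cosine similarity between frame features as a stand-in for redundancy. The function name `select_exemplars` and the parameter `redundancy_weight` are hypothetical.

```python
import numpy as np


def select_exemplars(features, relevance, k, redundancy_weight=1.0):
    """Greedily pick up to k frame indices with high relevance and low mutual similarity.

    Assumptions (not taken from the paper): `relevance` holds one precomputed
    score per frame, and redundancy is approximated by cosine similarity
    between frame feature vectors.
    """
    features = np.asarray(features, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    n = features.shape[0]
    if relevance.shape != (n,):
        raise ValueError("relevance must have one score per frame")
    k = min(k, n)

    # Normalise rows so that dot products are cosine similarities.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    similarity = unit @ unit.T

    selected = []
    available = np.ones(n, dtype=bool)
    # max_sim[i] is the highest similarity between frame i and any selected
    # frame; starting at zero means negative similarity is never rewarded.
    max_sim = np.zeros(n)
    for _ in range(k):
        score = relevance - redundancy_weight * max_sim
        score[~available] = -np.inf  # never pick the same frame twice
        best = int(np.argmax(score))
        selected.append(best)
        available[best] = False
        max_sim = np.maximum(max_sim, similarity[best])

    # Return indices in temporal order so the replayed frames keep their sequence.
    return sorted(selected)


# Toy usage: frames 0 and 1 are identical, as are frames 2 and 3.
frames = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
scores = np.array([0.9, 0.8, 0.5, 0.4])
print(select_exemplars(frames, scores, k=2))
```

In this toy case the selector skips the near-duplicate of the first frame in favour of a less relevant but distinct one; setting `redundancy_weight=0.0` reduces it to a plain top-k by relevance.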

Index Terms

Computing methodologies → Artificial intelligence → Computer vision → Computer vision tasks → Activity recognition and understanding

Published In

MM '24: Proceedings of the 32nd ACM International Conference on Multimedia

October 2024

11719 pages

ISBN:9798400706868

DOI:10.1145/3664647

General Chairs:
Jianfei Cai (Monash University, Australia)
Mohan Kankanhalli (NUS, Singapore)
Balakrishnan Prabhakaran (UT Dallas, USA)
Susanne Boll (University of Oldenburg, Germany)

Program Chairs:
Ramanathan Subramanian (University of Canberra & IIT Ropar, Australia)
Liang Zheng (Australian National University, Australia)
Vivek K. Singh (Rutgers University, USA)
Pablo Cesar (Centrum Wiskunde & Informatica, Netherlands)
Lexing Xie (Australian National University, Australia)
Dong Xu (University of Hong Kong, Hong Kong)

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

NNSFC&CAAC
Natural Science Foundation of Sichuan, China
National Natural Science Foundation of China

Conference

MM '24: The 32nd ACM International Conference on Multimedia

October 28 - November 1, 2024

Melbourne VIC, Australia

Acceptance Rates

MM '24 Paper Acceptance Rate: 1,150 of 4,385 submissions, 26%

Overall Acceptance Rate: 2,145 of 8,556 submissions, 25%
