DOI: 10.1145/3581783.3612144

Slowfast Diversity-aware Prototype Learning for Egocentric Action Recognition

Published: 27 October 2023

Abstract

Egocentric Action Recognition (EAR) requires recognizing both the interacting objects (noun) and the motion (verb) against cluttered backgrounds with distracting objects. To capture interacting objects, traditional approaches rely heavily on costly object annotations or detectors, while a few works heuristically enumerate fixed sets of verb-constrained prototypes to roughly exclude the background. For capturing motion, the inherent variation of motion duration among egocentric videos of different lengths is largely ignored. To this end, we propose a novel Slowfast Diversity-aware Prototype learning (SDP) approach that effectively captures interacting objects by learning compact yet diverse prototypes, and adaptively captures motion in both long and short videos. Specifically, we present a new Part-to-Prototype (P2P) scheme that learns prototypes covering the interacting objects directly from raw videos by refining semantic information from the part level to the prototype level. Moreover, to adaptively capture motion, we design a new Slow-Fast Context (SFC) mechanism that explores Up/Down augmentations of the prototype representation at the semantic level, strengthening transient dynamic information in short videos and eliminating redundant dynamic information in long videos; these cues are further fine-complemented via slow- and fast-aware attentions. Extensive experiments demonstrate that SDP outperforms state-of-the-art methods on two large-scale egocentric video benchmarks, i.e., EPIC-KITCHENS-100 and EGTEA.
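
For readers who want a concrete picture of the mechanism sketched above, the following PyTorch-style snippet is a minimal, speculative sketch of the general idea: learnable prototype queries attend over frame-part features (loosely mirroring the part-to-prototype refinement), and clip context is pooled at a slow and a fast temporal rate before fusion (loosely mirroring the slow-fast context). All module names, feature shapes, and pooling rates are our own assumptions for illustration; this is not the authors' P2P or SFC implementation.

    # Speculative sketch (not the authors' code): learnable prototypes attend over
    # frame-part features, then slow/fast temporal contexts are pooled and fused.
    import torch
    import torch.nn as nn


    class PrototypeSlowFastSketch(nn.Module):
        def __init__(self, dim=256, num_prototypes=8, num_heads=4):
            super().__init__()
            # Learnable prototype queries (stand-in for the prototype level of P2P).
            self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
            self.part_to_proto = nn.MultiheadAttention(dim, num_heads, batch_first=True)
            # Two temporal rates (stand-in for slow/fast context): coarse vs. fine pooling.
            self.slow_pool = nn.AdaptiveAvgPool1d(2)   # coarse, long-range context
            self.fast_pool = nn.AdaptiveAvgPool1d(8)   # fine, transient context
            self.fuse = nn.Linear(2 * dim, dim)

        def forward(self, part_feats):
            # part_feats: (B, T, N_parts, dim) per-frame part features (assumed input).
            b, t, n, d = part_feats.shape
            q = self.prototypes.unsqueeze(0).expand(b * t, -1, -1)   # (B*T, P, D)
            kv = part_feats.reshape(b * t, n, d)
            proto, _ = self.part_to_proto(q, kv, kv)                 # (B*T, P, D)
            proto = proto.reshape(b, t, -1, d).mean(dim=2)           # (B, T, D)
            # Pool the prototype sequence at two temporal rates, then fuse.
            seq = proto.transpose(1, 2)                              # (B, D, T)
            slow = self.slow_pool(seq).mean(dim=-1, keepdim=True)    # (B, D, 1)
            fast = self.fast_pool(seq).mean(dim=-1, keepdim=True)    # (B, D, 1)
            ctx = torch.cat([slow, fast], dim=1).squeeze(-1)         # (B, 2D)
            return self.fuse(ctx)                                    # (B, D) clip embedding


    if __name__ == "__main__":
        x = torch.randn(2, 16, 49, 256)   # 2 clips, 16 frames, 49 parts, 256-dim features
        print(PrototypeSlowFastSketch()(x).shape)   # torch.Size([2, 256])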



    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 October 2023


    Author Tags

    1. egocentric action recognition
    2. prototype learning
    3. video understanding

    Qualifiers

    • Research-article

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
