DOI: 10.1145/3503161.3548238

Progressive Cross-modal Knowledge Distillation for Human Action Recognition

Published: 10 October 2022

Abstract

Wearable sensor-based Human Action Recognition (HAR) has achieved remarkable success in recent years. However, the accuracy of wearable sensor-based HAR still lags far behind that of systems built on visual modalities (i.e., RGB video, skeleton, and depth). Diverse input modalities can provide complementary cues and thus improve HAR accuracy, yet how to exploit multi-modal data for wearable sensor-based HAR has rarely been explored. Currently, wearable devices such as smartwatches can capture only a limited range of non-visual modality data. This hinders multi-modal HAR, since such devices cannot simultaneously use both visual and non-visual modality data. Another major challenge lies in efficiently utilizing multi-modal data on wearable devices, given their limited computational resources. In this work, we propose a novel Progressive Skeleton-to-sensor Knowledge Distillation (PSKD) model that uses only time-series data, i.e., accelerometer data, from a smartwatch to solve the wearable sensor-based HAR problem. Specifically, we construct multiple teacher models using data from both the teacher (human skeleton sequence) and student (time-series accelerometer data) modalities. In addition, we propose an effective progressive learning scheme to eliminate the performance gap between the teacher and student models. We also design a novel loss function, Adaptive-Confidence Semantic (ACS), which allows the student model to adaptively select either one of the teacher models or the ground-truth label to mimic. To demonstrate the effectiveness of the proposed PSKD method, we conduct extensive experiments on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that PSKD is competitive with previous mono-sensor-based HAR methods.
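
The core mechanism the abstract describes, distilling from several teacher models while letting the student fall back to the ground-truth label when no teacher is confident, can be sketched compactly in PyTorch. The snippet below is a minimal illustration under stated assumptions, not the paper's actual PSKD architecture or ACS loss: the AccelStudent network, the fixed 0.5 confidence threshold, and the hard per-sample gating are all hypothetical stand-ins for the adaptive selection the paper proposes.

# Minimal PyTorch sketch of confidence-gated multi-teacher distillation.
# Hypothetical names throughout; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AccelStudent(nn.Module):
    """Toy 1D-CNN student for windows of tri-axial accelerometer data."""
    def __init__(self, num_classes: int, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) -> (batch, num_classes)
        return self.classifier(self.features(x).squeeze(-1))

def adaptive_confidence_kd_loss(student_logits, teacher_logits_list, labels,
                                temperature: float = 4.0,
                                conf_threshold: float = 0.5):
    """Per sample: mimic the most confident teacher, else the hard label.

    The threshold-based gate is an assumption for illustration; the
    paper's ACS loss performs this teacher-vs-label selection adaptively.
    """
    # Hard-label cross-entropy, kept per sample so it can be gated.
    ce = F.cross_entropy(student_logits, labels, reduction="none")
    # Stack teachers: (num_teachers, batch, num_classes).
    t = torch.stack(teacher_logits_list)
    conf, _ = t.softmax(dim=-1).max(dim=-1)   # (num_teachers, batch)
    best_conf, best_idx = conf.max(dim=0)     # most confident teacher per sample
    best_logits = t[best_idx, torch.arange(t.size(1))]
    # Standard temperature-scaled distillation term (soft-target KL).
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(best_logits / temperature, dim=-1),
        reduction="none",
    ).sum(dim=-1) * temperature ** 2
    # Follow the chosen teacher only where it is confident enough;
    # otherwise fall back to the ground-truth label.
    use_teacher = (best_conf > conf_threshold).float()
    return (use_teacher * kd + (1.0 - use_teacher) * ce).mean()

A toy invocation with random tensors shows the intended shapes (27 classes, matching the number of actions in UTD-MHAD):

# Batch of 8 windows, 3 accelerometer axes, 128 time steps.
student = AccelStudent(num_classes=27)
x = torch.randn(8, 3, 128)
labels = torch.randint(0, 27, (8,))
teacher_logits = [torch.randn(8, 27), torch.randn(8, 27)]  # frozen teachers
loss = adaptive_confidence_kd_loss(student(x), teacher_logits, labels)
loss.backward()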

Supplementary Material

MP4 File (MM22-fp2032.mp4)
Short presentation for the paper "Progressive Cross-modal Knowledge Distillation for Human Action Recognition"




    Published In

    cover image ACM Conferences
    MM '22: Proceedings of the 30th ACM International Conference on Multimedia
    October 2022
    7537 pages
    ISBN:9781450392037
    DOI:10.1145/3503161
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 10 October 2022


    Author Tags

    1. knowledge distillation
    2. machine learning
    3. progressive learning
    4. sensor-based human activity recognition

    Qualifiers

    • Research-article

    Funding Sources

    • NSF

    Conference

    MM '22

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (Last 12 months)203
    • Downloads (Last 6 weeks)12
    Reflects downloads up to 13 Dec 2024

    Cited By
    • (2024) SemiCMT: Contrastive Cross-Modal Knowledge Transfer for IoT Sensing with Semi-Paired Multi-Modal Signals. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(4), 1-30. DOI: 10.1145/3699779. Online publication date: 21-Nov-2024.
    • (2024) Advancing Micro-Action Recognition with Multi-Auxiliary Heads and Hybrid Loss Optimization. Proceedings of the 32nd ACM International Conference on Multimedia, 11313-11319. DOI: 10.1145/3664647.3688975. Online publication date: 28-Oct-2024.
    • (2024) LHAR: Lightweight Human Activity Recognition on Knowledge Distillation. IEEE Journal of Biomedical and Health Informatics 28(11), 6318-6328. DOI: 10.1109/JBHI.2023.3298932. Online publication date: Nov-2024.
    • (2024) QMKD: A Two-Stage Approach to Enhance Multi-Teacher Knowledge Distillation. 2024 International Joint Conference on Neural Networks (IJCNN), 1-7. DOI: 10.1109/IJCNN60899.2024.10650186. Online publication date: 30-Jun-2024.
    • (2024) Multimodal Human Action Recognition Framework Using an Improved CNNGRU Classifier. IEEE Access 12, 158388-158406. DOI: 10.1109/ACCESS.2024.3481631. Online publication date: 2024.
    • (2024) Multiscale knowledge distillation with attention based fusion for robust human activity recognition. Scientific Reports 14(1). DOI: 10.1038/s41598-024-63195-5. Online publication date: 30-May-2024.
    • (2024) A New Lightweight Framework Based on Knowledge Distillation for Reducing the Complexity of Multi-Modal Solar Irradiance Prediction Model. Journal of Cleaner Production, 143663. DOI: 10.1016/j.jclepro.2024.143663. Online publication date: Sep-2024.
    • (2024) TSAK: Two-Stage Semantic-Aware Knowledge Distillation for Efficient Wearable Modality and Model Optimization in Manufacturing Lines. Pattern Recognition, 201-216. DOI: 10.1007/978-3-031-78389-0_14. Online publication date: 5-Dec-2024.
    • (2024) LightHART: Lightweight Human Activity Recognition Transformer. Pattern Recognition, 425-441. DOI: 10.1007/978-3-031-78354-8_27. Online publication date: 4-Dec-2024.
    • (2023) Transfer Learning on Small Datasets for Improved Fall Detection. Sensors 23(3), 1105. DOI: 10.3390/s23031105. Online publication date: 18-Jan-2023.
    • Show More Cited By
