Research article · PETRA Conference Proceedings · DOI: 10.1145/3594806.3594840

Enhancing Action Recognition in Vehicle Environments With Human Pose Information

Published: 10 August 2023

Abstract

Monitoring driver behavior and recognizing driver actions is a crucial task in modern semi-autonomous driving, where secondary activities irrelevant to driving should be minimized. Driver activity recognition is a subclass of the widely studied action recognition task, but it poses additional challenges stemming from the in-vehicle environment, the appearance of the participants, and the limited data available for this specific task. Furthermore, the similarity of body movements across actions and the nuanced differences between them further complicate classification. In this work, we explore the effectiveness of Temporal Segment Networks (TSNs) for driver activity recognition. Moreover, we propose a model that enhances the performance of such networks by integrating information from pose landmarks, allowing for multi-modal fusion at either an early or a late stage of the model and yielding informed predictions for input videos. The simplicity of the TSN models is thus counterbalanced by the incorporation of prior knowledge, resulting in a fused model that outperforms more resource-demanding 3D architectures. The proposed method is evaluated on the Drive&Act dataset, where it achieves state-of-the-art performance, surpassing previous work by a margin of 8.01% using only RGB video as input.
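The segmental-consensus and late-fusion ideas mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes per-segment class logits are already produced by separate RGB and pose branches, and the function names and the fusion weight `w=0.6` are hypothetical choices for demonstration.

```python
def tsn_consensus(segment_logits):
    """Aggregate per-segment class logits by averaging (TSN-style segmental consensus)."""
    num_segments = len(segment_logits)
    num_classes = len(segment_logits[0])
    return [sum(seg[c] for seg in segment_logits) / num_segments
            for c in range(num_classes)]

def late_fusion(rgb_logits, pose_logits, w=0.6):
    """Combine the two modality-level predictions with a weighted average (late fusion)."""
    return [w * r + (1.0 - w) * p for r, p in zip(rgb_logits, pose_logits)]

# Stand-in logits: 3 sampled video segments, 3 action classes.
rgb_segments = [[0.2, 1.5, -0.3], [0.1, 1.1, 0.0], [0.0, 0.9, 0.3]]
pose_segments = [[0.5, 0.4, -0.2], [0.3, 0.8, 0.1], [0.2, 0.6, 0.0]]

fused = late_fusion(tsn_consensus(rgb_segments), tsn_consensus(pose_segments))
pred = max(range(len(fused)), key=lambda c: fused[c])
print(pred)  # index of the predicted action class
```

Early fusion would instead concatenate pose features with the visual features before the classifier; late fusion, as above, keeps the branches independent and only merges their predictions, which makes the pose branch easy to add to an existing TSN.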


Cited By

  • (2024) A Computer Vision Framework on Biomechanical Analysis of Jump Landings. Proceedings of the Fifteenth Indian Conference on Computer Vision, Graphics and Image Processing, 1–9. DOI: 10.1145/3702250.3702259. Online publication date: 13 Dec 2024.


Published In

PETRA '23: Proceedings of the 16th International Conference on PErvasive Technologies Related to Assistive Environments
July 2023, 797 pages
ISBN: 9798400700699
DOI: 10.1145/3594806
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery, New York, NY, United States

      Author Tags

      1. activity recognition
      2. autonomous vehicles
      3. computer vision
      4. drive and act
      5. neural networks

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • Horizon Europe

      Conference

      PETRA '23

Article Metrics

  • Downloads (last 12 months): 69
  • Downloads (last 6 weeks): 2

Reflects downloads up to 05 Jan 2025.
