Abstract
With the rapid expansion of the automotive industry and the continuous growth of vehicle fleets, traffic safety has become a critical global social issue. Developing detection and alert systems for fatigue and distracted driving is therefore essential for enhancing traffic safety. Factors such as variations in the driver's facial details, lighting conditions, and camera image quality significantly affect the accuracy of fatigue and distracted driving detection, often limiting the effectiveness of existing methods. This study introduces a new network designed to detect fatigue and distracted driving against the complex backgrounds typical of vehicle interiors. To extract driver and facial information, as well as gradient details, more efficiently, we introduce the Multihead Difference Kernel Convolution Module (MDKC) and the Multiscale Large Convolutional Fusion Module (MLCF) into the baseline backbone; together they blend multihead mixed convolution with large and small convolutional kernels to enrich the backbone's spatial representation. To extract gradient details from feature maps under varying illumination and noise, we enhance the network's neck with an Adaptive Convolutional Attention Module (ACAM), optimizing feature retention. Extensive comparative experiments validate the efficacy of our network, which delivers superior performance on the Fatigue and Distracted Driving Dataset and competitive results on the public COCO dataset. Source code is available at https://github.com/SCNU-RISLAB/MFDD.
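The abstract only names the MDKC, MLCF, and ACAM modules; their exact definitions are given in the paper and the linked repository. As a rough illustration of the two ideas it highlights (mixing large and small convolutional kernels in the backbone, and gating neck features with convolutional attention), here is a minimal PyTorch-style sketch. All module names, kernel sizes, and the fusion scheme below are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: the actual MDKC/MLCF/ACAM modules are defined in
# the authors' repository (https://github.com/SCNU-RISLAB/MFDD). Names,
# kernel sizes, and the fusion scheme here are assumptions.
import torch
import torch.nn as nn


class MultiscaleKernelFusion(nn.Module):
    """Fuses small- and large-kernel depthwise convolutions, loosely
    mirroring the large/small kernel mix the abstract describes."""

    def __init__(self, channels: int, small_k: int = 3, large_k: int = 11):
        super().__init__()
        self.small = nn.Conv2d(channels, channels, small_k,
                               padding=small_k // 2, groups=channels)
        self.large = nn.Conv2d(channels, channels, large_k,
                               padding=large_k // 2, groups=channels)
        self.mix = nn.Conv2d(channels, channels, 1)  # pointwise fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the two receptive-field branches, then mix across channels.
        return self.mix(self.small(x) + self.large(x))


class ConvAttentionGate(nn.Module):
    """A simple convolutional attention gate, standing in for the
    adaptive neck-side attention the abstract mentions."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reweight the feature map so salient responses are retained.
        return x * self.gate(x)


# Usage: a hypothetical backbone feature map, 64 channels at 80x80.
feat = torch.randn(1, 64, 80, 80)
feat = MultiscaleKernelFusion(64)(feat)
feat = ConvAttentionGate(64)(feat)
print(feat.shape)  # torch.Size([1, 64, 80, 80])
```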
Data availability
Data will be made available on request.
Acknowledgements
This research was supported by the National Natural Science Foundation of China under Grants 62001173 and 62233013, the Project of Special Funds for the Cultivation of Guangdong College Students' Scientific and Technological Innovation ("Climbing Program" Special Funds) under Grants pdjh2022a0131 and pdjh2023b0141, the Science and Technology Commission of Shanghai Municipality under Grant 22511104500, and the Fundamental Research Funds for the Central Universities (Corresponding author: Xiaoyu Tang).
Author information
Contributions
Yulin Shi contributed to the methodology, surveys, experimental work, and writing of the first draft. Jintao Cheng contributed to the experimental work and writing of the original manuscript. Xingming Chen contributed to project administration, validation, and writing (review and editing). Jiehao Luo contributed to validation and writing (review and editing). Xiaoyu Tang contributed to resources and writing (review and editing).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shi, Y., Cheng, J., Chen, X. et al. MFDD: Multi-scale attention fatigue and distracted driving detector based on facial features. J Real-Time Image Proc 21, 170 (2024). https://doi.org/10.1007/s11554-024-01549-y