Abstract
With the rapid expansion of the automotive industry and the continuous growth of vehicle fleets, traffic safety has become a critical global social issue. Developing detection and alert systems for fatigue and distracted driving is therefore essential for enhancing traffic safety. Factors such as variations in the driver's facial details, lighting conditions, and camera image quality significantly affect the accuracy of fatigue and distracted driving detection, often limiting the effectiveness of existing methods. This study introduces a new network designed to detect fatigue and distracted driving against the complex backgrounds typical of vehicle interiors. To extract driver and facial information, as well as gradient details, more efficiently, we introduce the Multihead Difference Kernel Convolution Module (MDKC) and the Multiscale Large Convolutional Fusion Module (MLCF) into the baseline backbone; together they blend multihead mixed convolution with large and small convolutional kernels to enrich the backbone's spatial representation. To extract gradient details from feature maps under varying illumination and noise, we enhance the network's neck with an Adaptive Convolutional Attention Module (ACAM), optimizing feature retention. Extensive comparative experiments validate the efficacy of our network, which delivers superior performance on the Fatigue and Distracted Driving Dataset and competitive results on the public COCO dataset. Source code is available at https://github.com/SCNU-RISLAB/MFDD.
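The abstract only names the MDKC, MLCF, and ACAM modules; their exact definitions are given in the paper and the linked repository. As a rough illustration of the two ideas it highlights (mixing large and small convolutional kernels in the backbone, and gating neck features with convolutional attention), here is a minimal PyTorch-style sketch. All module names, kernel sizes, and the fusion scheme below are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch only: the actual MDKC/MLCF/ACAM modules are defined in
# the authors' repository (https://github.com/SCNU-RISLAB/MFDD). Names,
# kernel sizes, and the fusion scheme here are assumptions.
import torch
import torch.nn as nn


class MultiscaleKernelFusion(nn.Module):
    """Fuses small- and large-kernel depthwise convolutions, loosely
    mirroring the large/small kernel mix the abstract describes."""

    def __init__(self, channels: int, small_k: int = 3, large_k: int = 11):
        super().__init__()
        self.small = nn.Conv2d(channels, channels, small_k,
                               padding=small_k // 2, groups=channels)
        self.large = nn.Conv2d(channels, channels, large_k,
                               padding=large_k // 2, groups=channels)
        self.mix = nn.Conv2d(channels, channels, 1)  # pointwise fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Sum the two receptive-field branches, then mix across channels.
        return self.mix(self.small(x) + self.large(x))


class ConvAttentionGate(nn.Module):
    """A simple convolutional attention gate, standing in for the
    adaptive neck-side attention the abstract mentions."""

    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reweight the feature map so salient responses are retained.
        return x * self.gate(x)


# Usage: a hypothetical backbone feature map, 64 channels at 80x80.
feat = torch.randn(1, 64, 80, 80)
feat = MultiscaleKernelFusion(64)(feat)
feat = ConvAttentionGate(64)(feat)
print(feat.shape)  # torch.Size([1, 64, 80, 80])
```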
Data availability
Data will be made available on request.
Acknowledgements
This research was supported by the National Natural Science Foundation of China under Grants 62001173 and 62233013, the Project of Special Funds for the Cultivation of Guangdong College Students' Scientific and Technological Innovation ("Climbing Program" Special Funds) under Grants pdjh2022a0131 and pdjh2023b0141, the Science and Technology Commission of Shanghai Municipality under Grant 22511104500, and the Fundamental Research Funds for the Central Universities (Corresponding author: Xiaoyu Tang).
Author information
Contributions
Yulin Shi contributed to the methodology, surveys, experimental work, and writing of the first draft. Jintao Cheng contributed to the experimental work and writing of the original manuscript. Xingming Chen contributed to project administration, validation, and writing (review and editing). Jiehao Luo contributed to validation and writing (review and editing). Xiaoyu Tang contributed to resources and writing (review and editing).
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shi, Y., Cheng, J., Chen, X. et al. MFDD: Multi-scale attention fatigue and distracted driving detector based on facial features. J Real-Time Image Proc 21, 170 (2024). https://doi.org/10.1007/s11554-024-01549-y