GL-YOLO-Lite: A Novel Lightweight Fallen Person Detection Model
Figure 1. Architecture of GL-YOLO-Lite. The number of each layer is marked in black.
Figure 2. A variety of transformer-stylized self-attention modules.
Figure 3. Structure of the stem module.
Figure 4. Examples of the FPDD and Pascal VOC datasets, (a–f) and (g–l), respectively. The images were resized for better display.
Figure 5. Detection results of GL-YOLO-Lite and other representative algorithms. The ground truth for each image was "fall". The images in Rows 3–7 were collected from the web and were not included in the FPDD.
Figure 6. The Android application: (a) the app user interface, (b) images detected by the CPU, and (c) images detected by the GPU.
Abstract
1. Introduction
- Drawing from YOLOv5, GL-YOLO-Lite introduced transformer and attention modules that captured long-range dependencies and enabled the model to better integrate global and local features, significantly improving detection accuracy.
- We made GL-YOLO-Lite faster by replacing the focus module with a stem module, adding rep blocks for re-parameterization, and designing a lightweight detection head.
- We created and labeled a large-scale, well-structured dataset, FPDD, by collecting online images and taking photos. This filled the gap in existing FPD datasets.
- The efficacy of the proposed GL-YOLO-Lite was validated through experiments on the FPDD and the Pascal VOC dataset. Our results showed that GL-YOLO-Lite had a 2.4–18.9 mAP improvement over the state-of-the-art methods on FPDD and a 1.8–23.3 mAP improvement on the Pascal VOC dataset. Furthermore, our model achieved top-tier TOPSIS scores.
2. Related Works
2.1. Fallen Person Detection Based on Scene Perception
2.2. Fallen Person Detection Based on Wearable Devices
2.3. Fallen Person Detection Based on Visual Information
3. GL-YOLO-Lite
3.1. Overview
3.2. Loss Function in GL-YOLO-Lite
3.3. More Accurate Anchor Generation
3.4. Stronger Feature Extraction
3.4.1. Transformer Block
3.4.2. Attention Block
3.5. Lightweight Model Structure Design
3.5.1. Stem Block
- Convolution. The convolutional operation employed a 3 × 3 kernel, a stride of 2, and an output channel of 32, resulting in a feature map of 320 × 320 × 32 after down-sampling.
- Focus. The focus module sliced the input image before it entered the backbone of the network. Specifically, it sampled every other pixel in the image, similar to nearest-neighbor down-sampling, producing four sub-images that together retained all of the original information. In this way, the spatial width and height were concentrated into the channel dimension, expanding the input channels by a factor of 4 (i.e., 12 channels for an RGB 3-channel image). The concatenated result was then passed through a convolution, yielding a two-fold down-sampled feature map with no loss of information. Feeding a 640 × 640 × 3 image into the focus module, the slicing operation first transformed it into a 320 × 320 × 12 feature map, which then underwent a convolution and, ultimately, produced a 320 × 320 × 32 feature map. Following these steps, the parameters and FLOPs of the focus module could be counted and compared with those of the plain convolution above; a code-level sketch of the two down-sampling paths is given after this list.
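For concreteness, the following is a minimal PyTorch sketch (not the authors' released code) contrasting the focus slicing path with a plain stride-2 convolution of the kind used by the stem block. The class names, the omission of BatchNorm and activation layers, and the bias-free convolutions are simplifying assumptions of ours.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """YOLOv5-style focus: slice the input into four pixel-interleaved
    sub-images, concatenate along the channel axis (3 -> 12), then convolve."""
    def __init__(self, c_in=3, c_out=32, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c_in * 4, c_out, k, stride=1, padding=k // 2, bias=False)

    def forward(self, x):
        # Every-other-pixel slicing: (B, 3, 640, 640) -> (B, 12, 320, 320)
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)          # (B, 32, 320, 320)

class PlainConv(nn.Module):
    """Plain 3x3, stride-2 convolution giving the same 2x down-sampling."""
    def __init__(self, c_in=3, c_out=32, k=3):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, stride=2, padding=k // 2, bias=False)

    def forward(self, x):
        return self.conv(x)          # (B, 32, 320, 320)

if __name__ == "__main__":
    x = torch.randn(1, 3, 640, 640)
    focus, plain = Focus(), PlainConv()
    print(focus(x).shape, plain(x).shape)   # both torch.Size([1, 32, 320, 320])
    # Convolution weights alone: 12*32*3*3 = 3456 (focus) vs. 3*32*3*3 = 864 (plain)
    print(sum(p.numel() for p in focus.parameters()),
          sum(p.numel() for p in plain.parameters()))
```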
3.5.2. Rep Block
3.5.3. Lightweight Detection Head
4. Datasets
4.1. Fallen Person Detection Dataset
4.2. PASCAL VOC Dataset
5. Experiments
5.1. Metrics and Implementation
5.1.1. Metrics
5.1.2. Implementation
5.2. Comparison with State-of-the-Art Models
- The proposed GL-YOLO series algorithms exhibited superior detection accuracy, as measured by mAP@0.5, on both the FPDD and the Pascal VOC datasets. On the FPDD, GL-YOLO achieved the highest mAP@0.5 of 89.1%, while GL-YOLO-Lite achieved 88.5%. Although YOLOv5-s attained the highest mAP@0.5 (85.7%) among the other advanced object-detection models, it still lagged behind GL-YOLO-Lite by 2.8%. Similarly, on the Pascal VOC dataset, GL-YOLO again achieved the highest mAP@0.5 of 82.5%, while GL-YOLO-Lite attained 80%. The highest mAP@0.5 among the other models on this dataset, 78.2%, was obtained by YOLOv5-Lite-g [62], which still fell short of GL-YOLO-Lite by 1.8%. These findings suggested that the transformer and attention modules in GL-YOLO-Lite effectively enhanced the feature extraction capability of the model by fully utilizing global contextual information, thereby improving its object-detection performance.
- While GL-YOLO achieved the highest mAP@0.5 on both datasets, its speed was limited by its model structure. GL-YOLO-Lite (FPDD: 88.5% mAP@0.5; Pascal VOC: 80% mAP@0.5), in contrast, ranked second in mAP on both the FPDD and the Pascal VOC dataset and sat in the middle tier in terms of FPS. Compared to the baseline model YOLOv5-s, GL-YOLO-Lite offered significant improvements in parameters, GFLOPs, mAP@0.5, and FPS-CPU, with only a slight reduction in FPS-GPU, indicating the effectiveness of the proposed algorithms.
- The comprehensive rankings in the last column of Table 7 and Table 8 illustrated that GL-YOLO-Lite outperformed the baseline YOLOv5-s model, achieving the highest TOPSIS scores (FPDD: 0.573961; Pascal VOC: 0.563583); the TOPSIS scoring procedure is sketched after this list. These results demonstrated that GL-YOLO-Lite was a significant advancement over YOLOv5-s: its robust feature extraction and efficient structural design used significantly fewer parameters and GFLOPs while maintaining high object-detection precision (mAP). Furthermore, its real-time processing speed (greater than 30 FPS) on the desktop Titan Xp GPU indicated its potential for handling FPD on typical workstations.
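As a reference for how such comprehensive rankings are typically produced, the snippet below is a hedged sketch of a standard TOPSIS computation over criteria like those in Tables 7 and 8. The equal weights, the two example rows, and the choice of cost/benefit criteria are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def topsis(matrix, weights, benefit):
    """matrix: (n_models, n_criteria); benefit[j] is True when larger is better."""
    m = matrix / np.linalg.norm(matrix, axis=0)   # vector-normalize each criterion
    v = m * weights                               # weighted normalized matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti  = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_pos = np.linalg.norm(v - ideal, axis=1)     # distance to ideal solution
    d_neg = np.linalg.norm(v - anti, axis=1)      # distance to anti-ideal solution
    return d_neg / (d_pos + d_neg)                # closeness score in [0, 1]

# Illustration only: two FPDD rows [parameters, GFLOPs, mAP@0.5, FPS-GPU, FPS-CPU]
data = np.array([[7.05, 16.3, 85.7, 73.53, 7.03],   # YOLOv5-s
                 [4.41,  3.3, 88.5, 52.63, 9.89]])  # GL-YOLO-Lite
weights = np.full(5, 1 / 5)                          # equal weights (assumption)
benefit = np.array([False, False, True, True, True]) # params/GFLOPs are cost criteria
print(topsis(data, weights, benefit))
```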
5.3. Ablation Study and Visualization
5.4. Experiments on a Mobile Phone
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- World Health Organization. World Report on Ageing and Health; World Health Organization: Geneva, Switzerland, 2015; 246p.
- Tanwar, R.; Nandal, N.; Zamani, M.; Manaf, A.A. Pathway of trends and technologies in fall detection: A systematic review. Healthcare 2022, 10, 172.
- Irtaza, A.; Adnan, S.M.; Aziz, S.; Javed, A.; Ullah, M.O.; Mahmood, M.T. A framework for fall detection of elderly people by analyzing environmental sounds through acoustic local ternary patterns. In Proceedings of the 2017 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Banff, AB, Canada, 5–8 October 2017; pp. 1558–1563.
- Jefiza, A.; Pramunanto, E.; Boedinoegroho, H.; Purnomo, M.H. Fall detection based on accelerometer and gyroscope using back propagation. In Proceedings of the 2017 4th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), Yogyakarta, Indonesia, 19–21 September 2017; pp. 1–6.
- Yacchirema, D.; de Puga, J.S.; Palau, C.; Esteve, M. Fall detection system for elderly people using IoT and ensemble machine learning algorithm. Pers. Ubiquitous Comput. 2019, 23, 801–817.
- Wang, X.; Jia, K. Human fall detection algorithm based on YOLOv3. In Proceedings of the 2020 IEEE 5th International Conference on Image, Vision and Computing (ICIVC), Beijing, China, 10–12 July 2020; pp. 50–54.
- Zhang, J.; Wu, C.; Wang, Y. Human fall detection based on body posture spatio-temporal evolution. Sensors 2020, 20, 946.
- Wang, L.; Hu, Z.; Kong, Q.; Qi, Q.; Liao, Q. Infrared and Visible Image Fusion via Attention-Based Adaptive Feature Fusion. Entropy 2023, 25, 407.
- Hsu, F.S.; Su, Z.J.; Kao, Y.; Tsai, S.W.; Lin, Y.C.; Tu, P.H.; Gong, C.S.A.; Chen, C.C. Lightweight Deep Neural Network Embedded with Stochastic Variational Inference Loss Function for Fast Detection of Human Postures. Entropy 2023, 25, 336.
- Dai, Y.; Liu, W.; Wang, H.; Xie, W.; Long, K. YOLO-Former: Marrying YOLO and Transformer for Foreign Object Detection. IEEE Trans. Instrum. Meas. 2022, 71, 1–14.
- Dai, Y.; Liu, W.; Xie, W.; Liu, R.; Zheng, Z.; Long, K.; Wang, L.; Mao, L.; Qiu, Q.; Ling, G. Making you only look once faster: Toward real-time intelligent transportation detection. IEEE Intell. Transp. Syst. Mag. 2022.
- Li, J.; Wei, Y.; Liang, X.; Dong, J.; Xu, T.; Feng, J.; Yan, S. Attentive contexts for object detection. IEEE Trans. Multimed. 2016, 19, 944–954.
- Chen, Q.; Song, Z.; Dong, J.; Huang, Z.; Hua, Y.; Yan, S. Contextualizing object detection and classification. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 37, 13–27.
- Cai, H.; Lin, J.; Lin, Y.; Liu, Z.; Tang, H.; Wang, H.; Zhu, L.; Han, S. Enable deep learning on mobile devices: Methods, systems, and applications. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 2022, 27, 1–50.
- Li, Y.; Yao, T.; Pan, Y.; Mei, T. Contextual transformer networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1489–1500.
- Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A simple, parameter-free attention module for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Vienna, Austria, 18–24 July 2021; pp. 11863–11874.
- Jocher, G.; Stoken, A.; Borovec, J.; NanoCode012; Chaurasia, A.; Xie, T.; Changyu, L.; Abhiram, V.; Laughing; tkianai; et al. Ultralytics/yolov5: V5.0—YOLOv5-P6 1280 Models, AWS, Supervise.ly and YouTube Integrations. 2021. Available online: https://github.com/ultralytics/yolov5/tree/v5.0 (accessed on 2 January 2023).
- Wang, R.J.; Li, X.; Ling, C.X. Pelee: A real-time object detection system on mobile devices. Adv. Neural Inf. Process. Syst. 2018, 31, 1–10.
- Ding, X.; Zhang, X.; Ma, N.; Han, J.; Ding, G.; Sun, J. RepVGG: Making VGG-style convnets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual, 19–25 June 2021; pp. 13733–13742.
- Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (VOC) challenge. Int. J. Comput. Vis. 2010, 88, 303–338.
- Everingham, M.; Eslami, S.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes challenge: A retrospective. Int. J. Comput. Vis. 2015, 111, 98–136.
- Yazar, A.; Keskin, F.; Töreyin, B.U.; Çetin, A.E. Fall detection using single-tree complex wavelet transform. Pattern Recognit. Lett. 2013, 34, 1945–1952.
- Luo, K.; Li, J.; Wu, J.; Yang, H.; Xu, G. Fall detection using three wearable triaxial accelerometers and a decision-tree classifier. Biomed. Eng. Appl. Basis Commun. 2014, 26, 1450059.
- Bilski, P.; Mazurek, P.; Wagner, J. Application of k Nearest Neighbors Approach to the fall detection of elderly people using depth-based sensors. In Proceedings of the 2015 IEEE 8th International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), Warsaw, Poland, 24–26 September 2015; Volume 2, pp. 733–739.
- Wang, Z.; Xu, Z.; Chen, L. Human Behavior Recognition System Based on Infrared Array Sensors. Infrared Technol. 2020, 42, 231–237.
- Zhang, D.; Lan, H.; Wu, Y. Bathroom fall detection based on ultrasonic Doppler effect. J. Shanghai Norm. Univ. (Nat. Sci.) 2018, 47, 225–229.
- Peng, Y.; He, Q.; Ke, X.; Hu, J.; Wu, L. Fall detection belt based on acceleration sensor. Electron. Meas. Technol. 2018, 41, 117–120.
- Rakhman, A.Z.; Nugroho, L.E. Fall detection system using accelerometer and gyroscope based on smartphone. In Proceedings of the 2014 1st International Conference on Information Technology, Computer, and Electrical Engineering, Toronto, ON, Canada, 4–7 May 2014; pp. 99–104.
- Shahiduzzaman, M. Fall detection by accelerometer and heart rate variability measurement. Glob. J. Comput. Sci. Technol. 2015, 15, 1–5.
- Cui, C.; Bian, G.B.; Hou, Z.G.; Zhao, J.; Su, G.; Zhou, H.; Peng, L.; Wang, W. Simultaneous recognition and assessment of post-stroke hemiparetic gait by fusing kinematic, kinetic, and electrophysiological data. IEEE Trans. Neural Syst. Rehabil. Eng. 2018, 26, 856–864.
- Wang, P.; Ding, H.; Li, J. A method of fall detection based on human posture in video. Mod. Electron. Tech. 2021, 44, 98–102.
- Cao, Z.; Simon, T.; Wei, S.E.; Sheikh, Y. Realtime multi-person 2D pose estimation using part affinity fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7291–7299.
- Zhu, Z. Research of Fall Behavior Detection Based on Complex Scenes. Master's Thesis, Lanzhou University, Lanzhou, China, 2021.
- Wojke, N.; Bewley, A.; Paulus, D. Simple online and realtime tracking with a deep association metric. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3645–3649.
- Lin, F.; Hou, T.; Jin, Q.; You, A. Improved YOLO Based Detection Algorithm for Floating Debris in Waterway. Entropy 2021, 23, 1111.
- Kim, M.; Jeong, J.; Kim, S. ECAP-YOLO: Efficient Channel Attention Pyramid YOLO for Small Object Detection in Aerial Image. Remote Sens. 2021, 13, 4851.
- Arthur, D.; Vassilvitskii, S. k-Means++: The Advantages of Careful Seeding; Technical Report; Stanford University: Stanford, CA, USA, 2006.
- MacQueen, J. Some methods for classification and analysis of multivariate observations. In Proceedings of the Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 27 December 1965–7 January 1966; p. 281.
- Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
- Redmon, J.; Farhadi, A. YOLO9000: Better, faster, stronger. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 7263–7271.
- Redmon, J.; Farhadi, A. YOLOv3: An incremental improvement. arXiv 2018, arXiv:1804.02767.
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1314–1324.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 630–645.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929.
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229.
- Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. BAM: Bottleneck Attention Module. In Proceedings of the British Machine Vision Conference, Newcastle, UK, 3–6 September 2018.
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018.
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131.
- Ding, X.; Guo, Y.; Ding, G.; Han, J. ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1911–1920.
- Tzutalin. LabelImg. 2015. Available online: https://github.com/tzutalin/labelImg (accessed on 14 March 2023).
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589.
- Chen, X.; Gong, Z. YOLOv5-Lite: Lighter, Faster and Easier to Deploy; v1.0; Zenodo: Honolulu, HI, USA, 2021.
- Tencent. ncnn: An Optimized Neural Network Computing Framework. 2021. Available online: https://github.com/Tencent/ncnn (accessed on 5 March 2023).
| FPD Technology | Main Technical Principles | Highlights | Limitations |
|---|---|---|---|
| Scene perception | Infrared sensors, radar technology, millimeter-wave radar, etc. | Non-intrusiveness; real-time performance; scalability | The device has a single deployment environment; high false alarm rate; the device is expensive |
| Wearable device | Accelerometers, gyroscopes, magnetometers | Easy to use; strong applicability; lower cost | Long-term wear reduces comfort; battery life issues; high hardware and software requirements |
| Visual information | Camera-captured data, machine learning, deep learning | Non-intrusiveness; easy installation; visual effectiveness | Poor quality of anchor box generation; inadequate utilization of global features; huge parameters and FLOPs |
| GL-YOLO-Lite | Automatically generating high-quality anchors; combining global contextual information and local features using transformer and attention modules to improve model detection accuracy and robustness; reducing parameters and FLOPs while increasing detection speed using the stem module, rep modules, and a redesigned lightweight detection head | | |
| Layer | Input (YOLOv5) | Module (YOLOv5) | Number of Modules (YOLOv5) | Args (YOLOv5) | Input (GL-YOLO) | Module (GL-YOLO) | Number of Modules (GL-YOLO) | Args (GL-YOLO) |
|---|---|---|---|---|---|---|---|---|
Layer 0 | Image | Focus | 1 | (64, 3) | Image | Focus | 1 | (64, 3) |
Layer 1 | Layer 0 | Conv | 1 | (128, 3, 2) | Layer 0 | Conv | 1 | (128, 3, 2) |
Layer 2 | Layer 1 | C3 | 3 | (128) | Layer 1 | CoT3 | 3 | (128) |
Layer 3 | Layer 2 | Conv | 1 | (256, 3, 2) | Layer 2 | Conv | 1 | (256, 3, 2) |
Layer 4 | Layer 3 | C3 | 9 | (256) | Layer 3 | CoT3 | 9 | (256) |
Layer 5 | Layer 4 | Conv | 1 | (512, 3, 2) | Layer 4 | Conv | 1 | (512, 3, 2) |
Layer 6 | Layer 5 | C3 | 9 | (512) | Layer 5 | CoT3 | 9 | (512) |
Layer 7 | Layer 6 | Conv | 1 | (1024, 3, 2) | Layer 6 | Conv | 1 | (1024, 3, 2) |
Layer 8 | Layer 7 | SPP | 1 | (1024, (5, 9, 13)) | Layer 7 | SPP | 1 | (1024, (5, 9, 13)) |
Layer 9 | Layer 8 | C3 | 3 | (1024, False) | Layer 8 | CoT3 | 3 | (1024, False) |
| Layer | Input (YOLOv5) | Module (YOLOv5) | Number of Modules (YOLOv5) | Args (YOLOv5) | Input (GL-YOLO) | Module (GL-YOLO) | Number of Modules (GL-YOLO) | Args (GL-YOLO) |
|---|---|---|---|---|---|---|---|---|
Layer 21 | Layer 20 | Conv | 1 | (512, 3, 2) | Layer 20 | Conv | 1 | (512, 3, 2) |
Layer 22 | Layer 21 + Layer 10 | Concat | 1 | (1) | Layer 21 + Layer 10 | Concat | 1 | (1) |
Layer 23 | Layer 22 | C3 | 3 | (1024, False) | Layer 22 | C3 | 3 | (1024, False) |
Layer 24 | - | - | - | - | Layer 23 | SimAM | 1 | (1024)
| Layer | Input (GL-YOLO) | Module (GL-YOLO) | Number of Modules (GL-YOLO) | Args (GL-YOLO) | Input (GL-YOLO-Lite) | Module (GL-YOLO-Lite) | Number of Modules (GL-YOLO-Lite) | Args (GL-YOLO-Lite) |
|---|---|---|---|---|---|---|---|---|
Layer 0 | Image | Focus | 1 | (64, 3) | Image | Stem block | 1 | (64, 3) |
Layer 1 | Layer 0 | Conv | 1 | (128, 3, 2) | Layer 0 | Rep block | 1 | (128, 3, 2) |
Layer 2 | Layer 1 | CoT3 | 3 | (128) | Layer 1 | CoT3 | 3 | (128) |
Layer 3 | Layer 2 | Conv | 1 | (256, 3, 2) | Layer 2 | Rep block | 1 | (256, 3, 2) |
Layer 4 | Layer 3 | CoT3 | 9 | (256) | Layer 3 | CoT3 | 9 | (256) |
Layer 5 | Layer 4 | Conv | 1 | (512, 3, 2) | Layer 4 | Rep block | 1 | (512, 3, 2) |
Layer 6 | Layer 5 | CoT3 | 9 | (512) | Layer 5 | CoT3 | 9 | (512) |
Layer 7 | Layer 6 | Conv | 1 | (1024, 3, 2) | Layer 6 | Rep block | 1 | (1024, 3, 2) |
Layer 8 | Layer 7 | SPP | 1 | (1024, (5, 9, 13)) | Layer 7 | SPP | 1 | (1024, (5, 9, 13)) |
Layer 9 | Layer 8 | CoT3 | 3 | (1024, False) | Layer 8 | CoT3 | 3 | (1024, False) |
| Layer | Input (GL-YOLO) | Module (GL-YOLO) | Number of Modules (GL-YOLO) | Args (GL-YOLO) | Input (GL-YOLO-Lite) | Module (GL-YOLO-Lite) | Number of Modules (GL-YOLO-Lite) | Args (GL-YOLO-Lite) |
|---|---|---|---|---|---|---|---|---|
Layer 10 | Layer 9 | Conv | 1 | (512, 1, 1) | Layer 9 | Conv | 1 | (128, 1, 1) |
Layer 11 | Layer 10 | Up-Sample | 1 | (None, 2, ’nearest’) | Layer 10 | Up-Sample | 1 | (None, 2, ’nearest’) |
Layer 12 | Layer 6 + Layer 11 | Concat | 1 | (1) | Layer 6 + Layer 11 | Concat | 1 | (1) |
Layer 13 | Layer 12 | C3 | 3 | (512, False) | Layer 12 | C3 | 3 | (128, False) |
Layer 14 | Layer 13 | Conv | 1 | (256, 1, 1) | Layer 13 | Conv | 1 | (128, 1, 1) |
Layer 15 | Layer 14 | Up-Sample | 1 | (None, 2, ’nearest’) | Layer 14 | Up-Sample | 1 | (None, 2, ’nearest’) |
Layer 16 | Layer 4 + Layer 15 | Concat | 1 | (1) | Layer 4 + Layer 15 | Concat | 1 | (1) |
Layer 17 | Layer 16 | C3 | 3 | (256, False) | Layer 16 | C3 | 3 | (128, False) |
Layer 18 | Layer 17 | Conv | 1 | (256, 3, 2) | Layer 17 | Conv | 1 | (128, 3, 2) |
Layer 19 | Layer 14 + Layer 18 | Concat | 1 | (1) | Layer 14 + Layer 18 | Concat | 1 | (1) |
Layer 20 | Layer 19 | C3 | 3 | (512, False) | Layer 19 | C3 | 3 | (128, False) |
Layer 21 | Layer 20 | Conv | 1 | (512, 3, 2) | Layer 20 | Conv | 1 | (128, 3, 2) |
Layer 22 | Layer 10 + Layer 21 | Concat | 1 | (1) | Layer 10 + Layer 21 | Concat | 1 | (1) |
Layer 23 | Layer 22 | C3 | 3 | (1024, False) | Layer 22 | C3 | 3 | (128, False) |
Layer 24 | Layer 23 | SimAM | 1 | (1024) | Layer 23 | SimAM | 1 | (128) |
| Workstation | Configuration | Hyperparameter (GL-YOLO-Lite) | Value |
|---|---|---|---|
CPU | Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10 GHz | Initial Learning Rate | 0.01 |
GPU | TITAN Xp(12 GB) | Optimizer | Adam |
Memory | 16 GB | Momentum | 0.937 |
Operating System | Ubuntu 18.04 | Weight Decay | 0.0005 |
Deep Learning Framework | PyTorch 1.7.0 | IoU Threshold | 0.45 |
CUDA version | 11 | Training Epochs | 300 |
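For readers reproducing this setup, the snippet below is a minimal sketch of how the training hyperparameters listed above would typically map onto a YOLOv5-style Adam optimizer in PyTorch; the placeholder model, the beta2 value, and the variable names are our assumptions, not the paper's code.

```python
import torch

model = torch.nn.Conv2d(3, 32, 3)            # placeholder for the detection model
optimizer = torch.optim.Adam(model.parameters(),
                             lr=0.01,                # initial learning rate
                             betas=(0.937, 0.999),   # "momentum" used as beta1 (assumption)
                             weight_decay=0.0005)    # weight decay from the table
iou_threshold = 0.45                         # NMS IoU threshold at inference
epochs = 300                                 # training epochs
```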
| # | Methods | Backbone | Input Size | Parameters (M) | GFLOPs | mAP@0.5 (%) | FPS-GPU | FPS-CPU | TOPSIS Score | Ranking |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | YOLOv5-mbv3-small | MobileNetv3-small [46], ICCV | 640 × 640 | 3.54 | 6.3 | 80.1 | 55.25 | 13.40 | 0.458764 | 6 |
| 2 | YOLOv5-mbv3-large | MobileNetv3-large [46], ICCV | 640 × 640 | 5.2 | 10.3 | 83.9 | 47.62 | 7.50 | 0.485195 | 3 |
| 3 | YOLOv5-ShuffleNetv2 | ShuffleNetv2 [58], ECCV | 640 × 640 | 0.44 | 1.3 | 70.9 | 52.91 | 18.32 | 0.457568 | 7 |
| 4 | YOLOv3-Tiny | Darknet-53 [43] | 640 × 640 | 8.67 | 12.9 | 69.6 | 243.90 | 6.83 | 0.457558 | 8 |
| 5 | YOLOv5-s | CSPDarknet-SPP [17] | 640 × 640 | 7.05 | 16.3 | 85.7 | 73.53 | 7.03 | 0.440520 | 9 |
| 6 | YOLOv5-lite-g | RepVGG [19], CVPR | 640 × 640 | 5.3 | 15.1 | 85.5 | 62.89 | 6.98 | 0.468680 | 5 |
| 7 | YOLOv5s-Ghost | GhostNet [61], CVPR | 640 × 640 | 3.68 | 8.1 | 86.1 | 56.18 | 7.71 | 0.555261 | 2 |
| 8 | GL-YOLO | GL-YOLO | 640 × 640 | 7.03 | 16.2 | 89.1 | 49.75 | 5.51 | 0.469409 | 4 |
| 9 | GL-YOLO-Lite | GL-YOLO | 640 × 640 | 4.41 | 3.3 | 88.5 | 52.63 | 9.89 | 0.573961 | 1 |
| # | Methods | Backbone | Input Size | Parameters (M) | GFLOPs | mAP@0.5 (%) | FPS-GPU | FPS-CPU | TOPSIS Score | Ranking |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | YOLOv5-mbv3-small | MobileNetv3-small [46], ICCV | 640 × 640 | 3.59 | 6.4 | 69.1 | 61.35 | 15.11 | 0.438815 | 8 |
| 2 | YOLOv5-mbv3-large | MobileNetv3-large [46], ICCV | 640 × 640 | 5.25 | 10.3 | 77 | 54.95 | 8.01 | 0.500519 | 3 |
| 3 | YOLOv5-ShuffleNetv2 | ShuffleNetv2 [58], ECCV | 640 × 640 | 0.45 | 1.4 | 56.7 | 61.35 | 19.19 | 0.453154 | 7 |
| 4 | YOLOv3-Tiny | Darknet-53 [43] | 640 × 640 | 8.71 | 13 | 57.7 | 277.78 | 7.58 | 0.462992 | 6 |
| 5 | YOLOv5-s | CSPDarknet-SPP [17] | 640 × 640 | 7.11 | 16.4 | 77.8 | 69.93 | 8.08 | 0.431950 | 9 |
| 6 | YOLOv5-lite-g | RepVGG [19], CVPR | 640 × 640 | 5.32 | 15.3 | 78.2 | 70.42 | 7.86 | 0.471702 | 4 |
| 7 | YOLOv5s-Ghost | GhostNet [61], CVPR | 640 × 640 | 3.73 | 8.3 | 77 | 61.73 | 8.29 | 0.549522 | 2 |
| 8 | GL-YOLO | GL-YOLO | 640 × 640 | 7.08 | 16.4 | 82.5 | 51.55 | 6.43 | 0.467550 | 5 |
| 9 | GL-YOLO-Lite | GL-YOLO | 640 × 640 | 4.42 | 3.4 | 80 | 56.82 | 11.07 | 0.563583 | 1 |
| Methods | Precision | Recall | F1 Score | mAP@0.5 |
|---|---|---|---|---|
YOLOv5-s | 0.812 | 0.838 | 0.825 | 0.857 |
GL-YOLO-Lite | 0.843 | 0.859 | 0.851 | 0.885 |
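The F1 scores in the table follow directly from the reported precision and recall via F1 = 2PR/(P + R); the short check below (plain Python, our own illustration) reproduces the 0.825 and 0.851 entries.

```python
# Verify the F1 column from the precision/recall columns of the table above.
rows = {"YOLOv5-s": (0.812, 0.838), "GL-YOLO-Lite": (0.843, 0.859)}
for name, (p, r) in rows.items():
    print(name, round(2 * p * r / (p + r), 3))   # 0.825 and 0.851, matching the table
```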
| # | Components | K-means++ | Transformer Block | Attention Block | Stem Block | Rep Block | Lighter Head | Input Size | Parameters (M) | GFLOPs | mAP@0.5 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | YOLOv5s (baseline) | | | | | | | 640 × 640 | 7.11 | 16.4 | 77.7 |
| 2 | K-means++ | ✓ | | | | | | 640 × 640 | 7.11 | 16.4 | 79.5 |
| 3 | Transformer block | ✓ | ✓ | | | | | 640 × 640 | 7.08 | 16.4 | 82 |
| 4 | Attention block | ✓ | ✓ | ✓ | | | | 640 × 640 | 7.07 | 16.4 | 82.5 |
| 5 | Stem block | ✓ | ✓ | ✓ | ✓ | | | 640 × 640 | 7.09 | 4.5 | 80.9 |
| 6 | Rep block | ✓ | ✓ | ✓ | ✓ | ✓ | | 640 × 640 | 7.09 | 4.5 | 80.3 |
| 7 | Lighter head | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | 640 × 640 | 4.42 | 3.4 | 80 |
| Item | Configuration of Honor V20 |
|---|---|
Brand | Honor |
Model | V20 |
System on Chip | HiSilicon Kirin 980 |
CPU | 2 × A76 2.6 GHz + 2 × A76 1.92 GHz + 4 × A55 1.8 GHz |
GPU | Mali-G76 MP10 (720 MHz): 691 GFLOPs |
Random Access Memory | 8 GB |
Operating System | Android 11 |
| Methods | Honor V20 | Time (ms), Runs 1–15 | | | | | | | | | | | | | | | Trimmed Mean (ms) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv5s | CPU | 116.24 | 104.46 | 113.28 | 102.41 | 107.4 | 122.67 | 96.46 | 96.81 | 105.21 | 106.73 | 123.78 | 97.17 | 104.6 | 84.76 | 95.3 | 105.29 |
| YOLOv5s | GPU | 200.13 | 130.42 | 192.44 | 188.62 | 205.3 | 125.33 | 193.83 | 195.48 | 189.53 | 182 | 200.83 | 130.57 | 193.9 | 128.19 | 130.06 | 173.54 |
| YOLOv5s-Ghost | CPU | 104.46 | 87.67 | 75.2 | 75.62 | 68.19 | 79.98 | 67.59 | 82.66 | 71.47 | 73.07 | 71.9 | 78.42 | 80.02 | 89.8 | 69.55 | 77.20 |
| YOLOv5s-Ghost | GPU | 258.94 | 79.87 | 72.82 | 92.29 | 75 | 93.2 | 74.6 | 125.18 | 75.82 | 95.62 | 76.5 | 109.59 | 75.37 | 93.58 | 74.84 | 87.80 |
| GL-YOLO | CPU | 61.78 | 117.57 | 48.19 | 57.03 | 59.07 | 56.84 | 48.55 | 62.62 | 49.53 | 64.03 | 52.03 | 64.90 | 49.12 | 59.97 | 108.68 | 63.99 |
| GL-YOLO | GPU | 69.58 | 67.91 | 104.79 | 75.90 | 64.19 | 100.88 | 104.79 | 74.05 | 106.31 | 105.62 | 83.42 | 78.41 | 107.60 | 76.49 | 89.47 | 87.29 |
| GL-YOLO-Lite | CPU | 76.75 | 41.56 | 46.27 | 48.02 | 52.74 | 70.15 | 68.94 | 59.48 | 73.90 | 75.24 | 72.70 | 44.00 | 44.27 | 71.11 | 66.88 | 60.80 |
| GL-YOLO-Lite | GPU | 122.67 | 84.06 | 78.43 | 49.24 | 92.05 | 58.79 | 57.74 | 50.87 | 49.44 | 77.70 | 63.86 | 66.05 | 59.75 | 82.78 | 64.28 | 70.51 |
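The "Trimmed Mean (ms)" column is consistent, at least for several rows, with dropping the single fastest and slowest of the 15 runs and averaging the rest. The sketch below is our own reading rather than a formula stated in the paper; it reproduces the 105.29 ms value of the YOLOv5s/CPU row, though not every row matches exactly, so the exact trimming proportion used may differ.

```python
# Trimmed mean as "drop one min, one max, average the remaining 13 runs".
# Values are the YOLOv5s / CPU row of the table above.
times = [116.24, 104.46, 113.28, 102.41, 107.4, 122.67, 96.46, 96.81,
         105.21, 106.73, 123.78, 97.17, 104.6, 84.76, 95.3]
trimmed = sorted(times)[1:-1]                      # discard the extremes
print(round(sum(trimmed) / len(trimmed), 2))       # 105.29, matching the table
```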