Applications of the FusionScratchNet Algorithm Based on Convolutional Neural Networks and Transformer Models in the Detection of Cell Phone Screen Scratches
Figure 1. Structure of the FS-Net designed for mobile screen scratch detection.
Figure 2. (a) Part of the network structure of ResNet50; and (b) residual connection block.
Figure 3. GLFI module structure.
Figure 4. Spatial attention and channel attention modules.
Figure 5. BA attention module.
Figure 6. Sample dataset display.
Figure 7. Different overlap levels with the same IoU values.
Figure 8. Overlap between predicted and actual boxes.
Figure 9. (a) Regression error curves of different loss functions; and (b) variation trends of IoU box plots under different loss functions.
Figure 10. (a) Regression error curves of different loss functions; and (b) trends in the IoU box plots under different loss functions.
Abstract
1. Introduction
- Proposal of a detection architecture that combines transformer and CNN branches to effectively capture scratches on the surface of mobile phone screens (a minimal sketch of this dual-branch design follows this list).
- Proposal of the GLFI module, which fuses the features of the two branches through fine-grained interactions to improve detection accuracy.
- Proposal of the BA attention module, which computes attention over multi-layer fused features to further improve detection accuracy.
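The paper's exact GLFI and BA implementations are not reproduced here; the following is a minimal PyTorch sketch of the dual-branch idea only, with `DualBranchBackbone` and its 1×1-conv fusion as our own hypothetical stand-ins.

```python
# Minimal sketch of the dual-branch design, NOT the authors' exact FS-Net:
# a CNN branch extracts local features, a transformer branch adds global
# context, and a 1x1 conv (a crude stand-in for GLFI) fuses the two.
import torch
import torch.nn as nn

class DualBranchBackbone(nn.Module):
    def __init__(self, dim: int = 256):
        super().__init__()
        # CNN branch: a small conv stem standing in for ResNet50 stages.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, dim, kernel_size=7, stride=4, padding=3),
            nn.BatchNorm2d(dim),
            nn.ReLU(inplace=True),
        )
        # Transformer branch: one encoder layer over the flattened feature map.
        self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        # Hypothetical fusion of the concatenated branch features.
        self.fuse = nn.Conv2d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        local_feat = self.cnn(x)                        # (B, C, H, W)
        b, c, h, w = local_feat.shape
        tokens = local_feat.flatten(2).transpose(1, 2)  # (B, H*W, C)
        global_feat = self.encoder(tokens)              # global context
        global_feat = global_feat.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local_feat, global_feat], dim=1))

feats = DualBranchBackbone()(torch.randn(1, 3, 224, 224))
print(feats.shape)  # torch.Size([1, 256, 56, 56])
```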
2. FS-Net
2.1. CNN Branch
2.2. Transformer Branch
2.3. GLFI
2.4. BA Attention Module
3. Loss Function
3.1. Module Loss Functions
3.2. Combined Loss Functions
4. Experiment and Result Analysis
4.1. Datasets
4.2. Experimental Environment and Parameters
4.3. Data Augmentation
4.4. Evaluation Metrics
5. Experimental Results Analysis
5.1. Ablation Experiments
5.2. Comparison of Attention Modules
5.3. Transformer Scale Analysis
5.4. Comparison with Mainstream Methods
6. Conclusions and Future Research
6.1. Conclusions
6.2. Limitations
6.3. Future Research
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Li, C.; Zhang, X.; Huang, Y.; Tang, C.; Fatikow, S. A novel algorithm for defect extraction and classification of mobile phone screen based on machine vision. Comput. Ind. Eng. 2020, 146, 106530. [Google Scholar] [CrossRef]
- Jian, C.; Gao, J.; Ao, Y. Automatic surface defect detection for mobile phone screen glass based on machine vision. Appl. Soft Comput. 2017, 52, 348–358. [Google Scholar] [CrossRef]
- Kuang, Y.; Zhang, K.; Xie, H. Adaptive intelligent detection technology for digital products’ shell surface. J. South China Univ. Technol. (Nat. Sci. Ed.) 2015, 43, 1–8. [Google Scholar]
- Weimer, D.; Scholz-Reiter, B.; Shpitalni, M. Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection. CIRP Ann. 2016, 65, 417–420. [Google Scholar] [CrossRef]
- Maaz, M.; Shaker, A.; Cholakkal, H.; Khan, S.; Zamir, S.W.; Anwer, R.M.; Shahbaz Khan, F. EdgeNeXt: Efficiently amalgamated CNN-transformer architecture for mobile vision applications. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; pp. 3–20. [Google Scholar]
- Pan, Y.; Zhou, C.; Su, L.; Hassan, H.; Huang, B. Bridging the Gap: A Fusion of CNN and Transformer Models for Real-Time Object Detection. In Proceedings of the 2023 IEEE 11th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 8–10 December 2023; Volume 11, pp. 1916–1921. [Google Scholar]
- Zhao, J.; Zhu, B.; Peng, M.; Li, L. Mobile phone screen surface scratch detection based on optimized YOLOv5 model (OYm). IET Image Process. 2023, 17, 1364–1374. [Google Scholar] [CrossRef]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
- Ming, W.; Cao, C.; Zhang, G.; Zhang, H.; Zhang, F.; Jiang, Z.; Yuan, J. Application of convolutional neural network in defect detection of 3C products. IEEE Access 2021, 9, 135657–135674. [Google Scholar] [CrossRef]
- Kuijper, A. p-Laplacian driven image processing. In Proceedings of the 2007 IEEE International Conference on Image Processing, San Antonio, TX, USA, 16–19 September 2007; Volume 5, pp. V-257–V-260. [Google Scholar] [CrossRef]
- Lin, X.; Wang, J.; Lin, C. Research on 3D reconstruction in binocular stereo vision based on feature point matching method. In Proceedings of the 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE), Dalian, China, 27–29 September 2020; pp. 551–556. [Google Scholar]
- Shang, J.Y.; Zhang, Y.; Zhang, Q.B.; Wang, W.S. Distorted target recognition based on prewitt operator combined with MACH filter. Key Eng. Mater. 2013, 552, 523–528. [Google Scholar] [CrossRef]
- Chen, Z.; Zhang, W.; Li, S.; Zhu, K.; Chen, H. Optimization of Binocular Vision Ranging Based on Sparse Stereo Matching and Feature Point Extraction. IEEE Access 2024, 12, 153859–153873. [Google Scholar] [CrossRef]
- Bruni, V.; Vitulano, D. A generalized model for scratch detection. IEEE Trans. Image Process. 2004, 13, 44–50. [Google Scholar] [CrossRef] [PubMed]
- Yuan, F.; Zhang, Z.; Fang, Z. An effective CNN and Transformer complementary network for medical image segmentation. Pattern Recognit. 2023, 136, 109228. [Google Scholar] [CrossRef]
- Xu, Z.; Wang, Z. MCV-UNet: A modified convolution & transformer hybrid encoder-decoder network with multi-scale information fusion for ultrasound image semantic segmentation. PeerJ Comput. Sci. 2024, 10, e2146. [Google Scholar] [PubMed]
- Kanadath, A.; Jothi, J.A.A.; Urolagin, S. CViTS-Net: A CNN-ViT Network with Skip Connections for Histopathology Image Classification. IEEE Access 2024. [Google Scholar] [CrossRef]
- Deng, K.; Meng, Y.; Gao, D.; Bridge, J.; Shen, Y.; Lip, G.; Zheng, Y. Transbridge: A lightweight transformer for left ventricle segmentation in echocardiography. In Simplifying Medical Ultrasound: Second International Workshop, ASMUS 2021, Held in Conjunction with MICCAI 2021, Strasbourg, France, 27 September 2021; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 63–72. [Google Scholar]
- Zheng, S.; Lu, J.; Zhao, H.; Zhu, X.; Luo, Z.; Wang, Y.; Zhang, L. Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6881–6890. [Google Scholar]
- Zhao, Y.; Chen, J.; Zhang, Z.; Zhang, R. BA-Net: Bridge attention for deep convolutional neural networks. In European Conference on Computer Vision; Springer Nature: Cham, Switzerland, 2022; pp. 297–312. [Google Scholar]
- Chunmou, C. Lead line image enhancement algorithm based on histogram equalization and Laplace. Foreign Electron. Meas. Technol. 2019, 38, 131–135. [Google Scholar]
- Yu, J.; Jiang, Y.; Wang, Z.; Cao, Z.; Huang, T. Unitbox: An advanced object detection network. In Proceedings of the 24th ACM International Conference on Multimedia, Amsterdam, The Netherlands, 15–19 October 2016; pp. 516–520. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Deshpande, A.; Estrela, V.V.; Patavardhan, P. The DCT-CNN-ResNet50 architecture to classify brain tumors with super-resolution, convolutional neural network, and the ResNet50. Neurosci. Inform. 2021, 1, 100013. [Google Scholar] [CrossRef]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized intersection over union: A metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 658–666. [Google Scholar]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, Association for the Advancement of Artificial Intelligence (AAAI), New York, NY, USA, 7 February 2020; Volume 34, pp. 12993–13000. [Google Scholar]
- Zhang, Y.F.; Ren, W.; Zhang, Z.; Jia, Z.; Wang, L.; Tan, T. Focal and efficient IOU loss for accurate bounding box regression. Neurocomputing 2022, 506, 146–157. [Google Scholar] [CrossRef]
- Yunpeng, G.; Rui, Z.; Mingxu, Y.; Sabah, F. YOLOv8-TDD: An Optimized YOLOv8 Algorithm for Targeted Defect Detection in Printed Circuit Boards. J. Electron. Test. 2024, 1–12. [Google Scholar] [CrossRef]
- Liang, A.; Wang, Q.; Wu, X. Context-Enhanced Network with Spatial-Aware Graph for Smartphone Screen Defect Detection. Sensors 2024, 24, 3430. [Google Scholar] [CrossRef] [PubMed]
- Zhao, C.; Pan, J.; Tan, Q.; Wu, Z.; Chen, Z. DSU-Net: Dynamic Stacked U-Net for Enhancing Mobile Screen Defect Detection. In Proceedings of the 2023 China Automation Congress (CAC), Chongqing, China, 17–19 November 2023; pp. 7454–7459. [Google Scholar]
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective kernel networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542. [Google Scholar]
- Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3146–3154. [Google Scholar]
- Park, J.; Woo, S.; Lee, J.Y.; Kweon, I.S. BAM: Bottleneck attention module. arXiv 2018, arXiv:1807.06514. [Google Scholar]
- Cao, Y.; Xu, J.; Lin, S.; Wei, F.; Hu, H. Gcnet: Non-local networks meet squeeze-excitation networks and beyond. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
- Su, R.; Huang, W.; Ma, H.; Song, X.; Hu, J. SGE net: Video object detection with squeezed GRU and information entropy map. In Proceedings of the 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, 19–22 September 2021; pp. 689–693. [Google Scholar]
- Shi, X.; Zhou, S.; Tai, Y.; Wang, J.; Wu, S.; Liu, J.; Xu, K.; Peng, T.; Zhang, Z. An improved faster R-CNN for steel surface defect detection. In Proceedings of the 2022 IEEE 24th International Workshop on Multimedia Signal Processing (MMSP), Shanghai, China, 26–28 September 2022; pp. 1–5. [Google Scholar]
- Dai, J.; Li, Y.; He, K.; Sun, J. R-fcn: Object detection via region-based fully convolutional networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Volume 29. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade r-cnn: Delving into high quality object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra r-cnn: Towards balanced learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830. [Google Scholar]
- Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. arXiv 2017, arXiv:1708.02002. [Google Scholar]
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. Scaled-yolov4: Scaling cross stage partial network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13029–13038. [Google Scholar]
Item | Configuration |
---|---|
Operating system | Linux |
Operating system version | Ubuntu 20.04 |
RAM | 64 GB |
CPU | 12 vCPU Intel(R) Xeon(R) Platinum 8255C @ 2.50 GHz |
GPU | RTX 3080 (10 GB) × 1 |
Hard disk drive | System disk: 30 GB; data disk: 50 GB SSD |
Development language | Python 3.8 |
Deep learning framework | PyTorch 2.0.0 |
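As a quick sanity check of the environment in the table above (the printed strings are our assumptions about this specific machine, not output from the authors' setup):

```python
# Verify the PyTorch/CUDA runtime matches the configuration table.
import torch

print(torch.__version__)            # expected "2.0.0" per the table
print(torch.cuda.is_available())    # True on the GPU machine
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g., "NVIDIA GeForce RTX 3080"
```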
Loss Function | AP |
---|---|
IoU | 36.8 |
GIoU | 36.8 |
CIoU | 36.9 |
EIoU | 37.0 |
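The EIoU loss in the last row (Zhang et al., cited in the references) adds center-distance, width, and height penalties to the IoU term, each normalized by the smallest enclosing box. A hedged sketch follows; the (x1, y1, x2, y2) box format is our assumption.

```python
# Sketch of the EIoU loss: 1 - IoU + center penalty + width/height penalties.
import torch

def eiou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # IoU term from the intersection and union of (x1, y1, x2, y2) boxes.
    ix1 = torch.max(pred[:, 0], target[:, 0])
    iy1 = torch.max(pred[:, 1], target[:, 1])
    ix2 = torch.min(pred[:, 2], target[:, 2])
    iy2 = torch.min(pred[:, 3], target[:, 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box, used to normalize all three penalties.
    cw = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    ch = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])

    # Squared center distance over the squared enclosing diagonal.
    pcx, pcy = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    tcx, tcy = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2
    center = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (cw ** 2 + ch ** 2 + eps)

    # Width/height differences normalized by the enclosing width/height.
    pw, ph = pred[:, 2] - pred[:, 0], pred[:, 3] - pred[:, 1]
    tw, th = target[:, 2] - target[:, 0], target[:, 3] - target[:, 1]
    wh = (pw - tw) ** 2 / (cw ** 2 + eps) + (ph - th) ** 2 / (ch ** 2 + eps)

    return (1 - iou + center + wh).mean()

print(eiou_loss(torch.tensor([[0.0, 0.0, 2.0, 2.0]]),
                torch.tensor([[1.0, 1.0, 3.0, 3.0]])))  # ~0.97
```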
Method | OA/% | F1-Score/% | EIoU/% |
---|---|---|---|
YOLOv1 | 96.38 | 84.12 | 63.13 |
YOLOv1 + GLFI | 96.27 | 86.36 | 64.37 |
Transformer + YOLOv1 | 97.08 | 85.97 | 64.01 |
Transformer + YOLOv1 + GLFI | 98.04 | 88.03 | 65.13 |
Attention Module | OA/% | F1-Score/% | EIoU/% | Parameters/10⁶ |
---|---|---|---|---|
SKNet [31] | 97.20 | 86.60 | 64.40 | 26.15 |
CBAM [32] | 97.10 | 85.95 | 64.01 | 28.09 |
ECA-Net [33] | 97.05 | 87.10 | 64.13 | 25.65 |
BAM [34] | 97.00 | 86.20 | 64.25 | 25.80 |
DANet [35] | 96.90 | 85.80 | 64.10 | 26.50 |
GC-Net [36] | 96.40 | 84.10 | 63.12 | 28.08 |
SGE-Net [37] | 96.25 | 86.35 | 64.35 | 25.50 |
GLFI | 98.05 | 88.05 | 65.15 | 24.20 |
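Most of the compared modules share the channel-then-spatial attention pattern of Figure 4; the following is a generic CBAM-style sketch of that pattern for illustration, not the paper's GLFI code.

```python
# Generic channel + spatial attention sketch (CBAM-style), for illustration.
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per channel.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over avg- and max-pooled channel maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = x.mean(dim=(2, 3))   # (B, C) average-pooled descriptor
        mx = x.amax(dim=(2, 3))    # (B, C) max-pooled descriptor
        ca = torch.sigmoid(self.mlp(avg) + self.mlp(mx)).view(b, c, 1, 1)
        x = x * ca                 # channel-refined features
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        sa = torch.sigmoid(self.spatial(pooled))  # (B, 1, H, W)
        return x * sa              # spatially refined features

out = ChannelSpatialAttention(256)(torch.randn(2, 256, 28, 28))
print(out.shape)  # torch.Size([2, 256, 28, 28])
```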
Method | OA/% | F1-Score/% | EIoU/% | Time/s | Parameters/10⁶ |
---|---|---|---|---|---|
R-CNN | 95.27 | 81.34 | 60.63 | 43.2 | 30.6 |
Faster R-CNN [38] | 96.12 | 83.45 | 62.34 | 41.5 | 28.9 |
R-FCN [39] | 96.54 | 84.32 | 62.87 | 40.8 | 27.5 |
Cascade R-CNN [40] | 97.23 | 85.67 | 64.21 | 40.2 | 29.3 |
Libra R-CNN [41] | 97.15 | 85.34 | 63.98 | 39.8 | 28.7 |
RetinaNet [42] | 97.51 | 86.24 | 63.92 | 40.6 | 27.6 |
YOLO | 96.39 | 84.12 | 63.12 | 42.6 | 25.3 |
YOLOv1 [43] | 95.87 | 82.76 | 61.89 | 41.9 | 24.8 |
YOLOv2 [43] | 96.18 | 83.65 | 62.56 | 41.2 | 25.1 |
YOLOv3 [43] | 96.84 | 84.87 | 63.54 | 41.8 | 25.7 |
FS-Net | 98.04 | 88.03 | 65.13 | 39.1 | 24.2 |
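The OA and F1 columns above follow their standard confusion-matrix definitions; a small sketch with assumed TP/TN/FP/FN counts follows (how the paper maps detections to these counts is not reproduced here).

```python
# OA and F1 from confusion-matrix counts; the counts are illustrative.
def overall_accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """OA = correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

print(f"OA = {overall_accuracy(880, 100, 12, 8):.2%}")  # OA = 98.00%
print(f"F1 = {f1_score(880, 12, 8):.2%}")               # F1 = 98.88%
```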
[Qualitative comparison of detection results on test samples 1–5 for R-CNN, Faster R-CNN, R-FCN, Cascade R-CNN, Libra R-CNN, RetinaNet, YOLO, YOLOv1, YOLOv2, YOLOv3, and FS-Net; image cells omitted.]
Cite as: Cao, Z.; Liang, K.; Tang, S.; Zhang, C. Applications of the FusionScratchNet Algorithm Based on Convolutional Neural Networks and Transformer Models in the Detection of Cell Phone Screen Scratches. Electronics 2025, 14, 134. https://doi.org/10.3390/electronics14010134