Research on an Eye Control Method Based on the Fusion of Facial Expression and Gaze Intention Recognition
Figure 1. The technical route of this paper's research.
Figure 2. Face image dataset example.
Figure 3. Eye movement intent detection flow chart: the conversion of eye movement data to intent classification.
Figure 4. Integration framework based on the attention mechanism.
Figure 5. Comparison of performance in single-modal and multimodal prediction.
Figure 6. Line charts of five indicators for different models.
Figure 7. Loss function curves of the anchor method before and after improvement.
Figure 8. Structure diagram of the CA attention mechanism [9].
Figure 9. Improved YOLOv5 model structure.
Figure 10. Loss variation diagram for the improved YOLOv5 model.
Figure 11. Average precision (AP) curve of the improved model.
Figure 12. F1 score curve of the improved model.
Figure 13. Test results before and after improvement.
Figure 14. Human–computer interaction experiment platform.
Figure 15. Overall flow chart of the experiment.
Figure 16. Comparison of computational efficiency indicators.
Figure 17. Complete human–computer interaction process.
Figure 18. Test results.
Figure 19. Test results for different tasks.
Abstract
1. Introduction
Research Ideas
2. Research on the Key Technology of Eye Control Methods
2.1. Facial Expression Recognition
2.1.1. Datasets
2.1.2. Face Image Preprocessing
2.1.3. Example of the Face Image Dataset
2.2. Extraction and Recognition of Eye Movement Features
2.3. Interactive Intent Recognition Feature Fusion
Feature Fusion Strategy Based on Attention Mechanism
3. Eye–Machine Interaction Technology Based on the YOLOv5 Network
3.1. Experimental Design of Eye–Machine Interaction Technology
3.1.1. Adaptive Anchor and Its Improvement
3.1.2. Adding an Attention Mechanism
3.1.3. Analysis of Experimental Results of Improved Model Structure
3.2. Discrimination of Eye Movement Behavior
3.3. Construction of Experimental Platform
3.4. Eye Movement Interactive Grasping Experiment
3.5. Analysis of Eye–Machine Interaction Experiment Results
4. Summary
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Tatsumi, E.; Yasumura, M.; Rashid, M. Deep Learning-Based Eye Movement Analysis for Predicting Landing Performance in Virtual Reality. Appl. Ergon. 2019, 76, 167–175.
- Duchowski, A.T. Eye Tracking Methodology: Theory and Practice, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2007.
- Lucey, P.; Cohn, J.F.; Kanade, T.; Saragih, J.; Ambadar, Z.; Matthews, I. The Extended Cohn-Kanade Dataset (CK+): A Complete Dataset for Action Unit and Emotion-Specified Expression. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), San Francisco, CA, USA, 13–18 June 2010; pp. 94–101.
- Plopski, A.; Hirzle, T.; Norouzi, N. The Eye in Extended Reality: A Survey on Gaze Interaction and Eye Tracking in Head-Worn Extended Reality. ACM Comput. Surv. 2022, 55, 1–39.
- Schindler, K.; Van Gool, L.; de Gelder, B. Recognizing Emotions Expressed by Body Pose: A Biologically Inspired Neural Model. Neural Netw. 2008, 21, 1238–1246.
- Berndt, E.K.; Hall, B.H. Hidden Markov Models for Economic Time Series Analysis. Econometrica 1963, 31, 63–84.
- Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
- Liao, L.; Han, C.; He, C. Rice Disease Image Classification Method Based on VGG-19 Convolutional Neural Network and Transfer Learning. Surv. Mapp. 2023, 46, 153–157+181.
- Liu, S.; Huang, J.; Wang, Z.; Wang, X.; Qiu, S.; Wen, S. Improved YOLOv5 for Real-Time Aerial Target Detection. Remote Sens. 2020, 12, 303.
- Zhou, X.; Wang, D.; Kratz, L.; Lin, Y. Bottom-Up Attention: Fine-Grained Visual Question Answering with Bottom-Up Attention Flow. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In Advances in Neural Information Processing Systems; Curran Associates, Inc.: Red Hook, NY, USA, 2015; pp. 91–99.
Method | Accuracy | Precision | Recall | F1 |
---|---|---|---|---|
SVM | 90.8% | 91.6% | 92.7% | 94% |
RF | 89.6% | 88.4% | 91.6% | 91% |
KNN | 90.1% | 87.3% | 93.6% | 89% |
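For readers who want to reproduce a comparison like the one above, a minimal scikit-learn sketch follows. The feature matrix, labels, and hyperparameters are placeholders, not the paper's actual eye-movement pipeline; only the evaluation pattern is illustrated.

```python
# Sketch: comparing SVM, RF, and KNN on eye-movement features.
# The data here is random placeholder input with assumed shapes
# (e.g., fixation duration, saccade amplitude, pupil diameter).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

X = np.random.rand(1000, 6)                 # 1000 samples x 6 eye features
y = np.random.randint(0, 2, size=1000)      # binary interaction intent

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "SVM": SVC(kernel="rbf", C=1.0),
    "RF": RandomForestClassifier(n_estimators=100),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}
for name, model in models.items():
    y_pred = model.fit(X_tr, y_tr).predict(X_te)
    print(name,
          accuracy_score(y_te, y_pred),
          precision_score(y_te, y_pred),
          recall_score(y_te, y_pred),
          f1_score(y_te, y_pred))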
Feature Type | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
Facial expression | 87.6% | 89.1% | 92.6% | 90.6% |
Eye movement | 90.8% | 91.6% | 93.7% | 92.1% |
Fusion Technique | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
Concatenation | 92.0% | 91.0% | 93.0% | 92.0% |
Bilinear Pooling | 92.5% | 92.0% | 93.5% | 92.8% |
Self-Attention | 94.6% | 93.8% | 97.7% | 94.5% |
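As a sketch of the self-attention fusion compared above, the PyTorch module below projects the two modality features into a shared space and attends over them as a two-token sequence. All dimensions and the head count are illustrative assumptions, not the paper's architecture; the (seq, batch, embed) layout keeps it compatible with PyTorch 1.8.1, the version listed in the environment table.

```python
# Sketch: self-attention fusion of an expression feature vector and an
# eye-movement feature vector (illustrative dimensions).
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    def __init__(self, expr_dim=512, eye_dim=64, d_model=128, n_heads=4):
        super().__init__()
        # Project both modalities into a shared embedding space.
        self.expr_proj = nn.Linear(expr_dim, d_model)
        self.eye_proj = nn.Linear(eye_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads)  # (L, N, E) layout
        self.classifier = nn.Linear(d_model, 2)  # intent / no intent

    def forward(self, expr_feat, eye_feat):
        # Treat the two modality embeddings as a two-token sequence.
        tokens = torch.stack(
            [self.expr_proj(expr_feat), self.eye_proj(eye_feat)], dim=0)
        fused, _ = self.attn(tokens, tokens, tokens)  # self-attention
        return self.classifier(fused.mean(dim=0))     # pool tokens, classify

logits = AttentionFusion()(torch.randn(8, 512), torch.randn(8, 64))
print(logits.shape)  # torch.Size([8, 2])
```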
Test Group | Characteristic Information | Identification Method |
---|---|---|
Single-feature | Expression feature | VGG-19 |
Single-feature | Eye movement feature | SVM |
Multi-feature | Facial features + eye movement features + attention mechanism | Concatenation + SVM |
Model Class | Accuracy | Precision | Recall | F1 | AP |
---|---|---|---|---|---|
Facial expression feature | 87.6% | 89.1% | 92.6% | 90.6% | 89.8% |
Eye movement feature | 90.8% | 91.6% | 93.7% | 92.1% | 92.8% |
Decision-level fusion features | 94.6% | 93.8% | 97.7% | 94.5% | 95.3% |
Attention Mechanism | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
SE | 93.1% | 92.1% | 94.0% | 93.0% |
CBAM | 92.8% | 91.5% | 94.5% | 93.0% |
CA | 96.3% | 95.0% | 97.5% | 96.0% |
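For reference, a condensed PyTorch sketch of the coordinate attention (CA) block of Hou et al. [9] follows: features are pooled along height and width separately, mixed by a shared 1×1 convolution, then split back into two directional attention maps. The reduction ratio and activation are illustrative choices, not values reported in the paper.

```python
# Sketch of a Coordinate Attention (CA) block after Hou et al. [9].
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    def __init__(self, channels, reduction=32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool along width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool along height
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.conv_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        x_h = self.pool_h(x)                      # (b, c, h, 1)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)  # (b, c, w, 1)
        # Shared transform over the concatenated direction-aware features.
        y = self.act(self.bn(self.conv1(torch.cat([x_h, x_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.conv_h(y_h))                      # (b, c, h, 1)
        a_w = torch.sigmoid(self.conv_w(y_w.permute(0, 1, 3, 2)))  # (b, c, 1, w)
        return x * a_h * a_w  # reweight features in both directions

out = CoordAtt(64)(torch.randn(2, 64, 32, 32))
print(out.shape)  # torch.Size([2, 64, 32, 32])
```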
Experimental Environment | Version/Model |
---|---|
Operating system | Windows 10 |
GPU | NVIDIA GeForce RTX 3060 (NVIDIA, Santa Clara, CA, USA) |
Programming language | Python 3.8 |
Development environment | PyCharm 2021 |
Deep learning framework | PyTorch 1.8.1 |
Before Anchors | Layer | After Anchors | Layer |
---|---|---|---|
[10,13, 16,30, 33,23] | P3/8 | [5,6, 8,14, 15,11] | P2/4 |
[30,61, 62,45, 59,119] | P4/16 | [10,13, 16,30, 33,23] | P3/8 |
[116,90, 156,198, 373,326] | P5/32 | [30,61, 62,45, 59,119] | P4/16 |
  |   | [116,90, 156,198, 373,326] | P5/32 |
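YOLOv5 derives anchors like those above by clustering the width/height pairs of the labeled boxes in the training set. A minimal sketch of that idea with k-means follows; the box data is a placeholder, and YOLOv5's autoanchor routine additionally refines the clusters with a genetic algorithm, which is omitted here.

```python
# Sketch: recomputing anchors by k-means clustering of labeled box
# (width, height) pairs. Placeholder data skewed toward small targets.
import numpy as np
from scipy.cluster.vq import kmeans

wh = np.abs(np.random.randn(500, 2)) * 40 + 5  # hypothetical (w, h) in pixels

k = 12  # 3 anchors x 4 detection scales (P2/4, P3/8, P4/16, P5/32)
centroids, _ = kmeans(wh / wh.std(0), k)  # normalize, then cluster
anchors = centroids * wh.std(0)
# Sort by area so anchors can be assigned from P2/4 up to P5/32.
anchors = anchors[np.argsort(anchors.prod(axis=1))]
print(np.round(anchors).astype(int))
```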
Model | Accuracy/% | Recall/% | F1 Score | Average Precision/% | mAP/% |
---|---|---|---|---|---|
SSD | 87.3 | 89.6 | 88.5 | 86.4 | 43.3 |
YOLOv3-tiny | 90.8 | 93.7 | 92.2 | 88.2 | 44.2 |
Faster R-CNN | 88.7 | 90.5 | 89.6 | 87.8 | 43.6 |
YOLOv5 | 92.2 | 95.6 | 93.9 | 88.9 | 44.6 |
Improved YOLOv5 | 96.8 | 97.6 | 96.2 | 91.2 | 48.5 |
Model | Small-Target Detection Layer | Attention Mechanism | Average Precision/% | Mean Average Precision/% |
---|---|---|---|---|
YOLOv5s | × | × | 87.5 | 44.6 |
Improvement 1 | ✓ | × | 88.5 | 46.3 |
Improvement 2 | × | ✓ | 89.3 | 46.5 |
This paper's method | ✓ | ✓ | 91.2 | 48.5 |
Ocular State | Discrimination Criterion |
---|---|
Gaze up, down, left, or right | State persists for more than 1.5 s |
Eyes closed | State persists for more than 1 s |
Single blink | One "closed to open" eye-state change within 1 s |
Double blink | Two consecutive "closed to open" eye-state changes within 1.5 s |
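The thresholds above can be applied to a timestamped stream of per-frame eye-state labels. The sketch below assumes such a labeled stream; the state names and event format are illustrative, not the paper's implementation.

```python
# Sketch: discriminating eye commands from a chronological stream of
# (timestamp, state) pairs, following the thresholds in the table above.
DWELL_GAZE_S = 1.5    # up/down/left/right must persist > 1.5 s
DWELL_CLOSED_S = 1.0  # sustained eye closure must persist > 1 s

def detect_command(events):
    """events: chronological list of (timestamp, state), with state in
    {'up', 'down', 'left', 'right', 'open', 'closed'}."""
    if not events:
        return None
    t_last, s_last = events[-1]
    # Dwell commands: how long has the current state persisted?
    start = t_last
    for t, s in reversed(events):
        if s != s_last:
            break
        start = t
    held = t_last - start
    if s_last in ("up", "down", "left", "right") and held > DWELL_GAZE_S:
        return s_last
    if s_last == "closed" and held > DWELL_CLOSED_S:
        return "close"
    # Blink commands: count closed-to-open transitions in a recent window.
    def blinks(window):
        recent = [s for t, s in events if t_last - t <= window]
        return sum(1 for a, b in zip(recent, recent[1:])
                   if a == "closed" and b == "open")
    if blinks(1.5) >= 2:
        return "double_blink"
    if blinks(1.0) >= 1:
        return "single_blink"
    return None
```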
Model | Processing Time (ms) | Memory Usage (MB) | Power Consumption (W) |
---|---|---|---|
Original YOLOv5 | 23.6 | 256 | 2.1 |
Improved YOLOv5 | 18.9 | 220 | 1.9 |
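As a rough guide to how latency and memory figures like these might be measured, the sketch below times repeated forward passes and reads peak GPU memory in PyTorch. The hub-loaded yolov5s model is a public stand-in for the paper's improved model, and power draw would require NVML/nvidia-smi, which is omitted here.

```python
# Sketch: measuring per-frame latency and peak GPU memory of a detector.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.hub.load("ultralytics/yolov5", "yolov5s").to(device).eval()
x = torch.randn(1, 3, 640, 640, device=device)

with torch.no_grad():
    for _ in range(10):  # warm-up iterations
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
        torch.cuda.reset_peak_memory_stats()
    t0 = time.perf_counter()
    for _ in range(100):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    dt = (time.perf_counter() - t0) / 100

print(f"mean latency: {dt * 1e3:.1f} ms")
if device == "cuda":
    print(f"peak memory: {torch.cuda.max_memory_allocated() / 2**20:.0f} MB")
```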
The robot-side job (INFORM-style), reconstructed from the two-column numbered listing, with the mismatched parentheses in the ELSEIF conditions and the misspelled CONTINUE corrected:

    NOP
    'Clear all eye-command signals before entering the polling loop
    MOUT M(528) OFF
    MOUT M(529) OFF
    MOUT M(530) OFF
    MOUT M(531) OFF
    MOUT M(532) OFF
    WHILE LB000 = 0 DO
        'Dispatch on whichever command signal the host has set
        IF M(528) = 1 THEN
            CALLJOB:upmove
            MOUT M(528) OFF
        ELSEIF M(529) = 1 THEN
            CALLJOB:downmove
            MOUT M(529) OFF
        ELSEIF M(530) = 1 THEN
            CALLJOB:leftmove
            MOUT M(530) OFF
        ELSEIF M(531) = 1 THEN
            CALLJOB:rightmove
            MOUT M(531) OFF
        ELSEIF M(532) = 1 THEN
            CALLJOB:toolclose
            MOUT M(532) OFF
        ENDIF
        CONTINUE
    ENDWHILE
    END
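On the host side, the recognized eye commands must set the M(528)–M(531) and M(532) signals that the job above polls. The sketch below assumes, purely for illustration, that these registers are exposed as Modbus TCP coils reachable with pymodbus; the paper does not specify the actual controller interface or addressing, so the address mapping and IP are hypothetical.

```python
# Sketch: host-side mapping from recognized eye commands to the robot
# job's M(528)..M(532) signals. The Modbus exposure and coil addresses
# are assumptions for illustration only.
from pymodbus.client import ModbusTcpClient  # pymodbus >= 3.x

COMMAND_TO_COIL = {
    "up": 528,            # CALLJOB:upmove
    "down": 529,          # CALLJOB:downmove
    "left": 530,          # CALLJOB:leftmove
    "right": 531,         # CALLJOB:rightmove
    "double_blink": 532,  # CALLJOB:toolclose
}

def send_command(client: ModbusTcpClient, command: str) -> None:
    coil = COMMAND_TO_COIL.get(command)
    if coil is not None:
        client.write_coil(coil, True)  # the job resets it with MOUT ... OFF

client = ModbusTcpClient("192.168.0.10")  # hypothetical controller address
client.connect()
send_command(client, "up")
client.close()
```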
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).