Real-Time Robotic Presentation Skill Scoring Using Multi-Model Analysis and Fuzzy Delphi–Analytic Hierarchy Process
Figure captions:
- Developed scoring approach for our multi-model analysis and RPSS.
- Joints generated from MediaPipe BlazePose.
- Face detection applied to one video from our dataset.
- The detection of body movement, showing the right movement when the x-coordinate of the left shoulder at time (t) is higher than the x-coordinate of the nose at time (t − 1).
- The detection of body movement, showing the left movement when the x-coordinate of the right shoulder at time (t) is lower than the x-coordinate of the nose at time (t − 1).
- The scoring results in the real-time case study in the lab environment.
Abstract
1. Introduction
2. Contributions
- To the best of our knowledge, this article is the first to propose a multi-model analysis approach for presentation scoring based on four criteria, namely facial expressions, eye contact, hand gestures and body movement. The proposed RPSS captures the sensing data using an Intel RealSense D435 camera mounted on a turtle robot with a GUI for interaction.
- RPSS identifies five academic facial expressions together with eye contact, improving its scoring accuracy compared with approaches that rely on facial expressions alone.
- RPSS adopts fuzzy Delphi for criteria selection and incorporates AHP for weighting and prioritising the criteria used in scoring. The outcome of fuzzy Delphi–AHP is a weighting vector that assigns a different weight to each criterion according to the system manager's preferences: the RPSS manager defines the relative importance between criteria, and AHP calculates the corresponding weights.
- RPSS generates the final score by following predetermined rules formulated from the selected criteria, whilst considering the weight of each criterion according to the AHP-based estimation method.
- RPSS maximises its prediction accuracy by taking plausible actions with the robot (active learning) before passing its predictions to the AHP-weighted scoring step.
- The study introduces the concept of active learning for robotic presentation scoring, where the robot actively adjusts its position to improve the quality of data recording. This demonstrates the potential of active learning to enhance the prediction performance of presentation scoring systems.
3. Literature Review
3.1. Presentation Scoring
3.2. Basic and Learning-Centred Emotion Classification
4. Methodology
4.1. RPSS
Algorithm 1: Pseudocode of our proposed RPSS

Input: (1) videoStream—real-time video stream from the robot’s camera; (2) expertOpinions—collection of criteria from experts; (3) scoringCriteria—as found by the Delphi method; (4) AHPWeights—as found by the AHP method.
Output: scoringResult—group A, B, C or D.
1: Start algorithm
2: skeletonFeatures = ExtractSkeletonFeatures(videoStream)
3: faceFeatures = ExtractFaceFeatures(videoStream)
4: eyeFeatures = ExtractEyeFeatures(videoStream)
5: eyeContact = EyeContactDetection(eyeFeatures)
6: facialExpressions = FacialExpressionsIdentification(faceFeatures)
7: handGestures = HandGestureDetection(skeletonFeatures)
8: bodyMovements = BodyMovementAnalysis(skeletonFeatures)
9: fusedData = Fusion(eyeContact, facialExpressions, handGestures, bodyMovements)
10: score = ScoringUnit(fusedData, scoringCriteria)
11: scoringResult = RuleBasedClassification(score, AHPWeights)
12: Return scoringResult
13: End algorithm
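A minimal Python sketch of the data flow in Algorithm 1. The extract/detect calls are stubs standing in for the components of Sections 4.1.4 to 4.1.11, and the returned rates are illustrative placeholders, not the study's implementation; only the orchestration and fusion are shown.

```python
# Stubs standing in for the feature extractors and detectors described later.
def extract_skeleton_features(frames): return [{"wrist": (0.6, 0.5)} for _ in frames]   # stub
def extract_face_features(frames):     return [{"face": None} for _ in frames]          # stub
def extract_eye_features(frames):      return [{"gaze": 0.9} for _ in frames]           # stub

def eye_contact_detection(eye_feats):        return 0.8  # stub: fraction of frames with eye contact
def facial_expression_identification(faces): return 0.7  # stub: fraction of positive learning emotions
def hand_gesture_detection(skeleton):        return 0.6  # stub: fraction of frames with hand gestures
def body_movement_analysis(skeleton):        return 0.5  # stub: fraction of frames with body movement

def rpss_pipeline(frames, ahp_weights):
    """Mirror Algorithm 1: extract features, run detectors, fuse, and score."""
    skeleton = extract_skeleton_features(frames)
    faces = extract_face_features(frames)
    eyes = extract_eye_features(frames)
    fused = {                                   # Fusion step (line 9 of Algorithm 1)
        "eye_contact": eye_contact_detection(eyes),
        "positive_emotion": facial_expression_identification(faces),
        "hand": hand_gesture_detection(skeleton),
        "body": body_movement_analysis(skeleton),
    }
    score = sum(ahp_weights[k] * fused[k] for k in fused)   # ScoringUnit (line 10)
    return fused, score                                     # passed on to the rule-based classifier

weights = {"eye_contact": 0.479, "positive_emotion": 0.283, "hand": 0.147, "body": 0.091}
print(rpss_pipeline(frames=list(range(300)), ahp_weights=weights))
```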
4.1.1. Criteria Collection and Selection
- Stage 1: Identifying the yardstick and criteria for the research
- Stage 2: Collecting expert judgements and opinions via group decisions (a minimal aggregation sketch follows this list), where:
- n denotes the number of experts;
- m denotes the number of factors.
- Stage 3: Specifying the criteria
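A minimal sketch of the Stage 2 aggregation, assuming each expert's 7-point rating is mapped to the membership value in the importance-scale table reported later (means from 0.0 to 1.0) and the factor weight is the group average compared against an acceptance threshold; the threshold value here is illustrative.

```python
# Mapping from the 7-point importance scale to membership values, following the
# scale table reported in the paper (the sigma = 0.09 column is not used here).
SCALE_TO_MU = {1: 0.0, 2: 0.1, 3: 0.3, 4: 0.5, 5: 0.7, 6: 0.9, 7: 1.0}

def fuzzy_delphi_weight(ratings, threshold=0.7):
    """Average the experts' mapped ratings for one factor and apply an acceptance cut-off.

    ratings: one integer per expert on the 1-7 scale; the threshold is illustrative.
    Returns (weight, decision).
    """
    weight = sum(SCALE_TO_MU[r] for r in ratings) / len(ratings)
    return weight, ("Select" if weight >= threshold else "Reject")

# Example: ten experts rating the "eye contact" factor
print(fuzzy_delphi_weight([7, 6, 7, 6, 7, 7, 6, 7, 7, 7]))  # -> roughly (0.97, 'Select')
```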
4.1.2. Criteria Weighting
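The criterion weights used by RPSS come from AHP pairwise comparisons provided by the system manager. A minimal sketch, assuming the standard geometric-mean approximation of the AHP priority vector; the comparison matrix below is illustrative, not the one elicited in the study.

```python
import numpy as np

def ahp_weights(pairwise: np.ndarray) -> np.ndarray:
    """Approximate the AHP priority vector via normalised row geometric means."""
    gm = np.prod(pairwise, axis=1) ** (1.0 / pairwise.shape[0])
    return gm / gm.sum()

# Illustrative pairwise comparisons for (eye contact, facial expressions,
# hand movement, body movement) on Saaty's 1-9 scale.
A = np.array([
    [1,   2,   3,   5],
    [1/2, 1,   2,   3],
    [1/3, 1/2, 1,   2],
    [1/5, 1/3, 1/2, 1],
], dtype=float)

print(ahp_weights(A).round(3))  # -> [0.483 0.272 0.157 0.088], similar in shape
                                # to the weight vector reported in the results
```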
4.1.3. Data Collection
4.1.4. Skeleton Identification
4.1.5. Face Identification
Algorithm 2: Pseudocode of extracting faces from videos

Input: (1) videos—the original videos.
Output: framesArray; facesArray; cords—face coordinates in the image.
1: start algorithm
2: for each video in videos do
3:  framesArray = extractFrames(video)
4:  for each frame in framesArray do
5:   cords = extractFaceCords(frame)
6:   face = cutImage(cords)
7:   facesArray = append(facesArray, face)
8:  end for
9: end for
10: end algorithm
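A minimal OpenCV sketch of Algorithm 2. The Haar cascade is a stand-in for the face detector used in the study, and the frame-sampling step is illustrative.

```python
import cv2

def extract_faces(video_path, frame_step=10):
    """Crop the largest detected face from every frame_step-th frame of a video.

    A Haar cascade stands in here for the paper's face detector.
    Returns a list of (frame_index, face_crop) pairs.
    """
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    faces, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(boxes) > 0:
                # keep the largest face, assumed to be the presenter
                x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
                faces.append((idx, frame[y:y + h, x:x + w]))
        idx += 1
    cap.release()
    return faces
```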
4.1.6. Eye Contact
Algorithm 3: Detect eye contact using the DeepEC model

Input: (1) videoPath—path where the video is stored; (2) modelWeight—path where the weights of the DeepEC model are stored.
Output: (1) finalResult—1 if the video contains eye contact, 0 otherwise.
1: start algorithm
2: y = list that stores 0 or 1 for each frame
3: confidence = list that stores the eye-contact confidence for each frame
4: k = list that stores confidences greater than 0.5
5: model = modelStatic(modelWeight) ▷ load model weights
6: confidenceThresh = 0.9
7: TotalConfThresh = 0.75
8: TotalScoreThresh = 0.85
9: while videoCapture is opened do
10:  frame ← readFrame
11:  boundedBox ← FaceDetectionModel(frame)
12:  face ← faceCrop(boundedBox)
13:  image ← face.resize(224, 224)
14:  output ← DeepECModel(image)
15:  confidence ← confidence(output)
16:  if confidence ≥ confidenceThresh then
17:   y ← 1
18:  else
19:   y ← 0
20:  end if
21:  if confidence > 0.5 then
22:   k ← confidence
23:  end if
24: end while
25: totalConf ← size(k)/size(y)
26: totalScore ← average(k)
27: if totalScore > TotalScoreThresh ∧ totalConf > TotalConfThresh then
28:  finalResult ← 1
29: else
30:  finalResult ← 0
31: end if
32: return finalResult
33: end algorithm
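The video-level decision in Algorithm 3 is a two-threshold aggregation over per-frame confidences. A minimal sketch of that aggregation step, assuming the per-frame DeepEC confidences have already been computed.

```python
def aggregate_eye_contact(frame_confidences,
                          conf_thresh=0.9,
                          total_conf_thresh=0.75,
                          total_score_thresh=0.85):
    """Video-level eye-contact decision from per-frame confidences (cf. Algorithm 3)."""
    if not frame_confidences:
        return 0
    y = [1 if c >= conf_thresh else 0 for c in frame_confidences]  # per-frame labels; only the count feeds the decision
    k = [c for c in frame_confidences if c > 0.5]                  # confidences above 0.5
    total_conf = len(k) / len(y)                   # fraction of reasonably confident frames
    total_score = sum(k) / len(k) if k else 0.0    # mean confidence over those frames
    return 1 if (total_score > total_score_thresh and total_conf > total_conf_thresh) else 0

# Example: mostly high-confidence frames -> eye contact detected for the video
print(aggregate_eye_contact([0.95, 0.92, 0.88, 0.97, 0.40, 0.93]))  # -> 1
```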
4.1.7. Learning-Centred Emotions Classification
- A. Data Preparation
Algorithm 4: Pseudocode of linking faces with labels

Input: (1) faces—images cropped to show only the face of the participant; (2) labels—dataframe of the DAiSEE dataset.
Output: numpyFaces&labels; face&label—each face image with its corresponding label.
1: start algorithm
2: for each face, label in zip(faces, labels) do
3:  face = toNumpy(face)
4:  label = toNumpy(label)
5:  face&label = toNumpy(link(face, label))
6:  numpyFaces&labels = append(numpyFaces&labels, face&label)
7: end for
8: end algorithm
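A minimal NumPy sketch of Algorithm 4's pairing step, assuming faces are already cropped and each DAiSEE label row holds the four intensity values (boredom, engagement, confusion, frustration); the pixel normalisation is illustrative.

```python
import numpy as np

def link_faces_with_labels(face_images, label_rows):
    """Pair each face crop with its DAiSEE label vector as NumPy arrays."""
    pairs = []
    for face, label in zip(face_images, label_rows):
        face_arr = np.asarray(face, dtype=np.float32) / 255.0   # normalised pixels
        label_arr = np.asarray(label, dtype=np.int64)           # [boredom, engagement, confusion, frustration]
        pairs.append((face_arr, label_arr))
    return pairs

# Example with dummy data: two 224x224 RGB faces and their 4-dim DAiSEE intensity labels.
faces = [np.zeros((224, 224, 3), dtype=np.uint8)] * 2
labels = [[0, 3, 0, 1], [1, 2, 0, 0]]
dataset = link_faces_with_labels(faces, labels)
print(len(dataset), dataset[0][0].shape, dataset[0][1])
```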
- B. Model Training
4.1.8. Hand Gesture Detection
- Calculate the difference in x-coordinates between the wrist and elbow points.
- Calculate the difference in y-coordinates between the wrist and elbow points.
- Calculate the same differences for the previous frame.
- Repeat the above steps for the shoulder and elbow points (a minimal sketch of this check follows this list).
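A minimal sketch of the wrist-elbow displacement check, assuming normalised 2D keypoints (e.g. from MediaPipe BlazePose) are available as (x, y) tuples; the movement threshold is illustrative.

```python
def hand_moved(wrist_t, elbow_t, wrist_prev, elbow_prev, thresh=0.02):
    """Flag hand movement when the wrist-elbow offset changes between frames.

    Keypoints are (x, y) in normalised image coordinates; thresh is illustrative.
    The same check is repeated for the shoulder-elbow pair, as in the list above.
    """
    dx_t = wrist_t[0] - elbow_t[0]
    dy_t = wrist_t[1] - elbow_t[1]
    dx_prev = wrist_prev[0] - elbow_prev[0]
    dy_prev = wrist_prev[1] - elbow_prev[1]
    # Movement if the relative offset changed noticeably on either axis.
    return abs(dx_t - dx_prev) > thresh or abs(dy_t - dy_prev) > thresh

# Example: the wrist drifts right relative to the elbow between two frames.
print(hand_moved((0.62, 0.50), (0.55, 0.55), (0.58, 0.50), (0.55, 0.55)))  # -> True
```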
4.1.9. Body Movement Analysis
- x_ls(t) denotes the x-coordinate of the left shoulder at moment t;
- x_rs(t) denotes the x-coordinate of the right shoulder at moment t;
- x_nose(t) denotes the x-coordinate of the nose at moment t (a minimal detection sketch using these quantities follows this list).
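Following the rule shown in the figure captions (right movement when the left shoulder's x-coordinate at time t exceeds the nose's x-coordinate at time t − 1; left movement when the right shoulder's x-coordinate at time t falls below it), a minimal sketch using the notation above.

```python
def body_movement(x_ls_t, x_rs_t, x_nose_prev):
    """Classify lateral body movement between consecutive frames.

    x_ls_t, x_rs_t: x-coordinates of the left/right shoulder at moment t.
    x_nose_prev:    x-coordinate of the nose at moment t - 1.
    """
    if x_ls_t > x_nose_prev:        # rule for a movement to the right
        return "right"
    if x_rs_t < x_nose_prev:        # rule for a movement to the left
        return "left"
    return "none"

print(body_movement(x_ls_t=0.58, x_rs_t=0.46, x_nose_prev=0.40))  # -> "right"
```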
4.1.10. Rule-Based Scoring Model
- Group A: Students in this group exhibit predominantly positive emotions, maintain consistent eye contact and frequently utilise hand and body gestures throughout the session.
- Group B: Students in this group display a mix of positive and negative emotions, maintain moderate eye contact and make regular use of hand and body gestures.
- Group C: Students in this group demonstrate a high prevalence of negative emotions, varying levels of eye contact and occasional use of hand and body movements.
- Group D: Students in this group primarily exhibit negative emotions, have poor eye contact and do not utilise hand or body gestures during the session. (A minimal sketch of this grouping follows the list.)
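A minimal sketch of the rule-based grouping, combining the per-criterion rates with the AHP weight vector reported in the results; the numeric cut-offs for groups A to D are illustrative, not the study's calibrated rules.

```python
AHP_WEIGHTS = {"eye_contact": 0.479, "positive_emotion": 0.283, "hand": 0.147, "body": 0.091}

def rule_based_group(fused, weights=AHP_WEIGHTS):
    """Map weighted criterion rates to groups A-D; the cut-offs are illustrative."""
    score = sum(weights[k] * fused[k] for k in weights)
    if score >= 0.75:
        return "A"   # predominantly positive emotions, consistent eye contact, frequent gestures
    if score >= 0.50:
        return "B"   # mixed emotions, moderate eye contact, regular gestures
    if score >= 0.25:
        return "C"   # mostly negative emotions, variable eye contact, occasional movement
    return "D"       # negative emotions, poor eye contact, no gestures

print(rule_based_group({"eye_contact": 0.8, "positive_emotion": 0.7,
                        "hand": 0.6, "body": 0.5}))  # -> "B"
```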
4.1.11. Active Learning
Algorithm 5: Pseudocode of active learning for RPSS

Input: (1) weights; (2) classification; (3) control interval.
Output: the control value that maximises the objective.
Start algorithm
1: Initiate an empty list
2: For each candidate value within the control interval do
3:  Calculate the corresponding objective value
4:  Add the objective value to the list together with its candidate value
5: End for
6: Find the candidate value associated with the maximum objective value in the list using linear search
7: Return that candidate value
8: End algorithm
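A minimal sketch of the search in Algorithm 5, assuming the robot evaluates candidate positions across its control interval and keeps the one maximising an objective such as expected detection confidence; the objective function here is a hypothetical stand-in.

```python
def expected_confidence(position):
    """Hypothetical objective: detection confidence as a function of robot position."""
    return 1.0 - abs(position - 0.4)        # peaks at an illustrative optimum of 0.4

def select_position(control_interval=(0.0, 1.0), steps=20, objective=expected_confidence):
    """Evaluate candidates across the control interval and return the best one (linear search)."""
    lo, hi = control_interval
    candidates = [lo + i * (hi - lo) / steps for i in range(steps + 1)]
    scored = [(objective(c), c) for c in candidates]      # build the list of objective values
    best_value, best_candidate = max(scored)              # linear search for the maximum
    return best_candidate, best_value

print(select_position())   # -> approximately (0.4, 1.0) for this illustrative objective
```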
5. Experimental Works and Analysis
5.1. Datasets
5.2. Decision Making Results
5.2.1. Eye Contact
5.2.2. Face Emotions
5.2.3. Hand Movement
5.2.4. Body Movement
5.3. Integrating the Scoring Approach and the Selected Components
- Body Movement: Amongst the three methods, the Hybrid method shows the most promise, given its relatively high precision, recall, F1 score and accuracy for both Videos 1 and 2 from the TEDx dataset. By combining the benefits of the “With Kalman-Filter” and “Without Kalman-Filter” methods, this method achieves an improved performance in detecting body movement.
- Hand Movement: The model incorporating Kalman filtering achieves the highest precision, recall, F1 score and accuracy for both scores A and D in the hand movement classification task.
- Eye Contact: DeepEC, a supervised model, is selected.
- Face Emotion: EfficientNet outperforms the other models (Xception, Inception, ResNet and MobileNet) in classifying facial expressions as reflected in its higher F1 scores and accuracy across the five emotion categories.
5.3.1. Case Study: User Experience Evaluation
5.3.2. Robot Scoring Group vs. Experts Scoring Group
6. Conclusions and Future Works
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ramli, I.S.M.; Maat, S.M.; Khalid, F. The Design of Game-Based Learning and Learning Analytics. Cypriot J. Educ. Sci. 2022, 17, 1742–1759. [Google Scholar] [CrossRef]
- Saini, M.K.; Goel, N. How smart are smart classrooms? A review of smart classroom technologies. ACM Comput. Surv. 2019, 52, 1–28. [Google Scholar] [CrossRef]
- Hussin, M.; Said, M.S.; Mohd Norowi, N.; Husin, N.A.; Mustaffa, M.R. Authentic Assessment for Affective Domain through Student Participant in Community Services. Asia-Pac. J. Inf. Technol. Multimed. 2021, 10, 52–62. [Google Scholar] [CrossRef]
- Sun, Z.; Li, Z.; Nishimorii, T. Development and assessment of robot teaching assistant in facilitating learning. In Proceedings of the 6th International Conference of Educational Innovation through Technology EITT, Osaka, Japan, 7–9 December 2017; pp. 165–169. [Google Scholar]
- Alshammari, R.F.N.; Arshad, H.; Rahman, A.H.A.; Albahri, O.S. Robotics Utilization in Automatic Vision-Based Assessment Systems from Artificial Intelligence Perspective: A Systematic Review. IEEE Access 2022, 10, 77537–77570. [Google Scholar] [CrossRef]
- Ahmed, H.; La, H.M. Education-Robotics Symbiosis: An Evaluation of Challenges and Proposed Recommendations. In Proceedings of the 2019 9th IEEE Integrated STEM Education Conference (ISEC), Princeton, NJ, USA, 16 March 2019; pp. 222–229. [Google Scholar]
- Ahmed Soliman, S. Efficiency of an Educational Robotic Computer-mediated Training Program for Developing Students’ Creative Thinking Skills: An Experimental Study. Arab. World Engl. J. 2019, 5, 124–140. [Google Scholar] [CrossRef]
- Abd Rahman, A.H.; Sulaiman, R.; Sani, N.S.; Adam, A.; Amini, R. Evaluation of peer robot communications using cryptoros. Int. J. Adv. Comput. Sci. Appl. 2019, 10, 658–663. [Google Scholar] [CrossRef]
- Hsieh, Y.Z.; Lin, S.S.; Luo, Y.C.; Jeng, Y.L.; Tan, S.W.; Chen, C.R.; Chiang, P.Y. ARCS-assisted teaching robots based on anticipatory computing and emotional Big Data for improving sustainable learning efficiency and motivation. Sustainability 2020, 12, 5605. [Google Scholar] [CrossRef]
- Yoshino, K.; Zhang, S. Construction of Teaching Assistant Robot in Programming Class. In Proceedings of the 2018 7th International Congress on Advanced Applied Informatics, IIAI-AAI, Yonago, Japan, 8–13 July 2018; pp. 215–220. [Google Scholar]
- Fekry, A.; Dafoulas, G.; Ismail, M. Automatic detection for students behaviors in a group presentation. In Proceedings of the ICCES 2019: 2019 14th International Conference on Computer Engineering and Systems, Cairo, Egypt, 17 December 2019; pp. 11–15. [Google Scholar]
- Bhole, G.P.; Deshmukh, T. Multi-criteria decision making (MCDM) methods and its applications. Int. J. Res. Appl. Sci. Eng. Technol. 2018, 6, 899–915. [Google Scholar] [CrossRef]
- Ochoa, X.; Domínguez, F.; Guamán, B.; Maya, R.; Falcones, G.; Castells, J. The RAP system: Automatic feedback of oral presentation skills using multimodal analysis and low-Cost sensors. ACM Int. Conf. Proc. Ser. 2018, 14, 360–364. [Google Scholar] [CrossRef]
- Shahrim, K.; Abd Rahman, A.H.; Goudarzi, S. Hazardous Human Activity Recognition in Hospital Environment Using Deep Learning. IAENG Int. J. Appl. Math. 2022, 52, 748–753. [Google Scholar]
- Ashwin, T.S.; Guddeti, R.M.R. Affective database for e-learning and classroom environments using Indian students’ faces, hand gestures and body postures. Future Gener. Comput. Syst. 2020, 108, 334–348. [Google Scholar]
- Gupta, A.; D’Cunha, A.; Awasthi, K.; Balasubramanian, V. DAiSEE: Towards User Engagement Recognition in the Wild. arXiv 2016, arXiv:1609.01885. [Google Scholar]
- Haider, F.; Koutsombogera, M.; Conlan, O.; Vogel, C.; Campbell, N.; Luz, S. An Active Data Representation of Videos for Automatic Scoring of Oral Presentation Delivery Skills and Feedback Generation. Front. Comput. Sci. 2020, 2, 1. [Google Scholar] [CrossRef]
- Tun, S.S.Y.; Okada, S.; Huang, H.H.; Leong, C.W. Multimodal Transfer Learning for Oral Presentation Assessment. IEEE Access 2023, 11, 84013–84026. [Google Scholar] [CrossRef]
- Daud, S.A.A.; Lutfi, S.L. Towards the detection of learner’s uncertainty through face. In Proceedings of the 2016 4th International Conference on User Science and Engineering, i-USEr, Melaka, Malaysia, 23–25 August 2016; pp. 227–231. [Google Scholar]
- Shi, Z.; Zhang, Y.; Bian, C.; Lu, W. Automatic academic confusion recognition in online learning based on facial expressions. In Proceedings of the 14th International Conference on Computer Science and Education, ICCSE, Toronto, ON, Canada, 19–21 August 2019; pp. 528–532. [Google Scholar]
- Sharma, P.; Joshi, S.; Gautam, S.; Maharjan, S.; Filipe, V.; Reis, M.C. Student Engagement Detection Using Emotion Analysis, Eye Tracking and Head Movement with Machine Learning. arXiv 2019, arXiv:1909.12913. [Google Scholar]
- Liao, D.; Wu, T.; Chen, Y. An interactive robot for fatigue detection in the learning process of children. In Proceedings of the 2017 2nd International Conference on Advanced Robotics and Mechatronics (ICARM), Hefei and Tai’an, China, 27–31 August 2017; Volume 2018-January, pp. 218–222. [Google Scholar]
- Filntisis, P.P.; Efthymiou, N.; Koutras, P.; Potamianos, G.; Maragos, P. Fusing body posture with facial expressions for joint recognition of affect in child—Robot interaction. IEEE Robot. Autom. Lett. 2019, 4, 4011–4018. [Google Scholar] [CrossRef]
- Li, G.; Wang, Y. Research on learner’s emotion recognition for intelligent education system. In Proceedings of the 2018 IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference, IAEAC, Chongqing, China, 12–14 October 2018; pp. 754–758. [Google Scholar]
- Xie, W.; Jia, X.; Shen, L.; Yang, M. Sparse deep feature learning for facial expression recognition. Pattern Recognit. 2019, 96, 106966. [Google Scholar] [CrossRef]
- He, Z.; Jin, T.; Basu, A.; Soraghan, J.; Di Caterina, G.; Petropoulakis, L. Human emotion recognition in video using subtraction pre-processing. In Proceedings of the ICMLC’19: Proceedings of the 2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 22–24 February 2019; Part F1481, pp. 374–379. [Google Scholar]
- Khanh, T.L.B.; Kim, S.H.; Lee, G.; Yang, H.J.; Baek, E.T. Korean video dataset for emotion recognition in the wild. Multimed. Tools Appl. 2021, 80, 9479–9492. [Google Scholar] [CrossRef]
- Espinosa-Aranda, J.L.; Vallez, N.; Rico-Saavedra, J.M.; Parra-Patino, J.; Bueno, G.; Sorci, M.; Moloney, D.; Pena, D.; Deniz, O. Smart doll: Emotion recognition using embedded deep learning. Symmetry 2018, 10, 387. [Google Scholar] [CrossRef]
- Webb, N.; Ruiz-Garcia, A.; Elshaw, M.; Palade, V. Emotion Recognition from Face Images in an Unconstrained Environment for usage on Social Robots. In Proceedings of the International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
- Müller, P.; Huang, M.X.; Zhang, X.; Bulling, A. Robust eye contact detection in natural multi-person interactions using gaze and speaking behaviour. In Proceedings of the Eye Tracking Research and Applications Symposium (ETRA), Warsaw, Poland, 14–17 June 2018; pp. 1–10. [Google Scholar]
- Chong, E.; Clark-Whitney, E.; Southerland, A.; Stubbs, E.; Miller, C.; Ajodan, E.L.; Silverman, M.R.; Lord, C.; Rozga, A.; Jones, R.M.; et al. Detection of Eye Contact with Deep Neural Networks Is as Accurate as Human Experts. Nat. Commun. 2020, 11, 6386. [Google Scholar] [CrossRef] [PubMed]
- Sahebi, I.G.; Masoomi, B.; Ghorbani, S. Expert oriented approach for analyzing the blockchain adoption barriers in humanitarian supply chain. Technol. Soc. 2020, 63, 101427. [Google Scholar] [CrossRef]
- Nayak, S.; Pattanayak, S.; Choudhury, B.B.; Kumar, N. Selection of Industrial Robot Using Fuzzy Logic Approach. In Proceedings of the 5th International Conference on Computational Intelligence in Data Mining (ICCIDM-2018), Burla, India, 15–16 December 2018; Behera, H.S., Nayak, J., Naik, B., Pelusi, D., Eds.; Springer: Singapore, 2020; pp. 221–232. [Google Scholar]
- Yusoff, A.F.M.; Hashim, A.; Muhamad, N.; Hamat, W.N.W. Application of Fuzzy Delphi Technique to Identify the Elements for Designing and Developing the e-PBM PI-Poli Module. Asian J. Univ. Educ. 2021, 17, 292–304. [Google Scholar] [CrossRef]
- Patrona, F.; Chatzitofis, A.; Zarpalas, D.; Daras, P. Motion analysis: Action detection, recognition and evaluation based on motion capture data. Pattern Recognit. 2018, 76, 612–622. [Google Scholar] [CrossRef]
- Docekal, J.; Rozlivek, J.; Matas, J.; Hoffmann, M. Human keypoint detection for close proximity human-robot interaction. arXiv 2022, arXiv:2207.07742. [Google Scholar]
- Minatour, Y.; Bonakdari, H.; Aliakbarkhani, Z.S. Extension of Fuzzy Delphi AHP Based on Interval-Valued Fuzzy Sets and its Application in Water Resource Rating Problems. Water Resour. Manag. 2016, 30, 3123–3141. [Google Scholar] [CrossRef]
- Coffey, L.; Claudio, D. In defense of group fuzzy AHP: A comparison of group fuzzy AHP and group AHP with confidence intervals. Expert Syst. Appl. 2021, 178, 114970. [Google Scholar] [CrossRef]
- Albahri, O.S.; Zaidan, A.A.; Albahri, A.S.; Zaidan, B.B.; Abdulkareem, K.H.; Al-qaysi, Z.T.; Alamoodi, A.H.; Aleesa, A.M.; Chyad, M.A.; Alesa, R.M.; et al. Systematic review of artificial intelligence techniques in the detection and classification of COVID-19 medical images in terms of evaluation and benchmarking: Taxonomy analysis, challenges, future solutions and methodological aspects. J. Infect. Public Health 2020, 13, 1381–1396. [Google Scholar] [CrossRef]
- Hassouneh, A.; Mutawa, A.M.; Murugappan, M. Development of a Real-Time Emotion Recognition System Using Facial Expressions and EEG based on machine learning and deep neural network methods. Inform. Med. Unlocked 2020, 20, 100372. [Google Scholar] [CrossRef]
- Bazarevsky, V.; Grishchenko, I.; Raveendran, K.; Zhu, T.; Zhang, F.; Grundmann, M. BlazePose: On-device Real-time Body Pose tracking. arXiv 2020, arXiv:2006.10204. [Google Scholar]
- Bazarevsky, V.; Kartynnik, Y.; Vakunov, A.; Raveendran, K.; Grundmann, M. Blazeface: Sub-millisecond neural face detection on mobile gpus. arXiv 2019, arXiv:1907.05047. [Google Scholar]
- Zhang, X.; Sugano, Y.; Fritz, M.; Bulling, A. MPIIGaze: Real-World Dataset and Deep Appearance-Based Gaze Estimation. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 162–175. [Google Scholar] [CrossRef] [PubMed]
- Mora, K.A.F.; Monay, F.; Odobez, J.M. EYEDIAP: A database for the development and evaluation of gaze estimation algorithms from RGB and RGB-D cameras. In Proceedings of the Eye Tracking Research and Applications Symposium (ETRA), Safety Harbor, FL, USA, 26–28 March 2014; pp. 255–258. [Google Scholar]
- Gu, J.; Yang, X.; De Mello, S.; Kautz, J. Dynamic facial analysis: From Bayesian filtering to recurrent neural network. In Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR, Honolulu, HI, USA, 21–26 July 2017; Volume 2017-January, pp. 1531–1540. [Google Scholar]
- Savchenko, A.V. Video-based frame-level facial analysis of affective behavior on mobile devices using EfficientNets. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 2359–2366. [Google Scholar]
- McLaren, L.; Koutsombogera, M.; Vogel, C. A Heuristic Method for Automatic Gaze Detection in Constrained Multi-Modal Dialogue Corpora. In Proceedings of the 2020 11th IEEE International Conference on Cognitive Infocommunications (CogInfoCom), Mariehamn, Finland, 23–25 September 2020; pp. 55–60. [Google Scholar] [CrossRef]
Article | Sensing: Depth | Sensing: Video | Face Emotion Classifier | Eye Contact Classifier | Hand Gesture Classifier | Body Movement | Robot | Overall Scoring
---|---|---|---|---|---|---|---|---
[17] | × | √ | × | √ | √ | √ | × | Regression methods to predict the score (awarded by tutors) |
[11] | × | √ | √ | × | × | √ | × | × |
[18] | × | √ | √ | √ | × | × | × | × |
[13] | × | √ | × | √ | √ | √ | × | Weighted sum according to the weight of each factor |
Ours | √ | √ | √ | √ | √ | √ | √ | AHP Rule-based |
Refs. | Face | Eye Contact | Body and Hand | Basic Emotion | Learning Emotions | Feature Extraction | Classifier | Ensemble Learning | Deep Learning Model |
---|---|---|---|---|---|---|---|---|---|
[19] | √ | × | × | 0 | 2 | Gabor wavelets | SVM | × | × |
[20] | √ | × | × | 0 | 2 | Histogram of oriented gradient (HOG); Local binary patterns (LBP) | SVM/FC | × | VGG16 |
[21] | √ | √ | × | 7 | 1 | Viola–Jones algorithm as face detector | Softmax | × | Eye contact: CNN; face: mini-Xception |
[22] | √ | × | × | 1 | OpenFace detects and tracks | Softmax | XGBoost classifier for 2 models. | ||
[23] | √ | × | √ | 8 | 0 | OpenFace 2 toolkit; for body: OpenPose | Softmax | × | For face: Resnet-50, AffectNet; for body: DNN with global temporal average pooling (GTAP) |
[24] | √ | √ | × | 6 | 0 | OpenCV and Dlib library | Softmax/SVM | × | CNN |
[25] | √ | × | × | 6 | 0 | Feature sparseness of the FC input; proposed L2 sparseness | Softmax | × | VGG, ResNet |
[26] | √ | × | × | 6 | 0 | Haar features detects. | Softmax | × | AlexNet, GoogleNet, ResNet structures |
[27] | √ | × | × | 7 | 0 | Integrated OpenCV, Dlib, Mtcnn, and Tinyface. VGG16 model | Softmax | × | Multi-layer perceptron (MLP) classifier with Adam optimiser |
[28] | √ | × | × | 7 | 0 | Tiny_dnn | × | nViso and Oxford approaches | × |
[29] | √ | × | × | 7 | 0 | HOG face detector | Softmax | × | CNN with unsupervised training gradual greedy layer-wise algorithm (Gradual-GLW) |
[30] | × | √ | × | 0 | 0 | Unsupervised eye contact pipeline/CNN Model | Binary support vector machine (SVC) classifier/Softmax | × | CNN Model |
[31] | × | √ | × | 0 | 0 | ResNet50 | Softmax | × | CNN Model |
Ours | √ | √ | √ | 0 | 4 | Face features using MediaPipe | Softmax | × | EfficientNet; DeepEC |
Refs. | Application | Facial Expression | Eye Contact | Hand Gesture | Body Posture | ID | Gender | Duration | Slides | Audio |
---|---|---|---|---|---|---|---|---|---|---|
[24] | Learner’s emotion recognition model in online learning. | √ | × | × | × | × | × | × | × | × |
[26] | Facial expression recognition in robotics. | √ | × | × | × | × | × | × | × | × |
[28] | Facial expression recognition for robots. | √ | × | × | × | × | × | × | × | × |
[29] | Facial expression recognition for robots. | √ | × | × | × | × | × | × | × | × |
[16] | Learner’s emotion recognition model in online learning. | √ | × | × | × | × | × | × | × | × |
[19] | Learner’s emotion recognition model in online learning. | √ | × | × | × | × | × | × | × | × |
[20] | Learner’s emotion recognition model in online learning. | √ | × | × | × | × | × | × | × | × |
[21] | Learner’s emotion recognition model in online learning. | √ | √ | × | × | × | × | × | × | × |
[22] | Learner’s emotion recognition models for TA robot. | √ | √ | × | × | × | × | × | × | × |
[11] | Learner’s emotion recognition models in presentation sessions. | √ | × | √ | √ | √ | √ | √ | × | × |
[23] | Learner’s emotion recognition models for robots. | √ | × | √ | √ | × | × | × | × | × |
[15] | Learner’s emotion recognition model in online and classroom learning. | √ | × | √ | √ | × | × | × | × | × |
[17] | Learner’s emotion recognition system in presentation sessions. | × | √ | √ | √ | × | × | × | √ | √ |
[13] | Learner’s emotion recognition models in presentation sessions. | × | √ | √ | √ | × | × | × | √ | √ |
[30] | Eye contact detection in human–computer interaction. | × | √ | × | × | × | × | × | × | × |
[31] | Eye contact detection in human–computer interaction. | × | √ | × | × | × | × | × | × | × |
[35] | Real-time human action detection. | × | × | √ | √ | × | × | × | × | × |
[36] | Human–robot interaction for the human body avoidance scenario. | × | × | √ | √ | × | × | × | × | × |
Factor | Question |
---|---|
(1) Facial expressions | From your point of view, facial expressions or emotions are classified as |
(2) Eye contact | Eye contact is classified as |
(3) Hand gestures and movements | Hand gestures and movements are classified as |
Refs. | Factor | Definition |
---|---|---|
[16] | Facial expression | Indicates facial expressions that are associated with certain emotions, such as happiness and seriousness. |
[31] | Eye contact | A form of non-verbal communication that can have a large influence on social behaviours. |
[15] | Hand gesture and movement | A form of non-verbal communication where visible bodily actions send messages. |
[36] | Body posture and movement | A form of non-verbal communication where the whole body is used to send out a message, which can be a critical indicator of attitude. |
[11] | ID | Information used by computer systems to represent a person |
[11] | Gender | The distinction between gender identities, whether male or female. |
[11] | Duration | The time or period elapsed during a presentation. |
[17] | Slides | The content of slides used in a presentation. |
[17] | Audio | A representation of spoken sound data. |
Scale | Level of Importance | Mean (µ) | Standard Deviation (σ) |
---|---|---|---|
1 | Extremely strongly non-important | 0.0 | 0.09 |
2 | Strongly non-important | 0.1 | 0.09 |
3 | Non-important | 0.3 | 0.09 |
4 | Moderately important | 0.5 | 0.09 |
5 | Important | 0.7 | 0.09 |
6 | Strongly important | 0.9 | 0.09 |
7 | Extremely strongly important | 1.0 | 0.09 |
Factor | Weight of Fuzzy Delphi | Decision |
---|---|---|
Facial expression | 0.9010 | Select |
Eye contact | 0.9048 | Select |
Hand gesture and movement | 0.7999 | Select |
Body posture and movement | 0.7750 | Select |
ID | 0.7750 | Select |
Gender | 0.57 | Select |
Duration | 0.8392 | Select |
Slides | 0.8931 | Select |
Audio | 0.8628 | Select |
Eye Contact | Facial Expressions | Hand Movement | Body Movement |
---|---|---|---|
0.479307 | 0.283206 | 0.146526 | 0.090961 |
Nationality | Number of Participants |
---|---|
Malay | 6 |
Chinese | 4 |
Indian | 2 |
Bangladeshi | 5 |
Iraqi | 5 |
Number of Participants | Age Range |
---|---|
11 | 20–24 |
6 | 25–29 |
5 | 30–40 |
Name | Number of Videos | Number of Presenters | Period of Video |
---|---|---|---|
Custom Presentation Dataset | 88 | 22 | 5 h and 30 min |
DAiSEE Dataset | 9068 | 112 | 25 h |
TEDx-based videos | 8 | 7 | 39 min |
Class/State | Accuracy | Precision | Recall | F1 Score | Support |
---|---|---|---|---|---|
No-Eye contact | 0.729688 | 0.758452 | 0.704478 | 0.730469 | 10,764 |
Eye contact | 0.729688 | 0.702821 | 0.756993 | 0.728902 | 9938 |
Class/State | Accuracy | Precision | Recall | F1 Score | Support |
---|---|---|---|---|---|
No-Eye contact | 0.519787 | 0.575928 | 0.412434 | 0.480658 | 10,198 |
Eye contact | 0.519787 | 0.484515 | 0.645206 | 0.553432 | 8729 |
Face Expression | Xception | Inception | ResNet | MobileNet | EfficientNet |
---|---|---|---|---|---|
Boredom | 0.62 | 0.58 | 0.01 | 0.36 | 0.62 |
Engagement | 0.84 | 0.78 | 0.54 | 0.55 | 0.81 |
Confusion | 0.60 | 0.64 | 0.33 | 0.08 | 0.69 |
Frustration | 0.52 | 0.51 | 0.34 | 0.27 | 0.62 |
Delight | 0.38 | 0.08 | 0.00 | 0.00 | 0.44 |
Accuracy | 0.64 | 0.61 | 0.40 | 0.40 | 0.69 |
Macro avg | 0.59 | 0.52 | 0.25 | 0.25 | 0.64 |
Weighted avg | 0.60 | 0.69 | 0.33 | 0.33 | 0.68 |
Actual Class | Boredom | Engaged | Confusion | Frustration | Delight
---|---|---|---|---|---|
Boredom | 0.689 | 0.051 | 0.073 | 0.144 | 0.043 |
Engaged | 0.06 | 0.723 | 0.038 | 0.048 | 0.131 |
Confusion | 0.097 | 0.016 | 0.756 | 0.124 | 0.008 |
Frustration | 0.244 | 0.036 | 0.089 | 0.612 | 0.019 |
Delight | 0.141 | 0.085 | 0.092 | 0.106 | 0.577 |
Hand | Class | Precision | Recall | F1 Score | Accuracy |
---|---|---|---|---|---|
Score A | Non-movement | 0.53 | 0.94 | 0.68 | 0.68 |
Movement | 0.94 | 0.53 | 0.68 | 0.68 | |
Score D | Non-movement | 1 | 1 | 1 | 1 |
Movement | 0 | 0 | 0 | 1 |
Hand | Class | Precision | Recall | F1 Score | Accuracy |
---|---|---|---|---|---|
Score A | Non-movement | 0.53 | 0.95 | 0.68 | 0.68 |
Movement | 0.95 | 0.52 | 0.67 | 0.68 | |
Score D | Non-movement | 1 | 1 | 1 | 1 |
Movement | 0 | 0 | 0 | 1 |
Hand | Class | Precision | Recall | F1 Score | Accuracy |
---|---|---|---|---|---|
Score A | Non-movement | 0.53 | 0.94 | 0.68 | 0.68 |
Movement | 0.94 | 0.53 | 0.68 | 0.68 | |
Score D | Non-movement | 1 | 0.72 | 0.84 | 0.72 |
Movement | 0 | 0 | 0 | 0.72 |
Body | Class | Precision | Recall | F1 Score | Accuracy |
---|---|---|---|---|---|
Video1_TEDx | No-body movement | 0.62 | 0.32 | 0.43 | 0.47 |
Body movement | 0.4 | 0.7 | 0.51 | 0.47 | |
Video2_TEDx | No-body movement | 0.57 | 0.38 | 0.46 | 0.56 |
Body movement | 0.55 | 0.73 | 0.63 | 0.56 |
Body | Class | Precision | Recall | F1 Score | Accuracy |
---|---|---|---|---|---|
Video1_TEDx | No-body movement | 0.64 | 0.36 | 0.46 | 0.49 |
Body movement | 0.41 | 0.68 | 0.51 | 0.49 | |
Video2_TEDx | No-body movement | 0.56 | 0.39 | 0.46 | 0.55 |
Body movement | 0.54 | 0.71 | 0.62 | 0.55 |
Body | Class | Precision | Recall | F1 Score | Accuracy |
---|---|---|---|---|---|
Video1_TEDx | No-body movement | 0.62 | 0.32 | 0.43 | 0.47 |
Body movement | 0.4 | 0.7 | 0.51 | 0.47 | |
Video2_TEDx | No-body movement | 0.57 | 0.38 | 0.46 | 0.56 |
Body movement | 0.55 | 0.73 | 0.63 | 0.56 |
Question | Scale 1 | Scale 2 | Scale 3 | Scale 4 | Scale 5
---|---|---|---|---|---
1 | 0% | 0% | 8.3% | 16.7% | 75% |
2 | 0% | 0% | 8.3% | 33.3% | 58.3% |
3 | 0% | 0% | 0% | 25% | 75% |
4 | 0% | 0% | 0% | 25% | 75% |
5 | 0% | 0% | 0% | 33.3% | 66.7% |
6 | 0% | 0% | 0% | 50% | 50% |
7 | 0% | 0% | 8.3% | 25% | 66.7% |
8 | 0% | 0% | 8.3% | 41.7% | 50% |
9 | 0% | 0% | 0% | 33.3% | 66.7% |
10 | 0% | 0% | 0% | 41.7% | 58.3% |
11 | 0% | 0% | 0% | 33.3% | 66.7% |
12 | 0% | 0% | 8.3% | 25% | 66.7% |
Presenter | Robot (Traditional) | Tutor1 (Traditional) | Tutor2 (Traditional) | Tutor3 (Traditional) | Tutor4 (Traditional) | Tutor5 (Traditional) | Avg (Traditional) | Robot (Active) | Tutor1 (Active) | Tutor2 (Active) | Tutor3 (Active) | Tutor4 (Active) | Tutor5 (Active) | Avg (Active)
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Presenter 1 | group:C | group:B | group:A | group:C | group:C | group:C | 60% | group:A | group:A | group:A | group:A | group:A | group:A | 100% |
Presenter 2 | group:B | group:B | group:B | group:B | group:B | group:B | 100% | group:A | group:A | group:A | group:A | group:A | group:A | 100% |
Presenter 3 | group:D | group:D | group:D | group:D | group:D | group:D | 100% | group:B | group:B | group:B | group:B | group:B | group:B | 100% |
Presenter 4 | group:D | group:D | group:D | group:D | group:D | group:D | 100% | group:B | group:A | group:B | group:B | group:B | group:B | 80% |
Presenter 5 | group:A | group:A | group:A | group:A | group:A | group:A | 100% | group:A | group:A | group:A | group:A | group:A | group:A | 100% |
Presenter 6 | group:B | group:B | group:B | group:C | group:B | group:B | 80% | group:B | group:B | group:B | group:B | group:B | group:B | 100% |
Presenter 7 | group:B | group:A | group:B | group:B | group:B | group:B | 80% | group:A | group:A | group:A | group:A | group:A | group:A | 100% |
Presenter 8 | group:C | group:C | group:C | group:C | group:C | group:C | 100% | group:B | group:B | group:B | group:B | group:B | group:B | 100% |
Presenter 9 | group:B | group:B | group:B | group:B | group:A | group:B | 80% | group:B | group:B | group:B | group:B | group:B | group:B | 100% |
Presenter 10 | group:B | group:B | group:B | group:B | group:B | group:B | 100% | group:B | group:B | group:B | group:B | group:B | group:B | 100% |
Presenter 11 | group:D | group:D | group:D | group:D | group:D | group:D | 100% | group:B | group:B | group:B | group:B | group:B | group:B | 100% |
Presenter 12 | group:C | group:C | group:C | group:C | group:C | group:C | 100% | group:B | group:B | group:B | group:B | group:B | group:B | 100% |
Presenter 13 | group:B | group:A | group:B | group:B | group:B | group:A | 60% | group:B | group:B | group:B | group:B | group:B | group:B | 100% |
Presenter 14 | group:D | group:D | group:D | group:D | group:D | group:D | 100% | group:C | group:C | group:C | group:C | group:C | group:C | 100% |
Total Avg | 90% | Total Avg | 99% |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).