IMITASD: Imitation Assessment Model for Children with Autism Based on Human Pose Estimation
List of Figures
- Figure 1: List of imitation tasks.
- Figure 2: Landmarks from the MediaPipe Hand and Body Pose Tracking modules [69,70].
- Figure 3: Room setting inside the medical clinic.
- Figure 4: GUI controls available to the admin.
- Figure 5: GUI interface, where the left part (child preview) is visible on the child’s screen.
- Figure 6: Child attention module.
- Figure 7: Imitation assessment block module.
- Figure 8: Feature extraction flowchart.
- Figure 9: Comparison between the IMITASD score and the medical evaluation.
- Figure 10: Detailed comparison of distance metrics and expert evaluation scores.
- Figure 11: Comparison of distance metrics and expert evaluation scores for each imitation task.
- Figure 12: Running time to process a video segment.
- Figure 13: Number of videos that MediaPipe could not process, grouped by participant.
- Figure 14: Number of videos that MediaPipe could not process, grouped by participant and task.
- Figure 15: Number of videos that MediaPipe could not process, grouped by task.
Abstract
1. Introduction
2. Related Works
3. The Techniques Employed in Implementing IMITASD
3.1. Gross Motor Imitation
3.2. Time-Series Measures
3.3. Human Pose Estimation
4. Methodology
4.1. Dataset Description
4.2. Experimental Setup
4.3. Graphical User Interface Tool
4.4. Subjective Assessment by Psychiatric Doctors
4.5. Parental Engagement and Bias Control
4.6. Hardware Requirements
5. System Architecture Overview
Algorithm 1 Imitation Lesson Preparation
- Face detection: A frontal face detector is deployed to identify and locate faces within an input image.
- Facial landmark detection: This extracts spatial information about key facial points by applying Yin Guobing’s Facial Landmark Detector [72,73]. It goes beyond simple face detection, capturing the nuances of facial expressions and features. The model operates on square boxes of size 128 × 128, each containing a face, and returns 68 facial landmarks that serve as critical input for the subsequent phases. The detector is employed to track children’s attention based on facial orientation and head pose estimation. While it is accurate under normal conditions, autistic children may exhibit atypical facial expressions and frequent gaze aversion, which could impact the precision of attention tracking. To mitigate these limitations, the IMITASD tool includes a calibration step that ensures the child’s face is properly aligned with the camera, reducing the likelihood of tracking errors during the task.
- Head pose estimation: This obtains the head’s pose relative to the camera. Both the rotation and translation vectors are computed to provide a robust representation of the head’s spatial orientation and position.
- Head pose angle calculation: This calculates the specific head angles yaw, pitch, and roll, which depict the head’s orientation in three-dimensional space, capturing horizontal and vertical rotations as well as tilt. These angles are crucial for understanding the user’s head position. Thirty-four videos were excluded due to anomalies such as an incorrect seat position of the child in front of the camera.
- Head movement tracking: To assess continual attention levels, the script tracks the head angles over time, aiming to capture changes in head orientation. The frequency and magnitude of head movements are quantified from the differences between consecutive head angles.
- Attention measurement: This uses facial analysis and head movements to measure a child’s attention. The frequency of head movements represents changes over time, while the magnitude of head movements indicates the angular displacement. These metrics feed into the computation of the concentration level, a weighted combination of frequency and magnitude, as detailed in Algorithm 2.
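The weighted combination of movement frequency and magnitude described above can be sketched as follows. This is a minimal illustration, not the authors’ implementation: the movement threshold, the 90° magnitude cap, and the equal weights are assumptions, since the exact coefficients of Algorithm 2 are not reproduced here.

```python
import numpy as np

def concentration_level(angles, threshold=5.0, w_freq=0.5, w_mag=0.5):
    """Weighted attention score from a head-angle time series.

    angles: sequence of (yaw, pitch, roll) per frame, in degrees.
    threshold, w_freq, w_mag: illustrative values; the paper does not
    publish the exact coefficients used in its attention algorithm.
    """
    angles = np.asarray(angles, dtype=float)
    # Magnitude of the frame-to-frame change in head orientation.
    deltas = np.linalg.norm(np.diff(angles, axis=0), axis=1)
    # Frequency: fraction of frames whose change exceeds the threshold.
    frequency = float(np.mean(deltas > threshold))
    # Magnitude: mean angular displacement, scaled into [0, 1]
    # (the 90-degree cap is an assumption).
    magnitude = float(np.clip(deltas.mean() / 90.0, 0.0, 1.0))
    # Frequent or large head movements lower the concentration level.
    return 1.0 - (w_freq * frequency + w_mag * magnitude)
```

Under this sketch, a perfectly still head yields a score of 1.0, while constant large head swings drive the score toward 0.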
Algorithm 2 Checking Child Attention
- Display the lesson video: After selecting the imitation lesson and ensuring that the child is attending to the display screen (steps 1 and 2), the selected video is played on the LCD in front of the child.
- Record the child’s movement: While the lesson video plays, the child immediately begins imitating it; recording therefore starts as soon as the child begins the imitation and stops once the child finishes performing it.
- Feature extraction: Features are obtained for both the imitation lessons and the videos of the child. Features for the former are extracted offline and stored in pickle format, while features for the latter are extracted while assessing the imitation behavior of the autistic child. Features are based on the pose and hand landmarks predicted by MediaPipe. The connections between MediaPipe landmarks, expressed as pairs of indices, are transformed into vectors, which serve as the foundation for subsequent angle calculations. The angles, computed using the dot product and vector norms, collectively form the feature vectors. These vectors represent the trace and the hand: the former corresponds to the child’s arm and head positions, while the latter supports hand tracking, where fine details are considered during the child’s imitation. The two vectors are extracted from the pose and hand landmarks, respectively. Note that a color conversion is necessary because OpenCV (BGR) and MediaPipe (RGB) use different color representations. The system then leverages the angles between hand parts, referred to as connections, incorporating all 21 connections intrinsic to MediaPipe’s Hand Model. Given the video of the child, the detailed processes are depicted in Figure 8.
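The angle computation described above can be sketched as follows. The helper and the sample connections are illustrative: in the actual pipeline the index pairs come from MediaPipe’s landmark topology (e.g., its hand connection list), not from the toy values used here.

```python
import numpy as np

def connection_angles(landmarks, connections):
    """Angles (radians) between consecutive connection vectors.

    landmarks: (N, 2) or (N, 3) array of landmark coordinates, as
    produced by a pose/hand tracker such as MediaPipe.
    connections: index pairs (i, j); each pair becomes the vector
    landmarks[j] - landmarks[i]. Angles between consecutive vectors
    are computed from the dot product and the vector norms.
    """
    pts = np.asarray(landmarks, dtype=float)
    vectors = [pts[j] - pts[i] for i, j in connections]
    angles = []
    for u, v in zip(vectors, vectors[1:]):
        # Small epsilon guards against zero-length vectors.
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
        angles.append(float(np.arccos(np.clip(cos, -1.0, 1.0))))
    return angles

# Two perpendicular segments: the angle between them is pi/2.
right_angle = connection_angles([[0, 0], [1, 0], [1, 1]], [(0, 1), (1, 2)])
```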
Algorithm 3 Imitation Assessment
1: Input: child video, ground truth lesson video
2: Output: similarity measure S
3: Display the lesson video:
4:   Play the selected lesson video.
5: Record the child’s movement:
6:   Begin recording the child’s movement as soon as the imitation starts.
7:   Stop recording when the child finishes the imitation.
8: Feature extraction:
9:   Extract features for both the prerecorded ground truth lesson video and the child’s video.
10:  for each frame in the ground truth video do
11:    Extract pose and hand landmarks using MediaPipe.
12:    Convert the color format from BGR to RGB if necessary.
13:    Initialize the feature vectors for the trace and hand vectors.
14:  for each frame in the child’s video do
15:    Extract pose landmarks for the trace vector.
16:    Extract all 21 hand landmarks for the hand vector.
17:    Normalize the trace data by the maximum x and y coordinates.
18:    Update and store the trace and hand vectors for further processing.
19:  end for
20:  end for
21: Dynamic time warping (DTW):
22:   Apply DTW to the child’s trace vector and the ground truth trace vector.
23:   Apply DTW to the child’s hand vector and the ground truth hand vector.
24:   Compute the average distance D of the two DTW distances.
25: Similarity measure output:
26:   Map the distance D to a similarity measure S in the range 0 to 10, where 10 indicates a perfect match and 0 indicates no match.
27: return the similarity measure S
- – The feature vectors (hand and trace vectors) are initialized for the given video, and the first frame of the child’s video is prepared for processing.
- – An iterative process runs over the frames of the child’s video, predicting pose and hand landmarks for each frame. These landmarks are used to obtain the trace and hand vectors for the current frame, which are appended to the corresponding vectors representing the child’s video. Once the video’s vectors are updated, the next frame is fetched for processing. The hand and pose landmarks predicted by MediaPipe are a set of 3D points, as depicted in Figure 2, where each point is characterized by an (x, y, z) coordinate. For each frame, the trace vector focuses on four points extracted from the pose landmarks: points 13 and 17 for the child’s left arm and points 14 and 18 for the child’s right arm. The hand vector uses all 21 points of each hand. Based on the pose landmarks, trace_left and trace_right are initialized as lists of coordinate points and filled through detailed scrutiny of the detected landmarks.
- – The normalization procedure normalizes the trace data. The maximum x and y coordinates of both the trace and the reference are identified, and the trace coordinates are divided by the respective maximum values, laying the groundwork for meaningful distance calculations between traces.
- – Saving trace and hand vectors: both vectors are stored for further processing.
- Dynamic time warping algorithm (DTW): This measures the distance between the features extracted from the imitation lesson and those from the child’s video. Based on the videos’ trace and hand vectors, DTW calculates the distance between each pair of vectors, and the two distances are averaged to obtain the final distance between the child’s behavior and the given lesson. Handling videos of autistic children is challenging because their variability is large: children tend to begin imitating the moment they start watching the lesson, and imitation speed varies from child to child. IMITASD therefore relies on the attention module together with DTW. The former estimates the child’s focus so that the lesson is displayed and the imitation is recorded at convenient times, which in turn helps DTW produce a better similarity estimate between the lesson and the imitation video. Furthermore, the proposed model deploys FastDTW, which accelerates the similarity computation by processing down-sampled versions of the inputs. DTW natively supports temporal alignment, as it can measure the similarity between sequences of unequal length.
- Similarity measure output: The resulting distance is mapped into a similarity measure in the range of 0 to 10, so the assessment module’s output requires no further processing. It outputs 10 when the child’s behavior matches the given imitation lesson and 0 when the child’s imitation does not match the lesson video.
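The distance averaging and the 0-to-10 mapping can be sketched as below. A plain quadratic DTW is used here for self-containment (the paper deploys FastDTW on down-sampled inputs), and the linear mapping with a saturation distance d_max is an assumed form, since the exact mapping formula is not reproduced here.

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook O(n*m) dynamic time warping distance between two
    sequences of feature vectors. The paper uses the faster FastDTW,
    which approximates this on down-sampled copies of the inputs."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])

def imitasd_score(child_trace, gt_trace, child_hand, gt_hand, d_max=10.0):
    """Average the trace and hand DTW distances, then map into [0, 10].
    The linear mapping and the saturation distance d_max are assumptions;
    the paper only states that the distance is mapped to the 0-10 range."""
    d = 0.5 * (dtw_distance(child_trace, gt_trace) +
               dtw_distance(child_hand, gt_hand))
    return 10.0 * max(0.0, 1.0 - d / d_max)
```

Because DTW aligns sequences of unequal length, a child who imitates the lesson correctly but at a different speed can still obtain a distance near zero and hence a score near 10.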
6. Results and Discussion
- Is the proposed method suitable for scoring the children’s performance, given the six imitation training lessons?
- What is the IMITASD performance when using different time-series measures?
- How long does the proposed method take to rate the child’s performance?
- What are the limitations of IMITASD?
- The proposed method rates the children’s imitation videos very similar to the therapist’s score. The closest match occurred in “wave by hand”, while the worst match was for the “arms up” task.
- The IMITASD results using different time-series measures highlight the superior performance of the IMITASD score based on dynamic time warping compared to Euclidean distance, cosine similarity, and Pearson correlation. The tasks “thumbs up” and “hands fold together” attain high correlations of 0.9912 and 0.9689, respectively. These results confirm the IMITASD score’s precision in aligning temporal dynamics during imitation, outperforming traditional metrics.
- The proposed method takes, on average, less than three seconds to score a single video of a child. It could therefore be embedded in a training program, being fast enough for use during a child’s imitation session.
- IMITASD faces challenges due to relying on a single camera. The landmarks based on a single camera are sensitive to occlusion. This affects IMITASD’s capability to estimate the similarity accurately for the child imitation video.
- It is important to note that during the data cleaning, thirty-four videos were excluded due to anomalies. The exclusion of these videos ensured the integrity of the remaining dataset and prevented the introduction of bias due to sub-optimal video quality. To address challenges related to improper seat posture in future studies, measures such as seat markers, adjustable seating, or real-time posture feedback systems could be implemented to ensure that participants are consistently positioned correctly in front of the camera. These measures would help reduce the number of unusable videos and enhance the overall quality of data collection.
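The agreement between IMITASD scores and therapist ratings reported above is expressed as correlations; a minimal sketch of that comparison, with hypothetical scores standing in for the study data, is:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists,
    e.g., automated scores vs. therapist ratings."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical model scores vs. expert ratings for four videos.
example = pearson_r([8.0, 6.5, 9.0, 7.0], [4.0, 3.0, 5.0, 3.5])
```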
7. Conclusions
8. Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Magazine, A.P. Autism Statistics: Facts and Figures. 2024. Available online: https://www.autismparentingmagazine.com/autism-statistics/ (accessed on 5 October 2024).
- American Academy of Pediatrics. CDC: Autism Rate Rises to 1 in 36 Children. 2024. Available online: https://publications.aap.org/aapnews/news/23904/CDC-Autism-rate-rises-to-1-in-36-children?autologincheck=redirected#/ (accessed on 5 October 2024).
- Treetop, T. Autism Prevalence Statistics. 2024. Available online: https://www.thetreetop.com/statistics/autism-prevalence/ (accessed on 5 October 2024).
- Gitimoghaddam, M.; Chichkine, N.; McArthur, L.; Sangha, S.S.; Symington, V. Applied behavior analysis in children and youth with autism spectrum disorders: A scoping review. Perspect. Behav. Sci. 2022, 45, 521–557. [Google Scholar] [CrossRef] [PubMed]
- Silva, A.P.d.; Bezerra, I.M.P.; Antunes, T.P.C.; Cavalcanti, M.P.E.; Abreu, L.C.d. Applied behavioral analysis for the skill performance of children with autism spectrum disorder. Front. Psychiatry 2023, 14, 1093252. [Google Scholar] [CrossRef] [PubMed]
- Maula, M.I.; Ammarullah, M.I.; Fadhila, H.N.; Afif, I.Y.; Hardian, H.; Jamari, J.; Winarni, T.I. Comfort evaluation and physiological effects/autonomic nervous system response of inflatable deep pressure vest in reducing anxiety. Heliyon 2024, 10, e36065. [Google Scholar] [CrossRef]
- Maula, M.I.; Afif, I.Y.; Ammarullah, M.I.; Lamura, M.D.P.; Jamari, J.; Winarni, T.I. Assessing the calming effects of a self-regulated inflatable vest: An evaluation based on Visual Analogue Scale and Electroencephalogram. Cogent Eng. 2024, 11, 2313891. [Google Scholar] [CrossRef]
- Husaini, F.A.; Maula, M.I.; Ammarullah, M.I.; Afif, I.Y.; Lamura, M.D.P.; Jamari, J.; Winarni, T.I. Control design of vibrotactile stimulation on weighted vest for deep pressure therapy. Bali Med. J. 2024, 13, 860–865. [Google Scholar] [CrossRef]
- Nielsen, M. The social glue of cumulative culture and ritual behavior. Child Dev. Perspect. 2018, 12, 264–268. [Google Scholar] [CrossRef]
- Bravo, A.; Schwartz, I. Teaching imitation to young children with autism spectrum disorder using discrete trial training and contingent imitation. J. Dev. Phys. Disabil. 2022, 34, 655–672. [Google Scholar] [CrossRef]
- Halbur, M.; Preas, E.; Carroll, R.; Judkins, M.; Rey, C.; Crawford, M. A comparison of fixed and repetitive models to teach object imitation to children with autism. J. Appl. Behav. Anal. 2023, 56, 674–686. [Google Scholar] [CrossRef]
- Posar, A.; Visconti, P. Autism spectrum disorder in 2023: A challenge still open. Turk. Arch. Pediatr. 2023, 58, 566. [Google Scholar]
- Chiappini, M.; Dei, C.; Micheletti, E.; Biffi, E.; Storm, F.A. High-Functioning Autism and Virtual Reality Applications: A Scoping Review. Appl. Sci. 2024, 14, 3132. [Google Scholar] [CrossRef]
- Liu, L.; Li, S.; Tian, L.; Yao, X.; Ling, Y.; Chen, J.; Wang, G.; Yang, Y. The Impact of Cues on Joint Attention in Children with Autism Spectrum Disorder: An Eye-Tracking Study in Virtual Games. Behav. Sci. 2024, 14, 871. [Google Scholar] [CrossRef] [PubMed]
- Cano, S.; Díaz-Arancibia, J.; Arango-López, J.; Libreros, J.E.; García, M. Design path for a social robot for emotional communication for children with autism spectrum disorder (ASD). Sensors 2023, 23, 5291. [Google Scholar] [CrossRef] [PubMed]
- López-Florit, L.; García-Cuesta, E.; Gracia-Expósito, L.; García-García, G.; Iandolo, G. Physiological Reactions in the Therapist and Turn-Taking during Online Psychotherapy with Children and Adolescents with Autism Spectrum Disorder. Brain Sci. 2021, 11, 586. [Google Scholar] [CrossRef] [PubMed]
- Nunez, E.; Matsuda, S.; Hirokawa, M.; Yamamoto, J.; Suzuki, K. Effect of sensory feedback on turn-taking using paired devices for children with ASD. Multimodal Technol. Interact. 2018, 2, 61. [Google Scholar] [CrossRef]
- Jameson, J. Autism and Imitation Skills Importance. 2020. Available online: https://jewelautismcentre.com/jewel_blog/autism-and-imitation-skills-importance/ (accessed on 17 October 2024).
- Sandhu, G.; Kilburg, A.; Martin, A.; Pande, C.; Witschel, H.F.; Laurenzi, E.; Billing, E. A learning tracker using digital biomarkers for autistic preschoolers. In Proceedings of the Society 5.0, Integrating Digital World and Real World to Resolve Challenges in Business and Society, 2nd Conference, Hybrid (Online and Physical) at the FHNW University of Applied Sciences and Arts Northwestern Switzerland, Windisch, Switzerland, 20–22 June 2022; EasyChair. pp. 219–230. [Google Scholar]
- Al-Jubouri, A.A.; Ali, I.H.; Rajihy, Y. Generating 3D dataset of Gait and Full body movement of children with Autism spectrum disorders collected by Kinect v2 camera. Compusoft 2020, 9, 3791–3797. [Google Scholar]
- Liu, X.; Zhao, W.; Qi, Q.; Luo, X. A Survey on Autism Care, Diagnosis, and Intervention Based on Mobile Apps: Focusing on Usability and Software Design. Sensors 2023, 23, 6260. [Google Scholar] [CrossRef]
- Zhang, W.; Sun, Z.; Lv, D.; Zuo, Y.; Wang, H.; Zhang, R. A Time Series Prediction-Based Method for Rotating Machinery Detection and Severity Assessment. Aerospace 2024, 11, 537. [Google Scholar] [CrossRef]
- Sun, S.; Gu, M.; Liu, T. Adaptive Sliding Window–Dynamic Time Warping-Based Fluctuation Series Prediction for the Capacity of Lithium-Ion Batteries. Electronics 2024, 13, 2501. [Google Scholar] [CrossRef]
- Isa, I.G.T.; Ammarullah, M.I.; Efendi, A.; Nugroho, Y.S.; Nasrullah, H.; Sari, M.P. Constructing an elderly health monitoring system using fuzzy rules and Internet of Things. AIP Adv. 2024, 14, 055317. [Google Scholar] [CrossRef]
- Sen, B.; Bhowmik, A.; Prakash, C.; Ammarullah, M.I. Prediction of specific cutting energy consumption in eco-benign lubricating environment for biomedical industry applications: Exploring efficacy of GEP, ANN, and RSM models. AIP Adv. 2024, 14, 085216. [Google Scholar] [CrossRef]
- Kaur, G.; Kaur, J.; Sharma, A.; Jain, A.; Kumar, R.; Alsubih, M.; Islam, S.; Ammarullah, M.I. Techno-economic investigation and empowering rural resilience through bioengineering: A case study on self-sustainable village energy models. Int. J. Low-Carbon Technol. 2024, 19, 1275–1287. [Google Scholar]
- Farooq, M.S.; Tehseen, R.; Sabir, M.; Atal, Z. Detection of autism spectrum disorder (ASD) in children and adults using machine learning. Sci. Rep. 2023, 13, 9605. [Google Scholar] [CrossRef] [PubMed]
- Awad, M.; Khanna, R.; Awad, M.; Khanna, R. Support vector machines for classification. In Efficient Learning Machines: Theories, Concepts, and Applications for Engineers and System Designers; Apress: Berkeley, CA, USA, 2015; pp. 39–66. [Google Scholar]
- Panda, N.R. A review on logistic regression in medical research. Natl. J. Community Med. 2022, 13, 265–270. [Google Scholar] [CrossRef]
- Raj, S.; Masood, S. Analysis and detection of autism spectrum disorder using machine learning techniques. Procedia Comput. Sci. 2020, 167, 994–1004. [Google Scholar] [CrossRef]
- Yang, F.J. An Implementation of Naive Bayes Classifier. In Proceedings of the 2018 International Conference on Computational Science and Computational Intelligence (CSCI), Las Vegas, NV, USA, 12–14 December 2018; pp. 301–306. [Google Scholar] [CrossRef]
- Anava, O.; Levy, K. k*-nearest neighbors: From global to local. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
- Ayeni, J. Convolutional neural network (CNN): The architecture and applications. Appl. J. Phys. Sci. 2022, 4, 42–50. [Google Scholar] [CrossRef]
- Wang, M.; Yang, N. OTA-NN: Observational therapy-assistance neural network for enhancing autism intervention quality. In Proceedings of the 2022 IEEE 19th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 8–11 January 2022; pp. 1–7. [Google Scholar]
- Wang, M.; Yang, N. OBTAIN: Observational Therapy-Assistance Neural Network for Training State Recognition. IEEE Access 2023, 11, 31951–31961. [Google Scholar] [CrossRef]
- Zhang, S.; Tong, H.; Xu, J.; Maciejewski, R. Graph convolutional networks: A comprehensive review. Comput. Soc. Netw. 2019, 6, 1–23. [Google Scholar] [CrossRef]
- Zahan, S.; Gilani, Z.; Hassan, G.M.; Mian, A. Human Gesture and Gait Analysis for Autism Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 3327–3336. [Google Scholar]
- Papaefstathiou, E. A Thorough Presentation of Autism Diagnostic Observation Schedule (ADOS-2). In Interventions for Improving Adaptive Behaviors in Children With Autism Spectrum Disorders; IGI Global: Hershey, PA, USA, 2022; pp. 21–38. [Google Scholar]
- Prakash, V.G.; Kohli, M.; Kohli, S.; Prathosh, A.; Wadhera, T.; Das, D.; Panigrahi, D.; Kommu, J.V.S. Computer vision-based assessment of autistic children: Analyzing interactions, emotions, human pose, and life skills. IEEE Access 2023, 11, 47907–47929. [Google Scholar] [CrossRef]
- Kojovic, N.; Natraj, S.; Mohanty, S.P.; Maillart, T.; Schaer, M. Using 2D video-based pose estimation for automated prediction of autism spectrum disorders in young children. Sci. Rep. 2021, 11, 15069. [Google Scholar] [CrossRef]
- Song, C.; Wang, S.; Chen, M.; Li, H.; Jia, F.; Zhao, Y. A multimodal discrimination method for the response to name behavior of autistic children based on human pose tracking and head pose estimation. Displays 2023, 76, 102360. [Google Scholar] [CrossRef]
- Stenum, J.; Cherry-Allen, K.M.; Pyles, C.O.; Reetzke, R.D.; Vignos, M.F.; Roemmich, R.T. Applications of pose estimation in human health and performance across the lifespan. Sensors 2021, 21, 7315. [Google Scholar] [CrossRef] [PubMed]
- Vallée, L.N.; Lohr, C.; Kanellos, I.; Asseu, O. Human Skeleton Detection, Modeling and Gesture Imitation Learning for a Social Purpose. Engineering 2020, 12, 90–98. [Google Scholar] [CrossRef]
- Conti, D.; Trubia, G.; Buono, S.; Di Nuovo, S.; Di Nuovo, A. Evaluation of a robot-assisted therapy for children with autism and intellectual disability. In Proceedings of the Annual Conference Towards Autonomous Robotic Systems, Bristol, UK, 25–27 July 2018; Springer: Berlin/Heidelberg, Germany, 2018; pp. 405–415. [Google Scholar]
- Peterson, T.; Dodson, J.; Sherwin, R.; Strale, F., Jr. Evaluating the Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP) Scores Using Principal Components Analysis. Cureus 2024, 16, e66602. [Google Scholar] [CrossRef]
- Bringmann, K.; Fischer, N.; van der Hoog, I.; Kipouridis, E.; Kociumaka, T.; Rotenberg, E. Dynamic Dynamic Time Warping. In Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA). SIAM, Alexandria, VA, USA, 7–10 January 2024; pp. 208–242. [Google Scholar]
- Wang, Z.; Ning, J.; Gao, M. Complex Network Model of Global Financial Time Series Based on Different Distance Functions. Mathematics 2024, 12, 2210. [Google Scholar] [CrossRef]
- Kraprayoon, J.; Pham, A.; Tsai, T.J. Improving the Robustness of DTW to Global Time Warping Conditions in Audio Synchronization. Appl. Sci. 2024, 14, 1459. [Google Scholar] [CrossRef]
- Wang, H.; Li, Y.; Jin, Y.; Zhao, S.; Han, C.; Song, L. Remaining Useful Life Prediction Method Enhanced by Data Augmentation and Similarity Fusion. Vibration 2024, 7, 560–581. [Google Scholar] [CrossRef]
- Molina, M.; Tardón, L.J.; Barbancho, A.M.; De-Torres, I.; Barbancho, I. Enhanced average for event-related potential analysis using dynamic time warping. Biomed. Signal Process. Control. 2024, 87, 105531. [Google Scholar] [CrossRef]
- Castellano Ontiveros, R.; Elgendi, M.; Menon, C. A machine learning-based approach for constructing remote photoplethysmogram signals from video cameras. Commun. Med. 2024, 4, 109. [Google Scholar] [CrossRef]
- Liu, Y.; Guo, H.; Zhang, L.; Liang, D.; Zhu, Q.; Liu, X.; Lv, Z.; Dou, X.; Gou, Y. Research on correlation analysis method of time series features based on dynamic time warping algorithm. IEEE Geosci. Remote Sens. Lett. 2023, 20, 1–5. [Google Scholar] [CrossRef]
- Stenger, M.; Leppich, R.; Foster, I.; Kounev, S.; Bauer, A. Evaluation is key: A survey on evaluation measures for synthetic time series. J. Big Data 2024, 11, 66. [Google Scholar] [CrossRef]
- Martins, A.A.; Vaz, D.C.; Silva, T.A.; Cardoso, M.; Carvalho, A. Clustering of Wind Speed Time Series as a Tool for Wind Farm Diagnosis. Math. Comput. Appl. 2024, 29, 35. [Google Scholar] [CrossRef]
- Liu, X.; Zhang, S.; Wang, X.; Wu, R.; Yang, J.; Zhang, H.; Wu, J.; Li, Z. Clustering Method Comparison for Rural Occupant’s Behavior Based on Building Time-Series Energy Data. Buildings 2024, 14, 2491. [Google Scholar] [CrossRef]
- Novák, V.; Mirshahi, S. On the similarity and dependence of time series. Mathematics 2021, 9, 550. [Google Scholar] [CrossRef]
- Berthold, M.R.; Höppner, F. On clustering time series using euclidean distance and pearson correlation. arXiv 2016, arXiv:1601.02213. [Google Scholar]
- Cuemath. Euclidean Distance Formula. Available online: https://www.cuemath.com/euclidean-distance-formula/ (accessed on 12 October 2023).
- Zhang, W.; Wang, J.; Zhang, L. Cosine Similarity: A Comprehensive Review. J. Stat. Res. 2020, 54, 175–185. [Google Scholar]
- Nakamura, T.; Taki, K.; Nomiya, H.; Seki, K.; Uehara, K. A shape-based similarity measure for time series data with ensemble learning. Pattern Anal. Appl. 2013, 16, 535–548. [Google Scholar] [CrossRef]
- To, S.H. Correlation Coefficient: Simple Definition, Formula, Easy Steps. Available online: https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/ (accessed on 12 October 2023).
- Müller, M. Information Retrieval for Music and Motion; Springer: New York, NY, USA, 2007; Volume 2. [Google Scholar]
- AudioLabs. Dynamic Time Warping (DTW). Available online: https://www.audiolabs-erlangen.de/resources/MIR/FMP/C3/C3S2_DTWbasic.html/ (accessed on 12 October 2023).
- Dong, C.; Du, G. An enhanced real-time human pose estimation method based on modified YOLOv8 framework. Sci. Rep. 2024, 14, 8012. [Google Scholar] [CrossRef]
- Nguyen, T.D.; Kresovic, M. A survey of top-down approaches for human pose estimation. arXiv 2022, arXiv:2202.02656. [Google Scholar]
- Bisht, S.; Joshi, S.; Rana, U. Comprehensive Review of R-CNN and its Variant Architectures. Int. Res. J. Adv. Eng. Hub (IRJAEH) 2024, 2, 959–966. [Google Scholar]
- Chung, J.L.; Ong, L.Y.; Leow, M.C. Comparative analysis of skeleton-based human pose estimation. Future Internet 2022, 14, 380. [Google Scholar] [CrossRef]
- Kim, J.W.; Choi, J.Y.; Ha, E.J.; Choi, J.H. Human pose estimation using mediapipe pose and optimization method based on a humanoid model. Appl. Sci. 2023, 13, 2700. [Google Scholar] [CrossRef]
- Google. Hand Landmarks Detection. Available online: https://developers.google.com/mediapipe/solutions/vision/hand_landmarker/ (accessed on 12 October 2023).
- Google. Pose Landmark Detection. Available online: https://developers.google.com/mediapipe/solutions/vision/pose_landmarker/ (accessed on 12 October 2023).
- Perry, A.; Condillac, R.A.; Freeman, N.L.; Dunn-Geier, J.; Belair, J. Multi-site study of the Childhood Autism Rating Scale (CARS) in five clinical groups of young children. J. Autism Dev. Disord. 2005, 35, 625–634. [Google Scholar] [CrossRef] [PubMed]
- Google for Developers. Yin Guobing’s Facial Landmark Detector. Available online: https://github.com/yinguobing/facial-landmark-detection-hrnet (accessed on 12 October 2023).
- Wu, Y.; Ji, Q. Facial landmark detection: A literature survey. Int. J. Comput. Vis. 2019, 127, 115–142. [Google Scholar] [CrossRef]
No. | Study | Models/Techniques | Contributions | Limitations |
---|---|---|---|---|
1 | Sandhu, G. et al., 2022 [19] | Eye gaze tracked through wearable devices like smartphones for aiding in the early diagnosis of ASD children | Use digital biomarkers to monitor ASD children’s performance | Absence of information about training activities |
2 | Ahmed, A. et al., 2020 [20] | Principal component analysis; multi-layer perceptron network | Model’s accuracy is 95% in classifying videos of children according to the level of autism | Covers only gait behavior, which does not align with imitation |
3 | Farooq, M. S. et al., 2023 [27] | Support vector machine and logistic regression models | Federated learning model shows its effectiveness in detecting ASD | Certain measures must be fed manually into the system, e.g., sensory processing, repetitive behavior, and other parameters |
4 | Suman Raj and Sarfaraz Masood, 2020 [30] | Support vector machine, naive Bayes, k-nearest neighbor, artificial neural network, and convolutional neural network | Obtained high performance, with accuracy levels ranging from 95.75% to 99.53% | Absence of a standardized medical test for ASD |
5 | M. Wang and N. Yang, 2023 [34,35] | Spatial–temporal Transformer; multiple-instance learning; graph convolutional networks | Potential tool could predict child’s training state during therapy | Framework is not suitable to be deployed with a single-camera system |
6 | S. Zahan et al., 2023 [37] | Graph convolutional networks and Vision Transformer | Model predicts ADOS for children with ASD, having high correlation with the true ADOS | Relies on camera with Kinect v2 to capture human skeleton |
7 | Varun, G. et al., 2023 [39] | Spatiotemporal Transformer; ResNet-34 deep learning model; R-convolutional network with ResNet-50 | Results indicate acceptable performance for activity comprehension | Did not focus on acting as a stand-alone model that could interact with children |
No. | Imitation Behavior | Number of Videos | Amount (%) |
---|---|---|---|
1 | Wave by hand | 49 | 18.3% |
2 | Arm up | 43 | 16.0% |
3 | Hands fold together | 55 | 20.5% |
4 | Thumbs up | 41 | 15.3% |
5 | Fold hands together over head | 38 | 14.2% |
6 | Arms up | 42 | 15.7% |
Metric | Wave by Hand | Arm Up | Hands Fold Together | Thumbs Up | Fold Hands Together | Arms Up | Overall Correlation
---|---|---|---|---|---|---|---|
Euclidean distance | 0.01 | 0.55 | 0.06 | −0.05 | 0.22 | −0.20 | −0.04 |
Cosine similarity | 0.09 | −0.45 | −0.08 | −0.14 | −0.03 | −0.11 | −0.10 |
Pearson correlation | −0.15 | 0.17 | −0.17 | 0.09 | 0.01 | −0.02 | 0.05 |
IMITASD score (DTW) | 0.94 | 0.64 | 0.97 | 0.99 | 0.87 | 0.86 | 0.94 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Said, H.; Mahar, K.; Sorour, S.E.; Elsheshai, A.; Shaaban, R.; Hesham, M.; Khadr, M.; Mehanna, Y.A.; Basha, A.; Maghraby, F.A. IMITASD: Imitation Assessment Model for Children with Autism Based on Human Pose Estimation. Mathematics 2024, 12, 3438. https://doi.org/10.3390/math12213438