Abstract
This study applied an eye-tracking paradigm to cluster participants according to their gaze patterns while they performed the Korean version of the Reading the Mind in the Eyes Test, and investigated whether the resulting clusters differed on neurocognitive and other Theory of Mind (ToM) tests. A total of 89 non-clinical youths (50 males) were recruited. The k-means algorithm was applied, and the optimal number of clusters was determined using the elbow, silhouette and NbClust methods. Multivariate analysis of variance was then used to test for differences among the clusters on the neurocognitive and other ToM tests. Four clusters were identified based on these criteria. The long word fixation time cluster made significantly more total errors and completed fewer categories on the Wisconsin Card Sorting Test, and had lower backward Digit Span scores and lower sequencing scores on the Theory of Mind Picture Stories Task, than one or more of the other clusters. These findings suggest that, at least in some individuals, gaze patterns during a perceptual-level ToM task that requires understanding mental states are related to neurocognitive strategies, especially executive function, rather than to social cognitive function itself.
Introduction
In the process of continuously interacting with others, individuals try to adapt to the environment through appropriate actions in the given situation. To this end, they first attempt to understand and predict what others think or do through facial expressions, gestures, and language. However, unlike behaviour, the mind cannot be observed or measured independently. Nonetheless, it is assumed that, just as an individual experiences various mental states, others also have their own mental states based on which they act. As such, the ability to understand mental states such as intentions, perceptions, beliefs, desires, attitudes, and emotions of others is referred to as ‘Theory of mind’ (ToM)1,2.
Humans rely primarily on visual information to explore the external environment. This reliance is particularly strong in highly social animals, and in humans, visual social stimuli are processed in sophisticated ways across more than 30 brain regions3,4. Among visually perceived social stimuli, facial expressions provide important information about the other person’s ‘emotion’5, ‘social motivation’6 and ‘direction of attention’7. In general, when people look at another person’s face, they focus on the ‘T’ shape formed by the eyes, nose, and mouth and tend to pay the most attention to the eyes, which carry the most information for identifying facial expressions8,9. However, individuals with mental disorders often have difficulty accurately recognising emotions and understanding the emotions of others10,11,12. These difficulties are even more pronounced in disorders such as autism spectrum disorder and schizophrenia, in which ToM deficits play a key role in symptoms13,14,15.
Humans direct their eyes toward attractive and interesting objects, and eye movements are closely related to an individual’s will, intention, and direction of attention16,17. Because the eyes move to bring objects of interest into focus, the position of the pupil is commonly tracked as a physiological indicator of eye movement. By tracking and quantifying pupil movements, researchers can obtain a variety of information about ‘what’ individuals look at, ‘in what order’, and ‘for how long’. The technology and methods for recording eye movements are collectively called ‘eye-tracking’, and the device that records and stores eye movement data is called an ‘eye-tracker’. As improvements in measurement precision have increased the accuracy of gaze tracking and data quantification, eye-tracking has been actively applied in various industrial18,19 and clinical20 settings.
In eye-tracking research, two metrics are broadly considered important. The first is ‘fixation’, defined as keeping one’s gaze within a specific area for a certain period of time21. In general, longer fixation times indicate greater attention to a specific stimulus, and they also occur when the individual is interested in the stimulus or when it contains a large amount of information22,23,24,25. Previous studies reported that individuals with ToM deficits exhibit less efficient fixation patterns when processing social information than non-clinical samples. When processing facial emotional stimuli, individuals with autism spectrum disorder tended to fixate longer on the surrounding environment than on faces26 and, even within faces, fixated for significantly shorter times on the eyes27. Additionally, a schizophrenia group showed significantly shorter fixations on negative (anger and sadness), positive (happiness and surprise), and neutral facial expression stimuli than a non-clinical group28. Although fewer studies have examined text stimuli than facial emotions, there is evidence of differences in processing patterns for text as well. Au-Yeung et al. (2015)29 presented ironic (e.g., ‘I know “A” is smart’ even though ‘A’ received the lowest grade in the class) or non-ironic passages to a typically-developing group and an autism spectrum disorder group and examined how well participants understood the intended meaning and where they fixated their gaze. Neither group understood ironic passages as accurately as non-ironic passages, but the autism spectrum disorder group spent longer than the typically-developing group re-reading, that is, returning to an earlier point in the text when the content had not been clearly understood.
This suggests that the autism spectrum disorder group needed more time to decide whether the comments were reasonable given their knowledge and beliefs about the external world. Therefore, when stimuli such as facial emotions and texts are analysed, fixation patterns may vary with the internal characteristics of the individual and may be an important indicator of the individual’s attentional focus and problem-solving efficiency.
The second is ‘saccades’, which are rapid movements of the eyes from one fixation to another30. Saccades are a natural part of visual information processing; however, frequent gaze transitions indicate increased cognitive load25 and are also associated with inefficient information search31. ToM deficits have also been linked to poorly timed saccades. With respect to the scan path, the cumulative sequence of gaze transitions, children with autism spectrum disorder showed irregular, undirected, and disordered patterns when viewing pictures of faces, such as longer scan paths and gaze that strayed beyond the frame of the picture stimulus, compared with typically-developing children14,32.
A considerable number of studies on the relationship between ToM deficits and eye tracking have focused on temporal measures such as fixation time, whereas research on spatial measures such as saccades and gaze transitions remains limited. With the exception of a few studies33, little research has examined both types of measures simultaneously. In a recent study, Atyabi et al. (2023) reported that, when classifying children on the autism spectrum versus typically-developing children, combining spatio-temporal information such as fixation counts and gaze movements improved classification accuracy compared with using spatial information such as scan paths alone34. Therefore, analysing temporal information (fixation) and spatial information (gaze transitions) together within an eye-tracking paradigm is expected to yield more diverse and reliable data for understanding individual gaze patterns.
The ‘Reading the Mind in the Eyes Test’ (RMET) was developed to provide rich information on perceptual-level ToM deficits35. The RMET consists of 36 pictures of the eye region, each presented with words describing mental states. Because some response options within an item have similar meanings and emotional valences (e.g., target word: upset; three foils: terrified, arrogant and annoyed), participants must (1) examine each word and picture carefully, (2) hold the features of the presented picture in working memory, and (3) compare the stored information with the semantic meaning of each word to arrive at the correct answer. The RMET has been translated into various languages and widely used in ToM research36,37,38,39. Recently, a Korean version of the RMET (K-RMET) that reflects Korean racial characteristics was developed and has been used to measure ToM in Koreans40.
Because the RMET not only measures ToM but also visually presents eye pictures alongside words describing various mental states, it can yield a variety of information about participants’ problem-solving strategies and is well suited to eye-tracking research. The present study aimed to administer the RMET within an eye-tracking paradigm and to cluster participants based on their gaze patterns, including spatio-temporal information. Neurocognitive and other ToM tasks were then administered to characterise the identified clusters.
Results
Preliminary analyses
The means, standard deviations, and correlations for the main eye-tracking measures (eye fixation time, word fixation time, and gaze transition frequency) are presented in Table 1. Word fixation time was significantly negatively correlated with eye fixation time (r = −.284, p = .010) and positively correlated with gaze transition frequency (r = .337, p = .002). The correlation between eye fixation time and gaze transition frequency was not statistically significant (r = .083, p = .461).
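The reported p-values can be cross-checked against the size of the analysed sample. The sketch below assumes n = 81 (the participants retained after pre-processing) and the usual t-test for a Pearson correlation, t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. To stay dependency-free, the p-value is approximated by integrating the Student t density with Simpson’s rule; it is an illustration, not the SPSS routine used in the study.

```python
import math


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing H0: rho = 0 (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value: 1 - 2 * P(0 <= T <= |t|), integrated with Simpson's rule."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3


n = 81  # participants analysed after pre-processing
correlations = {
    "word vs eye fixation time": -0.284,
    "word fixation time vs gaze transitions": 0.337,
    "eye fixation time vs gaze transitions": 0.083,
}
for label, r in correlations.items():
    t = t_from_r(r, n)
    print(f"{label}: r = {r:+.3f}, t({n - 2}) = {t:.2f}, p = {two_tailed_p(t, n - 2):.3f}")
```

With these inputs the computed p-values should be close to the .010, .002 and .461 reported above; any small discrepancy would come from the rounding of r.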
K-means clustering
To determine the optimal k, eye fixation time, word fixation time, and gaze transition frequency were standardised to Z-scores, and the elbow method, silhouette method, and NbClust were applied in turn. The results are presented in Fig. 1. Based on the pattern of coefficient change in the elbow method (Fig. 1A), the silhouette method (Fig. 1B), and the number of clusters proposed by most indices in the NbClust package (Fig. 1C), the optimal k was estimated to be 4.
K-means clustering was performed with k = 4, and the final cluster centres are presented in Table 2. The clusters contained the following numbers of participants: Cluster 1, 18 (22.22%); Cluster 2, 10 (12.35%); Cluster 3, 11 (13.58%); and Cluster 4, 42 (51.85%). Cluster 1 had long fixation times in the eye area and was named ‘LE’ (long eye fixation). Cluster 2 had short fixation times in both the eye and word areas and a low frequency of gaze transitions between them, and was named ‘NS’ (non-specific). Cluster 3 had long fixation times in the word area and was named ‘LW’ (long word fixation). Cluster 4 had a high frequency of gaze transitions between the eye and word areas and was named ‘HT’ (high frequency of gaze transition).
Multivariate analysis of variance
Multivariate analysis of variance (MANOVA) was conducted to examine differences among clusters in the K-RMET, the Wisconsin Card Sorting Test (WCST), the Digit Span Test (DST), and the Theory of Mind Picture Stories Task (ToM-PST). Because Box’s test indicated that the assumption of homogeneity of covariance matrices was violated [F(135, 3269) = 1.749, p = .002], Pillai’s trace, which is relatively robust to violations of this assumption, was used; the multivariate effect was statistically significant [Pillai’s trace = 0.544, F(27, 213) = 1.749, p = .016, partial η² = 0.181]. Differences in means among clusters are presented in Table 3. In the K-RMET, performance did not differ significantly among clusters [F(3, 77) = 0.162, p = .922, partial η² = 0.006]. In the WCST, the differences among clusters in perseverative responses [F(3, 77) = 1.262, p = .293, partial η² = 0.047] and perseverative errors [F(3, 77) = 0.974, p = .409, partial η² = 0.037] were not significant, whereas total errors [F(3, 77) = 3.367, p = .023, partial η² = 0.116] and categories completed [F(3, 77) = 3.773, p = .014, partial η² = 0.128] differed significantly. In the DST, there was no significant difference among clusters in forward recall [F(3, 77) = 1.168, p = .328, partial η² = 0.044], but backward recall differed significantly [F(3, 77) = 3.870, p = .012, partial η² = 0.131]. In the ToM-PST, questionnaire scores did not differ significantly among clusters [F(3, 77) = 2.122, p = .104, partial η² = 0.076], but sequencing scores did [F(3, 77) = 3.784, p = .014, partial η² = 0.128].
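The degrees of freedom and effect sizes above can be checked with standard formulas. Assuming nine dependent variables (one K-RMET score, four WCST scores, two DST scores and two ToM-PST scores), four clusters and N = 81, the sketch applies the usual F approximation for Pillai’s trace V with s = min(p, k − 1), reports the multivariate partial η² as V/s, and recovers univariate partial η² from an F ratio as F·df_effect / (F·df_effect + df_error) with df_error = N − k = 77. It is a consistency check on the reported numbers, not a re-analysis of the raw data.

```python
def pillai_f(v: float, p: int, k: int, n: int):
    """F approximation for Pillai's trace in a one-way MANOVA.

    v: Pillai's trace, p: number of dependent variables,
    k: number of groups, n: total sample size.
    Returns (F, df1, df2, multivariate partial eta squared = V / s).
    """
    s = min(p, k - 1)
    m = (abs(p - (k - 1)) - 1) / 2
    n_term = (n - k - p - 1) / 2
    df1 = s * (2 * m + s + 1)
    df2 = s * (2 * n_term + s + 1)
    f = (2 * n_term + s + 1) / (2 * m + s + 1) * v / (s - v)
    return f, df1, df2, v / s


def partial_eta_sq(f: float, df_effect: int, df_error: int) -> float:
    """Univariate partial eta squared recovered from an F ratio."""
    return f * df_effect / (f * df_effect + df_error)


f_multi, df1, df2, eta_multi = pillai_f(v=0.544, p=9, k=4, n=81)
print(f"Pillai: F({df1:.0f}, {df2:.0f}) = {f_multi:.3f}, partial eta^2 = {eta_multi:.3f}")

df_error = 81 - 4  # N - k for the univariate follow-up ANOVAs
for name, f_uni in [("WCST total errors", 3.367), ("DST backward recall", 3.870)]:
    eta = partial_eta_sq(f_uni, 3, df_error)
    print(f"{name}: F(3, {df_error}) = {f_uni}, partial eta^2 = {eta:.3f}")
```

With V = 0.544 this gives F(27, 213) ≈ 1.75 and partial η² ≈ 0.181, and the univariate calls reproduce the reported 0.116 and 0.131, which is consistent with an error df of 77.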
To identify which clusters differed, post-hoc tests were conducted on the variables that showed significant differences among clusters in the MANOVA. Before the post-hoc tests, homogeneity of variance was assessed with Levene’s test; all variables met this assumption, and post-hoc comparisons were performed with the Bonferroni correction. In the WCST, the LW group made significantly more total errors than the LE (p = .049) and HT (p = .034) groups and completed significantly fewer categories than the NS (p = .037) and HT (p = .015) groups. In the DST, the LW group had a significantly lower backward recall score than the HT group (p = .010). In the ToM-PST, the LW group had significantly lower sequencing scores than the LE (p = .010) and HT (p = .050) groups.
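Levene’s test and the Bonferroni correction are simple enough to sketch directly. The block below assumes the classic mean-centred form of Levene’s statistic (an ANOVA on absolute deviations from each group mean) and the convention of multiplying each unadjusted pairwise p-value by the number of comparisons, six for four clusters, and capping the result at 1. The group data and the unadjusted p-value are hypothetical and only illustrate the arithmetic.

```python
from itertools import combinations
from statistics import mean


def levene_w(groups):
    """Levene's W from absolute deviations around each group's mean.

    Under H0 (equal variances), W is compared with an F(k - 1, N - k) distribution.
    """
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = sum(sum(g) for g in z) / n_total
    group_means = [mean(g) for g in z]
    between = sum(len(g) * (gm - grand) ** 2 for g, gm in zip(z, group_means))
    within = sum((x - gm) ** 2 for g, gm in zip(z, group_means) for x in g)
    return (n_total - k) / (k - 1) * between / within


def bonferroni(p_raw: float, n_comparisons: int) -> float:
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_raw * n_comparisons)


clusters = ["LE", "NS", "LW", "HT"]
comparisons = list(combinations(clusters, 2))  # 4 clusters give 6 pairwise tests
print(len(comparisons), "pairwise comparisons")
# Hypothetical unadjusted p-value, for illustration only:
print(bonferroni(0.008, len(comparisons)))
```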
Discussion
To the best of our knowledge, this is the first study to cluster gaze patterns using both gaze transition frequency, a measure of saccades, and fixation time, a measure commonly used in eye-tracking paradigms. The purpose of this study was to cluster participants based on their fixation times and gaze transition frequencies between the eyes and words in each K-RMET item, using eye-tracking techniques, and to examine whether the resulting clusters differed on neurocognitive and other ToM tests. Based on their gaze patterns, healthy participants were grouped into four clusters. Cluster 1 had long fixation times in the eye area (LE); Cluster 2 had short fixation times in the eye and word areas and a low frequency of gaze transitions between them (NS); Cluster 3 had long fixation times in the word area (LW); and Cluster 4 had a high frequency of gaze transitions between the eye and word areas (HT). K-RMET scores did not differ among the four clusters, but on neurocognitive tests, particularly those closely related to executive functions including cognitive flexibility, inhibition, implicit learning, and working memory updating, LW was the lower-performing cluster in every significant post-hoc comparison. Therefore, the clusters may reflect different cognitive strategies rather than differences in the ability to read the mind in the eyes among non-clinical youths.
The LW group, which fixated on the words of the K-RMET for a long time, made more total errors on the WCST than the LE and HT groups and completed fewer categories than the NS and HT groups. The LW group did not differ significantly from the other clusters in perseverative responses or perseverative errors, but made more total errors and completed fewer categories. This suggests that the LW group’s performance is related to instability in executive functions such as cognitive flexibility and implicit learning41,42. The LW group also had a significantly lower backward recall score on the DST than the HT group. Backward recall requires more sophisticated and complex working memory, because the presented numbers must be remembered in order and then reversed to respond. In addition, a previous study focusing on backward recall reported that, compared with forward recall, backward recall is more closely related to executive functions such as inhibition, flexibility, and working memory updating43. In the ToM-PST, the LW group also had significantly lower sequencing scores, but not questionnaire scores, than the LE and HT groups. In the K-RMET, the pictures of the eye area are difficult to interpret because they provide only limited information about the mental state; they can therefore be considered the part of the display that requires the most attention. One possible explanation is that individuals in a cluster that allocates attention to the word area rather than the eye area for a long time, such as the LW group, may not use the cognitive resources needed to solve the problems efficiently.
Another possible explanation is that the processing pattern of fixating on words for a long time may be associated with weaker executive function, considering that the LW group performed poorly on backward recall of the DST and made more total errors and completed fewer categories on the WCST. More specifically, according to the original authors of the RMET, to match words describing complex mental states to the eye regions of faces, participants first map the eyes in each picture onto examples of eye regions stored in memory and then decode the semantics of the words linked to those relatively specific eye regions35. These connections are built through each participant’s past interpersonal experiences, and neurocognition, including executive function, may play a key role in this process. Executive functions may also be required when participants infer the model’s mental state by comparing the presented picture of the eye region with images stored in memory44. Thus, given that the LW group showed lower performance on some executive function measures, the LW group’s longer fixation on the words may be a coping strategy for solving the K-RMET.
In the HT group, which had a high frequency of gaze transitions, there was no significant decline in performance on the neurocognitive or ToM tests. This result contrasts with previous studies reporting that long and inefficient scan paths affect the quantitative and qualitative aspects of task performance14. In the present study, the clusters were derived from the data (data-driven), and HT was distinguished from the other clusters by relatively frequent gaze switching. However, given that the corresponding standardised score (Z-score) was 0.685, the magnitude of this difference was modest. In addition, because more than half (51.85%) of all participants were classified as HT, its gaze pattern was not an inefficient scan path but the most common pattern among all clusters. Another study found that a longer scan path did not negatively affect performance. Meanwhile, Bucher and Schumacher (2006), in a web-browsing experiment with a non-clinical group, found that participants preferred a wide-ranging strategy when information was clearly presented45. Thus, the relatively high gaze transition frequency of HT may reflect a neurocognitive strategy for solving the task rather than inefficient eye movement.
Looking at the relationship between the gaze-based clusters and the neurocognitive and other ToM tests, the clusters differed significantly in neurocognitive performance on WCST total errors and categories completed and on DST backward recall. In the ToM tests, however, mean K-RMET scores did not differ significantly among clusters, and in the ToM-PST, the clusters differed in sequencing scores but not in questionnaire scores.
To obtain a high score on the ToM-PST, participants must not only infer the mental states of the characters but also organise the presented pictures to fit the context and flow of the overall story. The picture arrangement subtest of the Wechsler Adult Intelligence Scale-Revised (WAIS-R)46, which resembles the ToM-PST in that cartoon panels are arranged in an order that fits the situation, is known to require perceptual organisation and working memory in addition to the social understanding and reasoning skills it was originally intended to measure47. Therefore, the spatio-temporal gaze patterns characterising each cluster may be more closely related to attention, organisation, and executive function than to ToM ability itself, including the inferential-level false-belief reasoning assessed by the ToM-PST questionnaire.
This study has some limitations, which inform suggestions for future research. First, clusters were derived from the measured data rather than being planned or predicted in advance. As a result, a large proportion of participants fell into a single cluster (HT), and some clusters, such as NS, were difficult to interpret clearly. Second, participants were asked to look continuously at the monitor while the task was in progress. Unlike in the original version of the test, they could not consult the glossary provided to check the meanings of unfamiliar words. Although participants were instructed to ask the researcher if they did not understand a word, they may not have done so as readily as they would have checked a glossary themselves. Thus, in addition to individual differences in scan paths, individual vocabulary may have affected task performance. Third, only the frequency of gaze transitions was used to analyse saccade data. Future studies that also use saccade accuracy, peak and mean velocities48, and latency, which are known to be related to Parkinson’s disease49, may achieve more accurate cluster classification. Fourth, participants were in late adolescence or early adulthood, which limits the generalisability of the results to other age groups. Follow-up studies in children and clinical groups could expand our understanding of the relationships among gaze patterns, ToM, and neurocognition. Finally, the cluster with long fixations in the word area performed poorly compared with other clusters on tasks related to executive function, such as working memory, abstract thinking, and problem solving.
Thus, the strategies individuals use to reach the correct answer on the K-RMET, a perceptual-level ToM test, may be closely related to executive function, at least in part. In research on schizophrenia, in which ToM deficits are well documented, it is still debated whether these deficits are an inherent characteristic of the disorder or a secondary consequence of neurocognitive problems50. Follow-up studies could clarify the relationship between ToM and neurocognition by administering a ToM battery, including a hinting task51 and a false-belief task52, together with neurocognitive tests.
Methods
Participants
A total of 89 late adolescents and young adults were recruited through online job advertisements. The Mini-International Neuropsychiatric Interview (MINI) was administered to all participants to exclude current and past psychiatric and neurological illnesses. The sample included 50 men (56.2%) and 39 women (43.8%), with a mean age of 22.75 years (SD = 2.45, range = 19–28) and a mean education of 14.24 years (SD = 1.29). The study protocol was approved in advance by the Severance Hospital Research Review Board (IRB Nos. 4–2014-0744 and 2014-1767-035), and all procedures were conducted in accordance with the relevant guidelines and regulations (including the Declaration of Helsinki). All participants provided written informed consent.
Measures
Apparatus
Eye movements were recorded using an SMI RED 250 eye tracker and the SMI iView X™ system (SensoMotoric Instruments [SMI], Teltow, Germany). A 22-inch monitor (resolution: 1680 × 1050 pixels) was placed approximately 70 cm from the participant. During the experimental paradigm, gaze position coordinates were continuously recorded by the tracking device at a sampling rate of 250 Hz. Eye movement data with signal loss owing to eye blinks or off-screen gaze were automatically excluded by the system53. SMI Experiment Center™ 3.7.60 and BeGaze™ 3.7.42 software were used to present the experimental tasks and analyse the data.
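Because the fixation criterion and calibration accuracy are expressed in degrees of visual angle, it helps to convert the display geometry above into degrees. The sketch assumes that the nominal 22-inch diagonal is the visible screen diagonal and that pixels are square (1680 × 1050 is a 16:10 layout); the actual panel dimensions may differ slightly, so the values are approximate.

```python
import math

CM_PER_INCH = 2.54


def pixels_per_degree(diagonal_inch: float, res_x: int, res_y: int,
                      distance_cm: float) -> float:
    """Approximate pixels per degree of visual angle near the screen centre."""
    cm_per_px = diagonal_inch * CM_PER_INCH / math.hypot(res_x, res_y)  # square pixels assumed
    cm_per_degree = 2 * distance_cm * math.tan(math.radians(0.5))
    return cm_per_degree / cm_per_px


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle subtended by an object centred on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


ppd = pixels_per_degree(22, 1680, 1050, 70)
print(f"about {ppd:.1f} px per degree of visual angle")
print(f"K-RMET stimulus: {visual_angle_deg(25, 70):.1f} x {visual_angle_deg(16.5, 70):.1f} degrees")
print(f"sample interval at 250 Hz: {1000 / 250:.0f} ms")
```

Under these assumptions, 1° corresponds to roughly 43 pixels, and the 25 × 16.5 cm K-RMET stimulus subtends about 20° × 13° of visual angle.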
Reading the Mind in the Eyes Test
The RMET is a measure of perceptual-level ToM ability54. The K-RMET was used as the stimulus set for the eye-tracking experiment. Like the original RMET, the K-RMET consists of 36 questions, including practice questions; after viewing a picture of a face in which only the eye region is shown, participants choose the one of four words (e.g., target word: cautious; three foils: insisting, bored and aghast) that best describes the mental state of the person in the picture35,40. Each K-RMET stimulus used for eye-tracking was 25 cm wide and 16.5 cm high. Rectangular areas of interest (AOIs) were defined around the eye region and each of the four option words using BeGaze™ 3.7.42 (SMI, Teltow, Germany). The AOI definitions for the eye-tracking stimuli are illustrated in Fig. 2.
Wisconsin Card Sorting Test
The Wisconsin Card Sorting Test (WCST)55 was developed to assess abstract reasoning and the ability to switch cognitive strategies in response to environmental changes. Participants were presented with 4 stimulus cards containing geometric figures and 128 response cards, each varying in three categories: colour, shape, and number. They were asked to match each response card to one of the stimulus cards without being told the sorting rule. Positive (correct) or negative (wrong) feedback was provided after each response, and from this feedback participants had to infer the rule on their own. When a participant sorted 10 cards correctly in a row, a category was considered complete and the rule changed. The test ended when 6 categories had been completed or all 128 response cards had been presented. In the present study, four variables (total errors, perseverative responses, perseverative errors, and categories completed) were analysed. A perseverative response refers to continuing to respond according to a previous category despite negative feedback indicating that it is incorrect, and a perseverative error was defined as a perseverative response that is also incorrect41. Thus, all perseverative errors are perseverative responses, but not all perseverative responses are errors. For instance, consider a case in which the rule shifts from sorting by colour (red) to sorting by shape (star) after a participant has completed the colour category with 10 consecutive correct responses. After the rule change, the participant might continue to respond according to the previous colour rule (perseverative responses). If the participant keeps treating red as the sorting criterion and therefore responds incorrectly, these responses are counted as perseverative errors.
In contrast, if a red star-shaped card happens to be presented during this phase, a response based on colour would still be marked correct, because the match coincidentally also satisfies the current shape rule regardless of the participant’s reasoning; it would therefore be counted as a perseverative response but not as a perseverative error.
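The distinction between perseverative responses and perseverative errors in the example above can be sketched as a per-trial classifier. This is a deliberately simplified, hypothetical rule, not the full standardised WCST scoring procedure: a response is treated as perseverative whenever the chosen stimulus card matches the response card on the dimension of the previous (no longer valid) rule. The cards are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Card:
    colour: str
    shape: str
    number: int


def classify_response(response: Card, chosen: Card, current_rule: str,
                      previous_rule: Optional[str]) -> Tuple[bool, bool, bool]:
    """Return (correct, perseverative_response, perseverative_error) for one trial.

    Simplified rule: a response is perseverative when the chosen stimulus card
    matches the response card on the dimension of the previous rule.
    """
    correct = getattr(response, current_rule) == getattr(chosen, current_rule)
    perseverative = (previous_rule is not None
                     and getattr(response, previous_rule) == getattr(chosen, previous_rule))
    return correct, perseverative, perseverative and not correct


# The rule has just shifted from colour to shape; the participant still sorts by colour.
trial_a = classify_response(Card("red", "circle", 2), Card("red", "triangle", 1),
                            current_rule="shape", previous_rule="colour")
trial_b = classify_response(Card("red", "star", 3), Card("red", "star", 1),
                            current_rule="shape", previous_rule="colour")
print(trial_a)  # (False, True, True): perseverative error
print(trial_b)  # (True, True, False): perseverative response, but correct
```

Trial B mirrors the red star case in the text: the colour-based choice also satisfies the shape rule, so it is scored correct while still counting as perseverative.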
Digit Span Test
The Digit Span Test (DST)56 was developed to measure attention and working memory capacity. We used a visual version administered with Inquisit 3.0 software (Millisecond Software LLC, Seattle, WA, USA). Digit span refers to the maximum number of digits that can be correctly recalled after viewing a series of digits. The test consists of forward recall, in which the digits are reported in the order presented, and backward recall, in which they are reported in reverse order. Each DST score was the total number of trials correctly recalled before two consecutive errors in the same recall condition (forward recall range = 0–14, backward recall range = 0–14).
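The scoring rule can be sketched as follows. The block assumes one point per correctly recalled trial and discontinuation after two consecutive errors; two trials per list length is also assumed, which would match the 0–14 range with seven list lengths. The exact Inquisit scoring script may differ, and the trial data are hypothetical.

```python
from typing import Iterable, Sequence


def trial_correct(presented: Sequence[int], response: Sequence[int],
                  backward: bool = False) -> bool:
    """True if the response reproduces the digits in order (forward) or reversed (backward)."""
    target = list(reversed(presented)) if backward else list(presented)
    return list(response) == target


def span_score(outcomes: Iterable[bool]) -> int:
    """Number of correct trials before the first run of two consecutive errors."""
    score = 0
    consecutive_errors = 0
    for ok in outcomes:
        if ok:
            score += 1
            consecutive_errors = 0
        else:
            consecutive_errors += 1
            if consecutive_errors == 2:
                break  # discontinue rule
    return score


# Hypothetical backward-recall session: (presented digits, participant's response).
backward_trials = [
    ([5, 2], [2, 5]), ([8, 1], [1, 8]),
    ([3, 9, 4], [4, 9, 3]), ([7, 2, 6], [6, 7, 2]),
    ([4, 1, 8, 3], [3, 1, 8, 4]), ([9, 5, 2, 7], [7, 2, 5, 9]),
]
outcomes = [trial_correct(p, r, backward=True) for p, r in backward_trials]
print(outcomes)              # [True, True, True, False, False, True]
print(span_score(outcomes))  # 3: testing stops after the two consecutive errors
```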
Theory of Mind Picture Stories Task
The Theory of Mind Picture Stories Task (ToM-PST)57 was developed to measure the ability to infer the mental states of others. The ToM-PST was administered using six cartoon picture stories, each consisting of four picture cards. Two stories depicted two characters cooperating to solve a problem, two depicted one character deceiving another, and the remaining two depicted two characters cooperating to deceive a third. First, participants were asked to arrange the shuffled cards into a logical sequence of events. The sequencing score, reflecting how accurately each story was arranged, was computed by awarding two points each for placing the first and last cards correctly and one point each for placing the second and third cards correctly (maximum of 6 per story; subtotal score range = 0–36). Next, participants were asked questions about the cartoon characters’ mental states (e.g., ‘What does the blonde-haired person think is in the box?’, ‘Now, what does the store owner think the boys intended?’). The two stories in which two characters cooperated were followed by two questions each, and the remaining four stories by 4–5 questions each; each correct answer scored 1 point (subtotal score range = 0–23). The questionnaire score measures inferential-level ToM ability, including false beliefs, second- and third-order beliefs, and intentions54.
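The sequencing rule can be expressed as position weights. The sketch reads the rule as two points each for correct first and last cards and one point each for correct second and third cards, which gives 6 points per story and the stated maximum of 36 across six stories. Card labels are hypothetical.

```python
from typing import Sequence

POSITION_WEIGHTS = (2, 1, 1, 2)  # first, second, third, last card


def sequencing_score(arranged: Sequence[str], correct: Sequence[str]) -> int:
    """Points for one four-card story: 2 per correct end card, 1 per correct middle card."""
    if len(arranged) != 4 or len(correct) != 4:
        raise ValueError("each story has exactly four cards")
    return sum(w for w, a, c in zip(POSITION_WEIGHTS, arranged, correct) if a == c)


correct_order = ["A", "B", "C", "D"]
print(sequencing_score(["A", "B", "C", "D"], correct_order))  # 6: maximum per story
print(sequencing_score(["A", "C", "B", "D"], correct_order))  # 4: middle cards swapped
print(sequencing_score(["B", "A", "C", "D"], correct_order))  # 3: first two cards swapped
```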
Procedure
All measurement procedures were conducted individually in a quiet research room with minimal distractions. The eye-tracking task consisted of four phases. The first was the ‘calibration phase’, in which a nine-point calibration and validation procedure was performed with the SMI iView X™ system before the start and at the midpoint (question 19) of the paradigm to improve data accuracy. The second was the ‘preparation phase’, in which a black fixation cross on a grey background was presented for 1000 ms between stimuli to direct participants’ gaze to the centre of the screen. The third was the ‘stimulus presentation phase’, in which participants were asked to view the presented K-RMET stimulus freely for 6000 ms. The fourth was the ‘response phase’, in which a screen showing only the option words, without the eye picture from the preceding stimulus, was presented and participants told the researcher their answer, with no time limit. The eye-tracking procedure is illustrated in Fig. 3.
After the eye-tracking paradigm, participants completed the computerised WCST and DST under the researcher’s guidance, as well as the ToM-PST, which was administered face-to-face by the researcher. The total testing time was approximately 60–80 min.
Pre-processing
Before pre-processing, operational definitions were established for the variables used in the eye-tracking analysis. A fixation was defined as a period of stable gaze lasting more than 50 ms within 1° of visual angle. Eye fixation time was defined as the sum of fixation durations in the eye area, and word fixation time as the sum of fixation durations in the four option-word areas. Gaze transition frequency was defined as the total number of transitions between the eye area and the word areas, taken from the transition matrix provided by SMI BeGaze™ software.
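The fixation definition above (more than 50 ms within 1°) corresponds to the parameters of a dispersion-threshold (I-DT) fixation filter. The exact algorithm BeGaze applies internally is not described here, so the following is only a textbook I-DT sketch. It assumes gaze samples already converted to degrees of visual angle, recorded at 250 Hz (4 ms per sample) with blink and off-screen samples removed, dispersion measured as (max x − min x) + (max y − min y), and duration counted as the number of samples × 4 ms.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float]  # gaze position (x, y) in degrees of visual angle
Fixation = Tuple[int, int, float, float, float]  # start, end (exclusive), x, y, duration_ms


def dispersion(window: List[Sample]) -> float:
    """Dispersion as (max x - min x) + (max y - min y)."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def idt_fixations(samples: List[Sample], rate_hz: float = 250.0,
                  max_dispersion_deg: float = 1.0,
                  min_duration_ms: float = 50.0) -> List[Fixation]:
    """Dispersion-threshold (I-DT) fixation detection on a clean gaze trace."""
    step_ms = 1000.0 / rate_hz                           # 4 ms per sample at 250 Hz
    min_len = math.floor(min_duration_ms / step_ms) + 1  # strictly longer than 50 ms
    fixations: List[Fixation] = []
    i, n = 0, len(samples)
    while i + min_len <= n:
        j = i + min_len
        if dispersion(samples[i:j]) <= max_dispersion_deg:
            # Grow the window while the samples stay within the dispersion limit.
            while j < n and dispersion(samples[i:j + 1]) <= max_dispersion_deg:
                j += 1
            window = samples[i:j]
            cx = sum(p[0] for p in window) / len(window)
            cy = sum(p[1] for p in window) / len(window)
            fixations.append((i, j, cx, cy, (j - i) * step_ms))
            i = j
        else:
            i += 1  # slide the window start past saccade samples


    return fixations


# Synthetic trace: 30 samples at one location, then 30 samples at another.
trace = [(0.0, 0.0)] * 30 + [(10.0, 10.0)] * 30
for fix in idt_fixations(trace):
    print(fix)
```

On the synthetic trace this yields two 120 ms fixations, one per location; a trace that keeps moving by 0.5° per sample yields none.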
Calibration accuracy was defined as the difference between the actual gaze position and the gaze position recorded in the eye-tracking data (x and y axes). In the present study, only data within 1.0°, the range considered ‘acceptable’ in previous studies58,59, were used for analysis. The tracking ratio was defined as the percentage of time during which the eye tracker successfully tracked the participant’s gaze over the entire recording, and only data with a tracking ratio above 80% were analysed60,61. Data from three participants who did not meet these criteria were excluded from the analysis. In addition, the researcher reviewed the recorded data and excluded participants with markedly fewer total gaze transitions. Specifically, because the K-RMET used in the eye-tracking task consisted of 36 questions and an appropriate response was judged to require at least two saccades per question (one each to the eye picture and to the word stimuli), the minimum acceptable total was 36 × 2 = 72 gaze transitions. On this basis, data from five participants whose total gaze transition frequency was below 72 were excluded. After pre-processing, data from 81 participants were retained for analysis.
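The two-stage exclusion rule can be written as a small screening function. The sketch below is hypothetical: the record fields and participant data are invented for illustration, and it assumes that a deviation of exactly 1.0° counts as acceptable, that the tracking ratio must strictly exceed 80%, and that the transition threshold is 2 × 36 = 72.

```python
from dataclasses import dataclass
from typing import List, Tuple

N_ITEMS = 36
MIN_TRANSITIONS = 2 * N_ITEMS  # at least two saccades per item -> 72

@dataclass
class Recording:
    participant: str
    accuracy_deg: float    # calibration deviation in degrees of visual angle
    tracking_ratio: float  # % of the recording with successfully tracked gaze
    transitions: int       # total eye<->word gaze transitions across all items

def screen(recordings: List[Recording]) -> Tuple[List[str], List[str], List[str]]:
    """Return (kept, excluded for data quality, excluded for too few transitions)."""
    kept, bad_quality, few_transitions = [], [], []
    for r in recordings:
        if r.accuracy_deg > 1.0 or r.tracking_ratio <= 80.0:
            bad_quality.append(r.participant)
        elif r.transitions < MIN_TRANSITIONS:
            few_transitions.append(r.participant)
        else:
            kept.append(r.participant)
    return kept, bad_quality, few_transitions

sample = [
    Recording("P01", 0.6, 95.0, 140),
    Recording("P02", 1.3, 92.0, 150),  # calibration deviation too large
    Recording("P03", 0.8, 76.0, 130),  # tracking ratio too low
    Recording("P04", 0.7, 90.0, 60),   # fewer than 72 transitions
]
print(screen(sample))  # (['P01'], ['P02', 'P03'], ['P04'])
```

Applying the quality criteria before the transition criterion mirrors the order described above, so each excluded participant is attributed to exactly one reason.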
Statistical analysis
The Statistical Package for the Social Sciences (SPSS) version 25 for Windows (IBM Corporation, Armonk, NY, USA) was used for descriptive statistics, correlations between variables, and multivariate analysis. K-means clustering, a non-hierarchical technique that partitions a set of observations into k clusters, was applied to classify the eye-tracking data. This technique has the advantage of being relatively simple to implement and generalisable to clusters of various shapes and sizes. However, the number of clusters, k, must be specified in advance by the researcher. Therefore, when performing k-means clustering, it is important to select the optimal number of clusters (optimal k) using established criteria62,63. The optimal k was determined using RStudio 1.2.1335 for Windows (RStudio: Integrated Development for R, RStudio, Inc., Boston, MA, USA). The commonly used ‘elbow method’64 and ‘silhouette method’65 were applied as criteria for determining the optimal k. In addition, we used the NbClust package66, which evaluates 30 indices to propose the best number of clusters. After the optimal k was determined, the fixation times and gaze transition frequencies from the recorded data were converted into standardised scores (Z-scores) and then used for k-means clustering. All tests were two-tailed, and statistical significance was set at p < 0.05.
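The clustering workflow can be illustrated with the following self-contained Python sketch. The authors used SPSS and R (RStudio, NbClust); this code is not their pipeline but a plain re-implementation of the same ideas under stated assumptions: variables are z-scored with the sample standard deviation, k-means uses Lloyd’s algorithm with random restarts, and, for each candidate k, the sketch reports the within-cluster sum of squares (WSS, for an elbow plot) and the mean silhouette width. NbClust’s 30 indices are not reproduced, and the input values are invented.

```python
import math
import random
from statistics import mean, stdev

def zscore_columns(rows):
    """Standardise each column (mean 0, sample SD 1)."""
    stats = [(mean(col), stdev(col)) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_init=10, max_iter=100, seed=0):
    """Lloyd's k-means with random restarts; returns (labels, centroids, wss)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        centroids = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _ in range(max_iter):
            # assignment step: nearest centroid by squared Euclidean distance
            new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                          for p in points]
            if new_labels == labels:
                break  # converged
            labels = new_labels
            # update step: move each centroid to the mean of its members
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # an empty cluster keeps its previous centroid
                    centroids[j] = [mean(col) for col in zip(*members)]
        wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or wss < best[2]:
            best = (labels, centroids, wss)
    return best

def mean_silhouette(points, labels):
    """Average silhouette width (Rousseeuw, 1987) with Euclidean distance."""
    if len(set(labels)) < 2:
        return float("nan")
    scores = []
    for i, p in enumerate(points):
        by_cluster = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = labels[i]
        if own not in by_cluster:  # singleton cluster: silhouette is 0 by convention
            scores.append(0.0)
            continue
        a = mean(by_cluster[own])
        b = min(mean(d) for lab, d in by_cluster.items() if lab != own)
        scores.append((b - a) / max(a, b))
    return mean(scores)

def scan_k(points, k_values):
    """For each k: (WSS for an elbow plot, mean silhouette width)."""
    results = {}
    for k in k_values:
        labels, _, wss = kmeans(points, k)
        results[k] = (wss, mean_silhouette(points, labels))
    return results

# Invented example: columns = eye fixation time, word fixation time, transitions
raw = [[100, 20, 80], [102, 22, 82], [98, 19, 79], [101, 21, 81],
       [20, 100, 200], [22, 98, 202], [19, 101, 199], [21, 99, 201]]
results = scan_k(zscore_columns(raw), range(2, 5))
for k, (wss, sil) in results.items():
    print(f"k={k}: WSS={wss:.3f}, mean silhouette={sil:.3f}")
```

The WSS curve and the silhouette profile are read together: a bend in WSS and a peak in mean silhouette width point to the same k when the structure is clear, as with the two well-separated toy groups here, where k = 2 gives the highest silhouette.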
Data availability
Data supporting the findings of this study are available from the corresponding author upon reasonable request.
References
Premack, D. & Woodruff, G. Does the chimpanzee have a theory of mind? Behav. Brain Sci. 1, 515–526 (1978).
Wellman, H. M. & Woolley, J. D. From simple desires to ordinary beliefs: the early development of everyday psychology. Cognition 35, 245–275 (1990).
Adolphs, R. The social brain: neural basis of social knowledge. Annu. Rev. Psychol. 60, 693–716 (2009).
Emery, N. J. The eyes have it: the neuroethology, function and evolution of social gaze. Neurosci. Biobehav. Rev. 24, 581–604 (2000).
Russell, J. A. What does a facial expression mean? Psychol. Facial Expression, 3–30 (1997).
Fridlund, A. J. Human Facial Expression: An Evolutionary view (Academic, 2014).
Rutter, D. R. Communicating by telephone [book review]. Br. J. Psychol. 79, 554–555 (1988).
Hsiao, J. H., An, J., Hui, V. K. S., Zheng, Y. & Chan, A. B. Understanding the role of eye movement consistency in face recognition and autism through integrating deep neural networks and hidden Markov models. Npj Sci. Learn. 7, 28. https://doi.org/10.1038/s41539-022-00139-6 (2022).
Mehoudar, E., Arizpe, J., Baker, C. I. & Yovel, G. Faces in the eye of the beholder: unique and stable eye scanning patterns of individual observers. J. Vis. 14, 6 (2014).
Aldao, A., Nolen-Hoeksema, S. & Schweizer, S. Emotion-regulation strategies across psychopathology: a meta-analytic review. Clin. Psychol. Rev. 30, 217–237 (2010).
Bylsma, L. M., Morris, B. H. & Rottenberg, J. A meta-analysis of emotional reactivity in major depressive disorder. Clin. Psychol. Rev. 28, 676–691 (2008).
Kret, M. E. & Ploeger, A. Emotion processing deficits: a liability spectrum providing insight into comorbidity of mental disorders. Neurosci. Biobehav. Rev. 52, 153–171 (2015).
Lee, S. Y. et al. Impaired facial emotion recognition in individuals at ultra-high risk for psychosis and with first-episode schizophrenia, and their associations with neurocognitive deficits and self-reported schizotypy. Schizophr. Res. 165, 60–65 (2015).
Pelphrey, K. A. et al. Visual scanning of faces in autism. J. Autism Dev. Disord. 32, 249–261 (2002).
Rodríguez Sosa, J. T., Acosta Ojeda, M. & Rodríguez del Rosario, L. Theory of mind, facial recognition and emotional processing in schizophrenia. Revista de Psiquiatría y Salud Mental (English Edition) 4, 28–37. https://doi.org/10.1016/S2173-5050(11)70005-X (2011).
Just, M. A. & Carpenter, P. A. Eye fixations and cognitive processes. Cogn. Psychol. 8, 441–480 (1976).
Tsai, M. J., Hou, H. T., Lai, M. L., Liu, W. Y. & Yang, F. Y. Visual attention for solving multiple-choice science problem: an eye-tracking analysis. Comput. Educ. 58, 375–385 (2012).
Braunagel, C., Kasneci, E., Stolzmann, W. & Rosenstiel, W. in IEEE 18th International Conference on Intelligent Transportation Systems. 1652–1657 (IEEE, 2015).
Lee, J. & Ahn, J. H. Attention to banner ads and their effectiveness: an eye-tracking approach. Int. J. Electron. Commer. 17, 119–137 (2012).
Armstrong, T. & Olatunji, B. O. Eye tracking of attention in the affective disorders: a meta-analytic review and synthesis. Clin. Psychol. Rev. 32, 704–723 (2012).
Salvucci, D. D. & Goldberg, J. H. in Proceedings of the 2000 symposium on Eye tracking research & applications. 71–78.
Chen, S., Epps, J., Ruiz, N. & Chen, F. in Proceedings of the 16th international conference on Intelligent user interfaces. 315–318.
Lin, J. J. & Lin, S. S. Tracking eye movements when solving geometry problems with handwriting devices. J. Eye Mov. Res. 7 (2014).
Wang, Q., Yang, S., Liu, M., Cao, Z. & Ma, Q. An eye-tracking study of website complexity from cognitive load perspective. Decis. Support Syst. 62, 1–10 (2014).
Zagermann, J., Pfeil, U. & Reiterer, H. in Proceedings of the sixth workshop on beyond time and errors on novel evaluation methods for visualization. 78–85.
Kirchner, J. C., Hatri, A., Heekeren, H. R. & Dziobek, I. Autistic symptomatology, face processing abilities, and eye fixation patterns. J. Autism Dev. Disord. 41, 158–167 (2011).
Auyeung, B. et al. Oxytocin increases eye contact during a real-time, naturalistic social interaction in males with and without autism. Transl. Psychiatry 5, e507 (2015).
Asgharpour, M., Tehrani-Doost, M., Ahmadi, M. & Moshki, H. Visual attention to emotional face in schizophrenia: an eye tracking study. Iran. J. Psychiatry. 10, 13 (2015).
Au-Yeung, S. K., Kaakinen, J. K., Liversedge, S. P. & Benson, V. Processing of written irony in Autism Spectrum Disorder: an eye‐movement study. Autism Res. 8, 749–760 (2015).
Gilchrist, I. in The Oxford Handbook of Eye Movements (eds Liversedge, S. P., Gilchrist, I. & Everling, S.) (Oxford University Press, 2011).
Joseph, A. W. & Murugesh, R. Potential eye tracking metrics and indicators to measure cognitive load in human-computer interaction research. J. Sci. Res. 64, 168–175 (2020).
Rutherford, M. D. & Towns, A. M. Scan path differences and similarities during emotion perception in those with and without autism spectrum disorders. J. Autism Dev. Disord. 38, 1371–1381 (2008).
Usée, F., Jacobs, A. M. & Lüdtke, J. From abstract symbols to emotional (in-) sights: an eye tracking study on the effects of emotional vignettes and pictures. Front. Psychol. 11, 905 (2020).
Atyabi, A. et al. Stratification of children with autism spectrum disorder through fusion of temporal information in eye-gaze scan-paths. ACM Trans. Knowl. Discovery Data. 17, 1–20 (2023).
Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y. & Plumb, I. The reading the mind in the eyes test revised version: a study with normal adults, and adults with Asperger syndrome or high-functioning autism. J. Child. Psychol. Psychiatry Allied Disciplines. 42, 241–251 (2001).
Bora, E., Yucel, M. & Pantelis, C. Theory of mind impairment in schizophrenia: meta-analysis. Schizophr. Res. 109, 1–9 (2009).
Kettle, J. W., O’Brien-Simpson, L. & Allen, N. B. Impaired theory of mind in first-episode schizophrenia: comparison with community, university and depressed controls. Schizophr. Res. 99, 96–102 (2008).
Peñuelas-Calvo, I., Sareen, A., Sevilla-Llewellyn-Jones, J. & Fernández-Berrocal, P. The reading the mind in the eyes test in autism-spectrum disorders comparison with healthy controls: a systematic review and meta-analysis. J. Autism Dev. Disord. 49, 1048–1061 (2019).
Stewart, E., Catroppa, C. & Lah, S. Theory of mind in patients with epilepsy: a systematic review and meta-analysis. Neuropsychol. Rev. 26, 3–24 (2016).
Koo, S. J. et al. Reading the mind in the eyes test: translated and Korean versions. Psychiatry Invest. 18, 295 (2021).
Miles, S. et al. Considerations for using the Wisconsin Card sorting test to assess cognitive flexibility. Behav. Res. Methods. 53, 2083–2091 (2021).
Buchsbaum, B. R., Greer, S., Chang, W. L. & Berman, K. F. Meta-analysis of neuroimaging studies of the Wisconsin Card‐sorting task and component processes. Hum. Brain. Mapp. 25, 35–45 (2005).
Hilbert, S., Nakagawa, T. T., Puci, P., Zech, A. & Bühner, M. The digit span backwards task. Eur. J. Psychol. Assess. (2014).
Seo, E. et al. Reading the mind in the eyes test: relationship with neurocognition and facial emotion recognition in non-clinical youths. Psychiatry Invest. 17, 835 (2020).
Bucher, H. J. & Schumacher, P. The relevance of attention for selecting news content. An eye-tracking study on attention patterns in the reception of print and online media. (2006).
Wechsler, D. WAIS-R Manual: Wechsler Adult Intelligence Scale-Revised (Psychological Corporation, 1981).
Campbell, J. M. & McCord, D. M. The WAIS-R comprehension and picture arrangement subtests as measures of social intelligence: testing traditional interpretations. J. Psychoeducational Assess. 14, 240–249 (1996).
Yang, Q., Wang, T., Su, N., Xiao, S. & Kapoula, Z. Specific saccade deficits in patients with Alzheimer’s disease at mild to moderate stage and in patients with amnestic mild cognitive impairment. Age 35, 1287–1298 (2013).
Antoniades, C. A. & FitzGerald, J. J. Using saccadometry with deep brain stimulation to study normal and pathological brain function. J. Vis. Exp. e53640 (2016).
Ayesa-Arriola, R. et al. Evidence for trait related theory of mind impairment in first episode psychosis patients and its relationship with processing speed: a 3 year follow-up study. Front. Psychol. 7, 592 (2016).
Corcoran, R., Mercer, G. & Frith, C. D. Schizophrenia, symptomatology and social inference: investigating theory of mind in people with schizophrenia. Schizophr. Res. 17, 5–13 (1995).
Rowe, A. D., Bullock, P. R., Polkey, C. E. & Morris, R. G. ‘Theory of mind’ impairments and their relationship to executive functioning following frontal lobe excisions. Brain 124, 600–616 (2001).
Chesnet, D. & Alamargot, D. iViewX is SensoMotoric Instruments GmbH (SMI) product. (2004).
Byom, L. J. & Mutlu, B. Theory of mind: mechanisms, methods, and new directions. Front. Hum. Neurosci. 7, 413 (2013).
Heaton, R. K. & PAR Staff. Wisconsin Card Sorting Test: Computer Version 2 (Psychological Assessment Resources, Odessa, FL, 1993).
Lumley, F. & Calhoon, S. Memory span for words presented auditorially. J. Appl. Psychol. 18, 773 (1934).
Brüne, M., Ribbert, H. & Schiefenhövel, W. The Social Brain: Evolution and Pathology (2003).
Hotta, K., Prima, O. D. A., Imabuchi, T. & Ito, H. in 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR). 1843–1847 (IEEE).
Kar, A. & Corcoran, P. Performance evaluation strategies for eye gaze estimation systems with quantitative metrics and visualizations. Sensors 18, 3151 (2018).
Kruger, J. L., Hefer, E. & Matthew, G. in Proceedings of the Conference on Eye Tracking South Africa. 62–66 (2013).
Cho, J. Y. & Suh, J. Spatial color efficacy in perceived luxury and preference to stay: an eye-tracking study of retail interior environment. Front. Psychol. 11, 296 (2020).
Hamerly, G. & Elkan, C. Learning the k in k-means. Adv. Neural. Inf. Process. Syst. 16 (2003).
Sinaga, K. P. & Yang, M. S. Unsupervised K-means clustering algorithm. IEEE Access. 8, 80716–80727 (2020).
Cui, M. Introduction to the k-means clustering algorithm based on the elbow method. Acc. Auditing Finance. 1, 5–8 (2020).
Rousseeuw, P. J. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
Charrad, M., Ghazzali, N., Boiteau, V. & Niknafs, A. NbClust: an R package for determining the relevant number of clusters in a data set. J. Stat. Softw. 61, 1–36 (2014).
Acknowledgements
This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning, Republic of Korea (Grant number 2022R1A2B5B03002611 to Eun Lee).
Author information
Contributions
S.J.K. and S.K.A. are responsible for the study concept and design. S.J.K. wrote the draft. S.J.K. is responsible for the data analysis and interpretation. E.J.C., J.E.M., E.S., and E.L. contributed to the discussion. E.J.C., J.E.M., E.S., and E.L. reviewed and edited the manuscript. All authors approved the final manuscript and agreed to its publication. S.K.A. is the guarantor of this work and has full access to all study data.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Koo, S.J., Cha, E.J., Min, J.E. et al. Eye tracking based clustering using the Korean version of the reading the mind in the eyes test. Sci Rep 15, 3929 (2025). https://doi.org/10.1038/s41598-025-88483-6
DOI: https://doi.org/10.1038/s41598-025-88483-6