Abstract
Mobile eye tracking is beneficial for analysing human–machine interactions with tangible products, as it tracks eye movements reliably in natural environments and allows insights into human behaviour and the associated cognitive processes. However, current methods require a manual screening of the video footage, which is time-consuming and subjective. This work aims to automatically detect cognitively demanding phases in mobile eye tracking recordings. The presented approach combines the user’s perception (gaze) and action (hand) to isolate demanding interactions based on a multi-modal feature-level fusion. It was validated in a usability study of a 3D printer with 40 participants by comparing the usability problems found to those of a thorough manual analysis. The new approach detected 17 out of 19 problems while reducing the time for manual analysis by 63%. Adding the information of the hand enriches the insights into human behaviour beyond what eye tracking alone provides. The field of AI could significantly advance the approach by improving the hand tracking through region proposal CNNs, by detecting the parts of a product and mapping the demanding interactions to these parts, or even by a fully automated end-to-end detection of demanding interactions via deep learning. This could lay the basis for machines that provide real-time assistance to their users in cases where they are struggling.
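To make the idea of a multi-modal feature-level fusion of gaze and hand data more concrete, the following minimal sketch flags frames in which a long fixation coincides with the gaze resting close to the acting hand. It is an illustration only, not the implementation described in the paper; the chosen features (fixation duration, gaze–hand distance), the smoothing window, and the thresholds are assumptions.

```python
# Illustrative sketch (not the authors' implementation): fusing per-frame gaze and
# hand features to flag cognitively demanding interaction phases.
# Feature names, window length, and thresholds are assumptions for illustration.
import numpy as np

def fuse_and_flag(fixation_duration_ms, gaze_hand_distance_px,
                  window=60, dur_thresh_ms=600.0, dist_thresh_px=80.0):
    """Return a boolean array marking frames that lie inside a demanding phase.

    fixation_duration_ms   -- per-frame duration of the current fixation
    gaze_hand_distance_px  -- per-frame distance between gaze point and hand centroid
    window                 -- smoothing window in frames (about 1 s at 60 Hz)
    """
    fix = np.asarray(fixation_duration_ms, dtype=float)
    dist = np.asarray(gaze_hand_distance_px, dtype=float)

    # Feature-level fusion: a long fixation close to the acting hand is taken as
    # a cue for a demanding interaction (perception and action on the same spot).
    raw = (fix > dur_thresh_ms) & (dist < dist_thresh_px)

    # Smooth the binary signal so isolated frames do not create spurious phases.
    kernel = np.ones(window) / window
    smoothed = np.convolve(raw.astype(float), kernel, mode="same")
    return smoothed > 0.5

# Minimal usage example with synthetic data (2 s at 60 Hz).
if __name__ == "__main__":
    t = np.arange(120)
    fix = np.where((t > 40) & (t < 100), 800.0, 200.0)   # long fixation in the middle
    dist = np.where((t > 40) & (t < 100), 30.0, 200.0)   # gaze near the hand there
    phases = fuse_and_flag(fix, dist)
    print("demanding frames:", int(phases.sum()))
```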
Notes
The investigation is based on the SMI Eye Tracking Glasses 2w, which provide a scene camera resolution of 960 × 720 px (60° horizontal, 46° vertical) at 30 Hz. The scene video is interpolated to match the 60 Hz of the eye movement signal, which has a precision of 0.5° over all distances.
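One way to realise this alignment of a 30 Hz scene video with a 60 Hz gaze signal is to map each gaze sample to the scene frame that precedes it; the sketch below shows this idea. It is an assumption for illustration, not the vendor's implementation, and it presumes the two timestamp streams are already synchronised.

```python
# Minimal sketch: mapping 60 Hz gaze samples onto 30 Hz scene frames by
# looking up the last frame at or before each gaze timestamp.
import numpy as np

def gaze_to_frame_index(gaze_ts_s, frame_ts_s):
    """For each gaze timestamp (s), return the index of the last scene frame at or before it."""
    idx = np.searchsorted(frame_ts_s, gaze_ts_s, side="right") - 1
    return np.clip(idx, 0, len(frame_ts_s) - 1)

# Usage: one second of recording, 30 scene frames and 60 gaze samples.
frame_timestamps = np.arange(30) / 30.0
gaze_timestamps = np.arange(60) / 60.0
print(gaze_to_frame_index(gaze_timestamps, frame_timestamps)[:6])  # [0 0 1 1 2 2]
```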
Cite this article
Mussgnug, M., Singer, D., Lohmeyer, Q. et al. Automated interpretation of eye–hand coordination in mobile eye tracking recordings. Künstl Intell 31, 331–337 (2017). https://doi.org/10.1007/s13218-017-0503-y