Abstract
Multimodal interaction represents a more natural style of human-computer interaction, allowing users to apply their developed communicative skills when interacting with computer systems. Designing reliable multimodal systems nevertheless remains a challenging task. Advanced methods can deliver optimal performance only when integration patterns are modeled precisely enough to adapt to the preferences and differences of individual users. Although the basic foundations and empirical evidence of these differences have been described and confirmed in previous research, the measures and classifications introduced so far appear oversimplified and insufficiently precise to support reliable and robust interaction models. In this paper, we present the results of our study of multimodal integration patterns in systems combining speech and gesture input. Important differences in how subjects interact and in their specific multimodal integration patterns were confirmed and complemented by our own findings. Based on the obtained results, a new categorization of integration patterns is defined and analyzed. The introduced categorization provides more reliable and consistent results than the classifications presented in the related literature, and its generality makes it applicable to other combinations of input modalities.
Notes
The original algorithm is rotation invariant.
In order to distinguish between the two definitions, the one from Oviatt et al. will be denoted as SEQ\(_O\)/SIM\(_O\) and our redefined as SEQ\(_R\)/SIM\(_R\) in the rest of the work.
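As a rough illustration of the distinction (not taken from the paper), the sketch below labels a single speech-gesture construction using the overlap-based criterion commonly associated with Oviatt et al.: the signals are simultaneous if they overlap in time and sequential otherwise. The function name, the use of Python, and the zero-overlap threshold are assumptions for illustration only; the redefined SEQ\(_R\)/SIM\(_R\) criterion is not reproduced here.

# Illustrative sketch only: classify one speech + gesture pair as
# simultaneous (the signals overlap in time) or sequential (one signal
# ends before the other starts). Timestamps are in seconds; the
# zero-overlap threshold is an assumption.
def classify_integration(speech_start, speech_end, gesture_start, gesture_end):
    overlap = min(speech_end, gesture_end) - max(speech_start, gesture_start)
    return "SIM" if overlap > 0 else "SEQ"

# Gesture starts 0.2 s before the speech ends -> simultaneous integration
print(classify_integration(0.8, 2.1, 1.9, 2.4))  # SIM
# Gesture starts 0.3 s after the speech ends -> sequential integration
print(classify_integration(0.0, 1.2, 1.5, 1.9))  # SEQ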
References
Bangalore S, Johnston M (2009) Robust understanding in multimodal interfaces. Comput Linguist 35(3):345–397. doi:10.1162/coli.08-022-R2-06-26
Billinghurst M, Lee M (2012) Multimodal interfaces for augmented reality. In: Expanding the frontiers of visual analytics and visualization. Springer, pp 449–465. doi:10.1007/978-1-4471-2804-5
Bolt RA (1980) Put-that-there: voice and gesture at the graphics interface. In: Proceedings of the 7th annual conference on computer graphics and interactive techniques - SIGGRAPH ’80, vol 32. ACM Press, pp 262–270. doi:10.1145/800250.807503
Cohen PR, Johnston M, McGee D, Oviatt S, Pittman J, Smith I, Chen L, Clow J (1997) QuickSet: multimodal interaction for distributed applications. In: Proceedings of the fifth ACM international conference on multimedia-MULTIMEDIA ’97, ACM Press, pp 31–40. doi:10.1145/266180.266328
Cohen PR, Kaiser EC, Buchanan MC, Lind S, Corrigan MJ, Wesson RM (2015) Sketch-Thru-Plan: a multimodal interface for command and control. Commun ACM 58(4):56–65. doi:10.1145/2735589
Dumas B, Lalanne D, Oviatt S (2009) Multimodal interfaces: a survey of principles, models and frameworks. In: Lalanne D, Kohlas J (eds) Human machine interaction, Lecture notes in computer science, vol 5440. Springer, Berlin, pp 3–26. doi:10.1007/978-3-642-00437-7_1
Ehlen P, Johnston M (2012) Multimodal interaction patterns in mobile local search. In: Proceedings of the 2012 ACM international conference on intelligent user interfaces - IUI ’12, ACM Press, pp 21–24. doi:10.1145/2166966.2166970
Haas EC, Pillalamarri KS, Stachowiak CC, McCullough G (2011) Temporal binding of multimodal controls for dynamic map displays. In: Proceedings of the 13th international conference on multimodal interfaces - ICMI ’11, ACM Press, p 409. doi:10.1145/2070481.2070558
Huang X, Oviatt S (2006) Toward adaptive information fusion in multimodal systems. In: Renals S, Bengio S (eds) Machine learning for multimodal interaction, Lecture notes in computer science, vol 3869. Springer, Berlin, pp 15–27. doi:10.1007/11677482_2
Huang X, Oviatt S, Lunsford R (2006) Combining user modeling and machine learning to predict users’ multimodal integration patterns. In: Renals S, Bengio S, Fiscus JG (eds) Machine learning for multimodal interaction, Lecture notes in computer science, vol 4299. Springer, Berlin, pp 50–62. doi:10.1007/11965152_5
Huggins-Daines D, Kumar M, Chan A, Black A, Ravishankar M, Rudnicky A (2006) Pocketsphinx: a free, real-time continuous speech recognition system for hand-held devices. In: Proceedings of the IEEE international conference on acoustics, speech and signal processing, pp 185–188. doi:10.1109/ICASSP.2006.1659988
Johnston M, Bangalore S (2005) Finite-state multimodal integration and understanding. Nat Lang Eng 11(2):159–187. doi:10.1017/S1351324904003572
Johnston M, Bangalore S, Vasireddy G, Stent A, Ehlen P, Walker M, Whittaker S, Maloor P (2002) MATCH: an architecture for multimodal dialogue systems. In: Proceedings of the 40th annual meeting on association for computational linguistics - ACL ’02, July, pp 376–383. doi:10.3115/1073083.1073146
Lee M, Billinghurst M, Baek W, Green R, Woo W (2013) A usability study of multimodal input in an augmented reality environment. Virtual Real 17(4):293–305. doi:10.1007/s10055-013-0230-0
Lewis JR (2012) Usability testing. In: Handbook of human factors and ergonomics. Wiley, pp 1267–1312. doi:10.1002/9781118131350.ch46
Oviatt S (1999) Ten myths of multimodal interaction. Commun ACM 42(11):74–81. doi:10.1145/319382.319398
Oviatt S (2003) User-centered modeling and evaluation of multimodal interfaces. Proc IEEE 91(9):1457–1468. doi:10.1109/JPROC.2003.817127
Oviatt S, Coulston R, Lunsford R (2004) When do we interact multimodally? In: Proceedings of the 6th international conference on multimodal interfaces - ICMI ’04, ACM Press, pp 129–136. doi:10.1145/1027933.1027957
Oviatt S, Coulston R, Tomko S, Xiao B, Lunsford R, Wesson M, Carmichael L (2003) Toward a theory of organized multimodal integration patterns during human-computer interaction. In: Proceedings of the 5th international conference on multimodal interfaces - ICMI ’03, ACM Press, pp 44–51. doi:10.1145/958432.958443
Oviatt S, DeAngeli A, Kuhn K (1997) Integration and synchronization of input modes during multimodal human-computer interaction. In: Proceedings of the SIGCHI conference on human factors in computing systems - CHI ’97, ACM Press, pp 415–422. doi:10.1145/258549.258821
Oviatt S, Lunsford R, Coulston R (2005) Individual differences in multimodal integration patterns: what are they and why do they exist? In: Proceedings of the SIGCHI conference on human factors in computing systems - CHI ’05, ACM Press, pp 241–249. doi:10.1145/1054972.1055006
Schüssel F, Honold F, Schmidt M, Bubalo N, Huckauf A, Weber M (2014) Multimodal interaction history and its use in error detection and recovery. In: Proceedings of the 16th international conference on multimodal interaction - ICMI ’14, ACM Press, pp 164–171. doi:10.1145/2663204.2663255
Serrano M, Nigay L (2010) A Wizard of Oz component-based approach for rapidly prototyping and testing input multimodal interfaces. J Multimodal User Interfaces 3(3):215–225. doi:10.1007/s12193-010-0042-4
Wobbrock JO, Wilson AD, Li Y (2007) Gestures without libraries, toolkits or training: a $1 recognizer for user interface prototypes. In: Proceedings of the 20th annual ACM symposium on user interface software and technology - UIST ’07, ACM Press, pp 159–169. doi:10.1145/1294211.1294238
Xiao B, Girand C, Oviatt S (2002) Multimodal integration patterns in children. In: Proceedings of international conference on spoken language processing, pp 629–632
Xiao B, Oviatt S (2003) Modeling multimodal integration patterns and performance in seniors: toward adaptive processing of individual differences. In: Proceedings of the 5th international conference on multimodal interfaces - ICMI ’03, ACM Press, pp 256–272. doi:10.1145/958432.958480
Acknowledgements
We would like to thank Michal Vondra for providing initial feedback during the pilot test, and all volunteers for participating in the study. Thanks also to the anonymous reviewers for their helpful comments and suggestions. This work has been supported by the Grant Agency of the Czech Technical University in Prague, Grant No. SGS16/156/OHK3/2T/13.
Appendix
Testing scenarios
The following listing contains the complete set of objectives as presented to the test subjects:
1. Zoom in and out of the map view.
2. Get your current location and find the nearest petrol station.
3. Get detailed information about two gas stations.
4. Get directions between Olomouc and Liberec.
5. Get information about cinemas in your location.
6. Find the estimated travel time between an airport near Prague and a theatre in downtown Prague.
7. Get the coordinates of at least 3 hospitals in Pilsen.
8. Find the nearest police station and emergency service.
9. Find the travel distance between a railway station in Brno and the closest airport.
10. Find the names of the nearest bus and subway stations.
11. Find the names of some pubs and restaurants in downtown Ceske Budejovice.
12. Find the phone numbers of libraries in the surrounding area.
13. Get the postal address of a coffeehouse near a museum in Cesky Krumlov.
14. Get the phone numbers and postal addresses of churches in the surrounding area of Brno.
15. Get details of the two nearest restaurants to your current location.
16. Find the travel distance from the westernmost to the easternmost point and then from the northernmost to the southernmost point of the Czech Republic.
Cite this article
Hak, R., Zeman, T. Consistent categorization of multimodal integration patterns during human–computer interaction. J Multimodal User Interfaces 11, 251–265 (2017). https://doi.org/10.1007/s12193-017-0243-1