
Unification-based multimodal integration

Published: 07 July 1997

Abstract

Recent empirical research has shown conclusive advantages of multimodal interaction over speech-only interaction for map-based tasks. This paper describes a multimodal language processing architecture which supports interfaces allowing simultaneous input from speech and gesture recognition. Integration of spoken and gestural input is driven by unification of typed feature structures representing the semantic contributions of the different modes. This integration method allows the component modalities to mutually compensate for each other's errors. It is implemented in QuickSet, a multimodal (pen/voice) system that enables users to set up and control distributed interactive simulations.
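
To make the integration idea concrete, the following is a minimal sketch, in Python, of how a spoken command and a pen gesture might be fused by unifying typed feature structures. It is not the QuickSet implementation described in the paper; the toy type hierarchy, feature names, and example inputs are illustrative assumptions.

```python
# Minimal sketch of unification-based multimodal integration (illustrative only;
# the type hierarchy, feature names, and inputs below are assumptions, not the
# QuickSet implementation described in the paper).

from typing import Any, Optional

# Toy type hierarchy: each type maps to its immediate supertype.
SUPERTYPE = {
    "point": "location",
    "area": "location",
    "location": "object",
    "unit": "object",
    "object": None,
}

def subsumes(general: str, specific: str) -> bool:
    """True if `general` is `specific` itself or one of its ancestors."""
    t: Optional[str] = specific
    while t is not None:
        if t == general:
            return True
        t = SUPERTYPE.get(t)
    return False

def unify_types(t1: str, t2: str) -> Optional[str]:
    """Return the more specific of two compatible types, else None."""
    if subsumes(t1, t2):
        return t2
    if subsumes(t2, t1):
        return t1
    return None

def unify(fs1: dict, fs2: dict) -> Optional[dict]:
    """Unify two typed feature structures; return None if they conflict."""
    t = unify_types(fs1.get("type", "object"), fs2.get("type", "object"))
    if t is None:
        return None
    result: dict[str, Any] = {"type": t}
    for feat in (fs1.keys() | fs2.keys()) - {"type"}:
        if feat in fs1 and feat in fs2:
            v1, v2 = fs1[feat], fs2[feat]
            if isinstance(v1, dict) and isinstance(v2, dict):
                sub = unify(v1, v2)
                if sub is None:
                    return None
                result[feat] = sub
            elif v1 == v2:
                result[feat] = v1
            else:
                return None          # conflicting atomic values
        else:
            result[feat] = fs1.get(feat, fs2.get(feat))
    return result

# Speech: "medical company" -- the location slot is left underspecified.
speech = {
    "type": "unit",
    "echelon": "company",
    "function": "medical",
    "location": {"type": "location"},
}
# Pen gesture: a point drawn on the map.
gesture = {"location": {"type": "point", "coords": (42.3, -71.1)}}

print(unify(speech, gesture))
# The gesture's point fills the command's underspecified location slot; a gesture
# whose type cannot unify with the command's constraints is ruled out, which is
# how the modes can compensate for each other's recognition errors.
```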

Published In

ACL '97/EACL '97: Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics
July 1997
543 pages

Sponsors

• Directorate General XIII (European Commission)
• Universidad Complutense de Madrid
• Universidad Autónoma de Madrid
• Universidad Nacional de Educación a Distancia
• Universidad Politécnica de Madrid

Publisher

Association for Computational Linguistics
United States

Acceptance Rates

Overall acceptance rate: 85 of 443 submissions, 19%
