
A probabilistic approach to reference resolution in multimodal user interfaces

Published: 13 January 2004

Abstract

Multimodal user interfaces allow users to interact with computers through multiple modalities, such as speech, gesture, and gaze. To be effective, a multimodal user interface must correctly identify all the objects that users refer to in their inputs. To resolve different types of references systematically, we have developed a probabilistic approach that uses a graph-matching algorithm. Our approach identifies the most probable referents by simultaneously optimizing the satisfaction of semantic, temporal, and contextual constraints. Our preliminary user study results indicate that our approach can successfully resolve a wide variety of referring expressions, ranging from simple to complex and from precise to ambiguous.



Published In

IUI '04: Proceedings of the 9th International Conference on Intelligent User Interfaces
January 2004, 396 pages
ISBN: 1581138156
DOI: 10.1145/964442

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. graph matching
  2. multimodal user interfaces
  3. reference resolution

Qualifiers

  • Article

Conference

IUI-CADUI04: Intelligent User Interfaces
January 13 - 16, 2004
Funchal, Madeira, Portugal

Acceptance Rates

IUI '04 Paper Acceptance Rate: 72 of 140 submissions (51%)
Overall Acceptance Rate: 746 of 2,811 submissions (27%)


