DOI: 10.3115/990820.990874

Finite-state multimodal parsing and understanding

Published: 31 July 2000

Abstract

Multimodal interfaces require effective parsing and understanding of utterances whose content is distributed across multiple input modes. Johnston (1998) presents an approach in which strategies for multimodal integration are stated declaratively using a unification-based grammar that is used by a multi-dimensional chart parser to compose inputs. This approach is highly expressive and supports a broad class of interfaces, but offers only limited potential for mutual compensation among the input modes, is subject to significant concerns in terms of computational complexity, and complicates selection among alternative multimodal interpretations of the input. In this paper, we present an alternative approach in which multimodal parsing and understanding are achieved using a weighted finite-state device which takes speech and gesture streams as inputs and outputs their joint interpretation. This approach is significantly more efficient, enables tight coupling of multimodal understanding with speech recognition, and provides a general probabilistic framework for multimodal ambiguity resolution.
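The abstract's core idea can be illustrated with a toy sketch: treat the speech and gesture recognizers as producing weighted hypothesis lists (lattices), and treat the multimodal grammar as a relation pairing speech/gesture combinations with a joint meaning, standing in for the paper's weighted finite-state device. Everything below (the hypothesis strings, the `GRAMMAR` table, the costs, and the `understand` helper) is invented for illustration and is not the authors' implementation.

```python
# Each input stream is a list of (hypothesis, cost) pairs, e.g. from a
# recognizer lattice; lower cost = more likely (tropical-semiring style).
speech_lattice = [("email this person", 1.0), ("email this place", 2.5)]
gesture_lattice = [("point:person(id=7)", 0.5), ("point:org(id=3)", 1.5)]

# A stand-in for the multimodal grammar: which speech/gesture pairs may
# combine, and the joint interpretation they yield (the "output tape").
GRAMMAR = {
    ("email this person", "point:person(id=7)"): "email(person(id=7))",
    ("email this place", "point:org(id=3)"): "email(org(id=3))",
}

def understand(speech, gesture):
    """Compose the two weighted streams against the grammar and return
    the lowest-cost joint interpretation as (meaning, cost), or None."""
    best = None
    for s, s_cost in speech:
        for g, g_cost in gesture:
            meaning = GRAMMAR.get((s, g))
            if meaning is not None:
                total = s_cost + g_cost  # tropical semiring: costs add
                if best is None or total < best[1]:
                    best = (meaning, total)
    return best

print(understand(speech_lattice, gesture_lattice))
# -> ('email(person(id=7))', 1.5)
```

The key property the paper exploits is visible even in this sketch: because both modes carry weights into a single joint search, a low-confidence speech hypothesis can be rescued (or rejected) by the gesture evidence, which is the mutual-compensation and ambiguity-resolution behavior described above. The real system expresses the grammar and streams as finite-state machines and uses transducer composition rather than nested loops.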

References

[1]
Steven Abney. 1991. Parsing by chunks. In Robert Berwick, Steven Abney, and Carol Tenny, editors, Principle-based parsing. Kluwer Academic Publishers.
[2]
Srinivas Bangalore and Giuseppe Riccardi. 2000. Stochastic finite-state models for spoken language machine translation. In Proceedings of the Workshop on Embedded Machine Translation Systems.
[3]
Srinivas Bangalore. 1997. Complexity of Lexical Descriptions and its Relevance to Partial Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA, August.
[4]
Robert A. Bolt. 1980. "Put-that-there": Voice and gesture at the graphics interface. Computer Graphics, 14(3):262--270.
[5]
Bruce Buntschuh, C. Kamm, G. DiFabbrizio, A. Abella, M. Mohri, S. Narayanan, I. Zeljkovic, R. D. Sharp, J. Wright, S. Marcus, J. Shaffer, R. Duncan, and J. G. Wilpon. 1998. VPQ: A spoken language interface to large scale directory information. In Proceedings of ICSLP, Sydney, Australia.
[6]
Robert Carpenter. 1992. The logic of typed feature structures. Cambridge University Press, England.
[7]
Philip R. Cohen, M. Johnston, D. McGee, S. L. Oviatt, J. Pittman, I. Smith, L. Chen, and J. Clow. 1998. Multimodal interaction for distributed interactive simulation. In M. Maybury and W. Wahlster, editors, Readings in Intelligent Interfaces. Morgan Kaufmann Publishers.
[8]
Mark Johnson. 1998. Finite-state approximation of constraint-based grammars using left-corner grammar transforms. In Proceedings of COLING-ACL, pages 619--623, Montreal, Canada.
[9]
Michael Johnston and Srinivas Bangalore. 2000. Tight-coupling of multimodal language processing with speech recognition. Technical report, AT&T Labs---Research.
[10]
Michael Johnston, P. R. Cohen, D. McGee, S. L. Oviatt, J. A. Pittman, and I. Smith. 1997. Unification-based multimodal integration. In Proceedings of the 35th ACL, pages 281--288, Madrid, Spain.
[11]
Michael Johnston. 1998a. Multimodal language processing. In Proceedings of ICSLP, Sydney, Australia.
[12]
Michael Johnston. 1998b. Unification-based multi-modal parsing. In Proceedings of COLING-ACL, pages 624--630, Montreal, Canada.
[13]
Aravind Joshi and Philip Hopely. 1997. A parser from antiquity. Natural Language Engineering, 2(4).
[14]
Ronald M. Kaplan and M. Kay. 1994. Regular models of phonological rule systems. Computational Linguistics, 20(3):331--378.
[15]
Kimmo Koskenniemi. 1984. Two-level morphology: A general computational model for word-form recognition and production. Ph.D. thesis, University of Helsinki.
[16]
Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. 1998. A rational design for a weighted finite-state transducer library. Number 1436 in Lecture notes in computer science. Springer, Berlin; New York.
[17]
Mehryar Mohri. 1997. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269--312.
[18]
J. G. Neal and S. C. Shapiro. 1991. Intelligent multimedia interface technology. In J. W. Sullivan and S. W. Tyler, editors, Intelligent User Interfaces, pages 45--68. ACM Press, Addison Wesley, New York.
[19]
Sharon L. Oviatt. 1997. Multimodal interactive maps: Designing for human performance. In Human-Computer Interaction, pages 93--129.
[20]
Sharon L. Oviatt. 1999. Mutual disambiguation of recognition errors in a multimodal architecture. In CHI '99, pages 576--583. ACM Press, New York.
[21]
Fernando C. N. Pereira and Michael D. Riley. 1997. Speech recognition by composition of weighted finite automata. In E. Roche and Y. Schabes, editors, Finite State Devices for Natural Language Processing, pages 431--456. MIT Press, Cambridge, Massachusetts.
[22]
Giuseppe Riccardi, R. Pieraccini, and E. Bocchieri. 1996. Stochastic Automata for Language Modeling. Computer Speech and Language, 10(4):265--293.
[23]
Emmanuel Roche. 1999. Finite state transducers: Parsing free and frozen sentences. In András Kornai, editor, Extended Finite State Models of Language. Cambridge University Press.
[24]
A. L. Rosenberg. 1964. On n-tape finite state acceptors. FOCS, pages 76--81.
[25]
Lizhong Wu, Sharon L. Oviatt, and Philip R. Cohen. 1999. Multimodal integration---a statistical view. IEEE Transactions on Multimedia, 1(4):334--341, December.


Published In

COLING '00: Proceedings of the 18th conference on Computational linguistics - Volume 1
July 2000
616 pages
ISBN: 155860717X

Sponsors

  • DFKI: DFKI GmbH
  • Ministère de la Recherche, France
  • Deutsche Forschungsgemeinschaft
  • Loria
  • Centre Universitaire de Luxembourg
  • Universität des Saarlandes
  • Université Nancy 2
  • Ministerium für Bildung, Kultur und Wissenschaft des Saarlandes

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall acceptance rate: 1,537 of 1,537 submissions (100%)


Cited By

  • (2017) Multimodal gesture recognition. The Handbook of Multimodal-Multisensor Interfaces, pages 449--487. DOI: 10.1145/3015783.3015796. Online publication date: 24-Apr-2017.
  • (2017) Multimodal speech and pen interfaces. The Handbook of Multimodal-Multisensor Interfaces, pages 403--447. DOI: 10.1145/3015783.3015795. Online publication date: 24-Apr-2017.
  • (2017) The Handbook of Multimodal-Multisensor Interfaces. Online publication date: 24-Apr-2017.
  • (2016) An IDE for multimodal controls in smart buildings. Proceedings of the 18th ACM International Conference on Multimodal Interaction, pages 61--65. DOI: 10.1145/2993148.2993162. Online publication date: 31-Oct-2016.
  • (2014) Latent Semantic Analysis for Multimodal User Input With Speech and Gestures. IEEE/ACM Transactions on Audio, Speech and Language Processing, 22(2):417--429. DOI: 10.1109/TASLP.2013.2294586. Online publication date: 1-Feb-2014.
  • (2013) Mutual disambiguation of eye gaze and speech for sight translation and reading. Proceedings of the 6th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Gaze in Multimodal Interaction, pages 35--40. DOI: 10.1145/2535948.2535953. Online publication date: 13-Dec-2013.
  • (2010) Usage patterns and latent semantic analyses for task goal inference of multimodal user interactions. Proceedings of the 15th International Conference on Intelligent User Interfaces, pages 129--138. DOI: 10.1145/1719970.1719989. Online publication date: 7-Feb-2010.
  • (2009) A multimodal pervasive framework for ambient assisted living. Proceedings of the 2nd International Conference on PErvasive Technologies Related to Assistive Environments, pages 1--8. DOI: 10.1145/1579114.1579153. Online publication date: 9-Jun-2009.
  • (2008) Gesture salience as a hidden variable for coreference resolution and keyframe extraction. Journal of Artificial Intelligence Research, 31(1):353--398. DOI: 10.5555/1622655.1622666. Online publication date: 1-Feb-2008.
  • (2008) Robust gesture processing for multimodal interaction. Proceedings of the 10th International Conference on Multimodal Interfaces, pages 225--232. DOI: 10.1145/1452392.1452439. Online publication date: 20-Oct-2008.
