DOI: 10.3115/990820.990874

Finite-state multimodal parsing and understanding

Published: 31 July 2000

Abstract

Multimodal interfaces require effective parsing and understanding of utterances whose content is distributed across multiple input modes. Johnston (1998) presents an approach in which strategies for multimodal integration are stated declaratively using a unification-based grammar that is used by a multi-dimensional chart parser to compose inputs. This approach is highly expressive and supports a broad class of interfaces, but offers only limited potential for mutual compensation among the input modes, is subject to significant concerns in terms of computational complexity, and complicates selection among alternative multimodal interpretations of the input. In this paper, we present an alternative approach in which multimodal parsing and understanding are achieved using a weighted finite-state device which takes speech and gesture streams as inputs and outputs their joint interpretation. This approach is significantly more efficient, enables tight coupling of multimodal understanding with speech recognition, and provides a general probabilistic framework for multimodal ambiguity resolution.
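The abstract's core idea can be illustrated with a toy sketch: treat the speech and gesture recognizers as producing weighted hypothesis lists (lattices), and treat the multimodal grammar as a relation pairing speech/gesture combinations with a joint meaning, standing in for the paper's weighted finite-state device. Everything below (the hypothesis strings, the `GRAMMAR` table, the costs, and the `understand` helper) is invented for illustration and is not the authors' implementation.

```python
# Each input stream is a list of (hypothesis, cost) pairs, e.g. from a
# recognizer lattice; lower cost = more likely (tropical-semiring style).
speech_lattice = [("email this person", 1.0), ("email this place", 2.5)]
gesture_lattice = [("point:person(id=7)", 0.5), ("point:org(id=3)", 1.5)]

# A stand-in for the multimodal grammar: which speech/gesture pairs may
# combine, and the joint interpretation they yield (the "output tape").
GRAMMAR = {
    ("email this person", "point:person(id=7)"): "email(person(id=7))",
    ("email this place", "point:org(id=3)"): "email(org(id=3))",
}

def understand(speech, gesture):
    """Compose the two weighted streams against the grammar and return
    the lowest-cost joint interpretation as (meaning, cost), or None."""
    best = None
    for s, s_cost in speech:
        for g, g_cost in gesture:
            meaning = GRAMMAR.get((s, g))
            if meaning is not None:
                total = s_cost + g_cost  # tropical semiring: costs add
                if best is None or total < best[1]:
                    best = (meaning, total)
    return best

print(understand(speech_lattice, gesture_lattice))
# -> ('email(person(id=7))', 1.5)
```

The key property the paper exploits is visible even in this sketch: because both modes carry weights into a single joint search, a low-confidence speech hypothesis can be rescued (or rejected) by the gesture evidence, which is the mutual-compensation and ambiguity-resolution behavior described above. The real system expresses the grammar and streams as finite-state machines and uses transducer composition rather than nested loops.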

References

[1]
Steven Abney. 1991. Parsing by chunks. In Robert Berwick, Steven Abney, and Carol Tenny, editors, Principle-based parsing. Kluwer Academic Publishers.
[2]
Srinivas Bangalore and Giuseppe Riccardi. 2000. Stochastic finite-state models for spoken language machine translation. In Proceedings of the Workshop on Embedded Machine Translation Systems.
[3]
Srinivas Bangalore. 1997. Complexity of Lexical Descriptions and its Relevance to Partial Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA, August.
[4]
Robert A. Bolt. 1980. "Put-that-there": Voice and gesture at the graphics interface. Computer Graphics, 14(3):262--270.
[5]
Bruce Buntschuh, C. Kamm, G. DiFabbrizio, A. Abella, M. Mohri, S. Narayanan, I. Zeljkovic, R. D. Sharp, J. Wright, S. Marcus, J. Shaffer, R. Duncan, and J. G. Wilpon. 1998. VPQ: A spoken language interface to large scale directory information. In Proceedings of ICSLP, Sydney, Australia.
[6]
Robert Carpenter. 1992. The logic of typed feature structures. Cambridge University Press, England.
[7]
Philip R. Cohen, M. Johnston, D. McGee, S. L. Oviatt, J. Pittman, I. Smith, L. Chen, and J. Clow. 1998. Multimodal interaction for distributed interactive simulation. In M. Maybury and W. Wahlster, editors, Readings in Intelligent Interfaces. Morgan Kaufmann Publishers.
[8]
Mark Johnson. 1998. Finite-state approximation of constraint-based grammars using left-corner grammar transforms. In Proceedings of COLING-ACL, pages 619--623, Montreal, Canada.
[9]
Michael Johnston and Srinivas Bangalore. 2000. Tight-coupling of multimodal language processing with speech recognition. Technical report, AT&T Labs---Research.
[10]
Michael Johnston, P. R. Cohen, D. McGee, S. L. Oviatt, J. A. Pittman, and I. Smith. 1997. Unification-based multimodal integration. In Proceedings of the 35th ACL, pages 281--288, Madrid, Spain.
[11]
Michael Johnston. 1998a. Multimodal language processing. In Proceedings of ICSLP, Sydney, Australia.
[12]
Michael Johnston. 1998b. Unification-based multi-modal parsing. In Proceedings of COLING-ACL, pages 624--630, Montreal, Canada.
[13]
Aravind Joshi and Philip Hopely. 1997. A parser from antiquity. Natural Language Engineering, 2(4).
[14]
Ronald M. Kaplan and M. Kay. 1994. Regular models of phonological rule systems. Computational Linguistics, 20(3):331--378.
[15]
Kimmo Koskenniemi. 1984. Two-level morphology: A general computational model for word-form recognition and production. Ph.D. thesis, University of Helsinki.
[16]
Mehryar Mohri, Fernando C. N. Pereira, and Michael Riley. 1998. A rational design for a weighted finite-state transducer library. Number 1436 in Lecture notes in computer science. Springer, Berlin; New York.
[17]
Mehryar Mohri. 1997. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269--312.
[18]
J. G. Neal and S. C. Shapiro. 1991. Intelligent multimedia interface technology. In J. W. Sullivan and S. W. Tyler, editors, Intelligent User Interfaces, pages 45--68. ACM Press, Addison Wesley, New York.
[19]
Sharon L. Oviatt. 1997. Multimodal interactive maps: Designing for human performance. In Human-Computer Interaction, pages 93--129.
[20]
Sharon L. Oviatt. 1999. Mutual disambiguation of recognition errors in a multimodal architecture. In CHI '99, pages 576--583. ACM Press, New York.
[21]
Fernando C. N. Pereira and Michael D. Riley. 1997. Speech recognition by composition of weighted finite automata. In E. Roche and Y. Schabes, editors, Finite State Devices for Natural Language Processing, pages 431--456. MIT Press, Cambridge, Massachusetts.
[22]
Giuseppe Riccardi, R. Pieraccini, and E. Bocchieri. 1996. Stochastic Automata for Language Modeling. Computer Speech and Language, 10(4):265--293.
[23]
Emmanuel Roche. 1999. Finite state transducers: Parsing free and frozen sentences. In András Kornai, editor, Extended Finite State Models of Language. Cambridge University Press.
[24]
A. L. Rosenberg. 1964. On n-tape finite state acceptors. FOCS, pages 76--81.
[25]
Lizhong Wu, Sharon L. Oviatt, and Philip R. Cohen. 1999. Multimodal integration---a statistical view. IEEE Transactions on Multimedia, 1(4):334--341, December.


Published In

COLING '00: Proceedings of the 18th conference on Computational linguistics - Volume 1
July 2000
616 pages
ISBN: 155860717X

Sponsors

  • DFKI: DFKI GmbH
  • Ministère de la Recherche, France
  • Deutsche Forschungsgemeinschaft
  • Loria
  • Centre Universitaire de Luxembourg
  • Universität des Saarlandes
  • Université Nancy 2
  • Ministerium für Bildung, Kultur und Wissenschaft des Saarlandes

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall acceptance rate: 1,537 of 1,537 submissions (100%)


Cited By

  • (2017) Multimodal gesture recognition. The Handbook of Multimodal-Multisensor Interfaces, pages 449--487. DOI: 10.1145/3015783.3015796. Online publication date: 24-Apr-2017.
  • (2017) Multimodal speech and pen interfaces. The Handbook of Multimodal-Multisensor Interfaces, pages 403--447. DOI: 10.1145/3015783.3015795. Online publication date: 24-Apr-2017.
  • (2017) The Handbook of Multimodal-Multisensor Interfaces. Online publication date: 24-Apr-2017.
  • (2016) An IDE for multimodal controls in smart buildings. Proceedings of the 18th ACM International Conference on Multimodal Interaction, pages 61--65. DOI: 10.1145/2993148.2993162. Online publication date: 31-Oct-2016.
  • (2014) Latent Semantic Analysis for Multimodal User Input With Speech and Gestures. IEEE/ACM Transactions on Audio, Speech and Language Processing, 22(2):417--429. DOI: 10.1109/TASLP.2013.2294586. Online publication date: 1-Feb-2014.
  • (2013) Mutual disambiguation of eye gaze and speech for sight translation and reading. Proceedings of the 6th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Gaze in Multimodal Interaction, pages 35--40. DOI: 10.1145/2535948.2535953. Online publication date: 13-Dec-2013.
  • (2010) Usage patterns and latent semantic analyses for task goal inference of multimodal user interactions. Proceedings of the 15th International Conference on Intelligent User Interfaces, pages 129--138. DOI: 10.1145/1719970.1719989. Online publication date: 7-Feb-2010.
  • (2009) A multimodal pervasive framework for ambient assisted living. Proceedings of the 2nd International Conference on PErvasive Technologies Related to Assistive Environments, pages 1--8. DOI: 10.1145/1579114.1579153. Online publication date: 9-Jun-2009.
  • (2008) Gesture salience as a hidden variable for coreference resolution and keyframe extraction. Journal of Artificial Intelligence Research, 31(1):353--398. DOI: 10.5555/1622655.1622666. Online publication date: 1-Feb-2008.
  • (2008) Robust gesture processing for multimodal interaction. Proceedings of the 10th International Conference on Multimodal Interfaces, pages 225--232. DOI: 10.1145/1452392.1452439. Online publication date: 20-Oct-2008.
