
Speech repairs, intonational phrases, and discourse markers: modeling speakers' utterances in spoken dialogue

Published: 01 December 1999

Abstract

Interactive spoken dialogue provides many new challenges for natural language understanding systems. One of the most critical challenges is simply determining the speaker's intended utterances: both segmenting a speaker's turn into utterances and determining the intended words in each utterance. Even assuming perfect word recognition, the latter problem is complicated by the occurrence of speech repairs, which occur where speakers go back and change (or repeat) something they just said. The words that are replaced or repeated are no longer part of the intended utterance, and so need to be identified. Segmenting turns and resolving repairs are strongly intertwined with a third task: identifying discourse markers. Because of these interactions, and interactions with POS tagging and speech recognition, we need to address these tasks together and early on in the processing stream. This paper presents a statistical language model in which we redefine the speech recognition problem so that it includes the identification of POS tags, discourse markers, speech repairs, and intonational phrases. By solving these simultaneously, we obtain better results on each task than addressing them separately. Our model is able to identify 72% of turn-internal intonational boundaries with a precision of 71%, 97% of discourse markers with 96% precision, and detect and correct 66% of repairs with 74% precision.
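
The redefinition described in the abstract can be sketched roughly as follows. The notation is assumed here for illustration and is not taken from the paper: A is the acoustic signal, W the word sequence, P the POS tags, D the discourse marker labels, R the speech repair annotations, and I the intonational phrase boundaries.

```latex
% Conventional speech recognition: find the most probable word sequence.
\hat{W} = \arg\max_{W} \Pr(W \mid A)
        = \arg\max_{W} \Pr(A \mid W)\,\Pr(W)

% Redefined problem (illustrative sketch): recover the words jointly
% with the annotations, so that each task can inform the others.
(\hat{W}, \hat{P}, \hat{D}, \hat{R}, \hat{I})
  = \arg\max_{W,P,D,R,I} \Pr(W, P, D, R, I \mid A)
  = \arg\max_{W,P,D,R,I} \Pr(A \mid W, P, D, R, I)\,\Pr(W, P, D, R, I)
```

Under this reading, the language model would supply the joint prior over words and annotations rather than over words alone. How the paper actually factors and estimates that distribution is not stated in the abstract.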


Published In

Computational Linguistics, Volume 25, Issue 4
December 1999
180 pages
ISSN: 0891-2017
EISSN: 1530-9312

Publisher

MIT Press
Cambridge, MA, United States

Cited By

  • (2023) “Do This Instead”—Robots That Adequately Respond to Corrected Instructions. ACM Transactions on Human-Robot Interaction 13:3 (1-23). DOI: 10.1145/3623385. Online publication date: 22-Sep-2023
  • (2019) ScratchThat. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 3:2 (1-17). DOI: 10.1145/3328934. Online publication date: 21-Jun-2019
  • (2016) The Semantics of Corrections. Proceedings of the 23rd International Workshop on Logic, Language, Information, and Computation - Volume 9803 (358-374). DOI: 10.1007/978-3-662-52921-8_22. Online publication date: 16-Aug-2016
  • (2011) Detecting structural events for assessing non-native speech. Proceedings of the 6th Workshop on Innovative Use of NLP for Building Educational Applications (38-45). DOI: 10.5555/2043132.2043137. Online publication date: 24-Jun-2011
  • (2011) Interruption Point Detection of Spontaneous Speech Using Inter-Syllable Boundary-Based Prosodic Features. ACM Transactions on Asian Language Information Processing 10:1 (1-21). DOI: 10.1145/1929908.1929914. Online publication date: 1-Mar-2011
  • (2011) On the semantics and pragmatics of dysfluency. Proceedings of the 18th Amsterdam colloquim conference on Logic, Language and Meaning (321-330). DOI: 10.1007/978-3-642-31482-7_33. Online publication date: 19-Dec-2011
  • (2010) Autism and interactional aspects of dialogue. Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue (249-252). DOI: 10.5555/1944506.1944551. Online publication date: 24-Sep-2010
  • (2010) Cross-domain speech disfluency detection. Proceedings of the 11th Annual Meeting of the Special Interest Group on Discourse and Dialogue (237-240). DOI: 10.5555/1944506.1944548. Online publication date: 24-Sep-2010
  • (2009) Word buffering models for improved speech repair parsing. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2 (737-745). DOI: 10.5555/1699571.1699609. Online publication date: 6-Aug-2009
  • (2009) Using integer linear programming for detecting speech disfluencies. Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers (109-112). DOI: 10.5555/1620853.1620885. Online publication date: 31-May-2009
