ΦDmDialog: A speech-to-speech dialogue translation system

Hiroaki Kitano^1,2

Abstract

The development of interpreting telephony, or speech-to-speech translation, is one of the ultimate goals of speech recognition, natural language processing, artificial intelligence and machine translation. This paper describes ΦDmDialog, a speech-to-speech dialogue translation system. ΦDmDialog is one of the first experimental systems to perform speech-to-speech translation and the first to demonstrate the possibility of simultaneous interpretation. The hybrid parallel system integrates parallel marker-passing and connectionist networks. Other characteristics of the system include a simultaneous interpretation capability, mixed-initiative discourse understanding, cost-based ambiguity resolution and an integration of case-based and constraint-based processing. ΦDmDialog is implemented and has been publicly demonstrated since March 1989. The current implementation translates Japanese into English and operates on ATR's conference registration domain. Massively parallel implementations have been carried out on various machines and have attained high performance.

Author information

Authors and Affiliations

Center for Machine Translation, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Hiroaki Kitano
NEC Corporation, Tokyo, Japan
Hiroaki Kitano

Additional information

Part of the work reported here (the massively parallel implementation) is supported by the National Science Foundation under grant MIP-90/09109. I want to thank Hideto Tomabechi, Teruko Mitamura, Lori Levin, Masaru Tomita, Jaime Carbonell, Alex Waibel, James McClelland and Hitoshi Iida for fruitful discussions and continuing support; also, Tetsuya Higuchi for the IXM2 implementation and Dan Moldovan and members of the SNAP project for the SNAP implementations. Several anonymous referees for this journal offered helpful suggestions. I would like to express special thanks to Mitsuko Saito, who trained me as a simultaneous interpreter. Without my intuitions as a simultaneous interpreter, this work would not have started.

About this article

Cite this article

Kitano, H. ΦDmDialog: A speech-to-speech dialogue translation system. Mach Translat 5, 301–338 (1990). https://doi.org/10.1007/BF00376645

Issue Date: December 1990
DOI: https://doi.org/10.1007/BF00376645