Abstract
Developing interpreting telephony, or a speech-to-speech translation system, is one of the ultimate goals of speech recognition, natural language processing, artificial intelligence and machine translation. This paper describes ΦDmDialog, a speech-to-speech dialogue translation system. ΦDmDialog is one of the first experimental systems to perform speech-to-speech translation and the first to demonstrate the possibility of simultaneous interpretation. It is a hybrid parallel system that integrates parallel marker-passing and connectionist networks. Other characteristics of the system include a simultaneous interpretation capability, mixed-initiative discourse understanding, cost-based ambiguity resolution and an integration of case-based and constraint-based processing. ΦDmDialog has been implemented and has been publicly demonstrated since March 1989. The current implementation translates Japanese into English in ATR's conference registration domain. Massively parallel implementations have been carried out on several machines, including IXM2 and SNAP, and have attained high performance.
Additional information
Part of the work reported here (the massively parallel implementation) is supported by the National Science Foundation under grant MIP-90/09109. I want to thank Hideto Tomabechi, Teruko Mitamura, Lori Levin, Masaru Tomita, Jaime Carbonell, Alex Waibel, James McClelland and Hitoshi Iida for fruitful discussions and continuing support; Tetsuya Higuchi for the IXM2 implementation; and Dan Moldovan and the members of the SNAP project for the SNAP implementations. Several anonymous referees for this journal offered helpful suggestions. I would like to express special thanks to Mitsuko Saito, who trained me as a simultaneous interpreter. Without my intuitions as a simultaneous interpreter, this work would not have started.
About this article
Cite this article
Kitano, H. ΦDmDialog: A speech-to-speech dialogue translation system. Mach Translat 5, 301–338 (1990). https://doi.org/10.1007/BF00376645