Abstract
Generalizing sentence-pairs in Example-based Machine Translation (EBMT) has previously been shown to increase coverage and translation quality. These template-based approaches (G-EBMT) find common patterns in the bilingual corpus and use them to generate generalized templates. Earlier work found such patterns using only a subset of the following approaches: finding similar or dissimilar portions of text in groups of sentence-pairs, finding semantically similar words, or using dictionaries and parsers to find syntactic correspondences. This paper combines all three to generate templates. First, the boundaries for aligning and extracting the members (phrase-pairs) to be clustered are found using chunkers (hence, syntactic information) trained independently on each of the two languages under consideration. Next, semantically related phrase-pairs are grouped based on the contexts in which they appear. Templates are then constructed by replacing these clustered phrase-pairs with their class labels. We also perform a filtration step that simulates human labelers, keeping only those phrase-pairs whose source and target phrases correspond closely. Templates built for the English-Chinese and English-French language pairs gave significant improvements over a baseline with no templates.
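The clustering and template-construction steps described above can be illustrated with a small Python sketch. It assumes phrase-pairs have already been extracted (the chunker-based alignment step is not shown) and represents each phrase-pair by counts of the words immediately to the left and right of its source phrase. Pairs are grouped with a simple greedy cosine-similarity rule, and sentence-pairs are rewritten by replacing clustered phrase-pairs with class labels such as `<CL0>`. The greedy clustering rule, the one-word context window, the `threshold` value, the toy corpus and all function names are illustrative assumptions, not the paper's method; the filtration step is omitted.

```python
from collections import Counter
from math import sqrt


def context_vector(pair, corpus, window=1):
    """Count words within `window` tokens left/right of the pair's source phrase."""
    phrase = pair[0].split()
    n = len(phrase)
    counts = Counter()
    for src, _tgt in corpus:
        toks = src.split()
        for i in range(len(toks) - n + 1):
            if toks[i:i + n] == phrase:
                counts.update("L:" + w for w in toks[max(0, i - window):i])
                counts.update("R:" + w for w in toks[i + n:i + n + window])
    return counts


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counter returns 0 for missing keys)."""
    dot = sum(v * b[k] for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster(pairs, corpus, threshold=0.5):
    """Hypothetical greedy clustering: each pair joins the first cluster whose seed
    pair has a similar enough context vector; otherwise it starts a new cluster."""
    vecs = {p: context_vector(p, corpus) for p in pairs}
    clusters = []
    for p in pairs:
        for c in clusters:
            if cosine(vecs[p], vecs[c[0]]) >= threshold:
                c.append(p)
                break
        else:
            clusters.append([p])
    return clusters


def replace_tokens(sentence, phrase, label):
    """Replace whole-token occurrences of `phrase` in `sentence` with `label`."""
    toks, ph = sentence.split(), phrase.split()
    out, i = [], 0
    while i < len(toks):
        if toks[i:i + len(ph)] == ph:
            out.append(label)
            i += len(ph)
        else:
            out.append(toks[i])
            i += 1
    return " ".join(out)


def make_templates(corpus, clusters):
    """Replace members of multi-member clusters with their class label on both sides."""
    labels = {p: f"<CL{i}>" for i, c in enumerate(clusters) if len(c) > 1 for p in c}
    templates = set()
    for src, tgt in corpus:
        new_src, new_tgt = src, tgt
        for (ps, pt), label in labels.items():
            s2 = replace_tokens(new_src, ps, label)
            t2 = replace_tokens(new_tgt, pt, label)
            # Generalize only when both the source and the target phrase are present.
            if s2 != new_src and t2 != new_tgt:
                new_src, new_tgt = s2, t2
        if new_src != src:
            templates.add((new_src, new_tgt))
    return templates


# Toy English-French corpus and pre-extracted phrase-pairs (illustrative data only).
corpus = [
    ("i live in paris", "j'habite à paris"),
    ("i live in london", "j'habite à londres"),
    ("she works in paris", "elle travaille à paris"),
    ("she works in london", "elle travaille à londres"),
    ("i like red", "j'aime le rouge"),
]
pairs = [("paris", "paris"), ("london", "londres"), ("red", "rouge")]

clusters = cluster(pairs, corpus)
for template in sorted(make_templates(corpus, clusters)):
    print(template)
```

On this toy corpus, the four sentence-pairs about Paris and London collapse into two templates, while "i like red" stays ungeneralized because "red" shares no context words with the city names.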
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gangadharaiah, R., Brown, R.D., Carbonell, J. (2011). Phrasal Equivalence Classes for Generalized Corpus-Based Machine Translation. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2011. Lecture Notes in Computer Science, vol 6609. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-19437-5_2
DOI: https://doi.org/10.1007/978-3-642-19437-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-19436-8
Online ISBN: 978-3-642-19437-5
eBook Packages: Computer Science, Computer Science (R0)