Abstract
This paper presents preliminary work on aligning Turkish and English parallel texts, as a step towards developing a statistical machine translation system for the two languages. To avoid the data sparseness problem and to uncover relations between sublexical components of words such as morphemes, we converted our parallel texts to a morphemic representation and then applied standard word alignment algorithms. Results from only 3K sentences of parallel English–Turkish text show that Turkish morphemes can be linked with English morphemes and function words quite successfully. We also used the Turkish WordNet, which is linked with the English WordNet, as a bootstrapping dictionary to constrain root word alignments.
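As a rough illustration of the idea of running a standard word alignment algorithm over a morphemic representation, the sketch below trains IBM Model 1 with EM on a tiny hand-made corpus. Everything in it is an assumption made for illustration: the toy sentences, the morpheme tags (`+PL`, `+LOC`), the function names `train_model1` and `best_link`, and the choice of Model 1 itself. It is not the segmentation, toolkit, or model configuration used in the paper, and it omits the NULL word and the WordNet-based constraints.

```python
from collections import defaultdict

# Hypothetical toy corpus: both sides are already split into morpheme-like
# tokens. The segmentations and tags are illustrative assumptions only.
corpus = [
    ("house +PL".split(), "ev +PL".split()),
    ("in house".split(), "ev +LOC".split()),
    ("in book +PL".split(), "kitap +PL +LOC".split()),
    ("book".split(), "kitap".split()),
]


def train_model1(pairs, iterations=10):
    """Estimate t(f | e) with IBM Model 1 EM (no NULL word, for brevity)."""
    tgt_vocab = {f for _, fs in pairs for f in fs}
    # Uniform initial translation probabilities.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) links
        total = defaultdict(float)  # expected count of e being linked
        for es, fs in pairs:
            for f in fs:
                # E-step: spread one unit of mass for f over the tokens in es.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize expected counts per source token e.
        t = defaultdict(
            float, {(f, e): c / total[e] for (f, e), c in count.items()}
        )
    return t


def best_link(t, f, es):
    """Return the token in es with the highest t(f | e)."""
    return max(es, key=lambda e: t[(f, e)])


t = train_model1(corpus)
for es, fs in corpus:
    for f in fs:
        print(f, "->", best_link(t, f, es))
```

Because suffix-like tokens such as `+PL` recur across many different roots, they accumulate co-occurrence evidence much faster than full inflected word forms would, which is the intuition behind aligning at the morpheme level when parallel data is scarce.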
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
El-Kahlout, İ.D., Oflazer, K. (2005). Aligning Turkish and English Parallel Texts for Statistical Machine Translation. In: Yolum, P., Güngör, T., Gürgen, F., Özturan, C. (eds) Computer and Information Sciences - ISCIS 2005. ISCIS 2005. Lecture Notes in Computer Science, vol 3733. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11569596_64
DOI: https://doi.org/10.1007/11569596_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29414-6
Online ISBN: 978-3-540-32085-2
eBook Packages: Computer Science (R0)