Abstract
This paper describes a language-independent method for aligning parallel texts that re-uses acquired knowledge. The system extracts word translation equivalents and re-uses them as correspondence points to improve the alignment of the texts. Points that may cause misalignment are filtered using confidence bands of linear regression analysis rather than heuristics, which are not theoretically reliable. Homographs bootstrap the alignment process and are used to build the primary word translation lexicon. At each step, the previously acquired lexicon is re-used to make progressively finer-grained alignments and to produce more reliable translation lexicons.
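The filtering step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each candidate correspondence point is an (x, y) pair of word positions in the two texts, fits an ordinary least-squares line, and keeps only the points that fall inside the standard confidence band of that line. The function name `filter_by_confidence_band` and the critical value `t_crit` are hypothetical choices made for this illustration; the abstract does not state which confidence level the paper uses.

```python
import math


def filter_by_confidence_band(points, t_crit=3.0):
    """Keep the points lying inside the confidence band of the fitted line.

    points: list of (x, y) candidate correspondence points (assumed here to
            be word positions in the source and target texts).
    t_crit: critical value that scales the band width (hypothetical default;
            in practice it would come from a t distribution with n - 2
            degrees of freedom at the chosen confidence level).
    """
    n = len(points)
    if n < 3:
        return list(points)  # too few points to estimate the residual error

    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return list(points)  # all x equal: no line can be fitted

    # Ordinary least-squares estimates of slope and intercept.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard error with n - 2 degrees of freedom.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
    s = math.sqrt(sse / (n - 2))

    kept = []
    for x, y in points:
        # Half-width of the confidence band for the regression line at x.
        half_width = t_crit * s * math.sqrt(1.0 / n + (x - x_bar) ** 2 / sxx)
        if abs(y - (intercept + slope * x)) <= half_width:
            kept.append((x, y))
    return kept


# Ten points on the diagonal plus one candidate far away from it.
candidates = [(x, x) for x in range(0, 100, 10)] + [(50, 500)]
reliable = filter_by_confidence_band(candidates)
```

With the default `t_crit`, the call above drops the (50, 500) candidate and keeps the ten diagonal points. In an iterative scheme like the one the abstract describes, the surviving points would delimit smaller text segments in which the next round of candidates is sought.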
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ribeiro, A., Lopes, G., Mexia, J. (2000). A Self-Learning Method of Parallel Texts Alignment. In: White, J.S. (ed.) Envisioning Machine Translation in the Information Future. AMTA 2000. Lecture Notes in Computer Science, vol 1934. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-39965-8_4
DOI: https://doi.org/10.1007/3-540-39965-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41117-8
Online ISBN: 978-3-540-39965-0
eBook Packages: Springer Book Archive