Text-translation alignment

Published: 01 March 1993

Abstract

We present an algorithm for aligning texts with their translations that is based only on internal evidence. The relaxation process rests on a notion of which word in one text corresponds to which word in the other text that is essentially based on the similarity of their distributions. It exploits a partial alignment of the word level to induce a maximum likelihood alignment of the sentence level, which is in turn used, in the next iteration, to refine the word level estimate. The algorithm appears to converge to the correct sentence alignment in only a few iterations.
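
To make the word-level half of this loop concrete, here is a minimal sketch of scoring word pairs by the similarity of their distributions under a tentative sentence alignment. It is an illustration only, not the authors' implementation: the abstract does not name a similarity measure, so the Dice-style ratio used here is an assumption, and the function name `word_similarities`, the one-to-one `tentative_pairs` input, and the toy sentences are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def word_similarities(src_sents, tgt_sents, tentative_pairs):
    """Score (source word, target word) pairs by distributional overlap.

    src_sents, tgt_sents: lists of tokenized sentences (lists of strings).
    tentative_pairs: one-to-one (i, j) sentence index pairs currently
        believed to correspond (an assumed input for this sketch).
    Returns {(src_word, tgt_word): score}, where the score is the assumed
    Dice-style ratio 2 * cooccurrences / (src_count + tgt_count).
    """
    # Number of sentences each word type occurs in, per text.
    src_count = defaultdict(int)
    tgt_count = defaultdict(int)
    for sent in src_sents:
        for word in set(sent):
            src_count[word] += 1
    for sent in tgt_sents:
        for word in set(sent):
            tgt_count[word] += 1

    # Number of tentatively paired sentences in which both words occur.
    cooc = defaultdict(int)
    for i, j in tentative_pairs:
        for v, w in product(set(src_sents[i]), set(tgt_sents[j])):
            cooc[(v, w)] += 1

    return {
        (v, w): 2 * c / (src_count[v] + tgt_count[w])
        for (v, w), c in cooc.items()
    }


# Toy data (hypothetical): three sentence pairs assumed to correspond.
src = [["the", "house"], ["the", "cat"], ["a", "dog"]]
tgt = [["das", "haus"], ["die", "katze"], ["ein", "hund"]]
sims = word_similarities(src, tgt, [(0, 0), (1, 1), (2, 2)])
print(sims[("house", "haus")])  # 1.0
```

In a full relaxation loop, high-scoring word pairs like these would be used to re-estimate the sentence alignment, and the refined sentence pairs would be fed back in as `tentative_pairs` on the next iteration. That sentence-level step is not shown here.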

Published In

Computational Linguistics, Volume 19, Issue 1
Special issue on using large corpora: I
March 1993
216 pages
ISSN: 0891-2017
EISSN: 1530-9312

Publisher

MIT Press

Cambridge, MA, United States
