DOI: 10.3115/1218955.1218977

Collocation translation acquisition using monolingual corpora

Published: 21 July 2004

Abstract

Collocation translation is important for machine translation and many other NLP tasks. Unlike previous methods that rely on bilingual parallel corpora, this paper presents a new method for acquiring collocation translations from monolingual corpora and linguistic knowledge. First, dependency triples are extracted from Chinese and English corpora with dependency parsers. Then, a dependency triple translation model is estimated using the EM algorithm, based on a dependency correspondence assumption. The resulting triple translation model is used to extract collocation translations from the two monolingual corpora. Experiments show that our approach outperforms existing monolingual-corpus-based methods in dependency triple translation and achieves promising results in collocation translation extraction.
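
The pipeline outlined in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the toy triple counts, the bilingual lexicon, the function names (`score_candidates`, `train_em`, `translate`), and the simplification that a dependency relation label maps one-to-one across languages are all assumptions made for illustration. The sketch scores each candidate English triple by its English-side frequency times word translation probabilities, and re-estimates those probabilities with a simple EM loop over Chinese triple counts.

```python
from collections import Counter, defaultdict
from itertools import product


def score_candidates(zh_triple, en_triples, total_en, lexicon, p):
    """Score every English triple that could translate one Chinese triple.

    Assumption: the relation label carries over unchanged, so only the two
    words are translated. Score = p(english triple) * p(c1|e1) * p(c2|e2).
    """
    c1, rel, c2 = zh_triple
    scores = {}
    for e1, e2 in product(lexicon.get(c1, ()), lexicon.get(c2, ())):
        prior = en_triples.get((e1, rel, e2), 0) / total_en
        scores[(e1, rel, e2)] = prior * p[(c1, e1)] * p[(c2, e2)]
    return scores


def train_em(zh_triples, en_triples, lexicon, iterations=10):
    """Estimate word translation probabilities p(c|e) with a toy EM loop."""
    total_en = sum(en_triples.values())
    # Initialise p(c|e) uniformly over the Chinese words listed for each e.
    zh_for_en = defaultdict(set)
    for c, candidates in lexicon.items():
        for e in candidates:
            zh_for_en[e].add(c)
    p = {(c, e): 1.0 / len(cs) for e, cs in zh_for_en.items() for c in cs}

    for _ in range(iterations):
        # E-step: spread each Chinese triple's count over its candidates.
        expected = defaultdict(float)
        for zh_triple, n in zh_triples.items():
            scores = score_candidates(zh_triple, en_triples, total_en, lexicon, p)
            z = sum(scores.values())
            if z == 0:
                continue  # no candidate is attested in the English corpus
            c1, _, c2 = zh_triple
            for (e1, _, e2), s in scores.items():
                weight = n * s / z
                expected[(c1, e1)] += weight
                expected[(c2, e2)] += weight
        # M-step: renormalise expected counts per English word.
        totals = defaultdict(float)
        for (c, e), v in expected.items():
            totals[e] += v
        for (c, e) in p:
            if totals[e] > 0:
                p[(c, e)] = expected[(c, e)] / totals[e]
    return p


def translate(zh_triple, en_triples, lexicon, p):
    """Return the highest-scoring English triple, or None if none is attested."""
    total_en = sum(en_triples.values())
    scores = score_candidates(zh_triple, en_triples, total_en, lexicon, p)
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None


# Hypothetical toy data standing in for parsed corpora and a dictionary.
lexicon = {"看": ["see", "watch", "read"], "读": ["read"],
           "电视": ["tv"], "书": ["book"]}
en_triples = Counter({
    ("watch", "verb-object", "tv"): 8,
    ("see", "verb-object", "tv"): 1,
    ("read", "verb-object", "book"): 9,
    ("see", "verb-object", "book"): 1,
})
zh_triples = Counter({
    ("看", "verb-object", "电视"): 5,
    ("看", "verb-object", "书"): 5,
    ("读", "verb-object", "书"): 3,
})

model = train_em(zh_triples, en_triples, lexicon)
print(translate(("看", "verb-object", "电视"), en_triples, lexicon, model))
print(translate(("看", "verb-object", "书"), en_triples, lexicon, model))
```

In this toy setting the same Chinese verb is rendered as "watch" with "tv" and as "read" with "book", because the English triple counts disambiguate the candidates. That is the intuition behind translating a collocation as a unit rather than word by word.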

Cited By

  • (2006) An improved method for finding bilingual collocation correspondences from monolingual corpora. Proceedings of the 21st International Conference on Computer Processing of Oriental Languages: Beyond the Orient: The Research Challenges Ahead, pp. 51-62. DOI: 10.1007/11940098_6. Online publication date: 17-Dec-2006

Published In

ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
July 2004
729 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%


