DOI: 10.1007/978-3-642-54903-8_26
Article

Improving Bilingual Lexicon Extraction from Comparable Corpora Using Window-Based and Syntax-Based Models

Published: 06 April 2014

Abstract

This paper proposes two strategies for combining a window-based and a syntax-based context representation for the task of bilingual lexicon extraction from comparable corpora. The first strategy involves combining the scores assigned to translations by both models and using them for ranking and selection; the second strategy involves a combination of the context features provided by the two models prior to applying the lexicon extraction method. The reported results show that the combination of the two context representations significantly improves the performance of bilingual lexicon extraction compared to using each of the representations individually.
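
The two strategies can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: cosine similarity as the scoring function, a weighted sum for the score combination, and namespaced feature keys for the feature combination. All function names and the toy vectors are hypothetical, and the source vectors are assumed to have already been mapped into the target-language feature space.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank(source_vec, candidate_vecs):
    """Score every candidate translation against the source vector."""
    return {c: cosine(source_vec, vec) for c, vec in candidate_vecs.items()}

def combine_scores(window_scores, syntax_scores, weight=0.5):
    """Strategy 1 (assumed form): weighted sum of the two models' scores."""
    candidates = set(window_scores) | set(syntax_scores)
    return {
        c: weight * window_scores.get(c, 0.0)
        + (1.0 - weight) * syntax_scores.get(c, 0.0)
        for c in candidates
    }

def combine_features(window_vec, syntax_vec):
    """Strategy 2 (assumed form): merge both feature sets into one vector.

    Features are namespaced so a window feature and a syntactic feature
    with the same surface form never collide.
    """
    merged = {("win", f): w for f, w in window_vec.items()}
    merged.update({("syn", f): w for f, w in syntax_vec.items()})
    return merged

# Toy data (hypothetical): one source word and two candidate translations.
src_window = {"money": 1.0, "account": 1.0}
src_syntax = {"dobj:open": 1.0}
cand_window = {"banque": {"money": 1.0, "account": 1.0}, "rive": {"money": 1.0}}
cand_syntax = {"banque": {"dobj:open": 1.0}, "rive": {"subj:flow": 1.0}}

# Strategy 1: rank with each model separately, then combine the scores.
late = combine_scores(rank(src_window, cand_window), rank(src_syntax, cand_syntax))

# Strategy 2: combine the features first, then rank once.
early = rank(
    combine_features(src_window, src_syntax),
    {c: combine_features(cand_window[c], cand_syntax[c]) for c in cand_window},
)

best_late = max(late, key=late.get)
best_early = max(early, key=early.get)
```

In this toy setting both strategies rank the same candidate first. The `weight` parameter is a hypothetical knob for balancing the two models in the score combination; the feature combination has no such parameter because the merged vector is scored in a single pass.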


Cited By

  • (2020) Contextualized Translations of Phrasal Verbs with Distributional Compositional Semantics and Monolingual Corpora. Computational Linguistics 45:3 (395-421). DOI: 10.1162/coli_a_00353. Online publication date: 9-Jun-2020
  • (2018) Advanced Text Mining Methods for Bilingual Lexicon Extraction from Specialized Comparable Corpora. Computational Linguistics and Intelligent Text Processing (400-411). DOI: 10.1007/978-3-031-23804-8_31. Online publication date: 18-Mar-2018

Published In

CICLing 2014: Proceedings of the 15th International Conference on Computational Linguistics and Intelligent Text Processing - Volume 8404
April 2014
577 pages
ISBN: 9783642549021
Editor: Alexander Gelbukh

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Bilingual lexicon extraction
  2. Comparable corpora
  3. Context representation
  4. Dependency relations

Qualifiers

  • Article
