DOI: 10.3115/1072228.1072248

Extracting word sequence correspondences with support vector machines

Published: 24 August 2002

Abstract

This paper proposes a method for learning and extracting word sequence correspondences from non-aligned parallel corpora with Support Vector Machines, which generalize well, rarely overfit the training samples, and can learn dependencies between features by using a kernel function. Our method uses translation-model features based on a translation dictionary, the number of words, parts of speech, constituent words, and neighboring words. In experiments on Japanese and English parallel corpora, the method achieved 81.1% precision and 69.0% recall on the extracted word sequence correspondences. This demonstrates that our method could reduce the cost of building translation dictionaries.
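
To illustrate how a kernel function lets an SVM account for dependencies between features, here is a minimal Python sketch of a polynomial kernel and an SVM-style decision function. It is not the paper's implementation: the three binary features, the support vectors, the weights `alphas`, the bias, and the kernel degree are all hypothetical values chosen for illustration, and the abstract does not state which kernel the authors used.

```python
from typing import Sequence


def poly_kernel(x: Sequence[float], y: Sequence[float], degree: int = 2) -> float:
    """Polynomial kernel K(x, y) = (x . y + 1) ** degree.

    Expanding the power yields products of feature pairs, so a degree-2
    kernel implicitly scores feature co-occurrences as well as single features.
    """
    dot = sum(a * b for a, b in zip(x, y))
    return (dot + 1.0) ** degree


def decision(x, support_vectors, alphas, labels, bias, degree=2):
    """SVM-style decision value: sum_i alpha_i * y_i * K(sv_i, x) + bias."""
    return sum(
        a * y * poly_kernel(sv, x, degree)
        for sv, a, y in zip(support_vectors, alphas, labels)
    ) + bias


# Hypothetical binary features for one candidate pair of word sequences:
# [dictionary_match, same_word_count, pos_match]
support_vectors = [[1, 1, 1], [0, 1, 0]]  # assumed, not taken from the paper
alphas = [0.5, 0.5]                       # assumed weights
labels = [+1, -1]                         # +1 = correspondence, -1 = not
bias = 0.0                                # assumed

candidate = [1, 0, 1]
score = decision(candidate, support_vectors, alphas, labels, bias)
is_correspondence = score > 0
```

In a trained SVM the support vectors, weights, and bias come from the optimization over labeled examples; here they are fixed by hand only to show how the kernel enters the decision.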

Cited By

  • (2011) Word AdHoc Network. Knowledge-Based Systems 24:3 (393-405). DOI: 10.1016/j.knosys.2010.11.006. Online publication date: 1-Apr-2011
  • (2006) Weighted kernel model for text categorization. Proceedings of the fifth Australasian conference on Data mining and analystics - Volume 61 (111-114). DOI: 10.5555/1273808.1273823. Online publication date: 1-Nov-2006
  • (2004) Example-Based machine translation without saying inferable predicate. Proceedings of the First international joint conference on Natural Language Processing (206-215). DOI: 10.1007/978-3-540-30211-7_22. Online publication date: 22-Mar-2004

      Published In

      COLING '02: Proceedings of the 19th international conference on Computational linguistics - Volume 1
      August 2002
      1184 pages

      Publisher

      Association for Computational Linguistics

      United States

      Qualifiers

      • Article

      Acceptance Rates

      Overall Acceptance Rate 1,537 of 1,537 submissions, 100%

      Article Metrics

      • Downloads (Last 12 months)30
      • Downloads (Last 6 weeks)3
      Reflects downloads up to 13 Dec 2024
