DOI: 10.3115/1220175.1220278

Weakly supervised named entity transliteration and discovery from multilingual comparable corpora

Published: 17 July 2006

Abstract

Named entity recognition (NER) is an important part of many natural language processing tasks. Current approaches often employ machine learning techniques and require supervised data. However, many languages lack such resources. This paper presents an (almost) unsupervised learning algorithm for automatic discovery of named entities (NEs) in a resource-free language, given a bilingual corpus in which it is weakly temporally aligned with a resource-rich language. NEs have similar time distributions across such corpora, and often some of the tokens in a multi-word NE are transliterated. We develop an algorithm that exploits both observations iteratively. The algorithm makes use of a new, frequency-based metric for time distributions and a resource-free discriminative approach to transliteration. Seeded with a small number of transliteration pairs, our algorithm discovers multi-word NEs and takes advantage of a dictionary (if one exists) to account for translated or partially translated NEs. We evaluate the algorithm on an English-Russian corpus and show a high level of NE discovery in Russian.
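
The abstract does not spell out the frequency-based metric, so the following is only a minimal sketch of the general idea of comparing time distributions in the frequency domain. It assumes each word is described by its occurrence counts per time bin, normalizes those counts, takes a discrete Fourier transform, and ranks candidates by Euclidean distance between the coefficients. The function names, the `keep` parameter, and the sample counts are all hypothetical and are not taken from the paper.

```python
import cmath
import math


def normalize(counts):
    """Scale raw per-bin counts so they sum to 1 (all zeros stay zeros)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def dft(signal):
    """Plain O(n^2) discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]


def frequency_distance(counts_a, counts_b, keep=None):
    """Euclidean distance between the first `keep` DFT coefficients.

    `keep` is an illustrative knob (an assumption, not from the paper):
    None compares all coefficients.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("time sequences must cover the same bins")
    fa = dft(normalize(counts_a))
    fb = dft(normalize(counts_b))
    k = len(fa) if keep is None else keep
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fa[:k], fb[:k])))


def rank_candidates(source_counts, candidates, keep=None):
    """Return candidate words sorted from most to least similar in time."""
    return sorted(
        candidates,
        key=lambda word: frequency_distance(source_counts, candidates[word], keep),
    )


# Hypothetical counts over eight time bins for one English NE and two
# candidate words from the other language.
english_ne = [0, 5, 9, 1, 0, 0, 4, 0]
candidates = {
    "candidate_bursty": [0, 4, 10, 1, 0, 0, 3, 0],
    "candidate_flat": [3, 3, 3, 3, 3, 3, 3, 3],
}
ranking = rank_candidates(english_ne, candidates)
```

In this toy setup the candidate whose mentions peak in the same bins as the English NE ranks ahead of the uniformly distributed one. A real system would combine such a score with the transliteration model rather than rely on it alone.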

Published In

ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics
July 2006
1214 pages

Publisher

Association for Computational Linguistics, United States

Acceptance Rates

Overall acceptance rate: 85 of 443 submissions, 19%

Cited By

  • (2019) Low-Resource Machine Transliteration Using Recurrent Neural Networks. ACM Transactions on Asian and Low-Resource Language Information Processing, 18:2 (1-14). DOI: 10.1145/3265752. Online publication date: 16-Jan-2019
  • (2017) Translation Quality Estimation Using Only Bilingual Corpora. IEEE/ACM Transactions on Audio, Speech and Language Processing, 25:9 (1762-1772). DOI: 10.1109/TASLP.2017.2716195. Online publication date: 1-Sep-2017
  • (2015) Annotating Needles in the Haystack without Looking. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2257-2266). DOI: 10.1145/2783258.2788580. Online publication date: 10-Aug-2015
  • (2012) Report of NEWS 2012 machine transliteration shared task. Proceedings of the 4th Named Entity Workshop (10-20). DOI: 10.5555/2392777.2392779. Online publication date: 12-Jul-2012
  • (2012) Name phylogeny. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (344-355). DOI: 10.5555/2390948.2390991. Online publication date: 12-Jul-2012
  • (2012) Regularized interlingual projections. Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (12-23). DOI: 10.5555/2390948.2390951. Online publication date: 12-Jul-2012
  • (2012) Toward statistical machine translation without parallel corpora. Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (130-140). DOI: 10.5555/2380816.2380835. Online publication date: 23-Apr-2012
  • (2011) Improving bilingual projections via sparse covariance matrices. Proceedings of the Conference on Empirical Methods in Natural Language Processing (930-940). DOI: 10.5555/2145432.2145534. Online publication date: 27-Jul-2011
  • (2011) From bilingual dictionaries to interlingual document representations. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2 (147-152). DOI: 10.5555/2002736.2002768. Online publication date: 19-Jun-2011
  • (2011) An algorithm for unsupervised transliteration mining with an application to word alignment. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (430-439). DOI: 10.5555/2002472.2002527. Online publication date: 19-Jun-2011
