Weakly supervised named entity transliteration and discovery from multilingual comparable corpora

Authors:

Alexandre Klementiev,

Dan Roth

ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

Pages 817 - 824

https://doi.org/10.3115/1220175.1220278

Published: 17 July 2006

Abstract

Named Entity recognition (NER) is an important part of many natural language processing tasks. Current approaches often employ machine learning techniques and require supervised data. However, many languages lack such resources. This paper presents an (almost) unsupervised learning algorithm for automatic discovery of Named Entities (NEs) in a resource-free language, given a bilingual corpus in which it is weakly temporally aligned with a resource-rich language. NEs have similar time distributions across such corpora, and often some of the tokens in a multi-word NE are transliterated. We develop an algorithm that exploits both observations iteratively. The algorithm makes use of a new, frequency-based metric for time distributions and a resource-free discriminative approach to transliteration. Seeded with a small number of transliteration pairs, our algorithm discovers multi-word NEs and takes advantage of a dictionary (if one exists) to account for translated or partially translated NEs. We evaluate the algorithm on an English-Russian corpus and show a high level of NE discovery in Russian.
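
The abstract does not spell out the frequency-based metric, so the following is only a minimal sketch of the general idea of comparing the time distributions of two words in the frequency domain. It assumes that each word is represented by its mention counts per time period, and that similarity is measured as the Euclidean distance between the first few discrete Fourier coefficients of the normalized counts. The function names (`normalize`, `dft`, `frequency_distance`, `rank_candidates`), the parameter `num_coeffs`, and the sample counts are all hypothetical and are not taken from the paper.

```python
import cmath
import math


def normalize(counts):
    """Scale per-period mention counts so that they sum to 1."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def dft(signal, num_coeffs):
    """Return the first `num_coeffs` discrete Fourier coefficients of `signal`."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(num_coeffs)
    ]


def frequency_distance(counts_a, counts_b, num_coeffs=4):
    """Distance between the low-frequency spectra of two time distributions.

    Assumption: Euclidean distance over the first `num_coeffs` coefficients;
    a smaller value means the two words rise and fall together over time.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("time distributions must cover the same periods")
    fa = dft(normalize(counts_a), num_coeffs)
    fb = dft(normalize(counts_b), num_coeffs)
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(fa, fb)))


def rank_candidates(source_counts, candidates, num_coeffs=4):
    """Order candidate words from most to least temporally similar."""
    return sorted(
        candidates,
        key=lambda word: frequency_distance(source_counts, candidates[word], num_coeffs),
    )


if __name__ == "__main__":
    # Hypothetical weekly counts for one English NE and three Russian candidates.
    english_ne = [0, 5, 9, 2, 0, 0, 1, 0]
    russian_candidates = {
        "candidate_1": [0, 10, 18, 4, 0, 0, 2, 0],  # same shape, larger corpus
        "candidate_2": [3, 3, 3, 3, 3, 3, 3, 3],    # flat, like a common word
        "candidate_3": [0, 0, 0, 0, 1, 6, 8, 1],    # peaks at a different time
    }
    print(rank_candidates(english_ne, russian_candidates))
```

In a pipeline like the one the abstract describes, a temporal score of this kind would be only one of two signals: candidates that score well here would still need to be checked by the transliteration model, which this sketch does not attempt to reproduce.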

Published In

ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics

July 2006

1214 pages

Publisher

Association for Computational Linguistics

United States

Acceptance Rates

Overall Acceptance Rate 85 of 443 submissions, 19%
