DOI: 10.5555/1870457.1870464
research-article

Transliteration mining with phonetic conflation and iterative training

Published: 16 July 2010

Abstract

This paper presents transliteration mining on the data of the ACL 2010 NEWS workshop shared transliteration mining task. Mining was performed with a generative transliteration model that was applied to source-language words and whose output was constrained to words present in the target language. A total of 30 runs were performed on 5 language pairs, 6 runs per language pair. Given limited resources, the runs explored phonetic conflation and iterative training of the transliteration model as ways of improving recall. Letter conflation improved recall by as much as 48%, and the recall gains far outweighed the accompanying drops in precision. Iterative training also improved recall, but often at the cost of significant drops in precision. The best runs typically combined letter conflation with iterative training.
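To illustrate why conflation trades a little precision for recall, here is a minimal sketch; it is not the paper's implementation. It assumes the generative model has already produced candidate transliterations for one source word (the candidates and target vocabulary below are made up), and it assumes a simplified Soundex-style letter grouping in the spirit of Russell's 1918 Soundex patent; the conflation classes actually used in the paper may differ. The helpers `conflate` and `mine` are hypothetical.

```python
# Minimal sketch of letter conflation for transliteration mining.
# ASSUMPTION: the letter classes below follow a simplified Soundex grouping;
# they are illustrative and not necessarily the classes used in the paper.
_CLASSES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}  # vowels and h, w, y have no class, so conflate() drops them


def conflate(word: str) -> str:
    """Map each consonant to its class code, drop all other characters,
    and collapse runs of identical codes (hypothetical helper)."""
    codes = [_CLASSES[ch] for ch in word.lower() if ch in _CLASSES]
    collapsed: list[str] = []
    for code in codes:
        if not collapsed or collapsed[-1] != code:
            collapsed.append(code)
    return "".join(collapsed)


def mine(candidates: list[str], target_words: list[str],
         use_conflation: bool = False) -> list[str]:
    """Return the target-language words that match any candidate
    transliteration, either exactly (case-insensitive) or after
    conflation (hypothetical helper)."""
    key = conflate if use_conflation else str.lower
    wanted = {key(c) for c in candidates}
    return sorted(w for w in target_words if key(w) in wanted)


# Made-up model outputs for one source name and a made-up target vocabulary.
candidates = ["mohamed", "muhammad"]
targets = ["mohammed", "mahmoud", "paris"]
print(mine(candidates, targets))                       # []
print(mine(candidates, targets, use_conflation=True))  # ['mahmoud', 'mohammed']
```

With exact matching, neither candidate hits a target word. With conflation, the correct `mohammed` is recovered (higher recall), but `mahmoud` collapses to the same key `53` and is returned as a false positive (lower precision), mirroring the trade-off the abstract reports.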

      Information

      Published In

      NEWS '10: Proceedings of the 2010 Named Entities Workshop
      July 2010
      156 pages
      ISBN: 9781932432787
      Program Chairs: Haizhou Li, A Kumaran

      Publisher

      Association for Computational Linguistics

      United States

      Acceptance Rates

      NEWS '10 Paper Acceptance Rate 7 of 11 submissions, 64%;
      Overall Acceptance Rate 7 of 11 submissions, 64%

      Article Metrics

      • Downloads (last 12 months): 56
      • Downloads (last 6 weeks): 14
      Reflects downloads up to 24 Jan 2025

      Cited By

      • (2017) Inducing a Bilingual Lexicon from Short Parallel Multiword Sequences. ACM Transactions on Asian and Low-Resource Language Information Processing 16:3 (1-20). DOI: 10.1145/3003726. Online publication date: 17-Mar-2017
      • (2013) A Bayesian Alignment Approach to Transliteration Mining. ACM Transactions on Asian Language Information Processing 12:3 (1-22). DOI: 10.1145/2499955.2499957. Online publication date: 1-Aug-2013
      • (2012) A statistical model for unsupervised and semi-supervised transliteration mining. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1 (469-477). DOI: 10.5555/2390524.2390590. Online publication date: 8-Jul-2012
      • (2012) Transliteration mining using large training and test sets. Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (243-252). DOI: 10.5555/2382029.2382061. Online publication date: 3-Jun-2012
      • (2011) Improved transliteration mining using graph reinforcement. Proceedings of the Conference on Empirical Methods in Natural Language Processing (1384-1393). DOI: 10.5555/2145432.2145578. Online publication date: 27-Jul-2011
      • (2011) An algorithm for unsupervised transliteration mining with an application to word alignment. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1 (430-439). DOI: 10.5555/2002472.2002527. Online publication date: 19-Jun-2011
      • (2011) Is a query worth translating. Proceedings of the 33rd European conference on Advances in information retrieval (238-250). DOI: 10.5555/1996889.1996920. Online publication date: 18-Apr-2011
      • (2011) Is a Query Worth Translating. Proceedings of the 33rd European Conference on Advances in Information Retrieval - Volume 6611 (238-250). DOI: 10.1007/978-3-642-20161-5_24. Online publication date: 18-Apr-2011
      • (2010) Report of NEWS 2010 transliteration mining shared task. Proceedings of the 2010 Named Entities Workshop (21-28). DOI: 10.5555/1870457.1870460. Online publication date: 16-Jul-2010
