Statistical models for unsupervised, semi-supervised, and supervised transliteration mining

Published: 01 June 2017

Abstract

We present a generative model that efficiently mines transliteration pairs in a consistent fashion in three different settings: unsupervised, semi-supervised, and supervised transliteration mining. The model interpolates two sub-models, one for the generation of transliteration pairs and one for the generation of non-transliteration pairs (i.e., noise). The model is trained on noisy unlabeled data using the EM algorithm. During training, the transliteration sub-model learns to generate transliteration pairs, while the fixed non-transliteration sub-model generates the noise pairs. After training, the unlabeled data is disambiguated based on the posterior probabilities of the two sub-models. We evaluate our transliteration mining system on data from a transliteration mining shared task and on parallel corpora. For three out of four language pairs, our system outperforms all semi-supervised and supervised systems that participated in the NEWS 2010 shared task. On word pairs extracted from parallel corpora with fewer than 2% transliteration pairs, our system achieves up to 86.7% F-measure with 77.9% precision and 97.8% recall.
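
To make the interpolation and the posterior-based disambiguation concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that both sub-models are already given as per-pair probabilities and that EM re-estimates only the interpolation weight; in the model described above the transliteration sub-model is itself learned during EM, which is omitted here. The names `estimate_noise_weight`, `noise_posteriors`, and `lam`, as well as the 0.5 decision threshold, are assumptions made for illustration.

```python
from typing import List, Sequence


def noise_posteriors(
    p_translit: Sequence[float], p_noise: Sequence[float], lam: float
) -> List[float]:
    """Posterior probability that each word pair was generated by the noise sub-model.

    Assumes the mixture p(pair) = (1 - lam) * p_translit + lam * p_noise
    and that each pair has a non-zero probability under at least one sub-model.
    """
    return [
        lam * pn / ((1.0 - lam) * pt + lam * pn)
        for pt, pn in zip(p_translit, p_noise)
    ]


def estimate_noise_weight(
    p_translit: Sequence[float],
    p_noise: Sequence[float],
    lam: float = 0.5,
    iterations: int = 100,
) -> float:
    """Estimate the interpolation weight with EM, keeping both sub-models fixed."""
    for _ in range(iterations):
        # E-step: how strongly does each pair look like noise under the current weight?
        posteriors = noise_posteriors(p_translit, p_noise, lam)
        # M-step: the new weight is the average noise posterior over all pairs.
        lam = sum(posteriors) / len(posteriors)
    return lam


# Toy probabilities (hypothetical): two pairs the transliteration sub-model
# likes, and two pairs it assigns very low probability.
toy_translit = [0.9, 0.8, 0.001, 0.002]
toy_noise = [0.01, 0.01, 0.01, 0.01]

weight = estimate_noise_weight(toy_translit, toy_noise)
is_noise = [p > 0.5 for p in noise_posteriors(toy_translit, toy_noise, weight)]
```

In this toy setting, a pair is labeled as noise when its noise posterior exceeds 0.5, and kept as a transliteration pair otherwise.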

      Published In

      Computational Linguistics, Volume 43, Issue 2
      June 2017
      188 pages
      ISSN: 0891-2017
      EISSN: 1530-9312

      Publisher

      MIT Press

      Cambridge, MA, United States

      Qualifiers

      • Article

      Cited By

      • (2021) Machine-Based Transliterate of Ottoman to Latin-Based Script. Scientific Programming, 2021. DOI: 10.1155/2021/7152935. Online publication date: 11-Nov-2021
      • (2021) Modified self-training based statistical models for image classification and speaker identification. International Journal of Speech Technology, 24:4 (1007-1015). DOI: 10.1007/s10772-021-09861-9. Online publication date: 1-Dec-2021
      • (2019) A Rule-Based Kurdish Text Transliteration System. ACM Transactions on Asian and Low-Resource Language Information Processing, 18:2 (1-8). DOI: 10.1145/3278623. Online publication date: 18-Jan-2019
      • (2019) Low-Resource Machine Transliteration Using Recurrent Neural Networks. ACM Transactions on Asian and Low-Resource Language Information Processing, 18:2 (1-14). DOI: 10.1145/3265752. Online publication date: 16-Jan-2019