DOI: 10.3115/1219044.1219047

Constructing transliteration lexicons from web corpora

Published: 21 July 2004

Abstract

This paper proposes a novel approach to automating the construction of transliterated-term lexicons. A simple syllable alignment algorithm is used to construct confusion matrices for cross-language syllable-phoneme conversion. Each row in the confusion matrix consists of a set of syllables in the source language that are (correctly or erroneously) matched phonetically and statistically to a syllable in the target language. Two conversions using phoneme-to-phoneme and text-to-phoneme syllabification algorithms are automatically deduced from a training corpus of paired terms and are used to calculate the degree of similarity between phonemes for transliterated-term extraction. In a large-scale experiment using this automated learning process for conversions, more than 200,000 transliterated-term pairs were successfully extracted by analyzing query results from Internet search engines. Experimental results indicate the proposed approach shows promise in transliterated-term extraction.
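The abstract does not give the exact formulas, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes syllabification and syllable alignment have already been done, so the input is a list of aligned (source syllable, target syllable) pairs. Each confusion-matrix row, keyed by a target syllable, is normalized into relative frequencies P(source | target). A candidate term pair is then scored with a dynamic-programming alignment over log-probabilities. This alignment-based score is a swapped-in illustration, not the paper's published similarity measure. The phoneme-to-phoneme and text-to-phoneme conversions are not modeled, and the values of `FLOOR`, `GAP`, and `THRESHOLD` are hypothetical.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL parameters (not from the paper): probability floor for unseen
# syllable pairs, log-cost of skipping one syllable, and acceptance threshold.
FLOOR = 1e-4
GAP = math.log(1e-3)
THRESHOLD = -2.0


def build_confusion_matrix(aligned_pairs):
    """Return {target_syllable: {source_syllable: P(source | target)}}.

    aligned_pairs: (source_syllable, target_syllable) tuples, assumed to come
    from an earlier syllable-alignment step. Probabilities are relative
    frequencies within each target-syllable row.
    """
    counts = defaultdict(Counter)
    for src, tgt in aligned_pairs:
        counts[tgt][src] += 1
    matrix = {}
    for tgt, row in counts.items():
        total = sum(row.values())
        matrix[tgt] = {src: c / total for src, c in row.items()}
    return matrix


def similarity(src_syls, tgt_syls, matrix):
    """Length-normalized log-probability of the best monotonic alignment."""
    n, m = len(src_syls), len(tgt_syls)
    neg_inf = float("-inf")
    dp = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg_inf
            if i > 0 and j > 0:
                # Match: look up P(src | tgt) in the target row, else FLOOR.
                p = matrix.get(tgt_syls[j - 1], {}).get(src_syls[i - 1], FLOOR)
                best = dp[i - 1][j - 1] + math.log(p)
            if i > 0:
                best = max(best, dp[i - 1][j] + GAP)  # skip a source syllable
            if j > 0:
                best = max(best, dp[i][j - 1] + GAP)  # skip a target syllable
            dp[i][j] = best
    return dp[n][m] / max(n, m, 1)


def is_transliteration(src_syls, tgt_syls, matrix):
    """Accept a candidate pair whose score reaches the hypothetical threshold."""
    return similarity(src_syls, tgt_syls, matrix) >= THRESHOLD


if __name__ == "__main__":
    # Toy aligned pairs, invented purely for illustration.
    aligned = [("ma", "ma"), ("ma", "ma"), ("mu", "ma"),
               ("ri", "li"), ("li", "li"), ("a", "ya")]
    matrix = build_confusion_matrix(aligned)
    print(similarity(["ma", "ri", "a"], ["ma", "li", "ya"], matrix))  # ≈ -0.366
    print(is_transliteration(["ko", "to"], ["ma", "li", "ya"], matrix))  # False
```

Keying rows by the target syllable mirrors the abstract's description of each row as the set of source syllables matched to one target syllable. The probability floor keeps unseen pairs from producing log(0), and dividing by the longer sequence length lets terms of different lengths share one threshold.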

Cited By

  • (2011) Machine transliteration survey. ACM Computing Surveys 43(3):1-46. DOI: 10.1145/1922649.1922654. Online publication date: 29-Apr-2011.
  • (2007) A phonetic similarity model for automatic extraction of transliteration pairs. ACM Transactions on Asian Language Information Processing (TALIP) 6(2):6-es. DOI: 10.1145/1282080.1282081. Online publication date: 1-Sep-2007.
  • (2006) Learning transliteration lexicons from the web. Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 1129-1136. DOI: 10.3115/1220175.1220317. Online publication date: 17-Jul-2006.

Published In

ACLdemo '04: Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
July 2004
144 pages

Publisher

Association for Computational Linguistics

United States

Qualifiers

  • Article

Acceptance Rates

Overall Acceptance Rate 91 of 488 submissions, 19%

Article Metrics

  • Downloads (last 12 months): 46
  • Downloads (last 6 weeks): 5
Reflects downloads up to 22 Dec 2024
