DOI: 10.1145/1008992.1009043

Learning phonetic similarity for matching named entity translations and mining new translations

Published: 25 July 2004

Abstract

We propose a novel named entity matching model which considers both semantic and phonetic clues. The matching is formulated as an optimization problem. One major component is a phonetic matching model which exploits similarity at the phoneme level. We investigate three learning algorithms for obtaining the similarity information of basic phoneme units based on training examples. By applying this proposed named entity matching model, we also develop a mining framework for discovering new, unseen named entity translations from online daily Web news. This framework harvests comparable news in different languages using an existing bilingual dictionary. It is able to discover new name translations not found in the dictionary.
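To make the idea of phoneme-level matching concrete, here is a minimal sketch. It is not the authors' model. It assumes a hypothetical table `sim` of pairwise phoneme similarities, scores two phoneme sequences by the best order-preserving alignment (a simple dynamic program standing in for the paper's optimization formulation), and applies one least-mean-squares style update as an example of learning the similarities from a training pair. The function names, the toy phoneme symbols, and the similarity values are all illustrative assumptions.

```python
def align(src, tgt, sim):
    """Best order-preserving alignment of two phoneme sequences.

    Returns (score, pairs), where score is the summed similarity of the
    aligned pairs divided by the longer sequence length, and pairs lists
    the aligned (source phoneme, target phoneme) tuples.
    """
    n, m = len(src), len(tgt)
    # best[i][j] = highest total similarity aligning src[:i] with tgt[:j]
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = best[i - 1][j - 1] + sim.get((src[i - 1], tgt[j - 1]), 0.0)
            best[i][j] = max(match, best[i - 1][j], best[i][j - 1])

    # Trace back through the table to recover which phonemes were paired.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j]:
            i -= 1  # source phoneme left unaligned
        elif best[i][j] == best[i][j - 1]:
            j -= 1  # target phoneme left unaligned
        else:
            pairs.append((src[i - 1], tgt[j - 1]))
            i -= 1
            j -= 1
    pairs.reverse()
    return best[n][m] / max(n, m, 1), pairs


def lms_update(sim, src, tgt, target, lr=0.5):
    """One least-mean-squares style step on the aligned phoneme pairs.

    target is 1.0 for a known translation pair and 0.0 for a non-pair.
    The alignment is held fixed while the similarities are adjusted.
    """
    score, pairs = align(src, tgt, sim)
    error = target - score
    norm = max(len(src), len(tgt), 1)
    for pair in pairs:
        sim[pair] = sim.get(pair, 0.0) + lr * error / norm
    return error


if __name__ == "__main__":
    # Hypothetical similarity values between toy phoneme symbols.
    sim = {("k", "k"): 1.0, ("l", "l"): 1.0, ("i", "i"): 0.9,
           ("n", "n"): 1.0, ("t", "d"): 0.6, ("o", "u"): 0.5}
    english = list("klinton")   # toy phoneme string, one symbol per phoneme
    chinese = list("kelindun")  # toy romanized phoneme string
    print(align(english, chinese, sim))
    lms_update(sim, english, chinese, target=1.0)
    print(align(english, chinese, sim))
```

In this sketch, unseen phoneme pairs default to a similarity of zero and are therefore never aligned; a real system would need a smoothing or initialization strategy, which the abstract does not specify.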

    Published In

    SIGIR '04: Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
    July 2004
    624 pages
    ISBN:1581138814
    DOI:10.1145/1008992

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. learning phonetic information
    2. named entity translation
    3. text mining

    Qualifiers

    • Article

    Conference

    SIGIR04

    Acceptance Rates

    Overall Acceptance Rate 792 of 3,983 submissions, 20%

    Cited By

    • (2021) Similar Trademark Detection via Semantic, Phonetic and Visual Similarity Information. Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2025-2030. DOI: 10.1145/3404835.3463038. Online publication date: 11-Jul-2021
    • (2019) Study on Unknown Term Translation Mining from Google Snippets. Information, 10:9 (267). DOI: 10.3390/info10090267. Online publication date: 28-Aug-2019
    • (2018) Machine transliteration and transliterated text retrieval: a survey. Sādhanā, 43:6. DOI: 10.1007/s12046-018-0828-8. Online publication date: 7-Jun-2018
    • (2014) Noise-aware Character Alignment for Extracting Transliteration Fragments. Journal of Natural Language Processing, 21:6 (1107-1131). DOI: 10.5715/jnlp.21.1107. Online publication date: 2014
    • (2011) Machine transliteration survey. ACM Computing Surveys, 43:3 (1-46). DOI: 10.1145/1922649.1922654. Online publication date: 29-Apr-2011
    • (2011) A CLIR-oriented OOV translation mining method from bilingual webpages. 2011 International Conference on Machine Learning and Cybernetics, pages 1872-1877. DOI: 10.1109/ICMLC.2011.6016958. Online publication date: Jul-2011
    • (2008) Translating OOV phrases based on lexical information and web mining. 2008 3rd International Conference on Intelligent System and Knowledge Engineering, pages 791-796. DOI: 10.1109/ISKE.2008.4731037. Online publication date: Nov-2008
    • (2008) Entity matching across heterogeneous data sources. Data & Knowledge Engineering, 66:3 (368-381). DOI: 10.1016/j.datak.2008.04.007. Online publication date: 1-Sep-2008
    • (2007) A phonetic similarity model for automatic extraction of transliteration pairs. ACM Transactions on Asian Language Information Processing, 6:2 (6-es). DOI: 10.1145/1282080.1282081. Online publication date: 1-Sep-2007
    • (2007) Named entity translation matching and learning. ACM Transactions on Information Systems, 25:1 (2-es). DOI: 10.1145/1198296.1198298. Online publication date: 1-Feb-2007
