DOI: 10.1145/1141277.1141525

FITE-TRT: a high quality translation technique for OOV words

Published: 23 April 2006

Abstract

We devised a novel statistical technique for identifying the translation equivalents of source words obtained by transformation rule-based translation (TRT). The effectiveness of the devised FITE (frequency-based identification of translation equivalents) technique was tested using biological and medical cross-lingual spelling variants and out-of-vocabulary (OOV) words in Spanish-English and Finnish-English TRT. For Spanish-English, translation recall was 89.2%-91.0%, and for Finnish-English it was 71.9%-72.9%. For both language pairs, FITE-TRT achieved high translation precision, i.e., 97.0%-98.8%. The technique also reliably identified native source language words, i.e., source words that cannot be correctly translated by TRT. Dictionary-based cross-language information retrieval (CLIR) augmented with FITE-TRT performed substantially better than dictionary-based CLIR in which OOV keys were kept intact.
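
The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the transformation rules, the frequency table, the thresholds `min_freq` and `min_ratio`, and the function names are all hypothetical, chosen only to show how rule-generated candidates might be filtered by target-language frequency, and how a word with no sufficiently frequent candidate might be flagged as a native source language word.

```python
import re
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical Spanish-to-English transformation rules: (regex, replacement).
RULES: List[Tuple[str, str]] = [
    (r"ción$", "tion"),
    (r"ía$", "y"),
    (r"f", "ph"),
    (r"^cr", "chr"),
]

# Hypothetical target-language frequencies (for example, document counts).
TARGET_FREQ: Dict[str, int] = {"chromatography": 120000, "cromatography": 300}


def trt_candidates(word: str, rules: List[Tuple[str, str]]) -> Set[str]:
    """Apply every subset of the rules, in order, to produce candidate forms."""
    forms = {word}
    for pattern, replacement in rules:
        forms |= {re.sub(pattern, replacement, form) for form in forms}
    forms.discard(word)  # keep only forms that were actually transformed
    return forms


def fite_select(
    candidates: Set[str],
    target_freq: Dict[str, int],
    min_freq: int = 1,        # assumed threshold, not from the paper
    min_ratio: float = 2.0,   # assumed dominance ratio, not from the paper
) -> Optional[str]:
    """Pick the candidate whose frequency clearly dominates, else None."""
    scored = sorted(((target_freq.get(c, 0), c) for c in candidates), reverse=True)
    if not scored or scored[0][0] < min_freq:
        return None  # no attested candidate: treat as a native source word
    best_freq, best = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else 0
    if runner_up > 0 and best_freq < min_ratio * runner_up:
        return None  # too ambiguous to choose reliably
    return best


def translate(word: str) -> Optional[str]:
    """Generate candidates with the rules, then select one by frequency."""
    return fite_select(trt_candidates(word, RULES), TARGET_FREQ)


print(translate("cromatografía"))  # chromatography
print(translate("frío"))           # None
```

Returning `None` rather than a low-frequency guess reflects the emphasis on precision over recall reported above; whether the original technique makes that trade-off in the same way is an assumption of this sketch.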

    Published In

    SAC '06: Proceedings of the 2006 ACM symposium on Applied computing
    April 2006
    1967 pages
    ISBN:1595931082
    DOI:10.1145/1141277

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. OOV words
    2. TRT
    3. cross-language information retrieval

    Qualifiers

    • Article

    Conference

    SAC06

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

    Cited By

    • (2019) Study on Unknown Term Translation Mining from Google Snippets. Information, 10:9 (267). DOI: 10.3390/info10090267. Online publication date: 28-Aug-2019
    • (2018) Machine transliteration and transliterated text retrieval: a survey. Sādhanā, 43:6. DOI: 10.1007/s12046-018-0828-8. Online publication date: 7-Jun-2018
    • (2014) Using Semantic and Domain-Based Information in CLIR Systems. The Semantic Web: Trends and Challenges (240-254). DOI: 10.1007/978-3-319-07443-6_17. Online publication date: 2014
    • (2011) Machine transliteration survey. ACM Computing Surveys, 43:3 (1-46). DOI: 10.1145/1922649.1922654. Online publication date: 29-Apr-2011
    • (2010) Transliteration equivalence using canonical correlation analysis. Proceedings of the 32nd European conference on Advances in Information Retrieval (75-86). DOI: 10.1007/978-3-642-12275-0_10. Online publication date: 28-Mar-2010
    • (2009) "They Are Out There, If You Know Where to Look". Proceedings of the 31th European Conference on IR Research on Advances in Information Retrieval (437-448). DOI: 10.1007/978-3-642-00958-7_39. Online publication date: 18-Apr-2009
    • (2008) A novel implementation of the FITE-TRT translation method. Proceedings of the IR research, 30th European conference on Advances in information retrieval (138-149). DOI: 10.5555/1793274.1793294. Online publication date: 30-Mar-2008
    • (2008) Data driven methods for improving mono- and cross-lingual IR performance in noisy environments. Proceedings of the second workshop on Analytics for noisy unstructured text data (75-82). DOI: 10.1145/1390749.1390762. Online publication date: 24-Jul-2008
    • (2008) Focused web crawling in the acquisition of comparable corpora. Information Retrieval, 11:5 (427-445). DOI: 10.1007/s10791-008-9058-8. Online publication date: 15-Mar-2008
