DOI: 10.1145/2542050.2542073
research-article

Experiments with query translation and re-ranking methods in Vietnamese-English bilingual information retrieval

Published: 05 December 2013

Abstract

Bilingual dictionaries are a common resource for query translation in Cross-Language Information Retrieval (CLIR). This article focuses on Vietnamese-English bilingual information retrieval and presents algorithms for query segmentation, word disambiguation, and re-ranking that improve the dictionary-based query translation approach. An evaluation environment is implemented to verify the proposed algorithms and to compare them against a baseline that uses manual translation.
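
The abstract does not describe the algorithms themselves, so the following is only a minimal sketch of what a dictionary-based query translation pipeline can look like, not the authors' method. It assumes greedy longest-match segmentation and co-occurrence-based disambiguation as stand-in techniques; the dictionary entries, the co-occurrence counts, and the names `TOY_DICT`, `TOY_COOC`, `segment`, and `translate` are all hypothetical. Re-ranking of retrieved documents is not shown.

```python
from itertools import product

# Hypothetical toy Vietnamese-English dictionary (illustration only).
TOY_DICT = {
    "thông tin": ["information", "news"],
    "tìm kiếm": ["search", "retrieval"],
    "ngân hàng": ["bank"],
}

# Hypothetical co-occurrence counts between English terms (illustration only).
TOY_COOC = {
    frozenset(["information", "retrieval"]): 9,
    frozenset(["information", "search"]): 4,
    frozenset(["news", "search"]): 2,
    frozenset(["news", "retrieval"]): 1,
}


def segment(query, dictionary, max_len=4):
    """Greedy longest-match segmentation over whitespace-separated syllables."""
    syllables = query.lower().split()
    units, i = [], 0
    while i < len(syllables):
        # Try the longest candidate first; a single syllable is always accepted.
        for n in range(min(max_len, len(syllables) - i), 0, -1):
            candidate = " ".join(syllables[i:i + n])
            if candidate in dictionary or n == 1:
                units.append(candidate)
                i += n
                break
    return units


def translate(units, dictionary, cooc):
    """Choose the translation combination with the highest pairwise co-occurrence."""
    # Units missing from the dictionary are passed through untranslated.
    options = [dictionary.get(u, [u]) for u in units]

    def score(combo):
        return sum(
            cooc.get(frozenset([a, b]), 0)
            for idx, a in enumerate(combo)
            for b in combo[idx + 1:]
        )

    return list(max(product(*options), key=score))


units = segment("tìm kiếm thông tin", TOY_DICT)
english = translate(units, TOY_DICT, TOY_COOC)
print(units, english)
```

Enumerating every combination with `product` is exponential in the number of ambiguous units, which is acceptable for short queries but would need pruning for longer ones.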


Cited By

  • (2019) Cross-lingual Information Retrieval: application and Challenges for Indian Languages. 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), pp. 1-4. DOI: 10.1109/I2CT45611.2019.9033563. Online publication date: Mar-2019


      Published In

      SoICT '13: Proceedings of the 4th Symposium on Information and Communication Technology
      December 2013
      345 pages
      ISBN: 9781450324540
      DOI: 10.1145/2542050

      Sponsors

      • SOICT: School of Information and Communication Technology - HUST
      • NAFOSTED: The National Foundation for Science and Technology Development
      • ACM Vietnam Chapter
      • Danang University of Technology

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. CLIR
      2. disambiguation
      3. information retrieval
      4. precision
      5. query translation
      6. re-ranking

      Qualifiers

      • Research-article

      Conference

      SoICT '13

      Acceptance Rates

      SoICT '13 Paper Acceptance Rate 40 of 80 submissions, 50%;
      Overall Acceptance Rate 147 of 318 submissions, 46%
