
An effective and efficient results merging strategy for multilingual information retrieval in federated search environments

Published: 01 February 2008

Abstract

Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target languages in response to a user query in a single source language. In a multilingual federated search environment, different information sources contain documents in different languages. A general search strategy in such environments is to translate the user query into the language of each information source and run a monolingual search in each source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists, which are in different languages. This is known as the results merging problem for multilingual information retrieval. Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the other hand, a more effective merging method downloads and translates all retrieved documents into the source language and generates the final ranked list by running a monolingual search in the search client. This method is more effective but incurs large online communication and computation costs. This paper proposes an effective and efficient approach for merging multilingual ranked lists. Specifically, for each user query it downloads only a small number of documents from each individual ranked list and calculates comparable document scores for them by using both the query-based translation method and the document-based translation method. Query-specific and source-specific transformation models are then trained for the individual ranked lists from these downloaded documents. The transformation models are used to estimate comparable document scores for all retrieved documents, so that the documents can be sorted into a final ranked list.
This merging approach is efficient because only a subset of the retrieved documents is downloaded and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) (http://www.clef-campaign.org/) data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results merging algorithm with different transformation models. This paper also provides thorough experimental results as well as detailed analysis. All of this work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results of the cross-language evaluation forum-CLEF 2005, 2005).
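
The merging pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the linear form of the transformation model, the `Hit` type, the `top_k` parameter, and the `comparable_score` callback (which stands in for downloading a document and scoring it with the query-based and document-based translation methods) are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Hit:
    doc_id: str
    source_score: float  # score reported by the source's own search engine


def fit_linear(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b (assumed model form).

    Falls back to a constant model when all xs are identical.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov_xy / var_x
    return a, mean_y - a * mean_x


def merge_ranked_lists(
    ranked_lists: Dict[str, List[Hit]],
    comparable_score: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[float, str, str]]:
    """Merge per-source ranked lists for ONE query into a single list.

    ranked_lists maps a source name to its hits, best first.
    comparable_score(source, doc_id) is a hypothetical stand-in for
    downloading the document and computing a score that is comparable
    across sources.
    """
    merged = []
    for source, hits in ranked_lists.items():
        if not hits:
            continue
        # Download and score only the top_k documents of this list.
        sample = hits[:top_k]
        xs = [h.source_score for h in sample]
        ys = [comparable_score(source, h.doc_id) for h in sample]
        # One model per (query, source) pair.
        a, b = fit_linear(xs, ys)
        # Estimate comparable scores for every retrieved document.
        for h in hits:
            merged.append((a * h.source_score + b, source, h.doc_id))
    merged.sort(key=lambda item: item[0], reverse=True)
    return merged
```

Because a separate model is fitted for each source and each query, only `top_k` documents per list need to be fetched, while every retrieved document still receives an estimated comparable score.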

References

[1]
Aslam, J. A., & Montague, M. (2001). Models for metasearch. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘01, New Orleans, Louisiana, United States (pp. 276–284). New York, NY: ACM.
[2]
Ballesteros, L., & Croft, W. B. (1997). Phrasal translation and query expansion techniques for cross-language information retrieval. In N. J. Belkin, A. D. Narasimhalu, P. Willett, & W. Hersh (Eds.), Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘97, Philadelphia, Pennsylvania, United States, July 27–31, 1997 (pp. 84–91). New York, NY: ACM.
[3]
Brown, P. F., Della Pietra, S. A., Della Pietra, V. J., & Mercer, R. L. (1993). The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19, 263–312.
[4]
Callan, J., & Connell, M. (2001). Query-based sampling of text databases. ACM Transactions on Information Systems, 19(2), 97–130.
[5]
Callan, J. P., Croft, W. B., & Harding, S. M. (1992). The INQUERY retrieval system. In Proceedings of the Third International Conference on Database and Expert Systems Applications, Valencia, Spain (pp. 78–83). Springer-Verlag.
[6]
Chen, A., & Gey, F. C. (2003). Combining query translation and document translation in cross-language retrieval. In C. Peters, J. Gonzalo, M. Braschler, et al. (Eds.), 4th Workshop of the Cross-Language Evaluation Forum, CLEF 2003, Lecture Notes in Computer Science, Trondheim, Norway (pp. 108–121). Springer-Verlag.
[7]
Hull, D. A., & Grefenstette, G. (1996). Query across languages: A dictionary-based approach to multilingual information retrieval. In Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘96, Zurich, Switzerland, August 18–22, 1996 (pp. 49–57). New York, NY: ACM.
[8]
Jones, G. J. F., Burke, M., Judge, J., Khasin, A., Lam-Adesina, A. M., & Wagner, J. (2005). Dublin City University at CLEF 2004: Experiments in monolingual, bilingual and multilingual retrieval. In CLEF (pp. 207–220).
[9]
Kamps, J., Monz, C., de Rijke, M., & Sigurbjörnsson, B. (2003). The University of Amsterdam at CLEF-2003. In Results of the CLEF 2003 Cross-Language System Evaluation Campaign, Trondheim, Norway (pp. 71–78).
[10]
Lee, J. H. (1997). Analyses of multiple evidence combination. In N. J. Belkin, A. D. Narasimhalu, P. Willett, & W. Hersh (Eds.), Proceedings of the 20th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘97, Philadelphia, Pennsylvania, United States, July 27–31, 1997 (pp. 267–276). New York, NY: ACM.
[11]
Levow, G. A., Oard, D. W., & Resnik, P. (2004). Dictionary-based cross-language retrieval. Information Processing and Management, 41, 523–547.
[12]
Martínez-Santiago, F., Martin, M., & Ureña, A. (2002). SINAI on CLEF 2002: Experiments with merging strategies. In C. Peters (Ed.), Results of the cross-language evaluation forum—CLEF 2002 (pp. 187–196).
[13]
Oard, D., & Diekema, A. (1998). Cross-language information retrieval. In M. Williams (Ed.), Annual review of information science (pp. 223–256).
[14]
Och, F. J., & Ney, H. (2000). Improved statistical alignment models. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, Annual Meeting of the ACL, Hong Kong, October 03–06, 2000 (pp. 440–447). Morristown, NJ: Association for Computational Linguistics.
[15]
Robertson, S. E., & Walker, S. (1994). Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In W. B. Croft & C. J. van Rijsbergen (Eds.), Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, July 03–06, 1994 (pp. 232–241). New York, NY: Springer-Verlag New York.
[16]
Rogati, M., & Yang, Y. M. (2003). CONTROL: CLEF-2003 with open, transparent resources off-line. Experiments with merging strategies. In C. Peters (Ed.), Results of the cross-language evaluation forum-CLEF.
[17]
Savoy, J. (2002). Report on CLEF 2002 experiments: Combining multiple sources of evidence. In C. Peters et al. (Eds.), Advances in cross-language information retrieval, LNCS (Vol. 2785, pp. 66–90). Berlin: Springer-Verlag.
[18]
Savoy, J. (2003). Report on CLEF-2003 multilingual tracks. In Proceedings of CLEF 2003, Trondheim, Norway (pp. 7–12).
[19]
Si, L., & Callan, J. (2003). A semi-supervised learning method to merge search engine results. ACM Transactions on Information Systems, 21(4), 457–491.
[20]
Si, L., & Callan, J. (2005). CLEF2005: Multilingual retrieval by combining multiple multilingual ranked lists. In C. Peters (Ed.), Results of the cross-language evaluation forum-CLEF 2005.
[21]
Turtle, H. (1990). Inference networks for document retrieval. Technical Report COINS Report 90-7, Computer and Information Science Department, University of Massachusetts, Amherst.
[22]
Xu, J., Weischedel, R., & Nguyen, C. (2001). Evaluating a probabilistic model for cross-lingual information retrieval. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ‘01, New Orleans, Louisiana, United States (pp. 105–110). New York, NY: ACM.


Published In

Information Retrieval, Volume 11, Issue 1
Feb 2008
74 pages

Publisher

Kluwer Academic Publishers
United States

Publication History

Published: 01 February 2008
Accepted: 16 October 2007
Received: 11 December 2006

Author Tags

1. Results merging
2. Federated search
3. Multilingual information retrieval

Qualifiers

• Research-article

Cited By

• (2024) Distillation for Multilingual Information Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2368–2373). 10.1145/3626772.3657955. Online publication date: 10-Jul-2024
• (2023) Neural Approaches to Multilingual Information Retrieval. Advances in Information Retrieval (pp. 521–536). 10.1007/978-3-031-28244-7_33. Online publication date: 2-Apr-2023
• (2017) Bantuweb. Proceedings of the South African Institute of Computer Scientists and Information Technologists (pp. 1–10). 10.1145/3129416.3129446. Online publication date: 26-Sep-2017
• (2015) Multilingual information retrieval in the language modeling framework. Information Retrieval, 18(3), 246–281. 10.1007/s10791-015-9255-1. Online publication date: 1-Jun-2015
• (2013) Distributed information retrieval and applications. Proceedings of the 35th European conference on Advances in Information Retrieval (pp. 865–868). 10.1007/978-3-642-36973-5_104. Online publication date: 24-Mar-2013
• (2011) Federated Search. Foundations and Trends in Information Retrieval, 5(1), 1–102. 10.1561/1500000010. Online publication date: 1-Jan-2011
