Abstract
We compare query and document translation from and to English, French, German and Spanish for multilingual retrieval in the academic search portal PubPsych. Both translation approaches improve the retrieval performance of the system, with document translation providing better results. The performance gain correlates inversely with the amount of available original-language documents: the more documents are already available in a language, the smaller the observed improvement. Retrieval performance with English as a source language does not improve with translation, as most documents in our text collection already contain English-language content. The large-scale evaluation study is based on a corpus of more than 1M metadata documents and 50 real queries taken from the query log files of the portal.
Notes
- 4. A reviewer of this paper pointed out that recall-oriented searches for systematic reviews are another important use case for academic search portals. This use case was not addressed in this study.
- 5. This dataset is available at https://github.com/clubs-project/documentation/.
Acknowledgments
This research was supported by the Leibniz-Gemeinschaft under grant SAW-2016-ZPID-2.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Petras, V., Lüschow, A., Ramthun, R., Stiller, J., España-Bonet, C., Henning, S. (2020). Query or Document Translation for Academic Search – What’s the Real Difference? In: Arampatzis, A., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2020. Lecture Notes in Computer Science, vol 12260. Springer, Cham. https://doi.org/10.1007/978-3-030-58219-7_3
DOI: https://doi.org/10.1007/978-3-030-58219-7_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58218-0
Online ISBN: 978-3-030-58219-7
eBook Packages: Computer Science, Computer Science (R0)