Abstract
Professional search queries are often formulated in a structured manner, where multiple aspects are combined in a logical form. Such information needs are typically served by an initial retrieval stage followed by a complex reranking algorithm. In this paper, we analyze a simple, explainable reranking model that follows the structured search criterion. A machine learning classifier predicts each aspect of the criterion, and the predictions are then combined through the logical form to estimate document relevance. On three years of data from the TREC Precision Medicine literature search track (2017–2019), we show that this simple model consistently performs as well as LambdaMART rerankers. Furthermore, many black-box rerankers developed by top-ranked TREC teams can be replaced by this simple model without a statistically significant change in performance. Finally, we find that the model can achieve remarkably high performance even with very few manually labeled documents. Together, these findings suggest that leveraging the structure in professional search queries is a promising direction towards building explainable, label-efficient, and high-performance retrieval models for professional search tasks.
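To make the approach concrete, below is a minimal sketch of such a structured reranker, assuming a conjunctive logical form in which per-aspect classifier probabilities are multiplied to score each document. The names here (StructuredReranker, the aspect feature matrices, scikit-learn as the classifier library) are illustrative assumptions, not the authors' implementation.

# Minimal illustrative sketch (not the paper's implementation): one binary
# classifier per query aspect; document relevance is the conjunction of the
# per-aspect match probabilities, modeled as their product.
import numpy as np
from sklearn.linear_model import LogisticRegression

class StructuredReranker:
    def __init__(self, aspects):
        # One classifier per aspect of the structured criterion,
        # e.g. ["disease", "gene", "demographic"] (hypothetical aspects).
        self.classifiers = {a: LogisticRegression() for a in aspects}

    def fit(self, features, labels):
        # features[aspect]: (n_docs, n_features) matrix for that aspect.
        # labels[aspect]: binary labels, "does the doc match this aspect?"
        for aspect, clf in self.classifiers.items():
            clf.fit(features[aspect], labels[aspect])

    def score(self, features):
        # Logical AND over aspects, computed as a product of probabilities;
        # documents are reranked by this score in descending order.
        probs = [clf.predict_proba(features[a])[:, 1]
                 for a, clf in self.classifiers.items()]
        return np.prod(probs, axis=0)

Because the score decomposes over the aspects of the query, a document's rank can be explained directly by which aspect classifiers assigned it low match probabilities.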
Acknowledgment
This work was supported by a UNC SILS Kilgour Research Grant.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Qu, J., Arguello, J., Wang, Y. (2021). A Deep Analysis of an Explainable Retrieval Model for Precision Medicine Literature Search. In: Hiemstra, D., Moens, M.-F., Mothe, J., Perego, R., Potthast, M., Sebastiani, F. (eds) Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol. 12656. Springer, Cham. https://doi.org/10.1007/978-3-030-72113-8_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-72112-1
Online ISBN: 978-3-030-72113-8