Abstract
While entity-oriented neural IR models have advanced significantly, they often overlook a key nuance: the varying degrees of influence individual entities within a document have on its overall relevance. Addressing this gap, we present DREQ, an entity-oriented dense document re-ranking model. Uniquely, we emphasize the query-relevant entities within a document’s representation while simultaneously attenuating the less relevant ones, thus obtaining a query-specific entity-centric document representation. We then combine this entity-centric document representation with the text-centric representation of the document to obtain a “hybrid” representation of the document. We learn a relevance score for the document using this hybrid representation. Using four large-scale benchmarks, we show that DREQ outperforms state-of-the-art neural and non-neural re-ranking methods, highlighting the effectiveness of our entity-oriented representation approach.
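The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how entities are weighted or how the two representations are combined, so every concrete choice below is an assumption made for illustration: softmax over query-entity dot products as the weighting, concatenation as the combination, and a single linear layer as the scorer. The function names `entity_centric_embedding` and `hybrid_score` are hypothetical and are not taken from the DREQ code base.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def entity_centric_embedding(query_vec, entity_vecs):
    """Build a query-specific entity-centric document vector.

    ASSUMPTION: each entity is weighted by the softmax of its dot product
    with the query, so query-relevant entities are emphasized and the
    others are attenuated.
    """
    weights = softmax(entity_vecs @ query_vec)
    return weights @ entity_vecs, weights


def hybrid_score(query_vec, text_vec, entity_vecs, w, b=0.0):
    """Score a document from its hybrid representation.

    ASSUMPTION: the hybrid representation is the concatenation of the
    text-centric and entity-centric vectors, scored by a linear layer.
    """
    entity_doc_vec, _ = entity_centric_embedding(query_vec, entity_vecs)
    hybrid = np.concatenate([text_vec, entity_doc_vec])
    return float(hybrid @ w + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 4
    query = rng.normal(size=dim)          # toy query embedding
    text = rng.normal(size=dim)           # toy text-centric document embedding
    entities = rng.normal(size=(3, dim))  # toy embeddings of 3 document entities
    w = rng.normal(size=2 * dim)          # untrained scoring weights
    print(hybrid_score(query, text, entities, w))
```

In a trained model the scoring weights would be learned from relevance labels; here they are random, so the printed score only demonstrates the data flow.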
Notes
1. Code and data: https://github.com/shubham526/ECIR2024-DREQ.
2. Term coined by Dr. Laura Dietz at the SIGIR 2023 tutorial (https://github.com/laura-dietz/neurosymbolic-representations-for-IR).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chatterjee, S., Mackie, I., Dalton, J. (2024). DREQ: Document Re-ranking Using Entity-Based Query Understanding. In: Goharian, N., et al. Advances in Information Retrieval. ECIR 2024. Lecture Notes in Computer Science, vol 14608. Springer, Cham. https://doi.org/10.1007/978-3-031-56027-9_13
DOI: https://doi.org/10.1007/978-3-031-56027-9_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-56026-2
Online ISBN: 978-3-031-56027-9
eBook Packages: Computer Science, Computer Science (R0)