PARADE: Passage Representation Aggregation for Document Reranking

Published: 27 September 2023

Abstract

Pre-trained transformer models such as BERT and T5 have proven highly effective at ad hoc passage and document ranking. Due to their inherent sequence length limits, these models must process a document’s passages one at a time rather than processing the entire document sequence at once. Although several approaches for aggregating passage-level signals into a document-level relevance score have been proposed, there has yet to be an extensive comparison of these techniques. In this work, we explore strategies for aggregating relevance signals from a document’s passages into a final ranking score. We find that passage representation aggregation techniques can significantly improve over the score aggregation techniques proposed in prior work, such as taking the maximum passage score. We call this new approach PARADE. In particular, PARADE can significantly improve results on collections with broad information needs, where relevance signals can be spread throughout the document (such as TREC Robust04 and GOV2). Meanwhile, less complex aggregation techniques may work better on collections where the information need can often be pinpointed to a single passage (such as TREC DL and TREC Genomics). We also conduct efficiency analyses and highlight several strategies for improving transformer-based aggregation.
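
As a concrete illustration of the distinction drawn above, the hypothetical PyTorch sketch below scores one document from its passages in both ways: score aggregation reduces per-passage relevance scores with a max, while representation aggregation lets a small transformer attend across passage representations before producing a single document score. This is a minimal sketch, not the authors’ implementation; the names, the mean-pooling step, and the random stand-in inputs are assumptions made for the example.

import torch
import torch.nn as nn

def max_score_aggregation(passage_scores: torch.Tensor) -> torch.Tensor:
    # Score aggregation: the document score is its single best passage score.
    # passage_scores: (num_passages,)
    return passage_scores.max()

class RepresentationAggregator(nn.Module):
    # Representation aggregation (simplified): a small transformer attends
    # across the passages' [CLS]-style vectors, and a linear layer scores
    # the mean-pooled output. Mean pooling is an assumption of this sketch;
    # the paper's actual PARADE variants differ in detail.
    def __init__(self, hidden: int = 768, heads: int = 8, layers: int = 2):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=layers)
        self.score = nn.Linear(hidden, 1)

    def forward(self, passage_reps: torch.Tensor) -> torch.Tensor:
        # passage_reps: (num_passages, hidden), e.g., per-passage BERT vectors
        out = self.encoder(passage_reps.unsqueeze(0))  # (1, P, hidden)
        return self.score(out.mean(dim=1)).squeeze()   # scalar document score

# Hypothetical stand-ins for one document split into 8 passages.
scores = torch.randn(8)       # per-passage relevance scores
reps = torch.randn(8, 768)    # per-passage representations
print(max_score_aggregation(scores))
print(RepresentationAggregator().eval()(reps))

Attending across passages in this way is one plausible mechanism for capturing relevance signals that are spread throughout a document, which the abstract identifies as the setting where representation aggregation helps most.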

      Published In

ACM Transactions on Information Systems, Volume 42, Issue 2
March 2024
897 pages
EISSN: 1558-2868
DOI: 10.1145/3618075

      Publisher

Association for Computing Machinery
New York, NY, United States

      Publication History

      Published: 27 September 2023
      Online AM: 26 May 2023
      Accepted: 10 May 2023
      Revised: 16 March 2023
      Received: 15 December 2021
      Published in TOIS Volume 42, Issue 2

      Author Tags

      1. Document reranking
      2. passage representation aggregation
      3. pre-trained language models

      Qualifiers

      • Research-article

      Funding Sources

      • National Natural Science Foundation of China
      • Google Cloud
      • Google TPU Research Cloud (TRC)
