tutorial

Open access

Recent Advances in Generative Information Retrieval

Authors:

Maarten de Rijke

SIGIR-AP '23: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

Pages 294–297

https://doi.org/10.1145/3624918.3629547

Published: 26 November 2023

Abstract

Generative retrieval (GR) has recently become a highly active area of information retrieval (IR). In contrast to the traditional “index-retrieve-then-rank” pipeline, the GR paradigm aims to consolidate all information within a corpus into a single model. Typically, a sequence-to-sequence model is trained to map a query directly to the identifiers of its relevant documents (docids). This tutorial introduces the core concepts of the GR paradigm and gives a comprehensive overview of recent advances in its foundations and applications. We begin with preliminaries covering the foundations and problem formulations of GR. We then turn to recent progress in docid design, training approaches, inference strategies, and applications of GR. We end by outlining remaining challenges and issuing a call for future GR research. The tutorial is intended for researchers and industry practitioners interested in developing novel GR solutions or applying them in real-world scenarios.
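To make the idea of a model that maps a query directly to docids concrete, one widely used inference strategy in GR systems such as GENRE is constrained beam search over a prefix trie of valid docids. This guarantees that the model only emits identifiers that actually exist in the corpus. What follows is a minimal, hypothetical sketch of that strategy. It is not the tutorial's own method. The `toy_model` keyword table stands in for a trained sequence-to-sequence model. The two-level docids such as `("0", "1")`, the document names in `DOCS`, and the `beam_size` value are all invented for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

EOS = "<eos>"  # end-of-docid marker
DocId = Tuple[str, ...]
Model = Callable[[str, DocId], Dict[str, float]]


def build_trie(docids: Sequence[DocId]) -> dict:
    """Prefix trie over docid token sequences; every path ends in EOS."""
    root: dict = {}
    for docid in docids:
        node = root
        for tok in list(docid) + [EOS]:
            node = node.setdefault(tok, {})
    return root


def allowed_next(trie: dict, prefix: DocId) -> List[str]:
    """Tokens that extend `prefix` toward at least one valid docid."""
    node = trie
    for tok in prefix:
        node = node[tok]
    return list(node)


def log_softmax(logits: Dict[str, float]) -> Dict[str, float]:
    m = max(logits.values())
    z = m + math.log(sum(math.exp(v - m) for v in logits.values()))
    return {k: v - z for k, v in logits.items()}


def constrained_beam_search(query: str, model: Model, trie: dict,
                            beam_size: int = 2) -> List[Tuple[DocId, float]]:
    """Rank valid docids for `query` by summed token log-probabilities."""
    beams: List[Tuple[DocId, float]] = [((), 0.0)]
    finished: List[Tuple[DocId, float]] = []
    while beams:
        candidates = []
        for prefix, score in beams:
            allowed = allowed_next(trie, prefix)
            logits = model(query, prefix)
            # Masking: renormalise only over tokens the trie allows here,
            # which is equivalent to setting all other logits to -inf.
            logp = log_softmax({t: logits.get(t, 0.0) for t in allowed})
            for tok in allowed:
                candidates.append((prefix + (tok,), score + logp[tok]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for prefix, score in candidates[:beam_size]:
            if prefix[-1] == EOS:
                finished.append((prefix[:-1], score))  # strip EOS
            else:
                beams.append((prefix, score))
    return sorted(finished, key=lambda c: c[1], reverse=True)


# Hypothetical stand-in for a trained seq2seq model: each (prefix, token)
# pair gets a logit equal to the number of query words it "knows".
KEYWORDS: Dict[Tuple[DocId, str], set] = {
    ((), "0"): {"sport", "football", "tennis"},
    ((), "1"): {"music", "jazz", "opera"},
    (("0",), "0"): {"football"},
    (("0",), "1"): {"tennis"},
    (("1",), "0"): {"jazz"},
    (("1",), "1"): {"opera"},
}


def toy_model(query: str, prefix: DocId) -> Dict[str, float]:
    words = set(query.lower().split())
    return {tok: float(len(words & kw))
            for (p, tok), kw in KEYWORDS.items() if p == prefix}


# Hypothetical corpus: hierarchical docid -> document name.
DOCS = {("0", "0"): "football-report", ("0", "1"): "tennis-report",
        ("1", "0"): "jazz-review", ("1", "1"): "opera-review"}
TRIE = build_trie(list(DOCS))

if __name__ == "__main__":
    for docid, score in constrained_beam_search("football match report",
                                                toy_model, TRIE):
        print(DOCS[docid], round(score, 3))
```

For the query "football match report", the toy model ranks `football-report` first. Every returned docid is guaranteed to be in the corpus because each decoding step can only follow edges of the trie. In a real system, the model would supply logits over its whole vocabulary at each step. The trie mask would be applied in the same way.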

Cited By

Azzopardi L, Clarke C, Kantor P, Mitra B, Trippas J, Ren Z, Aliannejadi M, Arabzadeh N, Chandrasekar R, de Rijke M, Eustratiadis P, Hersh W, Huang J, Kanoulas E, Kareem J, Li Y, Lupart S, Mekonnen K, Roegiest A, Soboroff I, Silvestri F, Verberne S, Vos D, Yang E, Zhao Y (2024). Report on the Search Futures Workshop at ECIR 2024. ACM SIGIR Forum 58:1 (1–41). DOI: 10.1145/3687273.3687288. Online publication date: 7-Aug-2024.
https://dl.acm.org/doi/10.1145/3687273.3687288

Yates A, Lassance C, MacAvaney S, Nguyen T, Lei Y, Sakai T, Ishita E, Ohshima H, Radboud University, Netherlands F, Mao J, Jose J (2024). Neural Lexical Search with Learned Sparse Retrieval. Proceedings of the 2024 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region (303–306). DOI: 10.1145/3673791.3698441. Online publication date: 8-Dec-2024.
https://dl.acm.org/doi/10.1145/3673791.3698441

Wu S, Wei W, Zhang M, Chen Z, Ma J, Ren Z, de Rijke M, Ren P, Hui Yang G, Wang H, Han S, Hauff C, Zuccon G, Zhang Y (2024). Generative Retrieval as Multi-Vector Dense Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (1828–1838). DOI: 10.1145/3626772.3657697. Online publication date: 10-Jul-2024.
https://dl.acm.org/doi/10.1145/3626772.3657697

Index Terms

Recent Advances in Generative Information Retrieval
1. Information systems
  1. Information retrieval
    1. Retrieval models and ranking


Published In

SIGIR-AP '23: Proceedings of the Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

November 2023

324 pages

ISBN:9798400704086

DOI:10.1145/3624918

Editors:
Qingyao Ai, Tsinghua University, China
Yiqun Liu, Tsinghua University, China
Alistair Moffat, The University of Melbourne, Australia
Xuanjing Huang, Fudan University, China
Tetsuya Sakai, Waseda University, Japan
Justin Zobel, The University of Melbourne, Australia

Copyright © 2023 Owner/Author.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Sponsors

SIGIR: ACM Special Interest Group on Information Retrieval

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 November 2023


Qualifiers

Tutorial
Research
Refereed limited

Funding Sources

Lenovo-CAS Joint Lab Youth Scientist Project
Hybrid Intelligence Center, a 10-year program funded by the Dutch Ministry of Education, Culture and Science through the Netherlands Organisation for Scientific Research
National Natural Science Foundation of China (NSFC)
Project LESSEN
China Scholarship Council

Conference

SIGIR-AP '23

Sponsor:

SIGIR

SIGIR-AP '23: Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region

November 26–28, 2023

Beijing, China


Bibliometrics & Citations

Article Metrics

Total citations: 3
Total downloads: 906
Downloads (last 12 months): 888
Downloads (last 6 weeks): 121

Reflects downloads up to 11 Dec 2024
