keynote

Generative Information Retrieval

Author:

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Page 1

https://doi.org/10.1145/3539618.3591871

Published: 18 July 2023

Abstract

Historically, information retrieval systems have all followed the same paradigm: information seekers frame their needs in the form of a short query, the system selects a small set of relevant results from a corpus of available documents, rank-orders the results by decreasing relevance, possibly excerpts a responsive passage for each result, and returns a list of references and excerpts to the user. Retrieval systems typically did not attempt to fuse information from multiple documents into an answer and display that answer directly. This was largely due to the available technology: at the core of each retrieval system is an index that maps lexical tokens or semantic embeddings to document identifiers. Indices are designed for retrieving responsive documents; they do not support integrating these documents into a holistic answer.

More recently, the coming-of-age of deep neural networks has dramatically improved the capabilities of large language models (LLMs). Trained on a large corpus of documents, these models not only memorize the vocabulary, morphology and syntax of human languages, but have also been shown to memorize facts and relations. Generative language models, when provided with a prompt, will extend the prompt with likely completions -- an ability that can be used to extract answers to questions from the model. Two years ago, Metzler et al. argued that this ability of LLMs will allow us to rethink the search paradigm: to answer information needs directly rather than directing users to responsive primary sources. Their vision was not without controversy; the following year Shah and Bender argued that such a system is neither feasible nor desirable. Nonetheless, the past year has seen the emergence of such systems, with offerings from established search engines and multiple new entrants to the industry.

The keynote will summarize the short history of these generative information retrieval systems, and focus on the many open challenges in this emerging field: ensuring that answers are grounded, attributing answer passages to a primary source, providing nuanced answers to non-factoid-seeking questions, avoiding bias, and going beyond simple regurgitation of memorized facts. It will also touch on the changing nature of the content ecosystem. LLMs are starting to be used to generate web content. Should search engines treat such derived content as equal to human-authored content? Is it possible to distinguish generated from original content? How should we view hybrid authorship where humans contribute ideas and LLMs shape these ideas into prose? And how will this parallel technical evolution of search engines and content ecosystems affect their respective business models?

References

[1] Donald Metzler, Yi Tay, Dara Bahri, and Marc Najork. 2021. Rethinking search: making domain experts out of dilettantes. ACM SIGIR Forum, Vol. 55, 1 (2021), 1--27.

[2] Adam Roberts, Colin Raffel, and Noam Shazeer. 2020. How Much Knowledge Can You Pack Into the Parameters of a Language Model? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. 5418--5426.

[3] Chirag Shah and Emily M. Bender. 2022. Situating Search. In Proceedings of the 2022 Conference on Human Information Interaction and Retrieval. 221--232.

Index Terms

Generative Information Retrieval
1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Natural language generation
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Question answering

Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 2023

3567 pages

ISBN:9781450394086

DOI:10.1145/3539618

General Chairs:
Hsin-Hsi Chen, National Taiwan University
Wei-Jou (Edward) Duh, National Taiwan University
Hen-Hsen Huang, Academia Sinica

Program Chairs:
Makoto P. Kato, Spotify
Josiane Mothe, Universite de Toulouse
Barbara Poblete, University of Chile and Amazon Visiting Academic

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Keynote

Conference

SIGIR '23

Sponsor:

SIGIR

SIGIR '23: The 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

July 23 - 27, 2023

Taipei, Taiwan

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
