DOI: 10.1145/3539618.3591871
keynote

Generative Information Retrieval

Published: 18 July 2023

Abstract

Historically, information retrieval systems have all followed the same paradigm: information seekers frame their needs in the form of a short query, the system selects a small set of relevant results from a corpus of available documents, rank-orders the results by decreasing relevance, possibly excerpts a responsive passage for each result, and returns a list of references and excerpts to the user. Retrieval systems typically did not attempt to fuse information from multiple documents into an answer and display that answer directly. This was largely due to the available technology: at the core of each retrieval system is an index that maps lexical tokens or semantic embeddings to document identifiers. Indices are designed for retrieving responsive documents; they do not support integrating these documents into a holistic answer.
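As an illustration of the classic paradigm described above, the following is a minimal sketch of an index that maps lexical tokens to document identifiers and returns results in decreasing order of a relevance score. The function names (`build_index`, `search`), the toy corpus, and the scoring rule (summed term frequency) are assumptions made for this example; they are not taken from the keynote, and production systems use far richer ranking functions.

```python
from collections import Counter, defaultdict


def build_index(docs):
    """Map each lowercase token to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for token, count in Counter(text.lower().split()).items():
            index[token][doc_id] = count
    return index


def search(index, query, k=3):
    """Return up to k doc ids, ordered by decreasing summed term frequency."""
    scores = Counter()
    for token in query.lower().split():
        for doc_id, count in index.get(token, {}).items():
            scores[doc_id] += count
    # Sort by score descending, then by doc id for a deterministic order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical three-document corpus, used only for illustration.
docs = {
    "d1": "neural networks for ranking documents",
    "d2": "ranking ranking functions in search engines",
    "d3": "cooking with cast iron",
}
index = build_index(docs)
results = search(index, "ranking documents")
print(results)
```

Note that the output is a ranked list of document identifiers: nothing in this structure combines the contents of `d1` and `d2` into a single answer, which is the limitation the abstract points out.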
More recently, the coming-of-age of deep neural networks has dramatically improved the capabilities of large language models (LLMs). Trained on a large corpus of documents, these models not only memorize the vocabulary, morphology and syntax of human languages, but have also been shown to memorize facts and relations. Generative language models, when provided with a prompt, will extend the prompt with likely completions, an ability that can be used to extract answers to questions from the model. Two years ago, Metzler et al. [1] argued that this ability of LLMs will allow us to rethink the search paradigm: to answer information needs directly rather than directing users to responsive primary sources. Their vision was not without controversy; the following year Shah and Bender [3] argued that such a system is neither feasible nor desirable. Nonetheless, the past year has seen the emergence of such systems, with offerings from established search engines and multiple new entrants to the industry.
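To make the idea of extending a prompt with likely completions concrete, here is a deliberately tiny sketch that uses a bigram count model in place of an LLM. It is only an analogy: real LLMs are neural networks trained on vastly larger corpora, and the names `train_bigrams` and `complete`, the greedy decoding rule, and the three-sentence corpus are all assumptions introduced for this example.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus):
    """Count, for each token, which tokens follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for current, nxt in zip(tokens, tokens[1:]):
            follows[current][nxt] += 1
    return follows


def complete(follows, prompt, max_new_tokens=5):
    """Greedily append the most frequent next token until none is known."""
    tokens = prompt.lower().split()
    if not tokens:
        return prompt
    for _ in range(max_new_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:
            break
        # Break ties alphabetically so the output is deterministic.
        best = min(candidates, key=lambda t: (-candidates[t], t))
        tokens.append(best)
    return " ".join(tokens)


# Hypothetical training corpus, used only for illustration.
corpus = [
    "the capital of france is paris",
    "the capital of italy is rome",
    "paris is the capital of france",
]
model = train_bigrams(corpus)
completion = complete(model, "the capital of", max_new_tokens=3)
print(completion)
```

The completion is whatever continuation was most frequent in the training text, with no check against a primary source, which hints at why grounding and attribution appear among the open challenges listed in the abstract.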
The keynote will summarize the short history of these generative information retrieval systems, and focus on the many open challenges in this emerging field: ensuring that answers are grounded, attributing answer passages to a primary source, providing nuanced answers to non-factoid-seeking questions, avoiding bias, and going beyond simple regurgitation of memorized facts. It will also touch on the changing nature of the content ecosystem. LLMs are starting to be used to generate web content. Should search engines treat such derived content as equal to human-authored content? Is it possible to distinguish generated from original content? How should we view hybrid authorship, where humans contribute ideas and LLMs shape these ideas into prose? And how will this parallel technical evolution of search engines and content ecosystems affect their respective business models?

References

[1] Donald Metzler, Yi Tay, Dara Bahri, and Marc Najork. 2021. Rethinking search: making domain experts out of dilettantes. ACM SIGIR Forum, Vol. 55, 1 (2021), 1–27.
[2] Adam Roberts, Colin Raffel, and Noam Shazeer. 2020. How Much Knowledge Can You Pack Into the Parameters of a Language Model? In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. 5418–5426.
[3] Chirag Shah and Emily M. Bender. 2022. Situating Search. In Proceedings of the 2022 Conference on Human Information Interaction and Retrieval. 221–232.

Published In

SIGIR '23: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2023
3567 pages
ISBN:9781450394086
DOI:10.1145/3539618
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. generative information retrieval
  2. large language models
  3. question answering
  4. tool-augmented generation

Qualifiers

  • Keynote

Conference

SIGIR '23

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 340
  • Downloads (Last 6 weeks): 47
Reflects downloads up to 11 Dec 2024

Cited By

  • (2024) Suggestive answers strategy in human-chatbot interaction: a route to engaged critical decision making. Frontiers in Psychology 15. DOI: 10.3389/fpsyg.2024.1382234. Online publication date: 28-Mar-2024
  • (2024) Can Banning AI-generated Content Save User-Generated Q&A Platforms? SSRN Electronic Journal. DOI: 10.2139/ssrn.4750326. Online publication date: 2024
  • (2024) Report on the 4th Workshop on Reducing Online Misinformation through Credible Information Retrieval (ROMCIR 2024) at ECIR 2024. ACM SIGIR Forum 58:1 (1-9). DOI: 10.1145/3687273.3687285. Online publication date: 7-Aug-2024
  • (2024) Advancing the Search Frontier with AI Agents. Communications of the ACM. DOI: 10.1145/3655615. Online publication date: 20-Aug-2024
  • (2024) Tasks, Copilots, and the Future of Search: A Keynote at SIGIR 2023. ACM SIGIR Forum 57:2 (1-8). DOI: 10.1145/3642979.3642985. Online publication date: 22-Jan-2024
  • (2024) Recent Advances in Generative Information Retrieval. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (3005-3008). DOI: 10.1145/3626772.3661379. Online publication date: 10-Jul-2024
  • (2024) Bridging the Integrity Gap: Towards AI-assisted Design Research. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (1-5). DOI: 10.1145/3613905.3647962. Online publication date: 11-May-2024
  • (2024) Recent Advances in Generative Information Retrieval. Companion Proceedings of the ACM on Web Conference 2024 (1238-1241). DOI: 10.1145/3589335.3641239. Online publication date: 13-May-2024
  • (2024) Leveraging Gaming to Enhance Knowledge Graphs for Explainable Generative AI Applications. 2024 IEEE Conference on Games (CoG) (1-4). DOI: 10.1109/CoG60054.2024.10645673. Online publication date: 5-Aug-2024
  • (2024) Recent Advances in Generative Information Retrieval. Advances in Information Retrieval (363-368). DOI: 10.1007/978-3-031-56069-9_48. Online publication date: 24-Mar-2024
