
Special Section on Efficiency in Neural Information Retrieval

Published: 29 April 2024 in ACM Transactions on Information Systems, Volume 42, Issue 5

1 Introduction

The rise of deep neural networks and self-supervised learning in recent years has brought about a paradigm shift in Information Retrieval. From retrieval to ranking, question answering to recommendation, and search to conversational agents, models trained on hand-crafted features have given way to complex neural networks with millions of parameters that are capable of learning granular features from raw data.
While this transition has led to large gains in efficacy across various tasks, it has often come at the expense of training and inference efficiency. With deep models embedded in ever more applications and devices, and a drive towards ever-higher efficacy, the rise in costs has a tangible, though often under-reported, impact on researchers, practitioners, users, and, more importantly, the environment. It is therefore unsurprising that the difficult balancing act between celebrating effectiveness and seeking efficiency has resurrected old research questions from the field with a renewed urgency.
The aim of this Special Section is to engage with researchers in Information Retrieval, Natural Language Processing and related areas and gather insight into the core challenges in measuring, reporting, and optimizing all facets of efficiency in Neural Information Retrieval (NIR) systems, including time-, space-, resource-, sample-, and energy-efficiency, among other factors. While researchers in the field have assiduously explored the Pareto frontier in quality and efficiency in other contexts for decades, we believe that the neural dimension introduces new hurdles [2, 3, 4].
The breadth of the challenges facing NIR systems is reflected in the submissions received by the editors of this Special Section. The call for papers attracted 13 submissions in total, of which 7 have been accepted to appear in the journal. These touch on topics including the ranking of long documents, sparse representation learning, retrieval with forward indexes, late-interaction models, sample efficiency, and sequential recommendation. In what follows, we briefly describe each article but invite the reader to review the relevant publication for details.

1.1 Ranking Long Queries and Documents

Let us start with the problem of ranking long sequences of text. Many neural ranking models rely on computationally intensive Transformer [14] blocks, which typically impose a hard limit on the input sequence length because their complexity grows quadratically with that length. When applying these models to longer sequences, therefore, we must either truncate the sequence or modify the Transformer block to lower its time complexity.
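As a rough illustration (ours, not drawn from any of the articles), the following Python sketch shows where the quadratic cost comes from: every token attends to every other token, so the attention score matrix alone has n x n entries, and doubling the sequence length quadruples it.

# A minimal sketch of why self-attention cost grows quadratically with
# sequence length: the (n, n) score matrix couples every pair of tokens.
import numpy as np

def attention_weights(token_embeddings: np.ndarray) -> np.ndarray:
    """token_embeddings: (n, d) matrix; returns an (n, n) attention matrix."""
    scores = token_embeddings @ token_embeddings.T          # O(n^2 * d) work
    scores -= scores.max(axis=-1, keepdims=True)            # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)    # row-wise softmax

for n in (128, 512, 2048):                                  # matrix size grows as n^2
    print(n, attention_weights(np.random.rand(n, 16)).shape)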
The first article that approaches this problem is entitled “Revisiting Bag of Words Document Representations for Efficient Ranking with Transformers.” It asks whether a third way of handling long sequences opens up if we look inside the model itself. In particular, rather than truncating a sequence arbitrarily, the authors propose to represent a document with its “salient” terms only; in effect, a document is condensed into its “characteristic” terms. The main question then becomes: How do we define a salient term, and what is the impact of different definitions of salience on ranking quality? That is the question this article explores in great depth.
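To make the idea concrete, here is a hypothetical sketch of condensing a document into a handful of salient terms. The article studies several definitions of salience; the TF-IDF scoring below is only an illustrative stand-in, and the toy corpus is invented.

# A hypothetical sketch: keep only a document's top-k TF-IDF terms as its
# "salient" representation. This is one possible definition of salience,
# not the one(s) evaluated in the article.
import math
from collections import Counter

def salient_terms(doc_tokens, corpus, k=10):
    """Return the k highest TF-IDF terms of one document."""
    n_docs = len(corpus)
    tf = Counter(doc_tokens)
    def idf(term):
        df = sum(1 for d in corpus if term in d)
        return math.log((1 + n_docs) / (1 + df)) + 1
    scored = {t: tf[t] * idf(t) for t in tf}
    return [t for t, _ in sorted(scored.items(), key=lambda x: -x[1])[:k]]

corpus = [{"patent", "search", "legal"}, {"neural", "ranking"}, {"sparse", "index"}]
print(salient_terms(["neural", "neural", "ranking", "index", "cost"], corpus, k=3))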
The second article in the same category, “Retrieval for Extremely Long Queries and Documents with RPRS,” approaches the same problem but considers a setup where queries, too, can be long sequences. This use-case arises in many real-world applications such as patent search or search over legal documents. The article explores a solution where long text queries and documents are split into chunks along sentence boundaries. When a set of candidate documents is returned by a neural ranker, the article investigates methods of re-ranking the candidate set according to different (unsupervised) similarity metrics.
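The following sketch conveys the chunk-then-compare idea in miniature; it is not the RPRS method itself. It splits texts at sentence boundaries and scores a candidate document by how well its chunks cover the query chunks under a simple, unsupervised Jaccard overlap, which stands in for the similarity metrics studied in the article.

# A rough, illustrative chunk-and-score routine (not RPRS): long queries and
# documents are split at sentence boundaries and compared chunk by chunk.
def sentences(text):
    return [s.strip() for s in text.split(".") if s.strip()]

def jaccard(a, b):
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b) if a | b else 0.0

def chunk_score(query, doc):
    q_chunks, d_chunks = sentences(query), sentences(doc)
    # For each query chunk, take its best-matching document chunk, then average.
    return sum(max(jaccard(q, d) for d in d_chunks) for q in q_chunks) / len(q_chunks)

print(chunk_score("claims a wireless sensor. the sensor reports temperature.",
                  "a wireless temperature sensor is described. it logs readings."))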

1.2 Retrieval with Sparse Representations

One of the more interesting neural retrieval paradigms to have emerged in the past few years attempts to learn sparse representations of text documents and uses existing inverted index-based technologies [13] to perform efficient retrieval over the sparse representations [1, 5, 7, 8, 9, 11, 12, 15, 16]. The output space of such models has as many dimensions as there are terms in the vocabulary (typically the BERT [6] vocabulary). This one-to-one mapping between output dimensions and vocabulary terms makes such representations highly interpretable, and therefore attractive in many applications.
The main challenge in training such models is to achieve high effectiveness while maintaining enough sparsity in the learned representations that retrieval stays efficient; that is because inverted index-based algorithms operate under the assumption that queries consist of only a few terms and that term frequencies within documents follow a Zipfian distribution. The article entitled “Towards Effective and Efficient Sparse Neural Information Retrieval” details a long thread of research that studies that very question and explores the tradeoffs between the effectiveness and efficiency of learned sparse representations in the context of text retrieval.
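A minimal sketch, under simplifying assumptions, of how a learned sparse representation plugs into an inverted index: each document reduces to a few (term_id, weight) pairs, a query touches only the posting lists of its nonzero terms, and the score is the sparse dot product accumulated term-at-a-time. The data and identifiers below are invented for illustration.

# Sketch of inverted-index retrieval over sparse learned vectors.
from collections import defaultdict

def build_index(doc_vectors):
    """doc_vectors: {doc_id: {term_id: weight}} -> {term_id: [(doc_id, weight)]}"""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term_id, w in vec.items():
            if w > 0:                       # sparsity is what keeps this cheap
                index[term_id].append((doc_id, w))
    return index

def search(index, query_vector, k=10):
    scores = defaultdict(float)
    for term_id, qw in query_vector.items():
        for doc_id, dw in index.get(term_id, []):
            scores[doc_id] += qw * dw       # accumulate the sparse dot product
    return sorted(scores.items(), key=lambda x: -x[1])[:k]

index = build_index({"d1": {3: 1.2, 7: 0.4}, "d2": {3: 0.5, 9: 2.0}})
print(search(index, {3: 1.0, 9: 0.7}))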

1.3 Retrieval with Forward Indexes

The article entitled “Efficient Neural Ranking Using Forward Indexes and Lightweight Encoders” introduces a novel index structure called Fast-Forward indexes, which takes advantage of the ability of dual encoders to pre-compute document representations to significantly improve the efficiency of the re-ranking phase. The authors show that a simple interpolation-based re-ranking scheme combines the benefits of lexical similarity (computed using sparse retrieval) and semantic similarity (computed using dual encoders), and can match and sometimes exceed the performance of cross-attention models. They exploit Fast-Forward indexes to efficiently handle document representations generated by dual encoders within the re-ranking phase. Experiments on public datasets show that dual encoders combined with Fast-Forward indexes provide lower per-query latency and achieve competitive results without requiring hardware acceleration such as GPUs.
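The following is a simplified sketch of interpolation-based re-ranking in the spirit described above, not the authors' implementation: lexical scores come from a sparse first stage, semantic scores from a dot product between the query embedding and document embeddings pre-computed offline by a dual encoder (the role a forward-index lookup plays here). All names and values are illustrative.

# Sketch: interpolate a lexical score with a pre-computed semantic score.
import numpy as np

def rerank(candidates, lexical_scores, doc_embeddings, query_embedding, alpha=0.5):
    """candidates: list of doc_ids; returns doc_ids sorted by interpolated score."""
    combined = {}
    for doc_id in candidates:
        semantic = float(doc_embeddings[doc_id] @ query_embedding)   # offline embedding lookup + dot product
        combined[doc_id] = alpha * lexical_scores[doc_id] + (1 - alpha) * semantic
    return sorted(combined, key=combined.get, reverse=True)

doc_embeddings = {"d1": np.array([0.1, 0.9]), "d2": np.array([0.8, 0.2])}
print(rerank(["d1", "d2"], {"d1": 3.1, "d2": 2.7}, doc_embeddings, np.array([0.5, 0.5])))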

1.4 Late Interaction

Late-interaction multi-vector models, such as ColBERT [10] and COIL [9], achieve state-of-the-art retrieval effectiveness by using all token embeddings to represent documents and queries while modeling their relevance with a sum-of-max operation. The limitation of these fine-grained representations is the space overhead resulting from having to store all token embeddings.
In an attempt to lower the storage costs, “An Analysis on Matching Mechanisms and Token Pruning for Late-interaction Models” investigates the matching mechanism of these late-interaction models. It shows that the sum-of-max operation relies heavily on co-occurrence signals and on certain important words in the document. Based on these findings, the authors propose several simple document pruning methods to reduce the storage overhead and compare the effectiveness of different pruning methods across late-interaction models. The investigation also covers query pruning methods to further reduce retrieval latency.
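For readers unfamiliar with the operator, here is a small sketch of sum-of-max (MaxSim) scoring over token embeddings, together with a naive document-token pruning step. The pruning heuristic below (keeping the highest-norm tokens) is purely illustrative and is not one of the methods proposed in the article.

# Sketch of late-interaction scoring with sum-of-max, plus naive pruning.
import numpy as np

def sum_of_max(query_tokens: np.ndarray, doc_tokens: np.ndarray) -> float:
    """query_tokens: (q, d), doc_tokens: (m, d) arrays of token embeddings."""
    sims = query_tokens @ doc_tokens.T            # (q, m) token-level similarities
    return float(sims.max(axis=1).sum())          # best doc token per query token, summed

def prune_doc_tokens(doc_tokens: np.ndarray, keep: int) -> np.ndarray:
    norms = np.linalg.norm(doc_tokens, axis=1)    # illustrative importance proxy only
    return doc_tokens[np.argsort(-norms)[:keep]]

q, d = np.random.rand(4, 8), np.random.rand(50, 8)
print(sum_of_max(q, d), sum_of_max(q, prune_doc_tokens(d, keep=20)))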

1.5 Sample Efficiency

The article entitled “Data Augmentation for Sample Efficient and Robust Document Ranking” investigates a rather different aspect of efficiency: one that focuses on training samples. Training a ranking model when too few training examples are available is challenging. The hypothesis this work sets out to explore is whether data augmentation techniques, combined with contrastive learning, can remedy some of those challenges and lead to improved ranking quality. The authors present a comprehensive analysis of various data augmentation methods and contrastive losses in the context of different model sizes. Their experimental results are encouraging: ranking quality improves in both in-domain and out-of-domain settings, with even larger language models benefiting from this scheme. Their findings bode well for sample efficiency: with appropriate data augmentation and a suitable contrastive learning formulation, fewer training examples are needed to train high-quality ranking models.
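As a generic illustration of the contrastive ingredient in such schemes, the sketch below computes an InfoNCE-style loss that pulls a query representation toward its (augmented) positive and away from in-batch negatives. It is not the specific formulation used in the article; the batch size, temperature, and random inputs are placeholders.

# Sketch of an InfoNCE-style contrastive loss over paired representations.
import numpy as np

def info_nce(query_vecs, positive_vecs, temperature=0.05):
    """query_vecs, positive_vecs: (batch, dim); row i of each is a matching pair."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    p = positive_vecs / np.linalg.norm(positive_vecs, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature                # (batch, batch) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_softmax)))    # diagonal entries are the true pairs

rng = np.random.default_rng(0)
print(info_nce(rng.normal(size=(8, 16)), rng.normal(size=(8, 16))))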

1.6 Sequential Recommendation

Finally, “Teach and Explore: A Multiplex Information-guided Effective and Efficient Reinforcement Learning for Sequential Recommendation” explores the application of Reinforcement Learning (RL) within a Sequential Recommendation (SR) system. It claims that current approaches in this direction are sub-optimal because (1) they fail to leverage the supervision signals to capture users’ explicit preferences, and (2) they do not utilize auxiliary information (e.g., knowledge graphs) to avoid blindness when exploring users’ potential interests.
To overcome these two limitations, the authors propose a multiplex information-guided RL model (MELOD), which uses a novel RL training framework with Teach and Explore components for SR. MELOD casts the SR task as a sequential decision problem and consists of three novel extensions, namely state encoding, a policy function, and RL training, that work together to learn a comprehensive user representation. Experiments on seven real-world datasets show that MELOD achieves significant performance improvements in terms of Hit-Ratio and Normalized Discounted Cumulative Gain over 13 state-of-the-art competitors.

References

[1]
Yang Bai, Xiaoguang Li, Gang Wang, Chaoliang Zhang, Lifeng Shang, Jun Xu, Zhaowei Wang, Fangshan Wang, and Qun Liu. 2020. SparTerm: Learning Term-based Sparse Representation for Fast Text Retrieval. ArXiv. https://arxiv.org/abs/2010.00768
[2]
Sebastian Bruch, Claudio Lucchese, and Franco Maria Nardini. 2022. ReNeuIR: Reaching efficiency in neural information retrieval. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 3462–3465.
[3]
Sebastian Bruch, Claudio Lucchese, and Franco Maria Nardini. 2023. Efficient and effective tree-based and neural learning to rank. Found. Trends Inf. Retriev. 17, 1 (2023), 1–123.
[4]
Sebastian Bruch, Claudio Lucchese, and Franco Maria Nardini. 2023. Report on the 1st Workshop on Reaching Efficiency in Neural Information Retrieval (ReNeuIR 2022) at SIGIR 2022. SIGIR Forum 56, 2, Article 12 (2023), 14 pages.
[5]
Zhuyun Dai and Jamie Callan. 2020. Context-aware term weighting for first stage passage retrieval. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 1533–1536.
[6]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 4171–4186.
[7]
Thibault Formal, Carlos Lassance, Benjamin Piwowarski, and Stéphane Clinchant. 2022. From distillation to hard negative sampling: Making sparse neural IR models more effective. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2353–2359.
[8]
Thibault Formal, Benjamin Piwowarski, and Stéphane Clinchant. 2021. SPLADE: Sparse lexical and expansion model for first stage ranking. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 2288–2292.
[9]
Luyu Gao, Zhuyun Dai, and Jamie Callan. 2021. COIL: Revisit exact lexical match in information retrieval with contextualized inverted list. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 3030–3042.
[10]
Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and effective passage search via contextualized late interaction over BERT. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. 39–48.
[11]
Jimmy Lin and Xueguang Ma. 2021. A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques. ArXiv. https://arxiv.org/abs/2106.14807
[12]
Antonio Mallia, Omar Khattab, Torsten Suel, and Nicola Tonellotto. 2021. Learning passage impacts for inverted indexes. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. 1723–1727.
[13]
Nicola Tonellotto, Craig Macdonald, and Iadh Ounis. 2018. Efficient query processing for scalable web search. Found. Trends Inf. Retriev. 12, 4–5 (2018), 319–500.
[14]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems. 6000–6010.
[15]
Hamed Zamani, Mostafa Dehghani, W. Bruce Croft, Erik Learned-Miller, and Jaap Kamps. 2018. From neural re-ranking to neural ranking: Learning a sparse representation for inverted indexing. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management. 497–506.
[16]
Shengyao Zhuang and Guido Zuccon. 2022. Fast passage re-ranking with contextualized exact term matching and efficient passage expansion. In Proceedings of the Workshop on Reaching Efficiency in Neural Information Retrieval at the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval.
