Computer Science > Information Retrieval

arXiv:2310.05380 (cs)

[Submitted on 9 Oct 2023]

Title: Augmented Embeddings for Custom Retrievals

Authors: Anirudh Khatry, Yasharth Bajpai, Priyanshu Gupta, Sumit Gulwani, Ashish Tiwari


Abstract: Information retrieval involves selecting artifacts from a corpus that are most relevant to a given search query. The flavor of retrieval typically used in classical applications can be termed homogeneous and relaxed: queries and corpus elements are both natural language (NL) utterances (homogeneous), and the goal is to pick the most relevant elements from the corpus in the Top-K, where K is large, such as 10, 25, 50 or even 100 (relaxed). Recently, retrieval has been used extensively in preparing prompts for large language models (LLMs) to enable LLMs to perform targeted tasks. These new applications of retrieval are often heterogeneous and strict -- the queries and the corpus contain different kinds of entities, such as NL and code, and there is a need for improving retrieval at Top-K for small values of K, such as K=1, 3, or 5. Current dense retrieval techniques based on pretrained embeddings provide a general-purpose and powerful approach for retrieval, but they are oblivious to task-specific notions of similarity between heterogeneous artifacts. We introduce Adapted Dense Retrieval, a mechanism to transform embeddings to enable improved task-specific, heterogeneous and strict retrieval. Adapted Dense Retrieval works by learning a low-rank residual adaptation of the pretrained black-box embedding. We empirically validate our approach by showing improvements over the state-of-the-art general-purpose embeddings-based baseline.
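
To make the phrase "low-rank residual adaptation" concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the names make_adapter, adapt and top_k are hypothetical, the adapter is assumed to be a plain linear map factored into two thin matrices, and ranking is assumed to use cosine similarity. Training the adapter on task-specific pairs is not shown.

```python
import numpy as np


def make_adapter(dim, rank, seed=0):
    """Create a low-rank adapter as two thin matrices (hypothetical helper).

    Assumption: `up` starts at zero so the adapted embedding initially
    equals the pretrained one; the abstract does not specify initialization.
    """
    rng = np.random.default_rng(seed)
    down = rng.normal(scale=0.01, size=(dim, rank))  # projects dim -> rank
    up = np.zeros((rank, dim))                       # projects rank -> dim
    return down, up


def adapt(embeddings, down, up):
    """Residual adaptation: pretrained embedding plus a low-rank correction.

    The pretrained embedding is treated as a black box; only `down` and
    `up` would be learned.
    """
    return embeddings + embeddings @ down @ up


def top_k(query, corpus, k):
    """Return indices of the k corpus rows most cosine-similar to the query."""
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    scores = c @ q
    return np.argsort(-scores)[:k]
```

In use, one would embed the query and the corpus with the pretrained model, pass the vectors through adapt, and call top_k with a small k such as 1, 3, or 5, matching the strict setting described in the abstract. Because the correction term has rank at most `rank`, the adapter adds only 2 * dim * rank parameters on top of the frozen embedding.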

Comments:	14 pages
Subjects:	Information Retrieval (cs.IR); Machine Learning (cs.LG)
ACM classes:	I.2.6
Cite as:	arXiv:2310.05380 [cs.IR]
	(or arXiv:2310.05380v1 [cs.IR] for this version)
	https://doi.org/10.48550/arXiv.2310.05380

Submission history

From: Ashish Tiwari
[v1] Mon, 9 Oct 2023 03:29:35 UTC (163 KB)
