Abstract
Retrieval Augmented Generation (RAG) has become a common practice for alleviating hallucination in Large Language Models (LLMs). The retrieval phase of RAG, however, usually depends solely on the original query, which suffers from a semantic gap and thus degrades the quality of the retrieved external knowledge. To address this problem and enhance traditional RAG, we propose a rEwrite-sElect-votE-rEad paradigm ( ) that first paraphrases the original query into N rewritten queries, bridging the semantic gap from different perspectives, and then determines the most valuable retrieved external knowledge by voting. In addition, between these steps a query-selecting strategy is required to filter out the extra noise introduced by query rewriting. Following the proposed paradigm, we provide our implementation of . Experimental results of our implementation on long context reading comprehension datasets from LongBench demonstrate the effectiveness of the proposed paradigm and provide insight into the whole enhanced RAG process.
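To make the paradigm concrete, the following is a minimal sketch of the rewrite-select-vote-read flow described above. The function names (rewrite_query, select_queries, vote), the similarity threshold, and the majority-voting scheme are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the rewrite-select-vote-read idea, assuming generic
# llm, retriever, and scorer callables. Names and thresholds are illustrative.
from collections import Counter


def rewrite_query(llm, query: str, n: int) -> list[str]:
    """Ask an LLM to paraphrase the original query into n rewrites."""
    prompt = f"Paraphrase the following question in {n} different ways:\n{query}"
    return llm(prompt).splitlines()[:n]


def select_queries(query: str, rewrites: list[str], scorer) -> list[str]:
    """Filter rewrites that drift too far from the original query
    (a stand-in for the query-selecting strategy)."""
    return [r for r in rewrites if scorer(query, r) >= 0.7]


def vote(retriever, queries: list[str], k: int = 5) -> list[str]:
    """Retrieve top-k passages per query and keep the passages
    that the largest number of queries agree on."""
    votes = Counter()
    for q in queries:
        for passage in retriever(q, k):
            votes[passage] += 1
    return [p for p, _ in votes.most_common(k)]


def rewrite_select_vote_read(llm, retriever, scorer, query: str, n: int = 4) -> str:
    rewrites = rewrite_query(llm, query, n)
    queries = [query] + select_queries(query, rewrites, scorer)
    context = "\n".join(vote(retriever, queries))
    return llm(f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```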
Notes
- 1. Methods from the information retrieval domain, such as sparse and dense retrieval, are widely used in this step (a minimal retrieval sketch follows these notes).
- 2. The long passage of a data sample in the LCRC task can be regarded as the external non-parameterized knowledge, and the question as the original query.
- 3. Here, the best results among the eight experimental settings are reported. In each column, the best result is in bold and results better than the baselines are marked with .
- 4. In each row, the best result is in bold and results better than the baseline are marked with .
- 5. Mean values of the results under the eight experimental settings are reported in Table 4.
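As a companion to note 1, below is a minimal sketch of a dense-retrieval step. The embed() callable is an assumption standing in for any sentence encoder; ranking by cosine similarity is illustrative and not tied to the paper's setup.

```python
# A minimal dense-retrieval sketch: embed the query and passages, then rank
# passages by cosine similarity. embed() is an assumed sentence encoder.
import numpy as np


def dense_retrieve(embed, query: str, passages: list[str], k: int = 5) -> list[str]:
    q = embed(query)                              # shape: (d,)
    p = np.stack([embed(x) for x in passages])    # shape: (num_passages, d)
    # Cosine similarity between the query and every passage.
    sims = (p @ q) / (np.linalg.norm(p, axis=1) * np.linalg.norm(q) + 1e-9)
    top = np.argsort(-sims)[:k]
    return [passages[i] for i in top]
```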
Acknowledgments
This work is supported by the Youth Innovation Promotion Association of the Chinese Academy of Sciences (E1291902), Jun Zhou (2021025).
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this paper.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Guan, W., Li, X., Lu, J., Zhou, J. (2025). : A Voting-Based Paradigm for Enhancing Retrieval Augmented Generation. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15331. Springer, Cham. https://doi.org/10.1007/978-3-031-78119-3_9
DOI: https://doi.org/10.1007/978-3-031-78119-3_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78118-6
Online ISBN: 978-3-031-78119-3