
A Voting-Based Paradigm for Enhancing Retrieval Augmented Generation

  • Conference paper
  • First Online:
Pattern Recognition (ICPR 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15331)


Abstract

Retrieval Augmented Generation (RAG) has become a common practice for alleviating hallucination in Large Language Models (LLMs). The retrieval phase of RAG, however, typically depends solely on the original query, which suffers to some extent from a semantic gap and thus degrades the quality of the retrieved external knowledge. To address this problem and enhance traditional RAG, we propose a rEwrite-sElect-votE-rEad paradigm that first paraphrases the original query into N rewritten ones, bridging the semantic gap from different perspectives, and then determines the most valuable retrieved external knowledge by voting. Between these steps, a query-selecting strategy is also required to filter out the extra noise introduced by query rewriting. Following this paradigm, we provide our implementation. Experimental results on long context reading comprehension datasets from LongBench demonstrate the effectiveness of the proposed paradigm and provide insight into the whole enhanced RAG process.
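The sketch below is a minimal reading of the rewrite-select-vote-read flow described in the abstract, not the authors' implementation: `paraphrase`, `select`, and `read` are hypothetical callables standing in for LLM prompts, and the retriever is a toy lexical ranker. The paper's actual selection strategy and voting rule are not specified on this page.

```python
# Minimal sketch of the rewrite-select-vote-read flow (assumptions noted above).
from collections import Counter
from typing import Callable

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    # Toy retriever: rank passages by token overlap with the query.
    q_tokens = set(query.lower().split())
    ranked = sorted(corpus, key=lambda p: -len(q_tokens & set(p.lower().split())))
    return ranked[:k]

def rewrite_select_vote_read(
    query: str,
    corpus: list[str],
    paraphrase: Callable[[str], str],       # hypothetical LLM call: query -> rewrite
    select: Callable[[str, str], bool],     # hypothetical filter: keep rewrite or not
    read: Callable[[str, list[str]], str],  # hypothetical LLM call: answer from passages
    n: int = 4,
) -> str:
    # 1. Rewrite: paraphrase the original query into n variants.
    rewrites = [paraphrase(query) for _ in range(n)]
    # 2. Select: drop noisy rewrites introduced by paraphrasing.
    kept = [query] + [q for q in rewrites if select(query, q)]
    # 3. Vote: retrieve with every kept query; passages retrieved by more
    #    queries receive more votes and are treated as more valuable.
    votes = Counter(p for q in kept for p in retrieve(q, corpus))
    top_passages = [p for p, _ in votes.most_common(3)]
    # 4. Read: answer the original query from the voted-for passages.
    return read(query, top_passages)
```

Majority voting over retrieved passages is one plausible reading of "a voting manner"; rank-weighted or score-weighted voting would fit the same interface.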



Notes

  1. Methods from the information retrieval domain, such as sparse and dense retrieval, are widely used in this step; a toy illustration of both appears after this list.

  2. The long passage of a data sample in the LCRC task can be regarded as the external non-parameterized knowledge, and its question as the original query.

  3. Here, the best results among eight experimental settings are reported. In each column, the best result is in bold and results better than the baselines are marked with .

  4. In each row, the best result is in bold and results better than the baseline are marked with .

  5. Mean values of the results under the eight experimental settings are reported in Table 4.
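As a rough illustration of the two retrieval families mentioned in Note 1, the sketch below contrasts a sparse, bag-of-words score with a dense, embedding-similarity score. Both are toy stand-ins: production systems would use BM25 on the sparse side and a trained dual encoder (the hypothetical `embed` below) on the dense side.

```python
# Toy contrast of sparse vs. dense retrieval scoring (see Note 1).
# `embed` is a hypothetical sentence encoder returning a vector of floats.
import math
from typing import Callable

def sparse_score(query: str, passage: str) -> float:
    # Bag-of-words overlap, length-normalised: a crude stand-in for
    # TF-IDF / BM25 term weighting.
    q_terms, p_terms = query.lower().split(), passage.lower().split()
    overlap = sum(1 for t in q_terms if t in p_terms)
    return overlap / math.sqrt(len(p_terms) + 1)

def dense_score(query: str, passage: str,
                embed: Callable[[str], list[float]]) -> float:
    # Cosine similarity between encoder vectors for query and passage.
    qv, pv = embed(query), embed(passage)
    dot = sum(a * b for a, b in zip(qv, pv))
    norm = math.sqrt(sum(a * a for a in qv)) * math.sqrt(sum(b * b for b in pv))
    return dot / norm if norm else 0.0
```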


Acknowledgments

This work is supported by the Youth Innovation Promotion Association of the Chinese Academy of Sciences (E1291902), Jun Zhou (2021025).

Author information


Corresponding author

Correspondence to Jun Zhou.


Ethics declarations

Disclosure of Interests

The authors declare no competing interests relevant to the content of this paper.


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Guan, W., Li, X., Lu, J., Zhou, J. (2025). A Voting-Based Paradigm for Enhancing Retrieval Augmented Generation. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, C.L., Bhattacharya, S., Pal, U. (eds.) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol. 15331. Springer, Cham. https://doi.org/10.1007/978-3-031-78119-3_9


  • DOI: https://doi.org/10.1007/978-3-031-78119-3_9

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78118-6

  • Online ISBN: 978-3-031-78119-3

  • eBook Packages: Computer Science; Computer Science (R0)
