Abstract
The advent of Large Language Models (LLMs) has led to the development of new Question-Answering (QA) systems based on Retrieval-Augmented Generation (RAG), which incorporate query-specific knowledge at inference time. This paper investigates the trustworthiness of RAG systems, focusing on the performance of their retrieval phase when dealing with sensitive topics. The issue matters because poor retrieval could hinder a user's ability to analyze sections of the available corpora, effectively biasing any subsequent research. To mimic a specialised library possibly containing sensitive topics, a ḥadīṯ dataset has been curated using an ad-hoc framework called Question-Classify-Retrieve (QCR), which automatically assesses the performance of document retrieval in three main steps: Question Generation, Passage Classification, and Passage Retrieval. Several sentence embedding models for document retrieval were tested, showing a significant performance gap between sensitive and non-sensitive topics compared to the baseline. In real-world applications, this would mean that relevant documents are placed lower in the retrieval list, leading to the presence of irrelevant information or, with a lower cut-off, the absence of relevant information.
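The effect of a cut-off on retrieval can be illustrated with a minimal sketch. This is not the authors' QCR implementation: the function names (`rank_of_relevant`, `hit_rate_at_k`) and the two-dimensional toy vectors are assumptions made for illustration, standing in for the output of a real sentence embedding model. Each question is assumed to have exactly one relevant passage, ranking is by cosine similarity, and the hit rate counts how often the relevant passage survives a cut-off of k.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_of_relevant(query_vec, passage_vecs, relevant_idx):
    """Return the 1-based position of the relevant passage in the similarity ranking."""
    scores = [cosine_similarity(query_vec, p) for p in passage_vecs]
    # Sort passage indices from most to least similar to the query.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order.index(relevant_idx) + 1


def hit_rate_at_k(ranks, k):
    """Fraction of questions whose relevant passage is ranked within the top k."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Toy data (hypothetical): three passages and two questions.
passages = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
questions = [
    ([1.0, 0.1], 0),  # relevant passage is index 0
    ([0.6, 0.8], 1),  # relevant passage is index 1
]

ranks = [rank_of_relevant(q, passages, rel) for q, rel in questions]
hit_at_1 = hit_rate_at_k(ranks, 1)
hit_at_2 = hit_rate_at_k(ranks, 2)
```

In this toy case the second question's relevant passage is ranked second, so it is lost with a cut-off of 1 and recovered with a cut-off of 2. Computing the same statistic separately for sensitive and non-sensitive questions would be one way to expose the kind of gap the abstract describes.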
Notes
1. The ITSERR research project is funded by the NextGenerationEU program provided by the Italian Ministry of Research, to enhance the European Research Infrastructure RESILIENCE so that it better meets the needs of the Religious Studies scientific community in terms of technological integration and increased innovative potential.
2. Chunking refers to the common pre-processing step of dividing documents into smaller chunks of text. This makes the embedding vectors and the retrieved passages more content-specific.
3. Accuracy is the base metric used for model evaluation, describing the number of correct predictions over all predictions. Precision measures how many of the positive predictions made are true positives. Recall, sometimes also referred to as Sensitivity, measures how many of the positive cases in the data the classifier correctly predicted. F1 Score combines precision and recall and is generally described as the harmonic mean of the two.
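The four metrics defined in the last note can be written out as a short sketch. The function name `classification_metrics` and the example labels are hypothetical, and binary labels with 1 as the positive class are assumed; the paper's own evaluation code is not shown in this section.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Each ratio falls back to 0.0 when its denominator is empty.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: four positive and two negative cases.
metrics = classification_metrics(
    y_true=[1, 1, 1, 1, 0, 0],
    y_pred=[1, 1, 0, 0, 1, 0],
)
```

Here two of the four positives are found and one negative is flagged by mistake, so precision (2/3) and recall (1/2) differ and F1 (4/7) falls between them, closer to the lower value.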
Acknowledgments
This work was supported by the PNRR project Italian Strengthening of Esfri RI Resilience (ITSERR) funded by the European Union - NextGenerationEU (CUP:B53C22001770006).
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sullutrone, G., Vigliermo, R.A., Sala, L., Bergamaschi, S. (2024). Sensitive Topics Retrieval in Digital Libraries: A Case Study of ḥadīṯ collections. In: Antonacopoulos, A., et al. Linking Theory and Practice of Digital Libraries. TPDL 2024. Lecture Notes in Computer Science, vol 15178. Springer, Cham. https://doi.org/10.1007/978-3-031-72440-4_5
DOI: https://doi.org/10.1007/978-3-031-72440-4_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72439-8
Online ISBN: 978-3-031-72440-4
eBook Packages: Computer Science, Computer Science (R0)