We present a large-scale conversational RAG benchmark named CORAL and propose a unified framework for standardizing and evaluating various conversational RAG baselines.
- CORAL: CORAL has five critical features: open-domain coverage, knowledge-intensiveness, freeform response generation, handling of topic shifts, and citation labeling. In CORAL, we evaluate conversational RAG systems across three essential tasks:
  (1) Conversational Passage Retrieval: assessing the system’s ability to retrieve relevant information from a large document set based on multi-turn context;
  (2) Response Generation: evaluating the system’s capacity to generate accurate, contextually rich answers;
  (3) Citation Labeling: ensuring that the generated responses are transparent and grounded by requiring correct attribution of sources.
- Conversational RAG Framework: We develop a unified framework for standardizing and evaluating various conversational RAG baselines, facilitating systematic comparison and advancement in this rapidly evolving field.
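As a rough illustration of the third task, citation labeling could be scored by comparing the passage IDs a response cites against the gold passage IDs for that turn. The sketch below computes set-based precision and recall. The helper name `citation_precision_recall` and the choice of these two metrics are assumptions made for illustration; they are not taken from the CORAL evaluation code.

```python
from typing import Hashable, Iterable, Tuple


def citation_precision_recall(
    cited_pids: Iterable[Hashable], golden_pids: Iterable[Hashable]
) -> Tuple[float, float]:
    """Set-based precision and recall of cited passage IDs (hypothetical helper)."""
    cited, golden = set(cited_pids), set(golden_pids)
    hits = len(cited & golden)  # cited passages that are also gold passages
    precision = hits / len(cited) if cited else 0.0
    recall = hits / len(golden) if golden else 0.0
    return precision, recall


# Toy example with made-up passage IDs: one of two citations is correct,
# and one of four gold passages is covered.
precision, recall = citation_precision_recall(["p1", "p7"], ["p1", "p2", "p3", "p4"])
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Treating citations as a set ignores where in the response each citation appears; a stricter evaluation might check attribution sentence by sentence.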
- [05/2025] 🔥 Our dataset has been updated to the second version.
- [10/2024] 🔥 We introduced CORAL, a conversational RAG dataset.
| | LDS Train | LDS Test | SIDS Train | SIDS Test | STRW Train | STRW Test | DTRW Train | DTRW Test |
|---|---|---|---|---|---|---|---|---|
| # Conversations | 1800 | 200 | 1800 | 200 | 1800 | 200 | 1800 | 200 |
| # Turns | 5934 | 651 | 16082 | 1727 | 18165 | 1949 | 19411 | 2153 |
| # Turns / Conversation | 3.30 | 3.26 | 8.93 | 8.64 | 10.09 | 9.75 | 10.78 | 10.77 |
| # Tokens / Question | 13.70 | 13.89 | 12.62 | 12.64 | 12.72 | 12.88 | 14.15 | 14.75 |
| # Tokens / Response | 233.81 | 147.16 | 242.54 | 155.54 | 243.34 | 191.60 | 300.47 | 259.72 |
| # Positive Passages / Turn | 3.25 | 2.03 | 2.64 | 1.73 | 3.01 | 2.12 | 3.98 | 3.50 |
CORAL includes 8,000 conversations in JSON Lines format. Each line in the train_conversation.json and test_conversation.json files follows this structure:
{
  "conv_id": "Train_type_convid",
  "turns": [
    {
      "turn_id": 1,
      "question": "",
      "response": "",
      "golden_docs_pids": [],
      "golden_docs_text": []
    },
    {
      "turn_id": 2,
      "question": "",
      "response": "",
      "golden_docs_pids": [],
      "golden_docs_text": []
    },
    ...
  ]
}
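Since each line is a standalone JSON object, the files can be read with Python's standard `json` module. The following is a minimal sketch under that assumption. `load_conversations` and `iter_turns_with_history` are hypothetical helper names, and pairing each turn with the preceding (question, response) pairs is only one possible way to build the multi-turn context for retrieval and generation.

```python
import json
from typing import Dict, Iterator, List, Tuple

History = List[Tuple[str, str]]  # earlier (question, response) pairs


def load_conversations(path: str) -> List[Dict]:
    """Read a JSON Lines file: one conversation object per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def iter_turns_with_history(conv: Dict) -> Iterator[Tuple[History, Dict]]:
    """Yield (history, turn) for each turn, in order.

    history contains only the turns that precede the current one, so the
    first turn of every conversation is paired with an empty list.
    """
    history: History = []
    for turn in conv["turns"]:
        yield list(history), turn  # copy so callers can keep each snapshot
        history.append((turn["question"], turn["response"]))
```

For example, looping over `load_conversations("test_conversation.json")` and then over `iter_turns_with_history(conv)` gives, for every turn, the prior dialogue together with the current `question` and its `golden_docs_pids`.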
git lfs clone https://huggingface.co/datasets/ariya2357/CORAL
Our code is licensed under the MIT License. Our dataset is distributed under the CC BY-SA 4.0 license.
Please cite our paper if it helps your research:
@article{coral,
author = {Yiruo Cheng and
Kelong Mao and
Ziliang Zhao and
Guanting Dong and
Hongjin Qian and
Yongkang Wu and
Tetsuya Sakai and
Ji{-}Rong Wen and
Zhicheng Dou},
title = {{CORAL:} Benchmarking Multi-turn Conversational Retrieval-Augmentation
Generation},
journal = {CoRR},
volume = {abs/2410.23090},
year = {2024},
url = {https://doi.org/10.48550/arXiv.2410.23090},
doi = {10.48550/ARXIV.2410.23090},
eprinttype = {arXiv},
eprint = {2410.23090},
timestamp = {Fri, 29 Nov 2024 21:16:27 +0100},
biburl = {https://dblp.org/rec/journals/corr/abs-2410-23090.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}