We are also open-sourcing the datasets we used to assess the quality of our pipeline. In short, we asked annotators to compare the perceived quality of expert-written answers against pipeline-generated ones. The annotators had to decide which option best aligned with a specific set of guidelines (see Appendix C of our paper). They could also accept both options (if they perceived the quality as similar) or reject both (if neither complied with the guidelines). The repository layout is shown below, followed by a short sketch for tallying the evaluators' preferences.
├── datasets
│ ├── evaluation # datasets used for evaluating LLMs and other techniques (Section 5)
│ │ ├── dataset.json # the pipeline-generated dataset (with A0 -> A4)
│ │ ├── factual_correctness_eval.jsonl # factual-correctness evaluation for fast-fc (our cost-efficient implementation) and ragas (default)
│ │ ├── gold-dataset.json # set of questions and ground truths sampled from Google's Natural Questions dataset
│ │ ├── llm_as_judge_eval.jsonl # evaluation of several LLMs for factual correctness
│ │ ├── report.json # detailed report of the question transformations
│ ├── human-assessment # datasets used for validating the quality of the pipeline (Section 4)
│ │ ├── assessment-dataset.json # set of Q&As manually written by experts (including A0 -> A4) with alternative versions produced by our pipeline
│ │ ├── report.json # the pipeline report with details about the incremental changes when producing the "ai" responses in assessment-dataset.json
│ │ ├── results-evaluator-1.json # assessment from evaluator 1 (preferences)
│ │ ├── results-evaluator-2.json # assessment from evaluator 2 (preferences)
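As a starting point, here is a minimal sketch of how the two evaluators' preference files could be tallied. It assumes each `results-evaluator-*.json` file is a JSON list of objects with a `preference` field taking values such as `expert`, `ai`, `both`, or `neither`; the actual schema may differ, so check the files before relying on this.

```python
import json
from collections import Counter
from pathlib import Path

# NOTE: the field name "preference" and the label values ("expert", "ai",
# "both", "neither") are assumptions for illustration only; inspect the
# actual results files to confirm the schema.
def tally_preferences(path: Path) -> Counter:
    """Count how often each option was chosen by one evaluator."""
    with open(path, encoding="utf-8") as f:
        results = json.load(f)  # assumed to be a list of annotation objects
    return Counter(item["preference"] for item in results)

if __name__ == "__main__":
    base = Path("datasets/human-assessment")
    for name in ("results-evaluator-1.json", "results-evaluator-2.json"):
        counts = tally_preferences(base / name)
        total = sum(counts.values())
        print(name)
        for label, n in counts.most_common():
            print(f"  {label}: {n} ({n / total:.1%})")
```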