Hallucinations (Confabulations) Document-Based Benchmark for RAG
Ranking LLMs on agentic tasks
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
Open multiple AI sites with one click and view their AI results.
Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.
JudgeGPT - (Fake) News Evaluation, a research project
Adaptive Testing Framework for AI Models (Psychometrics in AI Evaluation)
RJafroc quick start for those already familiar with Windows JAFROC.
Repository for the LWDA'24 presentation 'Psychometric Profiling of GPT Models for Bias Exploration', including the conference poster, paper, slides, and references.
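To make the headline entry concrete: a document-based hallucination benchmark for RAG ultimately rests on checking whether each claim in a model's answer is supported by the source documents. The sketch below is a minimal, hypothetical illustration of that idea using simple lexical overlap; the function names and the threshold are assumptions chosen for illustration and are not the API of any repository listed above.

```python
# Minimal sketch of a document-grounded hallucination check.
# All names and the overlap threshold are hypothetical illustrations,
# not taken from any of the repositories listed above.

def sentence_supported(sentence: str, documents: list[str],
                       threshold: float = 0.5) -> bool:
    """Return True if enough of the sentence's content words appear
    in at least one source document."""
    words = {w.lower().strip(".,;:!?") for w in sentence.split() if len(w) > 3}
    if not words:
        return True  # nothing substantive to verify
    for doc in documents:
        doc_words = {w.lower().strip(".,;:!?") for w in doc.split()}
        overlap = len(words & doc_words) / len(words)
        if overlap >= threshold:
            return True
    return False


def hallucination_rate(answer: str, documents: list[str]) -> float:
    """Fraction of answer sentences with no apparent support
    in the source documents."""
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    unsupported = sum(1 for s in sentences
                      if not sentence_supported(s, documents))
    return unsupported / len(sentences)


if __name__ == "__main__":
    docs = ["The report was published in March 2021 by the research team."]
    answer = "The report was published in March 2021. It won a national award."
    print(f"Hallucination rate: {hallucination_rate(answer, docs):.2f}")  # 0.50
```

Lexical overlap is a deliberately crude proxy here; published benchmarks in this space typically rely on stronger support checks such as NLI models or LLM judges to decide whether a claim is grounded in the documents.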