weizeming (Zeming Wei) / Starred · GitHub
171 starred repositories

A very simple GRPO implementation for reproducing R1-like LLM thinking.

Python 1,112 88 Updated Apr 3, 2025

Dataset and code for "JailbreaksOverTime: Detecting Jailbreak Attacks Under Distribution Shift"

Jupyter Notebook 5 Updated Apr 24, 2025
Python 13 Updated Mar 20, 2025

A survey on harmful fine-tuning attack for large language model

182 6 Updated Jun 13, 2025
Jupyter Notebook 34 5 Updated Nov 12, 2024

To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models

Python 30 Updated May 21, 2025
Python 30 2 Updated Mar 11, 2025

Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]

Shell 312 30 Updated Jan 23, 2025
Python 29 1 Updated May 21, 2025

Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.

741 66 Updated Jun 10, 2025
Jupyter Notebook 4,248 1,231 Updated Jul 9, 2024

A curated list of retrieval-augmented generation (RAG) in large language models

279 21 Updated Feb 14, 2025

"LightRAG: Simple and Fast Retrieval-Augmented Generation"

Python 17,420 2,405 Updated Jun 11, 2025

Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)

Jupyter Notebook 61 10 Updated Jan 11, 2025

V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer (AAAI 2025)

Python 44 1 Updated Feb 24, 2025

[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"

Python 128 17 Updated Apr 12, 2025

The paper list of the 86-page paper "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.

7,718 468 Updated Jul 28, 2024

AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLM

Python 65 7 Updated Nov 3, 2024

Agent Security Bench (ASB)

Python 85 5 Updated May 3, 2025

Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep

Python 132 10 Updated Apr 23, 2025

An Open-Ended Embodied Agent with Large Language Models

JavaScript 6,175 583 Updated Apr 3, 2024
5 Updated Oct 17, 2024

[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)

Jupyter Notebook 78 Updated Oct 23, 2024

[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning

Python 10 1 Updated Oct 29, 2024

A resource repository for representation engineering in large language models

124 5 Updated Nov 14, 2024

Code for paper 'Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning'

Python 16 1 Updated Apr 19, 2024

Improved techniques for optimization-based jailbreaking on large language models (ICLR2025)

Python 105 7 Updated Apr 7, 2025