- Peking University
- Beijing
- 10:53 (UTC +08:00)
- https://weizeming.github.io
- @weizeming25
- https://scholar.google.com/citations?user=Kyn1zdQAAAAJ
Stars
A survey on harmful fine-tuning attacks for large language models
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, code, datasets, evaluations, and analyses.
A curated list of retrieval-augmented generation (RAG) in large language models
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer (AAAI 2025)
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
The paper list accompanying the 86-page survey "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
An Open-Ended Embodied Agent with Large Language Models
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
A resource repository for representation engineering in large language models
Code for paper 'Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning'
Improved techniques for optimization-based jailbreaking on large language models (ICLR 2025)
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal