- Peking University
- Beijing
- 10:53 (UTC +08:00)
- https://weizeming.github.io
- @weizeming25
- https://scholar.google.com/citations?user=Kyn1zdQAAAAJ
Stars
A survey on harmful fine-tuning attacks for large language models
To Think or Not to Think: Exploring the Unthinking Vulnerability in Large Reasoning Models
Jailbreaking Leading Safety-Aligned LLMs with Simple Adaptive Attacks [ICLR 2025]
Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, code, datasets, evaluations, and analyses.
A curated list of retrieval-augmented generation (RAG) in large language models
"LightRAG: Simple and Fast Retrieval-Augmented Generation"
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses (NeurIPS 2024)
V2C-CBM: Building Concept Bottlenecks with Vision-to-Concept Tokenizer (AAAI 2025)
[NeurIPS 2024] Official implementation for "AgentPoison: Red-teaming LLM Agents via Memory or Knowledge Base Backdoor Poisoning"
The paper list accompanying the 86-page survey "The Rise and Potential of Large Language Model Based Agents: A Survey" by Zhiheng Xi et al.
AmpleGCG: Learning a Universal and Transferable Generator of Adversarial Attacks on Both Open and Closed LLMs
Official Repository for The Paper: Safety Alignment Should Be Made More Than Just a Few Tokens Deep
An Open-Ended Embodied Agent with Large Language Models
[ICLR 2025] Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates (Oral)
[NeurIPS 2024] Fight Back Against Jailbreaking via Prompt Adversarial Tuning
A resource repository for representation engineering in large language models
Code for paper 'Batch-ICL: Effective, Efficient, and Order-Agnostic In-Context Learning'
Improved techniques for optimization-based jailbreaking on large language models (ICLR 2025)
[ACL 2024] CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal