SAE-Probes Public
Forked from JoshEngels/SAE-Probes
Code for reproducing our paper "Are Sparse Autoencoders Useful? A Case Study in Sparse Probing"
Jupyter Notebook · Updated May 30, 2025
feature-hedging-paper Public
Code for the paper "Feature Hedging: Correlated Features Break Narrow Sparse Autoencoders"
Python · MIT License · Updated May 21, 2025
criminle Public
Wordle-inspired country guessing game, made in 30 minutes of vibe-coding with Cursor/Claude
TypeScript · Updated May 16, 2025
SAELens Public
Forked from jbloomAus/SAELens
Training Sparse Autoencoders on Language Models
Jupyter Notebook · MIT License · Updated May 5, 2025
dictionary_learning Public
Forked from saprmarks/dictionary_learning
Python · MIT License · Updated Feb 12, 2025
matryoshka-saes Public
Forked from noanabeshima/matryoshka-saes
hanzi-writer Public
Chinese character stroke order animations and practice quizzes
transformers Public
Forked from huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Python · Apache License 2.0 · Updated Nov 5, 2024
amr-logic-converter Public
Convert Abstract Meaning Representation (AMR) into first-order logic
linear-relational Public
Linear Relational Embeddings (LREs) and Linear Relational Concepts (LRCs) for LLMs in PyTorch
TransformerLens Public
Forked from TransformerLensOrg/TransformerLens
A library for mechanistic interpretability of GPT-style language models
Python · MIT License · Updated Jul 25, 2024
automated-interpretability Public
Forked from hijohnnylin/automated-interpretability
LLM_Categorical_Hierarchical_Representations Public
Forked from KihoPark/LLM_Categorical_Hierarchical_Representations
Jupyter Notebook · Updated Jun 4, 2024
feature-circuits Public
Forked from saprmarks/feature-circuits
Python · MIT License · Updated Apr 21, 2024
GENIES Public
Forked from Joshuaclymer/GENIES
Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains
Python · Updated Feb 21, 2024
hanzi-writer-miniprogram Public archive
WeChat Mini Program plugin for Hanzi Writer
penman-js Public
Abstract Meaning Representation (AMR) parser and generator for JavaScript
penman Public
Forked from goodmami/penman
PENMAN notation (e.g. AMR) in Python
Python · MIT License · Updated Jan 2, 2024
amr-vscode Public
VS Code language definition for Abstract Meaning Representation (AMR)
SycophancySteering Public
Forked from nrimsky/CAA
Modulating sycophancy in llama-2 via activation steering
Jupyter Notebook · Updated Dec 3, 2023