I remember us working on this together when the paper first came out. Anyway, you might be interested in my implementation, which uses baukit instead of TransformerLens. The advantage is that you only need the layer names, not a mapping of the weights, which makes exporting the model easier.
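A minimal sketch of the layer-name idea, using plain PyTorch forward hooks rather than baukit itself (as I understand it, baukit's `TraceDict` wraps essentially this pattern). The toy `nn.Sequential` model and its layer names `"0"`/`"2"` are stand-ins; on a real transformer you'd use names like `"model.layers.5.mlp"`:

```python
import torch
from torch import nn

# Toy stand-in for a transformer; real layer names would look like "model.layers.5.mlp".
model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))

cache = {}

def hook_by_name(name):
    # Capture the module's output under its dotted name.
    def hook(module, inputs, output):
        cache[name] = output.detach()
    return hook

# Address layers purely by name -- no remapping of weight tensors needed.
layer_names = ["0", "2"]
handles = [
    model.get_submodule(n).register_forward_hook(hook_by_name(n))
    for n in layer_names
]
model(torch.randn(3, 4))
for h in handles:
    h.remove()

print(sorted(cache))  # ['0', '2']
```

Because the hooks only reference names, the underlying model can still be saved/exported unchanged.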
Also, the bottleneck here is often storing large activations in memory. Here's a nice way to cache them to disk, so that you can abliterate with larger datasets: https://github.com/wassname/activation_store
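A minimal sketch of the caching idea with a plain NumPy memmap rather than the activation_store package itself (whose exact API I won't reproduce here): write each batch of activations into a disk-backed array as it's produced, then memory-map the file read-only so rows are paged in on demand. The shapes are made up for illustration:

```python
import os
import tempfile
import numpy as np

# Hypothetical sizes: 2 batches of 4 samples each, hidden size 8.
hidden, batches, batch_size = 8, 2, 4
path = os.path.join(tempfile.mkdtemp(), "acts.npy")

# Preallocate a disk-backed .npy so activations never all sit in RAM at once.
store = np.lib.format.open_memmap(
    path, mode="w+", dtype=np.float32, shape=(batches * batch_size, hidden)
)
for b in range(batches):
    # Stand-in for one batch of model activations.
    acts = np.random.randn(batch_size, hidden).astype(np.float32)
    store[b * batch_size:(b + 1) * batch_size] = acts
store.flush()

# Later: memory-map read-only; only the rows you index get loaded.
cached = np.load(path, mmap_mode="r")
print(cached.shape)  # (8, 8)
```

Computing the refusal direction then just means streaming means over `cached` instead of holding every activation in memory.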
https://github.com/wassname/abliterator