Releases: evanatyourservice/kron_torch
kron-torch 0.3.2
kron-torch 0.3.1
What's Changed
- Improve the merge dims option and default it to True. Merging dims now finds the most square matrix to reshape grad tensors into.
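
As a rough illustration of what "most square" could mean here, the sketch below tries every split point of a tensor shape and keeps the one whose two merged sides are closest in size. It works on shapes only, and the function name `most_square_shape` is hypothetical, not part of kron-torch; the library's actual selection rule may differ.

```python
import math


def most_square_shape(shape):
    """Return a 2D (rows, cols) shape for merging dims, as square as possible.

    Hypothetical helper for illustration only. Shapes with two or fewer
    dims, or with a zero-sized dim, are returned unchanged.
    """
    shape = tuple(shape)
    if len(shape) <= 2 or 0 in shape:
        return shape

    best_shape = None
    best_ratio = math.inf
    # Try each split point: dims before it merge into rows, the rest into cols.
    for split in range(1, len(shape)):
        rows = math.prod(shape[:split])
        cols = math.prod(shape[split:])
        # A ratio of 1.0 is a perfectly square matrix.
        ratio = max(rows, cols) / min(rows, cols)
        if ratio < best_ratio:
            best_ratio = ratio
            best_shape = (rows, cols)
    return best_shape
```

For a conv weight of shape (64, 3, 3, 3), this picks (64, 27) over (192, 9) or (576, 3), so a gradient tensor would be reshaped to 64 x 27 before preconditioning.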
kron-torch 0.3.0
What's Changed
- Add distributed versions of PSGD Kron and PSGD One-Sided Kron that use simple pipeline sharding, distributing params across GPUs layer-wise.
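
The release note does not spell out how layers are assigned to GPUs. One simple way to picture layer-wise sharding is to walk the layers in order and give each device a contiguous block of roughly equal total parameter count. The sketch below does that on plain integers; `shard_layers` and its arguments are hypothetical names, and this is not the library's implementation.

```python
def shard_layers(layer_sizes, num_devices):
    """Assign each layer (by parameter count) to a device index.

    Hypothetical illustration: layers stay in order, and each device gets a
    contiguous block whose total size is roughly total / num_devices.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    total = sum(layer_sizes)
    if total == 0:
        # Nothing to balance; put everything on the first device.
        return [0] * len(layer_sizes)

    assignment = []
    seen = 0
    for size in layer_sizes:
        # Place the layer according to where its midpoint falls in the
        # cumulative parameter count.
        midpoint = seen + size / 2
        device = min(int(midpoint * num_devices / total), num_devices - 1)
        assignment.append(device)
        seen += size
    return assignment
```

With four equally sized layers and two devices, the first two layers land on device 0 and the last two on device 1, so each device would hold the params and preconditioners for its own block of layers.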
kron-torch 0.2.9
What's Changed
- Merge memory improvement PR from Lucas Nestler (@ClashLuke)
kron-torch 0.2.6
What's Changed
- Get rid of trust region
- Add argument to normalize grads layer-wise
- Deterministically update preconditioners for stability
- TODO: update using Lucas Nestler's optimizations
kron-torch 0.2.5
What's Changed
- Small improvements
kron-torch 0.2.4
What's Changed
- Efficiency improvements from ClashLuke
- New trust region clipping that needs less (maybe no) tuning
kron-torch 0.2.3
What's Changed
- triton install, 3.0.0
kron-torch 0.2.2
What's Changed
- Trust region clipping improved
- Get rid of max skew triangular and replace with `memory_save_mode`, which can be either None to use default triangular preconditioners, 'one_diag' to use one diagonal per layer, or 'all_diag' to use all diagonal preconditioners (fastest/lowest mem but slower learning)
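
To make the three modes concrete, the sketch below maps a layer's shape and a `memory_save_mode` value to a per-dimension flag saying whether that dimension would get a diagonal preconditioner (True) or a triangular one (False). The helper name `diagonal_mask` is hypothetical, and picking the largest dimension for 'one_diag' is an assumption; the release note only says "one diagonal per layer".

```python
def diagonal_mask(shape, memory_save_mode=None):
    """Return one bool per dim: True = diagonal, False = triangular.

    Hypothetical helper for illustration; not the library's code.
    """
    if memory_save_mode is None:
        # Default: triangular preconditioners for every dim.
        return [False] * len(shape)
    if memory_save_mode == "one_diag":
        mask = [False] * len(shape)
        if shape:
            # Assumption: the single diagonal goes to the largest dim,
            # since that is where a triangular factor costs the most memory.
            largest = max(range(len(shape)), key=lambda i: shape[i])
            mask[largest] = True
        return mask
    if memory_save_mode == "all_diag":
        # Lowest memory: diagonal preconditioners everywhere.
        return [True] * len(shape)
    raise ValueError(
        "memory_save_mode must be None, 'one_diag', or 'all_diag', "
        f"got {memory_save_mode!r}"
    )
```

Under these assumptions, an embedding-like weight of shape (50000, 512) with 'one_diag' would keep a 512 x 512 triangular factor and use a diagonal for the 50000-sized dimension.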
kron-torch 0.2.1
What's Changed
- Better compiling, work with @opooladz