This repository was archived by the owner on Apr 11, 2025. It is now read-only.
v0.3.0
Overview
- CUDA kernel improvements: support models whose hidden_size is divisible only by 32 or 64, instead of requiring divisibility by 256.
- Peft integration: support training and inference using LoRA, AdaLoRA, AdaptionPrompt, etc.
- New models: BaiChuan, InternLM.
- Other updates: see 'Full Change Log' below for details.
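To illustrate the relaxed kernel requirement above, here is a minimal sketch (not the library's actual API; the function name and divisor list are illustrative assumptions) of picking the most specialized supported kernel dimension for a given hidden_size:

```python
# Illustrative sketch only: which divisor group a model's hidden_size
# falls into after this release. Divisors are checked from the most
# specialized (256) down to the newly supported 64 and 32.
SUPPORTED_DIVISORS = (256, 64, 32)  # assumed ordering, for illustration

def kernel_divisor(hidden_size: int):
    """Return the largest supported divisor of hidden_size, or None."""
    for d in SUPPORTED_DIVISORS:
        if hidden_size % d == 0:
            return d
    return None

print(kernel_divisor(4096))  # divisible by 256: supported before this release
print(kernel_divisor(4544))  # only divisible by 64: newly supported
print(kernel_divisor(2208))  # only divisible by 32: newly supported
```

Models whose hidden_size is a multiple of 32 or 64 but not 256 previously could not use the CUDA kernels at all; this release extends coverage to them.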
Full Change Log
What's Changed
- Pytorch qlinear by @qwopqwop200 in #116
- Specify UTF-8 encoding for README.md in setup.py by @EliEron in #132
- Support cuda 64dim by @qwopqwop200 in #126
- Support 32dim by @qwopqwop200 in #125
- Peft integration by @PanQiWei in #102
- Support setting inject_fused_attention and inject_fused_mlp to False by @TheBloke in #134
- Add transpose operator when replace Conv1d with qlinear_cuda_old by @geekinglcq in #140
- Add support for BaiChuan model by @LaaZa in #164
- Fix error message by @AngainorDev in #141
- Add support for InternLM by @cczhong11 in #189
- Fix stale documentation by @MarisaKirisame in #158
New Contributors
- @EliEron made their first contribution in #132
- @geekinglcq made their first contribution in #140
- @AngainorDev made their first contribution in #141
- @cczhong11 made their first contribution in #189
- @MarisaKirisame made their first contribution in #158
Full Changelog: v0.2.1...v0.3.0