Main improvements
- `fp32_fast_tf32_t` data type to store data in `fp32_t` while doing compute-bound operations within the `tf32_t` data type
- `bf16_t` data type
- `LlamaForCausalLM` model
- QA: automatic testing and linting
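
As a rough illustration of the `fp32_fast_tf32_t` idea (values stored in fp32, multiplication inputs reduced to TF32 precision, accumulation kept in fp32), here is a minimal pure-Python sketch. It is not the library's implementation: the helper names `to_fp32`, `to_tf32` and `dot_fp32_fast_tf32` are hypothetical, and TF32 is emulated by simply truncating the low 13 mantissa bits of a float32, which may differ from the rounding that real hardware applies.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x: float) -> float:
    """Emulate TF32 precision (assumption: plain truncation, not hardware rounding).

    TF32 keeps the sign bit, 8 exponent bits and the top 10 mantissa bits
    of an fp32 value, so the low 13 mantissa bits are zeroed here.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000
    return struct.unpack("<I".replace("I", "f"), struct.pack("<I", bits))[0]


def dot_fp32_fast_tf32(a, b):
    """Dot product: inputs are stored as fp32, multiplied at TF32 precision,
    and accumulated in fp32."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = to_fp32(acc + to_tf32(x) * to_tf32(y))
    return acc


# Stored values keep full fp32 precision...
stored = [to_fp32(1.0 + 2.0 ** -11), 2.0]
weights = [1.0, 0.5]
# ...but the low mantissa bits do not take part in the multiplication.
result = dot_fp32_fast_tf32(stored, weights)
```

The point of the sketch is the split between storage and compute precision: the stored data is untouched, so switching back to full fp32 compute loses nothing, while the compute-bound step works with fewer mantissa bits.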
What's Changed
- Update readme by @Muxas in #1
- Muxas/sha by @Muxas in #15
- Set up basic GitHub CI workflow by @daskol in #45
- Muxas/license header by @Muxas in #50
- Muxas/simgrid by @Muxas in #51
- Run pre-commit checks on diff with trunk by @daskol in #58
- Fix linting for forks by @daskol in #61
- Fp32 fast tf32 by @Muxas in #60
- Run linting and building workflows on push by @daskol in #65
- Lint everything on merge by @daskol in #66
- Fix linting on merge by @daskol in #67
- Decouple functions from tensor by @svtdanny in #62
- new basic types by @Muxas in #64
- Add BatchNorm2d implementation by @svtdanny in #63
- Fix build by @svtdanny in #70
- Muxas/fix cpp tests by @Muxas in #72
- Fix invalidate_submit bug with FlashAttention logic by @Muxas in #69
- Logger by @Muxas in #68
- Update Dockerfile by @Muxas in #53
- Add support of bf16_t for DeepRelu example by @amkatrutsa in #71
- Fix base_types and tests/kernel/randn by @Muxas in #77
- Run pre-commit for whitespaces and empty lines by @Muxas in #78
- Add default args to starpu::Config and fix gpt2_custom_train by @Muxas in #79
- Improve logger server options by @Muxas in #80
- Set default env values for logger server by @Muxas in #81
- Fix logger/server.py by @Muxas in #82
- Add handy methods for gpt2 class by @svtdanny in #83
- Add support of bf16 for gpt2 training by @amkatrutsa in #87
- Add greedy generation strategy by @svtdanny in #88
- Add upper level Dockerfile, update README by @Muxas in #89
- Add base unoptimized inference engine by @svtdanny in #90
- Rework GPT2 examples and optimizers by @Muxas in #92
- add inference server + example by @svtdanny in #96
- Fix SGD to support bf16 by @amkatrutsa in #97
- Add SiLU activation by @amkatrutsa in #93
- Adjust linting and testing CI workflows by @daskol in #94
- Add workflow for nightly linting and testing by @daskol in #98
- Add rmsnorm and test by @amkatrutsa in #100
- Muxas/gqa by @Muxas in #101
- Refurbish python tests for green trunk by @daskol in #99
- Add handling bus info for logger by @multeng in #103
- add usage memory size handling by @multeng in #107
- Add typing stubs for a native extension by @daskol in #108
- Configure regular typing checks by @daskol in #109
- Incorporate Rotary positional embedding into `LlamaAttention` by @Muxas in #106
- Maintain typing compatibility with `python_version<3.12` by @daskol in #110
- Add Llama model by @amkatrutsa in #111
- Revise LLaMA ingredients testing by @amkatrutsa in #120
- Add model LLaMaForCausalLM and tests by @amkatrutsa in #122
- Svtdanny/kvcache attention by @svtdanny in #121
- Improve utils/constructors.py by @Muxas in #124
- Pin down version of `action/upload-artifact` by @daskol in #132
- Conv2D layer with fwd/bwd, mixed precision and testing by @Muxas in #131
- Svtdanny/dynamic layers by @svtdanny in #123
- Support stride parameter for Conv2d layer by @Muxas in #133
- conv2d dilation parameter by @Muxas in #136
- Add `Add` layer into `__init__.py` by @Muxas in #137
- Update Dockerfile and README by @Muxas in #138
- Fix lint of some python files by @Muxas in #139
- Ruff+isort optimizers by @Muxas in #140
- Lint DeepLinear, DeepRelu, MLPMixer models by @Muxas in #141
- Lint many python files by @Muxas in #142
- Lint other not-currently-PRed python files by @Muxas in #143
- Norm-fiber op for batchnorm2d by @gogolgrind in #105
- Add simple GPT2 training example via Jupyter notebook by @Muxas in #144
- Add LLaMa training script by @amkatrutsa in #125
- Add Llama jupyter notebook by @Muxas in #145
- Bump version; add missing copyright headers by @Muxas in #146
New Contributors
- @Muxas made their first contribution in #1
- @daskol made their first contribution in #45
- @svtdanny made their first contribution in #62
- @amkatrutsa made their first contribution in #71
- @multeng made their first contribution in #103
- @gogolgrind made their first contribution in #105
Full Changelog: 1.0.0...1.1.0