GitHub - cjmcv/flux at pure
forked from bytedance/flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

Cutlass Operators

Install from Source

git clone 

# For Ampere(sm80) GPU
./build.sh --arch 80 --jobs 6
# For Ada Lovelace(sm89) GPU
./build.sh --arch 89 --jobs 6
# For Hopper(sm90) GPU
./build.sh --arch 90 --jobs 6

# Check the test scripts for GPU memory errors with compute-sanitizer's memcheck tool
compute-sanitizer --tool memcheck python tools/test*.py
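
The build commands above differ only in the value passed to `--arch`, which matches the GPU's compute capability (8.0, 8.9, or 9.0). As a minimal sketch, the mapping could be expressed as a small helper. The names `ARCH_BY_CAPABILITY` and `build_command` are hypothetical and not part of Flux; only the three architectures listed above are covered.

```python
# Hypothetical helper, not part of Flux: maps a CUDA compute capability
# to the build.sh invocation shown in the install commands.
ARCH_BY_CAPABILITY = {
    (8, 0): 80,  # Ampere
    (8, 9): 89,  # Ada Lovelace
    (9, 0): 90,  # Hopper
}


def build_command(major: int, minor: int, jobs: int = 6) -> list[str]:
    """Return the build.sh argument list for a (major, minor) capability."""
    try:
        arch = ARCH_BY_CAPABILITY[(major, minor)]
    except KeyError:
        # Only the architectures named in the install commands are known here.
        raise ValueError(f"no build example listed for sm{major}{minor}") from None
    return ["./build.sh", "--arch", str(arch), "--jobs", str(jobs)]


if __name__ == "__main__":
    # Example: print the command for a Hopper (sm90) GPU.
    print(" ".join(build_command(9, 0)))
```

If PyTorch is installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` pair for the current device, which could be passed straight to this helper.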

Dependencies

  1. CUTLASS: Flux uses CUTLASS to generate high-performance GEMM kernels. The current version is CUTLASS 3.7.0, and a small patch must be applied to it.

Quick Start

# Generate search_space_gemmnormal.cu 
# Move it to src/ops/gemm_normal/tuning_config, and compile the library again.
python3 tools/gen_search_space.py --schema=GemmNormal

# Generate tuned_config_gemmnormal.cu
# Move it to src/ops/gemm_normal/tuning_config, and compile the library again.
python3 tools/tuning/tune_gemm_normal.py --schema=GemmNormal

# Now you can test it.
python3 tools/test_gemm_normal.py 100 12288 6144 --dtype=float16
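
To get a feel for the size of the test problem above, the sketch below estimates the work and memory of a single GEMM. It assumes the three positional arguments are the GEMM dimensions M, N, K in that order (the source does not state this) and that `float16` takes 2 bytes per element. The function `gemm_cost` is hypothetical and not part of Flux.

```python
def gemm_cost(m: int, n: int, k: int, bytes_per_element: int = 2) -> tuple[int, int]:
    """Estimate FLOPs and operand bytes for C[m, n] = A[m, k] @ B[k, n].

    Assumption: one multiply and one add per (m, n, k) triple, and each of
    A, B, and C is stored once at `bytes_per_element` bytes per element.
    """
    flops = 2 * m * n * k
    elements = m * k + k * n + m * n
    return flops, elements * bytes_per_element


if __name__ == "__main__":
    # Assumed reading of the test command: M=100, N=12288, K=6144, float16.
    flops, nbytes = gemm_cost(100, 12288, 6144)
    print(f"{flops / 1e9:.1f} GFLOP, {nbytes / 2**20:.1f} MiB of operands")
```

Under these assumptions, dividing the FLOP count by a measured kernel time gives an achieved-throughput figure that can be compared between tuned and untuned configurations.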
