Tags: MegEngine/cutlass
Merge pull request NVIDIA#135 from NVIDIA/cutlass_2.3_final
CUTLASS 2.3.0
Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (NVIDIA#100)
- Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
- Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
- Added test_examples target to build and test all CUTLASS examples
- Minor edits to documentation to point to GTC 2020 webinar
Performance enhancement for Volta Tensor Cores TN layout (NVIDIA#53)
* Fixed performance defect with indirect access to pointer array for Volta TensorCores TN arrangement.
* Updated patch version and changelog.
* Updated patch version and changelog.
* Added link to changelog in readme.
* Fixed markdown link