8000 Tags · MegEngine/cutlass · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

Tags: MegEngine/cutlass

Tags

v2.3.0

Toggle v2.3.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge pull request NVIDIA#135 from NVIDIA/cutlass_2.3_final

CUTLASS 2.3.0

v2.2.0

Toggle v2.2.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>. (N…

…VIDIA#100)

- Updated mma_sm80.h to avoid perf penalty due to reinterpret_cast<>.
- Enhancement to CUTLASS Utility Library's HostTensorPlanarComplex template to support copy-in and copy-out
- Added test_examples target to build and test all CUTLASS examples
- Minor edits to documentation to point to GTC 2020 webinar

v2.1.0

Toggle v2.1.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
update tools/library/CMakeLists to require python 3.6 according to NV…

…IDIA#70 (NVIDIA#82)

NVIDIA#70 only updates the documentation. This commit reflects this bump in python version to the CMake configuration as well.

v2.0.0

Toggle v2.0.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Need Python 3.6 to use enum.auto() (NVIDIA#70)

v1.3.3

Toggle v1.3.3's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Performance enhancement for Volta Tensor Cores TN layout (NVIDIA#53)

* Fixed performance defect with indirect access to pointer array for Volta TensorCores TN arrangement.

* Updated patch version and changelog.

* Updated patch version and changelog.

* Added link to changelog in readme.

* Fixed markdown link

v1.3.2

Toggle v1.3.2's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Performance enhancement for Volta Tensor Cores TN layout (NVIDIA#53)

* Fixed performance defect with indirect access to pointer array for Volta TensorCores TN arrangement.

* Updated patch version and changelog.

* Updated patch version and changelog.

* Added link to changelog in readme.

* Fixed markdown link

v1.3.0

Toggle v1.3.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Cutlass 1.3 Release (NVIDIA#42)

CUTLASS 1.3 Release
- Efficient GEMM kernel targeting Volta Tensor Cores via mma.sync instruction added in CUDA 10.1.

v1.2.0

Toggle v1.2.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge pull request NVIDIA#33 from NVIDIA/cutlass_1.2

CUTLASS 1.2

v1.1.0

Toggle v1.1.0's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge pull request NVIDIA#28 from NVIDIA/cutlass_1.1

Fixed typeo

v1.0.1

Toggle v1.0.1's commit message

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
Merge pull request NVIDIA#15 from NVIDIA/release_1.0.1_edits

Minor edits to README and changelog pursuant CUTLASS 1.0.1 patch.
0