fix GSU bug: PostGSU kernel refer to Nan data of C matrix even when b… #1217

jichangjichang · 2020-11-08T15:20:02Z

…eta is zero

aazz44ss

LGTM

jichangjichang · 2020-11-09T03:04:35Z

Built rocBLAS with the Tensile from this PR; all rocblas-test results pass.
[----------] Global test environment tear-down
[==========] 1260834 tests from 510 test suites ran. (19298742 ms total)
[ PASSED ] 1260834 tests.
rocBLAS version: 2.32.0.2843-1b7ee568 (new Tensile client)

rocblas: https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/rocm-3.10.x
Tensile: https://github.com/ROCmSoftwarePlatform/Tensile/tree/rocm-3.10.x + This commit

rosenrodt · 2020-11-09T07:28:40Z

Are we going to add the test cases in Tensile as well as in rocBLAS?

jichangjichang · 2020-11-09T07:38:43Z

> Are we going to add the test cases in Tensile as well as in rocBLAS?

No plan to add them for the hotfix.
For Tensile develop, we can simply initialize the C matrix data to NaN in the GSU pre-checkin tests.
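
A minimal sketch of why NaN-initialized C exposes this class of bug, assuming a simplified post-reduction step that sums the per-split partial results and then applies alpha and beta. The function names `post_gsu_reduce` and `naive_reduce` are hypothetical and this is not Tensile's kernel code; it only illustrates that under IEEE 754 arithmetic `0.0 * NaN` is NaN, so the C term has to be skipped, not just multiplied by zero, when beta is zero.

```python
import math


def post_gsu_reduce(partials, c, alpha, beta):
    """Sum per-split partial products, then apply the alpha/beta epilogue.

    Hypothetical stand-in for a post-GSU reduction, not Tensile's kernel.
    """
    out = []
    for i in range(len(c)):
        acc = sum(p[i] for p in partials)
        if beta == 0.0:
            # Skip the C term entirely so NaN or stale data in C cannot leak.
            out.append(alpha * acc)
        else:
            out.append(alpha * acc + beta * c[i])
    return out


def naive_reduce(partials, c, alpha, beta):
    # Unguarded form: always reads C, so 0.0 * NaN = NaN poisons the output.
    return [alpha * sum(p[i] for p in partials) + beta * c[i]
            for i in range(len(c))]


# Two splits of the summation dimension, one partial sum per output element.
partials = [[1.0, 2.0], [3.0, 4.0]]
# C initialized to NaN, as proposed for the pre-checkin test.
c_nan = [math.nan, math.nan]

guarded = post_gsu_reduce(partials, c_nan, alpha=1.0, beta=0.0)
naive = naive_reduce(partials, c_nan, alpha=1.0, beta=0.0)
```

With C filled with NaN and beta set to zero, any NaN in the output indicates that C was read when it should not have been, which is what makes this initialization a cheap regression check.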

jichangjichang · 2020-11-09T10:16:13Z

Failed CI tests are not caused by this PR.

* add reference implementation for summation dimension mirroring
* added mirroring dims parameters
* set mirroring to false by default
* properly extract mirror dims from ProblemType
* mirror dims source writer initial impl for A
* implement MirrorDimsA for LocalSplitU > 1
* fix tail loop global read pointer bounds check
* mirror dims source writer initial impl for B
* don't mirror the resulting tile when shifting the components
* add a failing test case for mirror dims (works for A, needs to be fixed for B)
* fixed tail loop
* fix tail loop read offsets
* update test cases: mirroring works for 2x2 and 2x4 but not 4x2 tiles
* implement MirrorDimsA for not-LocalSplitU assembly kernel
* add a test suite for mirroring with LocalSplitU
* fixed mirroring code gen conditions
* implement MirrorDimsB for LocalSplitU > 1
* fixed data writer with B0_E1 label
* correct mirror the resulting tile (assembly kernel)
* fix cpu reference for contractions with > 1 summation dimension
* higher-dimensional mirroring: implement unroll dimension mirroring for tensor A
* higher-dimensional mirroring: implement unroll dimension mirroring for tensor B
* explicit reverse values in th thread tile
* properly guard read address in tail loop
* don't mirror non-summation dimensions for tensor A
  `MirrorDimsB` needs additional work, especially for cases with LocalSplitU > 1 since it was implemented slightly differently (perhaps this is a good time to ensure the implementations match each other?)
* don't mirror non-summation dimensions in the assembly kernel writer
* updated assembly mirroring test cases
* extended mirror dims implementation for the 2sum
* updated tests for the 2sum assembly implementation
* fixed tile loop for the 2sum, 3sum and etc
* fixed inc for the mirrored summation dims
* fixed gro for several mirror sum dims
* use gra increment for unrolled dims instead of changing incSrd logic
* support packSumDims with griUseSgpr
* capitalize mirrored dimension index names in operationIdentifier
* fixed increment with stagger iter
* correctly read values with graIdx>0
* Fix other sum idx mirroring with psd
* update assembly mirror test cases
* Fix use of the same sign-extend reg with psd and mirroring
* correct buffer loading with SgprGRO
* don't mirror non-summation dimensions for tensor B
* updated mirror dims tests
* support mirroring with zeroPad
* Fix mirror other summation dims in the Source writer
* Fix several sum dim mirroring
* support mirroring with zeroPad in the Source writer
* update mirror test cases
* Fix global read with nrcv > 1 for mirror B
* remove *-mirror-dims client args (operationIdentifier specifies the mirrored dimensions now)
* incorrect mirror check in the graUnrollOffset fixed
* revert tensor output back
* fixed typo in variable offsetIsVgpr
* fixed incorrect removing of one iteration for negative numbers when calculation WrapU
* fix GSU bug: PostGSU kernel refer to Nan data of C matrix even when beta is zero (#1217)
* fix mirroring for summation dim
* fixed outdated assertion

Co-authored-by: johnny-keker <giomail.iv@gmail.com>
Co-authored-by: Slimakanzer <gleb.larochkin@gmail.com>
Co-authored-by: Lapo Lapo <glarochk@amd.com>

fix GSU bug: PostGSU kernel refer to Nan data of C matrix even when b…

d8725cc

…eta is zero

jichangjichang requested review from bragadeesh, solaslin, ramjana, aazz44ss, rosenrodt, zaliu, imcarsonliao and AlexBrownAMD November 8, 2020 15:20

aazz44ss approved these changes Nov 8, 2020

imcarsonliao approved these changes Nov 9, 2020

zaliu approved these changes Nov 9, 2020

jichangjichang added the bug (Bug fix) and Hotfix (Hotfix to quickly address breakage) labels Nov 9, 2020

jichangjichang merged commit ab44bf4 into ROCm:rocm-3.10.x Nov 9, 2020

jichangjichang mentioned this pull request Nov 9, 2020

[Hotfix] update tensile-tag for SWDEV-257443 ROCm/rocBLAS#1176

Merged
