-
Notifications
You must be signed in to change notification settings - Fork 166
fix GSU bug: PostGSU kernel refer to Nan data of C matrix even when b… #1217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix GSU bug: PostGSU kernel refer to Nan data of C matrix even when b… #1217
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
build rocblas with PR Tensile, rocblas-test results are all pass. rocblas: https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/rocm-3.10.x |
Are we going to add the test cases in Tensile as well as in rocBLAS? |
|
Failed CI tests are not caused by this PR. |
* add reference implementation for summation dimension mirroring * added mirroring dims parameters * set mirroring to false by default * properly extract mirror dims from ProblemType * mirror dims source writer initial impl for A * implement MirrorDimsA for LocalSplitU > 1 * fix tail loop global read pointer bounds check * mirror dims source writer initial impl for B * don't mirror the resulting tile when shifting the components * add a failing test case for mirror dims (works for A, needs to be fixed for B) * fixed tail loop * fix tail loop read offsets * update test cases: mirroring works for 2x2 and 2x4 but not 4x2 tiles * implement MirrorDimsA for not-LocalSplitU assembly kernel * add a test suite for mirroring with LocalSplitU * fixed mirroring code gen conditions * implement MirrorDimsB for LocalSplitU > 1 * fixed data writer with B0_E1 label * correct mirror the resulting tile (assembly kernel) * fix cpu reference for contractions with > 1 summation dimension * higher-dimensional mirroring: implement unroll dimension mirroring for tensor A * higher-dimensional mirroring: implement unroll dimension mirroring for tensor B * explicit reverse values in th thread tile * properly guard read address in tail loop * don't mirror non-summation dimensions for tensor A `MirrorDimsB` needs additional work, especially for cases with LocalSplitU > 1 since it was implemented slightly differently (perhaps this is a good time to ensure the implementations match each other?) * don't mirror non-summation dimensions in the assembly kernel writer * updated assembly mirroring test cases * extended mirror dims implementation for the 2sum * updated tests for the 2sum assembly implementation * fixed tile loop for the 2sum, 3sum and etc * fixed inc for the mirrored summation dims * fixed gro for several mirror sum dims * use gra increment for unrolled dims instead of changing incSrd logic * support packSumDims with griUseSgpr * capitalize mirrored dimension index names in operationIdentifier * fixed increment with stagger iter * correctly read values with graIdx>0 * Fix other sum idx mirroring with psd * update assembly mirror test cases * Fix use of the same sign-extend reg with psd and mirroring * correct buffer loading with SgprGRO * don't mirror non-summation dimensions for tensor B * updated mirror dims tests * support mirroring with zeroPad * Fix mirror other summation dims in the Source writer * Fix several sum dim mirroring * support mirroring with zeroPad in the Source writer * update mirror test cases * Fix global read with nrcv > 1 for mirror B * remove *-mirror-dims client args (operationIdentifier specifies the mirrored dimensions now) * incorrect mirror check in the graUnrollOffset fixed * revert tensor output back * fixed typo in variable offsetIsVgpr * fixed incorrect removing of one iteration for negative numbers when calculation WrapU * fix GSU bug: PostGSU kernel refer to Nan data of C matrix even when beta is zero (#1217) * fix mirroring for summation dim * fixed outdated assertion Co-authored-by: johnny-keker <giomail.iv@gmail.com> Co-authored-by: Slimakanzer <gleb.larochkin@gmail.com> Co-authored-by: Lapo Lapo <glarochk@amd.com>
…eta is zero