8000 fix GSU bug: PostGSU kernel refer to Nan data of C matrix even when b… by jichangjichang · Pull Request #1217 · ROCm/Tensile · GitHub
[go: up one dir, main page]
More Web Proxy on the site http://driver.im/
Skip to content

fix GSU bug: PostGSU kernel refer to Nan data of C matrix even when b… #1217

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jichangjichang
Copy link
Contributor

…eta is zero

Copy link
Contributor
@aazz44ss aazz44ss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jichangjichang
Copy link
Contributor Author

build rocblas with PR Tensile, rocblas-test results are all pass.
[----------] Global test environment tear-down
[==========] 1260834 tests from 510 test suites ran. (19298742 ms total)
[ PASSED ] 1260834 tests.
rocBLAS version: 2.32.0.2843-1b7ee568 (new Tensile client)

rocblas: https://github.com/ROCmSoftwarePlatform/rocBLAS/tree/rocm-3.10.x
Tensile: https://github.com/ROCmSoftwarePlatform/Tensile/tree/rocm-3.10.x + This commit

@rosenrodt
Copy link
Contributor

Are we going to add the test cases in Tensile as well as in rocBLAS?

@jichangjichang
Copy link
Contributor Author

Are we going to add the test cases in Tensile as well as in rocBLAS?
no plan for hotfix
For Tensile develop, we can simply apply init-data of C matrix as NaN for GSU precheck-in test.

@jichangjichang jichangjichang added bug Bug fix Hotfix Hotfix to quickly address breakage labels Nov 9, 2020
@jichangjichang
Copy link
Contributor Author

Failed CI tests are not caused by this PR.

@jichangjichang jichangjichang merged commit ab44bf4 into ROCm:rocm-3.10.x Nov 9, 2020
zaliu pushed a commit that referenced this pull request Nov 19, 2020
* add reference implementation for summation dimension mirroring
* added mirroring dims parameters
* set mirroring to false by default
* properly extract mirror dims from ProblemType
* mirror dims source writer initial impl for A
* implement MirrorDimsA for LocalSplitU > 1
* fix tail loop global read pointer bounds check
* mirror dims source writer initial impl for B
* don't mirror the resulting tile when shifting the components
* add a failing test case for mirror dims (works for A, needs to be fixed for B)
* fixed tail loop
* fix tail loop read offsets
* update test cases: mirroring works for 2x2 and 2x4 but not 4x2 tiles
* implement MirrorDimsA for not-LocalSplitU assembly kernel
* add a test suite for mirroring with LocalSplitU
* fixed mirroring code gen conditions
* implement MirrorDimsB for LocalSplitU > 1
* fixed data writer with B0_E1 label
* correct mirror the resulting tile (assembly kernel)
* fix cpu reference for contractions with > 1 summation dimension
* higher-dimensional mirroring: implement unroll dimension mirroring for tensor A
* higher-dimensional mirroring: implement unroll dimension mirroring for tensor B
* explicit reverse values in th thread tile
* properly guard read address in tail loop
* don't mirror non-summation dimensions for tensor A
`MirrorDimsB` needs additional work, especially for cases with LocalSplitU > 1
since it was implemented slightly differently (perhaps this is a good
time to ensure the implementations match each other?)
* don't mirror non-summation dimensions in the assembly kernel writer
* updated assembly mirroring test cases
* extended mirror dims implementation for the 2sum
* updated tests for the 2sum assembly implementation
* fixed tile loop for the 2sum, 3sum and etc
* fixed inc for the mirrored summation dims
* fixed gro for several mirror sum dims
* use gra increment for unrolled dims instead of changing incSrd logic
* support packSumDims with griUseSgpr
* capitalize mirrored dimension index names in operationIdentifier
* fixed increment with stagger iter
* correctly read values with graIdx>0
* Fix other sum idx mirroring with psd
* update assembly mirror test cases
* Fix use of the same sign-extend reg with psd and mirroring
* correct buffer loading with SgprGRO
* don't mirror non-summation dimensions for tensor B
* updated mirror dims tests
* support mirroring with zeroPad
* Fix mirror other summation dims in the Source writer
* Fix several sum dim mirroring
* support mirroring with zeroPad in the Source writer
* update mirror test cases
* Fix global read with nrcv > 1 for mirror B
* remove *-mirror-dims client args (operationIdentifier specifies the mirrored dimensions now)
* incorrect mirror check in the graUnrollOffset fixed
* revert tensor output back
* fixed typo in variable offsetIsVgpr
* fixed incorrect removing of one iteration for negative numbers when calculation WrapU
* fix GSU bug: PostGSU kernel refer to Nan data of C matrix even when beta is zero (#1217)
* fix mirroring for summation dim
* fixed outdated assertion

Co-authored-by: johnny-keker <giomail.iv@gmail.com>
Co-authored-by: Slimakanzer <gleb.larochkin@gmail.com>
Co-authored-by: Lapo Lapo <glarochk@amd.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug fix Hotfix Hotfix to quickly address breakage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0