[EPIC][GPU][DT] Bring up GPU data-tiling with reasonable performance #17181
Labels
codegen/rocm
ROCm code generation compiler backend (HIP/HSA)
codegen
Shared code generation infrastructure and dialects
Overview
This is the umbrella issue that collects the tasks for phase 1. In phase 1, we aim to provide a functional data-tiling GPU path with reasonable performance. We are not chasing optimal performance in this phase; instead, we want to enable the path for all e2e tracking models.
Reasonable performance means that we should be able to vectorize data-tiling ops (i.e., pack/unpack/mmt4d-like ops) and apply vector distribution to them.
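For readers new to data-tiling, here is a minimal pure-Python sketch of what the pack/mmt4d/unpack sequence computes for a plain matmul. It is only an illustration of the semantics, not the compiler's implementation. The function names and the tile sizes `m0`, `n0`, `k0` are hypothetical; real tile sizes would be chosen per target.

```python
def pack(matrix, tile0, tile1):
    """Tile a 2-D list into [outer0][outer1][tile0][tile1], zero-padding the edges."""
    rows, cols = len(matrix), len(matrix[0])
    outer0 = -(-rows // tile0)  # ceiling division
    outer1 = -(-cols // tile1)
    packed = [[[[0] * tile1 for _ in range(tile0)]
               for _ in range(outer1)]
              for _ in range(outer0)]
    for r in range(rows):
        for c in range(cols):
            packed[r // tile0][c // tile1][r % tile0][c % tile1] = matrix[r][c]
    return packed


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def mmt4d(lhs, rhs):
    """lhs is [M1][K1][M0][K0], rhs is [N1][K1][N0][K0]; returns [M1][N1][M0][N0]."""
    m1, k1 = len(lhs), len(lhs[0])
    m0, k0 = len(lhs[0][0]), len(lhs[0][0][0])
    n1, n0 = len(rhs), len(rhs[0][0])
    acc = [[[[0] * n0 for _ in range(m0)] for _ in range(n1)] for _ in range(m1)]
    for i1 in range(m1):
        for j1 in range(n1):
            for p1 in range(k1):
                for i0 in range(m0):
                    for j0 in range(n0):
                        for p0 in range(k0):
                            acc[i1][j1][i0][j0] += (
                                lhs[i1][p1][i0][p0] * rhs[j1][p1][j0][p0])
    return acc


def unpack(packed, rows, cols):
    """Inverse of pack: drop the padding and return a rows x cols 2-D list."""
    tile0, tile1 = len(packed[0][0]), len(packed[0][0][0])
    return [[packed[r // tile0][c // tile1][r % tile0][c % tile1]
             for c in range(cols)]
            for r in range(rows)]


def matmul_reference(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def data_tiled_matmul(a, b, m0=2, n0=2, k0=2):
    # Hypothetical tile sizes; the RHS is packed in transposed form for mmt4d.
    lhs = pack(a, m0, k0)
    rhs = pack(transpose(b), n0, k0)
    return unpack(mmt4d(lhs, rhs), len(a), len(b[0]))
```

The inner `i0`/`j0`/`p0` loops operate on one fixed-size tile, which is the part that vectorization and vector distribution would target.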
ETA: ~1 month
Milestone 1 - enable data-tiling in tests/e2e/matmul test suite
The scope is to compile and execute a linalg.matmul and enable the e2e tests. Additionally, we want to extract a few matmul ops (potentially with dequant ops) from the sdxl and llama models and focus on them. To achieve the milestone, the major tasks are:
@bjacob let's share the above tasks between you and me. I'll convert the tasks into issues soon.
Milestone 2 - enable at least one e2e model on benchmark CI
This milestone mainly focuses on fusion codegen, which allows us to compile and execute ML workloads. For now, the targets are sdxl and sd3.
Major tasks:
Assigning @MaheshRavishankar as the contact point for milestone 2, because he is tracking the TilingInterface support. I can jump into some tasks when needed.