## Announcements

### `--iree-hal-target-backends` deprecation

The `--iree-hal-target-backends` option is now deprecated, and some ways of specifying target backends, such as passing `llvm-cpu` directly to the `--iree-hal-target-device` flag, have been removed. See #20295.
In practice, this means the following replacements:
| Old Flag | New Flag |
| --- | --- |
| `--iree-hal-target-backends` | `--iree-hal-target-device` |
| `--iree-hal-target-device=llvm-cpu` | `--iree-hal-target-device=local --iree-hal-local-target-device-backends=llvm-cpu` |
| `--iree-hal-target-backends=rocm` | `--iree-hal-target-device=hip` |
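For example, a CPU compilation command migrates as sketched below (the input file `model.mlir` and output path are placeholders, not from this release):

```shell
# Old (deprecated): select the backend directly.
iree-compile --iree-hal-target-backends=llvm-cpu model.mlir -o model.vmfb

# New: select the `local` device and name its backend explicitly.
iree-compile \
  --iree-hal-target-device=local \
  --iree-hal-local-target-device-backends=llvm-cpu \
  model.mlir -o model.vmfb
```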
### Migration of Flow dependencies in Stream/Codegen to new TensorExt dialect
This update migrates the following Flow ops and types to a new TensorExt dialect to avoid any Flow dependency in Codegen and reduce Flow dependencies in Stream:
- `IREE::Flow::DispatchWorkloadOrdinalOp`
- `IREE::Flow::DispatchTensorLoadOp`
- `IREE::Flow::DispatchTensorStoreOp`
- `IREE::Flow::DispatchWorkgroupCountFromSliceOp`
- `IREE::Flow::DispatchWorkgroupCountFromDagRootOp`
- `IREE::Flow::DispatchTensorType`
Instructions for migrating downstream codebases can be found at #20564 (comment).
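For textual IR in downstream test suites, much of the migration is a mechanical rename. A minimal sketch, assuming the moved ops keep their mnemonics under an `iree_tensor_ext` dialect prefix (confirm the exact mnemonics against #20564 before running):

```shell
# Hypothetical helper: rewrite Flow dispatch-tensor ops and types to their
# assumed TensorExt equivalents on stdin. The iree_tensor_ext prefix is an
# assumption here, not taken from these notes.
migrate_flow_to_tensor_ext() {
  sed \
    -e 's/flow\.dispatch\.tensor\.load/iree_tensor_ext.dispatch.tensor.load/g' \
    -e 's/flow\.dispatch\.tensor\.store/iree_tensor_ext.dispatch.tensor.store/g' \
    -e 's/flow\.dispatch\.workload\.ordinal/iree_tensor_ext.dispatch.workload.ordinal/g' \
    -e 's/!flow\.dispatch\.tensor/!iree_tensor_ext.dispatch.tensor/g' \
    "$@"
}

# Example: a single line of IR before and after.
migrate_flow_to_tensor_ext <<'EOF'
%v = flow.dispatch.tensor.load %arg0 : !flow.dispatch.tensor<readonly:tensor<4xf32>>
EOF
```

To apply across a test tree, the same substitutions can be run in place, e.g. `find tests/ -name '*.mlir' -exec sed -i ... {} +`.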
## Notable changes

### Compiler
- Dispatch creation process improvements including adding support for hoisting Flow ops #20346, skipping Pad decomposition if already in a dispatch #20373, enabling more reshape movement #20320, allowing collapsing of more reduction operations #20544, and hoisting leaf `expand_shapes` out before collapsing dimensions #20552
- Data tiling optimizations including adding support for parallel generic op materialization patterns for GPU #20316, migrating `round_dims_to` to `iteration_sizes` #20459, and splitting out `iree_encoding.layout` from `EncodingAttr`, as it was encoding more information than needed #20507
- Completed the plumbing through of the matmul_k encoding for the padding approach (reduction dims) #20493
- Enhancements to code generation including merging multiple `transform.foreach_match` operations found across different inner modules into a single consolidated operation inside a newly created top-level `__kernel_config` `NamedSequenceOp` #20127, adding a pass for interpreting transform specs on lowering configs #20408, adding an op for swizzling memory accesses #20547, updating the kernel configuration in the case of multiple reduction dimensions #20585, adding an option to expand subview metadata #20591, removing dependencies on Flow #20615, adding ops to translate between tensor and memref #20619, and adding a placeholder op for buffer casts on tensors #20589
- Updates for AMDGPU support including adding a placeholder op for buffer casts on tensors #20589, and optimizing masked transfer reads in the presence of fat raw buffers #20604
- Linalg Extensions Dialect improvements (#19226, #20358, #20411, #20460, #20462, #20464, #20670, #20640)
### Runtime
- HIP runtime bug fixes and performance improvements including plumbing buffer allocation responsibility through the runtime and up into the lower-level stream dialect #20240, re-enabling peering for compatible devices #20481, and adding the `iree_hal_semaphore_import/export_timepoint` API #20560
### Developer Tools
- Updates to workflow CI and test suites (#20319, #20327, #20306, #20329, #20295, #20351, #20426, #20465, #20629)
## New Contributors
- @FantasqueX made their first contribution in #20270
- @Muzammiluddin-Syed-ECE made their first contribution in #20433
- @geomin12 made their first contribution in #20194
- @lpy made their first contribution in #20456
- @giacs-epic made their first contribution in #20516
- @Alex-Vasile made their first contribution in #20337
- @amd-justchen made their first contribution in #20665
## Full changelog

Full list of changes: v3.3.0...v3.4.0
## What's Changed
- Moving func-to-util patterns to Util/Conversion/. by @benvanik in #20262
- Delete flaky gisel_abort test. by @ScottTodd in #20319
- Skip flaky test_top_k onnx op test on rdna3. by @ScottTodd in #20327
- Run `windows_x64_msvc` on postsubmit and opt-in on presubmit. by @ScottTodd in #20306
- [DispatchCreation] Delete pad->set_encoding fusion logic. by @hanhanW in #20324
- Fix asm parsing for `Util_PreprocessingPipelineAttr` by @zjgarvey in #20314
- [DataTiling][Codegen] Use iree.encoding.resolver to decide encoding resolvers by @Max191 in #20271
- [Encoding][NFC] Use testing_encoding<> in lit tests when possible. by @hanhanW in #20329
- [Codegen] Keep lowering config when decomposing linalg.pack by @Max191 in #20311
- [Encoding] Delete the allocation support of roundDimsTo field. by @hanhanW in #20332
- [Flow][Stream][NFC] Use parser.parseArgument() for custom parsers. by @hanhanW in #20333
- Replacing use of iree-hal-target-backends in most tests. by @benvanik in #20295
- Extend IGEMM utils to support conv2d_chwn_chwf ops by @yzhang93 in #20330
- Limit concurrency of ci_windows_x64_msvc. by @ScottTodd in #20338
- [DT] Add more folders to MaterializeEncoding pass cleanup patterns. by @hanhanW in #20334
- Integrate llvm-project@857a04cd by @qedawkins in #20336
- Revert recent Windows CI changes, moving back to nightly only. by @ScottTodd in #20342
- [Stream] Add support for materializing stream.tensor.encode ops. by @hanhanW in #20321
- [Codegen][Tuner] merge the default td specs by @bangtianliu in #20127
- [GlobalOpt] Add option to propagate transposes through conv by @nirvedhmeshram in #20339
- [DispatchCreation] Convert top-level encoding ops to flow.tensor.encode. by @hanhanW in #20300
- Do not wrap a single set_encoding op into flow.dispatch.region. by @hanhanW in #20322
- Add pass to propagate tranposes pass when making single dispatch from function by @nirvedhmeshram in #20348
- Avoid nested fashion traversal in ModuleOp when possible. by @hanhanW in #20344
- Only fuse gather-like producer for query operand of attention ops. by @MaheshRavishankar in #19829
- Mark tensorflow_hub_import.ipynb expected to pass. by @ScottTodd in #20356
- Fixing iree.build target flags, I think. by @benvanik in #20355
- Bump version to 3.4.0 after 3.3.0 release. by @ScottTodd in #20360
- Avoid reenabling ASM by @FantasqueX in #20270
- Only build elf_module_test when IREE_BUILD_TESTS is ON by @FantasqueX in #20351
- [Dispatch Creation] Allow cloning attention k & v bit-extend by @IanWood1 in #20364
- Move sharktank and regression-test MI300 jobs to the ossci cluster. by @Eliasj42 in #20359
- Mark e2e testonly by @FantasqueX in #20370
- [DispatchCreation] Add support for hoisting Flow ops. by @hanhanW in #20346
- Bump the github-actions group with 3 updates by @dependabot in #20366
- Mark flags_demo testonly by @FantasqueX in #20369
- [DispatchCreation] Skip Pad decomposition within dispatches. by @MaheshRavishankar in #20373
- Do not wrap single unset_encoding op into dispatch region. by @hanhanW in #20377
- [python] Do not conflate hip and amdgpu targets in iree.build by @sogartar in #20372
- [DT] Add parallel generic op materialization pattern for GPU by @jtuyls in #20316
- Do not set CMAKE_CXX_STANDARD as CACHE by @FantasqueX in #20261
- Plumbing buffer allocation responsibility through the runtime and up into the lower-level stream dialect. by @benvanik in #20240
- Only pin affinity during analysis if an affinity is specified. by @benvanik in #20382
- Integrate LLVM at 94783a8199c5e589d8efd6d4530482d72bf98f4d by @andfau-amd in #20375
- Revert "Integrate LLVM at 94783a8199c5e589d8efd6d4530482d72bf98f4d" by @andfau-amd in #20399
- Add passes to do more preprocessing when creating single dispatch by @nirvedhmeshram in #20394
- IREE_BUILD_COMPILER doesn't require nanobind by @FantasqueX in #20403
- Use a better way to determine C11 support by @FantasqueX in #20384
- [Codegen][GPU] Allow pre-distributed multi_mma ops in distribution by @qedawkins in #20407
- [Util] Add transform op for getting nearby symbols by @qedawkins in #20405
- Do more aggressive reshape propagation in single-dispatch pipeline by @rkayaith in #20410
- [DataTiling] Don't set encodings in dispatch.region ops with workgroup counts by @Max191 in #20387
- [GPU] Add canonicalize and cse after im2col decomposition by @nirvedhmeshram in #20414
- Adding flags to most HAL methods and extending existing ones to i64. by @benvanik in #20368
- [LLVMGPU] Support linalg.pack through LLVMGPUTileAndFuse by @Max191 in #20312
- Add HIP data-tiling resolver to e2e encoding.mlir tests. by @hanhanW in #20416
- [Codegen] Teach MaterializeEncodingIntoPadding about resolved layouts. by @hanhanW in #20393
- [Codegen] Allow folding collapse shape into partial store by @kuhar in #20417
- Bump actions/setup-python from 5.4.0 to 5.5.0 in the github-actions group by @dependabot in #20425
- [GPU] Enable e2e matmul tests for padding approach. by @hanhanW in #20426
- Integrate llvm-project@8244f8210f2e by @krzysz00 in #20404
- Only include googletest if IREE_BUILD_TESTS by @FantasqueX in #20413
- setup release callback for cuda buffers by @sgjzfzzf in #20419
- [GlobalOpt] Generalize named conv generalization by @rkayaith in #20424
- Integrate llvm-project@79487757b7 by @krzysz00 in #20434
- [Codegen] Work around for an SSA use-def violation issue with tile and fuse consumer by @MaheshRavishankar in #20430
- [Tooling] Improve descriptions and warnings in command line tools for npy file inputs by @Muzammiluddin-Syed-ECE in #20433
- [Dispatch Creation] Enable more reshape movement by @IanWood1 in #20320
- Add news links to SDXL MLPerf blog using IREE. by @ScottTodd in #20443
- Add link to LLVM Social Bangalore talk from Mahesh. by @ScottTodd in #20445
- [VectorDistribution] Add kernel config for single reduction by @pashu123 in #20172
- Integrate llvm-porject@4b67c53e206 by @krzysz00 in #20449
- [VectorExt] Add `iree_vector_ext.transfer_gather` operation by @Groverkss in #20442
- [GPU] Update the cache info for padding resolver in cloneWithSimplifiedConfig. by @hanhanW in #20371
- [LLVMGPU] Set thread tile sizes to 1 when reshapes are present by @Max191 in #20455
- Support `TRACY_MANUAL_LIFETIME=ON` by @rkayaith in #20448
- [DT] Carefully cleanup the IRs before and after the encoding fusion. by @hanhanW in #20435
- [Flow] Support dispatch tracing when encodings are present by @Max191 in #20447
- [NFC][LinalgExt] Remove unused scatter member fns by @IanWood1 in #20457
- Migrating regression suite to iree-test-suite by @geomin12 in #20194
- [DataTiling] Use late materialization for e2e GPU matmul tests by @Max191 in #20458
- Fix im2col conversion for no batch case by @nirvedhmeshram in #20467
- [Encoding][NFC] Trim dependencies for Encoding dialect. by @hanhanW in #20469
- Do not unconditionally build runtime demo by @FantasqueX in #20380
- [VectorExt] Add canonicalizations for iree_vector_ext.transfer_gather by @Groverkss in #20454
- [Util] Allow inlining on cast_and_call by @qedawkins in #20406
- [Codegen][ROCDL] Add hacky pattern to swap setprio with mfma by @qedawkins in #20479
- [GlobalOpt] Generalize 1x1 group convolutions by @rkayaith in #20480
- Update googletest to latest main. by @ScottTodd in #20477
- [hip] Re-enable peering for compatible devices. by @AWoloszyn in #20481
- Teach PropagateDispatchSizeBounds about gpu.lane_id by @lpy in #20456
- [GPU] Add col_major optional attribute to MMAAttr by @qedawkins in #19860
- NFC: [Codegen][ROCDL] Cleanup swap pattern by @qedawkins in #20488
- Integrate llvm-project@bafa2f4442bc by @kuhar in #20492
- [DT] Migrate round_dims_to to iteration_sizes by @jtuyls in #20459
- add flux transformer spec by @PhaneeshB in #20478
- [Encoding] Introduce matmul_k encoding. by @hanhanW in #20484
- Update torch-mlir to llvm/torch-mlir@11d0853 by @rkayaith in #20490
- Integrate llvm-project@bb1f32ded0b7 by @kuhar in #20494
- [Encoding][NFC] Fix boundary comments for the dialect name. by @hanhanW in #20501
- [Preprocessing] Add op for matching dimension bounds by @qedawkins in #20502
- Fix typos. NFC. by @kuhar in #20503
- Integrate llvm/llvm-project@36cb81cced6c by @kuhar in #20506
- Integrate llvm/llvm-project@ef1088f70356 by @kuhar in #20512
- [tuner] expose python binding for getting the tuner root ops by @bangtianliu in #20438
- Adding a memref linearizer pass by @lialan in #20335
- [Dispatch Creation] Fix reshapes created by collapse dimensions by @IanWood1 in #20515
- Add tip about `TdrValue` to GPU debugging playbook by @giacs-epic in #20516
- [DT] Split out iree_encoding.layout from EncodingAttr. by @hanhanW in #20507
- [LinalgExt] Share FFT rewriting between Torch and StableHLO by @giacs-epic in #19226
- Introduce IREE_USE_SYSTEM_DEPS by @FantasqueX in #20471
- Integrate llvm/llvm-project@2f41fa387d67 by @kuhar in #20519
- Stop forcing approximation of `math.erf` with `--iree-codegen-gpu-native-math-precision` by @bjacob in #20074
- [Encoding] Implement SerializableEncodingAttrInterface for MatmulK. by @hanhanW in #20521
- [Encoding] Implement ContractionEncodingAttrInterface. by @hanhanW in #20514
- Move LinalgExt::SortOp to LLVMGPUTileAndFuse pipeline (#20358) by @Muzammiluddin-Syed-ECE in #20411
- [Codegen] For op canonicalization: generalize to multiple users by @newling in #20444
- Make `iree-codegen-gpu-native-math-precision` a NOP and warn of imminent removal. by @bjacob in #20523
- Always fuse encodings into dispatch region when the encoding is MatmulK. by @hanhanW in #20527
- [Encoding][NFC] Use assemblyFormat for testing encodings. by @hanhanW in #20520
- [DT] Add a flag to not hoist encodings when the source is ConstExpr. by @hanhanW in #20526
- Integrate llvm/llvm-project@32c39092eab3 by @kuhar in #20533
- [Encoding] Remove padFactor in set encoding pass by @pashu123 in #20532
- Drop manual bf16 handling (currently just in LLVMGPU) by @krzysz00 in #20313
- Properly set the upper bound of gpu.lane_id when rewriteForallToLanes. by @lpy in #20513
- [Encoding] Implement matmul_k encoding propagation across reshapes. by @hanhanW in #20367
- [LinalgExt] Add gather operation (1/5) by @IanWood1 in #20460
- [LinalgExt] Add TilingInterface support to GatherOp (2/5) by @IanWood1 in #20462
- [Codegen] Add pass for interpreting transform specs on lowering configs by @qedawkins in #20408
- [DataTiling] Add matmul_k option to SetEncoding pass. by @pashu123 in #20529
- Llama 8b f16 regression tests using random weights and inputs by @aviator19941 in #20487
- [Im2col] Add input-filter permutation info to im2col metadata by @yzhang93 in #20531
- Update torch-mlir to llvm/torch-mlir@9f2ba5a by @AmosLewis in #20545
- [LinalgExt] Support converting gather to loops (3/5) by @IanWood1 in #20464
- Add e2e tests for gather (4/5) by @IanWood1 in #20465
- Integrate llvm/llvm-project@2271f0bebd48 by @nithinsubbiah in #20556
- [Dispatch Creation] Remove collapse softmax-like check by @IanWood1 in #20544
- [NFC][AMDGPU] Refactor getSingleSubgroupLayout() for MFMAs by @krzysz00 in #20561
- [DispatchCreation] Hoist leaf expand_shapes out before collapsing dimensions by @nirvedhmeshram in #20552
- [GPU] Extend Vector distribute to multiple reductions by @pashu123 in #20310
- [NFC] Refactor FlattenMemRefPass. by @lialan in #20522
- Integrate llvm/llvm-project@52e3f3d by @nithinsubbiah in #20557
- [IGEMM] Refactor the logic to generate input indexing map by @yzhang93 in #20562
- Integrate llvm/llvm-project@52e10e6 by @nithinsubbiah in #20572
- [Dispatch Creation] Disable broken reshape fusion by collapsing by @IanWood1 in #20576
- Integrate llvm/llvm-project@9ed4c705ac1c by @nithinsubbiah in #20577
- Enable tensor.pad lowering via buffer load with bounds check by @jerryyin in #20357
- [Codegen] Add op for swizzling memory accesses by @qedawkins in #20547
- BlockDynamicDimensions: Fold remaining reshapes with bindings by @IanWood1 in #20580
- Add ValueBoundInterface to flow.dispatch.workload.ordinal by @qedawkins in #20582
- Integrate llvm/llvm-project@f0a59c4 by @raikonenfnu in #20584
- [BlockDynamicDimensions] Add tensor bubbling patterns by @IanWood1 in #20588
- Bail out the split reduction if the mamtul has encodings. by @hanhanW in #20415
- [Codegen] Update warp reduction config for multiple reduction by @pashu123 in #20585
- [Codegen] Use GPUPadLayoutAttr to resolve layouts. by @hanhanW in #20565
- [CPU][NFC] Refresh debugging log with LDBG. by @hanhanW in #20592
- Migrate Flow dependencies in Stream/Codegen to TensorExt by @jtuyls in #20564
- [Codegen] Add option to expand subview metadata by @qedawkins in #20591
- [hip] Adding iree_hal_semaphore_import/export_timepoint API. by @AWoloszyn in #20560
- [DispatchCreation] Disable rope fusion with attention. by @MaheshRavishankar in #20605
- Bump torch-mlir to 3a85fa88be75fa13cb015bff4e6df13ab44220c8 by @jinchen62 in #20606
- [AMDGPU] Reorder MFMAs to put newer / wider ones first by @krzysz00 in #20566
- [GPU] Use upstream barrier elimination pass by @krzysz00 in #20534
- Integrate llvm/llvm-project@e87aa0c by @raikonenfnu in #20613
- [Codegen] Add LICM for linalg.generic ops by @Groverkss in #20608
- [VectorExt] Add generic vectorization to iree_vector_ext.transfer_gather by @Groverkss in #20476
- [Im2Col] Support converting group convs to im2col by @rkayaith in #20611
- [TensorExt][NFC] Remove dependency on Encoding through ExternalModel by @jtuyls in #20612
- [Preprocessing] Use value bounds for dim multiple checking by @qedawkins in #20583
- [Codegen] Remove dependencies on Flow by @jtuyls in #20615
- Regenerate API exports after `iree_input` removal by @AGindinson in #20590
- [CodeGen] Clean up the IRs within the MaterializeEncoding pass. by @hanhanW in #20625
- [GPU] Bail out in matmul TileAndFuse config for unaligned dynamic shapes by @nirvedhmeshram in #20622
- [GlobalOptimizations] Add a pass to simplify strided contraction-like ops by @zjgarvey in #20607
- Bump torch-mlir to bff2a99fd596766a8d85f6ebb9ef80044dcc6b81 by @jinchen62 in #20632
- Integrate llvm/llvm-project@0c61b24 by @raikonenfnu in #20630
- [Codegen] Add ops to translate between tensor and memref by @Max191 in #20619
- [Util] Fix intermittent crash in OptimizeIntArithmeticPass by @jtuyls in #20637
- Integrate llvm/llvm-project@60a1f5a by @raikonenfnu in #20641
- [DispatchCreation] Do not collapse dimensions if there are encoded operands by @hanhanW in #20621
- Integrate llvm/llvm-project@1c8e5e2 by @raikonenfnu in #20646
- [DispatchCreation] Add back dim resolution patterns in `BubbleUpExpandShapePass`. by @MaheshRavishankar in #20647
- [AMDGPU] Optimize masked transfer read in presence of fat raw buffers by @nirvedhmeshram in #20604
- Reduce verbosity of HAL translation failure errors by @rkayaith in #20614
- [build] Fix Bazel dependencies for shared library builds by @AGindinson in #20596
- [NFC] Restructure the lit tests for materialize encoding pass. by @hanhanW in #20629
- e2e matmul tests: log early the presence of a numerical error by @bjacob in #20638
- Remove include(CMakeParseArguments) by @FantasqueX in #20543
- Fix IREE_FILE_IO_ENABLE check by @FantasqueX in #20573
- [runtime] Fix missing zone names in GPU Tracy profiling by @renxida in #20297
- [DT][GPU] Retire experimental AMDGPU data-tiling flag. by @hanhanW in #20644
- Revert "[Dispatch Creation] Disable broken reshape fusion by collapsi… by @IanWood1 in #20653
- [Flow] Add folders for chained flow.tensor.transfer ops. by @Alex-Vasile in #20337
- Updating SHA for iree-test-suites by @geomin12 in #20656
- [Codegen] Clean up prints with `llvm::interleaved`. NFC. by @kuhar in #20649
- Structuring hal.executable.export workgroup count region. by @benvanik in #20659
- Replace loop with more efficient memcmp by @amd-vivekag in #20664
- Raise ValueError when output is not expected by @amd-vivekag in #20575
- Integrate llvm/llvm-project@5953f19 by @IanWood1 in #20657
- [docs] Add TensorExt dialect by @jtuyls in #20669
- Use static build path in local temp disk vs. shared by @amd-justchen in #20665
- [Codegen] Use isIdentityLayout instead of isNonZeroPadding by @jtuyls in #20666
- [LinalgExt] Enforce pure tensor or buffer semantics for LinalgExtOps by @Max191 in #20670
- [LinalgExt] Add map_scatter op and verifier by @Max191 in #20640
- Tolerate NaN == NaN in numerical checks. by @bjacob in #20677
- [Codegen] Support bufferization for load_from/store_to_memref ops by @Max191 in #20626
- Bump the github-actions group with 3 updates by @dependabot in #20658
- Update workflows to run on macOS 15 by @marbre in #20675
- [PJRT] Fix stablehlo attribute parameters for buffer transpose and broadcast by @PragmaTwice in #19488
- [runtime][python] Allow benchmark to accept file path, not just a VmModule by @sogartar in #19793
- [Encoding] Add interfaces for encoding propagation by @pashu123 in #20567
- Fixing hip-hal-driver.md typos. by @benvanik in #20683
Commit history: v3.3.0...v3.4.0