## Announcements

### `--iree-hal-target-backends` deprecation

The `--iree-hal-target-backends` option is now deprecated, and some ways of specifying target backends, such as passing `llvm-cpu` directly to the `--iree-hal-target-device` flag, have been removed. See #20295.
In practice, this means the following replacements:
| Old Flag | New Flag |
| --- | --- |
| `--iree-hal-target-backends` | `--iree-hal-target-device` |
| `--iree-hal-target-device=llvm-cpu` | `--iree-hal-target-device=local --iree-hal-local-target-device-backends=llvm-cpu` |
| `--iree-hal-target-backends=rocm` | `--iree-hal-target-device=hip` |
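For example, a CPU compilation command migrates as sketched below (the input file `model.mlir` and output path are placeholders, not from this release):

```shell
# Old (deprecated): select the backend directly.
iree-compile --iree-hal-target-backends=llvm-cpu model.mlir -o model.vmfb

# New: select the `local` device and name its backend explicitly.
iree-compile \
  --iree-hal-target-device=local \
  --iree-hal-local-target-device-backends=llvm-cpu \
  model.mlir -o model.vmfb
```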
### Migration of Flow dependencies in Stream/Codegen to new TensorExt dialect
This update migrates the following Flow ops and types to a new TensorExt dialect to avoid any Flow dependency in Codegen and reduce Flow dependencies in Stream:
- `IREE::Flow::DispatchWorkloadOrdinalOp`
- `IREE::Flow::DispatchTensorLoadOp`
- `IREE::Flow::DispatchTensorStoreOp`
- `IREE::Flow::DispatchWorkgroupCountFromSliceOp`
- `IREE::Flow::DispatchWorkgroupCountFromDagRootOp`
- `IREE::Flow::DispatchTensorType`
Instructions for migrating downstream codebases can be found at #20564 (comment).
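For textual IR in downstream test suites, much of the migration is a mechanical rename. A minimal sketch, assuming the moved ops keep their mnemonics under an `iree_tensor_ext` dialect prefix (confirm the exact mnemonics against #20564 before running):

```shell
# Hypothetical helper: rewrite Flow dispatch-tensor ops and types to their
# assumed TensorExt equivalents on stdin. The iree_tensor_ext prefix is an
# assumption here, not taken from these notes.
migrate_flow_to_tensor_ext() {
  sed \
    -e 's/flow\.dispatch\.tensor\.load/iree_tensor_ext.dispatch.tensor.load/g' \
    -e 's/flow\.dispatch\.tensor\.store/iree_tensor_ext.dispatch.tensor.store/g' \
    -e 's/flow\.dispatch\.workload\.ordinal/iree_tensor_ext.dispatch.workload.ordinal/g' \
    -e 's/!flow\.dispatch\.tensor/!iree_tensor_ext.dispatch.tensor/g' \
    "$@"
}

# Example: a single line of IR before and after.
migrate_flow_to_tensor_ext <<'EOF'
%v = flow.dispatch.tensor.load %arg0 : !flow.dispatch.tensor<readonly:tensor<4xf32>>
EOF
```

To apply across a test tree, the same substitutions can be run in place, e.g. `find tests/ -name '*.mlir' -exec sed -i ... {} +`.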
## Notable changes

### Compiler
- Dispatch creation process improvements including adding support for hoisting Flow ops #20346, skipping Pad decomposition if already in a dispatch #20373, enabling more reshape movement #20320, allowing collapsing of more reduction operations #20544, and hoisting leaf `expand_shapes` out before collapsing dimensions #20552
- Data tiling optimizations including adding support for parallel generic op materialization patterns for GPU #20316, migrating `round_dims_to` to `iteration_sizes` #20459, and splitting out `iree_encoding.layout` from `EncodingAttr`, as it was encoding more information than needed #20507
- Completed the plumbing through of the matmul_k encoding for the padding approach (reduction dims) #20493
- Enhancements to code generation including merging multiple `transform.foreach_match` operations found across different inner modules into a single consolidated operation inside a newly created top-level `__kernel_config` `NamedSequenceOp` #20127, adding a pass for interpreting transform specs on lowering configs #20408, adding an op for swizzling memory accesses #20547, updating the kernel configuration in the case of multiple reduction dimensions #20585, adding an option to expand subview metadata #20591, removing dependencies on Flow #20615, adding ops to translate between tensor and memref #20619, and adding a placeholder op for buffer casts on tensors #20589
- Updates for AMDGPU support including adding a placeholder op for buffer casts on tensors #20589, and optimizing masked transfer reads in the presence of fat raw buffers #20604
- Linalg Extensions Dialect improvements (#19226, #20358, #20411, #20460, #20462, #20464, #20670, #20640)
### Runtime
- HIP runtime bug fixes and performance improvements including plumbing buffer allocation responsibility through the runtime and up into the lower-level stream dialect #20240, re-enabling peering for compatible devices #20481, and adding the `iree_hal_semaphore_import/export_timepoint` API #20560
### Developer Tools
- Updates to workflow CI and test suites (#20319, #20327, #20306, #20329, #20295, #20351, #20426, #20465, #20629)
## New Contributors
- @FantasqueX made their first contribution in #20270
- @Muzammiluddin-Syed-ECE made their first contribution in #20433
- @geomin12 made their first contribution in #20194
- @lpy made their first contribution in #20456
- @giacs-epic made their first contribution in #20516
- @Alex-Vasile made their first contribution in #20337
- @amd-justchen made their first contribution in #20665
## Full changelog

Full list of changes: v3.3.0...v3.4.0
## What's Changed
- Moving func-to-util patterns to Util/Conversion/. by @benvanik in #20262
- Delete flaky gisel_abort test. by @ScottTodd in #20319
- Skip flaky test_top_k onnx op test on rdna3. by @ScottTodd in #20327
- Run `windows_x64_msvc` on postsubmit and opt-in on presubmit. by @ScottTodd in #20306
- [DispatchCreation] Delete pad->set_encoding fusion logic. by @hanhanW in #20324
- Fix asm parsing for `Util_PreprocessingPipelineAttr` by @zjgarvey in #20314
- [DataTiling][Codegen] Use iree.encoding.resolver to decide encoding resolvers by @Max191 in #20271
- [Encoding][NFC] Use testing_encoding<> in lit tests when possible. by @hanhanW in #20329
- [Codegen] Keep lowering config when decomposing linalg.pack by @Max191 in #20311
- [Encoding] Delete the allocation support of roundDimsTo field. by @hanhanW in #20332
- [Flow][Stream][NFC] Use parser.parseArgument() for custom parsers. by @hanhanW in #20333
- Replacing use of iree-hal-target-backends in most tests. by @benvanik in #20295
- Extend IGEMM utils to support conv2d_chwn_chwf ops by @yzhang93 in #20330
- Limit concurrency of ci_windows_x64_msvc. by @ScottTodd in #20338
- [DT] Add more folders to MaterializeEncoding pass cleanup patterns. by @hanhanW in #20334
- Integrate llvm-project@857a04cd by @qedawkins in #20336
- Revert recent Windows CI changes, moving back to nightly only. by @ScottTodd in #20342
- [Stream] Add support for materializing stream.tensor.encode ops. by @hanhanW in #20321
- [Codegen][Tuner] merge the default td specs by @bangtianliu in #20127
- [GlobalOpt] Add option to propagate transposes through conv by @nirvedhmeshram in #20339
- [DispatchCreation] Convert top-level encoding ops to flow.tensor.encode. by @hanhanW in #20300
- Do not wrap a single set_encoding op into flow.dispatch.region. by @hanhanW in #20322
- Add pass to propagate tranposes pass when making single dispatch from function by @nirvedhmeshram in #20348
- Avoid nested fashion traversal in ModuleOp when possible. by @hanhanW in #20344
- Only fuse gather-like producer for query operand of attention ops. by @MaheshRavishankar in #19829
- Mark tensorflow_hub_import.ipynb expected to pass. by @ScottTodd in #20356
- Fixing iree.build target flags, I think. by @benvanik in #20355
- Bump version to 3.4.0 after 3.3.0 release. by @ScottTodd in #20360
- Avoid reenabling ASM by @FantasqueX in #20270
- Only build elf_module_test when IREE_BUILD_TESTS is ON by @FantasqueX in #20351
- [Dispatch Creation] Allow cloning attention k & v bit-extend by @IanWood1 in #20364
- Move sharktank and regression-test MI300 jobs to the ossci cluster. by @Eliasj42 in #20359
- Mark e2e testonly by @FantasqueX in #20370
- [DispatchCreation] Add support for hoisting Flow ops. by @hanhanW in #20346
- Bump the github-actions group with 3 updates by @dependabot in #20366
- Mark flags_demo testonly by @FantasqueX in #20369
- [DispatchCreation] Skip Pad decomposition within dispatches. by @MaheshRavishankar in #20373
- Do not wrap single unset_encoding op into dispatch region. by @hanhanW in #20377
- [python] Do not conflate hip and amdgpu targets in iree.build by @sogartar in #20372
- [DT] Add parallel generic op materialization pattern for GPU by @jtuyls in #20316
- Do not set CMAKE_CXX_STANDARD as CACHE by @FantasqueX in #20261
- Plumbing buffer allocation responsibility through the runtime and up into the lower-level stream dialect. by @benvanik in #20240
- Only pin affinity during analysis if an affinity is specified. by @benvanik in #20382
- Integrate LLVM at 94783a8199c5e589d8efd6d4530482d72bf98f4d by @andfau-amd in #20375
- Revert "Integrate LLVM at 94783a8199c5e589d8efd6d4530482d72bf98f4d" by @andfau-amd in #20399
- Add passes to do more preprocessing when creating single dispatch by @nirvedhmeshram in #20394
- IREE_BUILD_COMPILER doesn't require nanobind by @FantasqueX in #20403
- Use a better way to determine C11 support by @FantasqueX in #20384
- [Codegen][GPU] Allow pre-distributed multi_mma ops in distribution by @qedawkins in #20407
- [Util] Add transform op for getting nearby symbols by @qedawkins in #20405
- Do more aggressive reshape propagation in single-dispatch pipeline by @rkayaith in #20410
- [DataTiling] Don't set encodings in dispatch.region ops with workgroup counts by @Max191 in #20387
- [GPU] Add canonicalize and cse after im2col decomposition by @nirvedhmeshram in #20414
- Adding flags to most HAL methods and extending existing ones to i64. by @benvanik in #20368
- [LLVMGPU] Support linalg.pack through LLVMGPUTileAndFuse by @Max191 in #20312
- Add HIP data-tiling resolver to e2e encoding.mlir tests. by @hanhanW in #20416
- [Codegen] Teach MaterializeEncodingIntoPadding about resolved layouts. by @hanhanW in #20393
- [Codegen] Allow folding collapse shape into partial store by @kuhar in #20417
- Bump actions/setup-python from 5.4.0 to 5.5.0 in the github-actions group by @dependabot in #20425
- [GPU] Enable e2e matmul tests for padding approach. by @hanhanW in #20426
- Integrate llvm-project@8244f8210f2e by @krzysz00 in #20404
- Only include googletest if IREE_BUILD_TESTS by @FantasqueX in #20413
- setup release callback for cuda buffers by @sgjzfzzf in #20419
- [GlobalOpt] Generalize named conv generalization by @rkayaith in #20424
- Integrate llvm-project@79487757b7 by @krzysz00 in #20434
- [Codegen] Work around for an SSA use-def violation issue with tile and fuse consumer by @MaheshRavishankar in #20430
- [Tooling] Improve descriptions and warnings in command line tools for npy file inputs by @Muzammiluddin-Syed-ECE in #20433
- [Dispatch Creation] Enable more reshape movement by @IanWood1 in #20320
- Add news links to SDXL MLPerf blog using IREE. by @ScottTodd in #20443
- Add link to LLVM Social Bangalore talk from Mahesh. by @ScottTodd in #20445
- [VectorDistribution] Add kernel config for single reduction by @pashu123 in #20172
- Integrate llvm-porject@4b67c53e206 by @krzysz00 in #20449
- [VectorExt] Add `iree_vector_ext.transfer_gather` operation by @Groverkss in #20442
- [GPU] Update the cache info for padding resolver in cloneWithSimplifiedConfig. by @hanhanW in #20371
- [LLVMGPU] Set thread tile sizes to 1 when reshapes are present by @Max191 in #20455
- Support `TRACY_MANUAL_LIFETIME=ON` by @rkayaith in #20448
- [DT] Carefully cleanup the IRs before and after the encoding fusion. by @hanhanW in #20435
- [Flow] Support dispatch tracing when encodings are present by @Max191 in #20447
- [NFC][LinalgExt] Remove unused scatter member fns by @IanWood1 in #20457
- Migrating regression suite to iree-test-suite by @geomin12 in #20194
- [DataTiling] Use late materialization for e2e GPU matmul tests by @Max191 in #20458
- Fix im2col conversion for no batch case by @nirvedhmeshram in #20467
- [Encoding][NFC] Trim dependencies for Encoding dialect. by @hanhanW in #20469
- Do not unconditionally build runtime demo by @FantasqueX in #20380
- [VectorExt] Add canonicalizations for iree_vector_ext.transfer_gather by @Groverkss in #20454
- [Util] Allow inlining on cast_and_call by @qedawkins in #20406
- [Codegen][ROCDL] Add hacky pattern to swap setprio with mfma by @qedawkins in #20479
- [GlobalOpt] Generalize 1x1 group convolutions by @rkayaith in #20480
- Update googletest to latest main. by @ScottTodd in #20477
- [hip] Re-enable peering for compatible devices. by @AWoloszyn in #20481
- Teach PropagateDispatchSizeBounds about gpu.lane_id by @lpy in #20456
- [GPU] Add col_major optional attribute to MMAAttr by @qedawkins in #19860
- NFC: [Codegen][ROCDL] Cleanup swap pattern by @qedawkins in #20488
- Integrate llvm-project@bafa2f4442bc by @kuhar in #20492
- [DT] Migrate round_dims_to to iteration_sizes by @jtuyls in #20459
- add flux transformer spec by @PhaneeshB in #20478
- [Encoding] Introduce matmul_k encoding. by @hanhanW in #20484
- Update torch-mlir to llvm/torch-mlir@11d0853 by @rkayaith in #20490
- Integrate llvm-project@bb1f32ded0b7 by @kuhar in #20494
- [Encoding][NFC] Fix boundary comments for the dialect name. by @hanhanW in #20501
- [Preprocessing] Add op for matching dimension bounds by @qedawkins in #20502
- Fix typos. NFC. by @kuhar in #20503
- Integrate llvm/llvm-project@36cb81cced6c by @kuhar in #20506
- Integrate llvm/llvm-project@ef1088f70356 by @kuhar in #20512
- [tuner] expose python binding for getting the tuner root ops by @bangtianliu in #20438
- Adding a memref linearizer pass by @lialan in #20335
- [Dispatch Creation] Fix reshapes created by collapse dimensions by @IanWood1 in #20515
- Add tip about `TdrValue` to GPU debugging playbook by @giacs-epic in #20516
- [DT] Split out iree_encoding.layout from EncodingAttr. by @hanhanW in #20507
- [LinalgExt] Share FFT rewriting between Torch and StableHLO by @giacs-epic in #19226
- Introduce IREE_USE_SYSTEM_DEPS by @FantasqueX in #20471
- Integrate llvm/llvm-project@2f41fa387d67 by @kuhar in #20519
- Stop forcing approximation of `math.erf` with `--iree-codegen-gpu-native-math-precision` by @bjacob in #20074
- [Encoding] Implement SerializableEncodingAttrInterface for MatmulK. by @hanhanW in #20521
- [Encoding] Implement ContractionEncodingAttrInterface. by @hanhanW in #20514
- Move LinalgExt::SortOp to LLVMGPUTileAndFuse pipeline (#20358) by @Muzammiluddin-Syed-ECE in #20411
- [Codegen] For op canonicalization: generalize to multiple users by @newling in #20444
- Make `iree-codegen-gpu-native-math-precision` a NOP and warn of imminent removal. by @bjacob in #20523
- Always fuse encodings into dispatch region when the encoding is MatmulK. by @hanhanW in #20527
- [Encoding][NFC] Use assemblyFormat for testing encodings. by @hanhanW in #20520
- [DT] Add a flag to not hoist encodings when the source is ConstExpr. by @hanhanW in #20526
- Integrate llvm/llvm-project@32c39092eab3 by @kuhar in #20533
- [Encoding] Remove padFactor in set encoding pass by @pashu123 in #20532
- Drop manual bf16 handling (currently just in LLVMGPU) by @krzysz00 in #20313
- Properly set the upper bound of gpu.lane_id when rewriteForallToLanes. by @lpy in #20513
- [Encoding] Implement matmul_k encoding propagation across reshapes. by @hanhanW in #20367
- [LinalgExt] Add gather operation (1/5) by @IanWood1 in #20460
- [LinalgExt] Add TilingInterface support to GatherOp (2/5) by @IanWood1 in #20462
- [Codegen] Add pass for interpreting transform specs on lowering configs by @qedawkins in #20408
- [DataTiling] Add matmul_k option to SetEncoding pass. by @pashu123 in #20529
- Llama 8b f16 regression tests using random weights and inputs by @aviator19941 in #20487
- [Im2col] Add input-filter permutation info to im2col metadata by @yzhang93 in #20531
- Update torch-mlir to llvm/torch-mlir@9f2ba5a by @AmosLewis in #20545
- [LinalgExt] Support converting gather to loops (3/5) by @IanWood1 in #20464
- Add e2e tests for gather (4/5) by @IanWood1 in #20465
- Integrate llvm/llvm-project@2271f0bebd48 by @nithinsubbiah in #20556
- [Dispatch Creation] Remove collapse softmax-like check by @IanWood1 in #20544
- [NFC][AMDGPU] Refactor getSingleSubgroupLayout() for MFMAs by @krzysz00 in #20561
- [DispatchCreation] Hoist leaf expand_shapes out before collapsing dimensions by @nirvedhmeshram in #20552
- [GPU] Extend Vector distribute to multiple reductions by @pashu123 in #20310
- [NFC] Refactor FlattenMemRefPass. by @lialan in #20522
- Integrate llvm/llvm-project@52e3f3d by @nithinsubbiah in #20557
- [IGEMM] Refactor the logic to generate input indexing map by @yzhang93 in #20562
- Integrate llvm/llvm-project@52e10e6 by @nithinsubbiah in #20572
- [Dispatch Creation] Disable broken reshape fusion by collapsing by @IanWood1 in #20576
- Integrate llvm/llvm-project@9ed4c705ac1c by @nithinsubbiah in #20577
- Enable tensor.pad lowering via buffer load with bounds check by @jerryyin in #20357
- [Codegen] Add op for swizzling memory accesses by @qedawkins in #20547
- BlockDynamicDimensions: Fold remaining reshapes with bindings by @IanWood1 in #20580
- Add ValueBoundInterface to flow.dispatch.workload.ordinal by @qedawkins in #20582
- Integrate llvm/llvm-project@f0a59c4 by @raikonenfnu in #20584
- [BlockDynamicDimensions] Add tensor bubbling patterns by @IanWood1 in #20588
- Bail out the split reduction if the mamtul has encodings. by @hanhanW in #20415
- [Codegen] Update warp reduction config for multiple reduction by @pashu123 in #20585
- [Codegen] Use GPUPadLayoutAttr to resolve layouts. by @hanhanW in #20565
- [CPU][NFC] Refresh debugging log with LDBG. by @hanhanW in #20592
- Migrate Flow dependencies in Stream/Codegen to TensorExt by @jtuyls in #20564
- [Codegen] Add option to expand subview metadata by @qedawkins in #20591
- [hip] Adding iree_hal_semaphore_import/export_timepoint API. by @AWoloszyn in #20560
- [DispatchCreation] Disable rope fusion with attention. by @MaheshRavishankar in #20605
- Bump torch-mlir to 3a85fa88be75fa13cb015bff4e6df13ab44220c8 by @jinchen62 in #20606
- [AMDGPU] Reorder MFMAs to put newer / wider ones first by @krzysz00 in #20566
- [GPU] Use upstream barrier elimination pass by @krzysz00 in #20534
- Integrate llvm/llvm-project@e87aa0c by @raikonenfnu in #20613
- [Codegen] Add LICM for linalg.generic ops by @Groverkss in #20608
- [VectorExt] Add generic vectorization to iree_vector_ext.transfer_gather by @Groverkss in #20476
- [Im2Col] Support converting group convs to im2col by @rkayaith in #20611
- [TensorExt][NFC] Remove dependency on Encoding through ExternalModel by @jtuyls in #20612
- [Preprocessing] Use value bounds for dim multiple checking by @qedawkins in #20583
- [Codegen] Remove dependencies on Flow by @jtuyls in #20615
- Regenerate API exports after `iree_input` removal by @AGindinson in #20590
- [CodeGen] Clean up the IRs within the MaterializeEncoding pass. by @hanhanW in #20625
- [GPU] Bail out in matmul TileAndFuse config for unaligned dynamic shapes by @nirvedhmeshram in #20622
- [GlobalOptimizations] Add a pass to simplify strided contraction-like ops by @zjgarvey in #20607
- Bump torch-mlir to bff2a99fd596766a8d85f6ebb9ef80044dcc6b81 by @jinchen62 in #20632
- Integrate llvm/llvm-project@0c61b24 by @raikonenfnu in #20630
- [Codegen] Add ops to translate between tensor and memref by @Max191 in #20619
- [Util] Fix intermittent crash in OptimizeIntArithmeticPass by @jtuyls in #20637
- Integrate llvm/llvm-project@60a1f5a by @raikonenfnu in #20641
- [DispatchCreation] Do not collapse dimensions if there are encoded operands by @hanhanW in #20621
- Integrate llvm/llvm-project@1c8e5e2 by @raikonenfnu in #20646
- [DispatchCreation] Add back dim resolution patterns in `BubbleUpExpandShapePass`. by @MaheshRavishankar in #20647
- [AMDGPU] Optimize masked transfer read in presence of fat raw buffers by @nirvedhmeshram in #20604
- Reduce verbosity of HAL translation failure errors by @rkayaith in #20614
- [build] Fix Bazel dependencies for shared library builds by @AGindinson in #20596
- [NFC] Restructure the lit tests for materialize encoding pass. by @hanhanW in #20629
- e2e matmul tests: log early the presence of a numerical error by @bjacob in #20638
- Remove include(CMakeParseArguments) by @FantasqueX in #20543
- Fix IREE_FILE_IO_ENABLE check by @FantasqueX in #20573
- [runtime] Fix missing zone names in GPU Tracy profiling by @renxida in #20297
- [DT][GPU] Retire experimental AMDGPU data-tiling flag. by @hanhanW in #20644
- Revert "[Dispatch Creation] Disable broken reshape fusion by collapsi… by @IanWood1 in #20653
- [Flow] Add folders for chained flow.tensor.transfer ops. by @Alex-Vasile in #20337
- Updating SHA for iree-test-suites by @geomin12 in #20656
- [Codegen] Clean up prints with `llvm::interleaved`. NFC. by @kuhar in #20649
- Structuring hal.executable.export workgroup count region. by @benvanik in #20659
- Replace loop with more efficient memcmp by @amd-vivekag in #20664
- Raise ValueError when output is not expected by @amd-vivekag in #20575
- Integrate llvm/llvm-project@5953f19 by @IanWood1 in #20657
- [docs] Add TensorExt dialect by @jtuyls in #20669
- Use static build path in local temp disk vs. shared by @amd-justchen in #20665
- [Codegen] Use isIdentityLayout instead of isNonZeroPadding by @jtuyls in #20666
- [LinalgExt] Enforce pure tensor or buffer semantics for LinalgExtOps by @Max191 in #20670
- [LinalgExt] Add map_scatter op and verifier by @Max191 in #20640
- Tolerate NaN == NaN in numerical checks. by @bjacob in #20677
- [Codegen] Support bufferization for load_from/store_to_memref ops by @Max191 in #20626
- Bump the github-actions group with 3 updates by @dependabot in #20658
- Update workflows to run on macOS 15 by @marbre in #20675
- [PJRT] Fix stablehlo attribute parameters for buffer transpose and broadcast by @PragmaTwice in #19488
- [runtime][python] Allow benchmark to accept file path, not just a VmModule by @sogartar in #19793
- [Encoding] Add interfaces for encoding propagation by @pashu123 in #20567
- Fixing hip-hal-driver.md typos. by @benvanik in #20683
Commit history: v3.3.0...v3.4.0