Release v3.4.0 · iree-org/iree · GitHub

Release v3.4.0

ScottTodd released this 05 May 16:42
· 136 commits to main since this release
v3.4.0
aacf93a

Announcements

--iree-hal-target-backends deprecation

The --iree-hal-target-backends option is now deprecated and some ways to specify target backends like llvm-cpu in the --iree-hal-target-device flag have been removed. See: #20295.

In practice, this means the following replacements:

Old Flag | New Flag
--iree-hal-target-backends | --iree-hal-target-device
--iree-hal-target-device=llvm-cpu | --iree-hal-target-device=local --iree-hal-local-target-device-backends=llvm-cpu
--iree-hal-target-backends=rocm | --iree-hal-target-device=hip
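
For build scripts that assemble compiler command lines, the replacements above can be applied mechanically. The following is a minimal sketch, not part of IREE: `migrate_flags`, `EXACT_REWRITES`, and the `model.mlir` input name are hypothetical, and only the two concrete rewrites listed in the table are encoded. Any other use of the deprecated flag is reported for manual review rather than guessed at.

```python
from typing import List, Tuple

# Exact rewrites taken from the replacement table in these notes.
# Nothing beyond those two rows is assumed here.
EXACT_REWRITES = {
    "--iree-hal-target-device=llvm-cpu": [
        "--iree-hal-target-device=local",
        "--iree-hal-local-target-device-backends=llvm-cpu",
    ],
    "--iree-hal-target-backends=rocm": ["--iree-hal-target-device=hip"],
}

# Any remaining use of the deprecated flag is surfaced, not rewritten.
DEPRECATED_PREFIX = "--iree-hal-target-backends"


def migrate_flags(args: List[str]) -> Tuple[List[str], List[str]]:
    """Return (rewritten args, deprecated flags that need manual review)."""
    migrated: List[str] = []
    needs_review: List[str] = []
    for arg in args:
        if arg in EXACT_REWRITES:
            # Known row of the table: substitute the listed replacement flags.
            migrated.extend(EXACT_REWRITES[arg])
        else:
            if arg.startswith(DEPRECATED_PREFIX):
                # Deprecated flag with a value the table does not cover.
                needs_review.append(arg)
            migrated.append(arg)
    return migrated, needs_review


if __name__ == "__main__":
    # "model.mlir" is a placeholder input file name.
    new_args, review = migrate_flags(
        ["iree-compile", "--iree-hal-target-backends=rocm", "model.mlir"]
    )
    print(" ".join(new_args))
    print("needs manual review:", review)
```

Run as a script, this prints `iree-compile --iree-hal-target-device=hip model.mlir` followed by an empty review list. Leaving unknown values untouched is deliberate: the table does not say how every backend name maps to a device, so the sketch avoids inventing mappings.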

Migration of Flow dependencies in Stream/Codegen to new TensorExt dialect

This update migrates the following Flow ops and types to a new TensorExt dialect to avoid any Flow dependency in Codegen and reduce Flow dependencies in Stream:

  • IREE::Flow::DispatchWorkloadOrdinalOp
  • IREE::Flow::DispatchTensorLoadOp
  • IREE::Flow::DispatchTensorStoreOp
  • IREE::Flow::DispatchWorkgroupCountFromSliceOp
  • IREE::Flow::DispatchWorkgroupCountFromDagRootOp
  • IREE::Flow::DispatchTensorType

Instructions for migrating downstream codebases can be found at #20564 (comment).

Notable changes

Compiler

  • Dispatch creation process improvements including adding support for hoisting Flow ops #20346, skipping Pad decomposition if already in dispatch #20373, enabling more reshape movement #20320, allowing collapsing of more reduction operations #20544, and hoisting leaf expand_shapes out before collapsing dimensions #20552
  • Data tiling optimizations including adding support for parallel generic ops materialization patterns for GPU #20316, migrating round_to_dims to iteration_sizes #20459, and splitting out iree_encoding.layout from EncodingAttr, which was encoding more information than needed #20507
  • Completed the plumbing through of matmul_k encoding for padding approach (reduction dims) #20493
  • Enhancements to code generation including merging multiple transform.foreach_match operations found across different inner modules into a single consolidated operation inside a newly created top-level __kernel_config NamedSequenceOp #20127, adding pass for interpreting transform specs on lowering configs #20408, adding op for swizzling memory accesses #20547, updating the kernel configuration in the case of multiple reduction dimensions #20585, adding option to expand subview metadata #20591, removing dependencies on Flow #20615, adding ops to translate between tensor and memref #20619, and adding placeholder op for buffer casts on tensors #20589
  • Updates for AMDGPU support including adding placeholder op for buffer casts on tensors #20589, and optimizing masked transfer read in presence of fat raw buffers #20604
  • Linalg Extensions Dialect improvements (#19226, #20358, #20411, #20460, #20462, #20464, #20670, #20640)

Runtime

  • HIP runtime bug fixes and performance improvements including plumbing buffer allocation responsibility through the runtime and up into the lower-level stream dialect #20240, re-enabling peering for compatible devices #20481, and adding the iree_hal_semaphore_import/export_timepoint API #20560

Developer Tools

New Contributors

Full changelog

List of changes

Full list of changes: v3.3.0...v3.4.0

What's Changed

Commit history: v3.3.0...v3.4.0
