Pulse · iree-org/iree · GitHub

May 18, 2025 – May 25, 2025

Overview

39 Active pull requests

44 Active issues

7 Releases published by 1 person

iree-3.5.0rc20250519 iree candidate iree-3.5.0rc20250519
published May 19, 2025
iree-3.5.0rc20250520 iree candidate iree-3.5.0rc20250520
published May 20, 2025
iree-3.5.0rc20250521 iree candidate iree-3.5.0rc20250521
published May 21, 2025
iree-3.5.0rc20250522 iree candidate iree-3.5.0rc20250522
published May 22, 2025
iree-3.5.0rc20250523 iree candidate iree-3.5.0rc20250523
published May 23, 2025
iree-3.5.0rc20250524 iree candidate iree-3.5.0rc20250524
published May 24, 2025
iree-3.5.0rc20250525 iree candidate iree-3.5.0rc20250525
published May 25, 2025

32 Pull requests merged by 13 people

[Codegen][NFC] Make namespace usage follow IREE::[Encoding|Codegen].
#20894 merged May 23, 2025
Fix Link error when IREECompiler.lib hits 4GiB
#20892 merged May 22, 2025
Adding support for #hal.device.optimal<...> through to runtime.
#20879 merged May 22, 2025
Adding tryLookupResourceUsageAffinity.
#20891 merged May 22, 2025
Temporary automatic reference counting(ish) pass for inserting async deallocations.
#20765 merged May 22, 2025
[Dispatch Creation] Fix GatherFusionPattern crash
#20887 merged May 22, 2025
[Flow] Dump affinity info in DumpDispatchGraph pass.
#20888 merged May 22, 2025
[Codegen] Add patterns to fold reshapes into load_from/store_to_memref
#20881 merged May 22, 2025
[Preprocessing] Add SinkReshapesPass in MakeSingleDispatchPassPipeline
#20882 merged May 22, 2025
[LinalgExt] Fold unit dims for iree_linalg_ext.gather
#20877 merged May 22, 2025
[NFC][Codegen] Rename early bufferization op operands
#20874 merged May 21, 2025
[LinalgExt] Canonicalize gather to an extract_slice
#20878 merged May 21, 2025
Align iree_hal_sync_device_t allocation to 16 bytes.
#20773 merged May 21, 2025
[VectorExt] Fix illegal transfer_read during gather vectorization
#20876 merged May 21, 2025
[Codegen] Add ukernel support for argmax on BF16 and enable optional max value return
#20768 merged May 21, 2025
[Codegen] Support multiple forall ops in ReconcileTranslationInfo
#20848 merged May 21, 2025
[AMDGPU] Rewrite some gpu.shuffle xor to ds_swizzle, per upstream
#20868 merged May 21, 2025
[Encoding][NFC] Move Encoding Utils out from IR definition.
#20871 merged May 21, 2025
[GPU] Vector distribution support for multiple stores
#20816 merged May 21, 2025
[CodeGen] Fix a MemoryEffectsOpInterface bug in FuseConsumerOp.
#20869 merged May 20, 2025
Fix padding to nop encoding specialization
#20837 merged May 20, 2025
[Codegen][GPU] Support padding in CombineLayoutTransformation
#20797 merged May 20, 2025
[LinalgExt] Add map_scatter e2e tests for CPU and VMVX backends.
#20861 merged May 20, 2025
[Dispatch Creation] Clone iree_linalg_ext.gather for attn
#20866 merged May 20, 2025
[VectorExt] Vectorize iree_linalg_ext.gather
#20807 merged May 20, 2025
[Codegen] Fix dominance issue in collapse shape fusion
#20864 merged May 20, 2025
[VectorExt] Fix transfer_gather printer
#20860 merged May 20, 2025
[Dispatch Creation] Handle linalg.fill in collapse dimensions
#20863 merged May 19, 2025
Fix logic for yieldReplacements in tileDispatchUsingForall
#20844 merged May 19, 2025
[LinalgExt] Clone iree_linalg_ext.gather (5/5)
#20563 merged May 19, 2025
[TensorExt] Drop space from count_from_slice printer
#20850 merged May 19, 2025
[Codegen][NFC] Refresh remove_single_iteration_loop.mlir test.
#20842 merged May 18, 2025

7 Pull requests opened by 6 people

Integrate llvm-project@7a8090c037255b54895d61df2eb141fee48d6d83
#20873 opened May 21, 2025
Add support for dynamic unit trip scf.for to scf.if
#20880 opened May 21, 2025
Stream: Add topology attribute
#20885 opened May 22, 2025
[CPU] Disable scf.forall distribution by default.
#20893 opened May 22, 2025
Revert "[GPU] Vector distribution support for multiple stores (#20816)"
#20895 opened May 22, 2025
[NFC] Rename load_from/store_to_memref to load_from/store_to_buffer
#20897 opened May 23, 2025
[Codegen] Enable reshape into buffer folding in BlockDynamicDimensions
#20898 opened May 23, 2025

27 Issues closed by 7 people

llvmcpu_mobilenet_v3-large_uint8.run starts failing after bumping tf-nightly
#14830 closed May 22, 2025
[DT][GPU] Support codegen for pack/unpack fusion and mmt4d-like ops
#17720 closed May 22, 2025
[CPU] Inefficient img2col tensor kernel
#17413 closed May 22, 2025
[GPU][DT] Enable e2e tests for tensor.pack op on GPU
#17186 closed May 22, 2025
[GPU][DT] Enable e2e tests for tensor.unpack op on GPU
#17187 closed May 22, 2025
[GPU][DT] matmul microbenchmarks for GPU data-tiling path
#17189 closed May 22, 2025
Redundant buffer allocations in LinalgExt/Linalg ops when there are constants in outs
#9813 closed May 22, 2025
CI - Windows x64 MSVC failing
#20763 closed May 22, 2025
[HAL] Add HAL allocator selection based on optimal affinity for a particular usage + API.
#20857 closed May 22, 2025
MLIR's memref.load/store should produce `nuw` and `inbounds` GEPs
#20483 closed May 22, 2025
Fix encoding dialect header/tablegen dependency.
#20681 closed May 21, 2025
[VectorExt] `transfer_gather` parser fails to round-trip valid operation emitted by printer
#20802 closed May 20, 2025
Correctness issue related to EmplaceAllocations
#19355 closed May 19, 2025
[Codegen][AMDGPU Backend] Correctness issue for conv_2d_ngchw_gfchw
#18798 closed May 19, 2025
[LLVMCPU] Data tiling of group quantized matmul on CPU
#14337 closed May 19, 2025
[Flow] Improve DetachElementwiseFromNamedOpPass to match codegen expectations
#12080 closed May 19, 2025
Not able to set atol in check tests
#5414 closed May 19, 2025
Util constant analysis should not depend on LinalgExt dialect
#14887 closed May 19, 2025
[Regression] The number of dispatches increases 12.28% after #14505
#14531 closed May 19, 2025
Implement a listener for non-pattern based passes
#12858 closed May 19, 2025
Missing documentation for build and test iree-dialects
#8361 closed May 19, 2025
Missing fusion for winograd transform ops with their consumers
#17487 closed May 19, 2025
Winograd transform generates bad memory accessing pattern for CPU
#17485 closed May 19, 2025
[CPU] Explore ukernels for winograd transform ops
#17491 closed May 19, 2025
[DT][Fusion] Move set encoding after forming dispatch region
#17718 closed May 19, 2025
[Stream] Add compile-time attr queries for affinity resource compatibility.
#20852 closed May 19, 2025
[GPU] Gather -> matmul fusion support
#18457 closed May 19, 2025

17 Issues opened by 10 people

[CPU] `transpose -> pack` folding pattern inhibits fusion
#20896 opened May 23, 2025
Add AMDGPU dialect ops for scaled fp conversions
#20890 opened May 22, 2025
[Codegen] m=1, k=2, n=1 Matmul fails to compile
#20889 opened May 22, 2025
How to Customize and Dynamically Adapt Tiling Strategy in IREE Based on Target Core Count
#20883 opened May 22, 2025
VAE compilation failure with aggressive fusion enabled
#20875 opened May 21, 2025
[CodeGen][SPIRV] Lowering for clustered reduce not implemented
#20872 opened May 21, 2025
Op verification regression in tensor-parallel toy Resnet block
#20870 opened May 20, 2025
build error when Generating check_cuda_ukernel_ukernel_example.mlir_module.vmfb
#20865 opened May 20, 2025
[LinalgExt] Split the op definition between pure ops and LinalgExt ops
#20862 opened May 19, 2025
[HIP] Support for `IREE_HAL_EXTERNAL_TIMEPOINT_TYPE_WAIT_PRIMITIVE`.
#20859 opened May 19, 2025
[HAL] Support external semaphores in local-task/local-sync implementations.
#20858 opened May 19, 2025
[HAL] Set HAL allocation usage bits (EXPORT/MAPPING/etc) based on affinities.
#20856 opened May 19, 2025
[Stream] Assign resource affinity based on usage (not just assigned execution affinity) with multiple potential affinities.
#20855 opened May 19, 2025
[Stream] Add a mechanism for denoting device-device link topology for transfer elision when NUMA.
#20854 opened May 19, 2025
[Stream] Elide transfers in the stream dialect when resources are accessible on participating affinities.
#20853 opened May 19, 2025
Implement initial heterogeneous support MVP for CPU (+ maybe something else).
#20851 opened May 19, 2025
[dlpack][runtime] Memory leak with dlpack capsules
#20849 opened May 19, 2025

28 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Lower `linalg.copy` to direct global load
#20568 commented on May 23, 2025 • 94 new comments
[Encoding] Teach specialize encodings to handle pad encodings.
#20845 commented on May 19, 2025 • 0 new comments
[Util] Move Attribute based pipeline to Util dialect.
#20832 commented on May 19, 2025 • 0 new comments
Preserve preprocessing passpipeline attribute during `WrapEntryPointsPass`.
#20818 commented on May 19, 2025 • 0 new comments
Export ExecutionEngine through python bindings
#20777 commented on May 24, 2025 • 0 new comments
[Codegen] Add pass for specializing executable variants
#20771 commented on May 19, 2025 • 0 new comments
[AMDGPU] Implement gpu.subgroup_reduce with DPP intrinsics on AMD GPUs
#20468 commented on May 22, 2025 • 0 new comments
[Dispatch] Only bubble reshapes when possibly blocking fusion
#20108 commented on May 19, 2025 • 0 new comments
error: failed to solve for affinity analysis on sharded toy llama model
#20436 commented on May 25, 2025 • 0 new comments
can't build on Mac Sequoia 15.4
#20472 commented on May 23, 2025 • 0 new comments
[CPU] TileRootAndFuseProducerConsumer causes redundant stack allocation
#20792 commented on May 22, 2025 • 0 new comments
[DT] Dependency graph of data-tiling fusion and GPU data-tiling issues
#17722 commented on May 22, 2025 • 0 new comments
[EPIC][GPU][DT] Bring up GPU data-tiling with reasonable performance
#17181 commented on May 22, 2025 • 0 new comments
Dependency graph of data-tiling issues
#17608 commented on May 22, 2025 • 0 new comments
[tosa] failed to legalize operation 'arith.extsi'
#19402 commented on May 22, 2025 • 0 new comments
[CPU] linalg.pack op is not fused in forall distribution
#20723 commented on May 22, 2025 • 0 new comments
[CPU] CPU backend produces unfused IR in mmt4d->unpack->elem dispatch
#20786 commented on May 21, 2025 • 0 new comments
[DataTiling][Codegen] Improve layout transformation codegen
#20530 commented on May 21, 2025 • 0 new comments
torch.aten.argmax kernel is slow
#20650 commented on May 21, 2025 • 0 new comments
[CPU] RoPE kernel producing incorrect results
#20824 commented on May 20, 2025 • 0 new comments
[Attention] Support for post-softmax fp8 scaling
#20823 commented on May 20, 2025 • 0 new comments
Avoid IREE internal allocation for "Bag of Ops" mode of deploying IREE kernels
#20811 commented on May 20, 2025 • 0 new comments
GPU codegen bug: incorrect results on matmul with 3D expanded LHS
#20843 commented on May 20, 2025 • 0 new comments
heterogeneous multi-device support
#20752 commented on May 20, 2025 • 0 new comments
Incorrect concatenation with both Turbine and ONNX compilation
#20819 commented on May 20, 2025 • 0 new comments
[CPU] Fail to vectorize ukernel + unpack
#20785 commented on May 19, 2025 • 0 new comments
`local-task` giving segmentation fault on `linalg.pack` when tile size is 512x32
#20595 commented on May 19, 2025 • 0 new comments
[Encoding] Revisit the need of EncodingNopLayout resolver
#20846 commented on May 19, 2025 • 0 new comments
