Releases: apache/tvm
Apache TVM v0.20.0
Introduction
The TVM community has worked since the last release to deliver the following new exciting improvements!
The main tags are below (bold text indicates areas with lots of progress): Relax (especially the PyTorch frontend), CUDA, etc.
Please visit the full listing of commits for a complete view: v0.20.dev0...v0.20.0.rc0.
Community
None.
RFCs
None.
Adreno
- #17608 - [WINDOWS] Windows build dependencies for Adreno target
BugFix
- #17761 - [FIX][RELAX] fix fusion of transpose + matmul when constant weight
- #17762 - [Fix] Fix OpenCL header in attention utils
- #17711 - [Fix][dlight] add an explicit reduction loop check in Reduce
- #17697 - [Fix] Include `<chrono>` for `std::chrono`
- #17677 - Declare build backend for python package
- #17598 - [TIR][FIX] update FlopEstimator to include missing nodes
- #17601 - [Flashinfer][Fix] fix missing args in flashinfer test
- #17607 - [FIX][TVMC] Fix the mixed precision conversion pipeline
CI
- #17687 - Update images to 20250226-223225-63bc315f
- #17680 - update images to 20250225-035137-aeadc31c
- #17675 - [skip ci]Update github tvmbot
- #17635 - Cleanup legacy files
- #17634 - [skip ci]Improve build time
- #17629 - [skip ci]Robustify CI for SPOT failure
- #17620 - Unpin pytest-profiling
- #17621 - [skip ci] Remove legacy CI runners protection
- #17619 - [Refactor]Remove legacy frontend tests
Dlight
- #17754 - Fix general reduction rule to support non-last reduction axis
- #17663 - [CPU] Add CPU Backend Support for GEMV Optimization
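The rules above plug into Dlight's module pass, which schedules every unscheduled TIR PrimFunc under the active target. A brief, hedged usage sketch follows; the CUDA target and the particular rule list are illustrative choices, not taken from these PRs.

```python
# Illustrative sketch: apply Dlight scheduling rules to an IRModule.
# The target and rule list are assumptions for demonstration purposes.
import tvm
from tvm import dlight as dl

def schedule_with_dlight(mod: tvm.IRModule) -> tvm.IRModule:
    # Rules are tried in order for each PrimFunc; the first match is applied.
    with tvm.target.Target("cuda"):
        return dl.ApplyDefaultSchedule(
            dl.gpu.Matmul(),
            dl.gpu.GEMV(),
            dl.gpu.GeneralReduction(),
            dl.gpu.Fallback(),
        )(mod)
```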
Docker
- #17691 - Fix ml_dtypes downgrade issue introduced by TensorFlow
- #17686 - Update ml_dtypes to 0.5.1+
- #17676 - Use Torch GPU on gpu device
- #17648 - Tensorflow (aka TFLite) upgrade to 2.18.0
- #17643 - Update ml_dtypes version
- #17638 - [skip ci]Update ml_dtypes version
- #17617 - Tensorflow upgrade to 2.18.0
Docs
MetaSchedule
- #17104 - Adding post optimization in MetaSchedule to Improve Scheduling
OpenCL & CLML
- #17571 - [OPENCL][TEXTURE] Improved texture memory planning
Relax
- #17814 - [PyTorch] Add stack.default and sum.default to exported programs translator
- #17820 - [PyTorch] Add support for broadcast_to, narrow ops
- #17822 - [PyTorch] Cleanup tests for ExportedProgram frontend
- #17806 - [PyTorch] Add Softplus Op Support for Exported Program and FX graph
- #17817 - [PyTorch] Support dynamic shapes in ExportedProgram frontend
- #17813 - [PyTorch] Improve ExportedProgram frontend by supporting `unflatten.int`, `hardtanh_.default`, `dropout_.default`, `silu_.default`, `add_.Tensor` and `relu_.default`
- #17812 - [PyTorch] Support argsort, topk ops for ExportedProgram importer
- #17810 - [PyTorch] Add support for argsort, sort, topk ops
- #17809 - [PyTorch] Delete duplicate converter function `_to`
- #17807 - [PyTorch] Fix torch 2.6 compatibility issues
- #17797 - [Pytorch] Update SELU Implementation Using Decomposed Core-Level Ops
- #17802 - [Pytorch] support for arange in exported programs translator
- #17801 - [PyTorch] Support where, cumprod and reciprocal ops for ExportedProgram importer
- #17790 - [PyTorch] Add support for index_select
- #17786 - [PyTorch] Support softshrink op for ExportedProgram
- #17788 - [PyTorch] Add support for where, cumprod and reciprocal ops
- #17785 - [PyTorch] Support prod, std and var ops for ExportedProgram importer
- #17778 - [PyTorch] Support log2, log10 and log1p ops for ExportedProgram importer
- #17772 - [PyTorch] Add support for prod, std and var ops
- #17766 - [PyTorch] Add support for log2, log10 and log1p ops
- #17760 - [PyTorch] Add support for lerp, select and clone ops
- #17751 - [PyTorch] Support one_hot, empty_like ops for ExportedProgram importer
- #17747 - [PyTorch] Support flip, gather, take ops for ExportedProgram importer
- #17738 - [PyTorch] Support elu, celu, selu ops for ExportedProgram importer
- #17726 - [PyTorch] Add support for numel, empty_like and one_hot ops
- #17707 - [PyTorch] Add support for gather, flip and take ops
- #17702 - [PyTorch] Add support for celu, selu, is_floating_point ops
- #17694 - [PyTorch] Add support for elu, hardtanh ops
- #17689 - [PyTorch] Support several binary ops for ExportedProgram importer
- #17672 - [PyTorch] Refactor binary ops tests
- #17679 - [PyTorch] Support several unary ops for ExportedProgram importer
- #17668 - [PyTorch] Add support for and_, lshift, min, or_, rshift, xor ops
- #17664 - [PyTorch] Add support for ge, gt, le, mod, ne ops
- #17659 - [PyTorch] Add support for bitwise_not, isfinite, isinf, isnan, logical_not, sign and square ops
- #17622 - [PyTorch] Add support for abs, ceil, erf, floor, log ops and refactor unary tests
- #17566 - [ONNX] Add prim expression support to Neg converter and update Arange converter to use relax.op.arange
- #17642 - [ONNX]replace topi.split with relax.op.split in the onnx frontend
- #17674 - [KVCache] PagedKVCache refactor, FlashInfer JIT and MLA integration
- #17618 - [KVCache] TIR attention kernel support for MLA
- #17615 - [KVCache] Add KV Cache for CPU Runtime
- #17616 - [Runtime][KVCache] Initial interface setup for MLA
- #17782 - [Frontend] Support max/min in frontend op interface
- #17758 - Allow ingesting tensor.chunk() from exported torch program
- #17781 - Enable bfloat16 for softmax struct-info inference
- #17752 - Batch norm correctness on eval mode
- #17774 - check for tensor_meta in exported_program_translator
- #17757 - Tensor.split with uneven tensors
- #17749 - Move TIR backend to gpu_generic
- #17725 - Ingest Tensor.clamp from torch export
- #17724 - Add support to ingest Tensor.expand_as()
- #17723 - Add torch exported program ingestion capability for Tensor.detach(), Tensor.copy_, and aten.lift_fresh_copy
- #17721 - Allow ingesting Upsample module from torch.export either using Size or Scale Factor argument
- [#17722](https://g...
Apache TVM v0.19.0
Introduction
The TVM community has worked since the last release to deliver the following new exciting improvements!
The main tags are below (bold text indicates areas with lots of progress): Relax, OpenCL, MetaSchedule.
Please visit the full listing of commits for a complete view: v0.19.dev0...v0.19.0.rc0.
Community
None.
RFCs
None.
Arith
- #17469 - [LLVM]Presburger compile fix for MLIR/LLVM 19.x
BugFix
- #17595 - [Fix][KVCache] Fix incorrect tile size calculation
- #17549 - [FIX][LLVM] Workaround -mcpu=apple-latest for llvm above 18.0 (#17492)
- #17537 - [FIX][topi.scatter_nd] fixed shape equality assert by using analyzer to prove equality
- #17502 - [FIX][TOPI][strided_slice] Fix topi.strided_slice output shape
- #17505 - [RELAX][ONNX][FIX] add a parser to handle expression in the shape dim names
- #17490 - [FIX][ONNX][RELAX] Add support for dynamic ShapeExpr in Slice, Squeeze and Flatten
- #17467 - [FIX][RELAX][ONNX] Fix typo in onnx frontend
CI
- #17596 - [Test] Skip flaky test to unblock CI
- #17451 - Upgrade CI image to `20241105-030952-3e386fd3`
- #17534 - Upgrade zephyr-sdk to 0.16.9
- #17503 - Upgrade `>`
- #17485 - Revert jax, keras, tensorflow, and tflite upgrades introduced in #17425
- #17470 - Pin cpplint==1.6.1
Docs
- #17518 - Few fixes for broken Adreno docs
- #17527 - Fix typo in TensorIR
- #17528 - Fix Typo in Debugging TVM
LLVM
MetaSchedule
- #17465 - Fix a multilevel tiling error on dynamic relax workload
OpenCL & CLML
- #17516 - [RUNTIME][CLML] Dynamic backward compatibility
- #17519 - [OPENCL][ADRENO] Introduce Qualcomm extension support
- #17517 - [TEST][CLML] Clip test case updated
- #17472 - [Device][OpenCL] add CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST to …
Relax
- #17541 - Fix bug in convert_layout pass
- #17539 - [KVCache] Fix attention prefill kernel for Metal and Android
- #17540 - Add support for ONNX LPPool
- #17536 - [Frontend][Onnx] Add auto_pad support for conv
- #17525 - support masked_scatter
- #17506 - [Python]Update Rotary positional embedding scaling
- #17523 - Add gather_elements and gather_nd operators
- #17511 - Update ONNX frontend for unique, nonzero and compress
- #17509 - support scatter ops
- #17504 - [ONNX] Add support for dynamic shape expression in Expand
- #17482 - [KVCACHE] Improved schedule for prefill attention
- #17445 - [MetaSchedule] Support CPU weight prepack
- #17462 - Enhance Relax op and ONNX frontend
- #17466 - Revert "[KVCACHE] Improved schedule for prefill attention"
Runtime
- #17557 - [Dist] Implementation of KV cache transfer
- #17498 - [mrvl]: Support Marvell Hardware Runtime
TIR
- #17423 - [Schedule] Add annotate_buffer_access primitive
web
- #17545 - Allows setting powerPreference on webgpu
Misc
- #17593 - Fix GPU detection in PerStoreFeatureNode
- #17554 - [Refactor] Phase out microTVM
- #17542 - [REFACTOR] Phase out VTA
- #17533 - [Contrib] Remove CLML version print
- #17532 - [3rdparty] Update Picojson with const `operator[]` function (#327)
- #17474 - [TE][CreatePrimFunc] Fix loop carried dependency case with nested block levels
- #17501 - Fix InternalError in StaticPlanBlockMemory when visiting DataflowBlockNode
- #17455 - Compiled with Default Target(LLVM) and Built with USE_MRVL=ON
- #17481 - [Marvell BYOC]: global_max_pool2d and squeeze op support
- #17484 - Replace `np.int` with `np.int32`
- #17476 - Pin pytest-profiling==1.7.0
- #17464 - [JVM] Align Java GraphModule Initialization with Python API
- #17458 - Show the record if the escape sequence is unsupported
Apache TVM v0.18.0
Introduction
The TVM community has worked since the last release to deliver the following new exciting improvements!
The main tags are below (bold text indicates areas with lots of progress):
- Frontend: PyTorch's ExportedProgram is supported in the relax frontend (#17346; see the sketch below)
- Community, RFCs
- AOT, Hexagon, OpenCL & CLML, Web, Metal
- Relax, Dlight, Disco
- TIR, TVMScript
- Docs, Docker, CI, Misc, BugFix
Please visit the full listing of commits for a complete view: v0.18.dev0...v0.18.0.rc0.
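As a quick taste of the ExportedProgram support called out above, here is a minimal sketch of importing a `torch.export`-ed model into Relax; the toy model is ours, and only the `from_exported_program` entry point comes with this release.

```python
# Minimal sketch: translate a torch.export program into a Relax IRModule.
import torch
from torch.export import export
from tvm.relax.frontend.torch import from_exported_program

class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 8)

    def forward(self, x):
        return torch.relu(self.fc(x))

# Capture the model with torch.export, then import it into Relax.
exported = export(MLP().eval(), (torch.randn(1, 16),))
mod = from_exported_program(exported)
mod.show()  # print the resulting Relax module as TVMScript
```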
Community
- #17450 - update contributors
RFCs
The new RFC introduces a new BYOC backend: the Android Neural Network API (NNAPI), a graph-level neural network inference API provided by the Android runtime. Prior to this RFC, TVM on Android mobile devices mainly relied on OpenCL for GPU acceleration. This RFC adds a new codegen and a runtime via the BYOC framework, enabling execution on custom accelerators from SoC vendors on mobile devices.
- #109 - [RFC] NNAPI Integration via BYOC
BYOC
- #17385 - [NNAPI] Add NNAPI backend for BYOC
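For context, a BYOC backend such as NNAPI typically hooks into Relax's pattern-based partitioning flow. The sketch below shows the generic recipe only; the `patterns` argument is assumed to come from the backend's registered pattern table, and the exact NNAPI entry point may differ.

```python
# Hedged sketch of the generic Relax BYOC flow a backend plugs into;
# `patterns` is assumed to come from the backend's pattern registry.
import tvm
from tvm import relax

def offload(mod: tvm.IRModule, patterns) -> tvm.IRModule:
    seq = tvm.transform.Sequential(
        [
            relax.transform.FuseOpsByPattern(patterns),  # match offloadable subgraphs
            relax.transform.MergeCompositeFunctions(),   # group them into BYOC regions
            relax.transform.RunCodegen(),                # hand regions to external codegen
        ]
    )
    return seq(mod)
```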
BugFix
- #17440 - [TIR][Schedule] TileWithTensorIntrin skip ComputeInline if bu…
- #17419 - [FFI]Grab GIL when check env signals
- #17403 - [Fix][LLVM] Fix getHostCPUFeatures LLVM version cutoff
- #17383 - [ONNX] Skip constant If node generated by PyTorch
- #17360 - [FIX] fix bug when normalize iter with different lower bounds
- #17148 - [Relax] Preserve existing DataflowBlock in ConvertToDataflow
- #17345 - [Fix][Relax] Add the missing tree-attn func arg for KV cache creation
- #17073 - [Relax]FCallPacked not checked in CodegenVMTIR
- #17315 - [MSC]Bugfix for strided_slice op
- #17335 - [Relax][PyTorch][Fix] use `_convert_torch_tensor_to_relax()` where possible
- #17330 - [Relax][PyTorch] Update `layer_norm` converter to support `immutable_list` for `normalized_shape`
- #17324 - [Fix] Remove `tvm.` prefix from image name when `./docker/build.sh`
- #17308 - [TVM4J]Fix unhandled return type in JNI
- #17307 - [Fix][TIR] LowerThreadAllreduce warp reduction mask
- #17312 - [Relax]Infer TIR values from shapes inside a tuple
- #17292 - [Relax]Support torch.unbind op and fix bugs for expand && split
- #17263 - [Relax]Preserve dtype in ToMixedPrecision for kNever ops
- #17229 - [Cutlass] fix cutlass instantiate attention template bugs
- #17121 - [Relax]Fix a bug about the IR construction in test file
- #17142 - Allow import of TVM when current directory is read-only
CI
- #17444 - [Docs] Upgrade Sphinx
- #17425 - Upgrade CI to Python 3.9
- #17410 - Upgrade unity image tag to `20240917-153130-9f281758`
- #17409 - [Windows] Workaround for error in FindLLVM
- #17397 - Update image tag to 20240917-153130-9f281758
- #17338 - Upgrade PyTorch to 2.4.1
- #17337 - Disable NNPACK build and fix error on Android SDK installation
- #17355 - Upgrade github upload-artifact action
- #17334 - [Hexagon] Forward gtest tests into pytest as separate tests
- #17271 - Resolve CI compilation failures on MacOSX
- #17221 - Reduce logging level when checking if docker image exists
- #17206 - Update dummy-variable regex for pylint
- #17117 - [CLML]Fix for few clml regression issues
- #17155 - Remove lint step from `unity/pr-head` step
Disco
- #17398 - Enable float8 data type in disco
- #17275 - Fix double free of nccl communicator
- #17264 - Disable splitting nccl communicator in single-group
- #17182 - Implement SocketSession
- #17191 - Cross-group and p2p send/receive primitives
- #17180 - Group-wise operation
Dlight
- #17430 - [GPU] Improve matmul schedule for adreno
- #17363 - Fix Matmul rule for Conv3D
- #17259 - [ADRENO] Fix for opencl adreno matmul schedule
- #17187 - [GPU] Add OpenCL dequant matmul schedule
Docker
- #17433 - [CI] Add NNEF dependency to CI images
Docs
- #17436 - [Relax][PyTorch] Use `torch.export` instead of `fx.symbolic_trace` for tutorial
- #17402 - [Doc] Update Architecture Overview
- #17382 - More clarity on security model of RPC server
- #17380 - [Doc] Relax Deep Dive
- #17377 - Update document to include security model of RPC server
- #17378 - Link to project-specific security page
- #17352 - TVM pip Installation fix
- #17343 - Minor fix typo in developer howto guide
- #17328 - [Doc] Deep Dive TensorIR
- #17327 - [Doc] How to Optimize a Language Model
- #17320 - [Doc] Customize Optimization
- #17319 - [Doc] Fix doc build error in e2e_opt_model.py
- #17306 - [Doc] Refactor How-To
- #17296 - [Doc] Overview
- #17298 - [Doc] IRModule
- #17286 - Introduce Relax API and move legacy part to standalone page
- #17289 - [Doc] Quick Start
- #17287 - [Doc] Refactor install docs
Frontend
- #17431 - [Relax][Onnx] Add support for pad-2
- #17447 - [ONNX] Move relax related tests to the correct file
- #17427 - [Relax][ONNX] Expand op support for ONNX frontend
- #17429 - [Relax][PyTorch] Support tensor manipulation and creation ops for ExportedProgram importer
- #17426 - [Relax][PyTorch] Support neural network ops for ExportedProgram importer
- #17424 - [Relax][PyTorch] Support binary, statistical and search ops for ExportedProgram importer
- #17421 - [Relax][PyTorch] Support more unary ops for ExportedProgram importer
- #17396 - [Relax][PyTorch] Add support for `torch.export.ExportedProgram` in Relax PyTorch Frontend
- #17379 - [Relax][PyTorch] Fix output shape of `torch.nn.functional.scaled_dot_product_attention`
- #17376 - [Relax][PyTorch] Cleanup Tensor Manipulation and Creation op converters
- #17372 - [Relax][PyTorch] Cleanup Statistical, Search and DataType op converters
- #17369 - [Relax][PyTorch] Cleanup Neural Network op converters
- #17366 - [Relax][PyTorch] Cleanup binary op converters
- #17356 - [Relax][PyTorch] Cleanup unary op converters
- #17350 - [Relax][Onnx] fix params name bug in onnx frontend
- #17342 - [Relax][PyTorch] Add support for `torch.ops.aten.sym_size.int`
- #17300 - [Relax][PyTorch] Add support for torchvision.ops.stochastic_depth
- #17325 - [Relax][PyTorch] Add support for `torc...
Apache TVM v0.17.0
Introduction
The TVM community has worked since the v0.16.0 release to deliver the following new exciting improvements!
The main tags are below (bold text indicates areas with lots of progress):
- Community, RFCs
- AOT, Hexagon, OpenCL & CLML, Web, Metal
- Relax, Dlight, Disco
- TIR, TVMScript
- Docs, CI, Misc, BugFix
Please visit the full listing of commits for a complete view: v0.17.dev0...v0.17.0.rc0.
Community
- #17018 - New committer: Balint Cristian
RFCs
This new RFC adds NNEF, an open, standardized format for neural network exchange developed by the Khronos Group since 2018 (https://www.khronos.org/nnef). It is aimed at deploying trained neural networks from deep learning frameworks to proprietary inference engines of neural network hardware vendors.
- #108 - [RFC] Add NNEF frontend
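A hypothetical import sketch follows, assuming the frontend lands as a `from_nnef` entry point in the style of TVM's other Relay frontends; the name and signature are assumptions, not confirmed by these notes.

```python
# Hypothetical sketch: `from_nnef` and its signature are assumptions
# modeled on TVM's other Relay frontends (e.g. from_onnx).
import tvm
from tvm import relay

mod, params = relay.frontend.from_nnef("model.nnef")  # assumed entry point
lib = relay.build(mod, target="llvm", params=params)  # standard Relay build
```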
AOT
- #17077 - Correctly calculate workspace for vector types
Adreno
- #16927 - [SCRIPT]Fix in build config for adreno
BYOC
- #16895 - Add layout check and update shape check for cublas FP8 BYOC
BugFix
- #17138 - [Fix][TIR] Fix outdated call to create extern buffer in make_extern
- #17132 - Restrict CopyOnWrite to _type_final
- #17096 - Update FAttrsGetter to return Map<String, ObjectRef>
- #17078 - [NCCL] Release NCCL thread_local resources in destructor
- #17044 - [Support] Fix copy constructor for support::OrderedSet
- #17000 - [MSC] split name_string with index by colon from the right
- #16923 - [Fix][Dlight] Fix GeneralReduction for log-sum-exp
- #16924 - [Fix] Fix SSA conversion for SizeVar retention
- #16903 - CudaDeviceAPI::GetAttr may check kExist when GPUs absent
- #16901 - rocm shared memory issue on MI250
CI
- #17055 - [SME][Test] Add additional conv2d tests for asymmetric parameters
- #17007 - [TOPI][Testing] Enable conv2d NHWC fp16 topi testing for `arm_cpu`
- #16930 - [UnitTest] Use pytest's scope='session' for tvm.testing.parameter
- #16948 - Update image tag to 20240428-060115-0b09ed018
- #16931 - Use LLVM17 for tests on `ci_cpu`
- #16942 - Enable Conda setup v3
- #16939 - Upgrade CUDA to 12.4
CRT
- #17097 - [Bugfix]Return error code on error from ModuleGetFunction
Disco
- #17035 - [QoL] Implement broadcast/scatter methods for Session
- #16992 - [Bugfix]Handle NDArray larger than OS buffer for pipe
- #16978 - Implement `num_workers` property for `disco.Session`
- #16989 - Treat hangup of disco worker process as kShutdown
- #16993 - Allow allocation that only exists on worker0
- #16979 - Expose disco.Session.shutdown through the python API
- #16919 - Improve error message for CallPacked
Dlight
- #17082 - Use 16x32 spatial x reduction thread extents in GEMV scheduling
- #17052 - Skip GEMV rules when more than one vector
- #17026 - Perf improvement for low_batch_gemv on Metal
- #17016 - Update Adreno GEMV Rules
- #16972 - [GPU] Enhance opencl thread limit for schedules
- #16973 - [GPU] Improved gemv outer fallback schedule
- #16958 - Check for target in function attributes
- #16894 - Enhance vectorization for gpu matmul
- #16884 - Add check for matmul dtype and fix reduction rule
Docs
- #17146 - [DOC] Fix typo for the "We utilize the intermediate representation of nn.Graph to convert the OneFlow model to Reley."
- #17015 - [DOC] Update Model Links to Include Commit
Frontend
- #17014 - [ArgParse] Pass default values to target compiler(#13264)
- #16961 - [Bugfix][ONNX] Improve broadcast and batch_matmul conversion
- #16936 - [TFLite] Add support for GELU conversion
Hexagon
- #17123 - Add support for v75
LLVM
- #17046 - [Arith][SVE] Add rewrite rules for indices split by scalable expressions
- #16966 - [SVE] Add support for representing and creating buffer-level predicates
- #17001 - [SVE] Use only powers of two as possible vscale values
- #16962 - [SVE] Add codegen support for `vscale_range()` function attribute
- #16968 - Stringref API deprecation fixes
- #16965 - [SVE] Add get_active_lane_mask builtin
- #16899 - [SVE][TOPI] Add conv2d NHWC hybrid SVE schedule for `arm_cpu`
- #16893 - [SVE] Check for SVE target in VectorizeLoop
- #16862 - [SVE] Support splitting by vscale in `tir::split` and `te::split`
MetaSchedule
- #17012 - [BugFix]MultiLevelTilingTensorCore generates inconsistent thread-binding sketch for batched matmul
- #17066 - [BugFix]Fix TensorIntrin "dot_4x4_i8i8s32_sdot" is not registered
Metal
OpenCL & CLML
- #16933 - [CLML] Fix in clml pattern check condition
- #16929 - [VM][OPENCL] Take advantage of OpenCL host ptr for improved copy
ROCm
- #17141 - [Backend]Fix error when building TVM with LLVM 19
Relax
- #17139 - Fix cublas dispatch for corner cases
- #17127 - [KVCache] Support fork in sliding window sink part
- #17115 - Support `input_axis_separator` to allow 2D to 1D conversion
- #17119 - [Bugfix]Set purity=false for LazySetOutput
- #17118 - [VM] Improved error messages for mismatched parameter count
- #17110 - Alloc BYOC workspace with R.builtin.alloc_tensor
- #17089 - [ONNX] Add support for HardSigmoid
- #17100 - [KVCache] Unlimited depth blocks
- #17075 - [Transform] Modify FuseTIR pass to propagate buffer attributes
- #17088 - [ONNX] Add support for HardSwish
- #17085 - [PyTorch] Add support for torch.nn.Hardsigmoid
- #17083 - [TVMScript]Preserve tir.SizeVar through TVMScript round-trip
- #17086 - Ignore dynamic parameters in RewriteDataflowReshape
- #17084 - [PyTorch] Add support for torch.nn.Hardswish
- #17074 - [KVCache][Test] Fix TIR attn kernels for uncommon group size
- #17067 - Add missing white spaces in error messages
- #17061 - [Frontend][Onnx] Cast Op special handling for ShapeExpr input
- #17033 - [Bugfix] Apply FuseOps to nested DataflowBlock
- #17032 - [Bugfix] Annotate ComputePrimValue output as host function
- #17034 - [Bugfix] Bind symbolic variables in R.match_cast
- #16960 - [UnitTest] Validate IRModule with multiple targets
- #16995 - [KVCache] Support KVCache decode from forked sequence an...
Apache TVM v0.16.0
Introduction
The TVM community has worked since the v0.15.0 release to deliver the following new exciting improvements! Highlights of this release:
- First support of Relax, with dynamic shape and pipeline (see the sketch at the end of this introduction)
- Dlight module for optimizing LLM TIR workloads on GPU
- Disco module for initial SPMD multi-GPU support
The main tags are below (bold text indicates areas with lots of progress):
- Community, RFCs
- Adreno, ArmComputeLibrary, Metal, cuda & cutlass & tensorrt, microNPU, Runtime
- Relax, Dlight, Disco
- Arith, TIR, TVMScript
- Docs, CI, Misc, BugFix
Please visit the full listing of commits for a complete view: v0.16.dev0...v0.16.0.rc0.
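Since this release introduces Relax, here is a minimal sketch of the new flow: a tiny TVMScript module legalized, compiled, and run on the Relax VM. It uses the public Relax API generically rather than any specific PR below.

```python
# Minimal sketch of the Relax flow: TVMScript module -> build -> VM run.
import numpy as np
import tvm
from tvm import relax
from tvm.script import relax as R

@tvm.script.ir_module
class AddModule:
    @R.function
    def main(x: R.Tensor((2, 4), "float32")) -> R.Tensor((2, 4), "float32"):
        return R.add(x, x)

mod = relax.transform.LegalizeOps()(AddModule)  # lower Relax ops to TIR
ex = relax.build(mod, target="llvm")
vm = relax.VirtualMachine(ex, tvm.cpu())
y = vm["main"](tvm.nd.array(np.ones((2, 4), "float32")))
print(y.numpy())
```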
Community
RFCs
This new RFC explores how TVM can be utilized to generate code for the SME ISA to achieve improved inference performance on supported Arm®-based hardware implementing the SME extension.
- #107 - [RFC] Scalable Matrix Extension enablement
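To make the idea concrete, the sketch below expresses a vector-length-agnostic split in a TIR schedule using `vscale`, in the spirit of the scalable-vector work; the scheduling details are illustrative, and actual SME codegen depends on an appropriate LLVM target.

```python
# Illustrative sketch: split a loop by a multiple of vscale so the inner
# extent adapts to the hardware vector length, then vectorize it.
import tvm
from tvm.script import tir as T

@T.prim_func
def add_one(a: T.handle, b: T.handle):
    A = T.match_buffer(a, (128,), "float32")
    B = T.match_buffer(b, (128,), "float32")
    for i in range(128):
        with T.block("B"):
            vi = T.axis.remap("S", [i])
            B[vi] = A[vi] + T.float32(1)

sch = tvm.tir.Schedule(add_one)
(i,) = sch.get_loops(sch.get_block("B"))
_, inner = sch.split(i, factors=[None, 4 * tvm.tir.vscale()])
sch.vectorize(inner)
print(sch.mod)
```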
Arith
- #16735 - [Fixup] Require feature flag for tighter inequality bounds
- #16588 - Provide tighter ConstIntBounds for special cases
- #16704 - [Fix]Fix canonical simplification of LE
BYOC
- #16567 - Skip processed functions in FuseOpsByPattern and RunCodegen
BugFix
- #16766 - [Target] Added null check to fix segfault at ->defined() in cpu.cc DetectSystemTriple()
- #16739 - [Ansor] Fixing Ansor Gradient Bug
- #16820 - [Fix] PAPI docs
- #16793 - [Fix] fix for numpy 2.0 compatibility
- #16790 - [Fix] Fix build errors with VS2022
- #16780 - [Fix] Fix numpy dtype map
- #16773 - [Fix] Fix the purity flag of "vm.call_tir_dyn" and "kill" ops
- #16770 - [Hotfix] Revert driver API pass ordering that breaks MLC, mark failing test
- #16771 - [Fix] Remove redundant "remove_all_unused" in IPC memory lowering
- #16746 - [Fix][Builtin] Fix "GetQueryPosition" of PagedKVCache
- #16728 - [Fix] Introduce TVM_DEBUG_WITH_ABI_CHANGE to warn ABI changes in debug mode
- #16714 - [Fix] PagedKVCache fetching compute stream when copy stream is needed
- #16684 - [SLM] Produce well-formed Relax for nn.modules.KVCache
- #16659 - add the default value for DFT in ONNX frontend
- #16637 - [Transform] Preserve symbolic variables in FuseOps
- #16649 - [FFI] Add a missing default for datatype lanes
- #16492 - [Executor] fix debug_executor function debug_get_output
- #16598 - [Transform]Handle non-composite lambda functions in FuseOps
- #16565 - [Transform] Keep private non-primitive functions in FuseTIR
- #16518 - Use `x*x*x` instead of `pow(x,3)`
- #16436 - Ensure that bf16 arrays are created as expected
- #16361 - Disable SingleEnvThreadVerifier
- #16289 - [AUTOTVM][FIX] Typo fixes and add a warning in the Droplet Search
CI
- #16837 - Disable flaky unit test
- #16765 - [AOT][Testing] Improve output mismatch information on test failure
- #16661 - add merge_with_main in unity
- #16611 - [AOT][Testing] Print output values on test failure
- #16546 - Disable testing that downloads from mxnet
- #16521 - Fix CI Script and Broken Tests
- #16502 - Support tvm-bot rerun for tvm-unity task
- #16435 - Update image tag to 20240126-070121-8ade9c30e
- #16420 - [WASM] Update emsdk and nodejs version
- #16384 - Remove NVIDIA_DISABLE_REQUIRE
- #16382 - In jenkins.cmd_utils.Sh.tee, check for failing subprocess
- #16366 - Upgrade sccache version to 0.7.*
- #16369 - Upgrade Unity ci images
- #16344 - Update docker images tag to 20240105-165030-51bdaec6
- #16340 - [Unity][UnitTest] Increase atol to resolve flaky CI failure
- #16337 - [Hexagon][UnitTest] Disable flaky quantization test
- #16336 - Upgrade cmake version to 3.24.0
Docker
- #16755 - [SME]Add Fixed Virtual Platform (FVP) and toolchain install
- #16348 - Upgrade pip in i386 container
Disco
- #16618 - [Disco] Propagate structlog configuration to disco workers
- #16639 - [Disco] Expose functions to query the per-worker device/rank
- #16617 - [Disco] Implement `Session.import_python_module` method
- #16715 - [Disco] Propagate structlog/logging config to workers
- #16845 - [Debug][Disco] Check if a PackedFunc exists before calling it
- #16817 - [Disco] Reduce Process/ThreadSession message queue reads and writes
- #16807 - [Disco] Support setting workers' CPU affinity
- #16375 - [Unity] Fix creation of disco ProcessSession
- #16821 - [Fix] Add TVM_DLL to Disco session
- #16752 - [Fix] Lazy import of "psutil" in disco process pool
Dlight
- #16775 - [Fix][Dlight] (Low-batched-)GeMV on small spatial loops
- #16429 - [Unity][Dlight][Fix] Reduction rule support dyn-shape epilogue
- #16351 - [Unity] Add dlight.gpu.Fallback in DispatchSortScan, add argsort, topk, and cumprod
- #16338 - [Unity][DLight] Introduce Specific Rule for RMSNorm
- #16251 - [Unity][Dlight] Support dlight gemv rule on nested inner block
- #16878 - [Dlight] Enhance vectorization loading weight for gemv
- #16848 - [DLight] Fix a corner case for reduction rule
- #16701 - [Dlight] Add fallback for low batch gemv with outer reduction
- #16678 - [Dlight] LowBatchGemv rule only apply to function with spatial symbolic var
- #16665 - [Dlight] Skip GeMV when normalization fails
- #16579 - [Dlight] Scheduling Low batch GEMM using GEMV-like rule
- #16321 - [DLight] Skip rule if target is not suitable
- #16731 - [Dlight] Fix GeMV shared memory estimation
Docs
- #16792 - [Doc] Fix set_axis_separator example
- #16610 - [Doc] Fixed Docstring usage example in `tvm.ir.make_node`
- #16572 - [Doc] Remove MxNet related tutorials
- #16514 - [Unity][Doc] Document passes that depend on `DataflowBlock`s and encourage using `ConvertToDataflow`
- #16482 - [Doc] Fix Docstring in `extern.py` for Sphinx
- #16346 - [Doc] Fix minor error in "Expressions in Relay"
Frontend
- #16001 - [ONNX] Fix interpreting auto_pad parameters in ConvTranspose operator
- #16651 - [PaddlePaddle] PaddlePaddle model with NCHW data format that supports quantization
- #16616 - [PaddlePaddle] Support conv2d when data_format is NHWC
- [#16526](https://github.com/a...
Apache TVM v0.15.0
Introduction
NOTE: This is the last release before the unity branch is switched to the main branch. It contains no unity features.
The TVM community has worked since the v0.14.0 release to deliver the following new exciting improvements! The main tags are below (bold text indicates areas with lots of progress):
- Community, RFCs
- Adreno, ArmComputeLibrary, Metal, cuda & cutlass & tensorrt, microNPU, Runtime
- Frontend & Relay
- Arith, TOPI, TIR, TVMScript
- Docs, CI, Misc, BugFix
Please visit the full listing of commits for a complete view: v0.14.0...v0.15.0.
Community
- #16172 - Yixin Dong -> Reviewer
- #16162 - Shuai Yuan -> Committer
- #16164 - Qiang Zhang -> Committer
- #16166 - Bohan Hou -> PMC
- #16165 - Ruihang Lai -> PMC
RFCs
- #105 - Add a new backend language — SYCL
Adreno
- #15991 - [CI] Enhancements to Adreno specific CI utils
- #15786 - [TOPI] Add conv2d transpose nchw texture schedule
Arith
- #16227 - Simplify nested if_then_else when constant is appearing in then_expr
ArmComputeLibrary
- #15990 - [ACL] Update Compute Library to v23.08
Metal
- #16192 - [Device] Fix metal warp size
- #16033 - [Codegen] Disable cross-function call in Metal codegen
cuda & cutlass & tensorrt
- #16061 - [CUDA] Add an option for profiling cuda kernels
microNPU
- #16003 - [microNPU][ETHOSU] Fix ConcatRewriter args processing
- #15929 - [microNPU][ETHOSU] Fix rounding mode in requantize operation
Runtime
- #15896 - [CLML] Fix for CLML ops and enable more test case
- #16133 - Parallel-for with threading backend
- #16066 - Support clear global memory allocators
- #16030 - Introduce `TVM_MODULE_VTABLE` Macros
BugFix
- #16269 - Update pillow usage
- #16272 - Fixed Inappropriate Logical Expression
- #16216 - [TIR] Fix dynamic smem merge leaf alloc
- #16190 - Fix the error of reloading the model library on the ROCm platform: "MIOpen Error: No invoker was registered for convolution forward."
- #16167 - [Relay][Pytorch] Fix missing `.dtype`
- #16091 - [Fix] Fix `topi.rms_norm` with float32 upscale
- #16081 - [Fix] Broken Windows Build with LLVM
- #16051 - [Fix][TIR] Fix dtype issues for match_buffer and ramp node
- #14655 - [VTA] Fix FSIM compile error on macOS
- #16021 - [FFI] Typo fix of IncRef to DecRef
- #16010 - [Fix][TIR] fix mul dtype mismatch
- #16000 - [Fix][TIR] fix symbolic strides lower
- #15970 - [Hotfix] Mark python-FFI handling with TVM_DLL
- #15965 - [CI] Better to pass the build folder
CI
- #16110 - Refactor unittest folder
- #16055 - Fix broken links about Jenkins
- #16062 - Use LLVM 17 for tests on `ci_arm`
- #16018 - [Tests] Fix work_dir location used by test_micro_tuning_with_meta_schedule
- #16019 - [Tests] Check int8+int32 testcases in test_estimate_peak_flops_cpu
- #16017 - [Tests] Fix str vs. int comparison in test_num_threads
Docs
- #16282 - [Doc] Fix minor error in doc (Add an operator to Relay)
- #16152 - [DOC] Add v0.14.0 docs to site
- #16127 - Revert "[#15157][Rust][Doc] Re-enable the Rust documentation build (#15213)"
- #16097 - Add missing backtick to contribute/code_guide.rst
- #16089 - Fix error on linting by adding `--rev` argument
- #16024 - Update release_process.rst about version number modification
Frontend & Relay
- #16243 - [TFLite] Add support for quantized mirror pad
- #15914 - [TFLite]Support quantized SQUARE
- #16159 - [KERAS] Fix bug concat convert for NCHW
- #16319 - [Torch] add aten:broadcast_to
- #16131 - [Pytorch] Add support for `aten::unflatten`
- #16105 - [Pytorch] Add support for `aten::bitwise_and`
- #16079 - [Pytorch] Add support for aten::swapaxes operator
- #15502 - [Pytorch] aten::copy_ support for pytorch
- #16180 - [Pytorch] Fix bug when converting models with torch.nn.ParameterList
- #16143 - [Pytorch] Add support for `aten::scaled_dot_product_attention`
- #16123 - [Pytorch] Add support for `aten::linalg_vector_norm`
- #16171 - [Frontend] Preserve Pytorch Span Names
- #16217 - [Frontend][QNN] fix access `param_debug_name_map` to node output name in fx-quantized graph node replacement
- #16199 - [Frontend] Add support for aten::concat
- #16151 - conv3d depthwise bug fix
- #15928 - Expose qnn ops directly from relay.qnn module
TOPI
- #16259 - Add support for group_conv3d_transpose_ncdhw for generic
- #16052 - Enhance `topi.nn.matmul`
- #16080 - Reduce code redundancy in conv2d weights transformation
- #16248 - [TOPI] Add support for group_conv1d_transpose_ncw for generic
- #16106 - [TOPI] Add conv2d NHWC hybrid schedule for `arm_cpu`
TIR
- #16239 - [Schedule] TileWithTensorIntrin skip incorrect ComputeInline for input-padding
- #16236 - ConvertSSA process entry func first
- #16070 - [Transform] Introduce new `InjectPermutedLayout` pass
- #16083 - Enhance Python Type Annotations for TIR Expr
- #16073 - Support more mma intrinsics and `get_mma_intrin_group` utility
- #16076 - Enhance Python Type Annotations for TIR stmt
- #16074 - Fix the thread binding iter_var dtype in `Bind` primitive
- #16063 - Fix pass RenewDefs error in gather/take case
- #16027 - Fix software pipeline with dynamic loop extent
TVMScript
- #16271 - Disable concise scoping when the scope stmt is explicitly annotated
- #16041 - Fix mismatched dtype of IterVar in `T.thread_binding`
- #15953 - [TIR] Pretty print TIR LLVM function name
- #15972 - delete print extra info at parsing
Misc
- #16279 - replace deprecated np.int with int to avoid crash
- #16262 - Update conv2d.py
- #16255 - [Support] Add Interrupt Handling in Pipe
- #16104 - [LoopPartition] Fix a bug of LoopPartition in single point scenarios
- #16231 - [Target] Add Jetson AGX Orin tags
- #16221 - remove deprecated np.int in slice converter (pytorch)
- #16214 - [Python] Fix setup.py for inplace build
- #16174 - Bump cryptography from 37.0.2 to 41.0.6 in /docker/python
- [#16202](#16...
Apache TVM v0.14.0
Introduction
The TVM community has worked since the v0.13.0 release to deliver the following new exciting improvements! The main tags are below (bold text indicates areas with lots of progress):
- Community, RFC
- Arith, MetaSchedule
- Adreno, ArmComputeLibrary, Hexagon, Metal, OpenCL & CLML, ROCm, Vulkan, cuda & cutlass & tensorrt, microNPU, web
- Runtime, TVMC, AOT, LLVM, microTVM, CMSIS-NN
- Frontend, Relay, BYOC
- TOPI, TIR, TVMScript
- Docs, CI, Docker
- Misc, BugFix
Please visit the full listing of commits for a complete view: v0.13.0...v0.14.0.
Community
RFC
AOT
- #15301 - Avoid call_extern() with incorrect argument count
- #15181 - Remove workaround to help resolve test flakiness
Adreno
- #15830 - Minor changes for Adreno docs and help scripts
- #15671 - [VM]Fix using buffers for weights in VM
- #15391 - Small fixes in Adreno schedules
Arith
- #15881 - Simplify the result of non-divisible floordiv
- #15665 - Fix detect non-divisible iteration form like (x % 255) // 16
- #15638 - MLIR PresburgerSet compile fix mlir >= 160
- #15628 - Added simplification rule for multiple equality compares
- #15558 - Fix detect linear equation with uint var
- #14690 - Add tvm::arith::PresburgerSetNode to work with Presburger Set in MLIR
- #15555 - Fix handling of overlapping predicates
- #15471 - Enhance Canonical Simplify for LE
- #15228 - Enhance buffer shape bound deduction to include offset
ArmComputeLibrary
BugFix
- #15891 - [Relay]fix axis parsing of repeat converter in the MXNet frontend
- #15873 - [Fix] Remove duplicated words from comments, NFC
- #15868 - [Relay]Fix conv transpose with default strides in ONNX frontend
- #15773 - [CPP] Fix cpp deploy bug
- #15778 - [Hotfix] Fix Windows Pipe
- #15748 - Move symbols that are relevant to the runtime from libtvm to…
- #15752 - [Relay]fix the wrong calculate logic of operator flip in PyTorch frontend
- #15715 - [Relay]Fix the wrong implementation about operator Threshold in oneflow
- #15711 - [Strategy] Fix `arm_cpu` int8 conv2d strategy for dotprod and i8mm targets
- #15717 - [Relay]fix the wrong implementation of Softplus in OneFlow
- #15677 - [Arith] IterMapRewriter abort rewriting once failure
- #15629 - [VTA] tvm.tir.Call has no name attribute
- #15584 - [Relay][Strategy] Enable compile time transformation of weights matrix for arm_cpu NHWC quantized conv2d
- #15542 - [Fix] Fix the typo in compile flag
- #15484 - [TOPI] Fix a bug in arm_cpu int8 conv2d i8mm schedule
- #15473 - [Relay] Fix some bugs of dominator pattern
- #15478 - [TIR] ThreadSync with shared.dyn awareness
- #15406 - [TIR]Ensure the Var's scope is correct
- #15399 - [TIR] Fix multi-grouped multi-warp allreduce
- #15350 - [Relay] fix a bug of printing dataflow pattern
- #15385 - Work around "Internal Compiler Error" in MSVC
- #15294 - [Bug][Relay] fix relay frontend pytorch op addmm bug
- #15323 - [Fix][TIR] LowerThreadAllreduce with correct thread mask
- #15291 - [Relay][GraphExecutor] Fix set_input_zero_copy() precision bug
- #15225 - Fix function to read all file
CI
- #15903 - [Target]Add LLVM functions for current system info
- #15897 - [ADRENO] Few updates to Adreno docker setup
- #15836 - Update ci-gpu image
- #15668 - Allow Limit CPUs in Docker
- #15568 - [Testing] Allow Capitalized name in CompareBeforeAfter
- #15519 - [TEST] Run tests/python/relay/aot tests in ci-cortexm
- #15485 - Remove cython version pin
- #15421 - Bump Flax and Jaxlib versions to fix Jaxlib install error
- #15226 - Add ml_dtypes dependency for all docker images
- #15353 - Pin cython version to fix cython compilation
- #15352 - Make Graviton3 default AArch64 job runner node
- #15339 - Update test to include unique attribute
- #15277 - [Testing] Return BenchmarkResult in local_run and rpc_run
- #15268 - [Testing] Add tvm.testing.local_run
- #15136 - [UnitTest][NVPTX] Avoid cascading failures from CUDA postproc
CMSIS-NN
Docker
- #15799 - Add LLVM 17 to the LLVM install script
- #15862 - Upgrade oneflow to v0.8.0
- #15819 - Install oneflow from PyPi
- #15310 - Update ci-cortexm docker image
- #15293 - tensorflow_aarch64 package upgrade
Docs
- #15619 - community strategy decision process
- #15508 - Add v0.13.0 docs to site
- #15213 - [#15157][Rust][Doc] Re-enable the Rust documentation build
Frontend
- #15821 - [TFLite]Support quantized ELU
- #15844 - [TFLite]Fix test failures caused by div-by-zero
- #15798 - [TFLite]Support quantized Pow
- #15829 - [Relay][Keras][Bugfix] fix the converters of GRU and SimpleRNN about the go_backwards attribute
- #15838 - Fix unnecessary pylint errors
- #15802 - [SkipCI][Hotfix][TFLite] Disable test of quantized floor mod
- #15790 - [TFLite]Support quantized LESS_EQUAL
- #15775 - [TFLite]Support quantized GREATER_EQUAL
- #15769 - [TFLite]Support quantized NOT_EQUAL
- #15768 - [TFLite]Support quantized div
- #15746 - [TFLite]Support quantized LESS
- #15733 - [TFLite]Support quantized floor_mod
- #15724 - [TFLite]Support quantized floor_div
- #15602 - [ONNX][BugFix] Support If body with free variable from graph input
- #15472 - [Relay][TFLite] Fix in qnn.conv2d when parameter groups not equal to 1
- #15117 - [TFLITE] Add support for TFLite's regular NMS operator
- #15415 - [ONNX] add onnx Mish operator
- #15422 - [Keras] Add support for swish activation
- #15370 - [Relay][Pytorch...
Apache TVM v0.13.0
Introduction
The TVM community has worked since the v0.12.0 release to deliver the following new exciting improvements! The main tags are below (bold text indicates areas with lots of progress):
- Community, RFC;
- Frontend: TensorFlow/TFLite, Pytorch/Torch, Paddle, keras;
- Runtime: Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, Vulkan, Hexagon, Metal, others about runtime;
- Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule;
- microTVM, AOT, TVMC, LLVM;
- CI, BugFix, Docs, Docker, Miscs;
Please visit the full listing of commits for a complete view: v0.12.0...v0.13.0.
Community
- #15086 - Aleksei-grovety -> Reviewer
- #14676 - Jiajun Jiang -> Reviewer
- #14677 - Qiang Zhang -> Reviewer
- #14622 - Sunghyun Park -> Reviewer
- #14578 - Zihao Ye -> Committer
- #14853 - Anirudh Sundar Subramaniam -> Committer
- #14772 - Add new key for release signing
RFC
Frontend
- #14830 - Use f-strings for string formatting, NFC
- Keras
- #15122 - [Relay][Keras] Fix SeparableConv2D conversion in dilation_rate attribute
- #15107 - [Relay][Keras] Fix a wrong variable name in keras frontend
- #15053 - [Relay][Keras] Fix the wrong implementation logic about cropping2D
- #15082 - [Relay][Keras] Fix UpSampling2D about the wrong assertion about size
- #15060 - [Relay][keras] Fix the bug about the attribute 'output_padding' in Deconv
- #14707 - [Keras]fix a bug about alpha attribute in LeakyReLU which lead to passes conflict
- #15175 - [Relay][Keras] Fix concatenate convert function in axis parsing
- Paddle
- #14801 - [Paddle] [PaddlePaddle Hackathon 4]add attribute support for gaussian_random/softplus/Conv3d/Conv2d
- #14973 - [Paddle] [PaddlePaddle Hackathon 4] add convert support for tanhshrink/pool3d/set_value ops for paddle frontend
- #14826 - [Paddle] [PaddlePaddle Hackathon 4] add convert support for p_norm/roi_align/softmax_with_cross_entropy
- #14575 - [Paddle] [PaddlePaddle Hackathon 4]add attribute support for dropout/hard_sigmoid/pixel_shuffle
- TFLite
- TensorFlow
- #14546 - [Tensorflow] Fix conv2d_transpose for NHWC layout
- PyTorch
- ONNX
- #15017 - [ONNX] Fix bug in scatter_elements
Runtime
- #15182 - Add weak symbol to builtin fp16
- #15161 - Clean TVM stacktrace in error messages
- #15162 - Support void as dtype in FFI
- #14902 - Update Module and Registry to use String Container
- #14967 - [Runtime,RPC] Use f-strings for string formatting, NFC
- #14887 - Make systemlib unique per prefix
- #14775 - Added `__str__` for tvm._ffi.runtime_ctypes.TVMArray
- #14656 - Fix Can't "query_imports" Bug of VM Executable
Adreno
CMSIS-NN
- #15059 - Update CMSIS-NN release to v4.1.0
OpenCL & CLML
- #14972 - [OPENCL] Always use convert_T for type conversion
- #14995 - [OpenCL] Improve diagnostic message
- #14833 - [Codegen][OpenCL] fix ambiguous selection operator call
- #14792 - [OpenCL] Refactor OpenCL runtime to support SPIRV binary ingestion
- #14922 - [OpenCLML] Refactor and introduce on chip memory and memory planner
- #14949 - [CodegenC] Updated unit test for sorted CodegenC output
- #14767 - [OpenCLML] Transposed convolution support and other fixes
cuda & cutlass & tensorrt
- #14751 - [CUDA] Fixed the call of the min function in the schedule for cuda
- #14798 - [CUTLASS] Add NDEBUG option to CUTLASS compile to speed up attention kernel
- #14782 - [Bugfix][Codegen][CUDA] Wrong casting in ASM
metal
- #14962 - Fix int8 vectorized cast
- #14846 - Fix vectorized select
- #14727 - Update metal runtime to directly store kernel map
- #14671 - Fix flaky memory issue due to racing
Vulkan
Hexagon
- #14997 - Remove "c" as aot_host_target tvm/contrib/hexagon/pytest_plโฆ
- #14948 - Update instructions to compile hexagon runtime
- #14965 - Add support for v73, make v68 default
- #14720 - [TIR] Add get_vtcm_allocation_sizes with lowering
- #14567 - [TIR] Use the "target" value in T.func_attr for VTCM limit
ROCm
- #15106 - [TensorIR]AMD Matrix Core Support
- #15088 - [Target]Replace rocm arch parsing from int to string
microTVM
- #14872 - Use self.close_transport() on error
AOT
- #15033 - Avoid Var-to-Var Let binding in AOTExecutorCodegen
- #15032 - Remove duplication in tvm.testing.aot.compile_models
- #14529 - Fix warning on dropping const in TVMAotExecutor_GetInputName
micoNPU
- #15159 - [microNPU][ETHOSU] Fix compiler attributes types
- #15147 - [microNPU][ETHOSU] Add option to disable copying constants for case without cascader
- #15069 - [microNPU][ETHOSU] Fix SoftMax legalization parameters
- #15115 - [microNPU][ETHOSU] Upgrade to 23.05 version of Arm(R) Ethos(TM)-U NPU drivers
- #15114 - [microNPU] Upgrade Vela to v3.8.0
- #15104 - [microNPU][ETHOSU] Fix minimum buffer size
- #15063 - [microNPU][ETHOSU] Fix CopyComputeReordering pass arguments
- #14861 - [microNPU][ETHOSU] Add offloading to the NPU the nn.avg_pool2d operator with a stride > 3
- #14765 - [microNPU][ETHOSU] Channel pad offloaded to NPU
- #14774 - [microNPU][ETHOSU] Fix Softmax quantization parameters
- #14629 - [microNPU][ETHOSU] Softmax int8 legalization support
- #14353 - [microNPU] Add support for MEAN with uint8 ifm
- #14587 - [microNPU] Fix skip tests when Vela is not present
- #14464 - [microNPU][ETHOSU] Add restrictions to convert to NHCWB16 layout in LayoutOptimization pass
BYOC
Re...
Apache TVM v0.12.0
Introduction
The TVM community has worked since the v0.11.1 release to deliver the following new exciting improvements! The main tags are below (bold text indicates areas with lots of progress):
- Community, RFC;
- Runtime: ACL(ArmComputeLibrary), Adreno, OpenCL & CLML, ROCm, CUDA & CUTLASS & TensorRT, Ethosn, CRT, Hexagon, Metal, Web & WASM, others about runtime;
- Frontend: TensorFlow/tflite, Pytorch/Torch, Paddle, OneFlow, keras;
- TE, Relay, BYOC, TOPI, Arith, TIR, TVMScript, MetaSchedule, Schedule;
- CI, Tests, BugFix, Docs, Docker, Build;
- Android, microTVM, Target, AutoTVM, AOT, LLVM.
Please visit the full listing of commits for a complete view: v0.11.1...v0.12.0.
Thanks @ysh329 for the great effort on the release process as the release manager.
Community
- Reviewer
- Committer
- PMC
RFC
- [RFC] Introduce PresburgerSet (#99) (`e17994b`)
- [RFC] Further Unify Packed and Object in TVM Runtime (#97) (`d646a22`)
Runtime
ArmComputeLibrary
- [ACL][TESTING] Use pytest.mark.parametrize in ACL conv2d tests
- [ACL] Prevent offloading of per-channel quantized operators
- [CL] Update Compute Library from v22.11 to v23.02.1
Adreno
- [Adreno] Extend pack_filter for HWIO layout
- [Adreno] Update interface of AnnotateMemoryScope pass
- [Adreno] Optimize reduction schedule
- [BENCHMARK][ADRENO] Adreno Benchmarks with texture
- [BENCHMARKS][CLML] Adreno benchmarks with CLML BYOC path added
- [BENCHMARKS][ADRENO] Documentation for Adreno (Texture) benchmarks
- [DOCS][ADRENO] Improved Adreno documentation
OpenCL & CLML
- OpenCL
- CLML
- [CLML][RUNTIME] Enable more ops in CLML runtime
- [CLML][RELAY] Enable Pad and Conv2d layer fusion
- [CLML][CODEGEN] CLML native codegen utility
- [CLML] Version compatibility and various test cases
- [CLML] Changes corresponding to OpenCL workspace refactorization
- [RUNTIME][CLML] OpenCLML tuning and profiling enhanced
ROCm
CMSIS-NN
- [CMSIS-NN] Global function that provides range based on dtype
- [CMSIS-NN] Add int16 add and mul operator support
- [CMSIS-NN] Add a runtime error message
- [CMSIS-NN] Reduction in code size of AOT test runner binary
- [CMSIS-NN] Remove support for the old CMSIS NN project
- [CMSIS-NN] Support CMSIS NN from new GitHub location
- [CMSIS-NN] Add Cortex-M85 support
CUDA & CUTLASS & TensorRT
- [CUDA][Schedule] Better Layout Transform Schedules
- [Profiler] Allow user to flush L2 cache in `time_evaluator` function for profiling CUDA kernels (see the sketch after this list)
- [Codegen][CUDA] Add error message for missing fragment info
- [CUTLASS][Ansor] Combine CUTLASS and Ansor
- [TensorRT] Fix BiasAdd with correct axis attribute
- [TRT][BYOC] allow strided_slice ops on selected dimensions (#14142)
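For reference, CUDA kernels are usually profiled through `time_evaluator`, sketched below; the new L2-flush option is deliberately not shown, since its exact parameter name is not given in these notes.

```python
# Usage sketch of time_evaluator; the L2-cache-flush option added above is
# not shown because its parameter name is not stated in these notes.
import tvm

def bench(lib: tvm.runtime.Module, dev: tvm.runtime.Device, *args) -> float:
    # Run "main" (number * repeat) times and return the mean latency in seconds.
    timer = lib.time_evaluator("main", dev, number=10, repeat=3)
    return timer(*args).mean

# Example: bench(built_module, tvm.cuda(0), a_nd, b_nd)
```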
Ethosn
- [ETHOSN] Update driver stack version to 22.11
- [ETHOSN] Support for addition with constant input
- [ETHOSN] Apply FoldConstant before NPU partitioning
- [ETHOSN] Remove support for NPU driver 22.08
- [ETHOSN] Fix for the mock inference after NPU driver update
- [ETHOSN] Remove requantize dependency on resize
- [ETHOSN] Add support for experimental compiler option
CRT
- [CRT] USE CMake for CRT standalone libraries
- [CRT][microTVM] Enable USMP by default for AoTExecutor + CRT runtime
- [CRT]Cleanup unused macros in crt_config.h.template
Hexagon
- [Hexagon][TOPI] Use IndexMap axis separator instead of TE
- [Hexagon] Add concept of DMA groups
- [Hexagon] Improve cache management strategy for HexagonBuffer
- [Hexagon] Denote DMA cache bypass as experimental feature
- [Hexagon] Adapt some intrinsics for high vector lanes
- Hexagon compilation on MacOS system
- [Hexagon] Enable depthwise conv2d NHWC with an HWIO kernel layout
- [Hexagon][QNN] Improve performance w/o QNN canonicalization
- [Hexagon][Metaschedule] Add timeout_sec arg to get_hexagon_local_builder
- [Hexagon] Fix deprecated call for data layout size in bits
- [Hexagon] Allow scalar tensors to have null shape during allocation
- [Hexagon][runtime] Make HexagonThreadManager::CheckSemaphore thread safe
- [Hexagon] Float and quantized dense operators with schedules
- [Hexagon][CI] Updated sha for builder LLVM
- [Hexagon][CI] Update the docker image ID to reflect newer LLVM
- [Hexagon] Switch from default_rng to random in Hexagon tests
- [Hexagon] Add hexagon user DMA intrins for tensorization
- [hexagon] Hexagon inference fix
Metal
- [METAL][CODEGEN] testcase for ramp codegen
- [CODEGEN][METAL] Fix unaligned vector load
- [CODEGEN][METAL] Fix ramp codegen
MicroNPU
- [microNPU] Sum legalization support
- [microNPU] Add rescale parameters for binary elementwise
- [microNPU] Add hardware constraints for binary elementwise
- [microNPU] Add support for TFLite PAD
- [microNPU] Upgrade Vela to v3.7.0
- [microNPU] Merge LUT activation with binary elementwise operation
- [microNPU] Upgrade to 22.08 version of Arm(R) Ethos(TM)-U NPU drivers
- [microNPU] Add relu6 relu_n1_to_1 test cases for Ethos-U
- [microNPU] Add a legalization test for TFLite PAD
- [[microNPU] Disable copying weights to SRAM for FullyConnected ops in CopyConstants scheduler](https://github.com/ap...