Insights: pytorch/pytorch
Overview
1 Pull request merged by 1 person
-
Bump requests from 2.32.2 to 2.32.4 in /.github
#155491 merged
Jun 16, 2025
204 Pull requests opened by 110 people
-
feat(cmake): add NCCL version selection based on CUDA version
#156014 opened
Jun 15, 2025 -
[profiler] add more CUDA API for kernel launcher
#156016 opened
Jun 15, 2025 -
[BE] add a minimal linter to check `pyproject.toml` consistency
#156017 opened
Jun 15, 2025 -
[build] modernize build-frontend: `python setup.py develop/install` -> `[uv ]pip install[ -e] .`
#156027 opened
Jun 15, 2025 -
bmm, topk, cholesky, linalg.norm, max with out variants set causing r…
#156030 opened
Jun 15, 2025 -
[BE][Easy] set end-of-line for `.bat` file to CRLF in `.editorconfig`
#156032 opened
Jun 15, 2025 -
[dynamo] Weblink generation when unimplemented_v2() is called
#156033 opened
Jun 15, 2025 -
Fix atleast_{1,2,3}d() with no arguments description
#156042 opened
Jun 16, 2025 -
[BE][Easy][setup] wrap over long error messages and redirect them to `stderr` in `setup.py`
#156043 opened
Jun 16, 2025 -
[BE][Easy][setup] use `super().method(...)` in command subclasses in `setup.py`
#156044 opened
Jun 16, 2025 -
[build] remove upper version pin for `setuptools<80.0`
#156049 opened
Jun 16, 2025 -
Update URL for RPATH documentation
#156060 opened
Jun 16, 2025 -
Support transpose and pack for bit8
#156065 opened
Jun 16, 2025 -
Register hpu device to fake backend
#156076 opened
Jun 16, 2025 -
revamp dtype documentation for 2025
#156087 opened
Jun 16, 2025 -
Convert to markdown: jit.rst
#156094 opened
Jun 16, 2025 -
Add error to intercept crash in issue #154882 on maxpool2d with indices
#156101 opened
Jun 16, 2025 -
[opinfo] Exclude aten_name if its not actually a name
#156104 opened
Jun 16, 2025 -
[opinfo] add overloads to opinfo
#156109 opened
Jun 16, 2025 -
local load/save
#156110 opened
Jun 16, 2025 -
Add debug messages for deps issues during fx splits
#156111 opened
Jun 16, 2025 -
Bump transfomers version
#156118 opened
Jun 16, 2025 -
[ROCm][Inductor][CK] update API for gemm-multiD change
#156122 opened
Jun 16, 2025 -
Display a warning when overwriting `CMAKE_CUDA_ARCHITECTURES`
#156123 opened
Jun 16, 2025 -
[test] re-run CI with complex + Python dispatch key changes
#156131 opened
Jun 16, 2025 -
NOT-FOR-LAND: enable autochunker by default
#156132 opened
Jun 16, 2025 -
[WIP][ci][cutlass backend] Add ci for cutlass backend tests
#156136 opened
Jun 16, 2025 -
Templatize model_container
#156137 opened
Jun 16, 2025 -
Improve documentation for torch.lobpcg
#156139 opened
Jun 16, 2025 -
[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel
#156140 opened
Jun 17, 2025 -
[executorch hash update] update the pinned executorch hash
#156141 opened
Jun 17, 2025 -
[dcp_poc] Introduce a new simple rank local checkpointer
#156142 opened
Jun 17, 2025 -
[Docs] Fix indentations in cond.md
#156147 opened
Jun 17, 2025 -
[list] Raise exception in invalid list method call
#156148 opened
Jun 17, 2025 -
Convert bottleneck.rst to markdown
#156149 opened
Jun 17, 2025 -
Optimize dim description in torch.max
#156153 opened
Jun 17, 2025 -
Attempt to microoptimize TorchDynamoContext enter/exit
#156155 opened
Jun 17, 2025 -
Bump protobuf from 5.29.4 to 5.29.5 in /.ci/docker
#156157 opened
Jun 17, 2025 -
Optimize scatter/gather kernel for ARM.
#156161 opened
Jun 17, 2025 -
Deprecate CUDAAllocatorConfig, use AllocatorConfig instead
#156165 opened
Jun 17, 2025 -
draft: [cp] context_parallel + flex_attention using monkey_patch
#156170 opened
Jun 17, 2025 -
Implementation of a ScannedModule
#156172 opened
Jun 17, 2025 -
[WIP] Add a new API of allocator setting for accelerator
#156175 opened
Jun 17, 2025 -
[NVIDIA] Refactor Family Blackwell Support codegen
#156176 opened
Jun 17, 2025 -
[WIP] Remove legacy aarch64_linux builder in favor of Manylinux
#156178 opened
Jun 17, 2025 -
[TEST] Add Windows cuda 12.9.1 build
#156179 opened
Jun 17, 2025 -
[Native][CPU][TopK] Improve perf by reducing swap operations
#156183 opened
Jun 17, 2025 -
[Inductor] Subgraph as a choice symbolic expression as input
#156185 opened
Jun 17, 2025 -
[TEST] Triton 3.4.0 pin update
#156186 opened
Jun 17, 2025 -
[inductor] Quiesce Triton compile worker pool after each dynamo compile
#156187 opened
Jun 17, 2025 -
Engine reuse calling thread when only single device detected
#156188 opened
Jun 17, 2025 -
Add auto support
#156189 opened
Jun 17, 2025 -
[ROCm] [CK] Composable Kernel integration for ROCm
#156192 opened
Jun 17, 2025 -
[dynamo] allow symints in list.__setitem__
#156197 opened
Jun 17, 2025 -
[Codemod][Folly target clean up] 57
#156198 opened
Jun 17, 2025 -
Fix torch.clamp CPU overflow with float16 tensors
#156199 opened
Jun 17, 2025 -
[CUTLASS] [CUDA] SM100 GroupMM
#156203 opened
Jun 17, 2025 -
[DCP] OSS Zero Overhead Checkpointing Implementation
#156207 opened
Jun 17, 2025 -
[CI] Add prebuild command option, set prebuild command option for CI to build flash attention
#156236 opened
Jun 17, 2025 -
Fix constant folding pass for mutable buffer
#156239 opened
Jun 17, 2025 -
Fix `aten::index_put` args Dtensor type mismatch
#156240 opened
Jun 17, 2025 -
[list] Implement `list.remove`
#156242 opened
Jun 17, 2025 -
Extract CPU log_softmax kernels to header
#156243 opened
Jun 17, 2025 -
[EZ/Profiler] Change 'b' to 'B' in FunctionEvent Frontend
#156250 opened
Jun 17, 2025 -
[dynamo] fix some cross-graph-break refleaks in eval_frame
#156252 opened
Jun 17, 2025 -
[Test] Kineto Submodule Update
#156253 opened
Jun 17, 2025 -
Add size_hints_or_throw
#156255 opened
Jun 17, 2025 -
Consolidate stack trace in Tracer
#156257 opened
Jun 18, 2025 -
[invoke_subgraph] make same subgraph share get_attr target
#156260 opened
Jun 18, 2025 -
Convert quantization.rst to markdown
#156266 opened
Jun 18, 2025 -
Add fallback-aware device checking for MPS operations
#156267 opened
Jun 18, 2025 -
[Inductor][CPP backend] Optimize parallel depth algorithm [Don't merge]
#156268 opened
Jun 18, 2025 -
Implement list.__add__ and list.__iadd__
#156270 opened
Jun 18, 2025 -
[list] Add list.__mul__ and list.__imul__
#156271 opened
Jun 18, 2025 -
[Intel GPU] Enable training for SDPA XPU [WIP]
#156272 opened
Jun 18, 2025 -
[inductor] split out triton templates
#156276 opened
Jun 18, 2025 -
[inductor][tma template] subclass workspace arg for choice
#156277 opened
Jun 18, 2025 -
[inductor] add KernelTemplateParams
#156278 opened
Jun 18, 2025 -
[inductor] introduce kernel_inputs
#156279 opened
Jun 18, 2025 -
[inductor][1/2] break out TritonTemplate, TritonTemplateKernel, TritonTemplateCaller out of select_algorithm.py
#156280 opened
Jun 18, 2025 -
[inductor][2/2] break out TritonTemplate, TritonTemplateKernel, TritonTemplateCaller out of select_algorithm.py
#156281 opened
Jun 18, 2025 -
[inductor] heuristics based on kernel templates
#156282 opened
Jun 18, 2025 -
Introduce sync_cross_rank_decision
#156287 opened
Jun 18, 2025 -
[inductor] KernelTemplates report their own KernelParams
#156292 opened
Jun 18, 2025 -
Add cascade sum support for Inductor CPP backend
#156296 opened
Jun 18, 2025 -
FlexAttn config refactor + ROCm optimisations
#156307 opened
Jun 18, 2025 -
[BE][1/16] fix typos in torch/
#156311 opened
Jun 18, 2025 -
[BE][2/16] fix typos in torch/ (torch/_*/)
#156312 opened
Jun 18, 2025 -
[BE][3/16] fix typos in torch/ (torch/_inductor/)
#156313 opened
Jun 18, 2025 -
[BE][4/16] fix typos in torch/ (torch/_dynamo/)
#156314 opened
Jun 18, 2025 -
[BE][5/16] fix typos in torch/ (torch/distributed/)
#156315 opened
Jun 18, 2025 -
[BE][6/16] fix typos in torch/
#156316 opened
Jun 18, 2025 -
[BE][7/16] fix typos in torch/ (torch/csrc/)
#156317 opened
Jun 18, 2025 -
[BE][8/16] fix typos in torch/ (torch/csrc/jit/)
#156318 opened
Jun 18, 2025 -
[BE][9/16] fix typos in torch/ (torch/csrc/)
#156319 opened
Jun 18, 2025 -
[BE][10/16] fix typos in torch/ (torch/csrc/jit/)
#156320 opened
Jun 18, 2025 -
[BE][11/16] fix typos in torch/ (torch/csrc/distributed/)
#156321 opened
Jun 18, 2025 -
Address richard's comments on libtorch_stable_abi note
#156324 opened
Jun 18, 2025 -
[TEST] DO not commit
#156326 opened
Jun 18, 2025 -
Migrate c10/macros/cmake_macros.h.in
#156329 opened
Jun 18, 2025 -
Fix native static dispatch kernels
#156331 opened
Jun 18, 2025 -
Validate custom op support for compile_kernel
#156332 opened
Jun 18, 2025 -
[testing] test/run_test.py: Only shutdown pool if it was created
#156333 opened
Jun 18, 2025 -
Storage: add_delete_hook for deregistration
#156338 opened
Jun 18, 2025 -
[list] Add list.__delitem__
#156339 opened
Jun 18, 2025 -
Add private API to modify the tags for a custom operator
#156343 opened
Jun 18, 2025 -
[inductor] set config.min_num_split by default
#156345 opened
Jun 18, 2025 -
[invoke_subgraph] make collect_meta_analysis fake prop cachable
#156347 opened
Jun 18, 2025 -
Add User defined subclass handling to funcitonalize impl
#156349 opened
Jun 18, 2025 -
Build FBGEMM GenAI as part of PyTorch
#156355 opened
Jun 18, 2025 -
[wip]
#156356 opened
Jun 18, 2025 -
[BE] comments + try to get rid of secondary `make_autotune_fn`
#156358 opened
Jun 18, 2025 -
[Codemod][Folly target clean up] 28
#156365 opened
Jun 18, 2025 -
[Codemod][Folly target clean up] 22
#156366 opened
Jun 18, 2025 -
[iter] Update some of the tests to not call pickle
#156369 opened
Jun 18, 2025 -
[iter] exhaust `ListIterator` when `unpack_var_sequence` is called
#156370 opened
Jun 18, 2025 -
[iter] Add support for sequence protocol in `iter(..)`
#156371 opened
Jun 18, 2025 -
Add macos26 beta test runner
#156372 opened
Jun 18, 2025 -
[TSAN][live speech translation] Fix A data race in caffe2
#156378 opened
Jun 18, 2025 -
cub and compile_kernel composition
#156380 opened
Jun 19, 2025 -
Prevent cudaStreamSync when indexing GPU tensors with boolean CPU mask
#156384 opened
Jun 19, 2025 -
[InductorBench] Fix accuracy validation logic for MPS
#156385 opened
Jun 19, 2025 -
[Inductor][CPP] Enable WOQ int4 concat linear
#156387 opened
Jun 19, 2025 -
Improve All to All Perf for inter-node use-case (#156376)
#156389 opened
Jun 19, 2025 -
Bump urllib3 from 2.2.2 to 2.5.0 in /tools/build/bazel
#156390 opened
Jun 19, 2025 -
Use CUDA::cusolver targets
#156391 opened
Jun 19, 2025 -
Use CMake wholearchive group
#156393 opened
Jun 19, 2025 -
[Inductor][CPP] Enable a config to use a small dequant buffer for woq int4
#156395 opened
Jun 19, 2025 -
Use CUDA::cupti target
#156396 opened
Jun 19, 2025 -
Added index 0 for ROCR_VISIBLE_DEVICES
#156398 opened
Jun 19, 2025 -
[OpenReg][1/N] Migrate cpp_extensions_open_device_registration to OpenReg
#156400 opened
Jun 19, 2025 -
[OpenReg][2/N] Migrate cpp_extensions_open_device_registration to OpenReg
#156401 opened
Jun 19, 2025 -
[ez] fix typo in comment
#156402 opened
Jun 19, 2025 -
[Codemod][Folly target clean up] 28 [A]
#156403 opened
Jun 19, 2025 -
[DONT MERGE][TESTING][2/2] test new xpu runner
#156410 opened
Jun 19, 2025 -
Fix storage_offset preservation in clone_preserve_strides
#156415 opened
Jun 19, 2025 -
[iter] support `iter(callable, sentinel)`
#156416 opened
Jun 19, 2025 -
Change t.is_cuda to t.device.type == 'cuda' in torch/utils/viz
#156418 opened
Jun 19, 2025 -
[cc][multi-kernel] attempt 1
#156421 opened
Jun 19, 2025 -
[dm][multi-kernel] attempt 1
#156422 opened
Jun 19, 2025 -
[dm][mk] attempt 2
#156423 opened
Jun 19, 2025 -
[cc][multi-kernel] attempt 2
#156427 opened
Jun 19, 2025 -
[br][mk] attempt 1
#156428 opened
Jun 19, 2025 -
[precompile] Detect source code changes for save/load.
#156432 opened
Jun 19, 2025 -
[dynamo] show frame information when recompilation is triggered on fail_on_recompile
#156433 opened
Jun 19, 2025 -
use cmake target torch instead of ${TORCH_LIBRARIES} in cpp installation docs
#156435 opened
Jun 19, 2025 -
[cc][multi-kernel] attempt 3
#156439 opened
Jun 19, 2025 -
[invoke_subgraph] Add config flag to control support of input mutation
#156450 opened
Jun 19, 2025 -
[cc][multi-kernel] attempt 4
#156452 opened
Jun 19, 2025 -
[WIP]Fallback to CPU for XPU FP64
#156456 opened
Jun 19, 2025 -
Fixes issue #156414: Fixes bug in implementation of _combine_histograms.
#156457 opened
Jun 20, 2025 -
wip Updates to scaled_mm code
#156458 opened
Jun 20, 2025 -
[iter] Wrap iter(..) call in a ObjectIteratorVariable
#156460 opened
Jun 20, 2025 -
[dynamo] fixes to lru_cache message and adding user stack trace in debug mode
#156463 opened
Jun 20, 2025 -
[inductor] select_algorithm: add preprocessing fns
#156464 opened
Jun 20, 2025 -
[torchbench] update environment setup script
#156465 opened
Jun 20, 2025 -
WIP: Add `max_pool3d` for MPS
#156467 opened
Jun 20, 2025 -
Debug PR, no need to review
#156468 opened
Jun 20, 2025 -
Docs/update contributing rebase tip
#156469 opened
Jun 20, 2025 -
kernel arg munging attempt
#156470 opened
Jun 20, 2025 -
[wip][inductor] add kernel choice
#156477 opened
Jun 20, 2025 -
[Codemod][Folly target clean up] 22 [B]
#156478 opened
Jun 20, 2025 -
[ROCm][Windows] Fixing undefined symbol linker error after exposing MIOpen symbols
#156479 opened
Jun 20, 2025 -
[WIP] Add device_id to XPU device properties
#156481 opened
Jun 20, 2025 -
Fix torch.onnx.export parameter for onnx_shape_inference (#156480)
#156483 opened
Jun 20, 2025 -
[Profiler] Fix profile_all_threads in debug build
#156484 opened
Jun 20, 2025 -
Add regression test for UnicodeDecodeError in torch.compile with extreme values
#156485 opened
Jun 20, 2025 -
[ROCm][Windows] Skip using rocm-core on Windows case
#156486 opened
Jun 20, 2025 -
[DO NOT MERGE] Update trunk.yml to change the runner that the job runs-on
#156491 opened
Jun 20, 2025 -
[INIT DRAFT] setting up the build for torch/standalone
#156492 opened
Jun 20, 2025 -
Fix type annotations for dim parameter in torch.amin and torch.amax
#156493 opened
Jun 20, 2025 -
[MPS] Optimize cumsum/cumprod metal kernels
#156494 opened
Jun 20, 2025 -
cublaslt/hipblaslt persistent workspace
#156495 opened
Jun 20, 2025 -
add test_batchnorn_2D and 3D tests
#156498 opened
Jun 20, 2025 -
[ROCm] Bump AOTriton to 0.10b
#156499 opened
Jun 20, 2025 -
[Inductor] Fix epilogue fusion decision with 1 Triton caller as choice
#156500 opened
Jun 20, 2025 -
[MTIA Aten Backend] Migrate maximum.out / minimum.out / cos.out / erf.out / exp.out
#156502 opened
Jun 20, 2025 -
Organize BUCK for torch/standalone
#156503 opened
Jun 20, 2025 -
added stubs for jit tree views
#156504 opened
Jun 20, 2025 -
Fix dynamo benchmarks no dtype.__name__
#156505 opened
Jun 20, 2025 -
[nativert] Move PrimKernelRegistry to PyTorch core
#156506 opened
Jun 20, 2025 -
[nativert] Move HigherOrderKernel
#156507 opened
Jun 20, 2025 -
[nativert] move layout planner algorithms to libtorch
#156508 opened
Jun 20, 2025 -
[docs][typing] Document and type support for dim=None in torch.amin and torch.amax
#156510 opened
Jun 20, 2025 -
python definitely_contiguous-> is_contiguous_or_false
#156515 opened
Jun 20, 2025 -
Unify dynamic shapes APIs naming 2 (expect_true and check) attempt2
#156518 opened
Jun 20, 2025 -
[aoti] Check longlong upperbound for codegening input size check
#156522 opened
Jun 20, 2025 -
remove gso from set_storage_meta__symint
#156525 opened
Jun 20, 2025 -
[Inductor][CPP] Fix perf regression of functorch_maml_omniglot
#156526 opened
Jun 21, 2025 -
[dynamo] fix segfault due to dangling CacheEntry backend pointer
#156527 opened
Jun 21, 2025 -
[dynamo] Guard eagerly on list objects to avoid guard on getitem index
#156531 opened
Jun 21, 2025 -
Add RoPE (Rotary Positional Embedding) to PyTorch core
#156532 opened
Jun 21, 2025 -
[inductor] Quiesce Triton compile worker pool by default in OSS
#156534 opened
Jun 21, 2025 -
remove allow-untyped-defs from c10d_rendezvous_backend.py
#156536 opened
Jun 21, 2025 -
remove allow-untyped-defs from torch/ao/nn/sparse/quantized/linear.py
#156537 opened
Jun 21, 2025 -
remove allow-untyped-defs from torch/fx/passes/utils/fuser_utils.py
#156538 opened
Jun 21, 2025 -
[MTIA Aten Backend] Migrate _log_softmax.out / _log_softmax_backward_data.out
#156539 opened
Jun 21, 2025 -
avoid to declare an unknown bound array without any element
#156543 opened
Jun 21, 2025 -
Enable target-determination (TD) for ROCm CI
#156545 opened
Jun 21, 2025 -
[ddp] improve c++ reducer bucketing readability
#156550 opened
Jun 21, 2025 -
[CUDAGraph] add config `cudagraph_capture_sizes`
#156551 opened
Jun 21, 2025 -
Add fx_graph_runnable tests boilerplate
#156552 opened
Jun 21, 2025 -
[MTIA Aten Backend] Migrate isnan
#156554 opened
Jun 22, 2025
145 Issues closed by 41 people
-
Is it possible to serialize a torch.cuda.CUDAGraph into disk or CPU memory
#125820 closed
Jun 22, 2025 -
`scaled_dot_product_attention` backwards: illegal memory access with large inputs
#150054 closed
Jun 21, 2025 -
Using `opset_version = 22` in `torch.onnx.export` with `dynamo=True` includes dropout nodes in the model
#156542 closed
Jun 21, 2025 -
`torch.distributed.pipelining.pipeline` error when initializing on meta device
#156541 closed
Jun 21, 2025 -
Add runtime profiler info for AOTDispatcher prologue
#155721 closed
Jun 21, 2025 -
UNSTABLE inductor-rocm-mi300 / rocm-py3.10-inductor-mi300 / test (inductor)
#154884 closed
Jun 21, 2025 -
UNSTABLE rocm-mi300 / linux-jammy-rocm-py3.10-mi300 / test (default)
#156360 closed
Jun 21, 2025 -
Support input mutations + aliasing with scan during training
#156337 closed
Jun 20, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#151228 closed
Jun 20, 2025 -
Add @markDynamoStrictTest to all TestCase
#115671 closed
Jun 20, 2025 -
Loss with LBFGS not going down
#156501 closed
Jun 20, 2025 -
Can we have Dim.AUTO/Dim.DYNAMIC with an optional min & max?
#147483 closed
Jun 20, 2025 -
DTensor does not compose with Parameters Groups
#156453 closed
Jun 20, 2025 -
[ONNX] Support for grouped query attention
#151762 closed
Jun 20, 2025 -
[XPU] Support toggling profiler on/off for XPU.
#154898 closed
Jun 20, 2025 -
Sourceforge outage causing multiple CI failures
#108773 closed
Jun 20, 2025 -
pytorchbot erroneously thinks PR has already been merged as a different commit
#154427 closed
Jun 20, 2025 -
[ONNX] Inputs generated by onnx.export() with dynamo=False are not consistent with dynamo=True
#136179 closed
Jun 20, 2025 -
[Torch TO ONNX BUG] The right shift operation in torch is mapped as a division operation when converted to ONNX.
#139455 closed
Jun 20, 2025 -
[ONNX] 2.0 regression: dynamic shapes lost for an operator
#139463 closed
Jun 20, 2025 -
[ONNX] Document the registration API
#139499 closed
Jun 20, 2025 -
[ONNX] Run report_exportability when report=True
#139904 closed
Jun 20, 2025 -
Replace reduce(operator.mul) with math.prod for computing product of dimensions
#140888 closed
Jun 20, 2025 -
Exporting the operator 'aten::_transformer_encoder_layer_fwd' to ONNX opset version 17 is not supported
#144242 closed
Jun 20, 2025 -
Custom symbolic functions for ONNX export with None args causes SEGFAULT
#145261 closed
Jun 20, 2025 -
ONNX export failing when using `symbolic` functions and scripting
#146035 closed
Jun 20, 2025 -
Export HuggingFace mamba to ONNX
#146835 closed
Jun 20, 2025 -
[ONNX] BitwiseOr was generated for bool inputs (invalid)
#147854 closed
Jun 20, 2025 -
[ONNX] dynamic dims are not exported with the specified names
#148629 closed
Jun 20, 2025 -
[ONNX] How to export Llama4
#150891 closed
Jun 20, 2025 -
Exporting the operator 'aten::lift_fresh' to ONNX - not supported
#151932 closed
Jun 20, 2025 -
Exporting the operator 'aten::fft_fft2' to ONNX opset version 19 is not supported.
#153823 closed
Jun 20, 2025 -
[ONNX] Verify the translation of SDPA to Attention-23
#156105 closed
Jun 20, 2025 -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float64 (__main__.TestForeachCUDA)
#151214 closed
Jun 20, 2025 -
DISABLED test_triton_template_generated_code_caching (__main__.TestMaxAutotune)
#154108 closed
Jun 20, 2025 -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float32 (__main__.TestForeachCUDA)
#151136 closed
Jun 20, 2025 -
Inductor cpp_wrapper has performance regressions
#156037 closed
Jun 20, 2025 -
DISABLED test_export_opnames_interface (__main__.TestMisc)
#154986 closed
Jun 20, 2025 -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float16 (__main__.TestForeachCUDA)
#151114 closed
Jun 19, 2025 -
_flash_attention_forward accuracy drop from CUDA to ROCM implementation.
#154582 closed
Jun 19, 2025 -
xpu: AOT compilation does not happen with sycl extension (JIT fallback happens)
#156249 closed
Jun 19, 2025 -
Cannot install pytorch through official pip guidance
#156413 closed
Jun 19, 2025 -
Tensors with no explicit references are possible not freed timely with torch.compile
#155778 closed
Jun 19, 2025 -
Support C shim for customized OP
#150988 closed
Jun 19, 2025 -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_complex64 (__main__.TestForeachCUDA)
#151099 closed
Jun 19, 2025 -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_complex128 (__main__.TestForeachCUDA)
#151093 closed
Jun 19, 2025 -
FSDP + save optimizer dtype AssertionError
#156166 closed
Jun 19, 2025 -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#151054 closed
Jun 19, 2025 -
`max_entries` parameter of `torch.cuda.memory._record_memory_history()`
#129674 closed
Jun 19, 2025 -
Indexing beyond end of array on ROCm build
#155045 closed
Jun 18, 2025 -
[ued][kokoro] torch.compile fails in kokoro (both fullgraph=True and False)
#149570 closed
Jun 18, 2025 -
Actual torch `ExportGraphSignature` does not match the example in the docs
#156184 closed
Jun 18, 2025 -
Certain operations cause implicity sync-points
#12461 closed
Jun 18, 2025 -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_float64 (__main__.TestForeachCUDA)
#151019 closed
Jun 18, 2025 -
DISABLED test_comprehensive_pca_lowrank_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139828 closed
Jun 18, 2025 -
DISABLED test_roi_align_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#103156 closed
Jun 18, 2025 -
NCCL init hits CUDA failure 'invalid argument' on 12.2 driver
#150852 closed
Jun 18, 2025 -
Schema version check fails in `torch.export.load`
#156354 closed
Jun 18, 2025 -
Windows Runners are not available on PyTorch CI/CD
#156352 closed
Jun 18, 2025 -
format_flamegraph failed to setup the script
#156309 closed
Jun 18, 2025 -
Cannot install >=2.7.0 on ubuntu 18.04, conflict with prerequisite
#156215 closed
Jun 18, 2025 -
[tracker] DTensor Operator Coverage
#156204 closed
Jun 18, 2025 -
Flip is much slower than advanced indexing
#16424 closed
Jun 18, 2025 -
Please implement the batching rule for torch.matrix_exp.
#115992 closed
Jun 18, 2025 -
Function 'MmBackward0' returned nan values in its 0th output.
#156015 closed
Jun 18, 2025 -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_float32 (__main__.TestForeachCUDA)
#151003 closed
Jun 18, 2025 -
Status of support for ROCm 6.4.1
#155292 closed
Jun 18, 2025 -
DISABLED test_matmul_layer_norm_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#151835 closed
Jun 18, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_float32 (__main__.TestForeachCUDA)
#150530 closed
Jun 18, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_float16 (__main__.TestForeachCUDA)
#150510 closed
Jun 18, 2025 -
[FR] Expose CUDAGraph handle to allow customized modification on the graph
#155106 closed
Jun 18, 2025 -
Have compiled autograd config API support nested compilation
#152219 closed
Jun 18, 2025 -
Convert to markdown: rpc.rst, signal.rst, size.rst, sparse.rst, special.rst
#155033 closed
Jun 18, 2025 -
DISABLED test_serialize_by_key (__main__.PrecompileContextTests)
#156146 closed
Jun 17, 2025 -
DISABLED test_randint_distribution_dynamic_shapes_xpu (__main__.DynamicShapesGPUTests)
#155692 closed
Jun 17, 2025 -
DISABLED test_randint_distribution_dynamic_shapes_xpu (__main__.DynamicShapesCodegenGPUTests)
#155689 closed
Jun 17, 2025 -
DISABLED test_basic (__main__.PrecompileContextTests)
#156063 closed
Jun 17, 2025 -
`torch.ops.aten.index_put` returns different results on CUDA and CPU
#156173 closed
Jun 17, 2025 -
DISABLED test_grad_with_manual_interleaved_ScheduleClass0_use_new_runtime_True (__main__.ScheduleTest)
#154373 closed
Jun 17, 2025 -
DISABLED test_grad_with_manual_interleaved_ScheduleClass1_use_new_runtime_False (__main__.ScheduleTest)
#154391 closed
Jun 17, 2025 -
DISABLED test_grad_with_manual_interleaved_ScheduleClass1_use_new_runtime_True (__main__.ScheduleTest)
#154408 closed
Jun 17, 2025 -
DISABLED test_grad_with_manual_interleaved_ScheduleClass2_use_new_runtime_False (__main__.ScheduleTest)
#154443 closed
Jun 17, 2025 -
torch.where() can produce nan values for unselected branch during backward
#156212 closed
Jun 17, 2025 -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_bool (__main__.TestForeachCUDA)
#150468 closed
Jun 17, 2025 -
Failure of iOS Build Test: Build (default, 1, 1, macos-14-xlarge, SIMULATOR, arm64)
#136284 closed
Jun 17, 2025 -
[Testing] multigpu tests are still running against CUDA-11
#154119 closed
Jun 17, 2025 -
ONNX Dynamo Export - Unsupported FX nodes: {'call_function': ['aten._upsample_bilinear2d_aa.default']}.
#128818 closed
Jun 17, 2025 -
torch.compile fails to trace methods decorated with @lru_cache
#155841 closed
Jun 17, 2025 -
[FDSP2] express zero-1 with fully_shard
#155952 closed
Jun 17, 2025 -
MPS cumsum failure for 5D tensor or above
#154881 closed
Jun 17, 2025 -
get different result between conv1x1 and linear
#156154 closed
Jun 17, 2025 -
[dynamo] Add support for torch.cuda.FloatTensor()
#130722 closed
Jun 17, 2025 -
Convert to markdown: linalg.rst, logging.rst, masked.rst, meta.rst, miscellaneous_environment_variables.rst
#155025 closed
Jun 17, 2025 -
A mistake in PyTorch Docs for nn.RNN
#129446 closed
Jun 17, 2025 -
When calling torch.histc the CPU and CUDA implementations produce different outputs.
#156019 closed
Jun 17, 2025 -
When calling torch.cumprod on a float16 tensor, the CPU and CUDA implementations produce different outputs.
#156018 closed
Jun 17, 2025 -
Extra onnx::Neg_2 input after torch.onnx.export
#148655 closed
Jun 17, 2025 -
RuntimeError: CUDA driver error: operation not supported with test_stream_write_value32 and cuStreamWriteValue32
#154073 closed
Jun 17, 2025 -
DISABLED test_reentrant_parent_error_on_cpu_cuda (__main__.TestAutogradDeviceTypeCUDA)
#86735 closed
Jun 17, 2025 -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int8 (__main__.TestForeachCUDA)
#150407 closed
Jun 17, 2025 -
[XPU] Upgrade the XPU support packages version to 2025.1 in CI/CD
#151097 closed
Jun 17, 2025 -
libtorch doesn't work with cuda 12.6 and 12.4
#132575 closed
Jun 17, 2025 -
DISABLED test_weight_norm_bwd_dynamic_shapes_cpu (__main__.DynamicShapesCodegenCpuTests)
#153803 closed
Jun 17, 2025 -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64 (__main__.TestForeachCUDA)
#150392 closed
Jun 17, 2025 -
DISABLED test_pattern_matcher_multi_user_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#134433 closed
Jun 17, 2025 -
Update ONNX Opset Version to Support Attention Operator
#153611 closed
Jun 17, 2025 -
DISABLED test_weight_norm_bwd_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#141484 closed
Jun 17, 2025 -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bool (__main__.TestForeachCUDA)
#150120 closed
Jun 17, 2025 -
[ONNX] Implement scan
#151327 closed
Jun 17, 2025 -
`TestCppExtensionOpenRgistration.test_base_device_registration` hangs during shutdown on MacOS
#155759 closed
Jun 16, 2025 -
ROCm: no HIP device available if device is already initialized
#152941 closed
Jun 16, 2025 -
Inductor CI failure due to Huggingface outage
#156113 closed
Jun 16, 2025 -
[Compiled_autograd] running nn.LayerNorm failed for torch.compile with compiled_autograd when deepspeed Zero3
#140091 closed
Jun 16, 2025 -
add x/0 gradient behaviour to documentation
#128796 closed
Jun 16, 2025 -
Stop special-casing einops in Dynamo
#142486 closed
Jun 16, 2025 -
None deterministic output of linear projection based on batch size and projection dimensions
#156084 closed
Jun 16, 2025 -
DISABLED test_tmp_not_defined_issue2_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#135219 closed
Jun 16, 2025 -
DISABLED test_grad_with_manual_interleaved_ScheduleClass2_use_new_runtime_True (__main__.ScheduleTest)
#154481 closed
Jun 16, 2025 -
DISABLED test_on_device_tma_store_old_api (__main__.MutationTests)
#155691 closed
Jun 16, 2025 -
torch.cuda.set_device(0) behaves differently from torch.cuda.set_device(1) in terms of cuda context
#155668 closed
Jun 16, 2025 -
DISABLED test_cache_hot_load_device_cuda_bfloat16_dynamic_False (__main__.AOTAutogradCacheTests)
#145334 closed
Jun 16, 2025 -
Inconsistent Error Handling in `torch.fused_moving_avg_obs_fake_quant` Between CPU and GPU Implementations
#153310 closed
Jun 16, 2025 -
DISABLED test_fake_registration (__main__.TestOpProfiles)
#151301 closed
Jun 16, 2025 -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32 (__main__.TestForeachCUDA)
#150350 closed
Jun 16, 2025 -
Indicate the supported Python version for older PyTorch versions
#156038 closed
Jun 16, 2025 -
In the docs for torch.amax/amin the note about min/max gradient behavior is outdated
#155048 closed
Jun 15, 2025 -
[feature request]: Update max onnx opset to 21 for compatability
#127167 closed
Jun 15, 2025
91 Issues opened by 59 people
-
ConvertTritonGPUToLLVM pass fails on fused GroupNorm backward (SM 89) under torch.compile(…, backend='inductor')
#156549 opened
Jun 21, 2025 -
Inconsistent Model Results and Failures on Windows with CUDA vs. CPU PyTorch Builds
#156547 opened
Jun 21, 2025 -
`TorchScript` does not allow accessing methods of nested tensors
#156544 opened
Jun 21, 2025 -
FSDP2 - Tensor incompatibility
#156535 opened
Jun 21, 2025 -
`<<` and `>>` operators seem silently broken for DTensor operand 1 and scalar operand 2
#156533 opened
Jun 21, 2025 -
[ONNX] Update tests for attention
#156524 opened
Jun 20, 2025 -
[Dtensor] handle dtensor ops that only need to operate on certain shard without all_gather first.
#156523 opened
Jun 20, 2025 -
UNSTABLE inductor / linux-jammy-cpu-py3.9-gcc11-inductor / test (inductor_torchbench_cpu_smoketest_perf)
#156521 opened
Jun 20, 2025 -
[user empathy] compile for `transformers` model
#156520 opened
Jun 20, 2025 -
Convenient way to create device with torch.accelerator and a specific device index
#156519 opened
Jun 20, 2025 -
DISABLED test_comprehensive_nn_functional_linear_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#156514 opened
Jun 20, 2025 -
Native BFloat16 Mixed BatchNorm Train gives incorrect gradients
#156513 opened
Jun 20, 2025 -
functorch_maml_omniglot is a bad CPU performance smoketest model
#156511 opened
Jun 20, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int32 (__main__.TestForeachCUDA)
#156497 opened
Jun 20, 2025 -
mypy.ini deprecation: numpy.typing.mypy_plugin
#156489 opened
Jun 20, 2025 -
Add stub for mypy-torch._C._jit_tree_views
#156488 opened
Jun 20, 2025 -
SDPA FLASH_ATTENTION backend gets NaN values for IPEX on Intel CPU
#156487 opened
Jun 20, 2025 -
Dynamo benchmark test got failed torch.dtype object has no attribute '__name__'
#156482 opened
Jun 20, 2025 -
`torch.compile` fails with `UnicodeDecodeError` when model contains extreme value injection
#156451 opened
Jun 19, 2025 -
Tensor.is_pinned() raises error after renaming privateuseone backend.
#156444 opened
Jun 19, 2025 -
DISABLED test_inlined_optimized_graph (__main__.TestTEFuserDynamic)
#156438 opened
Jun 19, 2025 -
Accuracy minifier fails to minify anything
#156437 opened
Jun 19, 2025 -
DISABLED test_skip_grad_in_check (__main__.TestTEFuserDynamic)
#156436 opened
Jun 19, 2025 -
Suggest to use the torch cmake target instead of ${TORCH_LIBRARIES} in the c++ docs
#156434 opened
Jun 19, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_int16 (__main__.TestForeachCUDA)
#156430 opened
Jun 19, 2025 -
[MPSInductor] Silently incorrect result with varmean+epilogue
#156426 opened
Jun 19, 2025 -
[CI] "Update viable/strict" job occasionally hangs for days
#156425 opened
Jun 19, 2025 -
research adding cuda-bindings to core
#156424 opened
Jun 19, 2025 -
DISABLED test_fake_crossref_backward_no_amp_cholesky_solve_cuda_float32 (__main__.TestFakeTensorCUDA)
#156419 opened
Jun 19, 2025 -
ShardedTensor breaks cycle detection
#156417 opened
Jun 19, 2025 -
A possible bug in HistogramObserver._combine_histograms()
#156414 opened
Jun 19, 2025 -
bmm max-autotune segfaults on x86 cpu
#156412 opened
Jun 19, 2025 -
[compile] torch._dynamo.exc.TorchRuntimeError: Failed running call_function aten.lift_fresh_copy.default
#156411 opened
Jun 19, 2025 -
DataParallel gather NaN across multiple gpus
#156392 opened
Jun 19, 2025 -
[compile][transformers] Recompilation with mark_static_address with cudagraphs
#156377 opened
Jun 18, 2025 -
DISABLED test_inplace_on_view_undefined_grad_output_cpu (__main__.TestAutogradDeviceTypeCPU)
#156363 opened
Jun 18, 2025 -
DISABLED test_schedule_with_native_zero_bubble_ScheduleClass1 (__main__.ScheduleTest)
#156328 opened
Jun 18, 2025 -
UNSTABLE periodic / linux-jammy-rocm-py3.10 / test (distributed)
#156327 opened
Jun 18, 2025 -
Provide a way to allow dynamo to trace into an operator defined with `torch.library.custom_op`
#156322 opened
Jun 18, 2025 -
Composition of nested `torch.compile` calls is not well defined
#156308 opened
Jun 18, 2025 -
DISABLED test_inplace_on_view_then_no_grad_cpu (__main__.TestAutogradDeviceTypeCPU)
#156306 opened
Jun 18, 2025 -
Inductor error with Torch XPU optimizations to StableDiffusion3 Pipeline
#156303 opened
Jun 18, 2025 -
DISABLED test_inplace_on_view_of_view_cpu (__main__.TestAutogradDeviceTypeCPU)
#156289 opened
Jun 18, 2025 -
DISABLED test_inplace_on_view_non_contig_cpu (__main__.TestAutogradDeviceTypeCPU)
#156265 opened
Jun 18, 2025 -
DISABLED test_shape_env (__main__.TestGuardSerialization)
#156264 opened
Jun 18, 2025 -
torch._foreach_copy_ causing CUDA illegal memory access.
#156261 opened
Jun 18, 2025 -
[ONNX] Create a tutorial for exporting hf transformers model
#156258 opened
Jun 18, 2025 -
[dynamic shapes] translation validation failure under `fake_tensor_propagate_real_tensors`
#156251 opened
Jun 17, 2025 -
DISABLED test_name_match (__main__.TestGuardSerialization)
#156246 opened
Jun 17, 2025 -
Upgrade torch._scaled_grouped_mm to SM100+
#156238 opened
Jun 17, 2025 -
[Tracker] AutoParallel's feature request to DTensor
#156217 opened
Jun 17, 2025 -
DISABLED test_inplace_on_view_makes_base_require_grad_cpu (__main__.TestAutogradDeviceTypeCPU)
#156209 opened
Jun 17, 2025 -
When compiling submodules, AOTInductor is significantly slower with torch.export
#156206 opened
Jun 17, 2025 -
Upgrade torch._grouped_mm to SM100+
#156202 opened
Jun 17, 2025 -
Dynamo does not know how to trace method `__len__` of class `<unknown type>` with torch.logging calls
#156191 opened
Jun 17, 2025 -
[CD] Windows Wheel builds CUDA 12.9.1 Stack Overflow during build
#156181 opened
Jun 17, 2025 -
DISABLED test_inplace_on_view_backprop_view_of_view_cpu (__main__.TestAutogradDeviceTypeCPU)
#156180 opened
Jun 17, 2025 -
nn.Module._load_from_state_dict is always called with strict=True
#156177 opened
Jun 17, 2025 -
[Feature Request]: Native C++ API for ONNX Export in LibTorch
#156168 opened
Jun 17, 2025 -
DISABLED test_inplace_on_view_backprop_view_cpu (__main__.TestAutogradDeviceTypeCPU)
#156163 opened
Jun 17, 2025 -
torch2.7.1 issue for torch.compile numpy
#156162 opened
Jun 17, 2025 -
[SDPA] RTX5080 is different from CPU calculation result in backward with long seq
#156160 opened
Jun 17, 2025 -
Error shm.dll
#156159 opened
Jun 17, 2025 -
Inconsistent `torch.rsqrt` results on complex128 between CPU and CUDA
#156152 opened
Jun 17, 2025 -
DISABLED test_inplace_on_view_backprop_base_cpu (__main__.TestAutogradDeviceTypeCPU)
#156143 opened
Jun 17, 2025 -
[dynamo, dynamic shapes] .item() on Tensor created in the compiled region fails
#156135 opened
Jun 16, 2025 -
[DDP][FSDP2] add unit test to showcase DDP mixed precision with FSDP2 mixed precision
#156130 opened
Jun 16, 2025 -
[dynamo] Show carets in graph break stack traces
#156127 opened
Jun 16, 2025 -
[compile][torchtune] Full model compiled Qwen3 is 4x slower than eager
#156103 opened
Jun 16, 2025 -
UNSTABLE rocm / linux-jammy-rocm-py3.10 / test (default)
#156098 opened
Jun 16, 2025 -
DISABLED test_quantize (__main__.TestOpenReg)
#156089 opened
Jun 16, 2025 -
DISABLED test_schedule_with_native_zero_bubble_ScheduleClass0 (__main__.ScheduleTest)
#156088 opened
Jun 16, 2025 -
Idea: Add SBOM Generation (and optional vuln scan) for better supply chain insight
#156085 opened
Jun 16, 2025 -
`torch.logsumexp`: support `dim=None`
#156075 opened
Jun 16, 2025 -
[codespell] fix typos in the codebase
#156073 opened
Jun 16, 2025 -
[typing][docs] `torch.amin` and `torch.amax` do not document `dim=None`
#156072 opened
Jun 16, 2025 -
Docs incorrectly claim `torch.max` and `torch.logsumexp` accept `dim=None`
#156071 opened
Jun 16, 2025 -
torch.nn.functional.conv_transpose3d return inconsistent results when weight containing inf between CPU and GPU
#156062 opened
Jun 16, 2025 -
Dynamo trace an incorrect result on torch._C._storage_Use_Count
#156059 opened
Jun 16, 2025 -
torch.equal causes fallback to eager mode in torch.compile
#156057 opened
Jun 16, 2025 -
`torch.distributed.tensor.parallel.style.ColwiseParallel` introduce huge guard eval latency
#156054 opened
Jun 16, 2025 -
Ability to set device guard in Python
#156052 opened
Jun 16, 2025 -
[RFC] Migrate to modern Python build system and replace `setup.py` commands with their modern alternatives
#156029 opened
Jun 15, 2025 -
[Upstream Triton] persistent mm + tma accuracy failures
#156028 opened
Jun 15, 2025 -
Error installing torch for CUDA 12.1 on GTX 1660 Ti
#156024 opened
Jun 15, 2025 -
torch.fft.ifft for complex64 produces inconsistent results between CPU and CUDA
#156020 opened
Jun 15, 2025 -
[ROCm] BF16 Context Parallelism MI300X Not Numerically Accurate
#156012 opened
Jun 15, 2025
443 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[build] modernize build-backend: `setuptools.build_meta:__legacy__` -> `setuptools.build_meta`
#155998 commented on
Jun 21, 2025 • 19 new comments -
[ATen][CPU][Sparse] Use Third-Party Eigen for sparse add and addmm
#155357 commented on
Jun 19, 2025 • 19 new comments -
[DLPack] Add support for missing keyword-arguments.
#150218 commented on
Jun 21, 2025 • 10 new comments -
[list] Implement list.count
#153969 commented on
Jun 21, 2025 • 8 new comments -
[BE][Ez]: Use ruff type inference to autotype parts of dynamo
#156001 commented on
Jun 17, 2025 • 7 new comments -
[hop] support torch.func.functional_call in hop subgraph
#155886 commented on
Jun 20, 2025 • 7 new comments -
Add DeviceAllocator as the base device allocator
#138222 commented on
Jun 20, 2025 • 6 new comments -
[cpp wrapper] add AOTI shim for collective ops
#154492 commented on
Jun 20, 2025 • 6 new comments -
[doc] Updates to distributed.md for XCCL backend
#155834 commented on
Jun 20, 2025 • 6 new comments -
fix: 155029 convert rst to md
#155554 commented on
Jun 19, 2025 • 6 new comments -
Fix issue with set_reduce_scatter_divide_factor errors and MixedPrecisionPolicy
#155964 commented on
Jun 20, 2025 • 6 new comments -
[PowerPC] Fixed build issue for vsx vec256 complexfloat and scaled_mm_out_cpu
#155255 commented on
Jun 21, 2025 • 5 new comments -
Fix clang-tidy bugprone* warnings
#148529 commented on
Jun 21, 2025 • 5 new comments -
unify dynamic shapes API namings 3 (guard_int, guard_int_seq)
#155973 commented on
Jun 19, 2025 • 5 new comments -
Optionally avoid `record_streams` in autograd with `TORCH_AUTOGRAD_AVOID_RECORD_STREAMS=1`
#155857 commented on
Jun 16, 2025 • 5 new comments -
Fused RMSNorm implementation
#153666 commented on
Jun 21, 2025 • 4 new comments -
Enable Leak Sanitizer
#154584 commented on
Jun 22, 2025 • 4 new comments -
Support --inplace flag for tools/nightly.py
#155419 commented on
Jun 20, 2025 • 4 new comments -
New Sampler: DistributedWeightedRandomSampler
#150182 commented on
Jun 21, 2025 • 4 new comments -
Fix cudagraph record_stream memory leak
#155658 commented on
Jun 17, 2025 • 4 new comments -
Convert to markdown: jit_python_reference.rst, jit_unsupported.rst, jit_utils.rst, library.rst
#155404 commented on
Jun 19, 2025 • 3 new comments -
Fix: fallback in deserialize_torch_artifact for ScriptObject using weights_only=False
#154333 commented on
Jun 21, 2025 • 3 new comments -
DOC: update CrossEntropyLoss with note and example of incorrect target specification
#155649 commented on
Jun 18, 2025 • 3 new comments -
NUMA Binding Integration with torchrun
#149334 commented on
Jun 16, 2025 • 3 new comments -
Compute contiguity symbolically to avoid dde, and introduce c++ is_contiguous_or_false.
#155590 commented on
Jun 21, 2025 • 3 new comments -
distributions/constraints type annotations + public classes + some refactoring
#154827 commented on
Jun 16, 2025 • 3 new comments -
Convert sparse rst to md
#155438 commented on
Jun 18, 2025 • 3 new comments -
Add unified memory APIs for torch.accelerator
#152932 commented on
Jun 20, 2025 • 3 new comments -
[Inductor] Fix output discrepancy between Inductor and eager of mean with input of a large size tensor
#155428 commented on
Jun 17, 2025 • 3 new comments -
docs: fix dead link in torch.compile docs
#152734 commented on
Jun 17, 2025 • 3 new comments -
[WIP] Port dynamo test cases for xpu backend.
#155524 commented on
Jun 20, 2025 • 3 new comments -
[aotd] Support mutations of the same input in fw and bw
#155354 commented on
Jun 20, 2025 • 2 new comments -
add sfdp pattern
#155792 commented on
Jun 17, 2025 • 2 new comments -
Add UT for torch.accelerator memory-related API
#155200 commented on
Jun 20, 2025 • 2 new comments -
[Reland] [Intel GPU] Make SDPA output has the same stride as Query.
#154340 commented on
Jun 20, 2025 • 2 new comments -
[2/n] rewrite load balancing and sharding in context parallel
#155442 commented on
Jun 18, 2025 • 2 new comments -
[ONNX] Don't link to third-party protobuf
#153920 commented on
Jun 20, 2025 • 2 new comments -
Default USE_PRIORITIZED_TEXT_FOR_LD=1 on Linux aarch64 via setup.py
#155901 commented on
Jun 20, 2025 • 2 new comments -
[Set] Support sets in VariableBuilder
#153150 commented on
Jun 21, 2025 • 2 new comments -
[export] add serialized_artifact test
#152739 commented on
Jun 16, 2025 • 2 new comments -
Build libgomp (gcc-11) from src on AArch64
#152361 commented on
Jun 20, 2025 • 2 new comments -
Deprecated pkg_resources and use distributions instead
#151915 commented on
Jun 20, 2025 • 2 new comments -
Refactor cpp codegen to support overridable class attributes.
#155553 commented on
Jun 19, 2025 • 2 new comments -
Fix clang-tidy warnings of performance from uncovered files
#144542 commented on
Jun 19, 2025 • 2 new comments -
Upgrade to DLPack 1.0.
#145000 commented on
Jun 20, 2025 • 2 new comments -
Update OpenBLAS commit
#151547 commented on
Jun 20, 2025 • 2 new comments -
XCCL changes for DDP
#155497 commented on
Jun 21, 2025 • 1 new comment -
[DRAFT] Evaluate feasability of using FunctionalTensor for Example Value
#155606 commented on
Jun 19, 2025 • 1 new comment -
Add serialized_type_name to torch.return_types.* so we can dump them
#155245 commented on
Jun 19, 2025 • 1 new comment -
[docs] Decorator to create a deprecation warning
#155127 commented on
Jun 17, 2025 • 1 new comment -
[ca] cpp tensor pre hooks
#155082 commented on
Jun 19, 2025 • 1 new comment -
Support deterministic upsample trilinear backward
#154239 commented on
Jun 20, 2025 • 1 new comment -
[CUDA] Allow cuDNN or flash attn in `test_activation_checkpointing` pattern match check
#153272 commented on
Jun 16, 2025 • 1 new comment -
Enable the AMP precision with freezing for CPU nightly test
#152298 commented on
Jun 20, 2025 • 1 new comment -
[reland][ROCm] remove caffe2 from hipify
#151845 commented on
Jun 19, 2025 • 1 new comment -
Deprecate DataLoader pin_memory_device param
#146821 commented on
Jun 20, 2025 • 1 new comment -
ROCm OCP Micro-scaling Format (mx-fp8/mx-fp4) Support
#151360 commented on
Jun 20, 2025 • 1 new comment -
Add CPython string tests
#150793 commented on
Jun 17, 2025 • 1 new comment -
Refactor CUDAAllocatorConfig to reuse AllocatorConfig
#150312 commented on
Jun 18, 2025 • 1 new comment -
[nn.utils] scale_grad_ with for_each
#150033 commented on
Jun 21, 2025 • 1 new comment -
[dynamo] Support builtin bool on non-constant VTs
#155863 commented on
Jun 19, 2025 • 1 new comment -
Overload `mul_overflows` for `size_t`
#155736 commented on
Jun 22, 2025 • 1 new comment -
[CI] Remove conda from from windows
#155731 commented on
Jun 17, 2025 • 1 new comment -
Add Windows CUDA 12.9.1 build
#155748 commented on
Jun 20, 2025 • 1 new comment -
Fix argument validation for torch.nn.attention.sdpa_kernel
#155922 commented on
Jun 17, 2025 • 1 new comment -
Allow as_tensor to retain grad info
#156006 commented on
Jun 17, 2025 • 1 new comment -
Fixed NLLLoss 1D input crash with torch.compile
#155672 commented on
Jun 18, 2025 • 1 new comment -
Add BufferDict works like ParameterDict
#151870 commented on
Jun 21, 2025 • 0 new comments -
distributed: add distributed P2P TensorQueue and TensorStore
#151631 commented on
Jun 17, 2025 • 0 new comments -
[draft export] normalize sympy expressions for data-dependent counting
#151856 commented on
Jun 21, 2025 • 0 new comments -
Add assert_on_assumption on to guard_or_true, and guard_or_false
#151854 commented on
Jun 21, 2025 • 0 new comments -
[demo] Verify test runner integration
#151645 commented on
Jun 21, 2025 • 0 new comments -
Fix stride comparison max(512 - s, 1) vs. (512 - s)
#155938 commented on
Jun 21, 2025 • 0 new comments -
Deduplicate library deletion
#151795 commented on
Jun 20, 2025 • 0 new comments -
[ca] mark some sparse tests fixed by AccumulateGrad functionalization
#155948 commented on
Jun 17, 2025 • 0 new comments -
enable windows inductor UT in CI
#151777 commented on
Jun 21, 2025 • 0 new comments -
[MPS] Implement upsample_nearest3d_vec operator
#151760 commented on
Jun 22, 2025 • 0 new comments -
don't do a full deserialize on every file
#155942 commented on
Jun 16, 2025 • 0 new comments -
torch.testing._internal.optests - MPS Support
#151758 commented on
Jun 21, 2025 • 0 new comments -
Refactor duplicate code into a utility function in pytorch/torch/nn/functional.py
#151752 commented on
Jun 20, 2025 • 0 new comments -
Update __init__.py
#151751 commented on
Jun 22, 2025 • 0 new comments -
[WIP][draft_export] suppress pending unbacked for divisibility symbol
#151718 commented on
Jun 19, 2025 • 0 new comments -
Remove unnecessary recompile
#151711 commented on
Jun 18, 2025 • 0 new comments -
Update link to NVIDIA cuDNN Support Matrix
#151647 commented on
Jun 19, 2025 • 0 new comments -
[aot] Set config partitioner recompute_views True by default
#151676 commented on
Jun 17, 2025 • 0 new comments -
test without rocblas conv when using cudagraphs
#155902 commented on
Jun 18, 2025 • 0 new comments -
Mitigate upcoming removal of direct invocation of setup.py support
#155910 commented on
Jun 20, 2025 • 0 new comments -
[FrozenSet] Fixes for FrozenSet
#152991 commented on
Jun 21, 2025 • 0 new comments -
[ROCm] Ck gemm architecture guard
#152951 commented on
Jun 17, 2025 • 0 new comments -
[WIP] Automatic load/save
#155913 commented on
Jun 18, 2025 • 0 new comments -
[dynamo] Add `-> bool` to functions named `is_*` or `_is_*`
#155923 commented on
Jun 20, 2025 • 0 new comments -
[ROCm] Initial AITER Integration for mha_bwd asm kernels
#152630 commented on
Jun 17, 2025 • 0 new comments -
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 commented on
Jun 18, 2025 • 0 new comments -
Implemented `Size.__radd__`
#152554 commented on
Jun 16, 2025 • 0 new comments -
complex.pow(2) on GPU by replacing with complex * complex to avoid numerical instability
#152373 commented on
Jun 21, 2025 • 0 new comments -
[inductor] Add `-> bool` to functions named `is_*` or `_is_*`
#155928 commented on
Jun 17, 2025 • 0 new comments -
updated matplotlib version in docs requirements
#155931 commented on
Jun 20, 2025 • 0 new comments -
Switch to standard pep517 sdist generation
#152098 commented on
Jun 20, 2025 • 0 new comments -
Test
#152055 commented on
Jun 19, 2025 • 0 new comments -
[inductor][profiler] lazily import things in standalone_compile
#151956 commented on
Jun 22, 2025 • 0 new comments -
add tlpare logs
#151948 commented on
Jun 22, 2025 • 0 new comments -
[profiler] use inspect.getattr_static to avoid importing inductor
#151946 commented on
Jun 22, 2025 • 0 new comments -
[WIP][dynamic shapes] whitelist at dim-level
#151941 commented on
Jun 22, 2025 • 0 new comments -
[Observability][Optimus] Fix the tlparse name
#151935 commented on
Jun 22, 2025 • 0 new comments -
[export] add _union_dataclass to support comparing dataclasses that inherits from union.
#155932 commented on
Jun 19, 2025 • 0 new comments -
Add additional MacOS test runners for MPS
#150964 commented on
Jun 15, 2025 • 0 new comments -
Add complex logaddexp
#150946 commented on
Jun 15, 2025 • 0 new comments -
all_reduce autograd
#150942 commented on
Jun 22, 2025 • 0 new comments -
Pin all root requirements to major versions
#150833 commented on
Jun 18, 2025 • 0 new comments -
[CI][cpp_wrapper] Fix selection of CPU OpInfo tests
#155967 commented on
Jun 19, 2025 • 0 new comments -
draft: [cp] context_parallel + flex_attention_backward using torch_function and autograd function
#155970 commented on
Jun 16, 2025 • 0 new comments -
[Inductor] Set the default value of min_chunk_size to 512
#150762 commented on
Jun 19, 2025 • 0 new comments -
cd: Introduce new binary build workflows (cpu)
#150713 commented on
Jun 16, 2025 • 0 new comments -
Raise `BufferError` for DLPack buffer-related errors.
#150691 commented on
Jun 20, 2025 • 0 new comments -
fix dynamic shapes for kwargs
#150583 commented on
Jun 19, 2025 • 0 new comments -
Initial Implementation of Padded Tensor
#150567 commented on
Jun 18, 2025 • 0 new comments -
Fixes for CPython int/float tests
#155978 commented on
Jun 17, 2025 • 0 new comments -
Add `mse_loss_backward_out` type promotion
#150384 commented on
Jun 16, 2025 • 0 new comments -
Copy native runtime code to OSS.
#150338 commented on
Jun 15, 2025 • 0 new comments -
[FlexAttention] Don't load invalid values from mask mod
#150331 commented on
Jun 15, 2025 • 0 new comments -
Handling overflow for long int overflow for the product of kernel_hei…
#155989 commented on
Jun 15, 2025 • 0 new comments -
[Inductor] Synchronize type annotations between torch and triton
#150311 commented on
Jun 21, 2025 • 0 new comments -
[BE][Ez]: Fix untyped decorator in dcp utils
#156003 commented on
Jun 19, 2025 • 0 new comments -
add enum for core Backend class
#156004 commented on
Jun 19, 2025 • 0 new comments -
Fix `L1Loss`, `MSELoss`, `HuberLoss` missing `weight` param
#150097 commented on
Jun 17, 2025 • 0 new comments -
added six and pyyaml to requirements.txt to fix missing module error …
#151605 commented on
Jun 17, 2025 • 0 new comments -
[DRAFT] fix issues related to deferred assertion on unabcked floats
#151604 commented on
Jun 17, 2025 • 0 new comments -
[bazel] Fix aten generator directory path
#151580 commented on
Jun 19, 2025 • 0 new comments -
[autodeps2] Replace third-party/pyqt5 with third-party/pypi/pyqt5
#151557 commented on
Jun 16, 2025 • 0 new comments -
TopK workaround when tensor rank - sort axis > 4
#155950 commented on
Jun 17, 2025 • 0 new comments -
Fix normalize mypy warning with tuple dim
#151553 commented on
Jun 18, 2025 • 0 new comments -
[DRAFT][cuDNN][SDPA] Introduce `TORCH_CUDNN_SDPA_AVOID_RECOMPILE=1`
#155958 commented on
Jun 18, 2025 • 0 new comments -
inductor.config.descriptive_names = False is not actually supported (#145523) (#146051)
#151481 commented on
Jun 18, 2025 • 0 new comments -
[ROCm] Initial plumbing for CK Gemm Perf Improvement
#151465 commented on
Jun 19, 2025 • 0 new comments -
Add default value for `serialization_format` in `_write_item` function for better compatibility
#151452 commented on
Jun 16, 2025 • 0 new comments -
[ca] default on in CI also for PYTORCH_TEST_WITH_INDUCTOR
#155960 commented on
Jun 18, 2025 • 0 new comments -
update fx.Interpreter error logging to check if submodules are GraphModules
#151451 commented on
Jun 17, 2025 • 0 new comments -
Implement fast exp for AVX2 and AVX512 for the flash attention
#151441 commented on
Jun 16, 2025 • 0 new comments -
Update docker image names for s390x release
#151429 commented on
Jun 16, 2025 • 0 new comments -
[CI] Remove redundant accuracy benchmarks for cpp_wrapper
#155966 commented on
Jun 19, 2025 • 0 new comments -
Add inductor backend to device interface; make minifier_tests more device agnostic
#151314 commented on
Jun 20, 2025 • 0 new comments -
[WIP] Generalize device caching allocator
#151298 commented on
Jun 17, 2025 • 0 new comments -
Remove outdated Android workarounds of nearbyintf
#151292 commented on
Jun 22, 2025 • 0 new comments -
[WIP][dynamic shapes] lru cache bound_sympy
#151271 commented on
Jun 16, 2025 • 0 new comments -
[AMD][FA] Block mem efficient attention if backward head_dim > 128 in CK backend
#151258 commented on
Jun 16, 2025 • 0 new comments -
[aot] bw_module for ca: do not clone real buffers/params
#155370 commented on
Jun 19, 2025 • 0 new comments -
[AOTInductor] Inherit extern kernels for runtime constant folding
#155361 commented on
Jun 16, 2025 • 0 new comments -
Fix serialization of nans in torch.export
#155359 commented on
Jun 18, 2025 • 0 new comments -
[MPS] Activation kernels: do compute at float precision
#155735 commented on
Jun 17, 2025 • 0 new comments -
[#155034] Converted RST files to Markdown
#155287 commented on
Jun 15, 2025 • 0 new comments -
[inductor] support linear & layer_norm unbacked
#155267 commented on
Jun 17, 2025 • 0 new comments -
[Torch Package] Make get names of OrderedImporters support fallback to importers
#155743 commented on
Jun 17, 2025 • 0 new comments -
updated adafactor doc #154862
#155248 commented on
Jun 20, 2025 • 0 new comments -
Add is_hidden_event method to KinetoEvent Python interface
#155214 commented on
Jun 20, 2025 • 0 new comments -
Add AMD AWS runners to inductor performance tests
#155206 commented on
Jun 20, 2025 • 0 new comments -
add `__annotations__` attribute to `OpOverload`
#155784 commented on
Jun 16, 2025 • 0 new comments -
[MPS] Add device guard for MPS dispatch key
#155165 commented on
Jun 20, 2025 • 0 new comments -
[DONT MERGE][TESTING][1/2] xpu test runner
#155793 commented on
Jun 19, 2025 • 0 new comments -
Fix conversion of values in libtorch agnostic tests
#155115 commented on
Jun 18, 2025 • 0 new comments -
Issue warning with reference to user code rather than torch
#155112 commented on
Jun 16, 2025 • 0 new comments -
[Quant][CPU] fix fake_quantize_per_tensor_affine of inf values
#155109 commented on
Jun 17, 2025 • 0 new comments -
Adapting pipeline parallelism test cases to be device agnostic
#155108 commented on
Jun 20, 2025 • 0 new comments -
Deprecate c10::string
#155084 commented on
Jun 22, 2025 • 0 new comments -
Refactor DynamoStore into disk and in memory implementations
#155818 commented on
Jun 18, 2025 • 0 new comments -
update the baseline for nightly max_autotune tests
#154973 commented on
Jun 18, 2025 • 0 new comments -
[cond] auto_functionalize cond
#155645 commented on
Jun 17, 2025 • 0 new comments -
[Misc] handle sys exit caused by skip_if_lt_x_gpu in test_composabili…
#155665 commented on
Jun 18, 2025 • 0 new comments -
[ROCm][SymmetricMemory] Avoid bf16 to float conversion during reduce
#155587 commented on
Jun 20, 2025 • 0 new comments -
[Misc] skip the case test_foreach_add_different_mesh if world size is…
#155563 commented on
Jun 18, 2025 • 0 new comments -
Implement guard collectives
#155558 commented on
Jun 20, 2025 • 0 new comments -
Document `Flop Counter Mode` in torch.utils
#155673 commented on
Jun 16, 2025 • 0 new comments -
Update MAIAHooksInterface to pin host memory in MAIA device
#155541 commented on
Jun 20, 2025 • 0 new comments -
[Quant][CPU] Enable fp8 qlinear
#155678 commented on
Jun 16, 2025 • 0 new comments -
Making implicit packages explicit (torch)
#155505 commented on
Jun 20, 2025 • 0 new comments -
[fsdp] fix: fix optim_state_dict with FSDP model not on global rank 0
#155685 commented on
Jun 18, 2025 • 0 new comments -
[ROCm] skip convolution tests on Navi, enable batch_norm_with_update
#155454 commented on
Jun 16, 2025 • 0 new comments -
Update slow tests
#155448 commented on
Jun 16, 2025 • 0 new comments -
[Profiler] Fix lost C call events problem in Python 3.12.0-3.12.4
#155446 commented on
Jun 21, 2025 • 0 new comments -
Clean up HF components
#155707 commented on
Jun 17, 2025 • 0 new comments -
docs: clean up docstring for clarity and correctness
#155712 commented on
Jun 15, 2025 • 0 new comments -
[2/2] proxy_tensor do not clobber for mutating ops
#155716 commented on
Jun 16, 2025 • 0 new comments -
[FP8] Fix Benchmarking for certain Priors
#155722 commented on
Jun 17, 2025 • 0 new comments -
Remove remaining CUDA 12.4 CI code
#155412 commented on
Jun 22, 2025 • 0 new comments -
Convert onnx torchscript rst to md
#155390 commented on
Jun 21, 2025 • 0 new comments -
[Precompile] Hook up backend="inductor"
#155387 commented on
Jun 22, 2025 • 0 new comments -
[cuBLAS][cuBLASLt] Reduce scale of inputs for reduced precision reduction matmul test
#154293 commented on
Jun 17, 2025 • 0 new comments -
[Dynamo] [FrozensetSubclass] Add support for user defined frozensets
#154263 commented on
Jun 21, 2025 • 0 new comments -
[NOT FOR MERGE] Exploratory work on AOTInductor training
#155877 commented on
Jun 19, 2025 • 0 new comments -
implement MKLGenerator
#154199 commented on
Jun 18, 2025 • 0 new comments -
[cond] support gen_schema for cond
#154193 commented on
Jun 17, 2025 • 0 new comments -
[cuBLASLt][cuBLAS] Support 2D bias and `beta != 1.0` in cuBLASLt
#154170 commented on
Jun 17, 2025 • 0 new comments -
[BE]: Update pybind11 submodule to 3.0.0rc
#154115 commented on
Jun 19, 2025 • 0 new comments -
[Dynamo] [Set] Add comparison for set subclass
#154066 commented on
Jun 21, 2025 • 0 new comments -
[Dynamo] [Set] Raise TypeError in set.union(...) and "__or__"
#154065 commented on
Jun 21, 2025 • 0 new comments -
[Dynamo] [Set] Raise TypeError if object is unhashable
#154064 commented on
Jun 21, 2025 • 0 new comments -
[Dynamo] [Set] Implement some binop operators for dict/set/frozenset/dict_keys
#154063 commented on
Jun 21, 2025 • 0 new comments -
[draft][do not review] H-FSDP prototype
#154000 commented on
Jun 18, 2025 • 0 new comments -
Docs: Fix sphinx heading markup in `nn.rst`
#155883 commented on
Jun 17, 2025 • 0 new comments -
[WIP][user triton] AOT inductor support for device-side TMA
#155896 commented on
Jun 17, 2025 • 0 new comments -
Ignore url lint in install_xpu.sh
#153796 commented on
Jun 16, 2025 • 0 new comments -
[BE] Use latest mkl-include and mkl-devel on Windows CI
#153684 commented on
Jun 15, 2025 • 0 new comments -
[Cutlass] Fix buffer missing issues
#155897 commented on
Jun 17, 2025 • 0 new comments -
[Dynamo] [SetSubclass] Add support for user defined sets
#153553 commented on
Jun 21, 2025 • 0 new comments -
[ROCm] update state check for test_trace_while_active*
#153545 commented on
Jun 19, 2025 • 0 new comments -
CMake: update FindCUDAToolkit.cmake, use torch::nvtx3 if present, mod…
#153339 commented on
Jun 15, 2025 • 0 new comments -
[CI] Removing --user flag from all pip install commands
#154900 commented on
Jun 16, 2025 • 0 new comments -
[ROCm] SDPA fix mem fault when dropout is enabled
#154864 commented on
Jun 21, 2025 • 0 new comments -
Skip FSDP tests if device count is less then requested world_size value
#155836 commented on
Jun 20, 2025 • 0 new comments -
[BE]: Try to enable LTO
#154819 commented on
Jun 18, 2025 • 0 new comments -
[Wheel Variant] Experimental Support
#154733 commented on
Jun 21, 2025 • 0 new comments -
[vision hash update] update the pinned vision hash
#154694 commented on
Jun 22, 2025 • 0 new comments -
Use official CUDAToolkit module in CMake
#154595 commented on
Jun 22, 2025 • 0 new comments -
Fix MKL error: Inconsistent configuration parameters
#154585 commented on
Jun 17, 2025 • 0 new comments -
[einops] Ensure Dynamo can trace through explicit set dunder method call
#155842 commented on
Jun 20, 2025 • 0 new comments -
[dynamo] raise hard error if error is encountered while tracing resume function prologue
#154564 commented on
Jun 21, 2025 • 0 new comments -
Fixes Issue #154491
#154561 commented on
Jun 15, 2025 • 0 new comments -
Improve torch.ops typing
#154555 commented on
Jun 21, 2025 • 0 new comments -
[cpp_wrapper] Build main and kernel code in separate threads
#154551 commented on
Jun 19, 2025 • 0 new comments -
[Generator] Implement generator.__contains__
#154539 commented on
Jun 21, 2025 • 0 new comments -
Fix Float16 CooperativeReduction Test Failure
#154516 commented on
Jun 18, 2025 • 0 new comments -
[NCCL][P2P] Optionally avoid `recordStream`in P2P comms
#155854 commented on
Jun 17, 2025 • 0 new comments -
[easy] better copy_misaligned_inputs assertion failure message
#154472 commented on
Jun 21, 2025 • 0 new comments -
Fix: Ensure writeback handles NO_SHARD correctly by flattening tensors before copying
#154369 commented on
Jun 18, 2025 • 0 new comments -
Ensure Dynamo can trace through explicit dunder method call
#154366 commented on
Jun 20, 2025 • 0 new comments -
[DONT MERGE] Diffusion models benchmarking for compile time
#155866 commented on
Jun 21, 2025 • 0 new comments -
DISABLED test_wait_tensor (__main__.CompileTest)
#148014 commented on
Jun 19, 2025 • 0 new comments -
Torchrun should handle SIGUSR1 and SIGUSR2
#154849 commented on
Jun 18, 2025 • 0 new comments -
Cudnn attention is very slow when sequence length changed in every step
#154602 commented on
Jun 18, 2025 • 0 new comments -
DISABLED test_inductor_all_gather_into_tensor_single (__main__.CompileTest)
#147707 commented on
Jun 18, 2025 • 0 new comments -
DISABLED test_per_sample_api_compute_batch_size_not_pytreeable_cpu (__main__.TestExpandedWeightModuleCPU)
#146972 commented on
Jun 18, 2025 • 0 new comments -
xpu: implement aten::_linalg_eigvals for XPU backend (affecting HF Transformers v4.46.0-v4.48.0)
#140965 commented on
Jun 18, 2025 • 0 new comments -
Export Huggingface models with StaticCache
#155862 commented on
Jun 18, 2025 • 0 new comments -
DTensor RNG state for non CUDA backends
#138329 commented on
Jun 18, 2025 • 0 new comments -
UR Error when calling grid_sample
#153996 commented on
Jun 18, 2025 • 0 new comments -
Reproducibility of results without AVX512 by setting ATEN_CPU_CAPABILITY=avx2
#155552 commented on
Jun 18, 2025 • 0 new comments -
Enable CUDA 12.9 binaries
#155196 commented on
Jun 17, 2025 • 0 new comments -
Device check missing in torch.linalg.solve_triangular leading to hard crash
#142048 commented on
Jun 17, 2025 • 0 new comments -
RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/opt/pytorch/pytorch/c10/cuda/CUDACachingAllocator.cpp":830, please report a bug to PyTorch.
#123834 commented on
Jun 17, 2025 • 0 new comments -
support for cuDNN 9.8+
#155203 commented on
Jun 17, 2025 • 0 new comments -
Excessively restrictive dependencies
#155325 commented on
Jun 17, 2025 • 0 new comments -
CUDA 12.6->12.8 slow and periodic failures
#155607 commented on
Jun 17, 2025 • 0 new comments -
Functional all_gather_into_tensor does not support stacking, fails when compiled
#155632 commented on
Jun 17, 2025 • 0 new comments -
[RFC] Experimental Wheel Variant Support
#155141 commented on
Jun 17, 2025 • 0 new comments -
torch.compile produces incorrect output
#155690 commented on
Jun 17, 2025 • 0 new comments -
Question in aot_autograd trace in torch.distributed case
#155599 commented on
Jun 17, 2025 • 0 new comments -
Graph break when modifying a list that contains symints.
#155174 commented on
Jun 17, 2025 • 0 new comments -
[Misc] test_foreach_add_different_mesh cannot work on machines with less than 4 GPUs
#155562 commented on
Jun 17, 2025 • 0 new comments -
[NJT] can only chunk if the 2nd dimension is ragged
#153238 commented on
Jun 17, 2025 • 0 new comments -
Vulkan interoperability
#155986 commented on
Jun 17, 2025 • 0 new comments -
The difference between input grad computed by channels last backward and the input grad computed by channels first backward of Hardswish on MPS is too large
#107214 commented on
Jun 17, 2025 • 0 new comments -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float32 (__main__.TestForeachCUDA)
#153470 commented on
Jun 17, 2025 • 0 new comments -
DCP save only saves one shard of tensor parallel model when using DP + TP
#156002 commented on
Jun 17, 2025 • 0 new comments -
`torch.prod` or `torch.special.entr` triggers `CUDA driver error: invalid argument` on GPU unless kernel cache is cleared
#156010 commented on
Jun 17, 2025 • 0 new comments -
Torch compile CUDA graphs leads to a large number of CUDA streams
#155679 commented on
Jun 19, 2025 • 0 new comments -
ROCm: torch.cholesky_inverse raises Memory access fault for large tensor shapes
#155046 commented on
Jun 19, 2025 • 0 new comments -
Dynamo export: Fake tensor broadcast error
#129534 commented on
Jun 19, 2025 • 0 new comments -
Sparse tensor indexing not implemented, but partially supported by using index_select
#150277 commented on
Jun 19, 2025 • 0 new comments -
NotImplementedError: Could not run 'aten::index.Tensor' with arguments from the 'SparseCUDA' backend.
#152226 commented on
Jun 19, 2025 • 0 new comments -
`torch.sparse.log_softmax` output mismatch between CPU and CUDA
#152293 commented on
Jun 19, 2025 • 0 new comments -
Segmentation fault when converting sparse COO tensor with complex values to dense
#153329 commented on
Jun 19, 2025 • 0 new comments -
cpp wrapper calls back to python for custom op even when a C++ registration is made
#153478 commented on
Jun 19, 2025 • 0 new comments -
NotImplementedError: Could not run 'aten::log' with arguments from the 'SparseCUDA' backend.
#153497 commented on
Jun 19, 2025 • 0 new comments -
Clarify default value of eps in RMSNorm documentation
#155527 commented on
Jun 19, 2025 • 0 new comments -
Triton pin update for PyTorch 2.8 / Triton 3.4
#154206 commented on
Jun 19, 2025 • 0 new comments -
[torch.export] Cannot export TorchVision raft_small, raft_large
#155550 commented on
Jun 19, 2025 • 0 new comments -
DISABLED test_mempool_ctx_multithread (__main__.TestMemPool)
#153460 commented on
Jun 19, 2025 • 0 new comments -
Windows inductor genarated zero size array code, and is not supported by MSVC(C2466).
#153180 commented on
Jun 19, 2025 • 0 new comments -
Dynamo handling for all methods of torch.Generator
#88576 commented on
Jun 19, 2025 • 0 new comments -
DISABLED test_memory_snapshot (__main__.TestCudaMallocAsync)
#126953 commented on
Jun 19, 2025 • 0 new comments -
RFC: The State of Custom CUDA extensions in PyTorch
#152032 commented on
Jun 19, 2025 • 0 new comments -
`torch._dynamo.exc.Unsupported: Attempted to call function marked as skipped`. Explanation: Dynamo developers have intentionally marked that the function `_immutable_list_unflatten`
#155426 commented on
Jun 18, 2025 • 0 new comments -
[dtensor] ops coverage tracker
#119930 commented on
Jun 18, 2025 • 0 new comments -
Context Parallel -- unsharded output doesn't match output without CP.
#152261 commented on
Jun 18, 2025 • 0 new comments -
"RuntimeError: makeDeviceForHostname(): unsupported gloo device" with nightly torch 2.8
#150381 commented on
Jun 18, 2025 • 0 new comments -
Escape hatch: way to dynamically add or remove tags from custom operators
#150972 commented on
Jun 18, 2025 • 0 new comments -
Compile produces different result than eager for mutable custom op use case
#153389 commented on
Jun 18, 2025 • 0 new comments -
[RFC][API-Unstable] Support 3rd party SYCL kernels with CPP Extension API
#153265 commented on
Jun 18, 2025 • 0 new comments -
Timer benchmark stores only one time value, and therefore has broken mean/median/etc metrics
#106801 commented on
Jun 18, 2025 • 0 new comments -
[feature request] Native checkpointing to/from `s3://`
#155992 commented on
Jun 18, 2025 • 0 new comments -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float64 (__main__.TestForeachCUDA)
#153544 commented on
Jun 18, 2025 • 0 new comments -
torch.distributions.kl_divergence Fails with MultivariateNormal in Dynamo Due to _infer_size Type Error
#155800 commented on
Jun 16, 2025 • 0 new comments -
Activation Checkpointing breaks "torch.distributed.checkpoint.state_dict._get_fqns"
#155924 commented on
Jun 16, 2025 • 0 new comments -
Can't call torch.compile inside of a custom op
#151328 commented on
Jun 16, 2025 • 0 new comments -
pipeline() fails when a sub-module uses "no_grad()"; impacts RoPE implementation on HF models
#155589 commented on
Jun 16, 2025 • 0 new comments -
Cannot compile with latest LLVM-19
#139065 commented on
Jun 16, 2025 • 0 new comments -
torch.clamp throws overflow error on CPU but not on CUDA
#155671 commented on
Jun 16, 2025 • 0 new comments -
crash in torch.histc
#155393 commented on
Jun 16, 2025 • 0 new comments -
nn.init.trunc_normal_ Creates Massive Outliers with Small std Due to erfinv Instability
#155588 commented on
Jun 16, 2025 • 0 new comments -
FSDP learning hangs when the program tries to save the model
#143536 commented on
Jun 16, 2025 • 0 new comments -
Most requested ops for the MPS backend
#154052 commented on
Jun 16, 2025 • 0 new comments -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_float16 (__main__.TestForeachCUDA)
#153379 commented on
Jun 16, 2025 • 0 new comments -
[Tracker] Nested tensor op coverage requests
#118107 commented on
Jun 16, 2025 • 0 new comments -
[RFC][API-Unstable] Intel GPU distributed Backend integration in `torch-xpu-ops`and registeration in PyTorch
#141741 commented on
Jun 16, 2025 • 0 new comments -
TypeError when using torch.cuda.list_gpu_processes() on Windows with the WDDM driver
#64491 commented on
Jun 16, 2025 • 0 new comments -
DISABLED test_jacobian_vectorize_raises_no_warnings_logging_tensor (__main__.TestAutogradFunctional)
#153707 commented on
Jun 16, 2025 • 0 new comments -
DTensor + torch.compile on CPU: compiled matmul fails with multiple shape inputs
#154111 commented on
Jun 16, 2025 • 0 new comments -
FSDP2's `set_reduce_scatter_divide_factor` is inconsistent wrt reduce dtype
#155904 commented on
Jun 16, 2025 • 0 new comments -
Including XPU and CUDA in ProfilerActivity causes XPU profiling to be ignored
#155957 commented on
Jun 16, 2025 • 0 new comments -
Some Doc Issue about `torch.lobpcg()`
#152107 commented on
Jun 16, 2025 • 0 new comments -
DISABLED test_remove_noop_view_dtype_cuda (__main__.GPUTests)
#151541 commented on
Jun 16, 2025 • 0 new comments -
Continuous calls to nn.Linear in fp32 on the 5090D cause severe performance degradation
#150725 commented on
Jun 15, 2025 • 0 new comments -
Enable TorchInductor to Generate Matmuls Natively via `tl.dot`
#151705 commented on
Jun 15, 2025 • 0 new comments -
bmm, topk, cholesky, linalg.norm, max with out variants set causing recompilations in torch.compile
#135859 commented on
Jun 15, 2025 • 0 new comments -
Enable `torch.topk` to support `stable` flag
#88227 commented on
Jun 15, 2025 • 0 new comments -
ONNX export via Dynamo sets `dft_length = 1` in `DFT`, breaking shape-inference for `torch.fft.rfft`
#155997 commented on
Jun 15, 2025 • 0 new comments -
Incompatible Torch and Torchvision while building from source for 2.6.0 and CUDA 12.6, RuntimeError: operator torchvision::nms does not exist
#146221 commented on
Jun 15, 2025 • 0 new comments -
[MPS] Performance regression and visual bug with ComfyUI Flux dev since nightly 20250510
#155797 commented on
Jun 15, 2025 • 0 new comments -
Keep gettting AssertionError: found no DeviceMesh from dtensor args for c10d.broadcast_.default!
#155993 commented on
Jun 17, 2025 • 0 new comments -
[ONNX] dynamic_axes does not rename dynamic dimension in torch.onnx.export
#150544 commented on
Jun 17, 2025 • 0 new comments -
Inductor Perf MX to_blocked
#153194 commented on
Jun 17, 2025 • 0 new comments -
[Async TP] Fuse all-gather-matmuls for float8 rowwise training
#149990 commented on
Jun 17, 2025 • 0 new comments -
torch.compile on MPS progress tracker
#150121 commented on
Jun 17, 2025 • 0 new comments -
canUse32BitIndexMath set to False with efficient net
#155225 commented on
Jun 17, 2025 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
Jun 17, 2025 • 0 new comments -
Pypi Support for Windows arm64
#154260 commented on
Jun 17, 2025 • 0 new comments -
[feature request] Rank-Revealing QR - Adding dgeqp3 support to torch.qr
#10454 commented on
Jun 17, 2025 • 0 new comments -
Feature Request: Add a rounding mode to round
#55289 commented on
Jun 17, 2025 • 0 new comments -
DISABLED test_run_decompositions_map_handle_to_new_nodes (__main__.TestNumericDebugger)
#144933 commented on
Jun 17, 2025 • 0 new comments -
foreach CUDA tests flaky on CUDA 12.6+ due to flaky profiler results
#148681 commented on
Jun 17, 2025 • 0 new comments -
AssertionError: found no DeviceMesh from dtensor args for c10d.broadcast_.default!
#155463 commented on
Jun 17, 2025 • 0 new comments -
[feature request] Exact euclidean distance transform
#61509 commented on
Jun 17, 2025 • 0 new comments -
[DCP] Allow for rank-specific tensors with duplicate keys
#146566 commented on
Jun 17, 2025 • 0 new comments -
DISABLED test_re_export_preserve_handle (__main__.TestNumericDebugger)
#144898 commented on
Jun 17, 2025 • 0 new comments -
DISABLED test_lowering_to_x86 (__main__.TestQuantizePT2EX86Inductor)
#153140 commented on
Jun 17, 2025 • 0 new comments -
Process never ends when sending tensors through multiprocessing queues in Python 3.12+ on macOS
#153050 commented on
Jun 16, 2025 • 0 new comments -
DISABLED test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn)
#75648 commented on
Jun 16, 2025 • 0 new comments -
Running dispatch modes on compile-disabled regions of a compiled model
#155825 commented on
Jun 16, 2025 • 0 new comments -
[MPS] Migrate torch.sort to Metal shader
#155560 commented on
Jun 16, 2025 • 0 new comments -
[DTensor] DTensor is not well supported on older versions of GPUs, such as A10
#155657 commented on
Jun 16, 2025 • 0 new comments -
Add support for MaxPool3D on the MPS backend
#100674 commented on
Jun 16, 2025 • 0 new comments -
Graph Partition Issue Tracker
#151832 commented on
Jun 16, 2025 • 0 new comments -
Segmentation error for torch==2.2.1 on MacOs
#121101 commented on
Jun 16, 2025 • 0 new comments -
Suggestion: integration of einops test suite
#146782 commented on
Jun 16, 2025 • 0 new comments -
test vLLM with PyTorch 2.8rc before releasing PyTorch 2.8
#155933 commented on
Jun 16, 2025 • 0 new comments -
[inductor] Add typing to _inductor/ir.py
#149958 commented on
Jun 19, 2025 • 0 new comments -
gloo: fix building system gloo with CUDA/HIP
#146637 commented on
Jun 15, 2025 • 0 new comments -
Add MPS OpInfo db, rework test_mps to use OpInfo
#145955 commented on
Jun 16, 2025 • 0 new comments -
[WIP] Allow generation of inductor backend specific tests using instantiate_device_type_tests
#145873 commented on
Jun 21, 2025 • 0 new comments -
NJT support for cat() on the ragged dim
#145778 commented on
Jun 18, 2025 • 0 new comments -
Fix full_like decomposition to preserve strides
#144765 commented on
Jun 20, 2025 • 0 new comments -
[Reopen] [Intel GPU] Set higher tolerance for some models only on XPU Device
#144756 commented on
Jun 16, 2025 • 0 new comments -
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on
Jun 16, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `test/[i-z]*/` to `ruff format`
#144556 commented on
Jun 16, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `test/[a-h]*/` to `ruff format`
#144555 commented on
Jun 16, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on
Jun 16, 2025 • 0 new comments -
Support Swiglu for Module and functional
#144465 commented on
Jun 21, 2025 • 0 new comments -
[ci] Add riscv opt-int build
#143979 commented on
Jun 17, 2025 • 0 new comments -
Defaults to C++20 in CMake torch targets
#143959 commented on
Jun 15, 2025 • 0 new comments -
[Don't Review] Test CI
#139971 commented on
Jun 19, 2025 • 0 new comments -
`has_triton`: Use the device interface for detecting Triton availability
#139171 commented on
Jun 16, 2025 • 0 new comments -
Fix `USE_STATIC_MKL` lost functionality
#138996 commented on
Jun 19, 2025 • 0 new comments -
[Docker] Create an independent dependecies layer
#138612 commented on
Jun 18, 2025 • 0 new comments -
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on
Jun 18, 2025 • 0 new comments -
Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu
#138068 commented on
Jun 18, 2025 • 0 new comments -
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on
Jun 18, 2025 • 0 new comments -
Avoid sqrt calculations with values less than zero
#136824 commented on
Jun 21, 2025 • 0 new comments -
[Inductor] auto-chunker
#136702 commented on
Jun 17, 2025 • 0 new comments -
Remove deprecated jit code
#131296 commented on
Jun 20, 2025 • 0 new comments -
[DTensor] decomposed sharding propagation
#130887 commented on
Jun 19, 2025 • 0 new comments -
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 commented on
Jun 22, 2025 • 0 new comments -
[inductor] enable bf32 test for mkldnn conv
#127293 commented on
Jun 22, 2025 • 0 new comments -
[AOTAutograd] tweak min-cut partitioner to avoid saving softmax output
#126348 commented on
Jun 18, 2025 • 0 new comments -
[draft] Add support in Flex for non-contiguous NJT
#149892 commented on
Jun 18, 2025 • 0 new comments -
cd: Add script for generating binary build matrix
#149830 commented on
Jun 16, 2025 • 0 new comments -
[Inductor] Restrict block analysis to only match integer dims and strides
#149615 commented on
Jun 20, 2025 • 0 new comments -
Generalize AllocatorConfig to be device-agnostic
#149601 commented on
Jun 18, 2025 • 0 new comments -
[ROCm] support experimental CU carveout
#149466 commented on
Jun 21, 2025 • 0 new comments -
Use mypy 1.15
#149426 commented on
Jun 20, 2025 • 0 new comments -
Fix B018 Useless Expressions in Multiple Files (#106571)
#149408 commented on
Jun 17, 2025 • 0 new comments -
[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100`
#149282 commented on
Jun 17, 2025 • 0 new comments -
PaddedTensor Init
#149140 commented on
Jun 18, 2025 • 0 new comments -
[Intel GPU] Allow XPU backend in Depthwise_conv2d&3d operators
#149114 commented on
Jun 21, 2025 • 0 new comments -
remove guard_size_oblivious from unbind.
#148815 commented on
Jun 21, 2025 • 0 new comments -
Trunk workflow for Windows Arm64
#148753 commented on
Jun 17, 2025 • 0 new comments -
[BE][pytree] cleanup parameterized pytree tests
#148569 commented on
Jun 18, 2025 • 0 new comments -
[triton hash update] update the pinned triton hash
#148492 commented on
Jun 22, 2025 • 0 new comments -
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 commented on
Jun 18, 2025 • 0 new comments -
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 commented on
Jun 18, 2025 • 0 new comments -
[pytree] simplify public API exposition with `__module__`
#148328 commented on
Jun 18, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `test/inductor/` to `ruff format`
#148186 commented on
Jun 16, 2025 • 0 new comments -
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on
Jun 18, 2025 • 0 new comments -
[test] compile cmd
#147470 commented on
Jun 18, 2025 • 0 new comments -
Small scheduler refactor
#147410 commented on
Jun 21, 2025 • 0 new comments -
[MPS] Fix incorrect size for uint3 arg
#147325 commented on
Jun 16, 2025 • 0 new comments -
[MPS] Fix metallib embedding in static builds
#147324 commented on
Jun 16, 2025 • 0 new comments -
Add ppc64le wheel build support
#147194 commented on
Jun 20, 2025 • 0 new comments -
Fix the Problems About Defining Static Variable in Inline Function
#147095 commented on
Jun 20, 2025 • 0 new comments -
Porting Pytorch to AIX Operating System.
#146983 commented on
Jun 19, 2025 • 0 new comments -
Optimize isclose() for CPU and GPU by adding specific implementations
#146656 commented on
Jun 17, 2025 • 0 new comments -
DISABLED test_slice_scatter_reinplace_cuda (__main__.GPUTests)
#145189 commented on
Jun 20, 2025 • 0 new comments -
[AC] torch.utils.checkpoint.CheckpointError from HF qwen2
#155171 commented on
Jun 20, 2025 • 0 new comments -
DISABLED test_non_contiguous_input_mm_plus_mm (__main__.TestMaxAutotune)
#126867 commented on
Jun 20, 2025 • 0 new comments -
Migrating existing backend-MAIA integration toward PrivateUse1 / openReg
#155864 commented on
Jun 20, 2025 • 0 new comments -
Multi-dimensional tensors in datasets might get incorrectly flattened when fetching data from dataloader which is specified 'batch_sampler' when created
#154810 commented on
Jun 20, 2025 • 0 new comments -
[torch.export] Cannot export TorchVision fasterrcnn_mobilenet_v3_large_fpn
#146152 commented on
Jun 20, 2025 • 0 new comments -
CUDA 12.6 Inductor accuracy test failures
#148699 commented on
Jun 20, 2025 • 0 new comments -
`setup.py develop` command is disappearing soon from `setuptools`
#152276 commented on
Jun 20, 2025 • 0 new comments -
[release] Make pytorch source distribution package respect pep-0517
#150461 commented on
Jun 20, 2025 • 0 new comments -
[CI] [anaconda] Docker files have conda environment installed
#148335 commented on
Jun 20, 2025 • 0 new comments -
[CI] [anaconda] CI Build and Test scripts MacOS
#148340 commented on
Jun 20, 2025 • 0 new comments -
[Docs] [anaconda] Review and update
#148339 commented on
Jun 20, 2025 • 0 new comments -
[CI] [anaconda] CI Build and Test scripts Windows
#148338 commented on
Jun 20, 2025 • 0 new comments -
[CI] [anaconda] CI Build and Test scripts Linux
#148336 commented on
Jun 20, 2025 • 0 new comments -
Deprecation notice of `torch.norm` and `Tensor.norm` across the documentation
#156005 commented on
Jun 20, 2025 • 0 new comments -
[ONNX] broadcast_in_dim: model (ReDimNet)
#138313 commented on
Jun 20, 2025 • 0 new comments -
`torch.onnx.export` (dynamo=False) fails with uninformative error when exporting `apply_rotary_pos_emb`/`repeat_interleave`
#145100 commented on
Jun 20, 2025 • 0 new comments -
[ONNX Convert] Error when input to nn.AdaptiveAvgPool2d size is variable
#147720 commented on
Jun 20, 2025 • 0 new comments -
[export] Decomp failure when running `aten.item.default`
#150823 commented on
Jun 20, 2025 • 0 new comments -
[ONNX] Use dlpack to transfer tensors when onnxruntime implements proper support
#151064 commented on
Jun 20, 2025 • 0 new comments -
[ONNX] Simple torch.nn.Identity onnx export with dynamo=True does not load
#151017 commented on
Jun 20, 2025 • 0 new comments -
DISABLED test_sdpa_mask_fp16_L6_S17_NH23_HS121 (__main__.TestSDPA)
#138905 commented on
Jun 20, 2025 • 0 new comments -
The docstring linter should not force overridden methods to be documented
#151692 commented on
Jun 20, 2025 • 0 new comments -
Remove redundant type aliases of _device for torch.Device
#152952 commented on
Jun 20, 2025 • 0 new comments -
[torch.compile][Megatron] Error with Megatron with Pytorch v2.5.0 using `AOTAutograd` and `torch.compile`
#141783 commented on
Jun 20, 2025 • 0 new comments -
[ONNX] ONNX export of simple quantized model fails
#113817 commented on
Jun 20, 2025 • 0 new comments -
Make streams used for NCCL operations configurable
#67158 commented on
Jun 19, 2025 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on
Jun 22, 2025 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on
Jun 22, 2025 • 0 new comments -
refine fp32 precision api
#125888 commented on
Jun 22, 2025 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
Jun 21, 2025 • 0 new comments -
[pytree] support PyStructSequence types for Python pytree
#113258 commented on
Jun 19, 2025 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
Jun 19, 2025 • 0 new comments -
Support building pytorch using MKL ILP64 model.
#102613 commented on
Jun 19, 2025 • 0 new comments -
Support sparse COO/CSR/CSC/BSR/BSC return values in gradcheck input function
#97825 commented on
Jun 19, 2025 • 0 new comments -
Online softmax is disabled on the fly
#153241 commented on
Jun 21, 2025 • 0 new comments -
[Tracker] Support flash attention fa3 ABI stable w/ libtorch
#154908 commented on
Jun 21, 2025 • 0 new comments -
[ONNX] exported nodes of Multi-head attention can be simplified
#151209 commented on
Jun 21, 2025 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
Jun 21, 2025 • 0 new comments -
CompiledFxGraph.current_callable is not thread-safe
#138961 commented on
Jun 21, 2025 • 0 new comments -
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on
Jun 21, 2025 • 0 new comments -
[RFC] Use CUDA graphs by default on torch.compile
#121968 commented on
Jun 21, 2025 • 0 new comments -
Segmentation fault when calling `torch.choose_qparams_optimized()` with empty tensors and extreme num_bins value
#153326 commented on
Jun 21, 2025 • 0 new comments -
Tensor.lerp inconsistent when using -Infinity between MPS and CPU
#111374 commented on
Jun 21, 2025 • 0 new comments -
Looking for valid compiling option for extension based on torch-2.1.0+cpu.cxx11.abi
#143780 commented on
Jun 21, 2025 • 0 new comments -
Division by zero in ONNX export with `dynamo=True` leading to NaN outputs
#150623 commented on
Jun 21, 2025 • 0 new comments -
High-performance LLM quantization on X86 CPU with native PyTorch
#155435 commented on
Jun 21, 2025 • 0 new comments -
`torch.jit.script` models with `Dict[str, Tensor]` return cannot be exported via `torch.onnx.export` without `dynamo=True`, and error message is unclear
#155091 commented on
Jun 21, 2025 • 0 new comments -
as_tensor of list of tensors should keep grad history
#155983 commented on
Jun 21, 2025 • 0 new comments -
Segfault after clearing Dynamo Cache
#155057 commented on
Jun 21, 2025 • 0 new comments -
MSE documentation is weak
#88327 commented on
Jun 20, 2025 • 0 new comments -
torch.export does not support torchaudio.transforms.Spectrogram
#112844 commented on
Jun 20, 2025 • 0 new comments -
UNSTABLE pull / cuda12.8-py3.10-gcc9-sm75 / test (pr_time_benchmarks)
#153987 commented on
Jun 20, 2025 • 0 new comments -
DISABLED test_inductor_reduce_scatter_tensor_coalesced (__main__.CompileTest)
#147887 commented on
Jun 20, 2025 • 0 new comments