[Meta] CI Revert Tracker · Issue #66178 · pytorch/pytorch · GitHub

[Meta] CI Revert Tracker #66178


Closed
suo opened this issue Oct 6, 2021 · 30 comments
Labels
module: ci Related to continuous integration triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Comments

@suo
Member
suo commented Oct 6, 2021

Meta issue to track reverts and their causes

We should have weekly stats of reverted PRs, bucketed into several categories:

  • Missed signal on PR
  • Ignored PR signal
  • Land race
  • GH1
  • Other
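A minimal sketch of how such weekly stats could be tallied, assuming each revert has already been triaged by hand into one of the buckets above. The `RevertCause` enum, the `weekly_stats` helper, and the PR numbers in the example are hypothetical and are not part of any existing PyTorch tooling.

```python
from collections import Counter
from enum import Enum


class RevertCause(Enum):
    """Buckets from the list above; the enum itself is a hypothetical helper."""
    MISSED_SIGNAL = "Missed signal on PR"
    IGNORED_SIGNAL = "Ignored PR signal"
    LAND_RACE = "Land race"
    GH1 = "GH1"
    OTHER = "Other"


def weekly_stats(reverts):
    """Count reverts per cause; `reverts` is an iterable of (pr_number, RevertCause)."""
    counts = Counter(cause for _, cause in reverts)
    # Report every bucket, including empty ones, so weeks stay comparable.
    return {cause.value: counts.get(cause, 0) for cause in RevertCause}


# Made-up PR numbers, for illustration only.
example_week = [
    (1, RevertCause.LAND_RACE),
    (2, RevertCause.LAND_RACE),
    (3, RevertCause.GH1),
]
stats = weekly_stats(example_week)
```

Reporting empty buckets explicitly keeps the week-over-week tables the same shape, which makes trends easier to spot.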

cc @seemethere @malfet @pytorch/pytorch-dev-infra

@suo suo added module: ci Related to continuous integration triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Oct 6, 2021
@malfet malfet changed the title CI Revert Tracker [Meta] CI Revert Tracker Oct 6, 2021
@malfet
Contributor
malfet commented Oct 6, 2021

Initial list

No PR signal

Ignored signal

Weird

Land races

@malfet malfet closed this as completed Oct 6, 2021
@malfet malfet reopened this Oct 6, 2021
@janeyx99
Contributor
janeyx99 commented Oct 7, 2021

10/5 - 10/12

No PR signal

Ignored signal

Weird

Land races

@suo
Member Author
suo commented Oct 13, 2021

10/12 - 10/19

No PR signal

Ignored Signal

Weird

Land races

@suo
Member Author
suo commented Oct 20, 2021

10/19 - 10/26

No PR signal

Ignored Signal

Weird

Land races

@janeyx99
Contributor
janeyx99 commented Nov 1, 2021

10/26 - 11/2

No PR signal

Ignored Signal

Weird

Land races

@seemethere
Member

11/2 - 11/9

No PR signal

Ignored Signal

Stack 1

Stack 2

Weird

Land races

@suo
Member Author
suo commented Dec 3, 2021

@seemethere
Member
seemethere commented Dec 14, 2021

12/07 - 12/14

No PR signal

Ignored Signal

Land races

@suo
Member Author
suo commented Jan 11, 2022

@seemethere
Member
seemethere commented Jan 25, 2022

1/18 - 1/25

No PR signal

Ignored Signal

Land races

Weird

@mruberry
Collaborator
mruberry commented Feb 1, 2022

1/26 - 2/1

No PR Signal

Ignored Signal

@janeyx99
Contributor
janeyx99 commented Feb 2, 2022

2/1 - 2/8

No PR signal

Ignored Signal

Land races

GitHub First

Weird

@zengk95
Contributor
zengk95 commented Feb 18, 2022

2/15 - 2/22

No PR signal

Ignored Signal

Land races

GitHub First

Weird

@seemethere
Member
seemethere commented Mar 8, 2022

3/1 - 3/8

Command used:

git log --grep 'Revert ' --since 3/01/2022 --oneline
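As a rough sketch, the `--oneline` output of that command could be turned into (sha, title) pairs like this. The `parse_revert_log` helper and the `REVERT_LINE` pattern are hypothetical; the first two sample lines are copied from the 4/11–4/18 list later in this thread, and the last one is made up to show a non-matching line. The sketch assumes the `Revert "<title>"` subject format and would skip other forms such as `Revert D35679120: ...`.

```python
import re

# Matches `<short sha> Revert "<original title>"` as printed by `git log --oneline`.
REVERT_LINE = re.compile(r'^(?P<sha>[0-9a-f]+) Revert "(?P<title>.*)"$')


def parse_revert_log(log_text):
    """Return (sha, original_title) pairs for lines that look like reverts."""
    pairs = []
    for line in log_text.splitlines():
        match = REVERT_LINE.match(line.strip())
        if match:
            pairs.append((match.group("sha"), match.group("title")))
    return pairs


sample = '''5878215 Revert "Port sort to structured kernels."
cc1902a Revert "Add warning when importing caffe2 on build without BUILD_CAFFE2=1"
0000000 Some unrelated commit'''
parsed = parse_revert_log(sample)
```

The input text could be captured with `subprocess.run([...], capture_output=True, text=True)` from a local checkout; bucketing each revert into a category would still be a manual step.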

No PR signal

Ignored Signal

Land races

GitHub First

Weird

@malfet
Contributor
malfet commented Mar 9, 2022

2/8 - 2/15

No PR signal

Ignored Signal

Land races

Weird

@b0noI
Contributor
b0noI commented Mar 25, 2022

@zengk95
Contributor
zengk95 commented Apr 5, 2022

3/26–4/03

https://github.com/pytorch/pytorch/pulls?q=is%3Apr+label%3Areverted+is%3Aclosed+updated%3A%22%3E2022-03-26%22+
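The search link above can be rebuilt for any week. A minimal sketch, assuming the same `reverted` label and query terms as that link; the `reverted_prs_url` helper is hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def reverted_prs_url(since, repo="pytorch/pytorch"):
    """Build a GitHub PR search URL for closed PRs labeled `reverted`, updated after `since`."""
    query = f'is:pr label:reverted is:closed updated:">{since.isoformat()}"'
    # urlencode percent-escapes `:`, `"` and `>` and turns spaces into `+`.
    return f"https://github.com/{repo}/pulls?" + urlencode({"q": query})


url = reverted_prs_url(date(2022, 3, 26))
```

Note that `updated:` filters on the last update time, so a PR reverted earlier but commented on later would also show up in the results.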

No PR Signal

Ignored Signal

Land races

Weird

GitHub First

@rohan-varma
Member

@zengk95 I would not say #74452 falls under "ignored signal", as it was more due to missing signal on the PR, possibly due to different NCCL versions running on master vs PR branches.

@zengk95
Contributor
zengk95 commented Apr 6, 2022

> @zengk95 I would not say #74452 falls under "ignored signal", as it was more due to missing signal on the PR, possibly due to different NCCL versions running on master vs PR branches.

Thanks for pointing that out! I moved it. I checked the original PR and it had a red signal, so I thought it was an ignored signal, but I think that signal came from an underlying failure.

@malfet
Contributor
malfet commented Apr 6, 2022

@zengk95 I don't think #74542 is a land race, but rather an internal-only regression (although a simple linter failure)

@suo
Member Author
suo commented Apr 19, 2022

4/11–4/18

No PR signal

5878215 Revert "Port sort to structured kernels."
3c238c6 Revert "Add support to Tensor[]? for structured kernel codegen."
cc1902a Revert "Add warning when importing caffe2 on build without BUILD_CAFFE2=1"
3471b0e Revert "Remove histogramdd functional wrapper"
80e05b7 Revert "Extend sign-compare warnings to gcc"

Ignored signal

d79d9fa Revert "Remove breakpad dependency"
9312ee8 Revert "remove fp16 support from cpu linalg functions"
e8ed042 Revert "Optimize PReLU (float32) and enable PReLU BFloat16 support in CPU path"
495c5ae Revert "remove fp16 support from cpu linalg functions"

Land races

cbb9b33 Revert "Reland Fix public binding check for modules with __all__"
c52290b Revert "Fix public binding check for modules with __all__"
c5d57e7 Revert "Use batched operations for PowerSGD"
715e07b Revert "Remove histogramdd functional wrapper"
1aeea24 Revert "Add checks for public and private API"

Weird

496d4bb Revert "Add first version of Buck build workflow"
dunno what happened with this one

1118b15 Revert "Remove 11.5 experimental builds now that we have 11.6"
caused a downstream domains failure

GitHub first-related

c4cf51d Revert D35679120: Add first version of Buck build workflow
b5a2518 Revert "Add first version of Buck build workflow"
fe8eff3 Revert "Add upgrader related logic to flatbuffer"
db61652 Revert "[ci] use lintrunner in CI"
db18010 Revert "Relanding shape cache (75400)"
c274f66 Revert "Adding Caching of calculated Symbolic Shapes"
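Several titles in the list above were reverted more than once in the same week, sometimes under different categories. A small sketch of how such repeats could be surfaced, using a few (sha, title) pairs copied from the list above; the `repeated_reverts` helper is hypothetical.

```python
from collections import Counter

# A few (sha, title) pairs copied from the 4/11–4/18 list above.
reverts = [
    ("3471b0e", "Remove histogramdd functional wrapper"),
    ("715e07b", "Remove histogramdd functional wrapper"),
    ("9312ee8", "remove fp16 support from cpu linalg functions"),
    ("495c5ae", "remove fp16 support from cpu linalg functions"),
    ("c5d57e7", "Use batched operations for PowerSGD"),
]


def repeated_reverts(pairs):
    """Return {title: [shas]} for titles that were reverted more than once."""
    counts = Counter(title for _, title in pairs)
    repeated = {}
    for sha, title in pairs:
        if counts[title] > 1:
            repeated.setdefault(title, []).append(sha)
    return repeated


repeats = repeated_reverts(reverts)
```

A repeat usually means a reland was reverted again, which may be worth calling out separately in the weekly summary.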

@zengk95
Contributor
zengk95 commented Apr 26, 2022

4/19–4/26

No PR signal

#76045
#75974

Ignored signal

#76252
#76076
#73770

Land races

#76038

Weird

#72302 this caused internal regressions but didn't cause errors, just performance degradation
#75983 broke something in torchvision

GitHub first-related

#76075
#75538

@janeyx99
Contributor

5/9 – 5/16

No PR signal

#74410 broke slow master test like https://github.com/pytorch/pytorch/runs/6433364399?check_suite_focus=true
#77212 had all its builds canceled and broke the bazel build
#77089 breaks distributed tests on trunk, see https://hud.pytorch.org/pytorch/pytorch/commit/bf61b795031b4f30b2cf1267c5625bfe36cd5f3c
#76591 breaks trunk 10.2 tests because sampled_addmm is not available for older CUDA, see https://hud.pytorch.org/minihud?name_filter=trunk%20/%20linux-bionic-cuda10.2-py3.9-gcc7%20/%20test%20(default,%201,%202,%20linux.4xlarge.nvidia.gpu)

Ignored signal

#76319 breaks ONNX tests (which also show up on the PR) https://hud.pytorch.org/minihud?name_filter=pull%20/%20linux-xenial-py3.7-clang7-onnx%20/%20test%20(default,%202,%202,%20linux.2xlarge)

Land races

#77226 broke lint --> lint would be great to add on mergequeue!
#76675 broke lint on trunk
#76812

Weird

#76738 broke XLA due to a land race...with pytorch/xla
#77405 reverted for potentially causing SIGIOT flakiness, but later discussion showed that this PR was not the culprit
#77279 broke CIFlow functionality
#77142 broke torchvision with circular import
#77364 breaks the Create Release workflow on trunk. https://hud.pytorch.org/minihud?name_filter=Create%20Release%20/%20Create%20Release
#76875 potentially broke torchvision
#76984 broke lint on trunk due to inconsistencies between pr/trunk lint
#77100 broke 2d functionality (untested?)

GitHub first-related

#76823 broke internal tests
#72710 breaks internal builds by introducing unused capture
#76711 breaks internal builds (as any autogen changes currently do), see D36250347
#73434 breaks backward compatibility of torchaudio models (internally discovered)
#72935 and dependent changes #73803, #73804, #73806 break internal builds
#76812 REVERTED A SECOND TIME for being a dependent change on #72935

@janeyx99
Contributor

5/9 – 5/16

No PR signal

  1. [PyTorch] Record Sequence Number to Match Forward and Backward Operators #78795 on behalf of https://github.com/janeyx99 due to Broke profiler tests https://hud.pytorch.org/pytorch/pytorch/commit/a299a2fa262ba26a2aa3519d2b6d27a33e82f580
  2. [ci] remove IS_GHA env var #79219 on behalf of https://github.com/malfet due to Broke binary jobs see https://hud.pytorch.org/pytorch/pytorch/commit/1a2d95c68a11dd95b4699e7a74d679d8b9a5fb5a
  3. kl_div: fix for grads wrt target, double backward, forward-over-reverse AD support. #79007 on behalf of https://github.com/janeyx99 due to Broke test_fn_fwgrad_bwgrad_nn_functional_kl_div_cpu_float64 on trunk https://hud.pytorch.org/minihud?name_filter=pull%20/%20linux-xenial-py3.7-clang7-asan%20/%20test%20(default,%202,%205,%20linux.2xlarge)
  4. Support both train / eval modes for ModuleInfo #78735 on behalf of https://github.com/malfet due to Broke eval tests on Win, 10.2 and ROCM, see https://hud.pytorch.org/pytorch/pytorch/commit/12658fcd5bdf4d2437754633b3fa39ab15d213b9
  5. Port index.Tensor to structured kernels. #69607 on behalf of https://github.com/zengk95 due to This is breaking mac trunk tests https://hud.pytorch.org/pytorch/pytorch/commit/cfd84125bdb841f0efe038988bb87d645f419338
  6. [Profiler] Move python tracing to unified event type (Part 2) #78164 on behalf of https://github.com/malfet due to Broke cuda-on-cpu tests, see https://hud.pytorch.org/pytorch/pytorch/commit/c2a3c8186c3f3798684cecd60d62a991c223eeef
  7. Test torch._refs with aten and nvfuser executors #78926 on behalf of https://github.com/malfet due to breaks rocms, see https://hud.pytorch.org/pytorch/pytorch/commit/d4eebca7bc14b28688914577f690b68313cb846f
  8. Wrote stubbed out test cases for isGreen function to verify if a commit SHA is promote-able #78932 on behalf of https://github.com/janeyx99 due to Broke ROCm tests when introducing rockset requirement https://hud.pytorch.org/minihud#fc3a5d81171ba465e3d62ffbda9713b53039b5f0

Ignored signal

  1. turn on -Werror=unused-function in our Bazel CPU build #79154 on behalf of https://github.com/malfet due to Breaks bazel build: https://hud.pytorch.org/pytorch/pytorch/commit/67d313a03259be4da7a1d623a9df6791e02248e8
  2. [JIT] Add mutation checks for tensor inputs #79078 on behalf of https://github.com/davidberard98 due to broke bazel build-and-test, see https://github.com/pytorch/pytorch/runs/6836001002?check_suite_focus=true
  3. fix _unsafe_view schema to work with functionalization #79148 on behalf of https://github.com/janeyx99 due to Broke 11.3 tests on trunk and on PR, see https://hud.pytorch.org/pytorch/pytorch/commit/46234df5f12e62b891be4ef4574bfa5380c0ad21
  4. Added {logical_not, trace} refs, moved logical ops to use method overloads #79000 on behalf of https://github.com/malfet due to Introduces test failure, see https://hud.pytorch.org/pr/79000
  5. Added kl_div_backward decomp #79001 on behalf of https://github.com/malfet due to PR failed in newly added tests, see https://hud.pytorch.org/pr/79001
  6. formatted _decomp folder with black #79002 on behalf of https://github.com/janeyx99 due to Broke decomp tests on trunk + also on PR https://hud.pytorch.org/minihud#4945c72151e29cb524974e1714654cf790ddb37d
  7. add some instructions for ios test #79097 on behalf of https://github.com/kit1980 due to ios tests failed on both PR and trunk, see https://hud.pytorch.org/pytorch/pytorch/commit/e1534e3fe7ac48203fb13e3d62c6389c9a52212d
  8. add non-kwarg device and _like constructors #78536 on behalf of https://github.com/janeyx99 due to Broke meta tests on trunk and on PR https://github.com/pytorch/pytorch/runs/6765692797?check_suite_focus=true

Land races

  1. Fixes maybe_broadcast to actually broadcast only when needed #79298 Broke FakeTensor tests on master, see: https://hud.pytorch.org/pytorch/pytorch/commit/1cb1c2c08cf8d4ce62d3d861b931df2e579872c0
  2. [nvfuser_upstream_push] nvfuser code base bump 060822 #79147 Broke 11.3 builds on trunk https://hud.pytorch.org/pytorch/pytorch/commit/49c41b87a28848655d7fe8168e9a717d53508609
  3. Revert "Revert "Added {logical_not, trace} refs, moved logical ops to use method overloads"" #79224 on behalf of https://github.com/suo due to broke lots of things https://hud.pytorch.org/pytorch/pytorch/commit/a2d2981e8e5e128d4e642494bbb5e037ae9933ff
  4. [reland] Support multi-dimensional lengths in segment_reduce to support pytorch_scatter.segment_* functionalities (CUDA) #77061 on behalf of https://github.com/janeyx99 due to Broke segment_reduce tests on trunk, e.g., https://hud.pytorch.org/pytorch/pytorch/commit/40f7ef1f3db9717d8149a0bd1e8b8c80c8600753
  5. Make ShufflerDataPipe deterministic for persistent DL and distributed DL #78765 on behalf of https://github.com/janeyx99 due to broke lint on trunk

Weird

  1. [mergebot] Make merge on green default behavior #79199 on behalf of https://github.com/zengk95 due to messed up on-mandatory which is a functional issue
  2. [hot fix] Disable MPS tests as machines are down #79119 on behalf of https://github.com/malfet due to Runners are back online
  3. Default on green #78811 on behalf of https://github.com/zengk95 due to This does not have force in there

GitHub first-related

  1. [JIT] Propagate profiled information to DifferentiableGraph outputs #78875 due to Internal failures were bisected to this change
  2. [mobile] Fix lightweight dispatch OOM error by introducing selective build #78983 on behalf of https://github.com/osalpekar due to broke internal mobile tests
  3. moved logit to use torch ops instead of refs + added a couple more decompositions #78984 on behalf of https://github.com/osalpekar due to broke some jobs, like meta functorch builds

@zengk95
Contributor
zengk95 commented Jun 22, 2022

6/13 - 6/20

No PR Signal

#74813 on behalf of https://github.com/janeyx99 due to Broke slow tests in cuda 10.2 https://github.com/pytorch/pytorch/runs/6944238177?check_suite_focus=true

#77591 on behalf of https://github.com/zengk95 due to this is breaking linux slow test on trunk

#79470 on behalf of https://github.com/zengk95 due to typo in workflow

#79548 on behalf of https://github.com/malfet due to Broke on trunk, see https://hud.pytorch.org/pytorch/pytorch/commit/e479daed78b80179466c7233203c98a3fc0fc117

#79465 on behalf of https://github.com/zengk95 due to this broke X linux-xenial-py3.7-clang7-onnx / test (default, 1, 2, linux.2xlarge). Not sure why since it passed on pull.

Ignored signal

#79318 on behalf of https://github.com/janeyx99 due to Broke dropout tests on trunk, also errors on PR

Land races

Weird

#78257 on behalf of https://github.com/malfet due to This breaks executorch

#79617 on behalf of https://github.com/zengk95 due to this is breaking periodic jobs (and maybe pull) on trunk

#78135 on behalf of https://github.com/ezyang due to broke torchvision tests

#79494 on behalf of https://github.com/ezyang due to conflicts with earlier diff that needs revert

#78473 on behalf of https://github.com/malfet due to Seems to broke Mac tests, see https://hud.pytorch.org/pytorch/pytorch/commit/a9f6a35

GitHub First

#79655 on behalf of https://github.com/facebook-github-bot due to Diff internally

#79281 on behalf of https://github.com/bigfootjon due to Diff internally

#79370 on behalf of https://github.com/facebook-github-bot due to Diff internally

#79397 on behalf of https://github.com/facebook-github-bot due to Diff internally

#79443 on behalf of https://github.com/facebook-github-bot due to Diff internally

#79034 on behalf of https://github.com/facebook-github-bot due to Diff internally

#78907 on behalf of https://github.com/osalpekar due to Caused Typecasting errors in PT Distributed and fx2trt builds internally
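The entries above follow a regular "#PR on behalf of USER due to REASON" shape, so they could be parsed mechanically before being bucketed. A minimal sketch, assuming that exact wording; the `parse_entry` helper and the `ENTRY` pattern are hypothetical, and the sample lines are copied from the lists above.

```python
import re

# `#<pr> on behalf of https://github.com/<user> due to <reason>`
ENTRY = re.compile(
    r"^#(?P<pr>\d+) on behalf of https://github\.com/(?P<user>[\w-]+) "
    r"due to (?P<reason>.+)$"
)


def parse_entry(line):
    """Split one revert entry into PR number, reverting user, and reason."""
    match = ENTRY.match(line.strip())
    if match is None:
        return None  # The line does not follow the expected wording.
    return {
        "pr": int(match.group("pr")),
        "user": match.group("user"),
        "reason": match.group("reason"),
    }


entry = parse_entry(
    "#79318 on behalf of https://github.com/janeyx99 due to "
    "Broke dropout tests on trunk, also errors on PR"
)
bot_entry = parse_entry(
    "#79655 on behalf of https://github.com/facebook-github-bot due to Diff internally"
)
```

Entries reverted on behalf of `facebook-github-bot` could then be pre-sorted as GitHub First candidates, leaving the rest for manual triage.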

@ZainRizvi ZainRizvi assigned ZainRizvi and unassigned ZainRizvi Jul 19, 2022
@janeyx99
Contributor

7/15 - 7/22

No Signal

  1. [Profiler] Use parent time for implicitly finished Torch ops #80810 for breaking iOS builds, see https://github.com/pytorch/pytorch/runs/7372398950?check_suite_focus=true
  2. [Profiler] Move Kineto activity generation into collection.cpp #80796 Broke OSS iOS builds, see https://hud.pytorch.org/pytorch/pytorch/commit/24e6b60be29d2e53b967809067fff392c0591e53
  3. Adding fsdp fp16 and bf16 hooks #80557 for breaking distributed tests on trunk
  4. Adding fsdp fp16 and bf16 hooks #80557 for breaking dist tests
  5. Enable reentrant dispatch for decompositions #81598 for causing out-of-tree failures
  6. FIX make sure we import the correct object from multiprocessing #53282 broke 10.2 tests on trunk
  7. [complex] conv_transpose1d #79694 broke slow tests
  8. Add python stack tracing option on on-demand flow #80919 broke buck build/test https://hud.pytorch.org/pytorch/pytorch/commit/f50a248a5eacb9a9aa475a9e610486aea136e4f5
  9. Support non-standard bools in CUDA mode #79393 for breaking 10.2 build

GH1

  1. Call lift_fresh after scalar_to_tensor in composite derivative formulas #81609 for breaking internal builds
  2. [fix] allow saving python attr on Tensor and Parameter via torch.save #81616 for breaking internal builds
  3. Enabling SymInt in autograd; take 3 #81145 for breaking internal builds
  4. Add should_traverse_fn to torch.fx.node.map_aggregate #81510 reverted internally by facebook-github-bot
  5. https://api.github.com/repos/pytorch/pytorch/issues/81441 reverted internally by facebook-github-bot
  6. [Py3.10] Allow floats to be imported as Long #81372 for breaking internal build
  7. [CI] Move CUDA-11.6 to Python-3.10 configuration #81233 companion to [Py3.10] Allow floats to be imported as Long #81372
  8. Recursively print graph module and its submodule #81080 for breaking internal builds
  9. [complex] conv_transpose1d #79694 for breaking internal builds
  10. [FX] Fix PyTree unpacking carrying forward type annotations #81906 for breaking internal builds
  11. Refactored prim utils into _prims_utils folder #81088 for breaking internal builds
  12. Implement mul(dense, sparse), mul(sparse, dense) for sparse COO tensors. #81556 for breaking internal builds
  13. [Reland] Add should_traverse_fn to torch.fx.node.map_aggregate #81695 by facebook-github-bot

Weird

  1. Add more functorch shards to PR CI #81919 since test skips were not working
  2. Disable use_mkldnn when input is not contiguous for oneDNN #80864 reverted due to a perf regression, see Performance Signal Detected by TorchBench CI on '1.13.0.dev20220718+cu113' benchmark#1040
  3. [ROCm] Temporarily disabling ROCm CI job #81646 restoring ROCm jobs

@ZainRizvi
Contributor

Closing since we no longer use this tracker

9 participants