Github outage resulting in jobs failing at checkout · Issue #155829 · pytorch/pytorch · GitHub
Github outage resulting in jobs failing at checkout #155829
Closed
@clee2000

Description

NOTE: Remember to label this issue with "ci: sev"

See https://www.githubstatus.com/incidents/d9xd9k1j6sl0

At the time of creating this issue, I think the effects of the incident are no longer occurring, but I'm creating this issue anyway in case we need follow-ups.

Current Status

Status could be: preemptive, ongoing, mitigated, closed. Also tell people if they need to take action to fix it (e.g., rebase).
closed?

Error looks like

https://github.com/pytorch/pytorch/actions/runs/15618165793/job/43996868331

GH jobs failing at checkout step

  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/python-peachpy'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/eigen'...
  Error: fatal: remote error: GitLab is currently unable to handle this request due to load (ID 01JXJQ2JT2E5G1N9QH5WW0NGSE).
  Error: fatal: clone of 'https://gitlab.com/libeigen/eigen.git' into submodule path '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/eigen' failed
  Failed to clone 'third_party/eigen' a second time, aborting
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/kineto'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/flatbuffers'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/tensorpipe'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/sleef'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/pybind11'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/fbgemm'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/cutlass'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/onnx'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/composable_kernel'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/opentelemetry-cpp'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/nlohmann'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/protobuf'...
  Cloning into '/home/ec2-user/actions-runner/_work/pytorch/pytorch/third_party/XNNPACK'...
  Error: The process '/usr/bin/git' failed with exit code 1
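When triaging a checkout failure like the one above, it can help to pull out which submodule clone failed and which remote host it came from, since the log interleaves many successful "Cloning into" lines with the one real error. The sketch below is a hypothetical helper, not part of PyTorch's CI tooling. It assumes only the two git message formats visible in the log above ("fatal: remote error: ..." and "fatal: clone of '<url>' into submodule path '<path>' failed"); other git versions or failure modes may word things differently.

```python
import re
from typing import List, Tuple
from urllib.parse import urlparse

# Matches git's submodule clone failure line, as seen in the checkout log:
#   fatal: clone of '<url>' into submodule path '<path>' failed
CLONE_FAILED = re.compile(
    r"fatal: clone of '(?P<url>[^']+)' into submodule path '(?P<path>[^']+)' failed"
)

# Matches the message a remote sent back (e.g. a "due to load" error).
# '.' does not match newlines, so this captures to the end of the line.
REMOTE_ERROR = re.compile(r"fatal: remote error: (?P<msg>.+)")


def failed_submodules(log: str) -> List[Tuple[str, str]]:
    """Return (remote_url, submodule_path) for every failed submodule clone."""
    return [(m.group("url"), m.group("path")) for m in CLONE_FAILED.finditer(log)]


def remote_errors(log: str) -> List[str]:
    """Return the error messages reported by remote servers."""
    return [m.group("msg").strip() for m in REMOTE_ERROR.finditer(log)]


def failing_hosts(log: str) -> List[str]:
    """Return the sorted, de-duplicated hosts whose submodule clones failed."""
    return sorted({urlparse(url).netloc for url, _ in failed_submodules(log)})
```

Run against the excerpt above, `failed_submodules` would report only `third_party/eigen` (cloned from `https://gitlab.com/libeigen/eigen.git`), and `failing_hosts` would report `gitlab.com`.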

Incident timeline (all times Pacific)

Include when the incident began, when it was detected, mitigated, root caused, and finally closed.

Reported by nikita at 11:36 AM

Not sure which GH job failed first, but see:
https://hud.pytorch.org/hud/pytorch/pytorch/7986c0dba6e1044d90b7f607f9cca15922339bb4/1?per_page=100&mergeEphemeralLF=true

User impact

How does this affect users of PyTorch CI?
Jobs fail at the checkout step

Root cause

What was the root cause of this issue?
GitHub incident (see the GitHub status page linked above)

Mitigation

How did we mitigate the issue?

Prevention/followups

How do we prevent issues like this in the future?

Metadata

Assignees

No one assigned

    Labels

    ci: sev (critical failure affecting PyTorch CI)
