Tags: gilfree/pytorch

Tags

ciflow/trunk/78693

Update on "Add tests for SyncBatchNorm with CUDA graph capturing"

[ghstack-poisoned]

ciflow/trunk/78692

Update nccl to v2.12.12-1

[ghstack-poisoned]

ciflow/trunk/78690

Merge pull request pytorch#19 from kulinseth/denis/cpu_to_mps_view_copy

Fix view copies for cpu->mps case; add testcase

ciflow/trunk/78683

Remove mentions of deleted TH and friends (pytorch#78683)

Summary: Pull Request resolved: pytorch#78683

Test Plan: No-op, rely on CI.

Reviewed By: dagitses

Differential Revision: D36829491

fbshipit-source-id: a57c07ad239f55e0096424e1517a8b5b38798fd5

ciflow/trunk/78666

Update on "Avoid CPU Sync in SyncBatchNorm When Capturing CUDA Graphs"

We recently updated `SyncBatchNorm` to support empty input batches.
The new code removes stats from ranks with empty inputs. However,
this change breaks CUDA graph capture as it forces CPU sync. This
commit uses `is_current_stream_capturing()` to guard the new code
path, and only runs the new code when not capturing CUDA graphs. To
support empty inputs with CUDA graph capturing, we might need to
update CUDA kernels for `batch_norm_backward_elemt` and
`batch_norm_gather_stats_with_counts`. See pytorch#78656.
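
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `drop_empty_rank_stats` and its parameters are hypothetical names, and plain Python lists stand in for the gathered per-rank tensors. In PyTorch, the `capturing` flag would come from `torch.cuda.is_current_stream_capturing()`.

```python
from typing import List, Tuple


def drop_empty_rank_stats(
    means: List[float],
    invstds: List[float],
    counts: List[int],
    capturing: bool,
) -> Tuple[List[float], List[float], List[int]]:
    """Drop per-rank stats whose element count is zero (hypothetical helper).

    `capturing` is assumed to be the result of
    torch.cuda.is_current_stream_capturing() in real code.
    """
    if capturing:
        # Filtering by count makes the output size depend on the data,
        # which on a GPU needs a host-side sync. That is not allowed
        # during CUDA graph capture, so the filtering is skipped.
        return means, invstds, counts

    # Not capturing: keep only the ranks that contributed at least one element.
    keep = [i for i, c in enumerate(counts) if c >= 1]
    return (
        [means[i] for i in keep],
        [invstds[i] for i in keep],
        [counts[i] for i in keep],
    )
```

With `capturing=True` the stats pass through unchanged, which is why empty inputs are not yet handled during graph capture.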

Fixes pytorch#78549

Differential Revision: [D36826558](https://our.internmc.facebook.com/intern/diff/D36826558)

[ghstack-poisoned]

ciflow/trunk/78345

[1] move simple pytorch buck targets to the shared build file (part 1) (pytorch#78345)

Summary:
Pull Request resolved: pytorch#78345

This diff moves a few BUCK targets (mostly headers) to the shared build file, which will be used for both the internal and OSS BUCK builds.

Test Plan: sandcastle, OSS BUCK CI

Differential Revision: D36694963

fbshipit-source-id: 99d53dfbfea76d6f301964aa2cc2913a625a6536

ciflow/trunk/78225

Merge remote-tracking branch 'upstream/master' into allow-cuda-extension-rdc

ciflow/all/78080

Add onlyNativeDeviceTypes to roll's OpInfo because XLA is failing

ciflow/trunk/78672

Fuse matmul in row-wise sharded linear to have a single matmul.

Performing a single large matmul is more efficient than having to
perform multiple matmuls in a loop.
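
The general idea can be sketched as below. This is a toy illustration, not the code from the commit: it assumes every input chunk in the loop is multiplied by the same local weight shard, and `matmul`, `looped_matmuls`, and `fused_matmul` are hypothetical helpers over nested lists rather than tensors.

```python
from typing import List

Matrix = List[List[int]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain row-by-column matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def looped_matmuls(chunks: List[Matrix], weight: Matrix) -> List[Matrix]:
    """One matmul per input chunk, as in the loop-based approach."""
    return [matmul(chunk, weight) for chunk in chunks]


def fused_matmul(chunks: List[Matrix], weight: Matrix) -> List[Matrix]:
    """Stack all chunks, do a single matmul, then split the rows back."""
    stacked = [row for chunk in chunks for row in chunk]
    out = matmul(stacked, weight)

    results, start = [], 0
    for chunk in chunks:
        results.append(out[start:start + len(chunk)])
        start += len(chunk)
    return results
```

Both functions return the same per-chunk results. The fused version issues one large product instead of several small ones, which is where the efficiency gain would come from on real hardware.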

Similar improvement to pytorch#78449

Differential Revision: [D36828505](https://our.internmc.facebook.com/intern/diff/D36828505/)

[ghstack-poisoned]

ciflow/trunk/78665

fix for rocm
