Implement MKLGenerator by michalowski-arm · Pull Request #151218 · pytorch/pytorch · GitHub

Implement MKLGenerator #151218


Open · wants to merge 8 commits into main
Conversation

michalowski-arm
Contributor
@michalowski-arm michalowski-arm commented Apr 14, 2025

This PR aims to fix the issue from #132395 by implementing a new MKLGeneratorImpl that stores a consistent, global vslStream for use in random number generation. This path was previously disabled due to a problem of repeating variates, caused by repeated reseeding of the MKL generator with variates from the CPUGenerator. This new implementation seeds the MKLGenerator only once using the CPUGenerator, and then keeps reusing the same vslStream, providing the full period of the RNG.

For the sake of reproducibility, the saving and restoring of the MKLGenerator has been linked to CPUGenerator state changes, and the former does not provide its own get_state() and set_state() functionality. The point was to keep the user experience identical to before -- they do not need to handle a separate MKLGenerator explicitly.

There already exists a test to check for repetition, based on the script from #132395. It can be found in test_distributions.py as test_multinomial_sequential_draw(). For the old (reseeded) implementation of the MKL vslStream, this test showed 21 repetitions. With this new implementation, the test gives 0 repetitions, as expected.
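The seed-once design can be illustrated with a toy Python sketch (random.Random stands in for both generators; ToyMKLGenerator is an illustrative name, not the actual C++ class):

```python
import random

class ToyMKLGenerator:
    """Toy stand-in for the new MKLGeneratorImpl: seeded exactly once
    from a parent generator, then the same stream is reused forever."""

    def __init__(self, parent: random.Random):
        # Seed once from the parent (the CPUGenerator in the PR) instead
        # of reseeding on every sampling call, which caused the repeats.
        self._stream = random.Random(parent.getrandbits(64))

    def draw(self, n: int) -> list:
        # Every call continues the same stream, so the full period of
        # the underlying RNG is available.
        return [self._stream.random() for _ in range(n)]

parent = random.Random(0)
gen = ToyMKLGenerator(parent)
first, second = gen.draw(4), gen.draw(4)
assert first != second  # the stream advances rather than restarting
```

Reproducibility is preserved because the stream is derived deterministically from the parent's seed.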

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames

@pytorch-bot pytorch-bot bot added the module: cpu CPU specific problem (e.g., perf, algorithm) label Apr 14, 2025
pytorch-bot bot commented Apr 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/151218

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit c91e817 with merge base cd995bf:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@michalowski-arm
Contributor Author

@michalowski-arm michalowski-arm changed the title from "[Draft] Implement MKLGenerator" to "Implement MKLGenerator" Apr 14, 2025
@michalowski-arm
Contributor Author

@pytorchbot label "topic: not user facing"

@nikhil-arm
Collaborator

@pytorchbot label "ciflow/linux-aarch64"

@pytorch-bot pytorch-bot bot added the ciflow/linux-aarch64 linux aarch64 CI workflow label Apr 14, 2025
@nikhil-arm nikhil-arm requested a review from digantdesai April 14, 2025 11:17
pytorch-bot bot commented Apr 14, 2025

To add the ciflow label ciflow/linux-aarch64 please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@pytorch-bot pytorch-bot bot removed the ciflow/linux-aarch64 linux aarch64 CI workflow label Apr 14, 2025
@nikhil-arm nikhil-arm requested a review from malfet April 14, 2025 11:17
@nikhil-arm
Collaborator

@pytorchbot label "ciflow/linux-aarch64"

@pytorch-bot pytorch-bot bot added the ciflow/linux-aarch64 linux aarch64 CI workflow label Apr 14, 2025
{
// See Note [Acquire lock when using random generators]
std::lock_guard<std::mutex> lock(mklGenerator->mutex_);
mklGenerator->get_stream_copy(stream);
Collaborator

Isn't get_stream_copy thread-safe?
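For context, the convention behind PyTorch's Note [Acquire lock when using random generators] is that generator methods are not internally synchronized; the caller holds mutex_, as the lock_guard in the snippet above does. A minimal Python sketch of that caller-locks pattern (names are illustrative):

```python
import threading

class ToyGenerator:
    """Sketch of the caller-locks convention: methods do no locking of
    their own; callers are expected to hold mutex_ around calls."""

    def __init__(self) -> None:
        self.mutex_ = threading.Lock()
        self._state = [1, 2, 3]

    def get_stream_copy(self):
        # Not thread-safe by itself -- mirrors the reviewed snippet,
        # where the caller wraps this call in a lock_guard on mutex_.
        return list(self._state)

g = ToyGenerator()
with g.mutex_:                  # caller acquires the lock
    snapshot = g.get_stream_copy()
```

Keeping the lock at the call site lets one critical section cover several generator operations at once.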

@malfet
Contributor
malfet commented Apr 15, 2025

Those failures cannot be a coincidence, can they?

@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Apr 16, 2025
@leslie-fang-intel
Collaborator
leslie-fang-intel commented Apr 17, 2025

Hi @michalowski-arm @CaoE, the CI failure seems related. Could you take a look?

@CaoE
Collaborator
CaoE commented Apr 17, 2025

@leslie-fang-intel Of course; I'm glad to help move this PR forward.

@michalowski-arm
Contributor Author

At least some of the failures are due to not handling torch.manual_seed appropriately. It resets the CPUGenerator to the beginning of the input seed's sequence (and the tests expect that reset), but the MKLGenerator is unaffected and just continues its old sequence.
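A toy sketch of the coupling this implies (hypothetical names; the real change couples the C++ generators): manual_seed must reset the MKL-style stream along with the CPU generator:

```python
import random

class CoupledGenerators:
    """Toy model: resetting the CPU generator re-derives the MKL-style
    stream, so both restart together under manual_seed."""

    def __init__(self, seed: int) -> None:
        self.manual_seed(seed)

    def manual_seed(self, seed: int) -> None:
        self.cpu = random.Random(seed)
        # Without this re-derivation, the MKL stream would keep running
        # its old sequence after a reseed -- the failure described above.
        self.mkl = random.Random(self.cpu.getrandbits(64))

g = CoupledGenerators(42)
first = g.mkl.random()
g.mkl.random()                  # advance the MKL stream
g.manual_seed(42)               # reset: both generators restart
assert g.mkl.random() == first
```

In the toy model, omitting the re-derivation in manual_seed reproduces the failing behavior: the stream keeps advancing past the reset point.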

@pytorch-bot pytorch-bot bot removed the ciflow/linux-aarch64 linux aarch64 CI workflow label Apr 17, 2025
@michalowski-arm
Contributor Author

@pytorchbot label "ciflow/linux-aarch64"

Copy link
pytorch-bot bot commented Apr 17, 2025

To add these label(s) (ciflow/linux-aarch64) to the PR, please first approve the workflows that are awaiting approval (scroll to the bottom of this page).

This helps ensure we don't trigger CI on this PR until it is actually authorized to do so. Please ping one of the reviewers if you do not have access to approve and run workflows.

@CaoE
Collaborator
CaoE commented Apr 21, 2025

There are some errors in test_distributions.py, for example test_cdf_icdf_inverse. My local tests show that using VSL_BRNG_MT19937 can pass the test. There may be other tests that will fail and can be made to pass by modifying the sample length.

@fadara01
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Successfully rebased mklrng-base onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout mklrng-base && git pull --rebase)


@fadara01
Collaborator
fadara01 commented May 8, 2025

@michalowski-arm can we try reproducing this locally on an x86 machine?
It might give us a longer stack trace or more insights about the failure.

You can pull the docker image from pytorch/pytorch-linux-jammy-py3-clang15-asan as done here, then download the build and test artifacts for linux-jammy-clang15-asan from https://hud.pytorch.org/pr/151218 and follow the "Test" steps in the pipeline.

@michalowski-arm
Contributor Author


Yes, I've been looking into it.

@CaoE
Collaborator
CaoE commented May 23, 2025

@michalowski-arm I downloaded the image locally (docker pull ghcr.io/pytorch/ci-image:pytorch-linux-jammy-py3-clang15-asan-90e2f0da05deb1b09040f7c1ee78ad0f34911877) and reproduced the Intel MKL function load error: __vslNewStreamEx, as done here.
I found that the MKL 2021.4.0 used in the conda environment py_3.10 of this image is relatively old. I manually deleted the old MKL, ran pip install mkl-static mkl-include, and then found that the issue was gone.

It seems that CI requires an upgraded version of MKL.

@fadara01
Collaborator

@CaoE thank you for your insights!

I found that the MKL 2021.4.0 used in the conda environment py_3.10

Do you have a preference as to which version we should update to?

@CaoE
Collaborator
CaoE commented May 23, 2025

@fadara01 @abullabib I tried MKL 2025.1 and 2024.2 (used in torch release 2.7), and they can fix the error. Let's see the result of #154198.

@CaoE CaoE mentioned this pull request May 23, 2025
@CaoE
Collaborator
CaoE commented May 28, 2025

@malfet I have tried to upgrade MKL to 2024.2.0 in CI #154198, but many "Inconsistent configuration parameters" errors appear. The new MKL has made some changes to DFTI. The MKL version in PyTorch releases is inconsistent with the MKL used by CI, which is why these errors were not triggered in CI before.
The torch FFT ops seem to need fixing for these "Inconsistent configuration parameters" issues (#154477).

@CaoE
Collaborator
CaoE commented May 28, 2025

I added the fix in this PR #154198.

@nikhil-arm
Collaborator

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/151218/head returned non-zero exit code 1

Rebasing (1/7)
Rebasing (2/7)
Rebasing (3/7)
Rebasing (4/7)
Rebasing (5/7)
Rebasing (6/7)
Rebasing (7/7)
Auto-merging benchmarks/dynamo/pr_time_benchmarks/expected_results.csv
CONFLICT (content): Merge conflict in benchmarks/dynamo/pr_time_benchmarks/expected_results.csv
error: could not apply 4c664feeebd... Update expected benchmark results
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 4c664feeebd... Update expected benchmark results

Raised by https://github.com/pytorch/pytorch/actions/runs/15319034371

@michalowski-arm
Contributor Author

@pytorchbot rebase

@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/151218/head returned non-zero exit code 1

Rebasing (1/9)
Rebasing (2/9)
Rebasing (3/9)
Rebasing (4/9)
Rebasing (5/9)
Rebasing (6/9)
Rebasing (7/9)
Auto-merging benchmarks/dynamo/pr_time_benchmarks/expected_results.csv
CONFLICT (content): Merge conflict in benchmarks/dynamo/pr_time_benchmarks/expected_results.csv
error: could not apply 4c664feeebd... Update expected benchmark results
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply 4c664feeebd... Update expected benchmark results

Raised by https://github.com/pytorch/pytorch/actions/runs/16070267154

@michalowski-arm
Contributor Author

@nikhil-arm @fadara01 @agrawal-aka @malfet @CaoE The PR now passes all checks; could I get a review?

@nikhil-arm
Collaborator

CI is passing and review comments are addressed. The one CI failure seems to be unrelated.
Ready for merge.
