Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ by IvanYashchuk · Pull Request #68812 · pytorch/pytorch

Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ #68812

Closed
wants to merge 13 commits into from

@IvanYashchuk
Collaborator
IvanYashchuk commented Nov 23, 2021

Fixes #67693.

Reference LAPACK (used in OpenBLAS) changed the info error code for SVD when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative info error codes because they usually indicate a wrong implementation. However, this is no longer the case for SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns the OpenBLAS and MKL behavior in our code.

UPDATE:
MKL 2022 uses the latest reference LAPACK behavior and returns the same info as OpenBLAS 0.3.15+.
This PR also fixes #71645, which is caused by the updated MKL version in CI.
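
The `info` convention described above can be sketched as follows. This is a minimal illustration and not the PR's code: `InfoKind`, `classify_info`, and `normalize_svd_info` are hypothetical names, and mapping -4 to +4 is an assumption (the PR only says the value is set to a positive number).

```cpp
#include <cassert>

// Hypothetical names for illustration; none of these exist in PyTorch.
enum class InfoKind { Success, IllegalArgument, NumericalFailure };

// LAPACK convention: info == 0 is success, info < 0 flags an illegal
// argument (normally a caller bug), info > 0 reports a numerical failure.
inline InfoKind classify_info(int info) {
  if (info == 0) return InfoKind::Success;
  return info < 0 ? InfoKind::IllegalArgument : InfoKind::NumericalFailure;
}

// Newer reference LAPACK reports non-finite SVD input as info == -4.
// Assumption: flip it to +4 so that it takes the user-facing error path
// instead of the internal assert.
inline int normalize_svd_info(int info) {
  return info == -4 ? 4 : info;
}
```

With this remapping, `classify_info(normalize_svd_info(-4))` yields `NumericalFailure`, while every other negative code is still treated as an illegal argument.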

@facebook-github-bot
Contributor
facebook-github-bot commented Nov 23, 2021

💊 CI failures summary and remediations

As of commit 2ddf4ed (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See GitHub Actions build linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (1/1)

Step: "Test"

2022-02-03T08:02:13.8032975Z processing existing schema:  text(__torch__.torch.classes.profiling.SourceRef _0) -> (str _0)
2022-02-03T08:02:13.8034034Z processing existing schema:  count(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-02-03T08:02:13.8035246Z processing existing schema:  duration_ns(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-02-03T08:02:13.8036568Z processing existing schema:  source(__torch__.torch.classes.profiling.SourceStats _0) -> (__torch__.torch.classes.profiling.SourceRef _0)
2022-02-03T08:02:13.8038332Z processing existing schema:  line_map(__torch__.torch.classes.profiling.SourceStats _0) -> (Dict(int, __torch__.torch.classes.profiling.InstructionStats) _0)
2022-02-03T08:02:13.8039043Z processing existing schema:  __init__(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-02-03T08:02:13.8040588Z processing existing schema:  enable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-02-03T08:02:13.8041242Z processing existing schema:  disable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-02-03T08:02:13.8043334Z processing existing schema:  _dump_stats(__torch__.torch.classes.profiling._ScriptProfile _0) -> (__torch__.torch.classes.profiling.SourceStats[] _0)
2022-02-03T08:02:13.8058449Z processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (NoneType _0)
2022-02-03T08:02:13.8058899Z The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 
2022-02-03T08:02:13.8058921Z 
2022-02-03T08:02:13.8059047Z Broken ops: [
2022-02-03T08:02:13.8059305Z 	aten::bilinear(Tensor input1, Tensor input2, Tensor weight, Tensor? bias=None) -> (Tensor)
2022-02-03T08:02:13.8059366Z ]
2022-02-03T08:02:13.8846016Z + cleanup
2022-02-03T08:02:13.8846144Z + retcode=1
2022-02-03T08:02:13.8847220Z + set +x
2022-02-03T08:02:13.8885324Z ##[error]Process completed with exit code 1.
2022-02-03T08:02:13.8911992Z ##[group]Run # Ensure the working directory gets chowned back to the current user


// Making the behaviour different from MKL
// Here we check for the case where `info` is -4 and set it to a positive number
// This will give the same error message as with MKL
#if AT_BUILD_WITH_OPENBLAS()
Collaborator

Should this be unconditional?

According to the LAPACK documentation in 3.10, this is expected not only for OpenBLAS, but for all LAPACK-compliant backends. See
https://www.netlib.org/lapack/explore-html/d3/da8/group__complex16_g_esing_gaccb06ed106ce18814ad7069dcb43aa27.html

Collaborator Author

PyTorch's build system doesn't allow mixing different implementations of BLAS and LAPACK. Both must be from OpenBLAS, or MKL, or Accelerate/vecLib, ... .
MKL 2021.4 is not compliant with the latest Reference LAPACK implementation.
https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/lapack-routines/lapack-least-squares-and-eigenvalue-problem/lapack-least-squares-eigenvalue-problem-driver/singular-value-decomposition-lapack-driver/gesdd.html?wapkw=gesdd?wapkw=gesdd
There is no documentation for Apple's Accelerate/vecLib.

Only two BLAS/LAPACK implementations are really used for CPU computing: MKL and OpenBLAS. We can safely ignore other options until someone actually complains.

Other implementations allowed by pytorch/cmake/Modules/FindLAPACK.cmake:

  • GotoBLAS2 - not maintained anymore
  • FLAME - can't find documentation, but the code hasn't changed since 2014
  • AMD Core Math Library (ACML) - not maintained anymore

Switching the -4 value of info for any LAPACK implementation shouldn't hurt anything.
We could also specialize SVD's error for info = -4 to "Sorry, the input contained non-finite values and the computation cannot be completed", but we don't get the same info from MKL, so we couldn't raise the same error there.
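
The specialized message mentioned above could look roughly like the sketch below. `svd_error_message` is a hypothetical helper, and the two generic messages are placeholders rather than PyTorch's actual wording. As noted, a backend that reports non-finite input with a positive code (such as MKL 2021.4) would never reach the `-4` branch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper; not part of PyTorch.
inline std::string svd_error_message(int info) {
  if (info == 0) return "";  // success, nothing to report
  if (info == -4) {
    // Message text taken from the comment above.
    return "Sorry, the input contained non-finite values and the "
           "computation cannot be completed";
  }
  if (info < 0) {
    // Placeholder wording for the generic illegal-argument case.
    return "argument " + std::to_string(-info) + " had an illegal value";
  }
  // Placeholder wording for the generic numerical-failure case.
  return "the algorithm failed to converge (error code: " +
         std::to_string(info) + ")";
}
```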

@lezcano
Collaborator
lezcano left a comment

Fair enough then.

On a related but different topic: I still like the idea that, if we get an info > 0, we first check for NaNs rather than raising that obscure "wrong implementation" error. That would account for this case and, I suspect, for a number of other cases.
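
A minimal sketch of that idea, assuming the input is available as a flat buffer of doubles. `all_finite` and `explain_failure` are hypothetical names, and the returned strings are illustrative only.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <string>
#include <vector>

// Hypothetical helper: true if no element is NaN or infinite.
inline bool all_finite(const std::vector<double>& values) {
  for (double v : values) {
    if (!std::isfinite(v)) return false;
  }
  return true;
}

// Hypothetical error path: the input is scanned only after LAPACK has
// already reported a failure, so the success path pays nothing extra.
inline std::string explain_failure(int info, const std::vector<double>& input) {
  if (info == 0) return "ok";
  if (!all_finite(input)) return "input contains non-finite values";
  return "LAPACK error code " + std::to_string(info);
}
```

Because the check looks at the data and not at the sign of `info`, it gives the same answer whichever backend produced the code.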

@IvanYashchuk
Collaborator Author

@mruberry, could you please take a look and merge if the change looks reasonable to you?

@mruberry
Collaborator
mruberry left a comment

Cool!

Thanks @IvanYashchuk

@facebook-github-bot
Contributor

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry
Collaborator

Some internal builds are complaining with

stderr: aten/src/ATen/native/BatchLinearAlgebra.cpp:3006:5: error: invalid token at start of a preprocessor expression
#if AT_BUILD_WITH_OPENBLAS()

Not sure what's going on there

@dagitses
Collaborator

> Some internal builds are complaining with
>
> stderr: aten/src/ATen/native/BatchLinearAlgebra.cpp:3006:5: error: invalid token at start of a preprocessor expression
> #if AT_BUILD_WITH_OPENBLAS()
>
> Not sure what's going on there

This looks like an internal build issue: some internal build files will need to define the macro expansion, as this PR does in BUILD.bazel.
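
For context, a function-like macro used in `#if` must expand to an integer constant expression; if it is undefined or expands to something else, the directive fails to parse. A minimal sketch, in which the fallback `#define` and the value `1` are assumptions standing in for whatever the build configuration would generate, and `built_with_openblas` is a hypothetical function:

```cpp
#include <cassert>

// Assumption: a generated config header normally supplies this definition.
// The fallback value here is illustrative only.
#ifndef AT_BUILD_WITH_OPENBLAS
#define AT_BUILD_WITH_OPENBLAS() 1
#endif

// Hypothetical function that reports which branch the preprocessor selected.
inline bool built_with_openblas() {
#if AT_BUILD_WITH_OPENBLAS()
  return true;
#else
  return false;
#endif
}
```

Removing the `#define` while keeping the `#if AT_BUILD_WITH_OPENBLAS()` line makes compilation fail, which is the kind of breakage a build that lacks the definition would hit.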

@IvanYashchuk
Collaborator Author

@dagitses, @mruberry is there anything on my side I could help with to merge this PR?

@mruberry
Collaborator

> @dagitses, @mruberry is there anything on my side I could help with to merge this PR?

Not that I know of at this time, sorry. It's just going to require some work internally.

@IvanYashchuk
Collaborator Author

Latest MKL fails with the same error as OpenBLAS 0.3.15: #71645
I need to modify this PR to cover MKL as well.

@ngimel
Collaborator
ngimel commented Jan 23, 2022

@IvanYashchuk please ping me when you rebase and update PR.

@IvanYashchuk changed the title from "Fix SVD error code handling for OpenBLAS 0.3.15+" to "Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+" on Jan 27, 2022
@IvanYashchuk
Collaborator Author

Alright, ROCm build is now successful: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.5-py3.7-trigger/5021/ and all other CI jobs are green as well.

@IvanYashchuk
Collaborator Author

@mruberry, I modified this PR slightly so that there are no conflicts with #72125.

cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 3, 2022
Pull Request resolved: pytorch/pytorch#68812

Reviewed By: osalpekar

Differential Revision: D32626563

Pulled By: ngimel

fbshipit-source-id: 09042f07cdc9c24ce1fa5cd6f4483340c7b5b06c
(cherry picked from commit aadf507)
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 3, 2022
Pull Request resolved: pytorch/pytorch#68812

Reviewed By: mrshenli

Differential Revision: D33844257

Pulled By: ngimel

fbshipit-source-id: fd1c86e37e405b330633d039f49dce466391b66e
(cherry picked from commit c00a9bd)
@mruberry added this to the 1.11.0 milestone Feb 3, 2022
@IvanYashchuk
Collaborator Author

This PR fixes false triggers of TORCH_INTERNAL_ASSERT that suggest users file a bug report (see #71645). Therefore I think it's important to include this PR in the next 1.11 release so that users see a nicer error message.

@IvanYashchuk
Collaborator Author

@ngimel, could you please import this PR? Here is the HUD for the CI; the failed backward-compatibility tests are not related.

@ngimel
Collaborator
ngimel commented Feb 4, 2022

Can you please open a new PR? I can't reimport this one.

@facebook-github-bot
Contributor

This pull request has been reverted by bb6b501. To re-land this change, follow these steps.

facebook-github-bot pushed a commit that referenced this pull request Feb 7, 2022
#72357)

Summary:
This PR was opened as copy of #68812 by request #68812 (comment).

Pull Request resolved: #72357

Reviewed By: albanD

Differential Revision: D34012245

Pulled By: ngimel

fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310
pytorchmergebot pushed a commit that referenced this pull request Feb 7, 2022
#72357)

(cherry picked from commit fa29e65)
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Feb 8, 2022
pytorch#72357)

(cherry picked from commit fa29e65)
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 9, 2022
(cherry picked from commit aadf507)
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 9, 2022
(cherry picked from commit c00a9bd)
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Feb 9, 2022
…) (#72357)

(cherry picked from commit fa29e65611ea5028bf6d2d3c151d79e6c9e4ffef)
atalman pushed a commit that referenced this pull request Feb 9, 2022
#72357) (#72513)

(cherry picked from commit fa29e65)
Development

Successfully merging this pull request may close these issues.

INTERNAL ASSERT in svd_cpu
test_svd_errors_and_warnings tests fail on CPU
10 participants