Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ #68812

IvanYashchuk · 2021-11-23T13:32:46Z

Fixes pytorch#67693.

Reference LAPACK (used in OpenBLAS) changed the `info` error code that SVD returns when the input contains non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because a negative code usually indicates a wrong implementation. That is no longer the case for SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns the OpenBLAS and MKL behavior in our code.

UPDATE:
MKL 2022 uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+.
This PR also fixes #71645, which is due to the updated MKL version in CI.
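
As a rough illustration of the handling described above, the sketch below classifies the `info` value returned by an SVD driver. It is not the code from this PR: `SvdInfoKind`, `classify_svd_info`, and `check_svd_info` are hypothetical names, and the only assumption taken from the description is that newer reference LAPACK reports non-finite input with `info == -4`, while other negative values still point to a bad argument.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical outcome categories for this sketch; these are not PyTorch types.
enum class SvdInfoKind { Success, NonFiniteInput, NotConverged, InternalError };

// Classify the `info` value returned by a LAPACK SVD driver.
// Assumption: newer reference LAPACK reports non-finite input as info == -4,
// while older MKL reports a positive code for the same input.
inline SvdInfoKind classify_svd_info(int info) {
  if (info == 0) return SvdInfoKind::Success;
  if (info == -4) return SvdInfoKind::NonFiniteInput;  // special case for SVD
  if (info < 0) return SvdInfoKind::InternalError;     // illegal argument: caller bug
  return SvdInfoKind::NotConverged;                    // info > 0
}

// Turn the classification into a user-facing error or an internal one.
inline void check_svd_info(int info) {
  switch (classify_svd_info(info)) {
    case SvdInfoKind::Success:
      return;
    case SvdInfoKind::NonFiniteInput:
      throw std::invalid_argument(
          "svd: the input contains non-finite values (info = -4)");
    case SvdInfoKind::NotConverged:
      throw std::runtime_error(
          "svd: the algorithm failed to converge (info = " +
          std::to_string(info) + ")");
    case SvdInfoKind::InternalError:
      throw std::logic_error(
          "svd: argument " + std::to_string(-info) +
          " had an illegal value; this is likely an internal bug");
  }
}
```

With this split, `check_svd_info(-4)` raises an ordinary input error instead of an internal assert, and any other negative code still surfaces as an internal bug.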

pytorch-probot · 2021-11-23T13:32:49Z

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/IvanYashchuk/pytorch/blob/4238efbbbf92950a8b1da2ec5bdeaa331441f0c6/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows	Labels (bold enabled)	Status
Triggered Workflows
linux-bionic-py3.6-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/noarch`, `ciflow/xla`	✅ triggered
linux-vulkan-bionic-py3.6-clang9	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/vulkan`	✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3-clang5-mobile-build	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`	✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static	`ciflow/all`, `ciflow/default`, `ciflow/linux`, `ciflow/mobile`	✅ triggered
linux-xenial-py3.6-clang7-asan	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/sanitizers`	✅ triggered
linux-xenial-py3.6-clang7-onnx	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`, `ciflow/onnx`	✅ triggered
linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3.6-gcc7	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
linux-xenial-py3.6-gcc7-bazel-test	`ciflow/all`, `ciflow/bazel`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit	`ciflow/all`, `ciflow/android`, `ciflow/cpu`, `ciflow/default`, `ciflow/linux`	✅ triggered
win-vs2019-cpu-py3	`ciflow/all`, `ciflow/cpu`, `ciflow/default`, `ciflow/win`	✅ triggered
win-vs2019-cuda11.3-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/default`, `ciflow/win`	✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`	🚫 skipped
docker-builds	`ciflow/all`	🚫 skipped
ios-12-5-1-arm64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-arm64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-arm64-custom-ops	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-arm64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-arm64-metal	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-x86-64	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-x86-64-coreml	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
ios-12-5-1-x86-64-full-jit	`ciflow/all`, `ciflow/ios`, `ciflow/macos`	🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`	🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`	🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/slow`	🚫 skipped
macos-10-15-py3-arm64	`ciflow/all`, `ciflow/macos`	🚫 skipped
macos-10-15-py3-lite-interpreter-x86-64	`ciflow/all`, `ciflow/macos`	🚫 skipped
macos-10-15-py3-x86-64	`ciflow/all`, `ciflow/macos`	🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4	`ciflow/all`, `ciflow/cpu`, `ciflow/linux`	🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7	`ciflow/all`, `ciflow/cuda`, `ciflow/libtorch`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`, `ciflow/slow`, `ciflow/slow-gradcheck`	🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7-debug	`ciflow/all`, `ciflow/cuda`, `ciflow/linux`, `ciflow/scheduled`	🚫 skipped
periodic-win-vs2019-cuda11.1-py3	`ciflow/all`, `ciflow/cuda`, `ciflow/scheduled`, `ciflow/win`	🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:

# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot · 2021-11-23T13:32:52Z

🔗 Helpful links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/68812
📄 Preview docs built from this PR
📄 Preview C++ docs built from this PR
🔧 Opt-in to CIFlow to control what jobs run on your PRs

💊 CI failures summary and remediations

As of commit 2ddf4ed (more details on the Dr. CI page):

1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (1/1)

Step: "Test"

2022-02-03T08:02:13.8032975Z processing existing schema:  text(__torch__.torch.classes.profiling.SourceRef _0) -> (str _0)
2022-02-03T08:02:13.8034034Z processing existing schema:  count(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-02-03T08:02:13.8035246Z processing existing schema:  duration_ns(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-02-03T08:02:13.8036568Z processing existing schema:  source(__torch__.torch.classes.profiling.SourceStats _0) -> (__torch__.torch.classes.profiling.SourceRef _0)
2022-02-03T08:02:13.8038332Z processing existing schema:  line_map(__torch__.torch.classes.profiling.SourceStats _0) -> (Dict(int, __torch__.torch.classes.profiling.InstructionStats) _0)
2022-02-03T08:02:13.8039043Z processing existing schema:  __init__(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-02-03T08:02:13.8040588Z processing existing schema:  enable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-02-03T08:02:13.8041242Z processing existing schema:  disable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-02-03T08:02:13.8043334Z processing existing schema:  _dump_stats(__torch__.torch.classes.profiling._ScriptProfile _0) -> (__torch__.torch.classes.profiling.SourceStats[] _0)
2022-02-03T08:02:13.8058449Z processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (NoneType _0)
2022-02-03T08:02:13.8058899Z The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 
2022-02-03T08:02:13.8058921Z 
2022-02-03T08:02:13.8059047Z Broken ops: [
2022-02-03T08:02:13.8059305Z 	aten::bilinear(Tensor input1, Tensor input2, Tensor weight, Tensor? bias=None) -> (Tensor)
2022-02-03T08:02:13.8059366Z ]
2022-02-03T08:02:13.8846016Z + cleanup
2022-02-03T08:02:13.8846144Z + retcode=1
2022-02-03T08:02:13.8847220Z + set +x
2022-02-03T08:02:13.8885324Z ##[error]Process completed with exit code 1.
2022-02-03T08:02:13.8911992Z ##[group]Run # Ensure the working directory gets chowned back to the current user

This comment was automatically generated by Dr. CI (expand for details).

Please report bugs/suggestions to the (internal) Dr. CI Users group.

lezcano · 2021-11-23T13:36:29Z

aten/src/ATen/native/BatchLinearAlgebra.cpp

+// Making the behaviour different from MKL
+// Here we check for the case where `info` is -4 and set it to a positive number
+// This will give the same error message as with MKL
+#if AT_BUILD_WITH_OPENBLAS()


Should this be unconditional?

According to the LAPACK 3.10 documentation, this is expected not only for OpenBLAS but for all LAPACK-compliant backends. See
https://www.netlib.org/lapack/explore-html/d3/da8/group__complex16_g_esing_gaccb06ed106ce18814ad7069dcb43aa27.html

PyTorch's build system doesn't allow mixing different implementations of BLAS and LAPACK. Both must come from the same library: OpenBLAS, MKL, Accelerate/vecLib, and so on.
MKL 2021.4 is not compliant with the latest Reference LAPACK implementation.
https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/lapack-routines/lapack-least-squares-and-eigenvalue-problem/lapack-least-squares-eigenvalue-problem-driver/singular-value-decomposition-lapack-driver/gesdd.html?wapkw=gesdd?wapkw=gesdd
There is no documentation for Apple's Accelerate/vecLib.

Only two BLAS and LAPACK implementations are really used for CPU computing: MKL and OpenBLAS. We can safely ignore other options until someone actually complains.

Other implementations allowed by pytorch/cmake/Modules/FindLAPACK.cmake:

- GotoBLAS2: not maintained anymore
- FLAME: can't find documentation, but the code hasn't changed since 2014
- AMD Core Math Library (ACML): not maintained anymore

Switching the -4 value of `info` for any LAPACK implementation shouldn't hurt anything.
We could also specialize SVD's error for `info = -4` to "Sorry, the input contained non-finite values and the computation cannot be completed", but MKL doesn't give us the same `info`, so we couldn't raise the same error there.

lezcano

Fair enough then.

On a related but different topic: I still like the idea of first checking for NaNs if we get an info > 0, rather than raising that obscure "wrong implementation" error. That would account for this case, and I suspect for a number of other cases as well.
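
A minimal sketch of that idea, assuming the input is available as a flat buffer of doubles: inspect the data for non-finite values before deciding which error to report. `all_finite` and `describe_failure` are hypothetical helpers for illustration, not PyTorch functions.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Returns true if every element is finite (neither NaN nor +/-infinity).
inline bool all_finite(const std::vector<double>& values) {
  return std::all_of(values.begin(), values.end(),
                     [](double v) { return std::isfinite(v); });
}

// Hypothetical helper: choose an error message for a LAPACK `info` value.
// On failure the input is inspected first, so non-finite data is reported
// as such regardless of the sign of `info`.
inline std::string describe_failure(const std::vector<double>& input, int info) {
  if (info == 0) return "";
  if (!all_finite(input)) return "the input contains non-finite values";
  if (info < 0) {
    return "argument " + std::to_string(-info) + " had an illegal value";
  }
  return "the algorithm failed to converge";
}
```

The scan costs one extra pass over the input, but it only runs on the failure path, so successful calls are unaffected.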

IvanYashchuk · 2021-11-23T18:10:11Z

@mruberry, could you please take a look and merge if the change looks reasonable to you?

mruberry

Cool!

Thanks @IvanYashchuk

facebook-github-bot · 2021-11-23T20:24:57Z

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mruberry · 2021-11-23T21:58:17Z

Some internal builds are complaining with

stderr: aten/src/ATen/native/BatchLinearAlgebra.cpp:3006:5: error: invalid token at start of a preprocessor expression
#if AT_BUILD_WITH_OPENBLAS()

Not sure what's going on there

dagitses · 2021-11-23T22:36:28Z

> Some internal builds are complaining with
> stderr: aten/src/ATen/native/BatchLinearAlgebra.cpp:3006:5: error: invalid token at start of a preprocessor expression
> #if AT_BUILD_WITH_OPENBLAS()
> Not sure what's going on there

This looks like an internal build issue and some internal build files will need to define the expansion like this PR does in BUILD.bazel.
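
For context, here is a small sketch of how a function-like configuration macro of this kind is typically consumed. The names `DEMO_BUILD_WITH_OPENBLAS` and `DEMO_BUILD_WITH_MKL` are made up for illustration, and the thread does not show the exact cause of the internal failure. The point is only that every build configuration compiling the file has to supply a definition that expands to an integer expression.

```cpp
#include <string>

// Hypothetical config macros for this sketch. In a real build these would be
// generated by the build system, for example from a template header.
#define DEMO_BUILD_WITH_OPENBLAS() 1
#define DEMO_BUILD_WITH_MKL() 0

// Report which branch the preprocessor selected.
inline std::string selected_backend() {
#if DEMO_BUILD_WITH_OPENBLAS()
  return "openblas";
#elif DEMO_BUILD_WITH_MKL()
  return "mkl";
#else
  return "other";
#endif
}
```

Unlike an object-like macro, where `#if NAME` silently evaluates to 0 when `NAME` is undefined, `#if NAME()` is a hard error if no build file defines `NAME()`. That catches a missing configuration early, but it also means each build system has to provide the expansion.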

IvanYashchuk · 2021-12-12T13:24:49Z

@dagitses, @mruberry is there anything on my side I could help with to merge this PR?

mruberry · 2021-12-16T21:12:16Z

> @dagitses, @mruberry is there anything on my side I could help with to merge this PR?

Not that I know of at this time, sorry. It's just going to require some work internally.

IvanYashchuk · 2022-01-22T09:51:38Z

Latest MKL fails with the same error as OpenBLAS 0.3.15: #71645
I need to modify this PR to cover MKL as well.

ngimel · 2022-01-23T22:22:38Z

@IvanYashchuk please ping me when you rebase and update PR.

IvanYashchuk · 2022-02-02T08:11:31Z

Alright, ROCm build is now successful: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.5-py3.7-trigger/5021/ and all other CI jobs are green as well.

IvanYashchuk · 2022-02-02T08:19:57Z

@mruberry, I modified this PR slightly so that there are no conflicts with #72125.

Summary: Fixes pytorch/pytorch#67693. Reference LAPACK (used in OpenBLAS) changed info error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually, it would indicate wrong implementation. However, this is not the case with SVD now in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns with the OpenBLAS and MKL behavior in our code. **UPDATE:** MKL 2022 has uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+ This PR fixes pytorch/pytorch#71645 that is due to the updated MKL version in CI. Pull Request resolved: pytorch/pytorch#68812 Reviewed By: osalpekar Differential Revision: D32626563 Pulled By: ngimel fbshipit-source-id: 09042f07cdc9c24ce1fa5cd6f4483340c7b5b06c (cherry picked from commit aadf507)

Summary: Fixes pytorch/pytorch#67693. Reference LAPACK (used in OpenBLAS) changed info error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually, it would indicate wrong implementation. However, this is not the case with SVD now in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns with the OpenBLAS and MKL behavior in our code. **UPDATE:** MKL 2022 has uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+ This PR fixes pytorch/pytorch#71645 that is due to the updated MKL version in CI. Pull Request resolved: pytorch/pytorch#68812 Reviewed By: mrshenli Differential Revision: D33844257 Pulled By: ngimel fbshipit-source-id: fd1c86e37e405b330633d039f49dce466391b66e (cherry picked from commit c00a9bd)

IvanYashchuk · 2022-02-03T15:51:10Z

This PR fixes false triggers of TORCH_INTERNAL_ASSERT that suggest users file a bug report (see #71645). Therefore, I think it's important to include this PR in the next 1.11 release so that users see a nicer error message.

IvanYashchuk · 2022-02-04T07:28:04Z

@ngimel, could you please import this PR? Here is the HUD for the CI; the failed backward-compatibility tests are not related.

ngimel · 2022-02-04T17:51:32Z

Can you please open a new PR? I can't reimport this one.

facebook-github-bot · 2022-02-05T00:02:53Z

This pull request has been reverted by bb6b501. To re-land this change, follow these steps.

#72357) Summary: This PR was opened as copy of #68812 by request #68812 (comment). ----- Fixes #67693. Reference LAPACK (used in OpenBLAS) changed info error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually, it would indicate the wrong implementation. However, this is not the case with SVD now in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns with the OpenBLAS and MKL behavior in our code. MKL 2022 has uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+ This PR also fixes #71645 that is due to the updated MKL version in CI. Pull Request resolved: #72357 Reviewed By: albanD Differential Revision: D34012245 Pulled By: ngimel fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310

#72357) Summary: This PR was opened as copy of #68812 by request #68812 (comment). ----- Fixes #67693. Reference LAPACK (used in OpenBLAS) changed info error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually, it would indicate the wrong implementation. However, this is not the case with SVD now in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns with the OpenBLAS and MKL behavior in our code. MKL 2022 has uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+ This PR also fixes #71645 that is due to the updated MKL version in CI. Pull Request resolved: #72357 Reviewed By: albanD Differential Revision: D34012245 Pulled By: ngimel fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310 (cherry picked from commit fa29e65)

pytorch#72357) Summary: This PR was opened as copy of pytorch#68812 by request pytorch#68812 (comment). ----- Fixes pytorch#67693. Reference LAPACK (used in OpenBLAS) changed info error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually, it would indicate the wrong implementation. However, this is not the case with SVD now in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns with the OpenBLAS and MKL behavior in our code. MKL 2022 has uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+ This PR also fixes pytorch#71645 that is due to the updated MKL version in CI. Pull Request resolved: pytorch#72357 Reviewed By: albanD Differential Revision: D34012245 Pulled By: ngimel fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310 (cherry picked from commit fa29e65)

…) (#72357) Summary: This PR was opened as copy of pytorch/pytorch#68812 by request pytorch/pytorch#68812 (comment). ----- Fixes pytorch/pytorch#67693. Reference LAPACK (used in OpenBLAS) changed info error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually, it would indicate the wrong implementation. However, this is not the case with SVD now in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns with the OpenBLAS and MKL behavior in our code. MKL 2022 has uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+ This PR also fixes pytorch/pytorch#71645 that is due to the updated MKL version in CI. Pull Request resolved: pytorch/pytorch#72357 Reviewed By: albanD Differential Revision: D34012245 Pulled By: ngimel fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310 (cherry picked from commit fa29e65611ea5028bf6d2d3c151d79e6c9e4ffef)

#72357) (#72513) Summary: This PR was opened as copy of #68812 by request #68812 (comment). ----- Fixes #67693. Reference LAPACK (used in OpenBLAS) changed info error code for svd when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because usually, it would indicate the wrong implementation. However, this is not the case with SVD now in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns with the OpenBLAS and MKL behavior in our code. MKL 2022 has uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+ This PR also fixes #71645 that is due to the updated MKL version in CI. Pull Request resolved: #72357 Reviewed By: albanD Differential Revision: D34012245 Pulled By: ngimel fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310 (cherry picked from commit fa29e65)

IvanYashchuk added the module: openblas label Nov 23, 2021

IvanYashchuk requested a review from mruberry November 23, 2021 13:32

IvanYashchuk requested review from lezcano and nikitaved as code owners November 23, 2021 13:32

pytorch-probot bot added the ciflow/default label Nov 23, 2021

facebook-github-bot added the cla signed label Nov 23, 2021

lezcano reviewed Nov 23, 2021

pytorchbot added the open source label Nov 23, 2021

lezcano approved these changes Nov 23, 2021

mruberry approved these changes Nov 23, 2021

IvanYashchuk mentioned this pull request Jan 22, 2022

INTERNAL ASSERT in svd_cpu #71645

Closed

This was referenced Jan 24, 2022

Release 1.10.1 and migrate protobuf and mkl conda-forge/pytorch-cpu-feedstock#84

Merged

BLAS options: OpenBLAS vs Accelerate #71712

Closed

IvanYashchuk added 5 commits January 27, 2022 11:37

Merge remote-tracking branch 'upstream/viable/strict' into openblas-svd

d78f923

Raise an error if info == -4

af8b678

Enable test_svd_errors_and_warnings

7dac9cc

Enable test_norm_extreme_values and skip 'nuc', 2, -2 ords

b52a037

Revert adding AT_BUILD_WITH_OPENBLAS

690c8d6

IvanYashchuk changed the title ~~Fix SVD error code handling for OpenBLAS 0.3.15+~~ Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ Jan 27, 2022

IvanYashchuk added 2 commits February 1, 2022 12:23

Merge remote-tracking branch 'upstream/master' into openblas-svd

146a128

Skip ROCm and add a link to the issue about MAGMA version

0196c04

mruberry mentioned this pull request Feb 1, 2022

[Meta] CI Revert Tracker #66178

Closed

Move the test to the other location to avoid merge conflict with another PR that is being merged

4d12e72

ngimel added the ciflow/all label Feb 2, 2022

Merge remote-tracking branch 'upstream/viable/strict' into openblas-svd

2ddf4ed

mruberry added this to the 1.11.0 milestone Feb 3, 2022

IvanYashchuk closed this Feb 4, 2022

IvanYashchuk mentioned this pull request Feb 4, 2022

Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ (again) #72357

Closed
