Fix SVD error code handling for OpenBLAS 0.3.15+ and MKL 2022+ #68812
Fixes pytorch#67693. Reference LAPACK (used in OpenBLAS) changed the `info` error code for SVD when inputs contain non-finite numbers. In PyTorch we raise an internal assert error for negative `info` error codes because they usually indicate a wrong implementation. However, this is no longer the case for SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns the OpenBLAS and MKL behavior in our code.
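For readers following along, the idea of the change can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `normalize_svd_info` and its `positive_code` parameter are hypothetical names, and the only value taken from the discussion is that the new code for non-finite input is `-4`.

```cpp
#include <cassert>

// Code that newer reference LAPACK reports from SVD when the input contains
// non-finite values, as discussed in this PR.
constexpr int kNonFiniteInputInfo = -4;

// Hypothetical helper (not PyTorch's actual code): rewrite the new negative
// code to a caller-chosen positive one so that both backends reach the same
// error-reporting path. All other codes pass through unchanged.
inline int normalize_svd_info(int info, int positive_code) {
  assert(positive_code > 0);  // the replacement must look like a runtime failure
  return info == kNonFiniteInputInfo ? positive_code : info;
}
```

With such a helper, other negative codes would still trip the internal assert, so genuine argument errors would presumably remain visible.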
// Making the behaviour different from MKL
// Here we check for the case where `info` is -4 and set it to a positive number
// This will give the same error message as with MKL
#if AT_BUILD_WITH_OPENBLAS()
Should this be unconditional?
According to the LAPACK 3.10 documentation, this is expected not only for OpenBLAS, but for all LAPACK-compliant backends. See
https://www.netlib.org/lapack/explore-html/d3/da8/group__complex16_g_esing_gaccb06ed106ce18814ad7069dcb43aa27.html
PyTorch's build system doesn't allow mixing different implementations of BLAS and LAPACK. Both must come from the same library: OpenBLAS, MKL, Accelerate/vecLib, and so on.
MKL 2021.4 is not compliant with the latest Reference LAPACK implementation.
https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-fortran/top/lapack-routines/lapack-least-squares-and-eigenvalue-problem/lapack-least-squares-eigenvalue-problem-driver/singular-value-decomposition-lapack-driver/gesdd.html?wapkw=gesdd?wapkw=gesdd
There is no documentation for Apple's Accelerate/vecLib.
There are really only two widely used BLAS and LAPACK implementations for CPU computing: MKL and OpenBLAS. We can safely ignore other options until someone actually complains.
Other implementations allowed by `pytorch/cmake/Modules/FindLAPACK.cmake`:
- GotoBLAS2 - not maintained anymore
- FLAME - can't find documentation, and the code hasn't changed since 2014
- AMD Core Math Library (ACML) - not maintained anymore
Switching the `-4` value of `info` for any LAPACK implementation shouldn't hurt anything.
We could also specialize SVD's error with `info = -4` to "Sorry, the input contained non-finite values and the computation cannot be completed", but we don't have the same info from MKL to raise the same error.
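As a rough illustration of what such a specialized message could look like, here is a small sketch. The function name `svd_failure_message` is hypothetical, and the non-`-4` messages follow the general LAPACK convention (negative `info` means an illegal argument, positive means the algorithm did not converge) rather than PyTorch's actual wording.

```cpp
#include <cassert>
#include <string>

// Hypothetical mapping from an SVD `info` value to a user-facing message.
// Call only on failure, i.e. when info != 0.
inline std::string svd_failure_message(int info) {
  assert(info != 0);
  if (info == -4) {
    // Newer reference LAPACK: the input matrix had non-finite entries.
    return "the input contained non-finite values and the computation "
           "cannot be completed";
  }
  if (info < 0) {
    // General LAPACK convention: argument number -info was invalid.
    return "argument " + std::to_string(-info) + " had an illegal value";
  }
  // Positive info: the algorithm did not converge.
  return "the algorithm failed to converge";
}
```

As noted above, a backend that reports a positive code for the same input would land in the last branch, so the two backends would still produce different messages.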
Fair enough then.
On a related but different topic: I still like the idea that, if we get an `info > 0`, rather than raising that obscure "wrong implementation" error, we first check for `NaN`s. That would account for this case, and I suspect a number of other cases as well.
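The "check for NaNs first" idea could look something like the sketch below. It assumes the input is available as a flat `std::vector<double>`; `contains_non_finite` and `diagnose_failure` are hypothetical names, not existing PyTorch functions, and the scan costs one extra pass over the input on the failure path only.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// True if any entry is NaN or +/-Inf.
inline bool contains_non_finite(const std::vector<double>& values) {
  return std::any_of(values.begin(), values.end(),
                     [](double v) { return !std::isfinite(v); });
}

// Hypothetical diagnosis step: when the backend reports a failure, look at the
// input before falling back to a generic message about the `info` code.
inline std::string diagnose_failure(int info, const std::vector<double>& input) {
  assert(info != 0);  // only meaningful after a failed call
  if (contains_non_finite(input)) {
    return "input contains non-finite values";
  }
  return "LAPACK returned info = " + std::to_string(info);
}
```

Because the check looks at the data rather than the code, it would give the same message regardless of which sign of `info` a given backend uses for this case.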
@mruberry, could you please take a look and merge if the change looks reasonable to you?
Cool!
Thanks @IvanYashchuk
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Some internal builds are complaining with
Not sure what's going on there
This looks like an internal build issue and some internal build files will need to define the expansion like this PR does in BUILD.bazel.
Latest MKL fails with the same error as OpenBLAS 0.3.15: #71645
@IvanYashchuk please ping me when you rebase and update PR.
Alright, ROCm build is now successful: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-bionic-rocm4.5-py3.7-trigger/5021/ and all other CI jobs are green as well.
…her PR that is being merged
Summary: Fixes pytorch/pytorch#67693. Reference LAPACK (used in OpenBLAS) changed the `info` error code for SVD when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because they usually indicate a wrong implementation. However, this is no longer the case for SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns the OpenBLAS and MKL behavior in our code. **UPDATE:** MKL 2022 uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+. This PR fixes pytorch/pytorch#71645, which is due to the updated MKL version in CI. Pull Request resolved: pytorch/pytorch#68812 Reviewed By: osalpekar Differential Revision: D32626563 Pulled By: ngimel fbshipit-source-id: 09042f07cdc9c24ce1fa5cd6f4483340c7b5b06c (cherry picked from commit aadf507)
Summary: Fixes pytorch/pytorch#67693. Reference LAPACK (used in OpenBLAS) changed the `info` error code for SVD when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because they usually indicate a wrong implementation. However, this is no longer the case for SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns the OpenBLAS and MKL behavior in our code. **UPDATE:** MKL 2022 uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+. This PR fixes pytorch/pytorch#71645, which is due to the updated MKL version in CI. Pull Request resolved: pytorch/pytorch#68812 Reviewed By: mrshenli Differential Revision: D33844257 Pulled By: ngimel fbshipit-source-id: fd1c86e37e405b330633d039f49dce466391b66e (cherry picked from commit c00a9bd)
This PR fixes false triggers of
@ngimel, could you please import this PR? Here is the HUD for the CI, failed tests on backwards compatibility are not related.
Can you please open a new PR? I can't reimport this one.
This pull request has been reverted by bb6b501. To re-land this change, follow these steps.
#72357) Summary: This PR was opened as a copy of #68812 by request #68812 (comment). ----- Fixes #67693. Reference LAPACK (used in OpenBLAS) changed the `info` error code for SVD when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because they usually indicate a wrong implementation. However, this is no longer the case for SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns the OpenBLAS and MKL behavior in our code. MKL 2022 uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+. This PR also fixes #71645, which is due to the updated MKL version in CI. Pull Request resolved: #72357 Reviewed By: albanD Differential Revision: D34012245 Pulled By: ngimel fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310
…) (#72357) Summary: This PR was opened as a copy of pytorch/pytorch#68812 by request pytorch/pytorch#68812 (comment). ----- Fixes pytorch/pytorch#67693. Reference LAPACK (used in OpenBLAS) changed the `info` error code for SVD when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because they usually indicate a wrong implementation. However, this is no longer the case for SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns the OpenBLAS and MKL behavior in our code. MKL 2022 uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+. This PR also fixes pytorch/pytorch#71645, which is due to the updated MKL version in CI. Pull Request resolved: pytorch/pytorch#72357 Reviewed By: albanD Differential Revision: D34012245 Pulled By: ngimel fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310 (cherry picked from commit fa29e65611ea5028bf6d2d3c151d79e6c9e4ffef)
#72357) (#72513) Summary: This PR was opened as a copy of #68812 by request #68812 (comment). ----- Fixes #67693. Reference LAPACK (used in OpenBLAS) changed the `info` error code for SVD when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because they usually indicate a wrong implementation. However, this is no longer the case for SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns the OpenBLAS and MKL behavior in our code. MKL 2022 uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+. This PR also fixes #71645, which is due to the updated MKL version in CI. Pull Request resolved: #72357 Reviewed By: albanD Differential Revision: D34012245 Pulled By: ngimel fbshipit-source-id: 2b66c173cc3458d8c766b542d0d569191cdce310 (cherry picked from commit fa29e65)
Fixes #67693.
Reference LAPACK (used in OpenBLAS) changed the `info` error code for SVD when inputs contain non-finite numbers. In PyTorch, we raise an internal assert error for negative `info` error codes because they usually indicate a wrong implementation. However, this is no longer the case for SVD in newer versions of LAPACK. MKL (tried 2021.4.0) still gives a positive error code for this kind of input. This change aligns the OpenBLAS and MKL behavior in our code.
UPDATE: MKL 2022 uses the latest reference LAPACK behavior and returns the same `info` as OpenBLAS 0.3.15+.
This PR fixes #71645, which is due to the updated MKL version in CI.