BLAS options: OpenBLAS vs Accelerate · Issue #71712 · pytorch/pytorch · GitHub

BLAS options: OpenBLAS vs Accelerate #71712


Closed
ngam opened this issue Jan 24, 2022 · 11 comments
Labels
- module: performance (issues related to performance, either of kernel code or framework glue)
- module: third_party
- needs research (we need to decide whether or not this merits inclusion, based on research)
- triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

Comments

@ngam commented Jan 24, 2022

🚀 The feature, motivation and pitch

Are there any benchmarks or a preference among developers here? Assuming Intel macOS users should use MKL (MKL isn't available for Apple Silicon), is there any benefit to using OpenBLAS versus Accelerate? Any documentation or benchmarks? Thanks!

Alternatives

No response

Additional context

No response

cc @VitalyFedyunin @ngimel

@vadimkantorov (Contributor) commented Jan 24, 2022

also, BLIS might be an option... flame/blis#492 although this BLIS seems was not much used with pytorch...

@malfet added the module: performance, module: third_party, and needs research labels on Jan 24, 2022
@malfet (Contributor) commented Jan 24, 2022

OpenBLAS and Accelerate should have the same API, but I'm not aware of any good benchmark of one vs another. But one should be able to recompile with different BLAS frameworks

@ngam (Author) commented Jan 24, 2022

> OpenBLAS and Accelerate should have the same API, but I'm not aware of any good benchmark of one vs another. But one should be able to recompile with different BLAS frameworks

Yes, I can confirm this. For what it's worth, the default mechanism (i.e., no user BLAS preference flag) in CMake seems to go something like this: it looks for MKL first, then BLIS, then Accelerate, and finally, I think, OpenBLAS. I understand the importance of putting MKL first (it seems to outperform everything else in this context). However, I am slightly confused about OpenBLAS vis-à-vis Accelerate (and maybe BLIS) from a few comments I gathered around here, mainly this one: #68812 (comment). Perhaps @IvanYashchuk can weigh in?

Details of cmake compilation process:
  -- Trying to find preferred BLAS backend of choice: MKL
-- MKL_THREADING = OMP
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of void*
-- Check size of void* - done
-- MKL_THREADING = OMP
CMake Warning at cmake/Dependencies.cmake:177 (message):
  MKL could not be found.  Defaulting to Eigen
Call Stack (most recent call first):
  CMakeLists.txt:653 (include)


CMake Warning at cmake/Dependencies.cmake:205 (message):
  Preferred BLAS (MKL) cannot be found, now searching for a general BLAS
  library
Call Stack (most recent call first):
  CMakeLists.txt:653 (include)


-- MKL_THREADING = OMP
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - iomp5 - pthread - m]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core - iomp5 - pthread - m]
--   Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - guide - pthread - m]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core - guide - pthread - m]
--   Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_intel_thread - mkl_core - pthread - m]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_intel_thread - mkl_core - pthread - m]
--   Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_sequential - mkl_core - m]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_sequential - mkl_core - m]
--   Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_core - iomp5 - pthread - m]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - iomp5 - pthread - m]
--   Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_core - guide - pthread - m]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - guide - pthread - m]
--   Library mkl_intel: not found
-- Checking for [mkl_intel_lp64 - mkl_core - pthread - m]
--   Library mkl_intel_lp64: not found
-- Checking for [mkl_intel - mkl_core - pthread - m]
--   Library mkl_intel: not found
-- Checking for [mkl - guide - pthread - m]
--   Library mkl: not found
-- MKL library not found
-- Checking for [blis]
--   Library blis: BLAS_blis_LIBRARY-NOTFOUND
-- Checking for [Accelerate]
--   Library Accelerate: /opt/MacOSX11.0.sdk/System/Library/Frameworks/Accelerate.framework
-- Looking for sgemm_
-- Looking for sgemm_ - found
-- Found a library with BLAS API (accelerate). Full path: (/opt/MacOSX11.0.sdk/System/Library/Frameworks/Accelerate.framework)

To expand: Because PyTorch necessarily bundles BLAS together with LAPACK, and the LAPACK routines included in Accelerate have been reported to be buggy/unreliable, the choice could make a difference. We could potentially try to unbundle BLAS and LAPACK here, i.e., select BLAS from Accelerate while taking LAPACK from another provider as available. This would obviously be more work for a very niche optimization; hence, I would like to see whether we can establish any firm benchmark showing that Accelerate is indeed worthwhile here, and then we could work on unbundling BLAS and LAPACK.

@ngam (Author) commented Jan 24, 2022

> also, BLIS might be an option (flame/blis#492), although BLIS does not seem to have seen much use with PyTorch...

Yes, thanks @vadimkantorov. The default mechanism appears to favor BLIS over Accelerate, as shown above.

@IvanYashchuk (Collaborator) commented

Here are the FindBLAS.cmake and FindLAPACK.cmake files that PyTorch uses. There's no way to specify the BLAS variant; CMake tries to find one in a specific order (specified in FindBLAS.cmake), but it's possible to compile PyTorch with Accelerate if CMake doesn't find anything with higher priority.

FindLAPACK.cmake disallows mixing BLAS & LAPACK from different providers because it's fragile.

@ngimel (Collaborator) commented Jan 24, 2022

cc @robieta

@ngam (Author) commented Jan 24, 2022

> Here's the FindBLAS.cmake and FindLAPACK.cmake files that PyTorch uses. There's no way to specify the BLAS variant and CMake tries to find it in a specific order (specified in FindBLAS.cmake), but it's possible to compile PyTorch with Accelerate if CMake doesn't find anything with higher priority

No! There is (which is actually good, so thank you for the flexibility!). You can specify BLAS=OpenBLAS to force it to compile with OpenBLAS (if available).

See more here: conda-forge/pytorch-cpu-feedstock#84 (comment)
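For reference, a minimal sketch of forcing the backend from a build script. The BLAS environment variable is the one PyTorch's CMake machinery consults; the exact build invocation (commented out below) depends on your checkout, so treat this as illustrative only:

```python
# Sketch: select PyTorch's BLAS backend at build time by setting
# the BLAS environment variable before invoking the source build.
# The build step itself is commented out; "OpenBLAS" could be
# swapped for another supported backend (e.g. MKL, Eigen).
import os
import subprocess

env = dict(os.environ, BLAS="OpenBLAS")
# subprocess.run(["python", "setup.py", "develop"], env=env, check=True)
print("building with BLAS =", env["BLAS"])
```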

> FindLAPACK.cmake disallows mixing BLAS & LAPACK from different providers because it's fragile

Good to know the reason, thanks!

@ngam (Author) commented Jan 24, 2022

Also, I am happy to run some benchmarks and tests if you can point me to meaningful ones for this particular case. I already have both OpenBLAS-based and Accelerate-based PyTorch builds ready (and reproducible; I can also add BLIS and/or other BLAS libraries to test). I'm also happy to help if there is interest in clarifying this further :)

Note: I believe this whole question is moot outside of Apple Silicon Macs at the moment. MKL BLAS/LAPACK should still be used whenever available, imo, but as far as I can tell it is not available on Apple Silicon Macs.
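To make the offer concrete, here is a minimal single-precision GEMM timing sketch. NumPy is used purely as a stand-in for whichever BLAS-backed build is under test (a torch matmul on float32 tensors would be the analogous PyTorch measurement); the matrix size and repetition count are arbitrary choices:

```python
# Sketch: time an n x n float32 matrix multiply (sgemm) and report
# best-of-reps throughput. Run the same script under builds linked
# against different BLAS libraries to compare them.
import time
import numpy as np  # stand-in for the BLAS-backed library under test

def bench_sgemm(n=1024, reps=5):
    """Return best-of-reps GFLOP/s for an n x n single-precision matmul."""
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n), dtype=np.float32)
    b = rng.standard_normal((n, n), dtype=np.float32)
    best = float("inf")
    for _ in range(reps):
        t0 = time.perf_counter()
        a @ b
        best = min(best, time.perf_counter() - t0)
    return 2.0 * n**3 / best / 1e9  # a GEMM costs ~2*n^3 flops

if __name__ == "__main__":
    print(f"sgemm 1024x1024: {bench_sgemm():.1f} GFLOP/s")
```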

@IvanYashchuk (Collaborator) commented

> You can specify BLAS=OpenBLAS and you will force it to compile with OpenBLAS (if available)

Great that it was fixed; if I recall correctly, previously it only affected Caffe2 and not ATen (for example #60328).

> FindLAPACK.cmake disallows mixing BLAS & LAPACK from different providers because it's fragile

Reference LAPACK requires BLAS, and if LAPACK is built from source against the BLAS from Accelerate, then I guess there shouldn't be any problems.

@anjali411 added the triaged label on Jan 25, 2022
@ngam (Author) commented Feb 5, 2022

@robieta any thoughts?

@ngam (Author) commented Feb 9, 2022

From my limited testing, there is little value in choosing one over the other; they end up performing rather similarly. Closing this. Thanks everyone for engaging, and good luck :)

@ngam ngam closed this as completed Feb 9, 2022