OpenBLAS bottle MAX_THREADS is fixed to 6?

There’s something odd going on with the openblas bottle. Some Octave.app users and I have found that its MAX_THREADS is set to 6, even when you run it on an 8- or 10-core Mac, where I’d expect it to be 16 or 20. This shows up in Octave when you run version -blas.

On my 10-core iMac Pro “angharad”:

With bottled openblas:

$ octave
octave:1> version -blas
ans = OpenBLAS (config: OpenBLAS 0.3.12 DYNAMIC_ARCH NO_AFFINITY USE_OPENMP Haswell MAX_THREADS=6)

After brew reinstall --build-from-source openblas:

$ octave -q
octave:1> version -blas
ans = OpenBLAS (config: OpenBLAS 0.3.12 DYNAMIC_ARCH NO_AFFINITY USE_OPENMP Haswell MAX_THREADS=20)

I’m guessing that MAX_THREADS is set to 6 at build time, based on the size of the VM in the bottle build farm. This means that when you run a bottled openblas on a Mac with more than 3 cores (6 hardware threads), it underutilizes the CPU. No big deal on a quad-core, but it may be a significant waste on a machine with 8 or more cores.
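For anyone who wants to check their own install, here is a rough Python sketch that pulls MAX_THREADS out of a config string like the ones above and compares it with the number of logical CPUs. The function names are made up for illustration, and it assumes the config string always contains a MAX_THREADS=N token, as it does in the two outputs shown here.

```python
import os
import re


def parse_max_threads(config):
    """Return the MAX_THREADS value from an OpenBLAS config string, or None."""
    match = re.search(r"MAX_THREADS=(\d+)", config)
    return int(match.group(1)) if match else None


def unused_logical_cpus(config, logical_cpus=None):
    """Count the logical CPUs that exceed the compiled-in thread limit."""
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs (hardware threads) and may return None.
        logical_cpus = os.cpu_count() or 1
    max_threads = parse_max_threads(config)
    if max_threads is None:
        return None
    return max(0, logical_cpus - max_threads)


# Config string copied from the bottled build shown above.
bottled = "OpenBLAS 0.3.12 DYNAMIC_ARCH NO_AFFINITY USE_OPENMP Haswell MAX_THREADS=6"
print(parse_max_threads(bottled))        # 6
print(unused_logical_cpus(bottled, 20))  # 14 on a machine with 20 logical CPUs
```

Paste in whatever version -blas prints on your machine; a nonzero result from unused_logical_cpus suggests the bottle is capping you.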

You can set OMP_NUM_THREADS, but that only seems to work to reduce the number of threads below the built-in MAX_THREADS; setting it to a higher value has no effect.

For power users, the easy workaround is to just reinstall openblas from source. But that’s kind of a bummer for non-power-users.

Any ideas on a good way to handle this?

Downstream bug report: https://github.com/octave-app/octave-app/issues/214

Cheers,
Andrew

From https://github.com/xianyi/OpenBLAS/wiki/Faq :

  • I cannot get OpenBLAS to use more than a small subset of available cores on a big system

Multithreading support in OpenBLAS requires the use of internal buffers for sharing partial results, the number and size of which is defined at compile time. Unless you specify NUM_THREADS in your make or cmake command, the build scripts try to autodetect the number of cores available in your build host to size the library to match. This unfortunately means that if you move the resulting binary from a small “front-end node” to a larger “compute node” later, it will still be limited to the hardware capabilities of the original system. The solution is to set NUM_THREADS to a number big enough to encompass the biggest systems you expect to run the binary on - at runtime, it will scale down the maximum number of threads it uses to match the number of cores physically available.
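To make the effect of that compile-time limit concrete, here is a deliberately simplified Python model of the behaviour described in this thread. It is not OpenBLAS's actual logic, and the function and parameter names are hypothetical. It assumes the library wants one thread per logical CPU unless a thread count is requested (for example via OMP_NUM_THREADS), and that the result can never exceed the compiled-in maximum.

```python
def effective_threads(compiled_max, logical_cpus, requested=None):
    """Simplified model: thread count after applying the compile-time cap.

    compiled_max -- the MAX_THREADS value baked in at build time
    logical_cpus -- logical CPUs on the machine running the binary
    requested    -- optional explicit request (e.g. from OMP_NUM_THREADS)
    """
    wanted = logical_cpus if requested is None else requested
    # The request can lower the count, but never raise it past compiled_max.
    return max(1, min(wanted, compiled_max))


# Bottle built with MAX_THREADS=6, run on a machine with 20 logical CPUs.
print(effective_threads(6, 20))                # 6
print(effective_threads(6, 20, requested=4))   # 4: lowering works
print(effective_threads(6, 20, requested=16))  # 6: raising has no effect
# Built with a generous limit, the same machine can use all 20.
print(effective_threads(64, 20))               # 20
```

Under this model, a large NUM_THREADS at build time costs nothing in thread count on small machines, which matches the FAQ's recommendation.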

Please open an openblas formula PR that sets NUM_THREADS in the build as described above. See https://docs.brew.sh/How-To-Open-a-Homebrew-Pull-Request