Description
I've run your script a few times on different setups. Interestingly, I obtained different results :)
The main reason is that numpy already has some multi-threaded functions, and it spawns a number of workers that depends on the Python version and platform [1].
Once numpy is forced to a single thread [2], the best performance is obtained.
Multiprocessing is always slightly better than pathos (reasonable, as pathos uses multiprocessing as its backend) [3].
With 12 cores and numpy set to 1 thread, I got throughput_rays_per_sec around 430 with Py 3.7.9, Py 3.8.5 and Py 3.9.1 [4].
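To make the reasoning concrete, here is a rough sketch of the arithmetic behind the slowdown. It assumes one worker process per core, which is my assumption and not something stated in the script. The helper name `oversubscription_factor` is hypothetical.

```python
import os


def oversubscription_factor(n_processes, threads_per_process, n_cores=None):
    """Ratio of runnable threads to available cores (values above 1 mean contention)."""
    if n_cores is None:
        # Fall back to the machine's core count; os.cpu_count() can return None.
        n_cores = os.cpu_count() or 1
    return (n_processes * threads_per_process) / n_cores


# 12 cores as in the report; one process per core is an assumption.
print(oversubscription_factor(12, 1, n_cores=12))  # numpy pinned to 1 thread -> 1.0
print(oversubscription_factor(12, 4, n_cores=12))  # numpy using 4 threads per process -> 4.0
```

A factor of 2 or 4 would be consistent with the CPU usage described in [1], where every worker process starts its own set of numpy threads.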
Default

Setting numpy to 1 thread

[1] This is evident from CPU usage that is two or four times the expected value. How many workers are present depends on numpy's compile settings for accelerated linear algebra libraries (BLAS and friends) and on environment variables. More info with:
```python
import numpy as np
np.show_config()
```
[2] i.e. by setting the following environment variables before importing numpy:
import os
NUMPY_THREADS = 1
os.environ["MKL_NUM_THREADS"] = str(NUMPY_THREADS)
os.environ["NUMEXPR_NUM_THREADS"] = str(NUMPY_THREADS)
os.environ["OMP_NUM_THREADS"] = str(NUMPY_THREADS)
[3] If pathos.pools.ProcessPool is used, performance is further reduced (roughly 10%).
[4] For Py>=3.8 the following line is needed to prevent an AttributeError in multiprocessing (see: https://bugs.python.org/issue39414):
atexit.register(pool.close)

Originally posted by @dcambie in #48 (comment)
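Footnotes [2] and [4] can be combined into one minimal sketch. This is not the original script: `trace_chunk` and `run` are hypothetical names, and the worker is a pure-Python stand-in for the numpy-heavy work, so the example runs with the standard library alone.

```python
import atexit
import multiprocessing
import os

# Set the thread limits first; numpy should only be imported after this point.
for name in ("MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS", "OMP_NUM_THREADS"):
    os.environ[name] = "1"

# import numpy as np  # in the real script, the numpy import would go here


def trace_chunk(n_rays):
    """Hypothetical stand-in for the per-task work (numpy code in the real script)."""
    return sum(i * i for i in range(n_rays))


def run(n_workers, chunks):
    """Map the chunks over a multiprocessing pool and return the results in order."""
    pool = multiprocessing.Pool(processes=n_workers)
    # Workaround from footnote [4]: close the pool when the interpreter exits.
    atexit.register(pool.close)
    return pool.map(trace_chunk, chunks)


if __name__ == "__main__":
    print(run(n_workers=2, chunks=[10, 20, 30]))
```

The `if __name__ == "__main__":` guard matters on platforms where worker processes are started by re-importing the main module, since an unguarded pool would otherwise be created again in every worker.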