I have access to a new machine that has 2 × 20-core Xeon Gold CPUs which support hyper-threading. I was playing around with some simple pmap problems, and I was not seeing a performance jump when I tried going from 40 to 80 workers (hoping to benefit from the hyper-threading). Does anyone have any suggestions on how to best leverage my computing environment with Julia for embarrassingly parallel (i.e., pmap) type problems?

To be clear, pmap distributes computation across processes, which do not share memory; objects are serialized to be sent between processes, and the processes can even run on remote machines. The lack of speedup when you add processes may well be due to this serialization overhead.
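A minimal sketch of the pmap setup being discussed (the function name `expensive` is just illustrative): every argument and every result crosses a process boundary, so if the per-item work is cheap, serialization can dominate.

```julia
using Distributed
addprocs(4)  # spawn 4 worker processes; they share no memory with the master

# The function must be defined on every worker
@everywhere function expensive(x)
    sum(sin, 1:x)  # stand-in for a real per-item computation
end

# Each argument is serialized to a worker; each result is serialized back
results = pmap(expensive, 10_000:10_003)
```

When the per-item work is cheap, pmap's `batch_size` keyword can amortize those round trips, e.g. `pmap(expensive, xs; batch_size=100)`.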

Multi-threading, on the other hand, uses shared memory, so it does not incur that overhead. It is most easily used with the @threads macro rather than pmap, and all threads run on the same machine. Multi-threading is experimental, but it mostly works unless you're doing IO on the threads. So you could try the @threads macro instead, but then you'd need to launch Julia with the JULIA_NUM_THREADS environment variable set for Julia's threads to run on multiple CPU cores; otherwise all @threads will do is run a bunch of tasks on the same core.
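A minimal @threads sketch, assuming Julia is launched with the environment variable set (the helper name `threaded_map!` is just illustrative). Because threads share memory, the result array is filled in place with no serialization:

```julia
# Launch with the environment variable set, e.g.:
#   JULIA_NUM_THREADS=40 julia script.jl
using Base.Threads

# Fill `out` in place, splitting the index range across threads
function threaded_map!(f, out, xs)
    @threads for i in eachindex(xs)
        out[i] = f(xs[i])
    end
    return out
end

xs = collect(10_000:10_003)
out = similar(xs, Float64)
threaded_map!(x -> sum(sin, 1:x), out, xs)
println(nthreads())  # how many threads Julia actually started with
```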


If he goes the threading route, I'd recommend taking a look at KissThreading.jl. Among other things, it offers a tmap! function and initializes a vector of Mersenne Twisters named TRNG that you can use if any of the code generates random numbers.

pmap and tmap! are better than @distributed for and @threads for when the functions being called take a while and there's some variance in that run time: the former use dynamic scheduling, the latter static scheduling.
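The difference can be sketched on a single machine with threads alone. Note this substitutes `Threads.@spawn` (available from Julia 1.3) for the dynamic schedulers named above, since it schedules each item onto whichever thread is free, much as pmap does across processes:

```julia
using Base.Threads

# Simulated per-item run times: one slow item among fast ones
work_times = [0.2, 0.01, 0.01, 0.01, 0.01, 0.01]
done = fill(false, length(work_times))  # Vector{Bool}: safe for per-element writes

# Static scheduling: @threads splits the index range into contiguous chunks
# up front, so the chunk holding the slow item becomes the bottleneck while
# the other threads sit idle.
@threads for i in eachindex(work_times)
    sleep(work_times[i])
    done[i] = true
end

# Dynamic scheduling: each item becomes a task handed to whichever thread
# is free, so a slow item delays only itself.
@sync for i in eachindex(work_times)
    Threads.@spawn sleep(work_times[i])
end
```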

I’ve normally tried using threads before distributed. However, I usually get poor scaling, much worse than with OpenMP. It’s probably my fault.