I recently ran into a problem writing a program that I hoped I could speed
up by running over multiple CPUs. I want non-haskell users to be able to
run the tool and take advantage of multiple CPUs. *But* there is a serious
slowdown when the RTS is run with -N and some of the CPUs are already busy.
This is already mentioned in the GHC docs, but the problem I experienced
was more serious than I expected causing a slowdown of around 2x compared
to running with a single CPU.
I reproduced the problem with the following code from the haskell wiki:
{-# LANGUAGE BangPatterns #-}
import Data.Digest.Pure.MD5
import qualified Data.ByteString.Lazy as L
import System.Environment
import Control.Concurrent
import Control.Monad (replicateM_)
main = do
files <- getArgs
str <- newEmptyMVar
mapM_ (forkIO . hashAndPrint str) files
printNrResults (length files) str
printNrResults i var = replicateM_ i (takeMVar var >>= putStrLn)
hashAndPrint str f = do
bs <- L.readFile f
let !h = show $ md5 bs
putMVar str (f ++ ": " ++ h)
When run on 4 idle CPU cores, I get the following wall clock times:
./run +RTS -N1 : 20.4 sec
./run +RTS -N2 : 11.0 sec
./run +RTS -N4 : 6.7 sec
When run on the same 4 core machine, but with 2 cores already busy:
./run +RTS -N1 : 23.5 sec
./run +RTS -N2 : 14.1 sec
./run +RTS -N4 : 57.8 sec <---- Blowout...
This is quite a problem in practice when running on a shared server. Is
there anything that can be done to address this?
(I wrote up a few more details here:
http://thunking.drp.id.au/2012/06/slowdown-with-ghc-when-using-multiple.html
)
Thanks,
--
David Powell
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/haskell-cafe/attachments/20120629/e857d7dd/attachment.htm>