I don't know if this is relevant to your problems, but I'm currently
struggling to get some performance out of a parallel - or rather,
concurrent - program.
Basically, the initial thread parses some data into an IntMap, and then
multiple threads read from this map (it is never modified) to do the Real Work.
Now, there appears to be a lot of overhead incurred when using multiple
threads, and I suspect that this is caused by the map storing
unevaluated thunks, which then are forced by accesses by the worker
threads. Ideally, the evaluation should be performed in parallel, but
perhaps there are issues (of synchronization, say) that make this less
performant?
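
One way to test that suspicion would be to force the whole map in the
initial thread before any worker sees it, and check whether the overhead
goes away. The following is only a minimal sketch of that experiment, not
my actual program: parseData, expensive and worker are hypothetical
stand-ins, and the values are plain Ints. It relies on Data.IntMap.Strict
forcing each value to weak head normal form as it is inserted, which for
Int is full evaluation; for nested values something like force from the
deepseq package would presumably be needed instead.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import qualified Data.IntMap.Strict as IM

-- Hypothetical placeholder for whatever work goes into each entry.
expensive :: Int -> Int
expensive k = (k * k) `mod` 7

-- Hypothetical stand-in for the parse step. The strict fromList
-- evaluates every value to WHNF while the map is being built.
parseData :: Int -> IM.IntMap Int
parseData n = IM.fromList [ (k, expensive k) | k <- [1 .. n] ]

-- Hypothetical worker: sums the values found for a slice of keys.
worker :: IM.IntMap Int -> [Int] -> Int
worker m ks = sum [ IM.findWithDefault 0 k m | k <- ks ]

main :: IO ()
main = do
  -- Build the map in the initial thread; evaluate makes sure this
  -- happens here rather than on first access inside a worker.
  m <- evaluate (parseData 100000)
  -- Start one thread per slice of keys; each reports through an MVar.
  pending <- mapM (spawn m) [[1 .. 50000], [50001 .. 100000]]
  totals <- mapM takeMVar pending
  print (sum totals)
  where
    spawn m ks = do
      done <- newEmptyMVar
      _ <- forkIO (evaluate (worker m ks) >>= putMVar done)
      return done
```

For the threads to actually run in parallel this would have to be built
with -threaded and run with +RTS -N. If the overhead remains even with a
fully evaluated map, the thunks were probably not the cause.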
-k
--
If I haven't seen further, it is by standing in the footprints of giants