I'm new to SBCL and Common Lisp in general. I recently ran into a problem and have struggled (and googled) for hours to find a good solution, or even an explanation, to no avail.

The program I'm writing currently creates ~12 threads which run concurrently, and I expected that between them all they'd be able to occupy all 4 CPU cores in my system, but it appears as though only one core is seeing any activity from them.

It looks like you were all either entirely correct or at least had the right idea.

I was using top to monitor the CPU usage and it turns out that I couldn't tell the difference between using 25% each of 4 cores and using all of one. I'm obviously hitting some other kind of bottleneck, but I'm not sure where. I'm not performing any kind of IO, it's exclusively CPU and RAM here. They shouldn't be waiting around for locks too often, either, but it's possible there are scenarios I didn't take into consideration.

Anyway, hopefully I'll be able to find my problem. Thanks a bunch for trying to help, everyone.

When I was doing matrix operations in SBCL that I thought should parallelize completely, I got only about a 2.5x speedup on a 4-core machine. I suspect memory was the reason: the cores contend for cache, and memory bandwidth may also be limited.

If you have a large amount of memory, you may want to raise the GC threshold (the number of bytes allocated between collections) so that GC runs less often, since the collector does not run in parallel (I think). Raising the threshold to 500 MB on a 32 GB machine sped up my application substantially.
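For reference, here is a minimal sketch of how that threshold can be inspected and raised from the REPL. It assumes SBCL (the `sb-ext` package is SBCL-specific) and a dynamic space comfortably larger than the new threshold; if the heap is small, it may need to be enlarged first with the `--dynamic-space-size` runtime option. The 500 MB figure is just the value mentioned above, and `cons-heavy-workload` is a hypothetical stand-in for your own code.

```lisp
;; Remember the current threshold (in bytes) so it can be restored later.
(defparameter *old-gc-threshold* (sb-ext:bytes-consed-between-gcs))

;; Raise the threshold to roughly 500 MB. This is an assumed value;
;; tune it to your machine and heap size.
(setf (sb-ext:bytes-consed-between-gcs) (* 500 1024 1024))

;; Hypothetical allocation-heavy workload, used only to have something to time.
(defun cons-heavy-workload (n)
  (loop repeat n
        sum (length (make-list 100))))

;; TIME reports run time, GC time, and bytes consed, which makes it
;; easy to compare the workload before and after changing the threshold.
(time (cons-heavy-workload 100000))
```

Comparing the GC time reported by `time` with the old and new thresholds should show whether the collector really is the bottleneck in your case.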

I was able to increase CPU utilization from ~25% to ~70% without making any changes other than tweaking the gc settings as you suggested. I hadn't even suspected that the gc would be running frequently enough to have that sort of impact, so I likely would have been on the wrong track for a long time without your help.

I'll hopefully be able to get better performance down the line by optimizing and breaking up the work differently in an attempt to avoid issues related to cache size and memory bandwidth, but this is already a huge improvement.

Glad it helped. I suspect that the garbage collector is not tuned for parallelism and large memory sizes; I think that increasing the bytes-consed-between-gcs value is a good idea on any machine with a large amount of memory. SBCL also has some other issues with the address space: last I checked, it was only able to use 8 GB of RAM, though that may have changed. In any case, trying to cons less, perhaps by adding more type declarations, will help performance.
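As a rough illustration of the type-declaration point, here is a sketch of a numeric loop written so the compiler can keep the arithmetic in unboxed double-floats instead of allocating a fresh float on each iteration. The function name and the choice of a specialized double-float array are assumptions for the example, not something taken from your program.

```lisp
(defun sum-of-squares (xs)
  ;; Declaring a specialized array type lets the compiler use raw
  ;; double-float arithmetic rather than generic, boxed numbers.
  (declare (type (simple-array double-float (*)) xs)
           (optimize (speed 3) (safety 1)))
  (let ((acc 0d0))
    (declare (type double-float acc))
    (dotimes (i (length xs) acc)
      (incf acc (* (aref xs i) (aref xs i))))))

;; Usage: the array must be created with a matching element type.
(defparameter *xs*
  (make-array 3 :element-type 'double-float
                :initial-contents '(1d0 2d0 3d0)))

(sum-of-squares *xs*)
```

Wrapping a call like this in `time` before and after adding the declarations shows how many bytes were consed, which is a quick way to check whether the declarations actually helped.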