How does the performance compare with using a single core? I'm assuming that there is some overhead so it doesn't scale predictably?

Actually it scales very nicely. Since each tile is rendered by a separate CPU and the messaging overhead between tasks and between processors is very low, the speed increases almost linearly with the number of CPUs.

When rendering a scene with very little detail, i.e. with a lot of redrawing (WritePixelArray + BltBitMap) relative to the calculations, I got around a 3.8x speed increase on four CPU cores. When rendering the same scene with a high amount of detail, i.e. with the number of redraw calls negligible compared to the 3D scene calculations, the speed increase was around 3.95x on the same four cores :)
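
To put those two figures in perspective, here is a rough sketch that turns a measured speedup into a parallel efficiency and an estimated serial fraction. The serial fraction uses the Karp-Flatt metric, which is my own choice of analysis and assumes an Amdahl-style model; the function names are made up for illustration and are not part of the renderer.

```python
def parallel_efficiency(speedup, cores):
    """Fraction of the ideal (linear) speedup actually achieved."""
    return speedup / cores


def serial_fraction(speedup, cores):
    """Karp-Flatt estimate of the non-parallel share of the work.

    Assumes an Amdahl-style model: whatever prevents linear scaling
    (redraw calls, messaging, etc.) is lumped together as serial work.
    """
    if cores < 2:
        raise ValueError("need at least two cores to estimate a serial fraction")
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


if __name__ == "__main__":
    # Speedups quoted in the answer above, both measured on four cores.
    measurements = (("low detail", 3.8), ("high detail", 3.95))
    for label, speedup in measurements:
        eff = parallel_efficiency(speedup, 4)
        ser = serial_fraction(speedup, 4)
        print(f"{label}: efficiency {eff:.2%}, estimated serial fraction {ser:.2%}")
```

Under these assumptions, the low-detail run works out to 95% efficiency with roughly 1.8% of the work behaving as serial, and the high-detail run to about 98.8% efficiency with roughly 0.4% serial, which fits the observation that the redraw calls are the main thing holding back scaling.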