We now run a lot of tasks: one for every timer tick on every guest core.
That means a lot of thread creation and destruction. You can see the
mmaps and munmaps with strace, and each munmap involves a tlb_shootdown
broadcast.

With a slab allocator, we can avoid all of that, especially the thread
initialization that involves the stack mmap/munmap. Object reuse is
basically the lesson of the slab allocator.
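
As a rough illustration of the object-reuse idea only, here is a minimal
sketch in C. It is not the allocator used here: the names (task_thread,
thread_cache, thread_ctor, STACK_SIZE) are all hypothetical, it keeps a
plain free list instead of real slabs, and it is not thread-safe. The
point it shows is that the stack mmap happens once per object, in the
constructor, and a freed thread keeps its stack for the next task.

```c
#define _DEFAULT_SOURCE  /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>

#define STACK_SIZE (64 * 1024)  /* assumed stack size, illustration only */

/* Hypothetical thread object; the stack is the expensive part. */
struct task_thread {
	void *stack;
	struct task_thread *next_free;
};

/* Hypothetical cache: a free list of already-constructed objects. */
struct thread_cache {
	struct task_thread *free_list;
	size_t nr_ctor_calls;  /* how many times we paid for the mmap */
};

/* Constructor: runs once per object, not once per allocation. */
static int thread_ctor(struct task_thread *t)
{
	t->stack = mmap(NULL, STACK_SIZE, PROT_READ | PROT_WRITE,
	                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return t->stack == MAP_FAILED ? -1 : 0;
}

/* Reuse a constructed object if one is free; otherwise build a new one. */
static struct task_thread *thread_cache_alloc(struct thread_cache *c)
{
	struct task_thread *t = c->free_list;

	if (t) {
		c->free_list = t->next_free;
		return t;  /* stack is still mapped: no mmap here */
	}
	t = malloc(sizeof(*t));
	if (!t)
		return NULL;
	if (thread_ctor(t)) {
		free(t);
		return NULL;
	}
	c->nr_ctor_calls++;
	return t;
}

/* Return the object to the cache without a munmap, so no shootdown. */
static void thread_cache_free(struct thread_cache *c, struct task_thread *t)
{
	t->next_free = c->free_list;
	c->free_list = t;
}
```

With this shape, an alloc/free/alloc sequence hands back the same object
with the same stack and calls the constructor only once. A real slab
allocator would also give memory back under pressure, which this sketch
never does.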