The benchmarks were run on a 2-socket, 24-CPU machine. I haven't run all possible configurations I have envisioned, because I wanted this posted early rather than late. I've also had unofficial runs on my 4-CPU i7 laptop and on a 6-way single-socket AMD box. Those would need to be re-run to be publishable, since they were quite raw and ad hoc (for example, I was not always running perf stat in the same way, was doing some things manually, etc.), but overall they point to consistent results.

You can find a guide to that data in the README file in that dir, and the actual data in the results* dirs. The chosen allocator for these runs is SLAB.

So in general, I don't see a big difference, with almost all measurements falling inside the 2-sigma range.
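For the curious, this is the kind of check I mean by "inside the 2-sigma range". The numbers below are placeholders for illustration only, not the actual benchmark results:

```python
# Illustration only: placeholder numbers, not the real benchmark data.
# A candidate measurement is considered indistinguishable from the
# baseline if it falls within two sample standard deviations of the
# baseline mean.
from statistics import mean, stdev

baseline = [10.1, 9.8, 10.3, 10.0, 9.9]  # placeholder baseline runs (seconds)
candidate = 10.2                         # placeholder patched-kernel run

m, s = mean(baseline), stdev(baseline)
within_2_sigma = abs(candidate - m) <= 2 * s
print(within_2_sigma)
```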

Two things pop out of the fork-intensive workload. First, with the kmem patches applied but kmem not used, the kernel actually performs slightly better than with no patches at all. I don't know why this is, and it might even be a glitch, but it happened consistently on my laptop and on the 6-way AMD machine.

Also, we can see that in that workload, which is slab intensive, kmemcg-slab-Set performs slightly worse. Being worse is in line with expectations, but I don't consider the hit to be too big.
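To make "fork intensive" concrete, here is a minimal sketch of the shape of such a workload (not the actual benchmark used for these numbers): it repeatedly forks children that exit immediately, which hammers the fork/exit path and the kernel slab allocations behind task creation.

```python
# Minimal sketch of a fork-intensive workload (illustrative only, not
# the benchmark these results came from): fork a child, have it exit
# at once, and reap it, in a tight loop.
import os
import time

def fork_storm(n_children):
    for _ in range(n_children):
        pid = os.fork()
        if pid == 0:
            os._exit(0)       # child: exit immediately, no cleanup
        os.waitpid(pid, 0)    # parent: reap the child before forking again

start = time.monotonic()
fork_storm(1000)
print(f"1000 fork/exit cycles in {time.monotonic() - start:.3f}s")
```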

Please let me know of any additional work you would like to see done here.