nmalloc should color addresses to avoid cache bank conflicts

nmalloc returns addresses without regard to whether they will cause
cache bank conflicts. Such conflicts waste a fair amount of a CPU's
load bandwidth, as seen in the Himeno matrix benchmark, among others.

Commit 8120f5e2a46e669c06a7afdd7de60fa6d6996f9d added simple cache
coloring to nmalloc for 32KB allocations, offsetting them by 4KB. While
it works (and restores Himeno's performance), it can be substantially
improved. We should look at doing so.
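
For discussion, here is a minimal sketch of one possible direction:
rotating through several color offsets instead of applying a single
fixed one. It is not the code from the commit above and not a proposed
patch. The 32KB threshold and 4KB step come from the description above;
treating the threshold as "32KB and larger", the number of colors
(COLOR_COUNT) and the function name color_next_offset are all
hypothetical.

```c
#include <stddef.h>

#define COLOR_MIN_SIZE (32 * 1024) /* threshold from the commit description;
                                      ">=" is an assumption */
#define COLOR_STEP     4096        /* 4KB step from the commit description */
#define COLOR_COUNT    4           /* hypothetical number of distinct colors */

/*
 * Return the byte offset to apply to the next allocation of `size`
 * bytes. Small allocations are left uncolored and do not advance the
 * counter. Large allocations cycle through COLOR_COUNT offsets that
 * are COLOR_STEP bytes apart, so consecutive large buffers do not all
 * start at the same position relative to a page-aligned base.
 */
static size_t
color_next_offset(size_t size, unsigned *counter)
{
    size_t off;

    if (size < COLOR_MIN_SIZE)
        return 0;

    off = (size_t)(*counter % COLOR_COUNT) * COLOR_STEP;
    (*counter)++;
    return off;
}
```

A caller would keep one counter (per thread or per zone, which is itself
a design question), add the returned offset to the base address of the
underlying mapping, and account for that offset again when the block is
freed. Whether a 4KB step and a small fixed color count suit a given
CPU's cache layout would have to be measured, for example with Himeno.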