Firstly, there is no need to move /lib/tls out of the way - TLS emulation can be
slow at times, but it does work correctly now that Xen uses segment flipping.
Secondly, could you take a look at where the CPU time is being spent?
Is it in user space or in kernel space? If the latter, could you boot
with 'profile=2' and use readprofile to find out which kernel functions
account for most of it?
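
If it helps, here is a rough sketch of one way to answer the user space vs
kernel space question by sampling the aggregate "cpu" line of /proc/stat
twice. The function names (parse_cpu_line, cpu_split) and the 5 second
interval are just my own choices for illustration, and I am assuming the
usual field order (user, nice, system, idle, iowait, irq, softirq, ...).
top or vmstat will give you the same numbers.

```python
import time


def parse_cpu_line(line):
    """Return (user, system, total) jiffies from the aggregate 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in fields[1:]]
    user = values[0] + values[1]          # user + nice
    system = values[2]                    # time spent in the kernel
    if len(values) > 6:
        system += values[5] + values[6]   # irq + softirq are kernel work too
    # Only the first eight fields are summed: guest time, when present,
    # is already counted inside user time.
    total = sum(values[:8])
    return user, system, total


def cpu_split(before, after):
    """Return (user %, system %) of all CPU time elapsed between two samples."""
    u0, s0, t0 = parse_cpu_line(before)
    u1, s1, t1 = parse_cpu_line(after)
    elapsed = t1 - t0
    if elapsed <= 0:
        return 0.0, 0.0
    return 100.0 * (u1 - u0) / elapsed, 100.0 * (s1 - s0) / elapsed


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(5)                         # hypothetical sampling interval
    second = read_cpu_line()
    user_pct, system_pct = cpu_split(first, second)
    print("user: %.1f%%  system: %.1f%%" % (user_pct, system_pct))
```

Run it while the workload is active. If the system figure dominates, that is
the case where booting with 'profile=2' and running readprofile is worth doing.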