To summarise, with a 32-vcpu guest and nr thread=32 we get around 27% improvement. On very lightly loaded/undercommitted systems we may see a very small improvement or a small, acceptable degradation (which it deserves).

(IMO with more overcommit/contention, we can get more than 15% for the benchmarks, and we do.)

Please let me know if you have any suggestions to try. (Currently my PLE machine lease has expired; it may take some time to come back :()

Ingo, Avi ?

>
> Avi,
> Can the patch series go ahead for inclusion into the tree, for the
> following reasons:
>
> The patch series brings fairness with ticketlock (hence the
> predictability, since during contention, a vcpu trying
> to acquire the lock is sure that it gets its turn in less than the total
> number of vcpus contending for the lock), which is very much desired
> irrespective of its low benefit/degradation (if any) in low contention
> scenarios.
>
> Of course ticketlocks had the undesirable effect of exploding the LHP
> problem, and the series addresses that with improvements in scheduling
> and sleeping instead of burning cpu time.
>
> Finally a less famous one, it brings almost PLE-equivalent capability to
> all the non-PLE hardware (TBH I always preferred my experiment kernel to
> be compiled in my pv guest, which saves more than 30 min of time for each
> run).
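
To make the fairness argument above concrete, here is a minimal userspace sketch of a ticket lock. It assumes C11 atomics; the names (struct ticket_lock, ticket_take and so on) are hypothetical and chosen for illustration only. This is not the kernel's arch_spinlock_t code and not the pv slowpath from the series. It only shows why a waiter's turn comes after at most (contenders - 1) releases.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace ticket lock; illustrative names only. */
struct ticket_lock {
    atomic_uint next;   /* next ticket to hand out */
    atomic_uint owner;  /* ticket currently being served */
};

/* Draw a ticket; tickets are handed out in strict arrival order. */
static unsigned ticket_take(struct ticket_lock *l)
{
    return atomic_fetch_add(&l->next, 1);
}

/* How many holders/waiters are still ahead of ticket t.
 * With N contenders this is at most N - 1, which is the
 * bounded-wait (fairness) property. */
static unsigned ticket_waiters_ahead(struct ticket_lock *l, unsigned t)
{
    return t - atomic_load(&l->owner);
}

static void ticket_acquire(struct ticket_lock *l)
{
    unsigned t = ticket_take(l);

    /* Plain spinning. A paravirtualized variant would presumably
     * stop spinning after some threshold and block the vcpu instead
     * (assumption; that logic is not modelled here). */
    while (atomic_load(&l->owner) != t)
        ;
}

static void ticket_release(struct ticket_lock *l)
{
    /* The lock must be held: some ticket is outstanding. */
    assert(atomic_load(&l->owner) != atomic_load(&l->next));
    atomic_fetch_add(&l->owner, 1);  /* serve the next ticket in order */
}
```

The strict hand-off order is also what makes lock holder preemption hurt more: if the vcpu holding the next ticket is preempted, everyone behind it keeps spinning, which is the case the series tries to handle by sleeping instead of burning cpu time.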