On Thu, Feb 02, 2012 at 01:38:26AM +0000, Paul Turner wrote:
> As referenced above this also allows us to potentially improve decisions
> within the load-balancer, for both distribution and power-management.
>
> Example: consider 1x80% task and 2x40% tasks on a 2-core machine. It's
> currently a bit of a gamble as to whether you get an {AB, B} or {A, BB}
> split since they have equal weight (assume 1024). With per-task tracking
> we can actually consider them at their contributed weight and see a
> stable ~{800, {400, 400}} load-split. Likewise, within balance_tasks we
> can consider the load migrated to be that actually contributed.

Hi Paul (and LKML),

As a follow-up to the discussions held during the scheduler mini-summit at the last Linaro Connect, I would like to share what I (working for ARM) have observed so far in my experiments with big.LITTLE scheduling.

I see task affinity on big.LITTLE systems as a combination of user-space affinity (via cgroups+cpuset etc.) and introspective affinity as a result of intelligent load balancing in the scheduler. I see the entity load tracking in this patch set as a step towards the latter. I am very interested in better task profiling in the scheduler, as this is crucial for selecting which tasks should go on which type of core.

I am using the patches for some very crude experiments with scheduling on big.LITTLE to explore possibilities and learn about potential issues. What I want to achieve is that high-priority CPU-intensive tasks are scheduled on the fast but less power-efficient big cores, and background tasks are scheduled on the power-efficient little cores. At the same time I would like to minimize the performance impact experienced by the user. The following is a summary of the observations I have made so far. I would appreciate comments and suggestions on the best way to go from here.

I have set up two sched_domains on a 4-core ARM system, with two cores in each, representing the big and little clusters, and disabled load balancing between them. The aim is to separate heavy and high-priority tasks from less important ones using the two domains. Based on load_avg_contrib, tasks are assigned to one of the domains by select_task_rq(). However, this does not work out very well. If a task in the little domain suddenly consumes more CPU time and never goes to sleep, it never gets the chance to migrate to the big domain. On a homogeneous system it doesn't really matter _where_ a task goes if imbalance is unavoidable, as all cores have equal performance. For heterogeneous systems like big.LITTLE it makes a huge difference.

To mitigate this issue I periodically check the currently running task on each little core to see if a CPU-intensive task is stuck there. If there is one, it is migrated to a core in the big domain using stop_one_cpu_nowait(), similar to the active load balance mechanism. It is not a pretty solution, so I am open to suggestions. Furthermore, by only checking the current task there is a chance of missing busy tasks waiting on the runqueue, but checking the entire runqueue seems too expensive.
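To make the mechanism concrete, below is a minimal sketch of the stuck-task check, modelled on the existing active load balance path in kernel/sched/fair.c. It assumes the per-entity field from this patch set (se.avg.load_avg_contrib); little_mask, BIG_MIGRATE_THRESHOLD, select_big_cpu() and migrate_to_big_cpu_stop() are illustrative names, not the actual code:

static void check_little_domain(void)
{
	int cpu;

	for_each_cpu(cpu, little_mask) {	/* little_mask: assumed cpumask */
		struct rq *rq = cpu_rq(cpu);
		struct task_struct *p;
		unsigned long flags;
		bool kick = false;

		raw_spin_lock_irqsave(&rq->lock, flags);
		p = rq->curr;

		/*
		 * Only the running task is inspected; busy tasks
		 * waiting on the runqueue are missed, as noted above.
		 */
		if (p->se.avg.load_avg_contrib > BIG_MIGRATE_THRESHOLD &&
		    !rq->active_balance) {
			rq->active_balance = 1;
			rq->push_cpu = select_big_cpu();	/* assumed helper */
			kick = true;
		}
		raw_spin_unlock_irqrestore(&rq->lock, flags);

		if (kick)
			/* Force-migrate rq->curr, like active balancing. */
			stop_one_cpu_nowait(cpu, migrate_to_big_cpu_stop,
					    rq, &rq->active_balance_work);
	}
}

The stopper callback would do essentially what active_load_balance_cpu_stop() does today, just with the destination picked from the big domain.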

My observations are based on a simple mobile workload modelling web browsing, basically two threads waking up occasionally to render a web page. Using my current setup, the more CPU-intensive of the two is scheduled on the big cluster as intended. The remaining background threads stay on the little cluster, leaving the big cluster idle when it is not rendering, to save power. The task-stuck-on-little problem can most easily be observed with CPU-intensive workloads such as the sysbench CPU workload.
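For reference, a toy version of this workload looks roughly like the user-space sketch below; the 200 ms/20 ms busy periods and the 1 s idle period are made-up values, not the actual model I used:

#include <pthread.h>
#include <time.h>

/* Each thread wakes up periodically and burns CPU for a while,
 * mimicking a render pass followed by idle time. */
static void *render_thread(void *arg)
{
	long busy_ms = (long)arg;

	for (;;) {
		struct timespec start, now, idle = { .tv_sec = 1 };

		clock_gettime(CLOCK_MONOTONIC, &start);
		do {	/* busy-loop for busy_ms to model rendering */
			clock_gettime(CLOCK_MONOTONIC, &now);
		} while ((now.tv_sec - start.tv_sec) * 1000 +
			 (now.tv_nsec - start.tv_nsec) / 1000000 < busy_ms);

		nanosleep(&idle, NULL);	/* idle between "page renders" */
	}
	return NULL;
}

int main(void)
{
	pthread_t heavy, light;

	/* One heavier "render" thread, one lighter background thread. */
	pthread_create(&heavy, NULL, render_thread, (void *)200L);
	pthread_create(&light, NULL, render_thread, (void *)20L);
	pthread_join(heavy, NULL);	/* threads loop forever */
	return 0;
}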

I have looked at traces of both runnable time and usage time, trying to understand why you use runnable time as your load metric rather than usage time, which seems more intuitive. What I see is that runnable time depends on the total runqueue load. If you have many tasks on the runqueue they will wait longer and therefore have a higher individual load_avg_contrib than they would if they were scheduled across more CPUs. Usage time is also affected by the number of tasks on the runqueue, as more tasks means less CPU time. However, less usage can also just mean that the task does not execute very often. This would make a load contribution estimate based on usage time less accurate. Is this your reason for choosing runnable time?
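To illustrate with made-up numbers: take a task that needs 40 ms of CPU every 100 ms. Running alone on a CPU, both its usage and its runnable time are ~40%. Sharing the CPU with one identical task, each still gets its ~40 ms per period, but now also spends up to ~40 ms waiting, so the runnable fraction can approach ~80% while usage stays at ~40%. Runnable time thus also encodes runqueue contention, which usage time alone does not.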

Do you have any thoughts or comments on how entity load tracking could be applied to introspectively select tasks for appropriate CPUs in systems like big.LITTLE?