Stop The Ticks!

After those previous hurdles, I’d like to talk about what I’m working on currently 🙂

Busy cpus always have a periodic tick interrupting them to update scheduling stats, trigger load balancing, preempt tasks and update the clock. When a cpu goes idle, none of this is required, so on kernels built with CONFIG_NO_HZ_IDLE the tick is stopped on that cpu. The kernel is therefore designed around one assumption: if stats are missing for some period, or there was no scheduling activity, the cpu was idle during that period.

However, when nohz_full was introduced, the picture changed. The periodic tick was no longer required to interrupt nohz_full cpus while they were running a single task. As discussed earlier, it is sensible to stop the tick on such cpus. We could have stopped the periodic tick altogether, but for some constraints. If a remote cpu reads the load of a nohz_full cpu while it is running one task, it reads a stale value: the tick is not running to update the load on the nohz_full cpu, even though the cpu is busy running a task.
At best, the remote cpu would work with stale values; at worst, this could lead to crashes. So today the tick still has to run on the nohz_full cpu, although at a much coarser granularity. We can afford the coarser granularity because the tick is only updating some statistics, and a value that is stale for a second will hopefully not cause serious issues.

Now we want to get rid of this residual tick on the nohz_full cpus as well. One option is to make the callers that read cpu/task load stats for nohz_full cpus aware that the numbers may be lagging, and have them calculate the pending updates themselves. This is hard, since we would need to identify each of the callers and look at ways to help them distinguish between idle cpus and nohz_full cpus running single tasks.
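
To make the first option a bit more concrete, here is a toy userspace sketch of what a "lag-aware" reader would have to do. Nothing here is kernel code: the struct, the enum and read_remote_load() are hypothetical names, the 0..1024 scale and the halve-per-missed-tick decay are arbitrary choices for illustration, and treating a single-task cpu as fully loaded is a deliberate simplification.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: why the remote cpu's tick is currently stopped. */
enum tickless_reason {
    TICKLESS_IDLE,        /* cpu is idle */
    TICKLESS_SINGLE_TASK  /* nohz_full cpu busy running one task */
};

/* Hypothetical snapshot of a remote cpu's load stats. */
struct remote_load {
    enum tickless_reason reason;
    uint64_t last_update_tick; /* tick count at the last stats update */
    uint32_t load;             /* load, scaled to 0..1024 (assumed scale) */
};

/*
 * Return the load a caller should use at tick `now`, computing the
 * pending updates on the reader's side instead of trusting the stale value.
 */
static uint32_t read_remote_load(const struct remote_load *rl, uint64_t now)
{
    assert(now >= rl->last_update_tick);
    uint64_t missed = now - rl->last_update_tick;

    if (missed == 0)
        return rl->load;       /* stats are current, nothing to fix up */

    if (rl->reason == TICKLESS_SINGLE_TASK)
        return 1024;           /* assumption: one task keeps the cpu fully busy */

    /* Idle case: decay by half per missed tick; guard against shifts >= 32. */
    return (missed >= 32) ? 0 : (rl->load >> missed);
}
```

Every caller that reads remote load would need a fix-up like this, which is exactly why this route is the hard one.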

The easier option would be to offload the job of updating the load stats of nohz_full cpus to the housekeeping cpu. The housekeeping cpu already does the timekeeping duty in the nohz_full environment. It would now need to do some additional work on behalf of the nohz_full cpus.
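
The offloading idea can be sketched as a toy userspace model. This is only an illustration under stated assumptions, not how the kernel implements it: struct cpu_state, update_load() and housekeeping_update_remote() are hypothetical names, the load formula (move halfway toward the target on each elapsed tick) is made up, and the locking a real implementation would need is ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS  4
#define LOAD_MAX 1024u

/* Hypothetical per-cpu state, not a kernel structure. */
struct cpu_state {
    bool nohz_full;            /* cpu is in the nohz_full set */
    bool tick_stopped;         /* its periodic tick is currently stopped */
    bool running_task;         /* busy running a task (false means idle) */
    uint64_t last_update_tick; /* tick count at the last stats update */
    uint32_t load;             /* load, scaled to 0..LOAD_MAX */
};

/* Bring one cpu's load up to date; each elapsed tick moves it halfway to the target. */
static void update_load(struct cpu_state *c, uint64_t now)
{
    assert(now >= c->last_update_tick);
    uint64_t elapsed = now - c->last_update_tick;
    uint32_t target = c->running_task ? LOAD_MAX : 0;

    /* Round up when rising so the load can actually reach LOAD_MAX. */
    for (uint64_t i = 0; i < elapsed && c->load != target; i++)
        c->load = (c->load + target + (target ? 1 : 0)) / 2;

    c->last_update_tick = now;
}

/*
 * Called from the housekeeping cpu's own tick: update the stats of every
 * nohz_full cpu whose tick is stopped. Returns how many cpus were updated.
 */
static int housekeeping_update_remote(struct cpu_state cpus[], int nr,
                                      int self, uint64_t now)
{
    int updated = 0;

    for (int i = 0; i < nr; i++) {
        if (i == self)
            continue;          /* our own tick already handles us */
        if (!cpus[i].nohz_full || !cpus[i].tick_stopped)
            continue;          /* that cpu's tick keeps its stats fresh */
        update_load(&cpus[i], now);
        updated++;
    }
    return updated;
}
```

The point of the design is that readers stay untouched: they keep reading the per-cpu stats as before, and the housekeeping cpu makes sure those stats never lag far behind.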

So this is what I’m working on now! Hopefully once we’re done, the kernel will handle HPC workloads much better 🙂