Dinakar wrote:
> Ok, Let me begin at the beginning and attempt to define what I am
> doing here

The statement of requirements and approach helps. Thank you.

And the comments in the code patch are much easier for me to understand. Thanks.

Let me step back and consider where we are here.

I've not been entirely happy with the cpu_exclusive (and mem_exclusive) properties. They were easy to code, and they require only looking at one's siblings and parent, but they don't provide all that people usually want, which is system-wide exclusivity, because they don't exclude tasks in one's parent (or more remote ancestor) cpusets from stealing resources.
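
To make that limitation concrete, here is a minimal sketch in Python. It is a toy model, not kernel code: the `Cpuset` class, `exclusive_ok` and `ancestors_overlapping` are hypothetical names, and the rule modeled (subset of the parent, no overlap with a sibling when either is exclusive) is a simplification of the real check.

```python
class Cpuset:
    """Toy cpuset: a name, a set of CPU numbers, and an exclusive flag."""

    def __init__(self, name, cpus, cpu_exclusive=False):
        self.name = name
        self.cpus = frozenset(cpus)
        self.cpu_exclusive = cpu_exclusive
        self.parent = None
        self.children = []

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        return child


def exclusive_ok(cs):
    """Simplified sibling/parent check: looks only one level around cs."""
    if cs.parent is None:
        return True
    if not cs.cpus <= cs.parent.cpus:
        return False  # a child must stay within its parent's CPUs
    for sib in cs.parent.children:
        if sib is cs:
            continue
        if (cs.cpu_exclusive or sib.cpu_exclusive) and cs.cpus & sib.cpus:
            return False  # exclusive siblings may not overlap
    return True


def ancestors_overlapping(cs):
    """Names of ancestors whose CPUs overlap cs; their tasks can still run there."""
    names = []
    p = cs.parent
    while p is not None:
        if p.cpus & cs.cpus:
            names.append(p.name)
        p = p.parent
    return names


top = Cpuset("top", range(8))
a = top.add_child(Cpuset("a", {0, 1, 2, 3}, cpu_exclusive=True))
b = top.add_child(Cpuset("b", {4, 5, 6, 7}))
```

In this model `a` passes the exclusive check against its sibling `b`, yet `ancestors_overlapping(a)` still reports `top`: any task attached to the parent may run on the CPUs that `a` calls exclusive.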

I take your isolated cpusets as a reasonable attempt to provide what's really wanted. I had avoided simple, system-wide exclusivity because I really wanted cpusets to be hierarchical. One should be able to subdivide and manage one subtree of the cpuset hierarchy, oblivious to what someone else is doing with a disjoint subtree. Your work shows how to provide a stronger form of isolation (exclusivity) without abandoning the hierarchical structure.

There are three directions we could go from here. I am not yet decided between them:

1) Remove cpu and mem exclusive flags - they are of limited use.

2) Leave code as is.

3) Extend the exclusive capability to include isolation from parents, along the lines of your patch.

If I were redoing cpusets from scratch, I might not include the exclusive feature at all - not sure. But it's cheap, at least in terms of code, and of some use to some users. So I would choose (2) over (1), given where we are now. The main cost at present of the exclusive flags is the cost in understanding - they tend to confuse people at first glance, due to their somewhat unusual approach.

If we go with (3), then I'd like to consider the overall design of this a bit more. Your patch, as is common for patches, attempts to work within the current framework, minimizing change. Better to take a step back and consider what would have been the best design as if the past didn't matter, then with that clearly in mind, ask how best to get there from here.

I don't think we would have both isolated and exclusive flags, in the 'ideal design.' The exclusive flags are essentially half (or a third) of what's needed, and the isolated flags and masks the rest of it.

Essentially, your patch replaces the single set of CPUs in a cpuset with three, related sets:

 A] the set of all CPUs managed by that cpuset
 B] the set of CPUs allowed to tasks attached to that cpuset
 C] the set of CPUs isolated for the dedicated use of some descendant

Sets [B] and [C] form a partition of [A] -- their intersection is empty, and their union is [A].
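
The invariant can be stated in a few lines of Python. This is only an illustration of the set relationship; the names `managed`, `allowed`, `isolated` and `is_partition` are hypothetical and do not come from the patch.

```python
def is_partition(managed, allowed, isolated):
    """True if [B] and [C] are disjoint and together make up [A]."""
    return allowed.isdisjoint(isolated) and (allowed | isolated) == managed


managed = {0, 1, 2, 3, 4, 5, 6, 7}  # [A] all CPUs managed by the cpuset
isolated = {6, 7}                   # [C] CPUs dedicated to some descendant
allowed = managed - isolated        # [B] CPUs left for tasks attached here
```

Deriving [B] as [A] minus [C] keeps the invariant true by construction; storing all three sets independently would mean re-checking it on every change.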

Your current presentation of these sets of CPUs shows set [B] in the cpus file, followed by set [C] in brackets, if I am recalling correctly. This format changes the format of the current cpus_allowed file, and it violates the preference for a single value or vector per file. I would like to consider alternatives.

Your code automatically updates [C] if the child cpuset adds or removes CPUs from those it manages in isolation (though I am not sure that your code manages this change all the way back up the hierarchy to the top cpuset, and I am wondering if perhaps your code should be doing this, as noted in my detailed comments on your patch earlier today.)

I'd be tempted, if taking this approach (3), to consider a couple of alternatives.

As I spelled out a few days ago, one could mark some cpusets that form a partition of the system's CPUs, for the purposes of establishing isolated scheduler domains, without requiring the above three related sets per cpuset instead of one. I am still unsure how much of your motivation is the need to make the scheduler more efficient by establishing useful isolated sched domains, and how much is the need to keep the usage of CPUs by various jobs isolated, even from tasks attached to parent cpusets.
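
As a rough sketch of what "form a partition of the system's CPUs" would require of the marked cpusets, the Python below reports any overlaps and any CPUs left uncovered. The function name `partition_problems` and the cpuset names in the usage note are made up for illustration; nothing here reflects how the kernel would actually build sched domains.

```python
def partition_problems(all_cpus, marked):
    """Check whether the marked cpusets partition all_cpus.

    marked maps a cpuset name to its set of CPUs. Returns a pair
    (overlaps, uncovered); both are empty when the marked cpusets
    form a partition.
    """
    overlaps = []
    seen = set()
    for name in sorted(marked):
        cpus = marked[name]
        shared = seen & cpus
        if shared:
            # this cpuset reuses CPUs already claimed by an earlier one
            overlaps.append((name, sorted(shared)))
        seen |= cpus
    uncovered = sorted(set(all_cpus) - seen)
    return overlaps, uncovered
```

For an 8-CPU system, marking `batch` = {0,1,2,3}, `rt` = {4,5} and `misc` = {6,7} yields no problems, while marking only `batch` = {0,1,2,3,4} and `rt` = {4,5} reports CPU 4 as shared and CPUs 6 and 7 as uncovered.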

One can obtain the job isolation just in user code - if you don't want a task to use a parent cpuset's access to your isolated cpuset, then simply don't attach a task to the parent cpuset. I do not understand yet how strong your requirement is to have the _kernel_ enforce that there are not tasks in a parent cpuset which could intrude on the non-isolated resources of a child. I provide (non open source) user level tools to my users which enable them to conveniently ensure that there are no such unwanted tasks, so they don't have a problem with a parent cpuset's CPUs overlapping a cpuset that they are using for an isolated job. Perhaps I could persuade my employer that it would be appropriate to open source these tools.
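
For illustration only, a user-level check along these lines might look like the Python sketch below. It is not the tooling mentioned above; `ancestors_with_tasks` is a hypothetical helper, and it assumes the cpuset filesystem is mounted somewhere (conventionally /dev/cpuset) with one `tasks` file per cpuset directory listing the attached pids.

```python
import os


def ancestors_with_tasks(mount, cpuset_path):
    """Return the ancestors of cpuset_path that have tasks attached.

    mount is where the cpuset filesystem is mounted (assumed, e.g.
    /dev/cpuset); cpuset_path is relative to it, e.g. "/jobs/j1".
    Ancestors are listed nearest first; "/" is the top cpuset.
    """
    parts = [p for p in cpuset_path.strip("/").split("/") if p]
    busy = []
    # Walk from the immediate parent up to the top cpuset.
    for depth in range(len(parts) - 1, -1, -1):
        rel = "/".join(parts[:depth])
        tasks_file = os.path.join(mount, rel, "tasks")
        try:
            with open(tasks_file) as f:
                pids = f.read().split()
        except OSError:
            continue  # unreadable or missing: nothing to report here
        if pids:
            busy.append("/" + rel)
    return busy


# Example (needs a mounted cpuset filesystem):
#   ancestors_with_tasks("/dev/cpuset", "/jobs/j1")
```

What to do about a non-empty result is left to the caller. The top cpuset in particular normally has tasks attached, so a real tool would need some policy for it.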

In any case, going with (3) would result in _one_ attribute, not two (both exclusive and isolated, with overlapping semantics, which is confusing).

-- 
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401