[[Category:Community]]

Latest revision as of 20:23, 21 October 2011

19.09.2006 Containers(V2) by Rohit Seth (from Google) based on CPUsets

The over-the-limit memory handler is called when the number of pages (anon + pagecache) exceeds the limit. Currently, this handler scans the mappings and tasks belonging to the container (file and anonymous) and tries to deactivate pages. If the number of page cache pages is also high, it also invalidates mappings.
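The trigger condition described above can be modelled in a few lines of userspace C. This is only an illustrative sketch, not code from the patch: the struct, the function names and the pagecache_high threshold are all assumptions made for the example, and "deactivating" pages is reduced to recording how many pages are over the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one container's memory accounting. */
struct ctn_model {
    size_t anon_pages;           /* anonymous pages charged */
    size_t pagecache_pages;      /* page cache pages charged */
    size_t page_limit;           /* limit on anon + pagecache */
    size_t pagecache_high;       /* assumed threshold for "high" page cache */
    size_t deactivate_requests;  /* pages the handler asked to deactivate */
    bool mappings_invalidated;   /* set when mappings were invalidated */
};

/* Total pages counted against the limit. */
static size_t ctn_total(const struct ctn_model *c)
{
    return c->anon_pages + c->pagecache_pages;
}

/* Stand-in for the over-the-limit handler: record what it would attempt. */
static void ctn_over_limit(struct ctn_model *c)
{
    assert(ctn_total(c) > c->page_limit);
    c->deactivate_requests += ctn_total(c) - c->page_limit;
    if (c->pagecache_pages > c->pagecache_high)
        c->mappings_invalidated = true;
}

/* Charge pages and invoke the handler only when the limit is exceeded. */
static void ctn_charge(struct ctn_model *c, size_t anon, size_t cache)
{
    c->anon_pages += anon;
    c->pagecache_pages += cache;
    if (ctn_total(c) > c->page_limit)
        ctn_over_limit(c);
}
```

In this model a charge that stays within page_limit leaves the container untouched; only a charge that pushes anon + pagecache past the limit reaches the handler.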

This patchset extracts the process grouping code from cpusets into a generic container system, and makes the cpusets code a client of the container system.

It also provides a very simple additional container subsystem to do per-container CPU usage accounting; this is primarily to demonstrate use of the container subsystem API, but is useful in its own right.

- an example patch implementing the BeanCounters core and numfiles counters over generic containers. The addition of the BeanCounters code unifies the three main process grouping abstractions (Cpusets, ResGroups and BeanCounters).

- a patch splitting Cpusets into two independently groupable subsystems, Cpusets and Memsets.

- support for a subsystem to keep a container alive via refcounts (e.g. the BeanCounters numfiles counter has a reference to the beancounter object from each file charged to that beancounter, so needs to be able to keep the beancounter alive until the file is destroyed)

- added more details about multiple hierarchy support in the documentation

- reduced the per-task memory overhead to one pointer (previously it was one pointer for each hierarchy). Now each task has a pointer to a container_group, which holds the pointers to the containers (one per active hierarchy) that the task is attached to and the associated per-subsystem state (one per active subsystem). This container group is shared (with reference counts) between all tasks that have the same set of container mappings.

- added API support for binding/unbinding subsystems to/from active hierarchies, by remounting with -oremount,<new-subsys-list>. Currently this fails with EBUSY if the hierarchy has any child containers; full implementation support is left to a later patch.

- added a bind() subsystem callback to indicate when a subsystem is moved between hierarchies

- added container_clone(subsys, task), which creates a child container for the hierarchy that the specified subsystem is bound to, and moves the given task into that container. An example use of this would be in sys_unshare, which could, if the namespace container subsystem is active, create a child container when the new namespace is created.

- temporarily removed the "release agent" support. It's currently only used by CPUsets, and intrudes somewhat on the per-container reference counting. If necessary it can be re-added, either as a generic subsystem feature or a CPUset-specific feature, via a kernel thread that periodically polls containers that have been designated as notify_on_release to see if they are releasable.

- Removed the config-time choice of the number of supported hierarchies - this is now completely dynamic; new hierarchies are allocated on demand, and freed when no longer in use.

- Subsystems are now registered at compile-time in linux/container_subsys.h. This allows for faster access to subsystem state since the id is a compile-time constant, so there's only a single extra pointer dereference compared to having a pointer directly in the task_struct. It also avoids wasting space with unused subsystem pointers.

- Removed the container pointers from container_group - this results in a structure very similar to Srivatsa Vaddagiri's rcfs approach. (RCFS uses the nsproxy object rather than the container_group object; merging container_group and nsproxy would be pretty straightforward if desired).

- Moved callback_mutex out of the container subsystem so that it is once again purely within the cpuset subsystem. Renamed manage_mutex to container_mutex.

- Condensed post_attach_task() into attach_task() now that callback_mutex is purely within cpuset.c

- A lattice of lists linking tasks to their css_groups and css_groups to their containers has been added to support more efficient iteration across the member tasks of a container.

- Support for the cpusets "release agent" functionality has been added back in; this is based on a workqueue concept similar to the changes that Cliff Wickman has been pushing for supporting CPU hot-unplug.

- Based on 2.6.22-rc6-mm1 (minus existing container patches, see below)

- Rolled in various fix/tidy patches contributed by akpm and others

- Reorganisation of the mount/unmount code to use sget(); the new approach is modelled on the NFS superblock code. This fixes some potential lock inversions pointed out by lockdep.

- Fix various lockdep warnings

- Changed the create() subsystem callback to return a pointer to the new state object rather than updating the subsystem pointer in the container directly.

- Changed container_add_file() to automatically prefix the subsystem name (and a period) on to all container files unless the filesystem is mounted with the "noprefix" option (intended for use by the legacy cpuset filesystem emulation).

- Added a release_agent= mount option to allow the release agent path to be specified at mount time.

- css_put() is now completely non-blocking

- css_get()/css_put() avoid taking/dropping reference counts on the root state since this can't be freed anyway; this saves some atomic ops
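The last two items can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: struct css_model, the is_root flag and both function names are hypothetical, and C11 atomics stand in for the kernel's atomic operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-subsystem state; not the kernel's struct. */
struct css_model {
    atomic_int refcnt;  /* references held on this state object */
    bool is_root;       /* root state is never freed, so it is never counted */
};

/* Take a reference; a no-op for the root state, which saves an atomic op. */
static void css_model_get(struct css_model *css)
{
    if (!css->is_root)
        atomic_fetch_add(&css->refcnt, 1);
}

/* Drop a reference; a single atomic decrement at most, so it never blocks. */
static void css_model_put(struct css_model *css)
{
    if (!css->is_root) {
        int old = atomic_fetch_sub(&css->refcnt, 1);
        assert(old > 0);  /* catches an unbalanced put in this model */
        (void)old;
    }
}
```

Because the root state cannot be freed, skipping its counter changes nothing observable; tasks attached only to root containers simply avoid the atomic traffic.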

API changes (for subsystem writers):

1) return your new css object from the create() callback

2) remove the subsystem name prefix from your cftype structures

3) pass your subsystem pointer as an additional new parameter to container_add_file() and container_add_files()
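A rough userspace sketch of what these three changes mean for a subsystem follows. All names here (css_model, subsys_model, demo_create, model_file_name, the "demo" subsystem and its "usage" file) are hypothetical and only mirror the shape of the API described above; the real callbacks take kernel-specific arguments and report errors differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical generic per-subsystem state, embedded in subsystem data. */
struct css_model {
    int refcnt;
};

/* Hypothetical subsystem descriptor: create() returns the new state. */
struct subsys_model {
    const char *name;
    struct css_model *(*create)(void);
};

/* Subsystem-private state with the generic part embedded first. */
struct demo_state {
    struct css_model css;
    unsigned long usage;
};

/* (1) Return the new state object instead of storing it in the container.
 * The model returns NULL on allocation failure. */
static struct css_model *demo_create(void)
{
    struct demo_state *st = calloc(1, sizeof(*st));
    return st ? &st->css : NULL;
}

/* (2) and (3): the file name no longer carries the prefix; the core adds
 * "<subsys>." itself from the subsystem passed in, unless "noprefix" is set. */
static int model_file_name(char *buf, size_t len,
                           const struct subsys_model *ss,
                           const char *cft_name, bool noprefix)
{
    assert(buf && ss && cft_name);
    if (noprefix)
        return snprintf(buf, len, "%s", cft_name);
    return snprintf(buf, len, "%s.%s", ss->name, cft_name);
}
```

With a subsystem named "demo" and a file named "usage", this model produces "demo.usage" on a normal mount and plain "usage" when the noprefix behaviour is requested.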