ext4 now supports block sizes of up to 1 MB, which considerably decreases [[https://lwn.net/Articles/469821/|the time spent doing block allocations]] and reduces fragmentation. These new block sizes must be set at creation time, using the mkfs -C option (requires e2fsprogs 1.42). This feature is not backwards compatible with older kernels. Code: [[http://git.kernel.org/linus/281b59959707dfae03ce038cdf231bf4904e170c|(commit 1]], [[http://git.kernel.org/linus/bab08ab9646288f1b0b72a7aaeecdff94bd62c18|2]], [[http://git.kernel.org/linus/7137d7a48e2213eb1f6d6529da14c2ed3706b795|3]], [[http://git.kernel.org/linus/49f7f9af4bb4d7972f3a35a74877937fec9f622d|4]], [[http://git.kernel.org/linus/fd034a84e1ea5c8c8d159cd2089c32e792c269b0|5]], [[http://git.kernel.org/linus/d5b8f31007a93777cfb0603b665858fb7aebebfc|6]], [[http://git.kernel.org/linus/3212a80a58062056bb922811071062be58d8fee1|7]], [[http://git.kernel.org/linus/53accfa9f819c80056db6f03f9c5cfa4bcba1ed8|8]], [[http://git.kernel.org/linus/84130193e0e6568dfdfb823f0e1e19aec80aff6e|9]], [[http://git.kernel.org/linus/4d33b1ef10995d7ba6191d67456202c697a92a32|10]], [[http://git.kernel.org/linus/0aa060000e83ca3d09ddc446a7174fb0820d99bc|11]], [[http://git.kernel.org/linus/5704265188ffe4290ed73b3cb685206c3ed8209d|12]], [[http://git.kernel.org/linus/24aaa8ef4e2b5764ada1fc69787e2fbd4f6276e5|13]], [[http://git.kernel.org/linus/f975d6bcc7a698a10cc755115e27d3612dcfe322|14]], [[http://git.kernel.org/linus/27baebb849d46d901e756e6502b0a65a62e43771|15]], [[http://git.kernel.org/linus/7b415bf60f6afb0499fd3dc0ee33444f54e28567|16]], [[http://git.kernel.org/linus/6f16b60690ba04cf476480a6f19b204e4b95b4a6|17)]]

Scrub read-ahead:: Scrubbing (the process of checking all the checksums of the filesystem) now uses read-ahead to improve performance. The average disk bandwidth utilisation on a test volume rose from 70% to 90%. On another volume, the time for a test run went down from 89 seconds to 43 seconds. Code: [[http://git.kernel.org/linus/ab0fff03055d2d1b01a7581badeba18db9c4f55c|(commit 1]], [[http://git.kernel.org/linus/90519d66abbccc251d14719ac76f191f70826e40|2]], [[http://git.kernel.org/linus/7414a03fbf9e75fbbf2a3c16828cd862e572aa44|3]], [[http://git.kernel.org/linus/7a26285eea8eb92e0088db011571d887d4551b0f|4)]]

Log of past tree roots:: Btrfs now stores information about most of the tree roots from the last four commits in the filesystem superblock. A "-o recovery" mount option has been added that lets a user fall back on this root history log when the filesystem is not able to read the tree of tree roots, the extent tree root, the device tree root or the csum root. Code: [[http://git.kernel.org/linus/af31f5e5b84b5bf2bcec464153a5130b170b2770|(commit)]]

Detailed corruption messages:: Btrfs has always had [[https://btrfs.wiki.kernel.org/articles/b/t/r/Btrfs_design.html|"back references"]] that make it possible to find which files or b-trees actually reference a given block, but until now walking those references has been a manual process. Code to follow these backrefs has been added, with improved messages as a result. For example, after scribbling over the blocks of one file on the disk and starting a scrub, instead of just reporting that block xxyyzz is bad, the kernel will now print this: Code: [[http://git.kernel.org/linus/a542ad1bafc7df9fc16de8a6894b350a4df75572|(commit 1]], [[http://git.kernel.org/linus/558540c17771eaf89b1a3be39aa2c8bc837da1a6|2)]]

Manual inspection of the filesystem:: As part of the previous feature, some code has also been added to allow manual inspection of the filesystem from userspace utilities. To find the file that belongs to extent 5085110272, you can run: Code: [[http://git.kernel.org/linus/d7728c960dccf775b92f2c4139f1216275a45c44|(commit)]]

CPU bandwidth control solves this problem by allowing an explicit maximum limit to be set on the CPU bandwidth a group may use. The bandwidth allowed for a group of processes is specified using a quota and a period. Within each given "period" (microseconds), a group is allowed to consume only up to "quota" microseconds of CPU time. When the CPU bandwidth consumption of a group exceeds this limit (for that period), the tasks belonging to its hierarchy are throttled and not allowed to run again until the next period. Documentation: [[http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/scheduler/sched-bwc.txt;hb=HEAD|Documentation/scheduler/sched-bwc.txt]]. Code: [[http://git.kernel.org/linus/953bfcd10e6f3697233e8e5128c611d275da39c1|(commit 1]], [[http://git.kernel.org/linus/ab84d31e15502fb626169ba2663381e34bf965b2|2]], [[http://git.kernel.org/linus/a790de99599a29ad3f18667530cf4b9f4b7e3234|3]], [[http://git.kernel.org/linus/ec12cb7f31e28854efae7dd6f9544e0a66379040|4]], [[http://git.kernel.org/linus/58088ad0152ba4b7997388c93d0ca208ec1ece75|5]], [[http://git.kernel.org/linus/a9cf55b2861057a213e610da2fec52125439a11d|6]], [[http://git.kernel.org/linus/85dac906bec3bb41bfaa7ccaa65c4706de5cfdf8|7]], [[http://git.kernel.org/linus/671fd9dabe5239ad218c7eb48b2b9edee50250e6|8]], [[http://git.kernel.org/linus/8277434ef1202ce30315f8edb3fc760aa6e74493|9]], [[http://git.kernel.org/linus/64660c864f46202b932b911a69deb09805bdbaf8|10]], [[http://git.kernel.org/linus/5238cdd3873e67a98b28c1161d65d2a615c320a3|11]], [[http://git.kernel.org/linus/8cb120d3e41a0464a559d639d519cef563717a4e|12]], [[http://git.kernel.org/linus/d3d9dc3302368269acf94b7381663b93000fe2fe|13]], [[http://git.kernel.org/linus/e8da1b18b32064c43881bceef0f051c2110c9ab9|14]], [[http://git.kernel.org/linus/d8b4986d3dbc4fabc2054d63f1d31d6ed2fb1ca8|15]], [[http://git.kernel.org/linus/88ebc08ea9f721d1345d5414288a308ea42ac458|16)]]

Linux 3.2 adds experimental support for thin provisioning in the DM layer. Users will be able to create multiple thinly provisioned volumes out of a storage pool. Another significant feature included in the thin-provision DM target is support for an arbitrary depth of recursive snapshots (snapshots of snapshots of snapshots...), which avoids degradation with depth. Code: [[http://git.kernel.org/linus/95d402f057f2e208e4631893f6cd4a59c7c05e41|(commit 1]], [[http://git.kernel.org/linus/3241b1d3e0aaafbfcd320f4d71ade629728cc4f4|2]], [[http://git.kernel.org/linus/991d9fa02da0dd1f843dc011376965e0c8c6c9b5|3)]]

A critical part of the writeback code is deciding how much data waiting to be written can be held in RAM. In this kernel, the algorithms that make that decision have been rewritten (check the LWN article for more details). As a result, IO seeks and CPU contention should be greatly reduced. Users will notice a more responsive system during heavy writeback; for example, "killall dd" will take effect instantly. Users may also notice much smoother pause times in workloads that call the write() syscall inside a loop, and also in NFS, JBOD and concurrent dd's. Lock contention and cache bouncing in concurrent IO workloads have been much improved. Code: [[http://git.kernel.org/linus/c8e28ce049faa53a470c132893abbc9f2bde9420|(commit 1]], [[http://git.kernel.org/linus/6c14ae1e92c77eabd3e7527cf2e7836cde8b8487|2]], [[http://git.kernel.org/linus/af6a311384bce6c88e15c80ab22ab051a918b4eb|3]], [[http://git.kernel.org/linus/be3ffa276446e1b691a2bf84e7621e5a6fb49db9|4]], [[http://git.kernel.org/linus/7381131cbcf7e15d201a0ffd782a4698efe4e740|5]], [[http://git.kernel.org/linus/9d823e8f6b1b7b39f952d7d1795f29162143a433|6]], [[http://git.kernel.org/linus/143dfe8611a63030ce0c79419dc362f7838be557|7]], [[http://git.kernel.org/linus/c8462cc9de9e92264ec647903772f6036a99b286|8]], [[http://git.kernel.org/linus/57fc978cfb61ed40a7bbfe5a569359159ba31abd|9]], [[http://git.kernel.org/linus/8927f66c4ede9a18b4b58f7e6f9debca67065f6b|10]], [[http://git.kernel.org/linus/b00949aa2df9970a912bf060bc95e99da356881c|11]], [[http://git.kernel.org/linus/b48c104d2211b0ac881a71f5f76a3816225f8111|12]], [[http://git.kernel.org/linus/ece13ac31bbe492d940ba0bc4ade2ae1521f46a5|13]], [[http://git.kernel.org/linus/1df647197c5b8aacaeb58592cba9a1df322c9000|14)]]

Work has also been done to reduce filesystem writeback from page reclaim, which improves performance in many cases. Code: [[http://git.kernel.org/linus/ee72886d8ed5d9de3fa0ed3b99a7ca7702576a96|(commit 1]], [[http://git.kernel.org/linus/a18bba061c789f5815c3efc3c80e6ac269911964|2]], [[http://git.kernel.org/linus/94054fa3fca1fd78db02cb3d68d5627120f0a1d4|3]], [[http://git.kernel.org/linus/966dbde2c208e07bab7a45a7855e1e693eabe661|4]], [[http://git.kernel.org/linus/f84f6e2b0868f198f97a32ba503d6f9f319a249a|5]], [[http://git.kernel.org/linus/92df3a723f84cdf8133560bbff950a7a99e92bc9|6]], [[http://git.kernel.org/linus/49ea7eb65e7c5060807fb9312b1ad4c3eab82e2c|7)]]

This system works well, but in some cases where packets are lost, it takes too long to recover the maximum speed. Google has developed an alternative recovery algorithm, called "Proportional Rate Reduction", which improves latency and the time to recover. For more information, you can check [[http://tools.ietf.org/html/draft-mathis-tcpm-proportional-rate-reduction-01|an IETF draft]], two slide decks ([[http://www.ietf.org/proceedings/80/slides/tcpm-6.pdf|1]], [[http://tools.ietf.org/agenda/81/slides/tcpm-2.pdf|2]]), or the [[https://lwn.net/Articles/458610/|LWN article]]. Code: [[http://git.kernel.org/linus/a262f0cdf1f2916ea918dc329492abb5323d9a6c|(commit)]]

Cross memory attach adds two syscalls, process_vm_readv and process_vm_writev, which allow a process to read from and write to another process's address space. The basic idea behind cross memory attach is to allow MPI programs doing intra-node communication to do a single copy of the message rather than a double copy of the message via shared memory. Code: [[http://git.kernel.org/linus/fcf634098c00dd9cd247447368495f0b79be12d1|(commit)]]

* devfreq: devfreq is a generic DVFS framework that can be registered for a device with OPP support in order to let the governor provided to DEVFREQ choose an operating frequency based on the OPP's list and the policy given with DEVFREQ [[http://git.kernel.org/linus/a3c98b8b2ede1f4230f49f9af7135cd902e71e83|(commit)]], [[http://git.kernel.org/linus/9005b65099ee4f14b6be691c4574612fe947531a|(commit)]], [[http://git.kernel.org/linus/ce26c5bb9569d8b826f01b8620fc16d8da6821e9|(commit)]]
* Improve performance of LZO/plain hibernation, checksum image [[http://git.kernel.org/linus/081a9d043c983f161b78fdc4671324d1342b86bc|(commit)]]
* Include storage keys in hibernation image on s390 [[http://git.kernel.org/linus/85055dd805f0822f13f736bee2a521e222c38293|(commit)]]
* Implement per-device PM QoS constraints [[http://git.kernel.org/linus/91ff4cb803df6de9114351b9f2f0f39f397ee03e|(commit)]]

Summary: This release includes support for ext4 block sizes bigger than 4 KB and up to 1 MB, which improves performance with big files; btrfs has been updated with faster scrubbing, automatic backup of critical filesystem metadata and tools for manual inspection of the filesystem; the process scheduler has added support for setting upper limits on CPU time; desktop responsiveness in the presence of heavy writes has been improved; TCP has been updated to include an algorithm which speeds up the recovery of the connection after lost packets; the profiling tool "perf top" has added support for live inspection of tasks and libraries and for viewing the annotated assembly code; the Device Mapper has added support for 'thin provisioning' of storage; and a new architecture has been added: the Hexagon DSP processor from Qualcomm. Other drivers and small improvements and fixes are also available in this release.

1. Prominent features in Linux 3.2

1.1. ext4: Support for bigger block sizes

The maximum size of a filesystem block in ext4 has always been 4 KB on x86 systems. But the storage capacity of modern hard disks is growing fast, and as disks grow, so does the overhead of using such a small block size. Small block sizes benefit users who have many small files, because the space is used more efficiently, but people who use large files would benefit from larger block sizes.

ext4 now supports block sizes of up to 1 MB, which considerably decreases the time spent doing block allocations and reduces fragmentation. These new block sizes must be set at creation time, using the mkfs -C option (requires e2fsprogs 1.42). This feature is not backwards compatible with older kernels. Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17)
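The trade-off described above can be made concrete with a little arithmetic. The following is a minimal sketch, not anything taken from ext4 itself: the helper names `allocation_units` and `wasted_bytes` are hypothetical, and the model simply assumes that a file occupies a whole number of allocation units.

```python
KIB = 1024
MIB = 1024 * KIB


def allocation_units(file_size: int, unit_size: int) -> int:
    """Number of allocation units needed to hold file_size bytes (rounded up)."""
    return -(-file_size // unit_size)  # integer ceiling division


def wasted_bytes(file_size: int, unit_size: int) -> int:
    """Bytes allocated but unused in the last, partially filled unit."""
    return allocation_units(file_size, unit_size) * unit_size - file_size


if __name__ == "__main__":
    # Hypothetical example files: one large, one small.
    for label, size in [("1 GiB file", 1024 * MIB), ("10 KiB file", 10 * KIB)]:
        for unit in (4 * KIB, MIB):
            print(f"{label}, {unit // KIB} KiB units: "
                  f"{allocation_units(size, unit)} allocations, "
                  f"{wasted_bytes(size, unit)} bytes wasted")
```

Under these assumptions the large file needs far fewer allocations with 1 MB units, while the small file wastes most of its single unit, which is the balance the paragraph above describes.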

Scrub read-ahead

Scrubbing (the process of checking all the checksums of the filesystem) now uses read-ahead to improve performance. The average disk bandwidth utilisation on a test volume rose from 70% to 90%. On another volume, the time for a test run went down from 89 seconds to 43 seconds. Code: (commit 1, 2, 3, 4)

Log of past tree roots

Btrfs now stores information about most of the tree roots from the last four commits in the filesystem superblock. A "-o recovery" mount option has been added that lets a user fall back on this root history log when the filesystem is not able to read the tree of tree roots, the extent tree root, the device tree root or the csum root. Code: (commit)

Detailed corruption messages

Btrfs has always had "back references" that make it possible to find which files or b-trees actually reference a given block, but until now walking those references has been a manual process. Code to follow these backrefs has been added, with improved messages as a result. For example, after scribbling over the blocks of one file on the disk and starting a scrub, instead of just reporting that block xxyyzz is bad, the kernel will now print this: Code: (commit 1, 2)

Manual inspection of the filesystem

As part of the previous feature, some code has also been added to allow manual inspection of the filesystem from userspace utilities. To find the file that belongs to extent 5085110272, you can run:

btrfs inspect logical 5085110272 /mnt

Or to find the filename for inode number 32583:

Code: (commit)

1.3. Process bandwidth controller

The process scheduler divides the available CPU bandwidth between all processes that need to run. There is no limit on how much CPU bandwidth each process gets if there is free bandwidth available, because all processes are supposed to want as much as possible. But apparently, some companies like Google have scenarios where this unbounded allocation of CPU bandwidth may lead to unacceptable utilization or latency variation.

CPU bandwidth control solves this problem by allowing an explicit maximum limit to be set on the CPU bandwidth a group may use. The bandwidth allowed for a group of processes is specified using a quota and a period. Within each given "period" (microseconds), a group is allowed to consume only up to "quota" microseconds of CPU time. When the CPU bandwidth consumption of a group exceeds this limit (for that period), the tasks belonging to its hierarchy are throttled and not allowed to run again until the next period. Documentation: Documentation/scheduler/sched-bwc.txt. Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
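To illustrate how quota and period interact, here is a small simulation. It is only a sketch of the rule stated above, not the scheduler's implementation: the function name `wall_time_us` is hypothetical, and it assumes a single CPU-bound group running alone on one CPU. In the kernel these two values are exposed per cgroup as cpu.cfs_quota_us and cpu.cfs_period_us.

```python
def wall_time_us(work_us: int, quota_us: int, period_us: int) -> int:
    """Wall-clock time (in microseconds) needed to finish work_us of CPU work
    when the group may run at most quota_us in every period_us window."""
    if quota_us <= 0 or period_us <= 0:
        raise ValueError("quota and period must be positive")
    budget = min(quota_us, period_us)  # one CPU cannot give more than the period
    elapsed = 0
    remaining = work_us
    while remaining > budget:
        remaining -= budget   # run until the quota is exhausted...
        elapsed += period_us  # ...then stay throttled until the next period
    return elapsed + remaining  # the last slice finishes within its period


if __name__ == "__main__":
    # 200 ms of CPU work with a 50 ms quota per 100 ms period (a 50% cap).
    print(wall_time_us(200_000, 50_000, 100_000))
    # The same work with a quota equal to the period is not slowed down.
    print(wall_time_us(200_000, 100_000, 100_000))
```

With a quota of half the period, the group ends up with roughly half of one CPU: the work above runs in four 50 ms slices separated by 50 ms of throttling.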

1.4. New architecture: Hexagon

The Hexagon processor is a general-purpose digital signal processor designed for high performance and low power across a wide variety of applications. It merges the numeric support, parallelism, and wide computation engine of a DSP, with the advanced system architecture of a modern microprocessor.

1.5. Thin provisioning and recursive snapshots in the Device Mapper

Typically, provisioning storage capacity to multiple users can be inefficient. For example, if 10 users need 10 GB each, you will need 100 GB of storage capacity. These users, however, very probably won't use most of that storage space. Let's suppose that, on average, they only use 50% of their allocated space: only 50 GB will be used, and the other 50 GB will sit idle.

Thin provisioning makes it possible to assign to all users combined more storage capacity than the total storage capacity of the system. In the previous case, you could buy only 50 GB of storage, let each user have 10 GB of theoretical storage space (100 GB in total), and have no problems, because the 50 GB you bought are enough to satisfy the real demand for storage. And if users increase the demand, you can add more storage capacity. Thanks to thin provisioning, you can optimize your storage investment and avoid over-provisioning.
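The example above can be written as a toy model. This is a conceptual sketch only and says nothing about how the DM target is implemented: the `ThinPool` class and its methods are hypothetical names, and sizes are tracked in whole gigabytes for simplicity.

```python
class ThinPool:
    """Toy thin-provisioning pool: space is consumed on write, not on creation."""

    def __init__(self, physical_gb: int) -> None:
        self.physical_gb = physical_gb
        self.volumes = {}  # volume name -> [virtual size in GB, used GB]

    def create_volume(self, name: str, virtual_gb: int) -> None:
        # Creating a volume only records a promise; no space is reserved.
        self.volumes[name] = [virtual_gb, 0]

    def provisioned_gb(self) -> int:
        return sum(virtual for virtual, _ in self.volumes.values())

    def used_gb(self) -> int:
        return sum(used for _, used in self.volumes.values())

    def write(self, name: str, gb: int) -> None:
        virtual, used = self.volumes[name]
        if used + gb > virtual:
            raise ValueError(f"{name}: volume is full")
        if self.used_gb() + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add more physical storage")
        self.volumes[name][1] = used + gb


if __name__ == "__main__":
    pool = ThinPool(physical_gb=50)
    for i in range(10):
        pool.create_volume(f"user{i}", virtual_gb=10)
        pool.write(f"user{i}", 5)  # each user fills half of the volume
    print(pool.provisioned_gb(), pool.used_gb())
```

In this model the pool promises 100 GB while holding 50 GB, and a further write fails until the administrator adds capacity, which is the situation the paragraph above describes as "if users increase the demand, you can add more storage capacity".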

Linux 3.2 adds experimental support for thin provisioning in the DM layer. Users will be able to create multiple thinly provisioned volumes out of a storage pool. Another significant feature included in the thin-provision DM target is support for an arbitrary depth of recursive snapshots (snapshots of snapshots of snapshots...), which avoids degradation with depth. Code: (commit 1, 2, 3)

"Writeback" is the process of writing buffered data from RAM to the disk, and in this context throttling means temporarily blocking processes so that they stop creating new data that needs to be written, until the current data has been written to the disk.

A critical part of the writeback code is deciding how much data waiting to be written can be held in RAM. In this kernel, the algorithms that make that decision have been rewritten (check the LWN article for more details). As a result, IO seeks and CPU contention should be greatly reduced. Users will notice a more responsive system during heavy writeback; for example, "killall dd" will take effect instantly. Users may also notice much smoother pause times in workloads that call the write() syscall inside a loop, and also in NFS, JBOD and concurrent dd's. Lock contention and cache bouncing in concurrent IO workloads have been much improved. Code: (commit 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)

Work has also been done to reduce filesystem writeback from page reclaim, which improves performance in many cases. Code: (commit 1, 2, 3, 4, 5, 6, 7)

1.7. TCP Proportional Rate Reduction

TCP tries to achieve the maximum bandwidth of a network link by increasing the send rate until the network link starts losing packets. When a packet is lost, TCP slows down and then tries to slowly increase the speed again.
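The behaviour described above is commonly modelled as additive increase, multiplicative decrease. The sketch below is that textbook simplification and not the kernel's congestion control code, nor the Proportional Rate Reduction algorithm itself; the function name `aimd` and its parameters are hypothetical, and "loss" is modelled as simply exceeding a fixed link capacity.

```python
def aimd(capacity: float, rounds: int, start: float = 1.0,
         increase: float = 1.0, decrease: float = 0.5) -> list:
    """Return the send rate used in each round trip.

    The rate grows by `increase` per round; once it exceeds `capacity`
    (the link drops packets), it is multiplied by `decrease`.
    """
    rate = start
    history = []
    for _ in range(rounds):
        history.append(rate)
        if rate > capacity:
            rate = max(start, rate * decrease)  # loss detected: slow down
        else:
            rate += increase                    # no loss: probe for more bandwidth
    return history


if __name__ == "__main__":
    print(aimd(capacity=4, rounds=8))
```

The resulting sawtooth shows why recovery time matters: after each loss the sender needs several round trips to climb back to the rate it had before.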

1.8. Improved live profiling tool "perf top"

The live profiling tool "perf top" has been rewritten and improved. Beyond the prettier output, it has the ability to navigate while data capture is going on, and the new ability to zoom into tasks and libraries. Users can even see annotated assembly code, hit enter on a CALLQ instruction and get moved to the called function's annotated assembly code. This works recursively, so users can explore the assembly code arbitrarily deep. Code: many different commits

1.9. Cross memory attach

Cross memory attach adds two syscalls, process_vm_readv and process_vm_writev, which allow a process to read from and write to another process's address space. The basic idea behind cross memory attach is to allow MPI programs doing intra-node communication to do a single copy of the message rather than a double copy of the message via shared memory. Code: (commit)
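As a rough illustration of the read side, the sketch below calls process_vm_readv through ctypes. Several things here are assumptions rather than facts from this page: it presumes a Linux system whose C library exports a process_vm_readv wrapper, the helper name `read_remote` and the `IoVec` class are hypothetical, and for simplicity the demo reads from the calling process itself, which needs no extra privileges. Reading another process requires ptrace-like permission over it, and some sandboxes block the syscall entirely.

```python
import ctypes
import os


class IoVec(ctypes.Structure):
    """Mirror of the C `struct iovec`: a base address plus a length."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


def read_remote(pid: int, address: int, length: int) -> bytes:
    """Copy `length` bytes starting at `address` in process `pid` in one step."""
    libc = ctypes.CDLL(None, use_errno=True)
    fn = libc.process_vm_readv  # raises AttributeError if the wrapper is missing
    fn.restype = ctypes.c_ssize_t
    fn.argtypes = [ctypes.c_int,
                   ctypes.POINTER(IoVec), ctypes.c_ulong,   # local iovec array
                   ctypes.POINTER(IoVec), ctypes.c_ulong,   # remote iovec array
                   ctypes.c_ulong]                          # flags, must be 0
    buf = ctypes.create_string_buffer(length)
    local = IoVec(ctypes.addressof(buf), length)
    remote = IoVec(address, length)
    copied = fn(pid, ctypes.byref(local), 1, ctypes.byref(remote), 1, 0)
    if copied < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.raw[:copied]


if __name__ == "__main__":
    message = ctypes.create_string_buffer(b"cross-memory")
    # Demo only: the "remote" process is ourselves.
    print(read_remote(os.getpid(), ctypes.addressof(message), 12))
```

The data moves directly from the source address space into the caller's buffer, which is the single copy the paragraph above contrasts with the two copies needed when going through shared memory.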

5. Networking

Support for transmission of IPv6 packets as well as the formation of IPv6 link-local addresses and statelessly autoconfigured addresses on top of IEEE 802.15.4 networks. For more information, see RFC 4944 and the document "Compression Format for IPv6 Datagrams in Low Power and Lossy Networks (6LoWPAN)" (commit)

NCI support. The NFC Controller Interface (NCI) is a standard communication protocol between an NFC Controller (NFCC) and a Device Host (DH), defined by the NFC Forum (commit), (commit)

B.A.T.M.A.N. ad hoc networking: implement AP-isolation on the receiver side (commit), implement AP-isolation on the sender side (commit)

af-iucv: The current transport mechanism for af_iucv is the z/VM offered communications facility IUCV. To provide equivalent support when running Linux in an LPAR, HiperSockets transport is added to the AF_IUCV address family (commit)

6. Device Mapper

7. Power management

devfreq: devfreq is a generic DVFS framework that can be registered for a device with OPP support in order to let the governor provided to DEVFREQ choose an operating frequency based on the OPP's list and the policy given with DEVFREQ (commit), (commit), (commit)