IOMMUs, "IO Memory Management Units", are hardware devices that translate device DMA addresses to machine addresses. Isolation-capable IOMMUs perform a valuable system service, preventing rogue devices from performing errant or malicious DMAs, thereby substantially increasing the system's reliability and availability. Without an IOMMU, a peripheral device could be programmed to overwrite any part of the system's memory. An isolation-capable IOMMU restricts a device so that it can only access the parts of memory it has been explicitly granted access to. Operating systems utilize IOMMUs to isolate device drivers; hypervisors utilize IOMMUs to grant secure direct hardware access to virtual machines. With the imminent publication of the PCI-SIG's IO Virtualization standard, as well as Intel and AMD's introduction of isolation-capable IOMMUs in all new servers, IOMMUs will become ubiquitous.

IOMMUs can impose a performance penalty due to the extra memory accesses required to perform DMA operations. The exact performance degradation depends on the IOMMU design, its caching architecture, the way it is programmed and the workload. In this paper, we present the performance characteristics of the Calgary and DART IOMMUs in Linux, both on bare metal and hypervisors. We measure the throughput and CPU utilization of several IO workloads with and without an IOMMU and analyze the results. We then discuss potential strategies for mitigating the IOMMU's costs. We conclude by presenting a set of optimizations we have implemented and the resulting performance improvements.

With Linux for the Sony PS3, the IBM QS2x blades, and the Toshiba Celleb platform having hit mainstream Linux distributions, programming for the Cell BE is becoming increasingly interesting for developers working on performance computing. This talk covers the concepts of the architecture and how to develop applications for it.

Most importantly, there will be an overview of new feature additions and latest developments, including:

Preemptive scheduling on SPUs (finally!): While it has been possible to run concurrent SPU programs for some time, only a very limited version of the scheduler was implemented. Now we have a full time-slicing scheduler with normal and real-time priorities, SPU affinity, and gang scheduling.

Using SPUs for offloading kernel tasks: There are a few compute-intensive tasks like RAID-6 or IPsec processing that can benefit from running partially on an SPU. Interesting aspects of the implementation are how to balance kernel SPU threads against user processing, how to efficiently communicate with the SPU from the kernel, and measurements to see if it is actually worthwhile.

Overlay programming: One significant limitation of the SPU is the size of the local memory, which is used for both its code and data. Recent compilers support overlays of code segments, a technique widely known in the previous century but mostly forgotten in Linux programming nowadays.

This paper will discuss the difficulties and methods involved in debugging the Linux kernel. Intermittent errors that occur once every few years on a single machine are hard to debug, but become a real problem when running across thousands of machines simultaneously. The more we scale to very large clusters, the more critical reliability becomes. In such environments, many of the normal debugging luxuries are gone (like a serial console, or any physical access), and we're forced to change to a different strategy to solve thorny intermittent race conditions.

We need (and have created) powerful but lightweight kernel tracing tools that are critical for cluster debugging, but also make powerful weapons in a smaller-scale environment, where they can help debug issues more quickly and less intrusively. Real-world usage examples will be included.

Cache memory compression (or compressed caching) was originally developed for desktop and server platforms, but has also attracted interest on embedded systems, where memory is generally a scarce resource and hardware changes bring additional cost and energy consumption. Cache memory compression brings a considerable advantage in input/output-intensive applications by providing a virtually larger cache for the local file system through compression algorithms. As a result, it increases the probability of finding the necessary data in RAM itself, avoiding the need to make slow calls to local storage. This work evaluates an Open Source implementation of cache memory compression applied to Linux on an embedded platform, dealing with the unavoidable processor and memory resource limitations as well as with existing architectural differences.

Until now, most of the focus in Linux CPU power management has been on active CPU power management: cpufreq changes the processor frequency and/or voltage, managing the CPU's performance levels and power consumption based on CPU load. Another dimension of CPU power management is CPU idle power. In general, focus is now shifting towards idle power (Energy Star), and new platforms/processors support multiple idle states with different power and wakeup-latency characteristics. Today most mobile processors support multiple idle states, with varying amounts of power consumed in those idle states, and each state has an entry/exit latency associated with it. This emphasis on idle power calls for a generic Linux kernel framework to manage idle CPUs.

This paper covers 'cpuidle' - an effort toward a generic idle framework in the Linux kernel. The goal is to have a clean interface for any platform to make use of different CPU idle levels and also to provide abstraction between idle-drivers and idle-governors allowing for independent development. The target audience includes those who have a general interest in idle processor power management and its impact on battery life, developers who would like to create new and better governors, and developers interested in utilizing the cpuidle infrastructure on new platforms.

Evolution and Diversity: The Meaning of Freedom and Openness in Linux - James Bottomley

Note: paper not in proceedings

2007 looks like being the year when Free Software and Open Source finally make their differences (which have been bubbling away under the surface for over a decade) manifest. For all of its differences with "free software", Linux consistently maintains the greatest amount of innovation of any of the Open Source Operating Systems. We'll take a whimsical and offbeat tour of the reasons why this might be so from the point of view of the maintainer of possibly the least popular (certainly the least used) kernel architecture.