Virtualization and the POWER5 Architecture

Over the years, IBM has substantially improved its proprietary RISC-based hardware, to the point where mainframe-style management components are now required to use the new architecture. Components such as the HMC (Hardware Management Console) and the Hypervisor (firmware that runs on the hardware and makes virtualization possible) are key elements of the architecture and necessary for building virtualized systems. The evolution of AIX and Linux, together with the introduction of IBM's POWER5 architecture, has helped midrange systems gain mainframe capabilities: reliability, virtualization, and performance.

Unlike the other RISC-based hardware vendors (Sun and HP), IBM has carried most of the features of its most powerful architecture into its Linux support. This is largely due to the Linux 2.6 kernel, which has brought Linux to the forefront. IBM added its own code to the SUSE and Red Hat kernels to support POWER5.

Advanced Power Virtualization (APV) helps decrease total cost of ownership (TCO) while allowing the use of shared I/O resources. A major APV feature available to p5 servers through POWER5 technology is micro-partitioning, which enables the creation of multiple virtual partitions on a single processor. You can tailor each virtual host to the resource requirements of a particular application. By micro-partitioning logical partitions (LPARs), you can take advantage of unused clock cycles and, as in the mainframe world, use as much of the overall CPU capacity as possible. Without partitioning, processing resources are typically underused, and traditional SMP partitioning requires allocating one or more entire processors to each partition. Micro-partitions can use increments as small as one-tenth of a processor. The result is higher utilization of system resources and lower TCO.

Hypervisor

The technology behind the virtualization capabilities of POWER5 systems is a firmware layer known as the POWER Hypervisor. The Hypervisor supports partitioning and dynamic resource movement across multiple operating systems; without it, there would be no virtualization. It is the foundation of the IBM Virtualization Engine, which is part of the overall POWER5 architecture. Compared with the version found in POWER4 processor-based systems, the POWER Hypervisor on POWER5 supports many advanced functions, including processor sharing, virtual I/O, and high-speed communication between partitions through virtual LANs. It also enables multiple operating systems to run on a single system, including AIX, Linux, and i5/OS (AS/400). Because it supports dynamic resource movement across these environments, clients can move processors, memory, and I/O between partitions as workloads shift.

The POWER5 processor supports special machine instructions that only the Hypervisor can use. If an operating system instance in a partition needs access to hardware, it must first call the Hypervisor. The Hypervisor gives the operating system privileged access to dedicated hardware facilities while protecting those facilities and the associated processor and memory locations. Like any virtualization layer, the Hypervisor adds some overhead, but not nearly as much as products such as VMware.

Advanced Power Virtualization

APV is a combination of hardware and software that supports and manages the virtual I/O environment on POWER5 systems. It provides several technologies:

Micro-Partitioning

Simultaneous Multithreading (SMT) Support

Shared Ethernet

Virtual SCSI Server

Partition Load Manager

The key ingredient that makes it work is an operating system kernel modified to support the Hypervisor and the p5 hardware; such kernels are available in AIX 5.3 and in supported Linux distributions.

Micro-partitioning (nothing new to IBM systems programmers) is mainframe-inspired technology that arrived on the midrange only with the introduction of POWER5 systems. It virtualizes CPUs so that multiple partitions can share them. One CPU can be split among as many as ten logical partitions, each of which can receive as little as one-tenth of a CPU.
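As a rough illustration of the one-tenth rule, the following Python sketch checks whether a set of micro-partition entitlements fits a shared processor pool. It encodes only the two limits stated above (at least one-tenth of a processor per partition, at most ten micro-partitions per processor); the function name and parameters are hypothetical, and the real HMC enforces additional rules that are not modeled here.

```python
from decimal import Decimal

# Limits taken from the text; everything else here is illustrative.
MIN_ENTITLEMENT = Decimal("0.1")   # smallest share a micro-partition can receive
MAX_PARTITIONS_PER_CPU = 10        # "as many as ten" micro-partitions per processor

def fits_shared_pool(entitlements, pool_cpus):
    """Check entitlements (in processors) against a pool of whole processors.

    Returns (ok, reason). Decimal avoids float rounding when summing tenths.
    """
    ents = [Decimal(str(e)) for e in entitlements]
    if any(e < MIN_ENTITLEMENT for e in ents):
        return False, "entitlement below 0.1 processor"
    if len(ents) > MAX_PARTITIONS_PER_CPU * pool_cpus:
        return False, "too many micro-partitions for the pool"
    if sum(ents) > pool_cpus:
        return False, "total entitlement exceeds pool capacity"
    return True, "ok"
```

For example, ten partitions of 0.1 each fit exactly on one processor, while an eleventh would exceed the ten-partition limit.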

Historically, Unix administrators have considered their systems CPU-bound once utilization exceeds 40 or 50%. POWER5 technology lets you use all the resources in your environment without worrying about partitions that are 80% busy. This, of course, assumes that you did not buy your hardware to sit idle. You can also make partitions uncapped, so that a busy partition can draw on shared-pool capacity that other partitions are not currently using. This lets a logical partition use more than its entitled capacity (EC) when its workload gets heavier. In my view, this is the most important feature of APV, as it really enables you to take full advantage of the POWER architecture. When using uncapped partitions, be aware of the licensing implications, because each independent software vendor (ISV) has its own method of accounting for licenses in a partitioned world.
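The following Python sketch is a simplified model of what uncapping means: every partition first gets up to its entitled capacity, and idle capacity in the shared pool is then handed only to uncapped partitions that want more. The `weight` field and the proportional split are assumptions made for illustration; this is not the POWER Hypervisor's actual dispatching algorithm.

```python
from dataclasses import dataclass

@dataclass
class Partition:
    name: str
    entitled: float      # entitled capacity (EC), in processors
    demand: float        # processors the workload would use right now
    capped: bool = True
    weight: int = 128    # hypothetical share weight, used only when uncapped

def allocate(partitions, pool_size):
    """Return {name: processors granted} for one moment in time (simplified model)."""
    # 1. Every partition receives up to its entitled capacity.
    alloc = {p.name: min(p.demand, p.entitled) for p in partitions}
    spare = pool_size - sum(alloc.values())
    # 2. Idle pool capacity goes only to uncapped partitions that still want more,
    #    split in proportion to weight; repeat until spare is gone or nobody wants it.
    while spare > 1e-9:
        hungry = [p for p in partitions
                  if not p.capped and p.weight > 0 and p.demand - alloc[p.name] > 1e-9]
        if not hungry:
            break
        total_weight = sum(p.weight for p in hungry)
        handed_out = 0.0
        for p in hungry:
            share = spare * p.weight / total_weight
            give = min(share, p.demand - alloc[p.name])
            alloc[p.name] += give
            handed_out += give
        spare -= handed_out
    return alloc
```

In this model a capped partition never exceeds its EC, no matter how busy it is, while an uncapped one can grow until its demand is met or the pool runs out.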

Virtual I/O (VIO) lets multiple logical partitions on the same server share a single I/O adapter. This consolidates I/O resources and minimizes the number of I/O adapters needed. Financially, VIO also provides a more economical I/O model, because shared physical resources are used more efficiently. Without VIO, each partition typically needs one I/O slot for disk attachment and another for network attachment, which limits the number of partitions a server can host.
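The slot arithmetic behind that limit is easy to sketch. This Python example assumes, as the paragraph does, two physical slots per partition without VIO, and assumes (hypothetically) two VIO Servers that each own one disk and one network adapter; real configurations usually need more adapters per VIO Server.

```python
SLOTS_PER_PARTITION = 2  # one disk adapter + one network adapter, per the text

def max_partitions_without_vio(io_slots):
    """Each partition owns its own adapters, so free slots cap the partition count."""
    return io_slots // SLOTS_PER_PARTITION

def slots_needed(partitions, use_vio, vio_servers=2, slots_per_vio_server=2):
    """Physical slots consumed by a configuration (hypothetical defaults)."""
    if not use_vio:
        return partitions * SLOTS_PER_PARTITION
    # Only the VIO Servers own physical adapters; clients use virtual adapters.
    return vio_servers * slots_per_vio_server
```

With 14 free slots, for example, dedicated adapters cap the server at seven partitions, while in this model ten client partitions behind two VIO Servers consume only four slots.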

Sharing I/O resources overcomes this physical limitation. Virtual SCSI provides the means to do this for SCSI storage devices. VIO also lets client partitions use storage they could not previously attach: as long as the VIO Server supports a storage resource, any client partition can access that storage through virtual SCSI adapters. Typically, a small operating system instance needs at least one slot for a network interface card (NIC) and one slot for a disk adapter (such as SCSI or Fibre Channel), while larger environments often require many more NICs and host bus adapters (HBAs).

What is really impressive is that a partition can have any combination of physical and virtual I/O adapters. The IBM VIO Server is the link between the virtual and physical resources. It runs in a specialized partition (supported only on POWER5 servers) that owns the physical I/O resources and cannot execute application code. It essentially provides two functions: serving SCSI devices to client partitions and providing Shared Ethernet Adapters for VLANs.

One best practice is to run two VIO Servers on each system so that client partitions remain available if one VIO Server crashes. It's important to understand that virtual I/O devices complement physical I/O adapters rather than replace them; the VIO Servers themselves still need physical adapters. As an aside, I would be very careful about using VIO Servers in production environments, especially those that require maximum performance. I prefer to use VIO for my development, testing, and support environments.

SMT is a POWER5 enhancement that allows multiple threads to execute concurrently on a single processor. It requires either AIX 5.3 or a supported version of Linux (RHEL4 or SLES9 and later) and can improve throughput by approximately 30%. SMT-capable processors can issue multiple instructions from different code paths in a single cycle. Each core appears to the operating system as a two-way SMP (symmetric multiprocessor), so, because of its dual-core design, one POWER5 chip appears to the operating system as a four-way system. AIX 5L V5.3 treats each hardware thread as a separate logical processor and, by default, configures a dedicated partition with one physical processor as a logical two-way. Disabling SMT takes half of those logical processors offline.
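The logical processor counts above follow from simple multiplication. This Python sketch assumes two cores per POWER5 chip and two hardware threads per core when SMT is enabled, as described in the text; the function names are hypothetical.

```python
THREADS_PER_CORE_SMT = 2  # two hardware threads per core with SMT on
CORES_PER_CHIP = 2        # POWER5 is a dual-core chip

def logical_cpus(cores, smt_enabled=True):
    """Logical processors the OS sees for a number of physical cores."""
    return cores * (THREADS_PER_CORE_SMT if smt_enabled else 1)

def logical_cpus_for_chips(chips, smt_enabled=True):
    """Logical processors the OS sees for a number of POWER5 chips."""
    return logical_cpus(chips * CORES_PER_CHIP, smt_enabled)
```

One chip with SMT on shows up as four logical processors, a dedicated one-processor partition as a logical two-way, and turning SMT off halves either count.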

To sum up, the essential characteristics of the POWER5 simultaneous multithreading implementation are:

Processor resources optimized for performance, including the ability to lower the priority of a thread that may be consuming a disproportionate share of resources.

Eight priority levels for each thread, which the Hypervisor or the system can raise or lower.

Some applications might actually benefit from turning SMT off, but that is the exception rather than the rule.

Running two threads on the same processor will not improve the performance of applications that are limited by execution units or that consume all of the processor's memory bandwidth. For such workloads, POWER5 also supports a single-threaded execution mode.

Managing Partitions

Partition Load Manager (PLM) automates the administration of CPU and memory across logical partitions within a managed system. PLM moves resources according to partition load and the priorities you assign; user-defined policies govern how those resources move, so high-demand partitions receive more of them. Only AIX supports PLM, but note that uncapped logical partitions already provide many of PLM's features.
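To make policy-driven resource movement concrete, here is a hypothetical Python sketch of a single rebalancing step: when a partition runs hot, one small slice of entitlement is moved to it from an idle, lower-priority partition that can spare it. The thresholds, step size, and priority scale are all assumptions; this is not PLM's actual algorithm or policy file format.

```python
from dataclasses import dataclass

@dataclass
class LparLoad:
    name: str
    entitled: float     # current processor entitlement
    minimum: float      # floor the policy must never go below
    utilization: float  # fraction of entitlement in use, 0.0-1.0
    priority: int       # higher = more important (hypothetical scale)

def rebalance_step(lpars, step=0.1, high=0.9, low=0.5):
    """Move one `step` of entitlement from an idle donor to the most important
    busy partition. Returns (donor, receiver) names, or None if no move applies."""
    if high <= low:
        raise ValueError("high threshold must exceed low threshold")
    busy = [p for p in lpars if p.utilization >= high]
    idle = [p for p in lpars
            if p.utilization <= low and p.entitled - step >= p.minimum - 1e-9]
    if not busy or not idle:
        return None
    # Receiver: highest priority, then busiest. Donor: lowest priority, then idlest.
    receiver = max(busy, key=lambda p: (p.priority, p.utilization))
    donor = min(idle, key=lambda p: (p.priority, p.utilization))
    # Round to hundredths so repeated float arithmetic does not drift.
    donor.entitled = round(donor.entitled - step, 2)
    receiver.entitled = round(receiver.entitled + step, 2)
    return donor.name, receiver.name
```

A real load manager would run a step like this periodically and carry out the move through dynamic LPAR operations rather than by editing numbers in memory.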

Truthfully, I am not a great fan of PLM. Aside from its memory-management capabilities, there is little else PLM can do for you. If you need workload management for memory, consider the DLPAR toolset from IBM alphaWorks. PLM requires a separate partition with its own assigned resources, whereas the DLPAR toolset is much less of a burden, as it is basically just a collection of Perl scripts. It may not be as sexy as PLM, but it has most of PLM's capabilities without the overhead, and it is very easy to install, configure, and maintain.

Deployment

People often ask me what type of server they should purchase to support their workloads. This is where you'll usually end up working with IBM or a qualified IBM business partner who can evaluate your environment and recommend solutions based on your needs and workload. Depending on the application you use (here is where your ISV may be able to assist you) and on your performance, availability, and scalability requirements, one company may be able to get by with a p570 server costing roughly $400,000 to support 250 users, while another may require a p595 costing $2,000,000 to support 50 users.

If you are migrating from older systems, your business partner has many tools that can assist with capacity planning and architecture. One IBM developer created a tool that takes the specifications of your older system (it can even be a Sun server) and outputs a recommended configuration for a new pSeries model. Tools such as these greatly improve vendors' ability to identify the appropriate architecture for your migration. If you want a starting point, the most popular pSeries server is clearly the p570. Now marketed as an enterprise-class machine, it scales up to 16-way and is priced much more competitively than its larger, mainframe-looking siblings, the p590 and p595. Depending again on your application and service level agreement (SLA) requirements, it can easily support the most demanding ERP applications with hundreds of users.

Finally, it's important to reiterate that using the POWER5 architecture without APV is a complete waste of its capabilities. I cannot stress enough the importance of making certain that your applications are supported on AIX 5.3 (or a supported version of Linux) before buying your new POWER5 technology. Yes, AIX 5.2 and RHEL3 will run on a p5, but they allow only dynamic partitioning. I know of companies that made substantial investments in the POWER5 architecture but use it as no more than a souped-up Regatta p690 with more rPerfs. Don't make the same mistake.

Ken Milberg has worked for both large and small organizations and has held diverse positions from CIO to Senior AIX Engineer.