Managing Workloads in z/OS

The zSeries* platform and z/OS* can run multiple, concurrent workloads within one z/OS image or across multiple images that share the same physical processor. The dynamic workload management function, implemented in the workload manager (WLM) component of z/OS, makes this possible.

The z/OS WLM component dynamically allocates or redistributes server resources such as CPU, I/O and memory across a set of workloads within a z/OS image, based on user-defined goals and the workloads' resource demand. WLM can also function across multiple images of z/OS, Linux or VM sharing a zSeries processor. In addition, WLM helps routing components and products dynamically direct the work requests of a multisystem workload to a z/OS image within a Parallel Sysplex* that has sufficient server resources to meet customer-defined goals.

Managing these workloads requires the end user to subdivide them into categories and associate goals with each category. This applies whether the work runs within a single z/OS system, across multiple systems in a Parallel Sysplex, or across images sharing the same zSeries processor. This process is known as workload classification: the workloads are classified into distinct service classes, which define the performance goals needed to meet business objectives and end-user expectations.

Classification is based on application- or middleware-specific qualifiers that allow WLM to identify different work requests in the system. For example, an installation can define a service class for all batch programs running in the same batch class, or for all requests accessing the same DB2* stored procedures. At run time, the subsystems of the OS, middleware components and applications inform WLM when new work arrives. They do this through programming interfaces that the OS and all major middleware components exploit. These interfaces allow a work manager to pass to WLM the classification attributes that its component supports. WLM associates the new work request with a service class based on the user definitions and starts to manage the request toward that service class's goal, as shown in Figure 1.
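The classification step can be pictured as an ordered rule lookup. The sketch below is a simplified model, not the WLM classification rule syntax. The qualifier names (`subsystem`, `job_class`, `procedure`), the service class names and the `classify` helper are hypothetical. The sketch assumes that the first matching rule wins and that unmatched work falls into a default class.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    qualifiers: dict      # hypothetical qualifier name -> required value
    service_class: str


def classify(work_attrs: dict, rules: list, default: str) -> str:
    """Return the service class of the first rule whose qualifiers all match."""
    for rule in rules:
        if all(work_attrs.get(name) == value for name, value in rule.qualifiers.items()):
            return rule.service_class
    return default  # work that no rule matches


# Hypothetical rules for batch work in job class A and one DB2 stored procedure.
RULES = [
    Rule({"subsystem": "JES", "job_class": "A"}, "BATCHHI"),
    Rule({"subsystem": "DB2", "procedure": "PAYPROC"}, "DB2SP"),
]

print(classify({"subsystem": "JES", "job_class": "A", "user": "U1"}, RULES, "OTHER"))  # BATCHHI
```

Attributes that no rule mentions, such as `user` above, are simply ignored by the match.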

Each performance goal has a goal type and an importance level between 1 (highest) and 5 (lowest). The importance level tells WLM how important it is that the work in this service class meets its goal, if it doesn't already. WLM makes every attempt to ensure that work in importance level 1 service classes meets its goals before moving on to work in importance level 2, and so forth down to level 5.

Goal type expresses how the end user wants to see the work perform in a service class. Three types of goals exist:

Response Time-This can be expressed in one of two ways. "Average Response Time" means, for example, that all work requests in the service class should complete, on average, in one second. "Response Time with a Percentile" means, for example, that 90 percent of all work requests in a service class should complete within 0.8 seconds. Using a response time goal assumes that the application or middleware tells WLM when a new transaction starts. This is the case when the component supports the WLM Enclave services. It is also the case for CICS and IMS, and for subsystems such as the Job Entry Subsystem (JES), TSO and UNIX* System Services. A response time goal also requires that a sufficient number of work requests arrive in the system continuously.

Execution Velocity-Response time isn't appropriate for all types of work. Examples include address spaces that don't exploit WLM services, and long-running transactions that never end. To manage such workloads, WLM uses a formula that expresses a transaction's velocity as a number between 1 and 99, quantifying how much time the work spends waiting for system resources. A higher velocity means that fewer delays have been encountered.

Discretionary-Assign this goal to work that can run whenever the system has extra resources. Discretionary work isn't associated with an importance level, so it accesses resources only when the requirements of all work with an importance level can be satisfied.
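The three goal types can be checked against measured data roughly as follows. This sketch rests on several assumptions:

* Response times are in seconds.
* The velocity formula (using samples divided by using plus delay samples, times 100) is the commonly documented WLM definition, not something stated above.
* All function names are hypothetical.

Discretionary work has no goal to check, so it is omitted.

```python
def meets_average_goal(times: list, goal_s: float) -> bool:
    """Average response time goal: the mean of completed requests is within the goal."""
    return sum(times) / len(times) <= goal_s


def meets_percentile_goal(times: list, goal_s: float, percentile: int) -> bool:
    """Percentile goal: at least `percentile` percent of requests finish within the goal."""
    within = sum(1 for t in times if t <= goal_s)
    return within * 100 >= percentile * len(times)


def execution_velocity(using_samples: int, delay_samples: int) -> float:
    """Velocity = using / (using + delay) * 100; higher means fewer delays."""
    total = using_samples + delay_samples
    if total == 0:
        raise ValueError("no samples collected")
    return 100.0 * using_samples / total


def meets_velocity_goal(using_samples: int, delay_samples: int, goal: int) -> bool:
    return execution_velocity(using_samples, delay_samples) >= goal


samples = [0.3, 0.5, 0.6, 0.7, 0.75, 0.8, 0.8, 0.9, 1.2, 2.5]
print(meets_average_goal(samples, 1.0))         # True: mean is about 0.905 s
print(meets_percentile_goal(samples, 0.8, 90))  # False: only 7 of 10 finish within 0.8 s
print(execution_velocity(30, 70))               # 30.0
```

The sample data shows how the two response time forms can disagree: the average goal is met even though the percentile goal is missed.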

A service class can also be subdivided into multiple periods. As work consumes more resources, it may "age" from one period to the next, at which point it can be assigned a new goal and importance level. Resource consumption is expressed in Service Units, the z/OS measure of the resources that work consumes. For example, a service class for batch jobs can be broken into three periods:

* The first period has a "high" execution velocity goal of 50 and an importance level of 3 for the first 2,500 Service Units.
* Work that consumes more than that moves into the second period, with an execution velocity goal of 20 and an importance level of 5 for the next 10,000 Service Units.
* Finally, long-running batch jobs age into the third period, which is associated with a discretionary goal.
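Period aging can be modeled as a walk over cumulative Service Unit boundaries. The sketch encodes the three-period batch example above. The `Period` type and `current_period` helper are hypothetical. The sketch assumes that work moves to the next period as soon as its consumption reaches the cumulative duration of the earlier periods.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Period:
    duration_su: Optional[int]   # service units spent in this period; None = last period
    goal: str
    importance: Optional[int]    # None for a discretionary goal


# The three-period batch service class from the example above.
BATCH_PERIODS = [
    Period(2_500, "execution velocity 50", 3),
    Period(10_000, "execution velocity 20", 5),
    Period(None, "discretionary", None),
]


def current_period(periods: list, consumed_su: int) -> int:
    """Return the 1-based period for work that has consumed `consumed_su` service units."""
    boundary = 0
    for number, period in enumerate(periods, start=1):
        if period.duration_su is None:
            return number            # open-ended last period
        boundary += period.duration_su
        if consumed_su < boundary:
            return number
    return len(periods)              # past the last finite duration


for su in (1_000, 5_000, 20_000):
    p = current_period(BATCH_PERIODS, su)
    print(su, p, BATCH_PERIODS[p - 1].goal)
```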

We can now classify the work and define its performance goals. The subsystems, middleware components and applications use WLM services for two purposes: to inform WLM when new work arrives, and to encapsulate work requests into uniquely identifiable entities that can be monitored and managed. WLM constantly collects performance data by service class and compares the results with the goal definitions. Based on goal achievement and demand, it then changes how much access to resources the work entities in each service class receive. Data collection occurs every 250 milliseconds, and goal and resource adjustment runs every 10 seconds.

WLM calculates a performance index (PI) to determine whether a service class is meeting its goals. For a response time goal, the PI is the actual achieved response time divided by the goal value. For an execution velocity goal, it is the goal value divided by the velocity measured in the system. A PI less than 1 means the goal is overachieved; a PI greater than 1 means the service class misses its goal. If a service class doesn't achieve its goal, WLM attempts to give it more of the specific resource it needs. Because several service classes may compete for the same resource, however, WLM must perform a constant balancing act, making trade-offs based on the business importance and goals of the different work types.
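The two PI formulas can be written out directly. This is a minimal sketch that assumes an average response time goal; percentile goals need the response time distribution and are not covered here. The function names are hypothetical. For velocity, the ratio is inverted because a higher velocity is better while a lower response time is better. This keeps "PI greater than 1 means missing the goal" true for both goal types.

```python
def pi_response_time(actual_s: float, goal_s: float) -> float:
    """PI for an average response time goal: achieved / goal."""
    return actual_s / goal_s


def pi_velocity(goal_velocity: float, actual_velocity: float) -> float:
    """PI for an execution velocity goal: goal / measured."""
    if actual_velocity <= 0:
        raise ValueError("measured velocity must be positive")
    return goal_velocity / actual_velocity


def goal_status(pi: float) -> str:
    if pi < 1.0:
        return "overachieving"
    if pi > 1.0:
        return "missing goal"
    return "exactly meeting goal"


print(goal_status(pi_response_time(1.5, 1.0)))  # missing goal (PI = 1.5)
print(goal_status(pi_velocity(50, 60)))         # overachieving (PI is about 0.83)
```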

Whenever WLM decides to make a change, it completes the current adjustment. It then monitors the system for the next 10 seconds to assess whether additional changes are required. If no change was possible, WLM may look for another service class to help, or it may try to help the selected service class with another resource.
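Combining the importance levels and the PI gives a simplified picture of how a receiver might be chosen in each 10-second adjustment interval. This sketch is not WLM's actual algorithm, which also weighs donors, the specific resource delaying the work and whether a change is worth making. It only picks the most important service class period that misses its goal, breaking ties by the worst PI. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClassPeriod:
    name: str
    importance: int   # 1 (highest) .. 5 (lowest)
    pi: float         # performance index from the last interval


def select_receiver(periods: list) -> Optional[ClassPeriod]:
    """Pick the most important class period missing its goal; ties go to the worst PI."""
    missing = [p for p in periods if p.pi > 1.0]
    if not missing:
        return None   # every goal is met; nothing to help in this interval
    return min(missing, key=lambda p: (p.importance, -p.pi))


periods = [
    ClassPeriod("ONLINE", 1, 0.9),
    ClassPeriod("DB2SP", 2, 1.1),
    ClassPeriod("BATCHHI", 3, 1.4),
    ClassPeriod("BATCHLO", 5, 2.5),
]
print(select_receiver(periods).name)   # DB2SP: importance 2 outranks the worse PIs at 3 and 5
```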

The service class and goal definitions are part of the service definition for the entire Parallel Sysplex. The service definition can contain multiple service policies. These let you keep one service definition while putting adapted goals for its service classes into effect at certain times. For example, you may want one set of goals for your batch service classes during the night shift, when primarily batch work runs in the system. During the day, when the focus is on online work, you may want different goals.
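One way to picture a service definition with several policies is as a set of base goals plus per-policy overrides. The sketch below is a hypothetical model: the class names, goal tuples and the `effective_goals` helper are for illustration only. It shows how a night-shift policy could raise batch goals while the day-shift policy keeps the base definitions.

```python
# Hypothetical base goals: service class -> (goal type, goal value, importance).
BASE_GOALS = {
    "ONLINE": ("average response time (s)", 1.0, 1),
    "BATCH": ("execution velocity", 20, 4),
}

# Hypothetical policies: each overrides only the classes whose goals differ.
POLICIES = {
    "DAYSHIFT": {},
    "NIGHTSHIFT": {"BATCH": ("execution velocity", 50, 2)},
}


def effective_goals(policy_name: str) -> dict:
    """Return the goals in effect when `policy_name` is the active policy."""
    goals = dict(BASE_GOALS)           # copy so the base definition is never modified
    goals.update(POLICIES[policy_name])
    return goals


for policy in POLICIES:
    print(policy, effective_goals(policy)["BATCH"])
```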

Once you have a service definition, you must install it on a WLM couple data set and activate a service policy. In z/OS V1.3, dynamic workload management (goal mode) finally replaced compatibility mode as the way to manage work on the system. Compatibility mode relied on the Installation Performance Specifications (IPS) and Installation Control Specifications (ICS) parmlib members.

Customers using earlier versions of z/OS, or running in compatibility mode on OS/390*, often ask how to migrate to goal mode. To distinguish system tasks from application work, z/OS V1.3 systems automatically run with a default policy consisting of three internal service classes. Because this default is insufficient, z/OS V1.3 also ships a customizable service definition that can serve as a starting point for an installation-specific service definition. Another option is the goal mode migration tool, which helps convert existing IPS and ICS parmlib members into a service definition for goal mode. The tool can be downloaded from the z/OS WLM Web site.

Latest Developments

Manageability enhancements-z/OS V1.1 and the first zSeries 900 introduced the Intelligent Resource Director (IRD). IRD combines Parallel Sysplex, PR/SM* and WLM technologies to process work in a clustered environment in a new way: rather than distributing work to the systems, it moves resources to where the work runs. z/OS systems that run in logical partitions (LPARs) on the same central processor complex (CPC) and belong to the same Parallel Sysplex form an LPAR cluster. The CPU weight of an LPAR specifies how much CPU capacity is guaranteed to it if demanded. Within an LPAR cluster, CPU weight can be shifted from one LPAR to another while the overall LPAR cluster weight remains constant (see Figure 2).
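A weight shift within an LPAR cluster is a zero-sum transfer. The sketch makes these assumptions:

* The three LPARs are the only partitions sharing the CPC's processors.
* A hypothetical `min_weight` floor stands in for whatever limits an installation actually defines.
* The LPAR names and weights are for illustration only.

```python
def shift_weight(weights: dict, donor: str, receiver: str, amount: int,
                 min_weight: int = 1) -> dict:
    """Move `amount` of CPU weight from donor to receiver, keeping the cluster total constant."""
    if donor == receiver or amount <= 0:
        raise ValueError("need two different LPARs and a positive amount")
    if weights[donor] - amount < min_weight:
        raise ValueError("donor would drop below its minimum weight")
    new_weights = dict(weights)
    new_weights[donor] -= amount
    new_weights[receiver] += amount
    return new_weights


def guaranteed_share(weights: dict, lpar: str) -> float:
    """Fraction of shared CPU capacity guaranteed to `lpar` (assumes no other LPARs share the CPUs)."""
    return weights[lpar] / sum(weights.values())


cluster = {"SYSA": 400, "SYSB": 300, "SYSC": 300}
after = shift_weight(cluster, donor="SYSB", receiver="SYSA", amount=50)
print(after)                              # {'SYSA': 450, 'SYSB': 250, 'SYSC': 300}
print(guaranteed_share(after, "SYSA"))    # 0.45
```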

When all LPARs are busy, their current weights are enforced. WLM can initiate a weight adjustment in favor of one system to help the most important work that is not meeting its goals. Along with CPU weight management, IRD can also manage the number of logical CPUs for a system. LPAR overhead can be high when the ratio of logical to physical CPUs in a CPC exceeds 2.0. To avoid this, WLM keeps the number of logical CPUs close to the number of physical CPUs that the partition can use based on its current weight.
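The logical CPU rule can be illustrated with a rounded-up entitlement. This simplified sketch assumes the target is the weight's share of the physical CPUs, rounded up to a whole CPU. WLM's real algorithm also considers the partition's actual demand. The weights are hypothetical values for a ten-CPU CPC.

```python
def target_logical_cpus(weight: int, total_weight: int, physical_cpus: int) -> int:
    """Round up the number of physical CPUs this partition's weight entitles it to (at least 1)."""
    # Integer ceiling of weight * physical_cpus / total_weight, avoiding float rounding.
    entitled = (weight * physical_cpus + total_weight - 1) // total_weight
    return max(1, entitled)


def logical_to_physical_ratio(logical_counts: list, physical_cpus: int) -> float:
    return sum(logical_counts) / physical_cpus


weights = {"SYSA": 450, "SYSB": 250, "SYSC": 300}
total = sum(weights.values())
physical = 10
targets = {name: target_logical_cpus(w, total, physical) for name, w in weights.items()}
print(targets)                                                      # {'SYSA': 5, 'SYSB': 3, 'SYSC': 3}
print(logical_to_physical_ratio(list(targets.values()), physical))  # 1.1
print(logical_to_physical_ratio([10, 10, 10], physical))            # 3.0: every LPAR defined with all 10 CPUs
```

The last line shows the situation the text warns about: defining every LPAR with all physical CPUs pushes the ratio well past 2.0.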

Dynamic channel path ID (CHPID) management is based on the same idea. It manages a set of channel paths, called managed channels, as a pool of floating channels. It then assigns them dynamically to DASD control units based on the importance of the work doing the I/O and the channel path delay measured for that work. Dynamic CHPID management has two benefits:

* It allows the system to react to changing I/O patterns based on business importance.
* It helps reduce the impact of the limit of 256 channel paths. Fixed channel paths need only be defined for availability, while additional channels can be added or removed dynamically. This lets the system meet business objectives and increase overall I/O performance based on workload demand.
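A managed-channel pool can be sketched as follows. The selection rule (most important work first, then largest channel delay), the CHPID and control unit names, and the `assign_managed_channel` helper are assumptions for illustration. The real algorithm also removes channels and honors configuration constraints that are not modeled here. The two static CHPIDs per control unit stand in for the paths defined for availability.

```python
from dataclasses import dataclass, field


@dataclass
class ControlUnit:
    name: str
    importance: int            # importance of the most important work doing I/O here (1 = highest)
    channel_delay: float       # measured channel path delay, hypothetical units
    channels: list = field(default_factory=list)   # static CHPIDs plus any managed ones


def assign_managed_channel(pool: list, units: list):
    """Give one free managed channel to the delayed control unit with the most important work."""
    delayed = [u for u in units if u.channel_delay > 0]
    if not pool or not delayed:
        return None
    # Most important work first; among equals, the largest channel delay.
    target = min(delayed, key=lambda u: (u.importance, -u.channel_delay))
    target.channels.append(pool.pop(0))
    return target.name


pool = ["CHP40", "CHP41"]
units = [
    ControlUnit("CU1", importance=2, channel_delay=5.0, channels=["CHP10", "CHP11"]),
    ControlUnit("CU2", importance=1, channel_delay=3.0, channels=["CHP12", "CHP13"]),
    ControlUnit("CU3", importance=1, channel_delay=0.0, channels=["CHP14", "CHP15"]),
]
print(assign_managed_channel(pool, units), units[1].channels)
```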

IRD also controls I/O priority along the whole path of a request: from its start within the OS, through the channel subsystem, until it's processed within the storage controller itself. The priority applies at each stage:

* Within the OS, an I/O request is queued on a particular device control block, and the I/O priority determines the position of a new request within that queue.
* When the request is processed, it flows through the channel subsystem. There, I/O priority queuing allows an installation to define the partition's I/O priority relative to other partitions in the CPC.
* Within the storage controller, it's again the priority of the I/O request that determines how fast the data is accessed from disk if the request can't be satisfied from the cache.
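Priority-ordered queuing on a device can be sketched as an insertion that places a new request behind every request of equal or higher priority and ahead of any lower-priority one. The convention that a larger number means higher priority is an assumption. So are the `IORequest` type and the `enqueue` helper.

```python
from dataclasses import dataclass


@dataclass
class IORequest:
    request_id: str
    priority: int   # assumption: a larger value means more urgent


def enqueue(queue: list, request: IORequest) -> None:
    """Insert ahead of lower-priority requests, keeping FIFO order within a priority."""
    for index, queued in enumerate(queue):
        if queued.priority < request.priority:
            queue.insert(index, request)
            return
    queue.append(request)


device_queue = []
for req in (IORequest("A", 5), IORequest("B", 3), IORequest("C", 4), IORequest("D", 5)):
    enqueue(device_queue, req)
print([r.request_id for r in device_queue])   # ['A', 'D', 'C', 'B']
```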

The Small Product Enhancement (SPE), OW50221, extends the scope of an LPAR cluster and allows it to include non-z/OS members, particularly system images running Linux for zSeries. CPU weight can be shifted from z/OS images to non-z/OS images and vice versa. Non-z/OS images (not including OS/390) are defined as work assigned to a service class in the WLM service definition and managed toward a velocity goal.

All IRD functions can be exploited selectively. The ITSO Redbook, "z/OS Intelligent Resource Director" (SG24-5952), is recommended reading for those planning to implement any of the aforementioned functions.

Each batch job is associated with a job class that is either in JES or WLM mode. Accordingly, there are two types of initiators: JES-managed initiators selecting work from job classes in JES mode and WLM-managed initiators selecting work from job classes in WLM mode. Operators have full control over the JES-managed initiators but no control over WLM-managed initiators. When a job becomes ready for execution, an idle initiator associated with that job class selects the job. If no initiator exists or all initiators for that class are busy, the job must wait until an initiator becomes available.

That wait, or queue, time is factored into the actual goal achievement of the work. For response time goals it counts as part of the overall response time; for velocity goals it counts as an execution delay. If queue delay becomes a bottleneck and goals are no longer met, WLM determines whether it can help the batch work by starting additional initiators. WLM calculates the maximum number of initiators that can be started to improve goal achievement without impacting higher-importance work. It also selects the system on which to start new initiators: systems with available capacity first, then systems with enough displaceable capacity. If more initiators are available than needed to process the batch work, WLM stops the surplus initiators or assigns them to other job classes in WLM mode.
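The system selection order can be sketched as a two-pass search. The text only states the order of preference. The capacity figures, the `needed_pct` threshold, and the idea of measuring displaceable capacity as capacity used by lower-importance work are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SystemCapacity:
    name: str
    unused_pct: float         # CPU capacity not used by any work
    displaceable_pct: float   # capacity used by lower-importance work (hypothetical measure)


def choose_initiator_system(systems: list, needed_pct: float) -> Optional[str]:
    """Prefer systems with enough unused capacity, then those with enough displaceable capacity."""
    free = [s for s in systems if s.unused_pct >= needed_pct]
    if free:
        return max(free, key=lambda s: s.unused_pct).name
    displaceable = [s for s in systems if s.displaceable_pct >= needed_pct]
    if displaceable:
        return max(displaceable, key=lambda s: s.displaceable_pct).name
    return None   # no system can take another initiator without hurting more important work


sysplex = [SystemCapacity("SYSA", 2, 30), SystemCapacity("SYSB", 15, 0), SystemCapacity("SYSC", 25, 5)]
print(choose_initiator_system(sysplex, needed_pct=10))   # SYSC
```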

The current approach has limitations. A balancing decision, such as on which system additional batch work is started, can be made only when an initiator is started. Once an initiator is available, it selects whatever work is eligible to run. If a job is submitted and an idle initiator exists, that initiator selects the job regardless of two factors: whether its own system is CPU- or storage-constrained, and whether the job could run more efficiently on another system in the Parallel Sysplex. This can drive some systems to high utilization (in the case of JES2, most often the converting system). Discretionary work queued behind the important work trying to meet its goals then suffers.

To relieve this problem, z/OS V1.4 provides an improved WLM initiator balancing algorithm. WLM stops initiators more aggressively on the most CPU-constrained system in the Parallel Sysplex when its utilization is higher than 95 percent. It does so only if another member of the Parallel Sysplex has enough unused capacity (more than 7 percent) to restart the initiator (see Figure 3). Such a logical move can be made every 30 seconds. Additionally, when CPU and storage are available, up to five initiator address spaces can be started in response to batch demand, independently of goal achievement, to maximize resource utilization.
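The thresholds in the V1.4 algorithm translate into a simple decision rule. The 95 percent, 7 percent and 30-second values come from the text. The following are assumptions for illustration:

* Choosing the target with the most unused capacity.
* Treating the 30 seconds as a minimum spacing between moves.
* The function and parameter names.

```python
from typing import Optional, Tuple

BUSY_THRESHOLD_PCT = 95.0     # source system utilization must exceed this
MIN_UNUSED_PCT = 7.0          # target system needs more unused capacity than this
MOVE_INTERVAL_S = 30          # assumed minimum spacing between moves


def plan_initiator_move(utilization: dict, now_s: float,
                        last_move_s: Optional[float]) -> Optional[Tuple[str, str]]:
    """Return (system to stop an initiator on, system to restart it on), or None."""
    if last_move_s is not None and now_s - last_move_s < MOVE_INTERVAL_S:
        return None
    source = max(utilization, key=utilization.get)      # most CPU-constrained system
    if utilization[source] <= BUSY_THRESHOLD_PCT:
        return None
    unused = {name: 100.0 - util for name, util in utilization.items()
              if name != source and 100.0 - util > MIN_UNUSED_PCT}
    if not unused:
        return None
    target = max(unused, key=unused.get)                # most unused capacity
    return source, target


print(plan_initiator_move({"SYSA": 98.0, "SYSB": 90.0, "SYSC": 80.0}, now_s=60.0, last_move_s=None))
```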

Reporting enhancements-The most extensive enhancement since the introduction of goal mode in MVS* 5.1 is the implementation of report class periods. Before z/OS V1.2, report classes were simple containers for any kind of transaction. A single report class could hold transactions managed toward different velocity or response time goals, and transactions originating in different subsystems. This made general reporting possible. It did not support the reporting that makes sense only for a homogeneous set of transactions: response time distributions and subsystem work manager delays. Figure 4 illustrates an RMF Monitor III SYSRTD report. Many installations add service classes to work around these deficiencies of report classes, but service classes should be used only for management purposes. We recommend using no more than 25 active service class periods in a system at any time.

To solve this dilemma, WLM implemented report class periods, where the period number is dynamically derived from the service class period in which a transaction is currently running. Even so, different kinds of transactions can still be mixed in one report class. However, WLM tracks the transactions attributed to each report class. It passes this information to performance monitors such as the Resource Measurement Facility* (RMF) when they request WLM reporting statistics.

* A homogeneous report class has all of its transactions associated with the same service class.
* A heterogeneous report class has at least two transactions that are associated with different service classes.

A performance monitor can thus determine whether a report class was homogeneous within the reporting interval and support the additional reporting capabilities.
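The homogeneity check and the derived report class period can be sketched from the definitions above. The tuple layout, the function name, and the report and service class names are hypothetical. A real monitor receives this information from WLM rather than computing it from raw transactions.

```python
from collections import defaultdict


def summarize_report_classes(transactions):
    """transactions: (report_class, service_class, service_class_period) tuples from one interval.

    Returns ({report_class: "homogeneous" | "heterogeneous"},
             {(report_class, period): transaction count}).
    """
    service_classes = defaultdict(set)
    period_counts = defaultdict(int)
    for report_class, service_class, period in transactions:
        service_classes[report_class].add(service_class)
        # The report class period is taken from the transaction's current service class period.
        period_counts[(report_class, period)] += 1
    kinds = {rc: "homogeneous" if len(scs) == 1 else "heterogeneous"
             for rc, scs in service_classes.items()}
    return kinds, dict(period_counts)


interval = [
    ("RCICS", "CICSHI", 1),
    ("RCICS", "CICSHI", 1),
    ("RMIXED", "CICSHI", 1),
    ("RMIXED", "BATCH", 2),
]
kinds, counts = summarize_report_classes(interval)
print(kinds)    # {'RCICS': 'homogeneous', 'RMIXED': 'heterogeneous'}
print(counts)   # {('RCICS', 1): 2, ('RMIXED', 1): 1, ('RMIXED', 2): 1}
```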

Usually, response time distributions can be generated for a report class only when it reports on transactions managed by a single service class with a response time goal. With a little trick, however, WLM can maintain the response time distribution for CICS and IMS workloads even though the CICS and IMS regions are managed toward execution velocity goals. The trick is to classify the CICS/IMS transactions to both a report class and a service class. The service class's response time goal is ignored for management, but WLM uses it to maintain the response time distribution for the report class. RMF obtains the data and presents it throughout its reports.
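The trick relies on the service class goal acting as a yardstick. The sketch buckets completions by their response time as a percentage of that goal. The bucket limits, the function name and the sample times are assumptions for illustration, not RMF's documented layout.

```python
import bisect

# Assumed bucket upper limits, as a percentage of the response time goal
# (illustrative; check your monitor's documentation for the actual buckets).
BUCKET_LIMITS_PCT = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 200, 400]


def response_time_distribution(times_s: list, goal_s: float) -> list:
    """Count completions per bucket; the last bucket holds everything above the highest limit."""
    counts = [0] * (len(BUCKET_LIMITS_PCT) + 1)
    for t in times_s:
        pct_of_goal = 100.0 * t / goal_s
        # bisect_left puts a value equal to a limit into that limit's bucket ("<= limit").
        counts[bisect.bisect_left(BUCKET_LIMITS_PCT, pct_of_goal)] += 1
    return counts


# The service class goal (here 1 second) is used only as the yardstick for the report class.
print(response_time_distribution([0.25, 0.5, 1.0, 3.0, 5.0], goal_s=1.0))
```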

Operational facilities-WLM now has the capability to reset Enclaves to a different service class. Operators can reset address spaces through an operator command or another console interface, such as the System Display and Search Facility (SDSF). The RESET command allows the operator to take one of three actions:

* Assign a different service class.
* Quiesce an address space, and swap it out if it is swappable.
* Resume it in its original service class, that is, reclassify it according to the rules in the current service policy.

In this way, address spaces can be slowed down or accelerated as the operator intends.

Until z/OS V1.3, nothing could be done with Enclaves once they were created. That release gives Enclaves a reset capability similar to the one for address spaces. Enclaves have no name comparable to an address space's job name, so a new operator command would have required entering a cumbersome Enclave token. Instead, WLM provides an API that authorized programs can use to reset an Enclave. SDSF uses that API to allow operators to reset the service class of an independent Enclave, quiesce the Enclave or resume it. This support is implemented on SDSF's ENC panel and requires SPE PQ50025.
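The three reset actions can be modeled as state changes on an Enclave. This is not the WLM API, whose services and parameters are not described here. The `Enclave` type, the `reset_enclave` and `policy_rules` helpers, the token value and the service class names are all hypothetical. The sketch also assumes that resetting to a new service class ends any quiesced state.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Enclave:
    token: str                # Enclaves are identified by a token, not a job name
    attributes: dict          # classification attributes passed by the work manager
    service_class: str
    quiesced: bool = False


def reset_enclave(enclave: Enclave, action: str,
                  classify: Callable[[dict], str], new_class: Optional[str] = None) -> None:
    """Apply one of the three reset actions described in the text (hypothetical model)."""
    if action == "service_class":
        if new_class is None:
            raise ValueError("a target service class is required")
        enclave.service_class = new_class
        enclave.quiesced = False
    elif action == "quiesce":
        enclave.quiesced = True
    elif action == "resume":
        # Reclassify according to the rules in the current service policy.
        enclave.service_class = classify(enclave.attributes)
        enclave.quiesced = False
    else:
        raise ValueError(f"unknown action: {action}")


def policy_rules(attributes: dict) -> str:
    """Stand-in for the active policy's classification rules (hypothetical)."""
    return "DDFHIGH" if attributes.get("subsystem") == "DDF" else "OTHER"


enc = Enclave("0000001C00000005", {"subsystem": "DDF"}, "DDFHIGH")
reset_enclave(enc, "service_class", policy_rules, new_class="DDFLOW")
reset_enclave(enc, "quiesce", policy_rules)
reset_enclave(enc, "resume", policy_rules)
print(enc.service_class, enc.quiesced)   # DDFHIGH False
```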

The latest z/OS release, z/OS V1.4, includes the following WLM items, in addition to the aforementioned batch initiator balancing enhancements:

WLM enhanced its monitoring programming interfaces to allow middleware such as WebSphere* to collect and report on subsystem states. WebSphere's ability to classify work on z/OS was also enhanced. Together, these changes are intended to improve and ease the deployment of major middleware applications on z/OS.

The installation and customization of WLM has been enhanced by integrating this function into z/OS Managed System Infrastructure Setup. This integration allows users to more easily adjust the size of the WLM couple data set.

Summary

These enhancements are the most visible functions implemented within WLM. Together with many other improvements, they contribute to the evolution of WLM into one of the most sophisticated workload management solutions in the IT industry.


IBM Systems Magazine is a trademark of International Business Machines Corporation. The editorial content of IBM Systems Magazine is placed on this website by MSP TechMedia under license from International Business Machines Corporation.