Example 2: Unable to Get Desired Throughput

Problem Description

CustomerB was in the midst of implementing a new application, performing stress tests before going into production. While performing the stress test, the database server could not ramp up to the desired number of client applications while achieving the required response times.

Problem Analysis and Resolution

Database manager configuration

The three parameters in the database manager configuration that stand out are highlighted in bold below. In this case, the application is mainly an online application; therefore, intra-partition parallelism should be disabled. Because there will be a large number of concurrent applications, the sort heap threshold (SHEAPTHRES) will need to be high; however, the existing value is sufficient.

Database configuration for database sample

The three parameters in the database manager configuration that stand out are highlighted in bold below.

Because the applications will be performing a lot of inserts/updates/deletes, the log buffer size (LOGBUFSZ) will be important. However, the log buffer is sized adequately for this workload.

Because this is an online application, the sort list heap (SORTHEAP) should not be too large. When it is too large, the DB2 optimizer will tend to favor sorts over index scans, and with a large number of concurrent applications, sorting will be detrimental to performance.

In this case, the default buffer pool size (BUFFPAGE) is 1,000 4-KB pages. To determine the real size of the buffer pool, use the following statement:

select * from syscat.bufferpools

In this case, there is only one buffer pool (IBMDEFAULTBP), and it has a size of 5,000 4-KB pages.

This is a rather small buffer pool; therefore, increasing the size of the buffer pool likely will help performance.

Step 2: Make Changes Based on the Examination of Configuration Information

The size of the buffer pool was increased as follows:

alter bufferpool IBMDEFAULTBP immediate size 500000

Step 3: Retest

After increasing the size of the buffer pool, the response time improved, and more concurrent users were able to connect to the database and run the applications; however, the system was still unable to attain the required number of applications.

Step 4: Check System for I/O Bottlenecks, Excess Paging, or Other Processes/Applications Using Excess System Resources

This requires operating system tools such as vmstat, iostat, and/or top to capture the memory, I/O, and process level information. It is important to capture the snapshots from these tools over a period of time, not taking just one snapshot, especially because the first line of the output of the vmstat and iostat tools contains average information since the server was started and is not really useful in analyzing a problem.

While the applications are running, and particularly when the system begins to stop responding, capture iostat output, as follows:

The iostat output shows that one disk is much busier than all of the other disks; therefore, it is important to understand what part of the database is physically stored on Hdisk5.

In this case, the disks are not striped volumes, they are just a bunch of disks (JBOD); therefore, use the operation system tools to determine what file system was created on Hdisk5, then analyze the database to determine what part of the database is stored on that file system.

Step 5: Determine What Is on Hdisk5

The operating system tools indicate that the file system /tablespaces/sample/temp was created on Hdisk5. Based on the name of the file system, it appears to be where the temporary table space was created. This can be verified as follows:

Therefore, the disk with the most activity is the disk where the temporary table space has been placed.

Step 6: Determine and Implement a More Optimal Database Layout

Based on the iostat output, there are 10 physical disks in the server. To eliminate the I/O bottleneck and spread the I/O across as many disks as possible, the physical design of the database needed to be changed. Instead of isolating each of the table spaces to its own physical disk, each table space would be created with eight containers, with one container on each of the disks Hdisk2 through Hdisk9. In this case, because it was an online system, one disk was set aside for the database logs and another for the operating system paging space.

To change the physical layout of the database, an offline backup was taken, the database was dropped, and a redirected restore was performed. During the redirected restore, the table space definitions were changed.

Step 7: Retest

After changing the table space definitions, the response time improved even more, and the desired number of concurrent applications were able to run successfully.

The iostat output now shows that the I/O is spread evenly across the data disks, and none of the disks is busy more than 15% of the time. This leaves room for the I/O subsystem to handle additional requests if needed.