Yet another performance conundrum

I have posted earlier in regards to this topic and how got a lot of help from the experts. Was hoping for one more piece of advise.

We have 4 MQT's (with refresh deferred) that refresh in parallel. Each of the MQT's have close to 200 million records. I used a lot of advise for improving the performance:

1) Journal are turned off
2) Commitment control is off
3) Indexes are dropped before refresh and created after the refresh
4) We have used service programs to improve efficiency
5) MQT's are refreshed in parallel.
6) All queries in the program use SQE.
7) The IGNORE_DERIVED_INDEX is set to *YES in QAQQINI
8) DEGREE is set to *OPTIMIZE
9) OPTIMIZATION_GOAL is set to *ALLIO in QAQQINI

After all this, before we deployed to our production schema we ran the jobs on a staging region on our production server. The refresh of all 4 MQT's in parallel took 60 minutes on the production server on the staging schema. Unfortunately, when the job moved to production and ran on the production schema (on the same server as the staging region), the job took more than twice the time (i ran the refresh multiple times to ensure plan caches were built).

One of the differences that i noticed was that the longest running MQT was assigned multiple threads when i do a "Work with Threads" on the staging schema. On production schema, however, this same job is being assigned only thread. These are threads that the system is assigning and is not something i am doing in my program. Is there any way i can control this? Would this be causing the job to perform slower?

Who is Participating?

AS/400 DB2 doesn't support multiple database instances in a single system (or LPAR). AS/400 DB2 just simply isn't configured the same way as DB2 LUW or mainframe.

However, Kent's advice still applies, to a certain extent.

The concept of a DB2 "region" is a mainframe concept that doesn't apply to the AS/400, so I'm a little confused by your use of the term. When you say "region", what exactly do you mean? Are you referring to different LPARs, or just collections of libraries withing the same partition (or unpartitioned system)?

I'm going to assume for now that we are talking about one of more "staging" libraries, and one or more "production" libraries on the same LPAR (on same unpartitioned system).

There are lots of reasons that you may see a difference:

1) Schema differences between the two sets of libraries.
2) Differences in the size of the table in the staging and production libraries.
3) Overall load on the system during the "staging" run vs. the "production" run.
4) Possible differing QAQQQINI settings between jobs.
5) Production and staging jobs running in different subsystems and/or memory pools.
6) Production and staging jobs running under different priorities
7) Production and staging jobs running against differing workloads in the system (system busier when production run happens, for example)

A very likely cause is that I/O activity on the system at the time of the production run was higher than at the time of the staging run. If the rebuild process is going to be I/O constrained, then the system will start fewer threads, since the extra threads cannot be kept busy if disk arm usage is too high. I cannot tell you for sure without reviewing comparative performance data.

The query optimizer can choose to use any number of tasks for either I/O or SMP parallel processing to process the query or database file keyed access path build, rebuild, or maintenance. SMP parallel processing is used only if the system feature, DB2® Symmetric Multiprocessing for OS/400®, is installed. Use of parallel processing and the number of tasks used is determined with respect to the number of processors available in the server, this job has a share of the amount of active memory available in the pool in which the job is run, and whether the expected elapsed time for the query or database file keyed access path build or rebuild is limited by CPU processing or I/O resources. The query optimizer chooses an implementation that minimizes elapsed time based on the job has a share of the memory in the pool.

So available memory is an issue, available CPU is a factor, presence or absence of SMP feature is a factor, IO utilization is a factor, etc. Other factors that can have an impact on runtime include overall system activity level, QMAXACTLVL system value setting, pool activity levels, competing jobs in the system at runtime, job priorities, and more.

Can you control it? Well, you can make sure that there is plenty of memory and CPU available during the production run (limit other activity). I haven't tested it, but it is possible that you can improve build performance by specifying PARALLEL_DEGREE *MAX, but you need to make sure that there is adequate memory available in the pool.

You definitely need to limit I/O activity, if possible.

It is really tough to give concrete advice without understanding the subsystem, memory, and CPU configuration of your system, plus the workload characteristics of your system during the production run.

Bottom line: in order for the system to be able to use more threads in these rebuilds, you need three things: Plenty of memory available to the job, low I/O levels in other jobs (low disk arm utilization, generally), and available processor resources.