Records in the sysdistrib, systables, and syscolumns tables for the specific tables/columns will each be locked momentarily each time UPDATE STATISTICS needs to update a row or add a new one. If your apps run with LOCK MODE WAIT… they won't notice. There's also the overhead of the index and/or table scans needed to gather the statistics, and of sorting the data. Normally it's minimal, but I'd wait for a light load period to run it the first time. Once the initial stats exist and are up-to-date, only a small number of tables will be updated each day.

Also, existing prepared statements MAY have their stored query plans invalidated by the run, which would cause an error return of -710 the next time you OPEN a cursor against them or execute the statement. So unless you have code in your servers that will reprepare the statements and continue when they get an error of -710, you will probably want to bounce servers and other long running applications immediately after the dostats completes on that database, or make the run during downtime. (This required reprepare step was finally automated in IDS 11.10 and later, so that problem goes away when you upgrade.) Most prepared queries will not be affected, but sometimes …
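For servers that can't be bounced, the "reprepare and continue" handling described above might look roughly like the following. This is only a sketch of the pattern, not any real driver's API: `SqlError`, `RepreparingStatement`, and the `prepare` callable are hypothetical stand-ins for whatever your client library actually provides.

```python
STALE_PLAN = -710  # SQLCODE seen when a prepared statement's plan is no longer valid


class SqlError(Exception):
    """Hypothetical driver error that carries an Informix SQLCODE."""

    def __init__(self, sqlcode, message=""):
        super().__init__(f"SQLCODE {sqlcode}: {message}")
        self.sqlcode = sqlcode


class RepreparingStatement:
    """Wraps a prepared statement and re-prepares it once on a -710.

    `prepare` is any callable that takes SQL text and returns a callable
    statement; it is an assumption made for this illustration.
    """

    def __init__(self, prepare, sql):
        self._prepare = prepare
        self._sql = sql
        self._stmt = prepare(sql)

    def execute(self, *params, max_retries=1):
        for attempt in range(max_retries + 1):
            try:
                return self._stmt(*params)
            except SqlError as err:
                # Anything other than -710, or a repeated -710, is passed on.
                if err.sqlcode != STALE_PLAN or attempt == max_retries:
                    raise
                # Throw away the stale statement and prepare it again.
                self._stmt = self._prepare(self._sql)
```

The retry is limited on purpose: if a freshly prepared statement still returns -710, something else is wrong and the caller should see the error.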

When a cpuvp has no work to do, it blocks itself on a semaphore and is not reactivated until there are threads in the ready queue that it could process. The cpuvps are reactivated in order of cpuvp number. Hence, the first cpuvps tend to get more system time than the later cpuvps.
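To see why waking in cpuvp-number order skews the CPU times, here is a toy model. It is not Informix's scheduler, just an illustration under a simple assumption: each new job goes to the lowest-numbered VP that is idle when it arrives, and if none is idle it waits for whichever VP frees up first. The function name and the job list are made up for the example.

```python
def simulate_wake_order(num_vps, arrivals):
    """Return the total busy time per VP for a list of (tick, duration) jobs.

    Toy model: a job is taken by the lowest-numbered idle VP; if every VP
    is busy, it waits for the VP that frees up first (lowest number on ties).
    """
    busy_until = [0] * num_vps   # tick at which each VP becomes idle again
    busy_time = [0] * num_vps    # accumulated work per VP

    for tick, duration in sorted(arrivals):
        idle = [vp for vp in range(num_vps) if busy_until[vp] <= tick]
        if idle:
            vp, start = idle[0], tick
        else:
            vp = min(range(num_vps), key=lambda v: (busy_until[v], v))
            start = busy_until[vp]
        busy_until[vp] = start + duration
        busy_time[vp] += duration
    return busy_time


# A burst of three jobs, a straggler, a smaller burst, then single requests.
jobs = [(0, 3), (0, 3), (0, 3), (1, 3), (10, 2 + 1), (10, 3), (20, 3), (30, 3)]
print(simulate_wake_order(4, jobs))
```

With this job list the four VPs end up with 12, 6, 3, and 3 ticks of work: the higher-numbered VPs only get used when a burst overflows the lower-numbered ones.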

Also, there are a couple of other things to consider. The master timer is associated with the first cpuvp. The master timer wakes up once a second and does a bit of housekeeping work such as moving threads from a waiting/sleeping state to the active queue.

It's a bad idea to place TCP poll threads in the CPU VPs. That's because the CPU VPs cannot block on the sockets, as the NET VPs can, since they have other work to do. So they have to poll the sockets periodically when they are between other work and at certain defined breakpoints in the code where the active threads can release the VP for polling. This will indeed increase the CPU usage of all of the CPU VPs running poll threads; however, it will be just wasted cycles and not any increase in work volume, performance, or responsiveness. Keep the TCP poll threads in the NET VPs, configuring, as I said, one for every 200 expected connections.

Third, as I think I hinted, there's nothing wrong with the CPU utilization pattern you are seeing currently. It just happens that the lower numbered CPU VPs tend to take most of the work onto themselves simply because they are least likely to be asleep or even swapped out when a new request comes in, since they are actively servicing existing requests. Thus, often by the time another CPU VP awakens to poll the ready queue, some lower numbered VP has already drained the queue and scheduled the work. Only when the lower numbered CPU VPs are too busy to take on more work, or are in the middle of an uninterruptible section of code, will the higher numbered CPU VPs take on a job. So the 'inverted triangle' of CPU times you are seeing in the onstat -g glo listing is normal and indicates a well tuned instance that has lots of free cycles available for more work when it's presented.
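The rule of thumb given above for TCP poll threads (one NET VP poll thread per 200 expected connections) is simple arithmetic. The sketch below works it out and formats it as an onconfig NETTYPE entry. The helper names are made up, the `soctcp` protocol name is only an example, and the field layout (protocol, poll threads, connections per thread, VP class) should be checked against the onconfig.std for your version before you rely on it.

```python
import math


def net_poll_threads(expected_connections, per_thread=200):
    """Number of poll threads at one per `per_thread` connections (minimum 1)."""
    return max(1, math.ceil(expected_connections / per_thread))


def nettype_line(protocol, expected_connections, per_thread=200, vp_class="NET"):
    """Format a NETTYPE-style line: protocol,poll_threads,connections,VP_class."""
    threads = net_poll_threads(expected_connections, per_thread)
    return f"NETTYPE {protocol},{threads},{per_thread},{vp_class}"


# 750 expected TCP connections need 4 poll threads at 200 each.
print(nettype_line("soctcp", 750))
```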

My only point about the shared memory poll threads is that IF you only have one CPU VP out of many polling shared memory, that will be the first CPU VP, and as described, that's typically the busiest CPU VP in versions earlier than 10 (the second CPU VP tends to be busier in 10.00 and later due to some restructuring of overhead task code). Since it mainly polls when it's not busy, it also tends to take even more of the work onto itself than is the case for TCP connection work, and so gets even busier, polling less often. Thus, having only the one poll thread tends to reduce responsiveness if shared memory connections are common, and tends to skew the CPU usage triangle even more than normal. Having SHM poll threads in all CPU VPs reduces the effect and spreads the work a bit more evenly. I'm not saying the work needs to be spread for any reason other than that user requests will tend to be processed more quickly, since they spend less time waiting to be picked up out of the shared memory request buffer and are more likely to be picked up immediately by an otherwise idle CPU VP than to be placed in the ready queue to wait again for a free VP.

The big problem with cooked files is still performance. All writes and reads to/from cooked files MUST go through the UNIX buffer cache. This means an additional copy from the server's output buffer to the UNIX buffer page, then a synchronous write to disk. This is opposed to a write to a raw file, where the server's output buffer is written directly to disk without the intervening copy. This is just faster. Anyone who has written anything that can test this can attest to the difference in speed. Here are my test results:

From this you can clearly see the cost of cooked files and of synced cooked files. The tests were done with a version of my ul.ec utility modified to optionally open the output file O_SYNC. A cooked disk partition is almost 50% slower than a raw disk partition, and cooked filesystem files are almost 60% slower. The penalty for O_SYNC is an additional 5% for cooked files and negligible for raw files (as expected). The test file was 2.85MB written using 4K I/O pages (the default fopen application buffer size), which should simulate Informix performance. The cooked and raw disk partition tests were conducted to a singleton 9GB Fast Wide SCSI II drive using the raw and cooked device files corresponding to the same drive.
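If you want to try a similar comparison yourself, a rough sketch follows. It is not the ul.ec utility mentioned above; it is a stand-in that writes a file in 4K blocks with or without O_SYNC and reports the elapsed time. The function name and file sizes are made up, it only covers the cooked side of the comparison, and the numbers you get will depend entirely on your OS, filesystem, and disks.

```python
import os
import tempfile
import time


def timed_write(path, total_bytes, block_size=4096, sync=False):
    """Write `total_bytes` of zeros to `path` in `block_size` blocks.

    Returns (bytes_written, elapsed_seconds). With sync=True the file is
    opened O_SYNC where the platform provides that flag.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync:
        flags |= getattr(os, "O_SYNC", 0)  # not defined on every platform
    block = b"\0" * block_size
    written = 0
    start = time.perf_counter()
    fd = os.open(path, flags, 0o600)
    try:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            written += os.write(fd, chunk)
    finally:
        os.close(fd)
    return written, time.perf_counter() - start


if __name__ == "__main__":
    size = 2 * 1024 * 1024  # an arbitrary small test size
    with tempfile.TemporaryDirectory() as tmp:
        for sync in (False, True):
            n, secs = timed_write(os.path.join(tmp, "t.dat"), size, sync=sync)
            print(f"sync={sync}: {n} bytes in {secs:.3f}s")
```

To compare against a raw device you would point `path` at the device file instead, which needs the right permissions and will overwrite whatever is on that device.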

Also, if you have inline polling for TCPIP or SHM connections, the cpuvps running the poll threads will poll to see if there are any incoming network requests. So those cpuvps don't queue themselves on the wait semaphore so often.

Understand, there is no performance advantage in trying to schedule activity round-robin on the CPU VPs. In fact, quite the opposite is true. When a CPU VP is not running, it must block itself on its semaphore. Blocking and unblocking the activity on the CPU VP would cause an OS process context switch. The context switch is much more expensive overall than simply leaving the secondary CPU VPs in a blocked state and only utilizing them in an 'overflow' case.

Bulk inserts (implemented using insert cursors and “put” statements with IFX_USEPUT) can be very fast, but obviously have to be used within a single transaction, which might be a problem, are only available in later versions, and did have some bugs.

FET_BUF_SIZE: What is your current value? You may want to experiment with values of 16K or 32K to see which is quicker.

OPTOFC: I can't see any downside to enabling this.

OPTMSG: Ignore this, as I think it relates to ESQL-C rather than JDBC.

IFX_AUTOFREE: If you have to reopen the same cursor continually, this would actually slow things down compared with declaring the cursor just once and not freeing it each time. However, if you think Java is unable to reuse the same cursor object and would leak memory otherwise, this would help, compared with freeing it separately.

CKPTINTVL - the time between the end of one checkpoint and the beginning of the next.

PHYSFILE - the larger the physical log, the more data has to be cleared out at the end of the checkpoint. Also the physical log can be triggering your checkpoints if you have CKPTINTVL set too high and PHYSFILE too low.

LRUS - more LRUs will flush more quickly during LRU flush time leaving less chance that there will be more dirty buffers than LRU_MIN_DIRTY at checkpoint time.

LRU_MAX_DIRTY - the % of dirty buffers in an LRU queue that triggers an LRU write.

LRU_MIN_DIRTY - the % of dirty buffers left in an LRU queue when an LRU write completes.
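To put numbers on the two LRU percentages above, here is a small calculation. It assumes the buffer pool is split evenly across the LRU queues, which is a simplification for illustration; the function name and the sample figures are made up.

```python
def lru_thresholds(buffers, lrus, max_dirty_pct, min_dirty_pct):
    """Dirty-buffer counts per LRU queue at which cleaning starts and stops.

    Assumes `buffers` is divided evenly across `lrus` queues.
    """
    per_queue = buffers / lrus
    return {
        "buffers_per_queue": per_queue,
        "start_cleaning_at": per_queue * max_dirty_pct / 100,  # LRU_MAX_DIRTY
        "stop_cleaning_at": per_queue * min_dirty_pct / 100,   # LRU_MIN_DIRTY
    }


# 200,000 buffers over 8 LRU queues with LRU_MAX_DIRTY 60 and LRU_MIN_DIRTY 50.
print(lru_thresholds(200_000, 8, 60, 50))
```

In this example each queue holds 25,000 buffers, an LRU write starts at 15,000 dirty buffers, and it stops at 12,500. That still leaves up to 12,500 dirty buffers per queue for the checkpoint to flush, which is why lowering the two percentages tends to shorten checkpoints.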

NUMAIOVPS - if your chunks are COOKED files or devices, then AIO VPs are used to flush dirty buffers to disk, both at LRU flush time and at chunk write time during a checkpoint. If your chunks are not all RAW devices, then you need the greater of the number of LRU queues or 1.5 times the number of COOKED chunks in order for dirty page flushing to be time efficient.
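The NUMAIOVPS rule of thumb above works out as follows. The function name is made up, and rounding the 1.5x figure up to a whole VP is my assumption, since the rule itself doesn't say which way to round.

```python
import math


def suggested_numaiovps(lrus, cooked_chunks):
    """Greater of the LRU queue count or 1.5 x the number of cooked chunks.

    Only meaningful when at least one chunk is cooked; the 1.5x figure is
    rounded up to a whole VP (an assumption made for this sketch).
    """
    if cooked_chunks <= 0:
        raise ValueError("rule applies only when some chunks are cooked")
    return max(lrus, math.ceil(1.5 * cooked_chunks))


print(suggested_numaiovps(lrus=8, cooked_chunks=10))   # 15
print(suggested_numaiovps(lrus=16, cooked_chunks=4))   # 16
```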

If you are using a few large chunks instead of a larger number of smaller chunks, you will have only one flush thread per chunk running at checkpoint time. More chunks means more flush threads means less time to flush.

Finally, how many disk spindles are involved with flushing your chunks? If it's all going to partitions on a single HUGE spindle or even several spindles attached to a single controller channel, then you may be hitting the IO limits of your subsystems. More spindles and/or more controller channels involved means lower flush times.