Random postings about IBM Informix

Non-uniform memory access, or NUMA, is not a new concept, but high-end multiprocessor Intel-based servers are increasingly configured with this architecture, bringing it further into the mainstream. Put simply, NUMA means that instead of all processors accessing main system memory through a common bus, each processor is allocated an even share of the memory that it can address directly. If a processor needs memory controlled by another processor, it accesses it through that other processor.

Linux kernels from v2.5 onwards are aware of any NUMA architecture, and the topology can be displayed using numactl -H or numactl --hardware:
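The original output isn't reproduced here, so the sketch below shows the command plus an illustrative distance matrix for a four-socket server (not captured from the machine in question):

```shell
# Show the NUMA topology if numactl is installed; '|| true' keeps the
# line harmless on machines without it
command -v numactl >/dev/null && numactl --hardware || true

# Illustrative distance matrix for a four-socket server:
#   node distances:
#   node   0   1   2   3
#     0:  10  21  21  21
#     1:  21  10  21  21
#     2:  21  21  10  21
#     3:  21  21  21  10
```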

The above is from a four-socket server. It shows that fetching from local memory is weighted at '10' and from memory controlled by other processors at '21'. I strongly suspect these weightings are hard-coded.

numactl -H also shows information about how the memory is split between processors. The term ‘node’ is used:

How is the kernel parameter vm.zone_reclaim_mode set on your system? You can check by running cat /proc/sys/vm/zone_reclaim_mode

If it’s set to 1 on your Informix system you should definitely read on. You’ll be glad to hear this parameter can be changed dynamically.
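Checking and changing it looks like this; the sysctl lines are left as comments because they need root, so treat this as a sketch rather than a recipe:

```shell
# 0 = zone reclaim off; 1 (or higher) = on, which is what hurts Informix here
cat /proc/sys/vm/zone_reclaim_mode 2>/dev/null || echo "not exposed by this kernel"

# To switch it off dynamically (as root), without restarting Informix:
#   sysctl -w vm.zone_reclaim_mode=0
# and to persist it across reboots, add to /etc/sysctl.conf:
#   vm.zone_reclaim_mode = 0
```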

In the latest kernels (2014 onwards) a commit means that the parameter will never be set automatically, but if you're running an enterprise Linux you could be on a kernel version like 2.6.32 (RHEL 6) where this can occur: although patched, the base version of this kernel dates from 2009.

I am not sure of the exact criteria that determine when older Linux kernels will switch on this feature at boot up. I think you need a modern four (or more) processor server with a NUMA architecture but there may be other requirements.

It’s interesting to read the slightly repetitious kernel commit log:

When it was introduced, zone_reclaim_mode made sense as NUMA distances punished and workloads were generally partitioned to fit into a NUMA node. NUMA machines are now common but few of the workloads are NUMA-aware and it’s routine to see major performance degradation due to zone_reclaim_mode being enabled but relatively few can identify the problem.

Those that require zone_reclaim_mode are likely to be able to detect when it needs to be enabled and tune appropriately so lets have a sensible default for the bulk of users.

This patch (of 2):

zone_reclaim_mode causes processes to prefer reclaiming memory from local node instead of spilling over to other nodes. This made sense initially when NUMA machines were almost exclusively HPC and the workload was partitioned into nodes. The NUMA penalties were sufficiently high to justify reclaiming the memory. On current machines and workloads it is often the case that zone_reclaim_mode destroys performance but not all users know how to detect this. Favour the common case and disable it by default. Users that are sophisticated enough to know they need zone_reclaim_mode will detect it.

On our Informix system, which uses huge pages, we have experienced long disruptive checkpoints as a result of zone reclaiming. As huge pages are not swappable, it’s likely to be our monitoring and other non-Informix processes provoking the zone reclaims.

The long checkpoint behaviour can be summarised as:

A checkpoint is triggered by CKPTINTVL.

Informix instructs all threads to finish what they are doing and goes into state CKPT REQ.

One or more threads may be in critical section and must continue to the end of that section before they can stop.

A zone reclaim is occurring, I/O throughput dramatically decreases and the thread in critical section takes many seconds to come out of it.

All active threads wait (state C in the first column of onstat -u).

Eventually the operation completes, the checkpoint actually occurs very quickly and processing continues.

This behaviour can occur in later versions of the engine with non-blocking checkpoints.

If you have the mon_checkpoint sysadmin task enabled (I strongly recommend this), information about your checkpoints will be written to sysadmin:mon_checkpoint. (Otherwise you only retain information about the last twenty checkpoints, visible through onstat -g ckp.) A tell-tale sign is a large crit_time, making up nearly all of the checkpoint duration, with a much smaller flush_time.
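A query along these lines picks out the symptom. I'm assuming here that the column names in mon_checkpoint mirror the fields shown by onstat -g ckp (crit_time, flush_time and so on) and that intvl orders the rows, so treat it as a sketch:

```sql
SELECT intvl, crit_time, flush_time, cp_time
FROM mon_checkpoint
ORDER BY intvl DESC;
```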

You can get further evidence of whether a zone reclaim might be occurring at the same time by looking at the number of pages scanned per second in the output from sar -B. (sar is a very sophisticated monitoring tool these days with views into many aspects of the operating system.)
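For example, three one-second samples; the interesting columns are pgscank/s and pgscand/s, the kswapd and direct page scan rates:

```shell
# sar comes from the sysstat package and may not be installed everywhere
command -v sar >/dev/null && sar -B 1 3 || echo "sar not installed"
```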

One test you can try (on a test server) is LinkedIn Engineering’s GraphDB simulator. It’s a C++ program that mimics the behaviour of GraphDB and is designed to provoke zone reclaim behaviour from the Linux kernel if it is switched on.

On our test system we can leave it running for hours without zone reclaim enabled and monitor it through sar -B.

Once zone reclaim is switched on, I/O latencies reported by the test program escalate to as much as 36,000 ms. We actually have to kill the test program within 30 seconds of changing the kernel parameter to avoid the system becoming so unresponsive it cannot maintain sshd connections.

In our real world Informix example we are not using the page cache anything like as aggressively and when the problem occurs I/O demands reduce as we get down to a single thread in critical section. Thus we don’t see pages scanned at the rate in the test, just a clear increase.

It’s worth mentioning that new NUMA capabilities have been added to the Linux kernel in version 3.8 (and later in 3.13) so RHEL 7 users might see slightly different behaviour.

Here’s a blog post on something I didn’t even know existed until recently: Informix simple password encryption. This does what it says on the tin and nothing else: it encrypts the password you use to authenticate to the database server, which is otherwise sent in the clear. If you weren’t aware of this then it’s certainly worth reading on!

I don’t know when this feature came into the product but some of my IBM contacts didn’t seem to be aware that it exists either and having tried to use it, it’s apparent that this is a cobweb area of the product with silly niggles stopping it from working properly in a client/server environment.

However it is a documented feature and the documentation is here. I’ve linked to the 11.70 docs but, as far as I can tell, the 12.10 docs are identical.

So why am I not using the more advanced network data encryption which has come into the product more recently, encrypts all your network traffic using SSL and is probably better supported? There are two reasons: this does more than I need and there will be an overhead to using SSL. This may be small but I have not had the chance to quantify it yet.

Below is what your auditor doesn’t want to see: a TCP dump of me connecting to an environment called ‘test4’ with user test_user and password ABCDefgh1234, using a standard onsoctcp TCP connection.

Simple password encryption is easy to set up but, as we’ll see, the manual misses a step or two and a couple of hacks are needed to get this working. All the problems are at the client end and you won’t see them if you test the connectivity entirely within the server using a full engine install.

At the server end we need a concsm.cfg file, which by default lives in $INFORMIXDIR/etc/concsm.cfg but you can override with the easy to remember INFORMIXCONCSMCFG environment variable, which works in a similar fashion to INFORMIXSQLHOSTS.

My server concsm.cfg file contains the following:

SPWDCSM("/opt/informix_test4/lib/csm/libixspw.so", "", "p=1")

I also set up an additional sqlhosts entry:

test4_tcp_secure onsoctcp myserver 9101 csm=(SPWDCSM)

Finally I make sure test4_tcp_secure is listed as one of my DBSERVERALIASES in my onconfig and bounce the instance. Unfortunately I don’t think this parameter is dynamically configurable.

We can of course now test this all within the confines of the server and it will work!

Let’s move onto the client side where things are not quite as straightforward.

One of my mottos is that every day is a school day, and one day last week I learnt that when you install the Informix server the GSKit is installed for you. GSKit is mentioned in the machine notes for the Linux x86_64 release, for example:

a. Before uninstalling GSKit, verify that it is not needed on your
system. It is possible that software other than Informix Database
Server requires GSKit. Uninstall by identifying and removing GSKit
packages using the command-line interface:

Run rpm command with the -qa option to obtain a list of installed
GSKit packages with their exact names.
For example:
rpm -qa | grep gsk

As root user run the rpm command to remove each package as needed.
For example:
rpm -ev gskssl64-8.0.15.6 gskcrypt64-8.0.15.6

b. If you want to restore Secure Sockets Layer capability after you
have uninstalled GSKit, see the readme file in $INFORMIXDIR/gskit
for how to install GSKit.

15. Simple Password Communications Support Module

The name of the IBM Informix shared library for Simple Password CSM on
Linux is libixspw.so.

I’ve also drawn attention to point 15, in case this is different on your platform.

Compliance is one of those things you can hardly ignore as a DBA these days. Whether it’s a PCI-DSS, financial or internal best practice audit, at some point someone is going to ask you whether you are using database auditing. In my experience the auditors struggle to ask Informix specific questions but this is one that always comes up.

I guess there are three answers to this question:

Yes, we use database auditing.

No we don’t use Informix auditing but we have a third party solution somewhere else in the stack that means someone else worries about it.

Can we have a compensating control, please?

I rarely find that auditors are concerned about the detail of what you’re actually auditing. If you can log in, do some stuff and show them that this resulted in some messages in the audit log, they are usually happy. They tend to be more concerned about where the logs go, who can read them and so on.

While the auditors are clearly not Informix DBAs familiar with all the auditing mnemonics, they are not daft: they know they can take for granted next year most of what they asked for this time, and ask more probing questions then.

So should you look at onaudit for your requirements? It’s been around a long time but I expect it may see a pick-up in interest as more and more systems take payments in one way or another. In some ways it could do with some updates: integration with syslog, allowing easy upload to a centralised repository, is needed. There is an RFE open for this (id 58678). It’s not mine but it had six votes when I last checked and it deserves more!

Positives about onaudit include:

It’s free with all editions.

Provided you stay away from selective row auditing (not covered in this blog post) and don’t try to audit much or any of what your application does, the overhead is negligible.

It gives you as a DBA a clearer idea of what is happening on your system.

So I think it’s certainly worthy of consideration. I know some customers prefer security solutions external to the database like Guardium but these are costly. I don’t know much about them so I shall leave that thought there.

Auditing needs to be part of a more general secure framework. If everyone at your site logs in as user informix or any shared account, the worst case being the same account as your application, it’s not going to be as useful. Applying rules by user will be difficult or impossible.

Some sites I’ve seen let DBAs do all their work as user informix. It definitely saves developing a more secure framework for DBAs to work in (this is not a good thing!) but it has disadvantages. Even if you avoid shared passwords by using sudo to become informix (on UNIX) having logged in as yourself, you’d then need to cross-check with the secure logs on your server to see who it was; and if two people have escalated privileges at the same time it can be tricky to distinguish their actions. Ideally you need DBAs and every other real person working under their own user ids as much as possible.

To work as a DBA without access to the informix account, you simply add yourself to the group owning the $INFORMIXDIR/etc folder and grant yourself the dba privilege in any databases you need to run DDL in, plus sysmaster, sysadmin, sysutils, sysha and sysuser. This still presents the following challenges, which may require specific sudo-type solutions:

Starting an instance; stopping one is no problem.

Running xtrace and certain oncheck commands.

Additionally as a DBA you may need root access occasionally for installations, upgrades and to use debuggers.

So before you even start there are some highly desirable prerequisites:

Your applications use their own account or (ideally) accounts and real users cannot run ad-hoc sessions using these.

DBAs practise what they preach and administer the system under their own accounts as much as possible.

Getting this far can be a struggle but even if you’re only some of the way there, you can still proceed.

The next step is consider whether to install Informix with role separation. I’m not going to discuss this at length so I’ll point to the documentation. There are no real gotchas here: it works pretty much as it says on the tin. The key idea is that it separates the DBAs from the people who decide what should be audited and who can see the audit trail. In practice I think total separation is impossible because the people deciding what should be audited need to understand the impact on the system of what they audit and the volume of data this produces. It is certainly possible to slow a system down by auditing every update.

So you’re now ready to switch on auditing? Nearly. If you monitor your system via onstat or have scripts which call onmode (‘onmode -c [un]block’ being a specific example where care is required), be aware that in all but the latest Informix releases, right up to and including 12.10.FC5W1, switching on auditing makes your onstat and onmode commands run more slowly. This can also affect admin API commands, not just the ones which are direct equivalents of onmode. The situation can get quite bad when lots of these commands run at the same time, leading to significant delays in their responses.

Fortunately there are some fixes for this:

IC89645

TURNING ON THE AUDITING LEVEL 1 ADDS AN UNNECESSARY DELAY TO ONSTAT AND ONMODE COMMANDS

This has been around for a while and appeared in 11.70.FC7W1. However it is not very effective and only eliminates the delay if the volume of onstat commands being run on your system is low.

A second, later fix is completely effective and means that onstat and onmode behave identically to when auditing is switched off, but it only works if you do not have any audit masks which log the use of these commands.

There are workarounds for the auditing delay such as using sysmaster equivalents for the onstat commands and performing onstat commands inside an onstat -i interactive session.

Finally you’ll want to consider setting up some audit masks. I take the following approach to this:

_require mask

This mask defines the minimum events to be audited for all users. I put everything that’s common in here.

_default mask

If an account is not assigned a specific mask, it will pick up all the events in this one. To avoid having to assign masks to all real users, I don’t assign them any mask and they then automatically inherit this one (in addition to what is in the _require mask).

Other masks

For my applications and other accounts needing special treatment, I create a custom mask and assign it to the user.

Finally, if you’re feeling brave, switch auditing on with some commands like:
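The exact commands depend on your requirements; a minimal sketch might be the following, where the path and values are illustrative only:

```shell
# Sketch only: run as informix (or the AAO user under role separation).
if command -v onaudit >/dev/null 2>&1; then
    onaudit -p /informix/audit   # directory for the audit trail files (site-specific)
    onaudit -s 1048576           # maximum audit file size before a new file is started
    onaudit -e 0                 # error mode 0: continue on audit write errors
    onaudit -l 1                 # audit level 1: switch auditing on
else
    echo "onaudit not available on this machine"
fi
```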

This is a quick post about parallel index builds. Today I was building, with PDQPRIORITY set, an unfragmented detached index on a large table fragmented by range with ten large fragments, and I saw this message in the online log:

You can see I am quite a long way short of the temp space required here; I need just over thirteen times more.

In this instance I have eight temporary dbspaces available, all listed in the DBSPACETEMP onconfig parameter, with no local override. They are all 2 Gb with a 16 kb page size, so have 131072 pages each, and, as I am in single-user mode, I know they are all free. onstat -d confirms that 131019 pages of 131072 are free in each of them. In case it’s relevant, I also have 1,027,203 2 kb pages free in rootdbs.

The first thing that confuses me is the 8,385,216 pages the online log message says are available, which is more than I actually have: 131019 * 8 = 1048152. I think this is a bug as it’s a factor of 8 out. It’s probably assuming a 2 kb page size somewhere, and my 16 kb dbspaces are an 8x multiple of this. I am using Linux, so is Informix counting in native page-size units and just not making it clear?
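A quick check of the factor-of-8 theory: converting my free 16 kb pages into 2 kb native-page units reproduces the figure from the log exactly.

```shell
free_16k=$((131019 * 8))      # free 16 kb pages across the eight temp dbspaces
echo "$free_16k"              # 1048152
echo $((free_16k * 8))        # 8385216: the 'available' figure from the online log
```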

The index I am creating is on a bigint field and there are 7,076,224,823 rows. If I assume 110,566,014 pages actually means 2 kb pages, around 210 GiB, the engine is calculating 32 bytes of sort space per row exactly, which sounds plausible for an 8-byte key plus a row pointer and sort overhead.

You do not have to take action. This is a warning. The database server will create the index one fragment at a time, instead of all at once.

However, it does advise me that cancelling the statement, adding more temp space and starting again would be a good idea. This is prescient as we’ll see.

Analysing this now, it is probably going to fail somewhere, because I need thirteen times more space and the engine can only divide the workload by working on a single fragment at a time. There are ten fragments and they are not all exactly the same size. In fact my largest fragment has 1,950,612,068 rows, 27% of the total, and based on 32 bytes/row the largest fragment I could handle with the space available would have only 536,653,818 rows. I suspect this means that to succeed I will need at least 30,478,314 2 kb pages available to complete the build. I hope this all makes sense anyway!
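The arithmetic above, assuming the engine reports in 2 kb units and needs 32 bytes of sort space per row, works out like this:

```shell
rows=7076224823
pages_needed=110566014                       # from the online log, in 2 kb pages
echo $((pages_needed * 2048 / rows))         # 32 bytes of sort space per row

largest=1950612068                           # rows in the largest fragment
echo $(( (largest * 32 + 2047) / 2048 ))     # 30478314 2 kb pages needed for it
```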

Foolhardily, and possibly because I get distracted by something else, I leave it to run. More messages appear in the log as the build progresses:

Nearly four hours after it began at 14:27:41 my index build fails with:

212: Cannot add index.
179: ISAM error: no free disk space for sort

Harumph.

I guess there are two annoying things about this:

The support article is only right if your largest fragment will not require more space than is available.

The failure could have been foreseen at the beginning by looking at row counts.

Anyway, I hope this helps. If I get time I will do some more testing on this to confirm some of the assumptions I have made in writing this article. Feedback is welcome as ever (via https://informixdba.wordpress.com for those of you reading this on PlanetIDS)!

Introduction

Let’s start at the beginning. Why do we run UPDATE STATISTICS at all? When we write an SQL query and send it to the database engine to execute, there may be several ways the engine can run it. For example, with two tables the engine can start with the first table and join the second to it, or begin with the second and join the first. There may also be two filter conditions, one of which may be very specific and pick out only a few rows; the other may be very general. It should be apparent that some ways are more efficient than others, sometimes by several orders of magnitude.

Informix uses a cost-based optimizer to determine how to run queries. This relies on metadata to provide information about how large tables are, how many distinct values there are and other information about your data. We call these pieces of information statistics and if we also have a histogram of a column showing the abundance of specific values or ranges we call this a distribution.

The optimizer looks at possible query plans and chooses the one with the lowest costs assigned to it, according to the statistics and distributions. This may or may not be the best plan possible; if not you may see poor performance. Inaccurate cost predictions could be because your statistics are inadequate or out of date.

Maintaining the statistics and distributions is a key DBA responsibility.

What statistics and distributions should you maintain? The purpose is to ensure the optimizer selects the most efficient query plan for your queries.

These query plans should be stable over time. Normally stability is achieved through not changing things: this is why your Change Manager likes to say no (sometimes). However, with UPDATE STATISTICS stability comes from regularly refreshing your statistics and distributions: this is a change to your system you may be doing daily.

A slide from my presentation to the IIUG 2015 Conference.

What statistics do you need? The Informix Performance Guide offers a general set of recommendations. However:

They may be more than you need and therefore more to maintain, which could be a headache with a large system.

In some specific cases they may not be good enough. The result of this can be an application codebase full of query directives (instructions to the optimizer to run queries in a particular way).

The guide doesn’t say much about how frequently you should run UPDATE STATISTICS.

Statistics and distributions

Statistics


Updated by UPDATE STATISTICS [LOW].

systables:

nrows: number of rows.

npused: number of pages used on disk.

ustlowts: when UPDATE STATISTICS was last run.

syscolumns: for indexed columns only:

colmin: the second lowest value.

colmax: the second highest value.

sysindices:

levels: number of B-Tree levels.

leaves: number of leaves.

nunique: number of unique values in the first column.

clust: incremented when a value in the index differs from the previous one; the maximum value is the number of rows, and a low number indicates lots of duplicates.

nrows: number of rows.

ustlowts: when UPDATE STATISTICS was last run.

ustbuildduration: time to build index statistics.

Distributions

Updated by UPDATE STATISTICS MEDIUM or HIGH.

Consist of histograms for values or value ranges in equally sized buckets, stored in sysdistrib.

Fragment-level distributions are stored in sysfragdist.

Auto Update Statistics (AUS) basics

Auto Update Statistics (AUS) consists of two database scheduler jobs. These are stored in the sysadmin database in table ph_task with configuration settings in ph_threshold.

Auto Update Statistics Evaluation

This calls a set of stored procedures which populate the sysadmin:aus_command table with a prioritized list of UPDATE STATISTICS commands to run. The code for these procedures is in $INFORMIXDIR/etc/sysadmin/sch_aus.sql.

Auto Update Statistics Refresh

This is a UDR that does the work of calling the UPDATE STATISTICS commands. In older versions this was done with SPL code. Auto Update Statistics Refresh cannot be called even manually without the database scheduler running.

If your instance has a stop file ($INFORMIXDIR/etc/sysadmin/stop) to prevent the scheduler initialising with the instance, you must remove it. Before you do so it’s a good idea to review which jobs are enabled (tk_enable='t') in the ph_task table. (Using a stop file is a bad idea with 12.10 because running without the scheduler stops some system processes from functioning, so if you’re doing this you ought to change it even if you don’t want to use AUS.)

One advantage of AUS is that it works on all of your databases, including sysmaster.

Instance parameters affecting UPDATE STATISTICS

Three onconfig parameters affect statistics maintenance, independently of the tool you use to update statistics:


AUTO_STAT_MODE

Controls whether to only update statistics or distributions the engine considers stale when running UPDATE STATISTICS.

STATCHANGE

Controls the percentage change beyond which statistics are considered stale if AUTO_STAT_MODE is enabled. This can be overridden at table level.

USTLOW_SAMPLE

Whether to use sampling for UPDATE STATISTICS LOW.

AUS specific parameters

AUS’s configuration settings stored in ph_threshold work independently of these system level settings.

These can be updated via SQL without using OpenAdmin Tool:

UPDATE
sysadmin:ph_threshold
SET
value='0'
WHERE
name='AUS_CHANGE';


AUS_AGE

The number of days after gathering statistics that they are considered stale.

AUS_CHANGE

Prioritises tables based on the percentage of rows that have changed. This is not to be confused with STATCHANGE.

AUS_PDQ

The PDQPRIORITY to use while updating statistics. If you are using workgroup edition set this to 0 to avoid annoying messages in the online log about not being allowed to use PDQ.

AUS_SMALL_TABLES

The row-count threshold below which tables are considered volatile and have their statistics refreshed on every run.

AUS_RULES

When set to 0, AUS only updates existing distributions and won’t ensure that a minimum set of distributions exists according to its rules. This allows you more fine-grained control of what you maintain. Be aware that creating an index automatically builds HIGH mode distributions on leading columns with data, so if you don’t want these you will need to drop them.

When set to 1, where no distributions exist, AUS will create:

HIGH mode distributions on all leading index columns.

MEDIUM mode distributions on other indexed columns.

For most systems ‘1’ is the value you should use. If in doubt, use this.

You’ll notice in the description for AUS_RULES=1 that AUS does not automatically generate any distributions on non-indexed columns. However it will maintain any existing distributions regardless of the value of AUS_RULES, even if AUS wouldn’t have created them had they not existed.

Non-indexed columns and distributions

If you are migrating from dostats you will have MEDIUM mode distributions on all non-indexed columns. It’s possible to drop these for individual columns using:
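The statement looks like this, where the table and column names are placeholders; note that plain DROP DISTRIBUTIONS also refreshes the LOW statistics as a side effect, while recent versions support DROP DISTRIBUTIONS ONLY to skip that:

```sql
UPDATE STATISTICS LOW FOR TABLE my_table (my_column) DROP DISTRIBUTIONS;
```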

I’d test before doing anything like this, so you may just choose to leave them in and let AUS maintain them. If UPDATE STATISTICS MEDIUM doesn’t take very long on your system this is probably the best choice.

Are these distributions important? My answer is that it depends but, more often than not, no. The Informix Performance Guide recommends you have them but this is just a general recommendation. At this point it’s important not to lose sight of the fact that the goal is not to have the most comprehensive, high-resolution and up to date statistics possible; it is to ensure you can guarantee that your queries always run efficiently. Given that we can’t be updating statistics and distributions on all columns all of the time some compromises are inevitable.

Often when running a query the main risk is the optimizer choosing a sequential scan instead of an index. This risk is greatly reduced, if not eliminated, if the onconfig parameter OPTCOMPIND is set to 0. The downside is that the optimizer won’t select a sequential scan when it is the best query plan available unless there are no other options.

In general distributions on a column are more useful if there is some skew in the data. However be aware that for non-indexed columns syscolumns.colmin and syscolumns.colmax are never populated by UPDATE STATISTICS LOW so the optimizer is truly blind about the data ranges without a distribution.

I’m going to run through an example now using this table:

people table:

person_id serial not null
name varchar(254) not null
gender char(1) not null
age int not null

indices:

ix_people1 (person_id)
ix_people2 (age)
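The DDL isn’t shown in the original post; a sketch of how I imagine the schema would look is:

```sql
CREATE TABLE people (
    person_id serial NOT NULL,
    name      varchar(254) NOT NULL,
    gender    char(1) NOT NULL,
    age       int NOT NULL
);

CREATE INDEX ix_people1 ON people (person_id);
CREATE INDEX ix_people2 ON people (age);
```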

The table will be populated with random ersatz data as follows:

1 million rows.

Ages evenly distributed between 5 and 84.

75% female, 25% male.

I’ll be running this query:

SELECT
name
FROM
people
WHERE
age BETWEEN 18 AND ? AND
gender=?

And tweaking these parameters:

Upper age limit in the query.

Gender in the query.

Whether I have a medium mode distribution on gender.

The value of OPTCOMPIND in the onconfig.

My results were as follows:

Upper age | Gender | Actual rows returned | Distribution on gender? | OPTCOMPIND | Estimated rows | Estimated cost | Plan
--- | --- | --- | --- | --- | --- | --- | ---
25 | f | 74817 | No | 0 | 9988 | 21104 | Index path
25 | f | 74817 | No | 2 | - | 21104 | Index path
25 | f | 74817 | Yes | 0 | 75539 | 21104 | Index path
25 | f | 74817 | Yes | 2 | - | 21104 | Index path
25 | m | 25061 | No | 0 | 9988 | 21104 | Index path
25 | m | 25061 | No | 2 | - | 21104 | Index path
25 | m | 25061 | Yes | 0 | 24539 | 21104 | Index path
25 | m | 25061 | Yes | 2 | - | 21104 | Index path
30 | f | 121748 | No | 0 | 16232 | 38477 | Index path
30 | f | 121748 | No | 2 | - | 33923 | Sequential scan
30 | f | 121748 | Yes | 0 | 122439 | 38477 | Index path
30 | f | 121748 | Yes | 2 | - | 33923 | Sequential scan
30 | m | 40572 | No | 0 | 16232 | 38477 | Index path
30 | m | 40572 | No | 2 | - | 33923 | Sequential scan
30 | m | 40572 | Yes | 0 | 39881 | 38477 | Index path
30 | m | 40572 | Yes | 2 | - | 33923 | Sequential scan

(A dash means the estimated row count was not recorded separately for the OPTCOMPIND 2 runs.)

What conclusions can we draw from this example?

OPTCOMPIND was a determining factor, not the presence of a medium mode distribution on gender.

Having the distribution gave a much better estimate of the number of rows returned.

The optimizer never used a different plan for the male or female queries.

Of course this is one example and you may have some different ones.

Columns with incrementing values

Let’s illustrate a different point with another example.

Maintaining distributions on any sort of log or journal where there is a timestamp field can be a problem. The highest value in your distribution is wrong almost immediately after calculating it because new rows are being added all the time with later timestamps. This means that if you do a query over recently added data your distributions may tell the optimizer that there’s no data.

ledger table:

line_id serial not null
account_id int not null
creation_time datetime year to second not null

indices:

ix_ledger1 (line_id)
ix_ledger2 (account_id, creation_time)
ix_ledger3 (creation_time)
ix_ledger4 (account_id)

for_processing table:

line_id int not null
creation_time datetime year to second not null

indices:

ix_fp1 (line_id)
ix_fp2 (creation_time)

Both of these tables have over 1 million rows and new rows are being added continuously. For a query that joins the two tables on line_id and filters on both account_id and a creation_time range, the optimizer can either:

Use the index on creation_time on for_processing, join to the ledger table on line_id and then filter on the account_id column.

Or use the index on account_id on the ledger table, join by line_id and then filter on the creation_time column.

The risk with the first one is that a lot of rows are read, only to eliminate the vast majority of them when the account_id criterion is applied.

The optimizer may prefer to drive off creation_time, particularly if the distribution indicates there are no data past the time the distribution was gathered. This is because it believes (wrongly) that selecting a date range from the for_processing table where it thinks there is no data at all, avoiding the need to do any work, is more efficient than selecting out an account from the ledger and then joining to for_processing.

This can be very slow for large date ranges. This is particularly true when there are a large number of accounts.

Can AUS help you here? Not really, this example is more to point out a danger. The risk of running into a problem like this can be massively increased if you use the default STATCHANGE value of 10. This is because here it is the age of the distributions that matters, not how much the data has changed.

My recommendation is:

In your onconfig either:

set AUTO_STAT_MODE to 0.

Or set STATCHANGE to 0 if AUTO_STAT_MODE is 1.

If there are tables for which this is not appropriate, do so at the table level:

ALTER TABLE <tabname> STATCHANGE <percent>;

In my view, only skipping a refresh when there have been absolutely no changes is the only safe approach.

Fortunately there is a fix for this (IC91678, shown in the results below), but it is not switched on unless you set the onconfig parameter SQL_DEF_CTRL to 0x2. This can be done dynamically with onmode -wm. With it switched on, date/time distributions are effectively extended into the future by up to 30 days. While the answer to the question of how old statistics and distributions can be is nearly always ‘it depends’, with this switched on there is a hard limit.

In Informix 12.10.FC5 the fix is included and is now the default behaviour.

The date or datetime column concerned must have a default value of TODAY or CURRENT. The code also compares the last timestamp in the distribution with the time the distribution was gathered. The two must be close together to activate the feature.

This fix also works on serial, serial8 and bigserial fields.

This feature is a little tricky to test because you must:

Populate the tables with a large volume of data.

Update the distributions.

Add some more data with later timestamps.

Wait a while!
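The steps above can be sketched in SQL as follows. The table and column names mirror the earlier example but are otherwise hypothetical; note the DEFAULT CURRENT, which is one of the conditions for the fix to activate:

```sql
-- 1. Create and populate the table with a large volume of data
CREATE TABLE for_processing (
    line_id       BIGINT NOT NULL,
    account_id    INTEGER NOT NULL,
    creation_time DATETIME YEAR TO SECOND
                  DEFAULT CURRENT YEAR TO SECOND NOT NULL
);
-- ... bulk load millions of rows here ...

-- 2. Update the distributions
UPDATE STATISTICS HIGH FOR TABLE for_processing (creation_time);

-- 3. Add more data with later timestamps (creation_time defaults to now)
INSERT INTO for_processing (line_id, account_id) VALUES (1000001, 42);

-- 4. Wait a while, then compare query plans
SET EXPLAIN ON;
```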

Here are my results:

| Date range | Within the distribution bounds | Beyond the distribution upper bound | Beyond the distribution upper bound with IC91678 |
|---|---|---|---|
| Query drives off | account_id | creation_time | account_id |
| Costs: lead with account_id | 264 | 398 | 398 |
| Costs: lead with creation_time | 79004 | 4 | 398 |
| Estimated rows | 20 | 1 | 1 |
| Actual rows | 10 | 1 | 1 |

Wildcard queries and insufficient distribution resolution

Another problem you may have with the standard distributions created by UPDATE STATISTICS HIGH is insufficient resolution. By default the data are divided into 200 equal-sized buckets (resolution 0.5) and this may not suffice for some wildcard queries. The optimizer may be grossly wrong in its estimates for the number of rows returned and this can be improved by increasing the number of buckets.
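The number of buckets is controlled by the RESOLUTION clause: the resolution is the percentage of the data each bucket holds, so 0.5 gives about 200 buckets and 0.05 about 2000. A sketch, using the customer table from the example that follows:

```sql
-- Default: about 200 buckets (resolution 0.5)
UPDATE STATISTICS HIGH FOR TABLE customer (user_name);

-- Ten times the resolution: about 2000 buckets
UPDATE STATISTICS HIGH FOR TABLE customer (user_name) RESOLUTION 0.05;
```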

The customer table and its indices:

| Column | Type | Nullability | Index | Indexed columns |
|---|---|---|---|---|
| customer_id | serial | not null | ix_customer1 | (customer_id) |
| user_name | varchar(50) | not null | ix_customer2 | (user_name) |

Again this table will be populated with random ersatz data as follows:

9 million rows.

This gives a distribution bucket size of 45000 with 0.5 resolution.

And my query:

SELECT
FIRST 1000
customer_id,
user_name
FROM
customer
WHERE
user_name
LIKE 'BAN%'
ORDER BY
customer_id;

Look at the query carefully. It's the sort of query you might get from a web customer search form. The ORDER BY on customer_id is important because it gives the engine the option of avoiding a sort if the index on this column is used to select the data. If the optimizer thinks the user_name criterion is not very selective, that is, there are many more than 1000 customers whose user name begins with the letters BAN and it expects to find them quickly without reading many rows, it may prefer this query plan.

There are two credible plans for this query:

Use the index on user_name, read all the rows it points to and then sort on customer_id, return the first 1000.

Use the unique index on customer_id, read the rows it points to, filter, stop at 1000 rows. This plan does not require a sort.

There is a third option of scanning the entire table but it’s unlikely the optimizer will choose this.

Let’s look at the results of running this query with a standard HIGH mode distribution on user_name, a second distribution with ten times the resolution, and no distribution at all.

| Variable | HIGH 0.5 resolution | HIGH 0.05 resolution | No distributions |
|---|---|---|---|
| Selected index (no directive) | customer_id | user_name | customer_id |
| Costs: lead with customer_id | 3480 | 313249 | 174 |
| Costs: lead with user_name | 567525 | 1214 | 12435040 |
| Estimated rows | 90000 | 1000 | 1800000 |
| Actual rows | 1028 | 1028 | 1028 |

This example is real but also carefully selected, because different three-letter combinations may give different and saner results, even with the standard distribution resolution.

If instead I run this query:

SELECT
FIRST 1000
customer_id,
user_name
FROM
customer
WHERE
user_name
LIKE 'TDM%'
ORDER BY
customer_id;

How frequently should I run UPDATE STATISTICS?

A common question a DBA may ask is how often is it necessary to run UPDATE STATISTICS? A straightforward if unhelpful answer is: often enough to ensure efficient query plans. More specifically:

On tables without columns with incrementing values, by which I mean that new values lie inside the range of the existing data, or where these columns are not used in the selection criteria for queries, it may be safe to use STATCHANGE (with AUTO_STAT_MODE) to regulate the frequency based on percentage change at the table level.

With incrementing values the working set can quickly get beyond the min/max values in the statistics or distributions. Here I’d recommend being much more aggressive and update based on age, regardless of the volume of change. This especially applies to your distributions.

If your system is small enough that you can run a full UPDATE STATISTICS every day there is no harm in doing this. It is probably overkill but it is one way of playing safe. To do this set AUS_CHANGE to 0 and make sure both the scheduler and the evaluator run daily. For larger systems you do need to be more selective about how often you run UPDATE STATISTICS.
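The AUS parameters mentioned here live in the ph_threshold table in the sysadmin database; a sketch of setting AUS_CHANGE to 0, run as an administrative user:

```sql
-- Force AUS to consider every table regardless of how much it has changed
UPDATE sysadmin:ph_threshold
    SET value = '0'
    WHERE name = 'AUS_CHANGE';
```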

Monitoring AUS

Does AUS just work? Well yes, it was aimed at the embedded market and it is pretty robust. On a larger system there is more to go wrong and so I’d recommend you check that:

the parameters are set correctly (in case they change if you rebuild the sysadmin database, for example).

there are no errors running UPDATE STATISTICS.

all statistics and distributions are sufficiently up to date.

the window you give the refresh job to run in is long enough.

| Check | Notes |
|---|---|
| Correct parameters | Job scheduler is running and the dbScheduler thread exists. Evaluator and refresh tasks are enabled and have the expected schedules. |
| Refresh window long enough | No pending commands at the end of the refresh task window (aus_command.aus_cmd_state='P'). |
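The last two checks can be scripted against the sysadmin database; a sketch, with the state codes ('E' for error, 'P' for pending) worth verifying against your version:

```sql
-- Commands that failed during the last refresh window
SELECT aus_cmd_exe, aus_cmd_err_sql, aus_cmd_err_isam
    FROM sysadmin:aus_command
    WHERE aus_cmd_state = 'E';

-- Commands still pending: a sign the refresh window was too short
SELECT COUNT(*)
    FROM sysadmin:aus_command
    WHERE aus_cmd_state = 'P';
```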

UPDATE STATISTICS FOR PROCEDURE

How does AUS deal with stored procedure plans as stored in sysprocplan? Well it doesn’t directly and does not call UPDATE STATISTICS FOR PROCEDURE [procname].

My take on this is that routines referencing updated tables will be recompiled anyway the first time they are called immediately after running UPDATE STATISTICS. On a busy OLTP system this will probably happen before you have chance to update procedure plans manually. If you do have reason to do this, you will need to do it manually and if you do don’t set PDQPRIORITY.
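If you do choose to refresh procedure plans manually, the statement is straightforward (the procedure name here is hypothetical):

```sql
-- Re-optimizes the plan stored in sysprocplan; do not set PDQPRIORITY first
UPDATE STATISTICS FOR PROCEDURE my_procedure;

-- Or for all routines in the current database:
UPDATE STATISTICS FOR PROCEDURE;
```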

If your system does have dead time there may be a small advantage to running this but I don’t think it really matters that much.

Method for running UPDATE STATISTICS [LOW]

Let’s now look at how AUS actually calls UPDATE STATISTICS. As discussed earlier the evaluation task creates a list of commands to run and these are run by the refresh task exactly as you see them when you query the aus_command table.

Let’s start by considering UPDATE STATISTICS [LOW].

AUS simply calls UPDATE STATISTICS [LOW] as one command without specifying any column names, and I have seen it suggested that this is a bad thing (I don’t agree). The popular alternative, dostats, runs UPDATE STATISTICS LOW separately for each index key. Performance-wise there is not as much difference as you might expect; I suspect this is because data tend to be in the buffer cache on repeated calls. But is there any difference to the end result?

Using Informix 11.70.FC7W1 I performed the following test:

Created two identical empty tables with twelve indices and loaded both with the same data.

Ran AUS on one, dostats on the other.

Unloaded and compared systables, syscolumns and sysindices for both tables.

Apart from tabid, tabname, index names and build times, the unloads were identical. My conclusion is that there is no difference, and AUS is slightly faster because it does the work with one command in a single pass.

Performance of UPDATE STATISTICS [LOW]

In terms of performance there is actually very little you can do to influence the speed of UPDATE STATISTICS LOW. The parameters DBUPSPACE, PSORT_NPROCS and even PDQPRIORITY have no effect.

Setting USTLOW_SAMPLE in the onconfig file is the only performance optimisation available apart from general server tuning.
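Enabling it might look like this; treat the dynamic form as a sketch and check the onmode syntax for your version:

```shell
# In the onconfig file:
#   USTLOW_SAMPLE 1

# Or write it to the onconfig dynamically:
onmode -wf USTLOW_SAMPLE=1
```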

It is supposed to be possible to get parallel scan threads for all fragmented indices when PDQPRIORITY is set to at least 10, but I can’t reproduce this. It only works when USTLOW_SAMPLE is switched off.

Performance of UPDATE STATISTICS MEDIUM or HIGH

For UPDATE STATISTICS HIGH and MEDIUM there are a few parameters that can influence performance and with the exception of PDQPRIORITY you can’t use any of them with AUS. In summary they are:

how much memory and disk are available for sorting when PDQPRIORITY is not set.

whether to use indices for sorting.

whether to print the explain plan.

DBSPACETEMP: overrides default onconfig value.

PSORT_DBTEMP: allows you to specify a filesystem instead of DBSPACETEMP.

PSORT_NPROCS: specifies number of threads for sorting.

Having done tests with this parameter:

The disk and memory parameters don’t have any effect if you set PDQPRIORITY. If you do set it, up to DS_TOTAL_MEMORY multiplied by the effective PDQPRIORITY (as a percentage) can be allocated.

Even if set, they only have the effect of restricting the number of columns that can be processed in a single pass so a lot of the time they make no difference. For a better understanding of how this works see John Miller III’s article on Understanding and Tuning Update Statistics.

Setting the explain output via the directive setting works a bit weirdly: you need to run SET EXPLAIN ON separately to enable the explain plan even if it is set here, but disabling it with this parameter does work.

The default is to use indices for sorting. Switching this off (directive 1 or 2) was up to 5x slower on my test system.
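My understanding of the DBUPSPACE format, as used in the tests below, is a colon-separated disk:memory:directives triple; the field units and the directive values are worth checking against the documentation for your version:

```shell
# <disk limit>:<sort memory in MB>:<directives>
# 0  = no disk constraint
# 50 = 50 MB of sort memory (the tests below compare 15 and 50)
# final 0 = use indices for sorting, don't force explain output
export DBUPSPACE=0:50:0
```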

Here are some performance tests showing the effects of different DBUPSPACE settings on UPDATE STATISTICS HIGH. My performance tests are all carried out on a powerful 32-core Linux x86_64 server with 256 GB RAM and fast SAN storage (not my laptop). For my tests I did a dummy run first to try to even out any effects of caching in the SAN. I then did a run immediately after initialising my instance (cold) and a repeat run (warm) for each result.

There is no significant difference between using 15 Mb of memory for sorts and 50 Mb of memory for sorts. I suspect this is because the number of scans is the same.

Using indices for sorting (the default) is faster than not using indices for sorting when not using PDQ.

The use of light scans (avoiding the buffer cache) reduces the variation between the cold and warm results.

Despite having a partitioned table and PDQPRIORITY set, the interesting thing here is that during the execution I observed no parallelism. You can see this for yourself by identifying the session running UPDATE STATISTICS and looking at the threads the session is using. I get something like this for my partitioned table:

If you do see sort threads, it is either because PSORT_NPROCS is set in your environment or in the database engine’s environment when it was started.

Let’s now look at the effect of PSORT_NPROCS. The only way to use this with AUS is to set PSORT_NPROCS when you start the database engine, which will of course affect all sessions.

Setting PSORT_NPROCS is the only way to get any parallel processing with UPDATE STATISTICS HIGH or MEDIUM. It has no effect on UPDATE STATISTICS LOW. Setting PDQPRIORITY only provides more memory and allows HIGH and MEDIUM mode distributions on multiple columns to be built in a single pass, if enough memory is available. There will be one SORT memory pool per column being processed regardless of PDQPRIORITY. Sorting with PSORT_NPROCS set can be faster as we’ll see now.
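Because AUS commands run inside the engine, the only way to make them use parallel sorts is to have PSORT_NPROCS in the engine's environment at startup; a sketch:

```shell
# In the environment of the user that starts the database engine
export PSORT_NPROCS=4   # number of sort threads per column being processed

# Start the engine; the setting now applies to all sessions, including AUS
oninit
```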

It is vital that the task name begins with Auto Update Statistics Refresh as shown here, otherwise some of the internal code that stops the evaluator from running at the same time as the refresh tasks will not work.

Think of it as a batch process where the goal is to update all your statistics and distributions, not doing an individual table as fast as possible.

I recommend this method if you need extra grunt! Then run without PDQ (AUS_PDQ = 0).

Pros and cons of using PSORT_DBTEMP

Let’s move on to consider another environment variable you can’t easily use: PSORT_DBTEMP. Is using a filesystem for sorts faster than temporary dbspaces? For this I am not going to do any performance tests, largely because a comparison between the local SATA disks used for filesystems and the battery-backed, part solid-state SAN used for the database storage on my server is not a fair fight.

If you want to use PSORT_DBTEMP with AUS, again you will need to set it in your environment when initialising the server and use it across your system.

The only definitive benefit of pointing to a filesystem using PSORT_DBTEMP instead of using DBSPACETEMP is that the Linux filesystem cache has an effect. This means that your temporary files may never be committed to disk, giving you a performance advantage. Another interesting alternative is to use a RAM disk.
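A RAM disk for this purpose might be set up along these lines (the mount point is hypothetical, and it needs sizing carefully because sort files can be large):

```shell
# Create and mount a tmpfs RAM disk for sort files (as root)
mkdir -p /ramdisk
mount -t tmpfs -o size=4g tmpfs /ramdisk

# Then start the engine with PSORT_DBTEMP in its environment
export PSORT_DBTEMP=/ramdisk
oninit
```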

Otherwise when considering whether to use a filesystem over temporary dbspaces, I would consider your hardware.

Recent UPDATE STATISTICS defects

Below are some defects relating to UPDATE STATISTICS that I have personally logged with IBM. All are general UPDATE STATISTICS issues and whether you use AUS or not does not affect your chances of hitting these. I have written some comments underneath.

IT06767 UPDATE STATISTICS HIGH ON TABLE BIGGER THAN 1.1 BIO ROWS CAN CAUSE EXCEPTION IN SRTQUICK

This defect can only be hit if you use PDQ and a large amount of memory is allocated.

IT06726 Assert Warning Invalid index statistics found when using statistics sampling with index having many deleted items

Requires USTLOW_SAMPLE to be switched on. It is fairly harmless but does write some garbage to sysfragments. Re-running UPDATE STATISTICS LOW usually fixes it.

IT02679 UPDATE STATISTICS HIGH ON A FRAGMENTED TABLE LEADS TO ERROR MESSAGE 9810

This can only be seen if using fragment level statistics.

IT05463 Fragment based update stats high consumes a huge amount of sblobspace

This can only be seen if using fragment level statistics. It is not worth trying to make your sbspace extremely large to work around this problem.

IT05639 is another fragment level statistics defect. With the fixes for IT02679, IT05463 and IT05639 this feature is solid. I wouldn’t enable it for a table unless you have all of these in your version.

Most of these defects are fixed in 12.10.FC5.

UPDATE STATISTICS performance summary

Overall then what seems to be the best balance between speed and use of resources on your system is:

In terms of AUS, leave AUS_PDQ at 0.

If you need more throughput add more Refresh tasks to the scheduler.

That’s it.

Set DBUPSPACE=0:50:0 in the server environment.

I hope this has been useful. I’ll end with a quick summary of some of the more interesting performance points from this presentation which apply regardless of whether you use AUS or not:

Setting PDQPRIORITY only provides more memory for UPDATE STATISTICS HIGH or MEDIUM and does not provide parallel processing. It may even make it run slower.

If you want multi-threaded processing, this only works by setting PSORT_NPROCS and then only with UPDATE STATISTICS HIGH or MEDIUM. Because it does not require PDQ you can use this with workgroup edition.

In my tests, only when used together do PDQPRIORITY and PSORT_NPROCS improve performance. PDQ does switch on light scans though, which avoid churning your buffer cache.

Not using indices for sorting for UPDATE STATISTICS HIGH or MEDIUM can be significantly slower.

The performance of UPDATE STATISTICS [LOW] can be improved by setting USTLOW_SAMPLE.

Again, this is another blog post about nothing new at all, but an attempt to put down my understanding of temporary tables in a way that will help me when I refer back to it and hopefully be of wider use.

When looking at things like this a test system is always essential for checking things and finding some surprising results. If you don’t have one you can soon set one up with VMWare or Virtual Box and a copy of Informix Developer Edition. For this post I am going to set up a simple test instance with four specific dbspaces:

A logged rootdbs with plenty of free space, called rootdbs.

A logged dbspace for my data and indices, called datadbs.

A temporary dbspace without logging, called tmp2k_unlogged.

A temporary dbspace with logging, called tmp2k_logged.

What’s the difference between a temporary dbspace with logging and a normal dbspace? Nothing except that one appears in the DBSPACETEMP onconfig setting.

Consider the SQL statements which explicitly create temporary tables in a session:

For these tests we must have TEMPTAB_NOLOG set to 0 in our onconfig, otherwise the second statement will silently have a WITH NO LOG clause added to it.
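A minimal pair of such statements might look like this (table names hypothetical):

```sql
-- First statement: explicitly unlogged
CREATE TEMP TABLE mytab_unlogged (
    col1 INTEGER
) WITH NO LOG;

-- Second statement: logged, but only while TEMPTAB_NOLOG is 0
CREATE TEMP TABLE mytab_logged (
    col1 INTEGER
);
```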

Let’s run these, use oncheck -pe to see which dbspace they get placed in and then use onlog -n <starting log unique identifier> to see if changes to these tables get logged or not:

| DBSPACETEMP setting | Unlogged table: dbspace used | Unlogged table: logged operations | Logged table: dbspace used | Logged table: logged operations |
|---|---|---|---|---|
| tmp2k_unlogged:tmp2k_logged | tmp2k_unlogged | no | tmp2k_logged | yes |
| tmp2k_unlogged | tmp2k_unlogged | no | datadbs | yes |
| tmp2k_logged | tmp2k_logged | no | tmp2k_logged | yes |
| NONE | datadbs | no | datadbs | yes |

So already a few interesting results drop out:

We can specify both logged and unlogged dbspaces using the DBSPACETEMP parameter.

The engine will prefer logged dbspaces for logged tables and unlogged dbspaces for unlogged tables.

If an unlogged dbspace is not available the engine can use a logged dbspace and create unlogged tables in it.

If a logged dbspace is not available the engine will use an alternative standard dbspace, because an unlogged dbspace does not support logging at all. In this case it has chosen datadbs, because this is the dbspace in which I created my database.

At this point it’s worth referring to an IBM technote on this subject. This suggests some more tests but already my results are not in agreement with the first example given:

If we have created a dbspace named tmpdbs, but we could not see it was marked as ‘T’ in the result of onstat -d. We set DBSPACETEMP configuration parameter to tmpdbs. On this condition, tmpdbs will be used for logged temporary tables. That means if a temp table is created with ‘WITH NO LOG’ option, the server will not use it.

This implies that my tmp2k_logged dbspace (it will not have the ‘T’ flag) cannot be used for unlogged temporary tables. You can see from my table that this isn’t true, and I invite you to test it for yourself.

I repeated the tests using SELECT … INTO TEMP statements:

select * from sysmaster:systables into temp mytab_unlogged with no log;

and:

select * from sysmaster:systables into temp mytab_logged;

And the results:

| DBSPACETEMP setting | Unlogged table: dbspace used | Unlogged table: logged operations | Logged table: dbspace used | Logged table: logged operations |
|---|---|---|---|---|
| tmp2k_unlogged:tmp2k_logged | tmp2k_unlogged | no | tmp2k_logged | yes |
| tmp2k_unlogged | tmp2k_unlogged | no | datadbs | yes |
| tmp2k_logged | tmp2k_logged | no | tmp2k_logged | yes |
| NONE | datadbs | no | datadbs | yes |

Again my results are in disagreement with the IBM technote which says:

If DBSPACETEMP is not specified, the temporary table is placed in either the root dbspace or the dbspace where the database was created. SELECT…INTO TEMP statements place the temporary table in the root dbspace. CREATE TEMP TABLE statements place the temporary table in the dbspace where the database was created.

In both cases in my tests the engine chose datadbs and not my root dbspace.

Let’s move on a bit. How do I know what temporary tables are in use on my system as a whole?

Another way is to run oncheck -pe and have a look at what is in your temporary dbspaces. Here you may also see space used by the engine for sorting, marked with SORTTEMP, or temporary tables created implicitly by the engine for query processing. However, whatever type of object it is, you will find it impossible to match anything you see to a particular session by this method; it is only possible to match to a user name, which would allow positive identification only if there were a single session per user.

There is another way to match tables to sessions which works for explicitly created temporary tables, for which I don’t want to claim any credit because I cribbed it from the IIUG Software Repository. The script is called find_tmp_tbls and in its present state it is broken when used with 11.70 (and hasn’t been tested since version 9.30.FC2 according to its README): at least it does not work on 64-bit Linux, mainly because the syntax for onstat -g dmp seems to have changed slightly. I managed to fix it up, however.

It’s a little complicated to follow but the basic steps are this:

You need to start with a given session and check if it has any temporary tables. (Unfortunately I don’t know a way of working backwards from the temporary table to see which session it belongs to either through onstat or the SMI interface.)

Get the session’s rstcb value, either from onstat -g ses <sid> or from the first column in onstat -u.

Run onstat -g dmp 0x<rstcb> rstcb_t | grep scb. Note that the rstcb value must be prefixed by 0x. This should return an scb value in hex.

Take this value and run onstat -g dmp <scb> scb_t | grep sqscb. Your address must start with 0x again, and this is true throughout all the examples. This will return two values; take the one labelled just sqscb.

Feed this value into another dmp command: onstat -g dmp <sqscb> sqscb_t | grep dicttab. This will return another value.

Finally take this and get the partnum(s) of the temporary tables for the session by running: onstat -g dmp <dicttab> "ddtab_t,LL(ddt_next)" | grep partnum.
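Put together, the chain looks something like this; the addresses are purely illustrative and come from the previous step each time:

```shell
# 1. rstcb for the session, from onstat -u or onstat -g ses <sid>
onstat -g dmp 0x700000010abc120 rstcb_t | grep scb

# 2. scb -> sqscb (take the value labelled plain 'sqscb')
onstat -g dmp 0x700000010def450 scb_t | grep sqscb

# 3. sqscb -> dicttab
onstat -g dmp 0x700000010fed780 sqscb_t | grep dicttab

# 4. dicttab -> partnum(s) of the session's explicit temp tables
onstat -g dmp 0x700000010cba980 "ddtab_t,LL(ddt_next)" | grep partnum
```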

For reference there is information about explicit and implicit temporary tables and temporary sort space in these tables in the sysmaster database:

sysptnhdr

sysptnext

sysptnbit

sysptntab

sysptprof

systabextents

systabinfo

systabnames

systabpagtypes
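Once you have a partnum from the method above, the sysmaster database can resolve it to a name; a sketch, where the partnum value (converted from hex to decimal) is hypothetical:

```sql
-- Resolve a partnum to a database, owner and table name
SELECT dbsname, owner, tabname
    FROM sysmaster:systabnames
    WHERE partnum = 4194306;
```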

So in conclusion I hope this post brings together some useful information about explicit temporary tables. Personally I’d like to be able to get a complete picture of which sessions and statements are using temporary space, which this doesn’t give. If I find anything it will be the subject of a future blog post.

The Agile development model IBM has moved to for Informix development means that there are always a few extra features in each fix pack. There are two rather nice enhancements in 11.70.FC8 when building a foreign key which alleviate some pain points when doing administration on large tables. The first is a no validate option, which means that no checking of the data is done when creating the foreign key. This is appropriate in situations like migrating platforms, when you know that referential integrity is already assured and you want to avoid the overhead of re-checking it all again on the new system. I want to look instead at the other enhancement: improved performance when creating a foreign key.

Nearly all of the time creating a foreign key constraint is spent checking the data; modifying the database schema to add the constraint is a very quick operation.

Prior to 11.70.FC8 creating a foreign key on a large table was a long process and with non-partitioned tables it was a single-threaded operation which did not benefit from the use of indices. Typically you’d see a thread working away like this:

Even with parallel operations foreign key builds could still take a long time.

With 11.70.FC8 foreign key builds are a lot faster and here is a test case which you can try. I’m going to create a 10 million row table, which is large enough to demonstrate the problem, and a referenced table with up to 32767 unique values and see how long it takes to create a foreign key.

Courtesy of the Informix FAQ I am going to create these procedures to generate random numbers.
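The shape of the test case is roughly this; the random-number procedures are not reproduced here, and the exact placement of the NOVALIDATE keyword is worth checking against the 11.70.FC8 documentation:

```sql
-- Referenced table: up to 32767 unique values
CREATE TABLE referenced (
    widget_id SMALLINT NOT NULL PRIMARY KEY
);

-- Referencing table: 10 million rows
CREATE TABLE referencing (
    line_id   BIGSERIAL NOT NULL,
    widget_id SMALLINT NOT NULL
);
-- ... populate both tables with random data here ...

-- The operation being timed: checking the data dominates the cost
ALTER TABLE referencing ADD CONSTRAINT (
    FOREIGN KEY (widget_id) REFERENCES referenced (widget_id)
    CONSTRAINT fk_widget
);
```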

And for comparison here is the time taken for some related SQL queries (without PDQ):

| Query | Duration (s) |
|---|---|
| select distinct widget_id from referencing; | 9 |
| select widget_id, count(*) from referencing group by widget_id; | 7 |
| select widget_id from referencing where widget_id not in (select widget_id from referenced); | 37 |

The purpose of this table is to show the relative times of the foreign key build and not to benchmark any hardware or Informix itself, so I’m not going to divulge what hardware I used for the tests. The server I used did have multiple processors and CPU VPs configured so that PDQ queries could scale out. The number of ready threads was very low or zero during the operation.

I ran all the tests a few times and there was a small amount of variation so I’ve only used a low precision in the results.

So what conclusions can we draw from this?

The stand-out result is that the verification of foreign keys is much faster in FC8 but only for serial operations.

Serial foreign key builds in FC8 are significantly faster than any SQL query I could write that checked the referential integrity.

In FC7 and earlier parallel operations may be slower.

In FC8 there is not the same speed improvement for parallel operations and they are still much slower.

Another interesting finding is the large difference in performance between the expression-based fragmentation and round-robin fragmentation when using parallel operations. I had assumed that this kind of operation would be agnostic to the type of fragmentation used. I checked this a few times but examining the threads when it is running shows something different is definitely happening.

Round-robin fragmentation looks like the below. Notice that most of the threads are active:

There is an extra scan thread (scan_1.8) because I have an (empty) remainder fragment in addition to my 8-way fragmentation.

This would go some way to explaining why round-robin is faster.

Finally, if I could wander into the realms of speculation for a moment, it does look like IBM has implemented the parallel foreign key constraint checking slightly differently for each fragmentation method, which might explain why the improved performance is only seen with serial operations.

To sum up, this is a wonderful enhancement, but I’ll be careful to switch off parallel processing when building foreign keys. PDQ greatly speeds up index builds on fragmented tables, which is the type of operation you might do at the same time, so I can foresee scripts enabling and disabling parallel processing several times to ensure the best speed.