I ran into a simple problem the other day: I got an error while creating an index because the key was too big for the index. As you may remember, the maximum size of an index key on a standard Unix/Linux system (with its 2KB page size) is 387 bytes.

Why do we have this limit?

This is a function of the page size and the way a B-tree index works. With the limit of 387 bytes on a 2KB page, we can have at least 5 keys per page. This way, we divide the data into at least 5 parts at each level. The end result is that we eliminate comparisons and get to our result faster. If we had only one key per page, it would be the equivalent of doing a sequential scan, so the index would be useless.

In IDS version 10.0 (2005), Informix introduced the configurable page size. From that point on, it has been possible to create dbspaces with page sizes of up to 16KB. The page size has to be a multiple of the basic page size: 2KB or 4KB.

These larger pages can provide better performance when you have a wide table where the row size could be, let's say, 12KB. This way, you can fit an entire row in a page instead of using page chaining to support these larger rows. The savings in I/O could make a noticeable difference in performance in many situations.

Coming back to my indexing problem, I can fix it by using a larger page size. According to the documentation, the maximum index key size for each page size is as follows:

Page size        Max key size (bytes)
2048 (2KB)       387
4096 (4KB)       796
8192 (8KB)       1615
12288 (12KB)     2435
16384 (16KB)     3245
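
As a quick sanity check on the "at least 5 keys per page" reasoning, here is a minimal Python sketch that divides each page size by its maximum key size, using the values listed above. The function name is mine, and the calculation ignores page headers and other per-entry overhead, so treat it as an approximation rather than the way the server computes the limit.

```python
# Maximum index key size (bytes) for each page size (bytes), copied from the list above.
MAX_KEY_SIZE = {2048: 387, 4096: 796, 8192: 1615, 12288: 2435, 16384: 3245}


def min_keys_per_page(page_size, key_size):
    """Whole keys of key_size bytes that fit in page_size bytes (overhead ignored)."""
    return page_size // key_size


for page_size, max_key in MAX_KEY_SIZE.items():
    keys = min_keys_per_page(page_size, max_key)
    print(f"{page_size:>6} byte page, {max_key:>4} byte key: {keys} keys per page")
```

Under these assumptions, every page size still holds five maximum-size keys, which matches the idea that the limit scales with the page size.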

If your key fits in a 2KB page (387 bytes or less), you could still use a larger page size for your index. The difference is that more keys fit in one page, so the index is not as deep, which could provide additional performance.
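
To get a feel for how much shallower the index could be, here is a rough Python sketch. The 100-byte key and the 10 million keys are made-up numbers for illustration, and the model is deliberately simplified: it ignores row pointers, page headers, and partially filled pages, so real indexes will hold fewer keys per page than this.

```python
def estimated_fanout(page_size, key_size):
    """Approximate number of keys per index page, ignoring all overhead."""
    return page_size // key_size


def levels_needed(total_keys, fanout):
    """Smallest number of B-tree levels whose fan-out can address total_keys."""
    if fanout < 2:
        raise ValueError("a fan-out below 2 never narrows the search")
    levels, reachable = 1, fanout
    while reachable < total_keys:
        reachable *= fanout
        levels += 1
    return levels


KEY_SIZE = 100           # hypothetical key size in bytes
TOTAL_KEYS = 10_000_000  # hypothetical number of index entries

for page_size in (2048, 4096, 8192, 12288, 16384):
    fanout = estimated_fanout(page_size, KEY_SIZE)
    depth = levels_needed(TOTAL_KEYS, fanout)
    print(f"{page_size:>6} byte page: about {fanout} keys per page, {depth} levels")
```

With these assumed numbers, the 2KB page needs 6 levels while the 16KB page needs 4, so each lookup touches fewer pages.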

Why not simply use the 16KB page size everywhere?

The short answer is that you could waste space on the page used for a table. A page can include a maximum of 255 rows. If your page size is 16KB and your row contains only two integers (2 x 4 bytes), you could, in theory, have over 2000 rows in that page. Since we are limited to 255 rows, we are wasting over 14,000 bytes.
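
The arithmetic behind that number can be sketched in a few lines of Python. Like the paragraph above, this ignores the page header and slot overhead; the function name is only for illustration.

```python
MAX_ROWS_PER_PAGE = 255  # row limit per page, as stated above


def page_usage(page_size, row_size):
    """Return (rows that would fit, rows actually stored, bytes left unused)."""
    rows_that_fit = page_size // row_size
    rows_stored = min(rows_that_fit, MAX_ROWS_PER_PAGE)
    unused = page_size - rows_stored * row_size
    return rows_that_fit, rows_stored, unused


# A row of two integers (2 x 4 bytes) on a 16KB page and on a 2KB page.
print(page_usage(16384, 8))  # (2048, 255, 14344)
print(page_usage(2048, 8))   # (256, 255, 8)
```

The same tiny row wastes almost nothing on a 2KB page, which is why the page size should be matched to the row size.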

Why not use four or five different page sizes?

Each page size requires its own buffer pool. We have to decide how much memory to allocate for each of these pools. Our decision may not result in the optimal memory allocation. The result is that some pools will have too much memory and others would benefit from more. Bottom line, this would make system administration more complex.

I would suggest limiting ourselves to two page sizes: the default page size and one other. The second page size depends on the requirements of the environment. I would also look at the size of the I/O on the particular machine and at how many requests perform multiple I/Os on sequential data.

If you haven't looked at the configurable page size in IDS, maybe it is a good time to do so now.

I saw the cover of Computerworld the other day with a title of "Swinging toward centralization". I'm not one to jump on trends, but I think this idea has merit. To me, it ties into virtualization, possibly cloud computing, and also the IBM concept of the smarter planet.

Centralized IT could mean, first, the optimization of hardware resources. The best approach is to use virtualization so all the hardware resources can be used optimally. For example, instead of having, let's say, 100 computers running at 50%-70% utilization, you can centralize, use virtualization, and either reduce the number of computers to around 70 or use the extra capacity for growth. This is a pretty conservative example. Just consider this quote from Computerworld, April 20, 2009:

"Austin Energy: With a new virtual environment, applications run on 150 servers instead of 600"

Centralization gives you this opportunity. Note that I'm talking about centralizing the hardware resources. If you centralize processing for one large application, you'll likely need the help of advanced features such as the IDS Continuous Availability Feature (CAF) and the integrated replication capabilities (HDR and ER).

Centralization does not mean that the personnel must also be centralized. Today, network access is pretty much a fact of life (I so wanted to use the word ubiquitous!). All the application and system management can be done from anywhere. For IDS, just consider the OpenAdmin Tool for IDS (OAT) or management tools from our partners such as AGS and CobraSonic. Managers can consider these resources as part of a "cloud".

What a nice segue to my next point

We hear a lot about cloud computing. You can buy time on some machines in the cloud. We could also mention software as a service like in the case of LotusLive (see https://www.lotuslive.com/en/) or the IBM cloud offering. This does not mean that you have to go outside to have a cloud. You could create a cloud from your centralized data center and provide capacity on-demand based on resource optimization.

When we talk about a large centralized data center, server consolidation is only part of the savings. The savings in energy can be significant. The other day, I listened to a presentation by an IBMer who manages a large data center providing services worldwide. Here is the type of thing he did:

His team installed active RFID sensors to monitor the temperature and humidity levels in different areas of his data center, including multiple locations in the racks, and at different times. With this information, he was able to clearly identify machine needs. At one point, he was able to identify that if he installed a (raised) floor tile with holes at a specific location, he could eliminate his "hot spot" without increasing his air conditioning needs. He even figured out the correlation between applications and machine heat output, so he can regulate the room temperature based on which application is running!

Talk about a great example of a smarter planet: instrumented, interconnected, intelligent (devices).

Since I've been on a common driver kick lately, might as well keep on going...

There was a chat with the lab on Feb 25th that talked about the common Java JDBC driver (referred to as the JCC driver): Top 10 reasons to consider IBM Data Server Driver for JDBC and SQLJ for IDS servers.

You can use the JCC driver with IDS when connecting using the DRDA protocol. Some of the benefits include:

Better integration with WebSphere

Ability to use the capabilities of PureQuery

Better tracing and debugging

Full IDS clustering support

Superior performance over the Informix JDBC driver

All this is significant:

PureQuery can increase the performance of SQL statements by analyzing their usage and making changes that are transparent to the application. For example, it can detect the use of the same statement with different literals and convert it, under the covers, into a prepared statement.
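
To make the literal-to-prepared-statement idea concrete, here is a toy Python sketch. It is not how PureQuery is implemented and it is not a real SQL parser: it is a simple regular expression that swaps string and numeric literals for `?` markers, and the `parameterize` function and the sample `customer` table are my own inventions for illustration.

```python
import re

# Matches a single-quoted string literal (with '' as an escaped quote)
# or a standalone integer/decimal number.
LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql):
    """Replace literals in sql with '?' and return (template, parameter list)."""
    params = []

    def replace(match):
        text = match.group(0)
        if text.startswith("'"):
            params.append(text[1:-1].replace("''", "'"))
        elif "." in text:
            params.append(float(text))
        else:
            params.append(int(text))
        return "?"

    return LITERAL.sub(replace, sql), params


first = parameterize("SELECT name FROM customer WHERE id = 101")
second = parameterize("SELECT name FROM customer WHERE id = 202")
print(first)
print(second)
print("same statement:", first[0] == second[0])
```

Both statements reduce to the same template, so a driver working along these lines could prepare the statement once and only change the parameter values on later executions.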

Full IDS clustering support includes working with the connection manager to automatically and transparently connect to an alternate server when the primary fails.

Superior performance: It provides a 5% to 10% performance boost over the Informix JDBC driver.

If you are using Java, maybe it is time to start looking into the JCC driver. You can download it from the IBM site at (10MB):