Friday, October 07, 2011

Index-Only Scans: We've Got 'Em

Tom Lane committed a patch for index-only scans by myself and Ibrar Ahmed, which also incorporated some previous work by Heikki Linnakangas, after hacking on it some more himself. Woohoo!

There is, of course, more work to be done here - performance fine-tuning, cost estimation, extensions to the core functionality - but the core of the feature is now in. If you get a chance, please test it out and let us know how it works for you.

For those that may not have been followingalong at home, what we're essentially doing here is allowing any index to act as a "covering index". If all of the columns the query needs are available from the index tuple, we'll skip fetching the corresponding heap (table) page if every tuple on that page is visible to all running transactions.

Although I know we're not even really done with this feature yet, I can't help wondering what's next. Index-only scans have so often be cited as "the big performance feature that PostgreSQL is missing" that it's become something of a cliché. Now that we have them, what will take their place as the next big thing?

Robert great work. I got used to having index only scans in Oracle so having this in my toolbox on PG is a welcome addition.

Another poster does bring up a good point though, what is the incremental work to stop having to do FTS' just to get a count. I can't tell you how many (times a day) that I stub my toe on that.

I'm no expert on PG's internals but it seems logical that if you can scan the index to retrieve consistent data that, unless the index is partial, that you could use it to get a faster count(*) than scaning the table.

Although for many cases if clustering were kept updated (not necessarily in adjacent blocks) it would be as good or better for some workloads. Keeping record clustered in chunks could reduce IO by a lot, although it might not give the clean sequential scans full clustering does.

I'd be interested in contributing 1k to any kind of bounty for this feature. Email me at dave at oneit dot com dot au if anyone tries to organise this.

Nice, we've made the "think indexes" work, but I'm sure what Tom Lane put in is much better.

I think, while we are at indexes, the next thing would be index compression -- not in the sense of block compression, but simply in not repeating to store the data for the front part of the composite index key.

a b row1 1 11 2 21 3 31 3 a1 3 b1 3 c1 3 d1 4 42 1 52 2 62 3 72 4 8

would become

a b row1 1 1- 2 2- 3 3- - a- - b- - c- - d- 4 42 1 5- 2 6- 3 7- 4 8

where "-" would require no space. If you have this, there would be no need for the non-1NF arrays any more to implement things like text indexing.

While we are at it, another nice feature would be constant columns that are stored only in metadata. This works well together with partitioned table keys. Example could be the "discriminator" column in Hibernate. If you partition table by Hibernate discriminator, why store that same value over and over again?

Less redundantly stored data -> less IO. And block compression does not really address that and has too many difficulties still.