Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I need to select records inside a range of time: my table time_limits has two timestamp fields and one property integer. Indeed there are other info columns but not used for any queries, only for storing information, I won't speak about these ones.

What could I do to optimize the search ? You can see all the time is spent scanning the two timestamps columns once the id_phi is set to 0. And I don't understand the big scan (60K rows!) on the timestamps, aren't they indexed ? ( both the primary key and the idx_inversed I created)

Should I change from timestamp types to something else ?

I have read (really little) about GIST and GIN indexes, I just know they can be more efficient on certain conditions for custom types. Should I go this way to have a efficient time index ?

well it is 45s. I don't know why it says 45ms. I wouldn't even start complaining if that was as fast as 45ms... :-) Maybe a bug in the output of explain analyze. Or maybe it's the time of the analyze to perform. Dunno. But 40/50 seconds is what I measure.
–
Stephane RollandApr 9 '13 at 21:19

1

The time reported in the explain analyze output is the time the query needed on the server. If your query takes 45 seconds, then the additional time is spent transferring the data from the database to the program running the query After all it's 62682 rows and if each row is large (e.g. has long varchar or text columns), this can impact the transfer time drastically.
–
a_horse_with_no_nameApr 9 '13 at 22:23

3 Answers
3

Solution for Postgres 9.1

In most cases the sort order of an index is hardly relevant. Postgres can scan backwards practically as fast. But for range queries on multiple columns it can make a huge difference. I wrote more in this closely related answer on SO.

Sort order of the first column id_phi in the index is irrelevant. Since it's checked for equality (=), it should come first. You got that right. More in this related answer:Multicolumn index and performance
Postgres can jump to id_phi = 0 in next to no time and consider the following two columns of the matching index.

These are queried with range conditions of inverted sort order (<=, >=). In my index, qualifying rows come first. Should be the fastest possible way with a B-Tree index1:

If it qualifies, keep fetching rows until the first one doesn't (super fast).
Continue with next value for column 2 ..

Postgres can either scan forward or backward. The way you had the index, it has to read all matching on the first two columns and then filter on the third. Be sure to read the chapter Indexes and ORDER BY in the manual. It's fits your question pretty well.

How many rows match on the first two columns?
Only few with a start_date_time close to the start of the time range of the table. But almost all rows with id_phi = 0 at the chronological end of the table! So performance deteriorates with later start times.

Planner estimates

The planner estimates rows=62682 for your example query. Of those, none qualify (rows=0). You might get better estimates if you increase the statistics target for the table. For 2.000.000 rows ...

... might pay. Or even higher. More in this related answer on SO. My guess would be you don't need that for id_phi (only few distinct values, evenly distributed), but for the timestamps (lots of distinct values, unevenly distributed).
I also don't think it matters much with the improved index.

CLUSTER / pg_repack

If you want it faster, yet, you could streamline the physical order of rows in your table. If you can afford to lock your table exclusively for a few seconds (at off hours for instance) to rewrite your table and order rows according to the index:

ALTER TABLE time_limits CLUSTER ON idx_time_limits_inversed;

If concurrent use is a problem, consider pg_repack, which can do the same without exclusive lock.

The effect: fewer blocks need to be read from the table and everything is pre-sorted. It's a one-time effect deteriorating over time, if you have writes on the table.

GiST index in Postgres 9.2+

1 With pg 9.2+ there is another, possibly faster option.

A GiST index for a range column. There is also a range type for timestamp [without time zone]: tsrange. The additional integer column is normally faster with a B-Tree index, which is also smaller and cheaper to maintain. But the query would probably still be faster all over.

I should tell this at least only once, that each of your answers on SO and DBA are of really high added value/exprertise, and most of the time the most complete. Just to say it once: Respect!.
–
Stephane RollandApr 10 '13 at 2:14

I have to let finish the big bulk copy generated from the intensively awkward query of mine, so making the process really slow, it was turning for hours before I asked the question. But I have calculated, and I decided to let it turn until tomorow morning, it'll be finished, and the new table ready to be filled tomorow. I have tried to create your index concurrently during the job, but due to too much access(i think), the creation of the index should be locked. I'll repeat this same test time again tomorow with your solution. I have also looked at how upgrading to 9.2 ;-) for debian/ubuntu.
–
Stephane RollandApr 10 '13 at 3:29

the tsrange type really recovers the data I want to encode. It's only with your explication that I grasped that the database had no idea of the relative order between the two different timestamp column. I'll welcome 9.2.
–
Stephane RollandApr 10 '13 at 3:39

1

@StephaneRolland: it would still be interesting why the explain analyze output shows 45milliseconds while you see the query taking over 40 seconds.
–
a_horse_with_no_nameApr 10 '13 at 7:19

I already have both index. Except the primary key is the other, but the index you propose already exists, and is the one that is used if you look at the explain: Bitmap Index Scan on idx_time_limits_phi_start_end
–
Stephane RollandApr 9 '13 at 20:29