
I need to query the latest entry from a table with more than 130 million rows using PostgreSQL 9.5. There are two columns of interest: row_id (integer, the primary key) and row_time (timestamp, not null, with a btree index).
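The two queries look roughly like this (big_table stands in for the real table name):

```sql
-- Fast: satisfied by a backward scan of the btree index on row_time
SELECT *
FROM big_table
ORDER BY row_time DESC
LIMIT 1;

-- Slow: the second sort key prevents the planner from using that index
SELECT *
FROM big_table
ORDER BY row_time DESC, row_id DESC
LIMIT 1;
```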

I don't understand why this is happening. It would be easy to retrieve the rows with the maximum row_time using the index (typically fewer than 10 rows) and then sort just those. Instead, the full table is scanned and sorted, which takes minutes rather than the sub-millisecond time of the first query.

Creating a multicolumn index on (row_time, row_id) makes PostgreSQL choose an index scan again, but it increases the index size considerably. Getting MAX(row_time) in a subquery, filtering on it, and sorting the result by row_id is also fast, but it seems overly complicated for such a simple goal. Am I missing something here?

1 Answer

If you create an index on (row_time DESC, row_id DESC), the second query will act just like the first one.
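For example (the index name is just a placeholder):

```sql
CREATE INDEX big_table_row_time_row_id_idx
    ON big_table (row_time DESC, row_id DESC);
```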

LIMIT is applied only after sorting is taken care of, so when the requested order cannot be produced by an index, PostgreSQL sorts the entire result set before passing it through LIMIT.
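You can see the difference in the plans (assuming only the single-column index on row_time exists at first):

```sql
-- Without a matching index: expect a Sort node over a Seq Scan of all
-- ~130M rows, with Limit applied on top of the finished sort.
EXPLAIN
SELECT *
FROM big_table
ORDER BY row_time DESC, row_id DESC
LIMIT 1;

-- With the (row_time DESC, row_id DESC) index in place, the same
-- statement should show just an Index Scan under the Limit node.
```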

If I understand what you want to achieve: apply LIMIT over an indexed order, include all rows that tie on the sorted field(s) in the limited result, and then sort that small set by any other fields and apply LIMIT again. This is the way to go in this case:
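A sketch of that pattern with your columns (big_table again being a placeholder):

```sql
SELECT *
FROM big_table
WHERE row_time = (
    SELECT row_time
    FROM big_table
    ORDER BY row_time DESC
    LIMIT 1               -- cheap: one backward step through the row_time index
)
ORDER BY row_id DESC      -- sorts only the few rows sharing the latest timestamp
LIMIT 1;
```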