Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

@Catcall says this on the subject of storage order (clustered index) and the output order

A lot of people believe that a clustered index guarantees a sort order on output. But that's not what it does; it guarantees a storage order on disk.
See, for example, this blog post.

I've read the blog post by Hugo Kornelis and understands that an index doesn't guarantee that the sql server reads the records in a specific order. Yet I have a hard time accepting that I can't assume this for my scenario?

As you can see, my table rows are small (16bytes) and I've got only one index, a clustered. In my scenario, the table consists of 100.000.000 records at this moment (and this will most likely increase tenfold).

When the database server queries this table it has two ways of finding my rows, either it seeks the primary key and thereby reading and returning my values in desc. order of Date, or it has to do a full table scan. My conclusion is that a full table scan on all those records will be way too slow and the database server will therefore always seek the table via its primary key and thereby returning the values sorted by Date DESC

Why do you want to be able to rely on this assumption so badly? Why don't you just put an ORDER BY on there, then you know you can rely on it.
–
Aaron Bertrand♦Jun 13 '12 at 17:25

For 2 reasons, curiosity and because the ORDER BY clause is a large performance hit for me (read the other question for more info). I have a solution that works for now, but it won't hold when and if my traffic increases.
–
m__Jun 13 '12 at 17:28

ORDER BY shouldn't be a performance hit if you're relying on the order that you're seeing without the order by - that doesn't make sense to me.
–
Aaron Bertrand♦Jun 13 '12 at 17:30

1

The only thing that guarantees result set order is an ORDER BY clause in your query. This is true for SQL Server, Oracle, MySQL, and any other RDBMS you can think of. Try anything else and you are setting yourself up for a surprise cup of FAIL.
–
Nick ChammasJun 13 '12 at 23:13

1 Answer
1

Let me try to explain why you should not do that, why you should never assume that an SQL-product will return a result set in a specific order, unless you specify so, whatever indices - clustered or non-clustered, B-trees or R-Trees or k-d-trees or fractal-trees or whatever other exotic indices a DBMS is using.

Your original query tells to the DBMS to search the SensorValues table, find rows that match the 3 conditions, order those rows by Date descending, keep only the first row from those and - finally - select and return only the SensorValue column.

SELECT TOP 1 SensorValue
FROM SensorValues
WHERE SensorId = 53
AND DeviceId = 3819
AND Date < 1339225010
ORDER BY Date DESC ;

These are very specific orders you have given to the DBMS and the result will most probably be the same every time you run the query (there is a chance it might not, if you have more than one row that match the conditions and have the same max Date but different SensorValue but lets assume for the rest of the conversation that no such rows exist in your table).

Does the DBMS have to do this, to run this query, the exact way I describe it above? No, of course not and you know that. It may not read the table but read from an index. Or it may use two indexes if it thinks it's better (faster). Or three. Or it may use a cached result.

What is guaranteed though is that it will return the exact same result, every time you run it - as long as no rows are inserted, deleted or updated.

This query tells to the DBMS to search the SensorValues table, find rows that match the 3 conditions, order those rows by Date descending,, don't care about the order, keep only one row and - finally - select and return only the SensorValue column.

So, it basically tells the same as the first one, except that it tells that you want one result only that matches the conditions and you don't care which one.

Now, can we assume that it will give always the same result because of the clustered index?- If it does use this clustered index every time, yes.

But will it use it?- No.

Why not?- Beacuse it can. The query optimizer is free to choose a path of execution every time it runs a statement. Whatever path it sees fit at that time for that statement.

But isn't using the clustered index the best/fastest way to get results?- No, not always. It most probably is the first time you run the query. The second time, it may use the cached result. The 1000th time the result may have been removed from the cache and another result may exist there. Say, you had executed this query just before:

And the cached result (from the above query) is another, different one that still matches your conditions but is not the first in your (wanted) ordering. And you have told the DBMS not to care about the order.

OK, so only cache can affect this?- No, probably many other things, too. Like other indexes that could be considered (at that time) by the DBMS as better for this query. Like if some future developer changes or completely removes this clustered index you have. Like if you or some other developer adds another clustered index (SQL-Server does not have such feature now but it may have in the future or you move to another DBMS). Like if you updated and the new optimizer has a minor bug.

I hope you are convinced that you should never rely that an SQL query will return results in a specific order, unless you specify so. Amd never use TOP without ORDER BY, unless of course you just want one result and you don't care which one is returned.

Couple of things: (1) No, we can't assume any particular result because of the clustered index. That's true in general, but even more so here because Date is the partitioning key. (2) SQL Server uses cached execution plans, but never cached results.
–
Paul WhiteJun 14 '12 at 13:42

@SQLKiwi: thnx. I was trying to give a more general answer, not only about SQL-Server. I'll update my answer with this info.
–
ypercubeJun 14 '12 at 15:16

SQL Server Enterprise Edition does have an Advanced Scanning feature which is kind of similar in that you may get different results because of concurrent queries. Not sure exactly when this kicks in though.
–
Martin SmithJun 14 '12 at 16:32

Another thing that potentially "randomizes" resultset order (even if the query is apparently driven by an ordered index) is parallelism. I saw an app that had been happily running broken SQL start behaving badly after enabling automatic parallelism (not SQL Server, but I guess that might apply there too).
–
MatJun 30 '14 at 14:23