Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

From time to time, consumers of my database processes will ask for an estimate of when a given task will be done. While I feel like I know how to read an EXPLAIN in most database engines, I have trouble trying to translate this to "ask me again in 15 minutes". Does anyone know a good "rule of thumb" to use for any particular database?

I realize this isn't going to be a hard and fast rule, but even being able to give a ballpark figure could be useful in some instances.

2 Answers
2

Cost based optimizers all work via a variety of proprietary algorithms (or you can read them for open source databases), but they typically work by assigning a reference operation a value of 1. For example, in SQL Server an operation with a cost estimate of 1 takes 1/320th of a second on a reference computer under some developer's desk in Redmond. The costing is just a relative guess of how expensive a query will be. Many RDBMSes use this cost in establishing priority or, in the case of deadlocks, to kill off cheaper queries (they take less time to run again). But it's all just a guess based on the information that the query optimizer has at its disposal at the time the query is being run.

Peter is correct, the best you can hope for is running some benchmark queries in ideal scenarios and using those to base best guesses on. You have to deal with a lot of different points of contention in an RDBMS, so it's difficult to specifically determine how any given query will perform in the real world.

Don't forget, the computer was was used to come up with that costing in Redmond probably died 10-15 years ago.
–
mrdenny♦Jan 5 '11 at 21:54

Good explanation of how CBO numbers were based. Thanks!
–
TMLJan 7 '11 at 16:24

@mrdenny - Nope, they keep that current, they just adjust what a value of 1 is for each release. Allegedly. But since the query planner is the super secret sauce behind SQL Server, we'll never really know.
–
Jeremiah PeschkaJan 10 '11 at 12:02

1

see, the last time I asked I was told that the values didn't mean anything any more as they hadn't been changed in several releases. They told me to just go for as low a number as possible.
–
mrdenny♦Jan 10 '11 at 18:50

I suspect that this is not possible the way you imagine it. One important reason is that the actual run time is very hardware dependent, and many of the optimization decisions that the database engine does are effectively about balancing the use of the different hardware components (e.g., disk, memory, CPU).

I suggest that you run a bunch of queries relevant to your application, record the cost estimations and the run times, and try to make sense of that data. You might get a nice linear relationship, or you might learn that the cost numbers are worthless for this purpose.