One of my good friends, Piet de Visser commented on a recent post that “380 {consistent gets} is too much” per row returned. He is right. He is wrong. Which?

Piet is actually referring to a given scenario I had just described, so he is probably {heck, he is spot on} correct as his comment was made in context – but it set me to thinking about the number of times I have been asked “is the number of consistent gets good or bad” without any real consideration to the context. The person asking the question usually just wanted a black/white good/bad ratio, which is what Piet also mentioned in his comment, a need to discuss such a ratio. I am on holiday in New England with my second bottle of wine, memories of having spent the day eating Lobster, kicking though Fall leaves and sitting by a warm fire reading a book, so I am mellow enough to oblige.

Sadly, out of context, no such ratio probably exists. *sigh*. There evaporates the warm glow of the day :-).

The question of “is the number of consistent gets per row good or bad?” is a bit like the question “is the pay rate good enough?”. It really depends on the whole context, but there is probably an upper limit. If I am helping my brother fit {yet another} kitchen then the pay rate is low. He has helped me fit a few, I have helped him fit a few, a couple of pints down the pub is enough and that equates to about 30p an hour. Bog standard production DBA work? 30-60 pounds an hour seems to be the going rate. Project Managing a system move that has gone wrong 3 times already? I need to factor in a long holiday on top of my normal day rate, so probably high hundreds a day. £10,000 a day? I don’t care what it is, I ain’t doing it as it is either illegal, highly unpleasant, both or involves chucking/kicking/hitting a ball around a field and slagging off the ref, and I ain’t good at ball games.

I have a rule of thumb, and I think a rule of thumb is as good as you can manage with such a question as “is {some sort of work activity on the database} per row too much?”. With consistent gets, if the query has less than 5 tables, no group functions and is asking a sensible question {like details of an order, where this lab sample is, who owes me money} then:

below 10 is good

10-100 I can live with but may be room for improvement

above 100 per record, let’s have a look.

Scary “page-of-A4″ SQL statement with no group functions?

100-1000 consistent gets is per row is fine unless you have a business reason to ask for better performance.

Query contains GROUP BY or analytical functions, all bets are pretty much off unless you are looking at

a million consistent gets or 100,000 buffer gets, in which case it is once again time to ask “is this fast enough for the business”.

The million consistent gets or 100,000 buffer gets is currently my break-even “it is probably too much”, equivalent to I won’t do anything for £10 grand. 5 years ago I would have looked quizzically at anything over 200,000 consistent gets or 10,000 buffer gets but systems get bigger and faster {and I worry I am getting old enough to start becoming unable to ever look a million buffer gets in the eye and not flinch}. Buffer gets at 10% of the consistent gets, I look at. It might be doing a massive full table scan in which case fair enough, it might be satisfying a simple OLTP query in which case, what the Hell is broken?

The over-riding factor to all the above ratios though is “is the business suffering an impact as performance of the database is not enough to cope”? If there is a business impact, even if the ratio is 10 consistent gets per row, you have a look.

Something I have learnt to look out for though is DISTINCT. I look at DISTINCT in the same way a medic looks at a patient holding a sheaf of website printouts – with severe apprehension. I had an interesting problem a few years back. “Last week” a query took 5 minutes to come back and did so with 3 rows. The query was tweaked and now it comes back with 4 rows and takes 40 minutes. Why?

I rolled up my mental sleeves and dug in. Consistent gets before the tweak? A couple of million. After the tweak? About a hundred and 30 million or something. The SQL had DISTINCT clause. Right, let’s remove the DISTINCT. First version came back with 30 or 40 thousand records, the second with a cool couple of million. The code itself was efficient, except it was traversing a classic Chasm Trap in the database design {and if you don’t know what a Chasm Trap is, well that is because Database Design is not taught anymore, HA!}. Enough to say, the code was first finding many thousands of duplicates and now many millions of duplicates.
So, if there is a DISTINCT in the sql statement, I don’t care how many consistent gets are involved, of buffer gets or elapsed time. I take out that DISTINCT and see what the actual number of records returned is.

Which is a long-winded way of me saying that some factors over-ride even “rule of thumb” rules. so, as a rule of thumb, if a DISTINCT is involved I ignore my other Rules of Thumb. If not, I have a set of Rules of Thumb to guide my level of anxiety over a SQL statement, but all Rules of Thumb are over-ridden by a real business need.

Right, bottle 2 of wine empty, Wife has spotted the nature of my quiet repose, time to log off.