On 23 November 2011 02:49, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> There is no sort of systematic labeling of error messages in the log
> to enable the DBA to figure out that the first error message is likely
> nothing more serious than an integrity constraint doing its bit to
> preserve data integrity, while the second is likely a sign of
> impending disaster. And not just figure it out, but filter out the
> stuff that's actually worth worrying about and alert on it. If the
> first error message shows up in the log of a server I'm administering,
> IDC. If the second one shows up, I want to be woken up in the middle
> of the night immediately, and for that matter let's page the back-up
> on call while we're at it. Right now, the best option is probably to
> use something like tail_n_mail which, IIRC, has lots of hardcoded
> error strings in it to help separate the wheat from the chaff, but
> that's just a workaround for our failure to classify things properly.

Robert wrote this on the -bugs list. Lots of people agreed that it was
a good idea, but it was never really followed up. I think he
subsequently hit the nail on the head when he said:

On 24 November 2011 16:14, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> What I think we want to distinguish between is things that are
> PEBKAC/GIGO, and everything else. In other words, if a particular
> error message can be caused by typing something stupid, unexpected,
> erroneous, or whatever into psql, it's just an error. But if no
> input, however misguided, should ever cause that symptom, then it's, I
> don't know what the terminology should be, say, a "severe error".

It's a shame that we don't have the ability to do this right now.
Considering that it is likely to be a very mechanical change, with
virtually no capacity to cause bugs, yet likely to make DBAs' lives
easier, I feel that we should try to get this into 9.2. I realise
that this is a case of introducing a new feature after the nominal
deadline, and I don't generally have a blasé attitude toward such
things, but I happen to think making an exception is justified in this
particular case.

I could perform the simple work of introducing the new severity level,
if that were asked of me, while other people could perhaps make a
judgement as to what constitutes a "severe error", with perhaps people
who are particularly experienced with particular subsystems taking it
upon themselves to make those sorts of judgements for that subsystem.
I doubt that there will be a lot of grey area, so it shouldn't take
that long to do this.

From a quick look through the postgres.pot file, which is the file
actually modified by translators, ISTM that most error messages are
really obviously either one (error) or the other (severe error), and
of those the vast majority are the former, not the latter. The patch
footprint here, in raw terms of the number of lines modified, might
end up being surprisingly small.
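
To make the error / severe error split concrete, here is a rough
sketch of the kind of filter a DBA could run once messages are
labelled. It is only an illustration, not the proposed patch: the
classify() function and the SEVERE_CLASSES set are hypothetical, and
using the SQLSTATE class (XX for internal errors, 58 for system
errors) as a stand-in for a real per-message severity label is my
assumption.

```python
# Hypothetical sketch: approximate "could user input cause this?" by
# SQLSTATE class. A real severity level would be assigned per message.

# Assumption: internal errors (XX) and system errors (58) are never the
# result of something typed into psql, so they count as severe.
SEVERE_CLASSES = {"XX", "58"}


def classify(sqlstate: str) -> str:
    """Return "severe error" or "error" for a five-character SQLSTATE."""
    if len(sqlstate) != 5:
        raise ValueError(f"not a SQLSTATE code: {sqlstate!r}")
    # The first two characters of a SQLSTATE identify its class.
    if sqlstate[:2] in SEVERE_CLASSES:
        return "severe error"
    return "error"


if __name__ == "__main__":
    # 23505 is unique_violation; XX001 is data_corrupted.
    for code in ("23505", "XX001"):
        print(code, classify(code))
```

With something like this, a unique constraint violation stays in the
"ignore" pile, while a data corruption report is what triggers the
page.
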
--
Peter Geoghegan http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services