On Wed, Nov 23, 2011 at 7:19 AM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> On 23 November 2011 02:49, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> There is no sort of systematic labeling of error messages in the log
>> to enable the DBA to figure out that the first error message is likely
>> nothing more serious than an integrity constraint doing its bit to
>> preserve data integrity, while the second is likely a sign of
>> impending disaster.
>
> +1
>
> I suggested that there be an INTERNAL_ERROR severity level before on
> this list, in response to an opaque internal error that was raised in
> the planner due to a bug in master (it was a simple elog() call that
> raised the error), and the idea was not well received. Tom said that
> "Well, the SQLSTATE for this sort of thing is already
> ERRCODE_INTERNAL_ERROR". A quick search of that shows that it only
> appears in the following places:
I mostly agree, but I don't think "internal error" is aiming at quite
the right target. For one thing, one big cause of concern is when the
database observes difficulty accessing the underlying storage. Those
errors are not "internal" to the database at all - they are coming
from the operating system. IME, users often don't realize what those
messages mean. So even aside from the difficulty of systematic log
filtering, it's easy for an inexperienced DBA to read a message
complaining about inability to write a block or flush XLOG and think
"oh, what a buggy piece of software PostgreSQL is" or perhaps "I
wonder why it's having so much trouble performing that operation?".
What we want them to think is "oh crap! my disk is dying".
So I would propose to steer clear of the word "internal", because the
really scary errors typically are not internal to PostgreSQL at all.
What I think we want to distinguish between is things that are
PEBKAC/GIGO, and everything else. In other words, if a particular
error message can be caused by typing something stupid, unexpected,
erroneous, or whatever into psql, it's just an error. But if no
input, however misguided, should ever cause that symptom, then it's, I
don't know what the terminology should be, say, a "severe error". So,
for example, these would all be severe errors:
cannot commit a transaction that deleted files but has no xid
StartTransactionCommand: unexpected state %s
could not create file "%s": %m
xlog flush request %X/%X is not satisfied --- flushed only to %X/%X
All of these are situations that no SQL command should ever be able to
manufacture. Either PostgreSQL has a bug, or the hardware is dying,
or at a minimum there is a disk full, something that the DBA will
certainly want to know about sooner rather than later. The failure of
the those transaction is not the user's "fault"; some external
circumstance has intervened.
Now, in some environments, it may be that things like unique key
violations are worrisome, because they may indicate *application*
bugs. So it would still be up to the DBA to provide monitoring for
those conditions as needed. But at least grepping the logs for severe
errors would provide an easy way for DBAs to know whether the database
believes itself to be sick.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company