Re: Unquoted special characters in regexps

From: martin rudalics
Subject: Re: Unquoted special characters in regexps
Date: Fri, 03 Mar 2006 08:42:01 +0100
User-agent: Mozilla Thunderbird 1.0 (Windows/20041206)

> I do not see what this problem has to do with "\\]" vs ']'.
>
> This seems to be just a case of forgetting to double up `\' for Lisp
> syntax.
That's precisely what I meant. If programmers consistently double up
backslashes for _all_ escaped brackets it's usually simple to guess when
one of them has been omitted. Otherwise you always have to consider the
possibility that the author wanted to close a character alternative here
and messed up some preceding part.
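To make the doubling issue concrete, here is a minimal sketch. The strings are made up for illustration and are not taken from any Emacs library. The Lisp reader silently drops a lone backslash before `]', so a forgotten doubling raises no error and simply yields a different regexp:

```elisp
;; Hypothetical examples, for illustration only.

;; In a Lisp string, a single backslash before `]' is dropped by the
;; reader, so "\]" and "]" are the same one-character string.
(string= "\]" "]")            ; => t

;; A doubled backslash survives reading: the regexp is the two
;; characters backslash and right bracket.
(length "\\]")                ; => 2

;; Outside a character alternative both forms match a literal `]',
;; so the regexp engine cannot flag the omission either.
(string-match "\\]" "a]b")    ; => 1
(string-match "]" "a]b")      ; => 1
```

Because both spellings behave the same here, only a consistent convention lets a reader tell an intentional bare `]' from an accidental one.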
You have long-standing experience (or maybe some sixth sense) for
discovering wrong regexps faster than most of us. But you should
occasionally think of less experienced programmers who try to guess the
motivations for writing an expression like
(string-match "[^\\]\\(\\([\\][\\]\\)*\\)\"[ \t,]*"
definition start)
in `mailalias.el'. It's got no fewer than three backslashes preceding
non-escaped right brackets. Can you tell me what the author wants to
match? If, by default, I have to consider the possibility that a `]'
may either close a character alternative _or_ stand for itself, the
number of interpretations of such expressions explodes combinatorially.
Programmers should avoid confusion by not putting `\\' at the end of a
character alternative unless it's needed as in `[^\\]'.
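For what it's worth, here is one reading of the `mailalias.el' expression, worked out as a sketch rather than a statement of the author's intent. It relies on the fact that a backslash is not special inside a character alternative in Emacs regexps. The constant name and the test strings are hypothetical:

```elisp
;; Hypothetical name; the regexp string is the one quoted above.
;; After the Lisp reader, the regexp is:  [^\]\(\([\][\]\)*\)"[ \t,]*
;;   [^\]              one character that is not a backslash
;;   \(\([\][\]\)*\)   zero or more PAIRS of backslashes (group 1)
;;   "                 a double quote
;;   [ \t,]*           trailing spaces, tabs and commas
(defconst my/unescaped-quote-re
  "[^\\]\\(\\([\\][\\]\\)*\\)\"[ \t,]*")

;; No backslash before the quote: matches, starting at the second `o'.
(string-match my/unescaped-quote-re "foo\" bar")       ; => 2

;; One backslash before the quote (an escaped quote): no match.
(string-match my/unescaped-quote-re "foo\\\" bar")     ; => nil

;; Two backslashes before the quote: matches again.
(string-match my/unescaped-quote-re "foo\\\\\" bar")   ; => 2
```

Under this reading the expression looks for a double quote preceded by an even number of backslashes, that is, a quote that is not itself escaped.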
> The present regexp is valid, but the syntax it is looking for seems
> bizarre. On the other hand looking for things like:
>
> "[123] [5] [2034] "
>
> seems to make sense.
Because people are used to considering objects like "[123] [5] [2034]"
well-formed and objects like "123]", "]5]", "[2034 " bizarre. Most
humans _do_ expect to find some sort of symmetry in the things they
observe. Symmetry is a driving principle of mathematics and computer
science. Often, it's a lack of symmetry that makes people aware of
faults or other anomalies.