I reported this one last summer while you were in draft stage, but no
action was taken. Maybe you were just too busy. Anyway, my problem
with it all didn't go away. I continue to have concerns about, and objections
to, the present [...] notational devices--both in the way they are defined
and in the way they are used...
(1) The notation
[a-zA-Z]
is described too briefly in chapter 6 as 'matches any character with a
value in the range(s) indicated (inclusive).' I think this needs
elaboration. At the VERY least, it should say 'from a to z and from
A to Z'.
(2) WORSE, the notation [a-zA-Z0-9_.:] is NOWHERE defined. Indeed, the
notation [abc] is not even defined.
(3) [^abc] is only scantily defined, and one must infer from context,
using superhuman skills, that the "^" is part of the "not" notation and
not one of the characters that are disallowed. Without more exposition,
there is no way to discern that [^abc] doesn't mean
Char - ( '^' | 'a' | 'b' | 'c' )
since there is no use of [...] shown and one might therefore assume
that when hyphens are not present, there is an exclusion applied.
(4) If you assume [abc] is defined as meaning the enclosed characters,
then how do you know that [#x12-#x14] doesn't mean
'#' | 'x' | '-' | '1' | '2' | '4'
? My conclusion is that you can't let this go without saying.
It may be that people can figure this spec out pragmatically,
but it is not the case that the spec really DEFINES a notation plainly.
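
To make the ambiguity in (4) concrete, here is a minimal Python sketch of the two readings. The function names are hypothetical and nothing here is taken from the spec. The second function simply assumes that "#xN" denotes a hexadecimal character code and that "-" joins the two ends of an inclusive range.

```python
import re

def literal_reading(body):
    """Reading A: every character between the brackets stands for itself."""
    return set(body)

def range_reading(body):
    """Reading B (assumed): '#xN' is a hex character code, '-' joins a range."""
    lo, hi = (int(h, 16) for h in re.findall(r"#x([0-9A-Fa-f]+)", body))
    return {chr(c) for c in range(lo, hi + 1)}

body = "#x12-#x14"
print(sorted(literal_reading(body)))  # ['#', '-', '1', '2', '4', 'x']
print(sorted(range_reading(body)))    # the three control characters 0x12..0x14
```

The two readings share no characters at all, which is why the choice between them should be stated rather than inferred.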
Personally, I would MUCH rather not see a hairy definition for [].
I would rather see a simple syntax definition of [], EVEN IF
it led to more complex notations like:
[a-z] | [A-Z] | [0-9] | '_' | '.' | ':'
and even if instead of [^abc] you saw:
Char - ('a' | 'b' | 'c')
Another thing I like about "Char - ('a' | 'b' | 'c')" is that it makes
clear what the set is that abc are being removed from. When you don't
specify, it might mean Char or it might be some other set.
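
As a rough illustration of why the base set matters, the following Python sketch treats "Char - ('a' | 'b' | 'c')" as plain set difference. CHAR and LETTER are hypothetical stand-ins chosen for the example (printable ASCII and lowercase letters); the real Char is much larger.

```python
# Hypothetical stand-ins for the grammar's sets, for illustration only.
CHAR = {chr(c) for c in range(0x20, 0x7F)}   # assumed: printable ASCII
LETTER = set("abcdefghijklmnopqrstuvwxyz")   # assumed alternative base set

excluded = set("abc")

from_char = CHAR - excluded      # Char - ('a' | 'b' | 'c')
from_letter = LETTER - excluded  # the same exclusion from a smaller set

print("^" in from_char)    # True
print("^" in from_letter)  # False
```

The same three exclusions give different answers for "^" depending on which set they are removed from, so writing the base set explicitly removes the guesswork.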
Among other things, using a more cumbersome notation would encourage
you to name these odd little collections of characters. Why on earth
are "_", ".", and ":" allowed in one case but another arbitrary-looking
set in another context?? If you named these better, and used descriptions
like:
lc-alpha | uc-alpha | digit | nameprefix
in place of
[a-zA-Z0-9_.:]
it would make a lot more sense and would have a normative effect on the
terminology used by parser-writers to describe these odd little sets.
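
A parser-writer might carry such names straight into code. The sketch below is only an illustration: the set contents are assumed to be ASCII, and NAMEPREFIX is assumed to hold the three leftover characters "_", ".", and ":".

```python
import re
import string

# Named classes mirroring lc-alpha | uc-alpha | digit | nameprefix.
LC_ALPHA = set(string.ascii_lowercase)
UC_ALPHA = set(string.ascii_uppercase)
DIGIT = set(string.digits)
NAMEPREFIX = set("_.:")  # assumed contents

NAME_CHAR = LC_ALPHA | UC_ALPHA | DIGIT | NAMEPREFIX

# Cross-check against the compact bracket form over the ASCII range.
bracket_form = {chr(c) for c in range(128)
                if re.fullmatch(r"[a-zA-Z0-9_.:]", chr(c))}
print(NAME_CHAR == bracket_form)  # True
```

Under these assumptions the named form and the bracket form describe the same set, but only the named form says what each piece is for.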
-kmp
-----------
DISCLAIMER:
The above are my personal feelings and not necessarily
Harlequin's official position.