--- code/trunk/ChangeLog 2007/08/15 14:35:57 216
+++ code/trunk/ChangeLog 2007/09/14 09:14:24 244
@@ -1,7 +1,65 @@
ChangeLog for PCRE
------------------
-Version 7.3 09-Aug-07
+Version 7.4 10-Sep-07
+---------------------
+
+1. Change 7.3/28 was implemented for classes by looking at the bitmap. This
+ means that a class such as [\s] counted as "explicit reference to CR or
+ LF". That isn't really right - the whole point of the change was to try to
+ help when there was an actual mention of one of the two characters. So now
+ the change happens only if \r or \n (or a literal CR or LF) character is
+ encountered.
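The rule in item 1 can be sketched as follows. This assumes the class body is available as raw pattern text. Only a `\r` or `\n` escape, or a literal CR or LF, counts as an explicit reference, so `[\s]` no longer qualifies. The function name is hypothetical. Other spellings of CR or LF (such as `\x0d`) are ignored here because the entry does not mention them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: does the text between [ and ] explicitly mention
   CR or LF? Only \r, \n, or a literal CR/LF byte count; an escape such as
   \s does not, even though it can match CR and LF. */
static int class_mentions_crlf(const char *body, size_t len) {
  for (size_t i = 0; i < len; i++) {
    char c = body[i];
    if (c == '\r' || c == '\n') return 1;   /* literal CR or LF */
    if (c == '\\' && i + 1 < len) {
      char e = body[++i];                   /* consume the escaped char */
      if (e == 'r' || e == 'n') return 1;
    }
  }
  return 0;
}
```

Because the escaped character is consumed along with its backslash, a class containing `\\n` (an escaped backslash followed by `n`) is correctly not treated as mentioning LF.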
+
+2. The 32-bit options word was also used for 6 internal flags, but the numbers
+ of both had grown to the point where there were only 3 bits left.
+ Fortunately, there was spare space in the data structure, and so I have
+ moved the internal flags into a new 16-bit field to free up more option
+ bits.
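The separation described in item 2 is illustrated below: public options and internal flags live in different fields, so setting an internal flag no longer uses up an option bit. The struct, field names, and bit values are hypothetical and do not reproduce PCRE's real compiled-pattern layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical public option bits, kept in the 32-bit options word. */
#define OPT_CASELESS  UINT32_C(0x00000001)
#define OPT_MULTILINE UINT32_C(0x00000002)

/* Hypothetical internal flags, held in their own 16-bit field so they
   do not consume bits of the public options word. */
#define FLAG_FIRSTSET UINT16_C(0x0001)
#define FLAG_REQCHSET UINT16_C(0x0002)

typedef struct {
  uint32_t options; /* public options only */
  uint16_t flags;   /* internal flags only */
} compiled_header;

/* Setting an internal flag leaves the public options untouched. */
static void set_flag(compiled_header *h, uint16_t flag) {
  h->flags = (uint16_t)(h->flags | flag);
}

/* Number of bits still unused in a 32-bit options mask. */
static int free_option_bits(uint32_t used) {
  int n = 0;
  for (int i = 0; i < 32; i++)
    if ((used & (UINT32_C(1) << i)) == 0) n++;
  return n;
}
```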
+
+3. The appearance of (?J) at the start of a pattern set the DUPNAMES option,
+ but did not set the internal JCHANGED flag - either of these is enough to
+ control the way the "get" function works - but the PCRE_INFO_JCHANGED
+ facility is supposed to tell if (?J) was ever used, so now (?J) at the
+ start sets both bits.
+
+4. Added options (at build time, compile time, exec time) to change \R from
+ matching any Unicode line ending sequence to just matching CR, LF, or CRLF.
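The two behaviours of `\R` in item 4 can be compared in a small sketch for 8-bit (non-UTF-8) subjects. It assumes the Unicode set there is CRLF, LF, VT, FF, CR and NEL (0x85). The `anycrlf` parameter is a hypothetical stand-in for the new build, compile, and exec options, whose real names are not given in the entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of \R in 8-bit mode. Returns the number of bytes
   \R consumes at s[i], or 0 if it does not match there. If anycrlf is
   nonzero, only CR, LF, or CRLF match; otherwise VT, FF and NEL also do. */
static size_t match_bsr(const unsigned char *s, size_t len, size_t i,
                        int anycrlf) {
  if (i >= len) return 0;
  switch (s[i]) {
    case '\r':
      /* CRLF is consumed as a single line ending. */
      return (i + 1 < len && s[i + 1] == '\n') ? 2 : 1;
    case '\n':
      return 1;
    case 0x0b: case 0x0c: case 0x85:   /* VT, FF, NEL */
      return anycrlf ? 0 : 1;
    default:
      return 0;
  }
}
```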
+
+5. doc/pcresyntax.html was missing from the distribution.
+
+6. Put back the definition of PCRE_ERROR_NULLWSLIMIT, for backward
+ compatibility, even though it is no longer used.
+
+7. Added a macro for snprintf to pcrecpp_unittest.cc, and macros for strtoll
+ and strtoull to pcrecpp.cc, to select the functions that are available under
+ WIN32 (where different names are used).
+
+8. Changed all #include <config.h> to #include "config.h". There were also
+ some further <pcre.h> cases that I changed to "pcre.h".
+
+9. When pcregrep was used with the --colour option, it omitted the line-ending
+ sequence from the lines that it output.
+
+10. It was pointed out to me that arrays of string pointers cause lots of
+ relocations when a shared library is dynamically loaded. A technique of
+ using a single long string with a table of offsets can drastically reduce
+ these. I have refactored PCRE in four places to do this. The result is
+ dramatic (number of relocations):
+
+ Originally:                           290
+ After changing UCP table:             187
+ After changing error message table:    43
+ After changing table of "verbs":       36
+ After changing table of Posix names:   22
+
+ Thanks to the folks working on Gregex for glib for this insight.
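The technique in item 10 is contrasted with the pointer-array form in the sketch below. Each element of a pointer array needs a relocation when a shared library is loaded. A single string holding every name, plus a table of integer offsets, needs none for the offsets. The names are a hypothetical subset chosen for illustration; PCRE's actual tables differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Pointer-array style: each element is an address that the dynamic
   loader must relocate when the library is loaded. */
static const char *const names_ptr[] = { "alnum", "alpha", "ascii" };

/* Single-string style: all names in one NUL-separated string, plus a
   table of offsets. The offsets are plain integers, so they need no
   relocation entries. */
static const char names_str[] = "alnum\0" "alpha\0" "ascii";
static const unsigned char names_off[] = { 0, 6, 12 };

#define N_NAMES (sizeof(names_off) / sizeof(names_off[0]))

static const char *name_at(size_t i) {
  assert(i < N_NAMES);
  return names_str + names_off[i];
}

/* Look up a name; returns its index, or -1 if absent. */
static int name_lookup(const char *name) {
  for (size_t i = 0; i < N_NAMES; i++)
    if (strcmp(name_at(i), name) == 0) return (int)i;
  return -1;
}
```

The cost of this approach is that the offset table must be kept in step with the string by hand (or by a generator script) whenever a name is added.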
+
+11. --disable-stack-for-recursion caused compiling to fail unless
+ --enable-unicode-properties was also set.
+
+
+Version 7.3 28-Aug-07
---------------------
1. In the rejigging of the build system that eventually resulted in 7.1, the
@@ -98,21 +156,72 @@
the "low surrogate" sequence 0xD800 to 0xDFFF. Previously, PCRE allowed the
full range 0 to 0x7FFFFFFF, as defined by RFC 2279. Internally, it still
does: it's just the validity check that is more restrictive.
-
-16. Inserted checks for integer overflows during escape sequence (backslash)
- processing, and also fixed erroneous offset values for syntax errors during
- backslash processing.
-
+
+16. Inserted checks for integer overflows during escape sequence (backslash)
+ processing, and also fixed erroneous offset values for syntax errors during
+ backslash processing.
+
17. Fixed another case of looking too far back in non-UTF-8 mode (cf 12 above)
- for patterns like [\PPP\x8a]{1,}\x80 with the subject "A\x80".
-
+ for patterns like [\PPP\x8a]{1,}\x80 with the subject "A\x80".
+
18. An unterminated class in a pattern like (?1)\c[ with a "forward reference"
caused an overrun.
-
-19. A pattern like (?:[\PPa*]*){8,} which had an "extended class" (one with
- something other than just ASCII characters) inside a group that had an
- unlimited repeat caused a loop at compile time (while checking to see
- whether the group could match an empty string).
+
+19. A pattern like (?:[\PPa*]*){8,} which had an "extended class" (one with
+ something other than just ASCII characters) inside a group that had an
+ unlimited repeat caused a loop at compile time (while checking to see
+ whether the group could match an empty string).
+
+20. Debugging a pattern containing \p or \P could cause a crash. For example,
+ [\P{Any}] did so. (Error in the code for printing property names.)
+
+21. An orphan \E inside a character class could cause a crash.
+
+22. A repeated capturing bracket such as (A)? could cause a wild memory
+ reference during compilation.
+
+23. There are several functions in pcre_compile() that scan along a compiled
+ expression for various reasons (e.g. to see whether it has a fixed length
+ for lookbehind). There were bugs in these functions when a repeated \p or \P
+ was present in the pattern. These operators have additional parameters
+ compared with \d, etc., and these were not being taken into account when
+ moving along the compiled data. Specifically:
+
+ (a) An item such as \p{Yi}{3} in a lookbehind was not treated as fixed
+ length.
+
+ (b) An item such as \pL+ within a repeated group could cause crashes or
+ loops.
+
+ (c) A pattern such as \p{Yi}+(\P{Yi}+)(?1) could give an incorrect
+ "reference to non-existent subpattern" error.
+
+ (d) A pattern like (\P{Yi}{2}\277)? could loop at compile time.
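The kind of scan affected by item 23 is sketched below. Each compiled item has a base length, and a repeat whose type is a property (`\p` or `\P`) carries two extra bytes (property type and value). If those bytes are not skipped, the scanner misreads operands as opcodes. The opcodes, their lengths, and the byte layout are hypothetical, loosely modelled on a compiled regex; they are not PCRE's real opcode table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes. */
enum { OP_END, OP_DIGIT, OP_PROP, OP_TYPEEXACT };

/* Base length in bytes of each opcode, including fixed operands. */
static const unsigned char op_len[] = {
  1, /* OP_END */
  1, /* OP_DIGIT: \d */
  3, /* OP_PROP: opcode, property type, property value */
  4  /* OP_TYPEEXACT: opcode, 2-byte count, type opcode (e.g. \d{3}) */
};

/* Length of the item at code[0]. A fixed repeat of a property item has
   two extra operand bytes that must also be skipped. */
static size_t item_len(const unsigned char *code) {
  size_t n = op_len[code[0]];
  if (code[0] == OP_TYPEEXACT && code[3] == OP_PROP) n += 2;
  return n;
}

/* Fixed match length in characters, or -1 if not fixed (as needed for a
   lookbehind). */
static int fixed_length(const unsigned char *code) {
  int len = 0;
  for (;;) {
    switch (code[0]) {
      case OP_END: return len;
      case OP_DIGIT: case OP_PROP: len += 1; break;
      case OP_TYPEEXACT: len += (code[1] << 8) | code[2]; break;
      default: return -1;
    }
    code += item_len(code);
  }
}
```

With this sketch, `\p{Yi}{3}\d` encodes as eight bytes and is reported as fixed length 4.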
+
+24. A repeated \S or \W in UTF-8 mode could give wrong answers when multibyte
+ characters were involved (for example /\S{2}/8g with "A\x{a3}BC").
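The requirement behind item 24 is that a repeated `\S` in UTF-8 mode must step over whole characters, not single bytes. A minimal sketch, assuming valid UTF-8 input and treating only space, HT, LF, FF and CR as white space (the ASCII set for `\s` in this era of PCRE). The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Byte length of the UTF-8 character with lead byte c (valid UTF-8 assumed,
   so bytes below 0xc0 here are single-byte ASCII characters). */
static size_t utf8_len(unsigned char c) {
  if (c < 0xc0) return 1;
  if (c < 0xe0) return 2;
  if (c < 0xf0) return 3;
  return 4;
}

/* Hypothetical: try to match \S{n} at s[i]. Returns the number of bytes
   consumed, or 0 on failure. */
static size_t match_nonspace_n(const unsigned char *s, size_t len,
                               size_t i, int n) {
  size_t start = i;
  while (n-- > 0) {
    if (i >= len) return 0;
    unsigned char c = s[i];
    if (c == ' ' || c == '\t' || c == '\n' || c == '\f' || c == '\r')
      return 0;
    i += utf8_len(c);   /* advance by a whole character, not one byte */
  }
  return i - start;
}
```

For the subject from the entry, "A\x{a3}BC" (bytes `41 C2 A3 42 43`), the first `\S{2}` match is the three bytes of "A\x{a3}" and the next is "BC".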
+
+25. Using pcregrep in multiline, inverted mode (-Mv) caused it to loop.
+
+26. Patterns such as [\P{Yi}A] which include \p or \P and just one other
+ character were causing crashes (broken optimization).
+
+27. Patterns such as (\P{Yi}*\277)* (group with possible zero repeat containing
+ \p or \P) caused a compile-time loop.
+
+28. More problems have arisen in unanchored patterns when CRLF is a valid line
+ break. For example, the unstudied pattern [\r\n]A does not match the string
+ "\r\nA" because change 7.0/46 below moves the current point on by two
+ characters after failing to match at the start. However, the pattern \nA
+ *does* match, because it doesn't start till \n, and if [\r\n]A is studied,
+ the same is true. There doesn't seem to be any very clean way out of this, but
+ what I have chosen to do makes the common cases work: PCRE now takes note
+ of whether there can be an explicit match for \r or \n anywhere in the
+ pattern, and if so, 7.0/46 no longer applies. As part of this change,
+ there's a new PCRE_INFO_HASCRORLF option for finding out whether a compiled
+ pattern has explicit CR or LF references.
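The start-position logic in item 28 is modelled below for the fixed pattern `[\r\n]A`. When a match attempt fails at a CRLF, the 7.0/46 behaviour skips both characters. If the pattern explicitly mentions CR or LF, the search advances one character instead. This is a hypothetical model, not PCRE's pcre_exec(); `has_crorlf` merely plays the role of the information that PCRE_INFO_HASCRORLF reports.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical matcher for the fixed pattern [\r\n]A at position i. */
static int match_here(const char *s, size_t len, size_t i) {
  return i + 1 < len && (s[i] == '\r' || s[i] == '\n') && s[i + 1] == 'A';
}

/* Unanchored search with CRLF as a valid newline. Returns the match
   offset, or -1 if there is no match. */
static long search(const char *s, int has_crorlf) {
  size_t len = strlen(s);
  for (size_t i = 0; i < len; ) {
    if (match_here(s, len, i)) return (long)i;
    if (!has_crorlf && s[i] == '\r' && i + 1 < len && s[i + 1] == '\n')
      i += 2;   /* 7.0/46: skip the whole CRLF after a failure */
    else
      i += 1;
  }
  return -1;
}
```

With `has_crorlf` clear, the search on "\r\nA" jumps from offset 0 to offset 2 and never tries the `\n` at offset 1, so it fails. With it set, the match is found at offset 1.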
+
+29. Added (*CR) etc. for changing the newline setting at the start of a pattern.
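The handling in item 29 can be illustrated by recognizing a leading newline setting and returning its length, so compilation can continue after it. This sketch assumes the set of items is (*CR), (*LF), (*CRLF), (*ANYCRLF) and (*ANY); the enum and function are hypothetical. Each item is compared including its closing parenthesis, so (*CR) cannot falsely match the start of (*CRLF).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical newline settings for this sketch. */
enum newline { NL_DEFAULT, NL_CR, NL_LF, NL_CRLF, NL_ANYCRLF, NL_ANY };

static const struct { const char *text; enum newline nl; } nl_items[] = {
  { "(*CR)", NL_CR }, { "(*LF)", NL_LF }, { "(*CRLF)", NL_CRLF },
  { "(*ANYCRLF)", NL_ANYCRLF }, { "(*ANY)", NL_ANY },
};

/* If the pattern starts with a newline item, store its setting in *nl and
   return the number of characters it occupies; otherwise return 0. */
static size_t leading_newline(const char *pattern, enum newline *nl) {
  *nl = NL_DEFAULT;
  for (size_t i = 0; i < sizeof(nl_items) / sizeof(nl_items[0]); i++) {
    size_t n = strlen(nl_items[i].text);
    if (strncmp(pattern, nl_items[i].text, n) == 0) {
      *nl = nl_items[i].nl;
      return n;
    }
  }
  return 0;
}
```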
Version 7.2 19-Jun-07