mxTextTools Change Log

This change log describes in detail all changes made to this package in recent releases.

Version: 3.2.3

Changes from 3.0.0 to 3.1.0

Fixed a segfault when trying to unpickle a UnicodeTagTable. Reported by Andrew Dalke.

Changes from 2.0.3 to 3.0.0

Version 3.0.0 adds full Unicode support to mxTextTools and its Tagging Engine. This work was implemented for one of our customers. As a result, a few parts of the package had to be restructured and modified. We hope the new design decisions will leave more room for future enhancements.

The new version should be almost 100% backward compatible with previous versions. Where needed, aliases or factory functions are provided to maintain interface compatibility.

Moved command constant definitions from Constants.py to the C extension.

Restructured the tag commands and their numbering so that low-level commands come before the special ones. Old tag tables must be "recompiled" because of this change!

Added a Tag Table compiler. The Tagging Engine now only works with compiled TagTables.

Made the Tagging Engine (TE) polymorphic with respect to the underlying data type and created two versions: one for unsigned char and one for Py_UNICODE.

Added Tag Table cache support.
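This entry does not say how the cache works. The following is a minimal Python sketch of the general idea, assuming a cache keyed on the identity of the tag table object so that repeated calls with the same table skip recompilation. `TagTableCache` and `compile_fn` are hypothetical names, not part of the mxTextTools API.

```python
from typing import Any, Callable, Dict, Tuple


class TagTableCache:
    """Hypothetical cache of compiled tag tables (not the mxTextTools API)."""

    def __init__(self, compile_fn: Callable[[tuple], Any]) -> None:
        self._compile = compile_fn
        # Keyed by id(table). The table itself is stored next to its compiled
        # form, which keeps it alive so its id cannot be reused while cached.
        self._cache: Dict[int, Tuple[tuple, Any]] = {}
        self.misses = 0

    def get(self, table: tuple) -> Any:
        entry = self._cache.get(id(table))
        if entry is not None and entry[0] is table:
            return entry[1]  # cache hit: reuse the compiled table
        self.misses += 1
        compiled = self._compile(table)
        self._cache[id(table)] = (table, compiled)
        return compiled
```

Keying on identity avoids hashing or comparing whole (possibly nested) tables on every call; the trade-off is that an equal but distinct table object is compiled again.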

tag() now accepts keyword arguments.

Merged BMS and FS into a new TextSearch object. The search algorithm is now selected via an argument to this single object's constructor.
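The pattern of one search object whose algorithm is picked by a constructor argument can be sketched in pure Python as follows. The class name `TextSearchSketch`, the algorithm names "horspool" and "trivial", and the `find()` convention (index of the first match or -1) are assumptions for illustration. "horspool" is the Boyer-Moore-Horspool simplification of Boyer-Moore and is not necessarily what mxTextTools uses internally. Because the code only slices and compares, it works on both `str` and `bytes`.

```python
class TextSearchSketch:
    """Hypothetical single search object with a selectable algorithm."""

    def __init__(self, match, algorithm="horspool"):
        if not match:
            raise ValueError("match string must not be empty")
        if algorithm not in ("horspool", "trivial"):
            raise ValueError("unknown algorithm: %r" % (algorithm,))
        self.match = match
        self.algorithm = algorithm
        m = len(match)
        # Horspool shift table: for each character in the pattern (except the
        # last position), the distance from its rightmost occurrence to the end.
        self._shift = {ch: m - 1 - i for i, ch in enumerate(match[:-1])}

    def find(self, text, start=0, stop=None):
        """Return the index of the first match in text[start:stop], or -1."""
        stop = len(text) if stop is None else min(stop, len(text))
        if self.algorithm == "trivial":
            return self._find_trivial(text, start, stop)
        return self._find_horspool(text, start, stop)

    def _find_trivial(self, text, start, stop):
        m = len(self.match)
        for i in range(start, stop - m + 1):
            if text[i:i + m] == self.match:
                return i
        return -1

    def _find_horspool(self, text, start, stop):
        m = len(self.match)
        i = start
        while i <= stop - m:
            if text[i:i + m] == self.match:
                return i
            # Shift by the entry for the text character under the pattern's
            # last position; characters not in the pattern shift by m.
            i += self._shift.get(text[i + m - 1], m)
        return -1
```

For example, `TextSearchSketch("abc", algorithm="trivial").find("xxabc")` returns 2, as does the default "horspool" variant.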

Passing an unknown search object type to the Tagging Engine is now an error.

Nearly all instances where a SystemError could have been raised now raise an mxTextTools.Error instead.

Removed support for buffer-compatible input objects. This will probably be reintegrated in some future release.

Added new AllInCharSet and IsInCharSet commands.

Implemented Unicode support in search objects using a trivial algorithm. Translation is not supported for Unicode.

Added a large set of regression tests for all the C APIs and the Tagging Engine.

Fixed a bug in the strip APIs which caused a core dump when the entire string contents would have been stripped. Thanks to Jeffrey Chang for finding this one.
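The C code for this fix is not shown here, but the edge case is easy to state: when scanning in from both ends, the right-hand index must never cross the left-hand one, even if every character is strippable. A minimal Python sketch of that bounds logic follows. `strip_sketch` is a hypothetical helper, not the mxTextTools strip API.

```python
def strip_sketch(text, chars=" \t\n\r"):
    """Hypothetical two-ended strip that stays in bounds for all-whitespace input."""
    left, right = 0, len(text)
    # Advance from the left while characters are strippable.
    while left < right and text[left] in chars:
        left += 1
    # Retreat from the right, but never past `left`: this guard is what keeps
    # an input consisting only of strippable characters from running off the end.
    while right > left and text[right - 1] in chars:
        right -= 1
    return text[left:right]
```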

Fixed a bug in the handling of SubTable: the subtaglist entries of the tag table entries pointed recursively to the taglist containing them. They now follow the documented behaviour of using None for the subtaglist entries.

Added support for a context object which is passed along while processing a tag table with the Tagging Engine.

Added more type casts to the C code to make some pedantic compilers happy (e.g. the Mac OS X compiler).

Fixed a bug found by Simon Cusack in the RTF.py example.

Fixed a bug in tagdict(). Thanks to Joel Rosdahl for reporting this.

Fixed a bug in tag() which caused the IsNot/IsNotIn commands to scan beyond the end of the text slice (eventually causing a segfault). Thanks to Reinhard Engel for reporting this.
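In C, reading past the end of the text slice is silent memory access rather than an exception, so a single-character command has to compare the current position with the slice end before it reads anything. The following hypothetical Python sketch shows that check for an IsNotIn-style command, assumed here to match one character that is not in a given set. The function name, parameters and return convention (new position, or None for no match) are illustrative assumptions.

```python
from typing import Optional, Sequence


def is_not_in(text: Sequence, pos: int, stop: int, chars: Sequence) -> Optional[int]:
    """Hypothetical IsNotIn-style matcher that never reads at or past `stop`.

    Returns the position after the matched character, or None on no match.
    """
    # Clamp the slice end to the real text length, then refuse to read past it.
    stop = min(stop, len(text))
    if pos >= stop:
        return None
    if text[pos] in chars:
        return None
    return pos + 1
```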

Added a check for empty match strings to the Tag Table compiler. Empty match strings are no longer allowed for low-level commands (which wouldn't match in such a case anyway). This lets the Tagging Engine run faster, since it no longer has to check for this case.
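Moving the empty-string test from match time to compile time can be sketched in Python as follows. This assumes tag table entries are tuples starting with (tagobj, command, argument) and uses string command names in place of the real integer constants. `compile_tag_table`, `LOW_LEVEL_COMMANDS` and its membership are hypothetical and chosen for illustration only.

```python
# Hypothetical set of low-level command names; the real engine uses constants.
LOW_LEVEL_COMMANDS = frozenset(
    {"AllIn", "AllNotIn", "Is", "IsIn", "IsNot", "IsNotIn", "Word"}
)


def compile_tag_table(table):
    """Validate entries once so the matching loop never tests for empty strings."""
    compiled = []
    for index, entry in enumerate(table):
        if len(entry) < 3:
            raise ValueError(
                "entry %d: expected at least (tagobj, command, argument)" % index
            )
        tagobj, command, argument = entry[:3]
        # Reject empty match strings up front; such entries could never match.
        if (command in LOW_LEVEL_COMMANDS
                and isinstance(argument, (str, bytes))
                and len(argument) == 0):
            raise ValueError(
                "entry %d: empty match string for low-level command %s"
                % (index, command)
            )
        compiled.append(tuple(entry))
    return tuple(compiled)
```

Paying for the check once per table rather than once per match attempt is what allows the inner engine loop to drop it.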

Changes from 2.0.2 to 2.0.3

Added isascii().

Changes from 2.0.0 to 2.0.2

Fixed a bug in the Words.py example. Thanks to Michael Husmann for finding this one.