Change Log

0.999

Fix #115: lxml treewalker can now deal with fragments containing, at
their root level, text nodes with non-ASCII characters on Python 2.

0.99

Released on September 10, 2013

No library changes from 1.0b3; released as 0.99 because pip changed
behaviour in 1.4 to avoid installing pre-release versions, per
PEP 440.

1.0b3

Released on July 24, 2013

Removed RecursiveTreeWalker from treewalkers._base. Any
implementation using it should be moved to
NonRecursiveTreeWalker, as everything bundled with html5lib has
been for years.

Fix #67 so that BufferedStream correctly returns a bytes
object, thereby fixing any case where html5lib is passed a
non-seekable RawIOBase-like object.

1.0b2

Released on June 27, 2013

Removed reordering of attributes within the serializer. There is now
an alphabetical_attributes option which preserves the previous
behaviour through a new filter. This allows attribute order to be
preserved through html5lib if the tree builder preserves order.
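
As a rough illustration of what an attribute-sorting filter does, the
sketch below reorders attributes on start-tag tokens in a token stream.
The token dictionaries and the alphabetical_attributes function here are
simplified assumptions made for this example, not html5lib's own
implementation or token format.

```python
from collections import OrderedDict


def alphabetical_attributes(tokens):
    """Yield tokens, sorting the attributes of start tags by name.

    Hypothetical, simplified filter: each token is assumed to be a dict
    with a "type" key, and start tags carry their attributes in "data".
    """
    for token in tokens:
        if token["type"] in ("StartTag", "EmptyTag"):
            # Build a new token rather than mutating the caller's dict.
            ordered = OrderedDict(sorted(token["data"].items()))
            token = dict(token, data=ordered)
        yield token


# Illustrative token stream for: <a title="Home" href="/">Home</a>
stream = [
    {"type": "StartTag", "name": "a", "data": {"title": "Home", "href": "/"}},
    {"type": "Characters", "data": "Home"},
    {"type": "EndTag", "name": "a"},
]

sorted_stream = list(alphabetical_attributes(stream))
attribute_names = list(sorted_stream[0]["data"])
```

Without such a filter, the attributes are left in whatever order the
tree builder supplied them.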

Removed dom2sax from DOM treebuilders. It has been replaced by
treeadapters.sax.to_sax which is generic and supports any
treewalker; it also resolves all known bugs with dom2sax.

Removed the deprecated Beautiful Soup 3 treebuilder.
beautifulsoup4 can use html5lib as a parser instead. Note that
since beautifulsoup4 doesn’t support namespaces, foreign content
like SVG and MathML is parsed incorrectly.

Removed simpletree from the package. The default tree builder is
now etree (using the xml.etree.cElementTree implementation if
available, and xml.etree.ElementTree otherwise).
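
The "cElementTree if available, ElementTree otherwise" choice can be
sketched with a plain import fallback. This is a minimal illustration
using only the standard library, not html5lib's actual selection code.

```python
try:
    # C-accelerated module; removed from the standard library in Python 3.9.
    import xml.etree.cElementTree as ElementTree
except ImportError:
    # Pure-Python API (which uses a C accelerator itself on modern Pythons).
    import xml.etree.ElementTree as ElementTree

# Either module exposes the same interface.
root = ElementTree.fromstring("<p>Hello</p>")
```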

Removed the XHTMLSerializer as it never actually guaranteed its
output was well-formed XML, and hence provided little of use.

Removed the default DOM treebuilder, so using
html5lib.treebuilders.dom directly is no longer supported. Call
html5lib.treebuilders.getTreeBuilder("dom") instead, which returns
the default DOM treebuilder, built on xml.dom.minidom.
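
A minimal sketch of obtaining the DOM treebuilder through
getTreeBuilder, assuming html5lib is installed. The sample markup and
variable names are illustrative only.

```python
import html5lib
from html5lib import treebuilders

# Ask for the DOM treebuilder by name instead of importing it directly.
dom_builder = treebuilders.getTreeBuilder("dom")
parser = html5lib.HTMLParser(tree=dom_builder)

# The result is an xml.dom.minidom document.
document = parser.parse("<p>Hello</p>")
paragraphs = document.getElementsByTagName("p")
text = paragraphs[0].firstChild.data
```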

Optional heuristic character encoding detection now based on
charade for Python 2.6 - 3.3 compatibility.