I have long been a fan of Tidy, a tool that cleans up markup and runs some basic checks on it. However, the tool is no longer really being updated, and since I have moved to HTML5 and ARIA on all my new projects, it has lost much of its usefulness.

I also see no momentum picking up, so I think we should consider folding Tidy into html5lib. By that I mean using html5lib to get Tidy-like functionality.

Tidy must go HTML5

Here is the deal with HTML5. Pretty soon every browser will have an HTML5 parser. Except for IE, browsers do not have multiple parsers.

This means that tokenization and DOM tree building will follow the rules defined in HTML5 – as opposed to not really following any rules at all, since HTML 4 never defined them.

Simply put, there is no opting out of HTML5. An HTML 4 or XHTML 1.x doctype is nothing more than a contract between developers. Technically, all it does is put the browser in standards compliance mode.

Thus, I do not see any future in a tool that does not rely on the HTML5 parsing algorithm. Tidy cannot grow from its current code base; it needs to have at its core the same html5lib that is in the HTML5 validator, which is basically the same as the one used in Firefox 4.

Indentation. Preferably with an option to keep block elements with very short text content on a single line, instead of breaking them up into three rows as Tidy does today.
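
To make that option concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the standard library's `html.parser` stands in for a real HTML5 parser (it does not implement the HTML5 algorithm), the input is assumed to be well-formed, and the names `tidy_indent` and `short_limit` are made up for this example. Comments and doctypes are simply dropped, and inline elements are treated like blocks.

```python
from html import escape
from html.parser import HTMLParser

# Partial list of void elements, enough for the sketch.
VOID = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Builds a naive tree of dicts; assumes well-formed, properly nested input."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": None, "start": "", "children": []}
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "start": self.get_starttag_text(), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Only pop when the end tag matches the innermost open element.
        if len(self.stack) > 1 and self.stack[-1]["tag"] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace runs
        if text:
            self.stack[-1]["children"].append(text)


def render(node, depth, short_limit, out):
    pad = "  " * depth
    if isinstance(node, str):
        out.append(pad + escape(node, quote=False))
        return
    if node["tag"] in VOID:
        out.append(pad + node["start"])
        return
    end = "</%s>" % node["tag"]
    kids = node["children"]
    if all(isinstance(kid, str) for kid in kids):
        text = escape(" ".join(kids), quote=False)
        if len(text) <= short_limit:
            # Short text-only element: one row instead of three.
            out.append(pad + node["start"] + text + end)
            return
    out.append(pad + node["start"])
    for kid in kids:
        render(kid, depth + 1, short_limit, out)
    out.append(pad + end)


def tidy_indent(markup, short_limit=40):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    out = []
    for child in builder.root["children"]:
        render(child, 0, short_limit, out)
    return "\n".join(out)


print(tidy_indent("<ul><li>One</li><li>Two</li></ul>"))
```

With `short_limit=0` every element is broken up into three rows again, which is roughly the behaviour I would like to be able to switch off.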

Besides purification and linting, such a tool/library can be used for:

Security. This will require the ability to whitelist and/or blacklist elements and attributes, and preferably attribute values as well.
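
A rough sketch of what whitelisting could look like, again in Python with the standard library's `html.parser` as a stand-in for an HTML5 parser. The policy tables `ALLOWED` and `SAFE_URL_PREFIXES` and the function `sanitize` are hypothetical names for this example, and this is in no way a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: allowed elements mapped to their allowed attributes.
ALLOWED = {
    "a": {"href", "title"},
    "em": set(),
    "p": set(),
    "strong": set(),
}
# Hypothetical value check: href must start with one of these.
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:", "/", "#")
# Elements whose text content is dropped along with the tags.
DROP_CONTENT = {"script", "style"}


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.skipping = 0

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skipping += 1
        if tag not in ALLOWED:
            return  # drop the tag itself
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag]:
                continue
            value = value or ""
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # attribute value rejected
            kept.append(' %s="%s"' % (name, escape(value)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT and self.skipping:
            self.skipping -= 1
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.skipping:
            self.out.append(escape(data, quote=False))


def sanitize(markup):
    parser = Sanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


dirty = '<p onclick="x()">Hi <a href="javascript:alert(1)">there</a><script>evil()</script></p>'
print(sanitize(dirty))
```

Elements that are not on the list lose their tags but keep their text, except for `script` and `style`, whose content is thrown away as well.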

HTML post-processing. This will let authors work with indented, explicit code, while such "waste" is removed before gzipping. This would be akin to JS minification, and it could be performed on the fly from within PHP, Python, Java, Ruby, C#, server-side JS or whatever. It could also be done manually before uploading from the development environment to production, or it could be integrated into the uploading tool!
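
As a sketch of such a post-processing step, the snippet below collapses whitespace runs and drops comments, while leaving `pre`, `textarea`, `script` and `style` content alone. The assumptions are the same as before: Python's `html.parser` is only a stand-in for an HTML5 parser, and `minify` is a made-up name. It deliberately collapses whitespace to a single space rather than removing it, because whitespace between inline elements can be significant.

```python
import re
from html.parser import HTMLParser

# Elements whose content must be kept byte for byte.
PRESERVE = {"pre", "textarea", "script", "style"}


class Minifier(HTMLParser):
    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.preserve_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in PRESERVE:
            self.preserve_depth += 1
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in PRESERVE and self.preserve_depth:
            self.preserve_depth -= 1
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self.preserve_depth:
            data = re.sub(r"\s+", " ", data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)

    # Comments are dropped: handle_comment keeps its default no-op behaviour.


def minify(markup):
    parser = Minifier()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(minify("<ul>\n  <li>One</li>\n  <li>Two</li>\n</ul>"))
```

A more aggressive version could also drop optional end tags and attribute quotes, but that is exactly where a real HTML5 parser would be needed to know what is safe.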

Checking templates

The main feature that Tidy has today is the ability to handle templates by preserving or ignoring PHP or other server-side code. To what extent the HTML5 parser can be modified to handle that feature, I do not know.
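
One way this might work without touching the parser at all is to hide the server-side code before parsing and put it back afterwards. The sketch below is just that idea in Python, not how Tidy does it: `protect`, `restore` and the `tpl:N` placeholder format are invented for this example, and the regular expression is naive (a `?>` inside a PHP string, or a file that never closes its last PHP block, will confuse it).

```python
import re

# Naive matcher for <?php ... ?> and <?= ... ?> blocks.
PHP_BLOCK = re.compile(r"<\?(?:php|=).*?\?>", re.DOTALL)
PLACEHOLDER = re.compile(r"<!--tpl:(\d+)-->")


def protect(source):
    """Swap each PHP block for an HTML comment placeholder.

    Returns the placeholder markup and the list of original blocks.
    """
    blocks = []

    def stash(match):
        blocks.append(match.group(0))
        return "<!--tpl:%d-->" % (len(blocks) - 1)

    return PHP_BLOCK.sub(stash, source), blocks


def restore(markup, blocks):
    """Put the original PHP blocks back in place of their placeholders."""
    return PLACEHOLDER.sub(lambda m: blocks[int(m.group(1))], markup)


template = "<p><?php echo $name; ?></p>"
markup, blocks = protect(template)
print(markup)                    # what the HTML tool would see
print(restore(markup, blocks))   # round-trips to the original template
```

The cleaning tool would then run on `markup` between the two calls, and it would of course have to leave comments alone for this to work.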

From a maintenance and bug fixing point of view, I see huge wins in having a common base for Tidy, the HTML5 validator and HTML parsing in Gecko.

In fact, a very radical idea for Firefox (or any other browser using html5lib) would be to integrate these Tidy-inspired features directly into its development tools, re-using the existing parser! A Firebug extension that lets me validate as well as tidy up my code directly within the browser would be super awesome!

But whether this is actually feasible is beyond my technical knowledge to evaluate, so I need to hear from people who know this stuff better than I do.

Integration with accessibility checking

Although automatic testing cannot substitute for manual tests, it can give a developer a ballpark idea of the accessibility of a page and help fix the most obvious mistakes.

The fact that Tidy today integrates WCAG 1.0 is better than nothing, and any implementation of Tidy5 should strive to integrate WCAG 2.0 in a similar fashion. That really is a no-brainer. Having to use only one tool and getting all errors in the same buffer (for programmers) or the same console (for manual checks) is certainly convenient.
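
To illustrate the "same buffer" idea, here is a small Python sketch in which markup checks and accessibility checks append to one shared list. Everything in it is an assumption made for the example: the class `Report`, the function `check`, the message format, and the choice of checks. The two accessibility checks are very crude approximations of WCAG 2.0 success criteria 1.1.1 (non-text content) and 3.1.1 (language of page), and `html.parser` once more stands in for an HTML5 parser.

```python
from html.parser import HTMLParser


class Report(HTMLParser):
    """Runs a few unrelated checks and collects every message in one list."""

    def __init__(self):
        super().__init__()
        self.messages = []   # the single shared buffer
        self.seen_ids = set()

    def note(self, source, text):
        line, _column = self.getpos()
        self.messages.append((line, source, text))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Markup check: ids must be unique within a document.
        if "id" in attrs:
            if attrs["id"] in self.seen_ids:
                self.note("markup", "duplicate id %r" % attrs["id"])
            self.seen_ids.add(attrs["id"])
        # Accessibility checks, tagged with the criterion they relate to.
        if tag == "img" and "alt" not in attrs:
            self.note("WCAG 2.0 SC 1.1.1", "img without alt attribute")
        if tag == "html" and not attrs.get("lang"):
            self.note("WCAG 2.0 SC 3.1.1", "html element without lang attribute")


def check(markup):
    report = Report()
    report.feed(markup)
    report.close()
    return report.messages


page = '<html><body><img src="a.png"><p id="x"></p><p id="x"></p></body></html>'
for line, source, text in check(page):
    print("line %d [%s] %s" % (line, source, text))
```

A programmer would consume the list returned by `check`, while a console tool would simply print it, which is the convenience I am after.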