On the SitePoint PHP blog today there's a new post from Craig Buckler for the WordPress users out there. The HTML that this popular blog/CMS tool spits out can sometimes be not-so-semantic. Craig shares a tip on cleaning up one aspect of it: the functions that return lists for menus or sitemaps.

I love WordPress. I also love clean semantic HTML. Unfortunately, several of the standard WordPress theme functions return code that is a little untidy. For me, the primary culprits are wp_list_pages() and the newer wp_nav_menu(); both return an unordered list of page links.

He gives an example of a sample list generated by wp_nav_menu() that's full of badly formatted and unnecessary elements. To help fix the issue, he shares his regular expression-based call to strip out things like extra tabs, empty classes and all title attributes. Obviously you can customize this as you need, but it's a good start towards something that's a bit cleaner and up to code.
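As a rough illustration of that kind of cleanup (a minimal sketch, not Craig's actual code), the snippet below strips tabs, empty class attributes and title attributes from a menu string. The clean_menu_html() helper and the sample markup are made up for this example; in a theme the string would presumably come from wp_nav_menu() called with 'echo' => false.

```php
<?php
// Hypothetical helper (not from the article): regex-based cleanup of menu markup.
function clean_menu_html(string $html): string
{
    $rules = [
        '/\t+/'              => '', // remove tab indentation
        '/\s+class=""/'      => '', // remove empty class attributes
        '/\s+title="[^"]*"/' => '', // remove every title attribute
    ];

    // Apply the patterns in order, each with its own replacement.
    return preg_replace(array_keys($rules), array_values($rules), $html);
}

// Made-up sample standing in for the output of wp_nav_menu().
$menu = "<ul>\n\t<li class=\"\"><a title=\"Home page\" href=\"/\">Home</a></li>\n</ul>";

echo clean_menu_html($menu);
```

Keeping the patterns in one array makes it easy to add or drop rules to suit your own theme.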

In a new post to his blog Benjamin Schneider looks at how you can use the Tidy extension to clean up the (X)HTML markup that comes out of your application.

Tidy is a very cool PHP extension. You can let it tell you what kind of mistakes you might have in your HTML markup and even correct it for you - if you want. In my projects I use it to give me a hint if my generated markup is invalid. This way I can easily correct it during development without being dependent on external validators. In this post I will show you how easy it is and how few lines of code you need to make your application show any potential errors you might have in your HTML markup.

He shows how to use output buffering to grab the HTML output of your script, push it through Tidy (via a call to tidy_parse_string) and output the results. The extension even has a built-in error catcher for when it finds invalid formatting in the generated markup. You can find out more about the features of this extension in the PHP manual.
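The general idea might look something like the sketch below. It is not Benjamin's code: tidy_report() is a hypothetical helper, the sample page is invented, and the whole thing assumes the tidy extension is installed (the demo is skipped if it is not).

```php
<?php
// Sketch only: tidy_report() is a hypothetical helper; requires the tidy extension.
function tidy_report(string $html): string
{
    $tidy = tidy_parse_string($html, ['output-xhtml' => true], 'utf8');
    if ($tidy === false) {
        return 'Tidy could not parse the markup.';
    }

    // The error buffer lists the warnings and errors found while parsing.
    // It may be false when nothing was found, so cast it to a string.
    return (string) tidy_get_error_buffer($tidy);
}

if (extension_loaded('tidy')) {
    ob_start(); // capture the output instead of sending it right away
    echo '<html><head><title>Demo</title></head><body><p>Hello<div>world</div></body></html>';
    $html = ob_get_clean(); // the captured markup; buffering ends here

    echo $html; // send the page as usual

    $report = tidy_report($html);
    if ($report !== '') {
        // During development, show what Tidy complained about below the page.
        echo "\n<pre>" . htmlspecialchars($report) . "</pre>";
    }
}
```

In a real application you would likely only print the report in a development environment.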

Ryan Mauger has a new post today looking at how to combine the Tidy extension for PHP and Firebug with a Zend Framework application to keep your HTML neat and valid with a handy bit of feedback for debugging.

With Zend Framework there is an easy way to ensure that you always create valid HTML in your applications. This involves the use of a simple Front Controller Plugin, and the php Tidy component. [...] So you can use tidy for filtering user input, what about using it to effectively clean my documents and ensure my output is always valid?

He starts with the pieces of the puzzle you'll need: a front controller plugin for the Zend Framework application and the Tidy extension to filter your HTML. The two are combined in the plugin's dispatchLoopShutdown() hook, which runs the transformations once dispatching has finished. Firebug comes in to log the issues Tidy found so you can correct them.

Matthew Turland has posted a guide to compiling PHP with the Tidy extension (a tool to clean and repair HTML documents through direct manipulation).

I dug around a bit, but most resources I came across on Google were about using the tidy extension for PHP rather than doing a custom build of PHP that included the tidy extension. Once I figured the details out, I thought I'd share. They admittedly seemed somewhat obvious after the fact, though also were not communicated as explicitly as I would have liked anywhere that I could see.

To use his method you'll need to have CVS working (either on a server or, like he did, on a local machine) and be able to grab the latest PHP 5.3.x and tidy extension versions. When you compile PHP, all you need to do is point it at the CVS checkout of tidy and you should be all set.

DevShed wraps up their series on using the Tidy extension with this last tutorial showing methods of tracking parse errors with the help of the library.

So, the question that comes up is the following: what is the next step? Well, from a PHP developer's point of view, tracking all the errors that occurred when parsing a concrete (X)HTML string might be quite useful. Therefore, in this final tutorial of the series I'm going to cover some new functions bundled with the Tidy extension which are designed to show you the potential errors raised when interpreting (X)HTML data.

The tutorial shows how to use the tidy_get_error_buffer, tidy_access_count, tidy_error_count and tidy_warning_count functions to inspect your markup and handle whatever errors (and error data) might come up.
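A minimal sketch of how those four functions could fit together is shown below. It is not taken from the DevShed tutorial: tidy_counts() is a hypothetical helper, the sample markup is invented, and the tidy extension is assumed to be installed. Per the PHP manual, tidy_access_count() only reports anything when the accessibility-check option is enabled and tidy_diagnose() has been called first.

```php
<?php
// Sketch only: tidy_counts() is a hypothetical helper; requires the tidy extension.
function tidy_counts(string $html): array
{
    // accessibility-check has to be enabled for the access count to be meaningful.
    $tidy = tidy_parse_string($html, ['accessibility-check' => 1], 'utf8');

    // tidy_diagnose() must run before tidy_access_count() is called.
    tidy_diagnose($tidy);

    return [
        'errors'   => tidy_error_count($tidy),
        'warnings' => tidy_warning_count($tidy),
        'access'   => tidy_access_count($tidy),
        'log'      => (string) tidy_get_error_buffer($tidy),
    ];
}

if (extension_loaded('tidy')) {
    // Invented sample: an image without alt text and an unclosed paragraph.
    $markup = '<html><head><title>Test</title></head><body><img src="a.png"><p>Unclosed</body></html>';
    $result = tidy_counts($markup);

    printf(
        "errors: %d, warnings: %d, accessibility: %d\n",
        $result['errors'],
        $result['warnings'],
        $result['access']
    );
    echo $result['log'], "\n";
}
```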

In this new tutorial from DevShed, they demonstrate the use of a handy little bit of functionality to help keep your code (and markup) clean - Tidy.

Now that you know that the Tidy (X)HTML formatting/correcting application can be called directly from your own PHP 5 scripts, over the course of this series, which is comprised of three friendly tutorials, I'm going to walk you through using the bunch of useful functions included with this library.

This tutorial (part one) covers parsing (X)HTML and using the tidy_clean_repair, tidy_parse_file and tidy_repair_file functions to clean up (X)HTML strings and files automatically.
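For a sense of what the string and file variants look like in practice, here is a small sketch. It is not the tutorial's code: repair_fragment() is a hypothetical helper, the markup and temporary file are invented, and the cleanup call is the one the PHP manual documents as tidy_clean_repair(). The tidy extension is assumed to be installed.

```php
<?php
// Sketch only: repair_fragment() is a hypothetical helper; requires the tidy extension.
function repair_fragment(string $html): string
{
    // show-body-only returns just the repaired fragment, without <html>/<head> wrappers.
    $tidy = tidy_parse_string($html, ['show-body-only' => true], 'utf8');
    tidy_clean_repair($tidy);       // apply Tidy's clean and repair pass

    return tidy_get_output($tidy);  // the repaired markup as a string
}

if (extension_loaded('tidy')) {
    $dirty = '<p>Hello <b>world';

    // Repairing a string in memory.
    echo repair_fragment($dirty), "\n";

    // Repairing a file: write the same markup to a temporary file first.
    $path = tempnam(sys_get_temp_dir(), 'tidy');
    file_put_contents($path, $dirty);
    $fromFile = tidy_repair_file($path, ['show-body-only' => true], 'utf8');
    unlink($path);

    echo $fromFile, "\n";
}
```

tidy_repair_file() parses and repairs in one call, while the parse-then-clean route keeps the tidy object around in case you also want to inspect its error buffer.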