Technote 10005: Introduction to cleaning HTML code

HyperText Studio's clean feature lets you create your own cleans to remove or
modify sets of tags and attributes in a document. You can also modify the
default clean sets. This technote takes you through the process of creating a
custom clean set and introduces many of the concepts used in HTML cleaning.

Note: This technote assumes some knowledge of HTML code.

Opening the Associated File

Download the file used in this tutorial, unzip it, and extract faq.aspx. This
file is part of the HyperText Studio tutorial. It was originally produced in
Microsoft Word and has already been cleaned by HyperText Studio's Microsoft
Word clean set.

Select File | Open and open faq.aspx.

Go to Source view.

Creating a Custom Clean Set

A clean set contains the rules used to clean a document.

Select Tools | Reformat Code.

Select Clean Code.

Click Browse.

Click New. The currently saved list of settings displays.

Type myclean as the name of the Clean Set.

Click Browse. The Clean Set dialog box opens.

Leave the dialog box open.

Creating a New Match

A clean rule is made up of a match, which finds a tag to work with, and an
action, which does something with the tag that has been matched.

In this case, some of the style sheet classes that were created while cleaning
the Microsoft Word 2000 code are redundant, so you are going to remove them.
You will create four matches, each with a corresponding action: three to
remove all span tags that have the class attribute set to "class5", "class6",
or "class7", and one to remove the "MsoNormal" class from p tags.
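
To make the match/action idea concrete, the four rules described above can be
sketched in Python. This is only an illustration of the concept, not how
HyperText Studio stores or applies its clean sets: the RULES table, the action
names "remove_tag" and "remove_attr", and the Cleaner class are all
hypothetical. The sketch assumes that removing a span tag means dropping its
opening and closing tags while keeping the text inside, and that a match
requires the attribute value to be exactly equal. It handles only simple
markup; comments, doctype declarations, and self-closing tags are not
preserved.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rule table: each rule pairs a match (tag, attr, value)
# with an action name. None of these names come from HyperText Studio.
RULES = [
    {"tag": "span", "attr": "class", "value": "class5", "action": "remove_tag"},
    {"tag": "span", "attr": "class", "value": "class6", "action": "remove_tag"},
    {"tag": "span", "attr": "class", "value": "class7", "action": "remove_tag"},
    {"tag": "p", "attr": "class", "value": "MsoNormal", "action": "remove_attr"},
]


class Cleaner(HTMLParser):
    """Applies match/action rules while re-emitting the document."""

    def __init__(self, rules):
        super().__init__(convert_charrefs=True)
        self.rules = rules
        self.out = []
        # Per tag name, a stack recording whether each open tag was removed,
        # so the matching closing tag can be removed as well.
        self.open_tags = {}

    def find_rule(self, tag, attrs):
        # The match: same tag name and an attribute with exactly this value.
        for rule in self.rules:
            if rule["tag"] == tag and (rule["attr"], rule["value"]) in attrs:
                return rule
        return None

    def handle_starttag(self, tag, attrs):
        rule = self.find_rule(tag, attrs)
        removed = rule is not None and rule["action"] == "remove_tag"
        self.open_tags.setdefault(tag, []).append(removed)
        if removed:
            return  # action: drop the tag, keep its contents
        if rule is not None and rule["action"] == "remove_attr":
            # action: keep the tag, drop only the matched attribute
            attrs = [(k, v) for k, v in attrs if k != rule["attr"]]
        parts = [tag]
        for k, v in attrs:
            parts.append(k if v is None else f'{k}="{escape(v)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        stack = self.open_tags.get(tag)
        if stack and stack.pop():
            return  # the opening tag was removed, so skip its closing tag
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text arrives unescaped, so escape it again on the way out.
        self.out.append(escape(data, quote=False))


def clean(html, rules=RULES):
    cleaner = Cleaner(rules)
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


sample = ('<p class="MsoNormal">Hello <span class="class5">big '
          '<span class="keep">wide</span></span> world</p>')
print(clean(sample))
# prints: <p>Hello big <span class="keep">wide</span> world</p>
```

Keeping the match separate from the action means the same match can be paired
with a different action later without touching the matching logic.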

Click New Match. The Clean Rule Match dialog box opens.

Fill in the boxes as shown below.

Click OK. Leave the Clean Settings dialog box open.

Creating a New Action

Now that a match has been created, you need to configure it to do something
with the matched tag. This is called an action.