HTML Cleaner

Does anyone know of a really good program that automatically removes redundant nested tags, empty tags, and the like? I have some really bad code I need to clean up, and it's so bad it crashed the code cleaner in Dreamweaver UltraDev 4. You can see the code at www.lynwoodtheatre.com

5000 lines? The page you linked to only has about 830 lines. If you like, I could clean up your code for you. It would take me about an hour or two. We could work out a trade or some sort of recompense. It's up to you though.

If that many lines of code are bad, then your best bet would be to create a blank HTML document and start from scratch.

Wow...I just took a closer look at your code, lines 57-74 only have TWO real words, the rest is redundant tags. Wow..you weren't kidding. But the thing is that it wouldn't take much to fix all that. Actually less than one hour I would say, even if I had to rebuild that page from scratch.

I would actually recommend using CSS. There are only 3 or 4 real styles on that page. You could EASILY use CSS and have the site be compatible even with Netscape 4.7.

here ya go

Went on a delete spree for a few minutes till I got it down to a level of chaos that Dreamweaver could halfway handle.. that was by far _the_worst_ and most _bloated_ code I have ever seen in my life. But I guess that's what you get when you combine WYSIWYG with cluele.. err.. people that don't know any better.. :P

Anyway, next time don't be afraid to just dive into the source and start deleting all of those redundant font tags... there were literally thousands of them.. ugh.. And whoever built that page should take a look at the html source that their WYSIWYG editor is creating before publishing it.

Moral of the story... WYSIWYG can be downright evil in the wrong hands, and a blessing in the right ones.

I am getting paid to clean that site up and another one too. You're right, I can easily go through and manually clean it, but I was hoping there was an easier way since I am pressed for time. By the way: I got the 5000 line number after running it through pretty print (http://selfpromotion.com/prettyprint.t) and having it formatted. When that was done, there were 5000 lines, and some of them were indented so far it was kind of funny.

Oh, and the lady claims she used dreamweaver 2 to make it, but I guarantee dreamweaver doesn't make that kind of code. No matter how much you hate WYSIWYG editors, dreamweaver is not that bad...hmm.

Another thing: That page takes like a minute to load because of all the bloat. Interesting how much time extra HTML adds.

It's because that page as it exists is 99k. I did a little test to see how much the file size would come down. I removed all of the </b> and </font> tags and it came down to 76k. Just by removing those tags!
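If you want to repeat that kind of size test on your own pages, here is a rough Python sketch. It is not what was used for the numbers above. The function name `size_report` and the sample markup are made up for illustration, and it assumes you want to strip both the opening and closing FONT and B tags, which is a bit more than the test described above. A regex is fine for a quick measurement like this, but it is not a real HTML parser.

```python
import re

# Matches opening or closing <font ...> and <b> tags (hypothetical pattern for this sketch).
FORMAT_TAG = re.compile(r"</?(?:font|b)\b[^>]*>", re.IGNORECASE)

def size_report(html):
    """Return (original_bytes, bytes_after_removing_font_and_b_tags)."""
    stripped = FORMAT_TAG.sub("", html)
    return len(html.encode("utf-8")), len(stripped.encode("utf-8"))

if __name__ == "__main__":
    # Made-up sample in the style of the bloated page, not the real source.
    sample = '<font face="arial"><b><font size="3">July 20 - 26</font></b></font>'
    before, after = size_report(sample)
    print("before:", before, "bytes; after:", after, "bytes")
```

To try it on a real file, read the file into a string and pass it to `size_report`. The `\b` in the pattern keeps it from touching tags like BODY or BR.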

Let's say that someone types the word movie into DW. Then they want to set the word's font to Verdana. Then they go back at a later time and give the word font size="4". Then they go back later and make that word green. I would guarantee you that DW would create 3 different font tags to wrap that text in.

Then (in the visual mode) if you go back and simply delete that word, the FONT tags will remain. So, I gather what happened is that whoever maintains the site would edit that page and reapply the styles time after time, resulting in the page as it is now.
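For the leftover empty tags specifically, something like the following Python sketch could do the deleting for you. This is just an illustration, not a tested cleaner: the name `strip_empty_tags` and the sample string are hypothetical, and it only handles FONT, B, I and U elements that contain nothing but whitespace. It loops because removing an inner empty tag can leave its parent empty too.

```python
import re

# An inline formatting element (font, b, i, u) whose content is only whitespace.
EMPTY_TAG = re.compile(r"<(font|b|i|u)\b[^>]*>\s*</\1\s*>", re.IGNORECASE)

def strip_empty_tags(html):
    """Repeatedly delete empty formatting elements until none remain."""
    while True:
        cleaned = EMPTY_TAG.sub("", html)
        if cleaned == html:
            return cleaned
        html = cleaned

if __name__ == "__main__":
    # Made-up example: three nested font tags left behind after the word was deleted.
    sample = ('<font face="verdana"><font size="4"><font color="green">'
              '</font></font></font><p>Now showing</p>')
    print(strip_empty_tags(sample))
```

Tags that still wrap real text are left alone, so this only clears the dead weight. Collapsing nested tags that do contain text needs a proper parser and is better done by hand or replaced with CSS.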

CSS would be an imperative for this person who is maintaining the site. The only thing you need to tell them is "You don't need to apply ANY font styles." All they have to do is type the new text over the existing text, save it and upload it.

Now, let me give you a quick rundown of the benefits of CSS. I just took a look at that site and it only has 8 different text styles. You could easily create a text style for each one of them and apply it to the text using a span tag. For example, the dates. They are in Arial, green and about a font size of 3. I would create these styles:

body, td, p, div { font-family: arial; font-size: 10pt; }

.date { font-size: 12pt; color: #339900; }

The first style tells every piece of text that is contained in the BODY tag, a TD tag, a P tag or a DIV tag to be arial at 10pt (which correlates to a font size="2"). That way you don't EVER have to declare a font face again for the rest of the page. It is automatically applied to anything contained inside one of those tags.

The second style ".date" is a "class". You have to explicitly apply this one. So, when you get to where your date will be you do this:

<span class="date">July 20 - 26</span>

It's that simple. Since they will never see the HTML, your client can then edit the text to his/her heart's content and will never screw up the underlying structure.

I agree with creole that css is the way to go. I use css on my site and it works well, but I need to convert it all to a separate "style sheet" to make it work better overall.

He's right about DW 4, Creole. While it *might* be possible to get it to create duplicate font tags, when you edit text now and change the font attributes, it doesn't. It just changes the one value and doesn't add anything else. Also, if you select the colored text in visual mode and hit "delete" it gets rid of everything, font tags and all. Perhaps older versions of DW had this problem, but Macromedia isn't foolish. Most of this is fixed in 4 and will only get better when they do their next rev, I would guess...

I respect knowing your limitations, and your client's. But, as I mentioned before, you can't afford NOT to use CSS. She doesn't need to know HTML to use it, and you don't need to know CSS to use it. All she would have to do is type over the previous information, easy as that. You set up the CSS in advance (I'll help you with it, no big deal) and then leave it alone.

By the way, I just spent a little time taking out all of the redundant tags (out of curiosity) and I got the page size (minus images) down to 9k. It would be about 2 or 3k larger after all of the styles were set up, but that's a far cry from 99k (which is what it is now).

But she will still be using Dreamweaver to change the page. It won't just be a cut and dried "replace the text in Notepad". She will probably be changing text, changing colors, changing fonts, etc., all in Dreamweaver.

Do the colors on the site change regularly? Is the green always there? Are the fonts always Arial? Are the sizes always the same? If she changes those regularly then yes, CSS is not the answer. But if she never changes them (or very rarely changes them), then CSS would be perfect.