"combined with most people STILL using tags for what they look like instead of what they mean -- and it's hardly a shock you see bloated slow inaccessible pages vomited up using hundreds of K of code to deliver less than 10k of content.

Can't argue with that at all. "

Must be my turn then.

For HTML, the content itself is just data from "somewhere" (typically a database); and the job of a web developer is to design and control the presentation of that content. Idiotic crud like CSS and "semantic" tags (even old stuff, like the "em" tag) just make it harder for web developers to do their job well (as they can't really say how any specific browser might render it), and push developers towards using things like Flash as a way of controlling presentation.

One supposed advantage of semantics is that it's "important" for accessibility. This is misguided, as most accessibility stuff needs to be taken care of at the OS's level so that all applications (not just stuff displayed by a web browser) are usable; and if accessibility is taken care of at the OS's level (like it already is in most OSs) then HTML itself needn't care about most accessibility. Of course for blind people, you really don't want HTML at all - you want something designed for complete control over audio (both sound and speech synthesis), including timing, volume, position, etc; and you want web developers to design sites specifically for audio (including site navigation, etc) instead of trying to make something intended for visual content delivery (and primarily used for visual content delivery) work in a "half-assed, almost better than nothing" way.

Then there are things like search engines, which are already capable of producing extremely good results from things that lack any semantics, like PDF files, MS Word files, text files, etc. They have no need for semantic markup, and often the tags intended for search engines (like the "keywords" meta-tags) are deliberately ignored by search engines.

For anything more than that, if you want "raw content + semantics" then use XML (not HTML and not XHTML).

Basically, the only valid reason for any/all tags in HTML is to tell the browser how something should look. Rather than concentrating on doing the job it was intended for (and doing it well); since HTML4 they've been making it worse just to make HTML something it was never intended to be.

Idiotic crud like CSS and "semantic" tags (even old stuff, like the "em" tag) just make it harder for web developers to do their job well (as they can't really say how any specific browser might render it)

Hi! How are things back in 1992?

Seriously... So you want to go back to purely using tags to control presentation? Really?

I have been doing this since 1996 or so. You can not and have never been able to say how any specific browser might render things reliably. It has always been horseshoes and hand grenades until CSS became prevalent. It is still hit or miss to some extent, but much better than it ever was in the past. If you want to control presentation, that is what CSS is for - HTML is for describing document structure, linking, and embedding.

In hindsight I almost wish all browsers shipped with absolutely no presentation defaults at all and forced authors to implement complete CSS styling for their work. It would actually save me some trouble most of the time as the first step required when you want pixel perfect rendering across modern browsers is to reset all that crap anyway.

Realistically it isn't that way though because most of the time you really don't need pixel perfect rendering and the defaults generally behave similarly enough that if you keep things simple they work well. I have nothing against this approach to doing web pages - simple is often good... As long as you are not naive enough to think that "good" means "identical" it is a perfectly valid way to do things.

One supposed advantage of semantics is that it's "important" for accessibility. This is misguided, as most accessibility stuff needs to be taken care of at the OS's level so that all applications (not just stuff displayed by a web browser) are usable;

It doesn't matter what is doing the accessibility. Please explain how the OS is supposed to do it without something telling it WTF it is looking at? You make it sound like semantic tags are a bad thing... You do realize most of those presentational tags from HTML 2.0 you seem fond of ARE semantic tags don't you? H1 doesn't mean "really big font", it means "Top Level Heading", it just also happens to render with a really big font in most browsers. There really are only a small handful of tags that were ever in HTML that can be considered purely presentational. The font tag for sure, i and b are really both - big, small, sub, sup - hell that is about it really, everything else is semantic.

The point is if you care how it looks you need to use CSS to control it - the tags' purpose is to convey document structure. That is essentially what semantic means. If you want HTML without semantics... well you really don't have anything left. Hell, if you don't care about semantics just make a jpeg and use an img tag... Really, why not?

Then there are things like search engines, which are already capable of producing extremely good results from things that lack any semantics, like PDF files, MS Word files, text files, etc.

You do realize all of those things have semantics in them don't you? They may not always be explicit, but even text files have semantics (TITLE IN ALL CAPS). How do you think Google extracts information like the title from, say, a .doc file when the metadata is missing? It looks for the first heading in it. Same with PDFs. HTML just makes it more explicit and well defined. How is that a bad thing?
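To make that concrete, here is a rough Python sketch of a tool that guesses a page title from its first heading. It is not how Google or any real indexer works; `FirstHeadingFinder` and `guess_title` are names made up for illustration, and the two sample pages are invented. It only assumes the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class FirstHeadingFinder(HTMLParser):
    """Collects the text of the first h1-h6 element in a document."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self._open_heading = None  # heading tag currently being read, if any
        self._parts = []           # text fragments seen inside that heading
        self.title = None          # text of the first heading, once closed

    def handle_starttag(self, tag, attrs):
        # Only the first heading matters; ignore everything after it.
        if self.title is None and self._open_heading is None and tag in self.HEADINGS:
            self._open_heading = tag

    def handle_data(self, data):
        if self._open_heading is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == self._open_heading:
            # Collapse runs of whitespace into single spaces.
            self.title = " ".join("".join(self._parts).split())
            self._open_heading = None


def guess_title(html):
    """Return the text of the first heading, or None if there is none."""
    finder = FirstHeadingFinder()
    finder.feed(html)
    finder.close()
    return finder.title


# Hypothetical pages: same visible text, different markup.
semantic = "<h1>Release notes</h1><p>Fixed the installer.</p>"
presentational = "<big><big><b>Release notes</b></big></big><br>Fixed the installer."

print(guess_title(semantic))        # Release notes
print(guess_title(presentational))  # None
```

With an explicit heading the answer falls straight out of the markup. With the nested big/b version the same tool finds nothing, and anything smarter has to fall back on guessing from font sizes or capitalisation.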

Basically, the only valid reason for any/all tags in HTML is to tell the browser how something should look.

You have a seriously misguided view of what HTML is, what it is for, and how it is actually used.

Seriously... So you want to go back to purely using tags to control presentation? Really?

Yes, definitely. However I would also like things to have improved - more tags for direct control of presentation, and better browser compatibility.

I have been doing this since 1996 or so. You can not and have never been able to say how any specific browser might render things reliably. It has always been horseshoes and hand grenades until CSS became prevalent. It is still hit or miss to some extent, but much better than it ever was in the past. If you want to control presentation, that is what CSS is for - HTML is for describing document structure, linking, and embedding.

HTML should be for describing document structure for the purpose of presentation; not for describing document structure for no purpose at all.

In hindsight I almost wish all browsers shipped with absolutely no presentation defaults at all and forced authors to implement complete CSS styling for their work. It would actually save me some trouble most of the time as the first step required when you want pixel perfect rendering across modern browsers is to reset all that crap anyway.

The stupidity of "pixels" as a measurement for anything (especially for web pages where there's no sane way of knowing what size the user's screen is) is a different issue. Unfortunately, true resolution independence would require something extremely complex, like teaching browsers that percentages can contain fractions (e.g. size="12.34%").

Realistically it isn't that way though because most of the time you really don't need pixel perfect rendering and the defaults generally behave similarly enough that if you keep things simple they work well. I have nothing against this approach to doing web pages - simple is often good... As long as you are not naive enough to think that "good" means "identical" it is a perfectly valid way to do things.

Agreed. I'm not after "every pixel is identical in all browsers". I just want the browser to do what I say without requiring an extra layer of bloat (CSS) to tell it to do what I say; in the way that HTML3 used to, but hopefully with even more control (like "{div bgcolour="#1234"}" for e.g.).

It doesn't matter what is doing the accessibility. Please explain how the OS is supposed to do it without something telling it WTF it is looking at? You make it sound like semantic tags are a bad thing... You do realize most of those presentational tags from HTML 2.0 you seem fond of ARE semantic tags don't you? H1 doesn't mean "really big font", it means "Top Level Heading", it just also happens to render with a really big font in most browsers. There really are only a small handful of tags that were ever in HTML that can be considered purely presentational. The font tag for sure, i and b are really both - big, small, sub, sup - hell that is about it really, everything else is semantic.

Have you had a look at the accessibility features in something like Windows or Gnome? Things like shifting hue for colour blind people, screen magnification, etc? The only thing not covered is blind users and screen readers (but that's an entirely separate issue).

Note: I don't use the heading tags (I prefer doing the "{big}{big}{b}" thing, so I know what it should look like). In the same way I don't use "{thead}" or "{th}". I use tables for layout control. If I want an actual table with something like "Figure 1.3" underneath it; then I create a table (with borders) inside another table (without borders) and have "Figure 1.3" in the second row of the outer table (because "{tfoot}" is displayed as just another row and not underneath the table, and "{caption}" is displayed above the table (WTF?) which isn't what I want either). I use the "title" attribute for tooltips and not for titles. For code I don't use "{code}", but prefer "{tt}" and a heap of "+nbsp;" (with "+gt;", "+amp;", etc) so that I can still do stuff like syntax highlighting.

I honestly don't think I use any of the tags intended for semantics, because none of them are displayed how I want them to be displayed. The only exception to this is HTML links (and anchors).

The point is if you care how it looks you need to use CSS to control it - the tags' purpose is to convey document structure.

To attempt to control presentation you are meant to use CSS (which is a bloated mess that would never have been needed if W3C had their priorities right).

That is essentially what semantic means. If you want HTML without semantics... well you really don't have anything left. Hell, if you don't care about semantics just make a jpeg and use an img tag... Really, why not?

Bandwidth and linking.

You do realize all of those things have semantics in them don't you? They may not always be explicit, but even text files have semantics (TITLE IN ALL CAPS). How do you think Google extracts information like the title from, say, a .doc file when the metadata is missing? It looks for the first heading in it. Same with PDFs. HTML just makes it more explicit and well defined. How is that a bad thing?

I didn't say it was bad for search engines to use the semantic markup. I did say that search engines don't need the semantic markup or justify the existence of semantic markup.

You have a seriously misguided view of what HTML is, what it is for, and how it is actually used.

I don't care how it is, I'm talking about how it should have been.

Have a look at the source for OSnews' main page. It's just over 62 KiB and consists of a mixture of JavaScript and HTML. On top of that there's the CSS which is another 23.3 KiB, plus another little CSS for RSS (2.1 KiB). Then there's a total of 167.4 KiB of extra javascript files, where the largest is for jQuery. That's a total of 254.8 KiB of data (not including icons, pictures, etc). The content is only 19.3 KiB. How much of the remaining 235.5 KiB is there to control "look and feel"?
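A quick sanity check of that arithmetic, as a small Python sketch. The figures are the ones quoted above; the variable names are just labels, and "just over 62 KiB" is treated as exactly 62.0 here.

```python
# Sizes in KiB, as quoted above for the OSnews main page.
html_and_inline_js = 62.0  # "just over 62 KiB", rounded down (assumption)
main_css = 23.3
rss_css = 2.1
external_js = 167.4
content = 19.3

total = html_and_inline_js + main_css + rss_css + external_js
overhead = total - content

print(f"total    = {total:.1f} KiB")
print(f"overhead = {overhead:.1f} KiB ({overhead / total:.0%} of the transfer)")
```

On those numbers roughly 92% of what gets transferred is something other than content.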

Now click on your browser's "refresh" button to refresh the OSnews main page. How long did it take to complete? For me it took a total of 8 seconds to drag all that data halfway around the world, partly because the browser can't start fetching all the data at the same time (e.g. it can't know which CSS file to download until after it's started decoding the HTML).

"Of course for blind people, you really don't want HTML at all - you want something designed for complete control over audio (both sound and speech synthesis), including timing, volume, position, etc; and you want web developers to design sites specifically for audio (including site navigation, etc) instead of trying to make something intended for visual content delivery (and primarily used for visual content delivery) work in a "half-assed, almost better than nothing" way.

Somebody hasn't heard of Aural Stylesheets. "

You're right - I hadn't heard of Aural Stylesheets (and I wouldn't be surprised if most people haven't).

Do they magically restructure an entire web site? For example, with an Aural Stylesheet would the OSnews main page automatically be split up into many smaller (easier to navigate) pages with no more than about 8 articles/news items per page; with the headlines as a single list at the beginning (and all the extra clutter like the search, login, and the "legalese" at the bottom shifted to a separate page)? Or was I right from the start - it's a barely adequate compromise that fails to come close to being usable on its own (unless web developers deliberately design a radically different "intended for audio" version of their site, that shares nothing in common with the "intended for video" version other than the database backend)?

Of course for blind people, you really don't want HTML at all - you want something designed for complete control over audio (both sound and speech synthesis), including timing, volume, position, etc; and you want web developers to design sites specifically for audio (including site navigation, etc) instead of trying to make something intended for visual content delivery (and primarily used for visual content delivery) work in a "half-assed, almost better than nothing" way.

Better standards support for screen-readers would be welcome, but tags like em and other 'crud' are very useful for rendering HTML in Braille or having it read by screen-readers. Compared to Word or PDF, HTML (when written with semantic tags) is far superior for the blind and visually impaired.

For HTML, the content itself is just data from "somewhere" (typically a database); and the job of a web developer is to design and control the presentation of that content. Idiotic crud like CSS and "semantic" tags (even old stuff, like the "em" tag) just make it harder for web developers to do their job well (as they can't really say how any specific browser might render it), and push developers towards using things like Flash as a way of controlling presentation.

Semantic markup and separation of presentation from content allow you to use less markup in your server side code -- it allows you to reskin the entire page without once touching what's making the markup -- it usually ends up less code overall.
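Something like this, as a minimal Python sketch of the idea. `render_article`, `render_page` and the stylesheet names are all hypothetical, and a real site would use a proper templating layer; the point is only that the function producing the markup never mentions fonts or colours.

```python
from html import escape


def render_article(title, body):
    """Server-side markup: structure only, no presentation."""
    return (
        '<div class="article">'
        f"<h1>{escape(title)}</h1>"
        f"<p>{escape(body)}</p>"
        "</div>"
    )


def render_page(stylesheet_href, title, body):
    """Wrap the article with a link to an external, cacheable stylesheet."""
    link = f'<link rel="stylesheet" href="{escape(stylesheet_href)}">'
    return link + render_article(title, body)


# Reskinning means pointing at a different stylesheet (or editing the one
# file); render_article stays untouched. File names are made up.
light = render_page("light.css", "Hello", "Tags & meaning")
dark = render_page("dark.css", "Hello", "Tags & meaning")

print(light)
print(dark)
```

The two pages differ only in the `href`, and because the stylesheet is a separate file the browser can cache it across every page that links to it.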

How is that "making developers work harder" -- or are you one of those who's worshiping at the feet of the dipshit photoshop jockeys who think they know ANYTHING about accessibility, maintainability, or even what's practical to implement on a website in the first place?

Your entire post reeks of failing to understand the technologies you're running your mouth about! But then, I could say the same thing about the people who coded the latest iterations of Hotmail, Yahoo's entire site, or Google Search...

Type of statements I'd expect from someone white-space stripping to hide bad coding practices, still using tables for layout, failing to put a doctype on there so it has to hack around IE being in quirks mode, line-breaks instead of paragraphs, non-breaking spaces and line-breaks to do padding's job, tables for NOTHING (worse than for layout!) and inlining ALL their presentation so they can't even leverage caching models across pages.

Much less a lack of headings, lists or anything else to make things like search engines and screen readers treat a page as anything more than one giant run-on paragraph. Author meta after HEAD is closed, multiple instances of closing head and opening body, inlined style attributes, invalid inlined styles... Triple-nested BIG tag doing H1's job... You know, 2.5k doing 1.1k's job?