Nice, but a little reminder that the "See Example" links are still totally
broken without JS. Should be easy to fix. May want to simply have them all
shown by default when JS is off (and with the "See/Hide Example" links
hidden, since they'd be useless).
Also, the [your code here] link is really goofy. You're using JS to pop up a
dialog box with instructions. Just make it an ordinary page link to a small
separate page, so:
1. People will be able to select/copy/paste the text (esp the text
"digitalmars.D")
2. You can include links (for instance, to
http://digitalmars.com/NewsGroup.html )
3. Seriously, requiring JS just to show a few words of text? I know JS alert
boxes are trivial, but so are new pages.
I like the new homepage overall, but these little things (esp on a homepage)
can make a site seem very unprofessional.
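A minimal sketch of the no-JS fallback described above, assuming hypothetical
markup: every example sits in an element with class "example", and its toggle
link has class "example-toggle" and carries the `hidden` attribute in the HTML
itself. Neither class name comes from the actual site. Without JS nothing runs,
so the examples stay visible and the useless links stay hidden.

```javascript
// Assumption: ".example" and ".example-toggle" are made-up class names, and
// the toggles are paired with the examples by document order.
function enhanceExamples(doc) {
  const toggles = doc.querySelectorAll(".example-toggle");
  const examples = doc.querySelectorAll(".example");
  examples.forEach((example, i) => {
    const toggle = toggles[i];
    if (!toggle) return;              // no link for this example: leave it shown
    example.hidden = true;            // collapse only because JS is running
    toggle.hidden = false;            // reveal the link only now that it works
    toggle.textContent = "See Example";
    toggle.addEventListener("click", (event) => {
      event.preventDefault();
      example.hidden = !example.hidden;
      toggle.textContent = example.hidden ? "See Example" : "Hide Example";
    });
  });
}

// In a browser, run it against the real page.
if (typeof document !== "undefined") enhanceExamples(document);
```

The point of the design is that the collapsed state is something the script
adds, rather than something the script is needed to undo.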

Nice, but a little reminder that the "See Example" links are still totally
broken without JS. Should be easy to fix. May want to simply have them all
shown by default when JS is off (and with the "See/Hide Example" links
hidden, since they'd be useless).

Not being an expert, I open the floor to pull requests.

Also, the [your code here] link is really goofy. You're using JS to pop up a
dialog box with instructions. Just make it an ordinary page link to a small
separate page, so:
1. People will be able to select/copy/paste the text (esp the text
"digitalmars.D")
2. You can include links (for instance, to
http://digitalmars.com/NewsGroup.html )
3. Seriously, requiring JS just to show a few words of text? I know JS alert
boxes are trivial, but so are new pages.
I like the new homepage overall, but these little things (esp on a homepage)
can make a site seem very unprofessional.

There's so little content to display, I found the dialog most
appropriate. Of course that doesn't work for people without JS.
Andrei

Nice, but a little reminder that the "See Example" links are still totally
broken without JS. Should be easy to fix. May want to simply have them all
shown by default when JS is off (and with the "See/Hide Example" links
hidden, since they'd be useless).

http://goo.gl/pxtQE - the same link, just in case the above one gets
wrapped and doesn't work)
Basically, xHTML uses <foobar />, html uses <foobar> or
<foobar></foobar> depending on the tag.
The rest of the issues are with non-standard tags, eg <nobr> and <font>
I think.

HTML 5 says <foobar /> and <foobar> are equivalent and both valid for
elements that are not expected to have a closing tag. What happens in
practice, with any HTML version, is that browsers just ignore the "/".
Maybe using the HTML 5 doctype will make the validator happier
(although it might start to complain about other things):
<!DOCTYPE html>
--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Why does it have an HTML 4.01 doctype but then go on to use XHTML syntax???
Stewart.

I wouldn't know. What needs to be done?

Are you saying someone else put that doctype there behind your back? Or that
you found it on a lot of webpages and just copied it without any clue of what
it means?
Since what you've obviously learned is a mishmash of old-fashioned HTML, modern
HTML and XHTML, the first step would be to learn the differences between them.
This will get you started:
http://www.w3schools.com/html/html_xhtml.asp
My preference nowadays is to use XHTML Strict
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
though I may still use other doctypes while maintaining sites that have been
around for a long time, like XHTML Transitional
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
The main difference between Strict and Transitional is that Transitional
includes legacy presentational elements and attributes whereas Strict doesn't.
The proper way to do things nowadays is to write the content and structure in
(X)HTML and use CSS for the presentation.
But whatever I try to validate it as, there are errors.
A good plan would be to change it to XHTML 1.0 Transitional and fix the handful
of validation errors that still show up. I can help you to do this. Once you've
got your head around these, you could consider migrating to XHTML 1.0 Strict.
Stewart.
Stewart.

Does validation make any positive difference at all?
I used to do it, but it prohibits things that are useful
and work fine in practice* without offering much, if anything,
in return.
I've found that checking for well-formedness - that your tags
are closed, attributes quoted, and entities encoded - is worth
it, but the structure imposed by the doctype offers almost no
help.
* An example being custom attributes. The html5 validator will
allow some of them, but the other ones won't.
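To illustrate the "tags are closed" part of that well-formedness check, here is
a rough sketch. It assumes XHTML-style markup where empty elements are written
as <br />, and it ignores comments, CDATA, script bodies and attribute values
containing ">". The function name findUnclosedTags is made up for the example;
a real check would use a proper parser.

```javascript
// Sketch only: a stack-based tag balance check for XML-style markup.
function findUnclosedTags(markup) {
  // Groups: 1 = "/" for a closing tag, 2 = tag name, 3 = attributes,
  // 4 = "/" for a self-closing tag such as <br />.
  const tagPattern = /<(\/?)([A-Za-z][\w-]*)([^>]*?)(\/?)>/g;
  const open = [];
  const problems = [];
  for (const match of markup.matchAll(tagPattern)) {
    const [, closing, name, , selfClosing] = match;
    if (selfClosing) continue;          // <br /> opens and closes itself
    if (!closing) {
      open.push(name);                  // remember the tag until it is closed
      continue;
    }
    if (open.length === 0) {
      problems.push(`unexpected </${name}>`);
      continue;
    }
    const expected = open.pop();
    if (expected !== name) {
      problems.push(`expected </${expected}> but found </${name}>`);
    }
  }
  for (const name of open) problems.push(`<${name}> is never closed`);
  return problems;
}

console.log(findUnclosedTags("<p>ok <br /> <b>bold</b></p>"));
console.log(findUnclosedTags("<p><b>bold</p>"));
```

Note that this says nothing about which elements may nest inside which; that
is the doctype-level structure, which is a separate question.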

Does validation make any positive difference at all?
I used to do it, but it prohibits things that are useful
and work fine in practice* without offering much, if anything,
in return.
I've found that checking for well-formedness - that your tags
are closed, attributes quoted, and entities encoded - is worth
it, but the structure imposed by the doctype offers almost no
help.
* An example being custom attributes. The html5 validator will
allow some of them, but the other ones won't.

I've come to figure the same thing. I like to keep things as compliant as I
can within reason, but I've been wondering more and more how strict it's
really worthwhile to be.
For example, I have an articles section on my site that (currently) uses
TangoCMS. I neither know nor care what doctype TangoCMS is sending out (and
I have even less interest in mucking with its internals to change it), and
yet when I want to bold or italicize something in a post, I've started going
back to <b> and <i>. Why?
A. They're not as insanely verbose as <span style="font-weight: bold;
font-style: italic"> (And they're much, MUCH easier to remember: Is it
"text" or "font"? Is it "-bold", "-weight", "-italic", "-style", "-slant",
"-skew" or some nonstandard made-up term like "text?/font?-decoration:
line-through"? Who has *ever* called that anything but strikeout?). Or even
<span class="bold"> (And even at that I'd have to go out of my way to go
grab and mess with the CSS files).
B. It works. On everything.
C. Let's face it, it's always going to work.
D. I don't give a rat's ass about the purity of HTML "content" vs style.
(X)HTML *is* presentation if you ask me: the content is in the model, and
that's stored in a DB, not XML files. And it all renders the same anyway.
And seriously, who's going to be applying a custom stylesheet to my pages?
(and if they do, they can just change the defined styles for b{} and i{}).
E. The W3C can kiss my ass ;)

And seriously, who's going to be applying a custom stylesheet
to my pages?

My work D project recently brought on a new designer. Unlike
the old designer who would mock it up and send me a picture,
the new guy actually edits the site himself.
But, he doesn't have access to all the html, and what he does
have access to, I don't want him to edit anyway.
He restyles the whole site through css; applying a custom
stylesheet to my page. (He, and the boss, fought this for
a while, but I think they are seeing the benefits of my
approach since I pushed back on it.)
It works quite well - the html I output doesn't have to change
for each client, so adding new content doesn't require any
repetition. We just do another stylesheet adjustment. The
same html can be dropped in many places too.
When they wanted the news added to the sidebar, I just said
document.requireElementById("sidebar-new-holder").appendChild(
getNews().toHtmlElement());
and the sidebar css adjusted it; I didn't have to write new
queries or templates. Very convenient.
The important thing though is to make sure the html describes
the data well. Once you put in any kind of presentation in there,
you break this approach.
class="red" no no, what if it's a blue theme?
class="brand-name" there we go.
class="grid" no no, what if we want it in a linear column?
class="news-item" there we go.
and so on; the html describes the data in as much detail as is
reasonable and the css makes the rest work.

The important thing though is to make sure the html describes
the data well. Once you put in any kind of presentation in there,
you break this approach.
class="red" no no, what if it's a blue theme?
class="brand-name" there we go.
class="grid" no no, what if we want it in a linear column?
class="news-item" there we go.
and so on; the html describes the data in as much detail as is
reasonable and the css makes the rest work.

In many cases I do agree, but there are some problems:
A. While CSS is acceptable for styling (though I would change some things),
it's pure shit for layouts.
CSS3 doesn't change my mind on that. And beyond that, I have zero faith in
W3C's ability to ever contrive "(X)HTML and CSS" into a real "model and
view". No matter how much we *want* (X)HTML to be purely data-description,
it just *isn't* and likely never will be.
B. When you're talking about *inside* an article or posting, etc, all of
that *is* the content. If the author intends something to be bold, italic,
red, green, blue, whatever, then they should be able to specify it as such
and not some vague pseudo-equivalent like "emphasis", "comment", "string
literal", etc, that may or may not exist in the site's CSS and may or may
not always even be what the author really wanted anyway.
C. You may be operating with a workflow where the web designer is CSS-only,
but that's not always the case, and I think reasonable arguments can be made
for doing it differently (point "A" above, for example).

A. While CSS is acceptable for styling (though I would change
some things), it's pure shit for layouts.

I wouldn't say that, completely. I do use a html template,
but only for the outer layouts; it's a frame of sorts that
I can put content into.
The content itself gets by pretty ok with css as long
as you group it. Sometimes the order of appearance matters
(ugh ugh ugh, I hate css float especially) but it's not
bad most the time. Not perfect, but it gets the job done.

B. When you're talking about *inside* an article or posting,
etc, all of that *is* the content.

Yes, for the most part. I'd still say specify only what
needs to be specified for the content, so then it fits
in better with the user's environment. (In this case,
it's perverted in that the user is the web designer, but
it's the same idea.)
So <i>is cool</i>, but <body color=""> is probably bad.

C. You may be operating with a workflow where the web designer
is CSS-only, but that's not always the case, and I think
reasonable arguments can be made for doing it differently (point
"A" above, for example).

Yeah, that's why I do the hybrid thing, but when they tried to
make the arguments to get more access to the html, I reject them
and so far I think I'm right. (The designer hasn't actually needed
to edit any of the html files I gave him access to!)

C. You may be operating with a workflow where the web designer is
CSS-only, but that's not always the case, and I think reasonable arguments
can be made for doing it differently (point "A" above, for example).

Yeah, that's why I do the hybrid thing, but when they tried to
make the arguments to get more access to the html, I reject them
and so far I think I'm right. (The designer hasn't actually needed
to edit any of the html files I gave him access to!)

It's an interesting problem that's not too dissimilar to what game
developers have wrestled with the last so many years: Programmable rendering
pipelines (pixel shaders) allow phenomenal control over how a surface
looks...and that's clearly within the realm of "artist"...but it's actually
done with code: the programmer's realm. Artists can't code. Coders can't
draw. So who does it? And how? A combination artist/coder? Where do you find
them? And if you do, how can you hope to afford them?
They've found ways to get by, and have gotten better at dealing with the
issue, but it was never an easy problem and it's still not entirely solved.
The problem here is very similar. There are designers who come up with the
"look", but their medium is code-based ((X)HTML and CSS), and it's tightly
integrated with the full-on code of the programmer's realm. (Personally, I
think the real solution here is abandon (X)HTML/CSS in favor of some unified
thing that's actually *designed* to be a real presentation layer, and not a
hacked up document format, and then create RAD-style editors based on it.)

For example, I have an articles section on my site that (currently) uses
TangoCMS. I neither know nor care what doctype TangoCMS is sending out (and
I have even less interest in mucking with its internals to change it), and
yet when I want to bold or italicize something in a post, I've started going
back to <b> and <i>. Why?
A. They're not as insanely verbose as <span style="font-weight: bold;
font-style: italic">

<snip>
But you shouldn't be using <span style="font-weight: bold; font-style: italic">
anyway.
You should be looking at what the boldness or italicness _means_, and either
using the
appropriate semantic HTML element or (if one doesn't exist) defining a CSS
class named
after this semantic.
This is also about making code self-documenting.
Stewart.

For example, I have an articles section on my site that (currently) uses
TangoCMS. I neither know nor care what doctype TangoCMS is sending out (and
I have even less interest in mucking with its internals to change it), and
yet when I want to bold or italicize something in a post, I've started going
back to <b> and <i>. Why?
A. They're not as insanely verbose as <span style="font-weight: bold;
font-style: italic">

<snip>
But you shouldn't be using <span style="font-weight: bold; font-style:
italic"> anyway. You should be looking at what the boldness or italicness
_means_, and either using the appropriate semantic HTML element or (if one
doesn't exist) defining a CSS class named after this semantic.
This is also about making code self-documenting.

If it's actually part of some <span class="concept">ui element</span>, or
<span class="concept">widget</span>, or some <span class="concept">standard
recurring concept</span>, etc, then yes, I would agree in that case, <span
class="person">Stewart</span>.
But if it's <i>just</i> ordinary text that simply needs to be <b>bolded</b>
or <i>italicized</i>, then handling it in any roundabout way like that is
just <i>ridiculous</i> (and "self-documenting" would be completely
inapplicable).
In such a situation, replacing hardcoded bold or italic with some vague
concept of "emphasis" (old-school example: the <em> tag) or
"extra-emphasis", etc, is not only a useless abstraction merely for the sake
of abstraction, it <b><i>can</i></b> subtly change meaning/interpretation of
the actual <i>content</i> because only the <i>author</i>, not the stylist,
is able to look at the final result and know whether the result
<b><i>correctly</i></b> depicts the amount/type of emphasis intended.
Additionally, how does the stylist know if a given styling is going to cause
too much visual noise? Or be too visually monotone? They <i>can't</i>,
because it's <i>completely</i> dependent on the text that the
<b><i>author</i></b> writes. It might be too much visual stuff for one
article and just right for another. Only the text's author can know what's
appropriate, not the stylesheet.

But if it's <i>just</i> ordinary text that simply needs to be <b>bolded</b>
or <i>italicized</i>, then handling it in any roundabout way like that is
just <i>ridiculous</i> (and "self-documenting" would be completely
inapplicable).

You miss the point - why would you need to bold or italicise "ordinary text"?
If the point is to illustrate what bold looks like, or what italics look like,
_then_ it might make sense to use presentational markup....

In such a situation, replacing hardcoded bold or italic with some vague
concept of "emphasis" (old-school example: the <em> tag)

or
"extra-emphasis", etc, is not only a useless abstraction merely for the sake
of abstraction, it <b><i>can</i></b> subtly change meaning/interpretation of
the actual <i>content</i> because only the <i>author</i>, not the stylist,
is able to look at the final result and know whether the result
<b><i>correctly</i></b> depicts the amount/type of emphasis intended.

It seems to me that the essence of what you're saying is that the choice of
<em> and <strong> is too coarse-grained for your purposes. I'm not sure how
best to deal with this either. Moreover, what markup are you going to use so
that it looks/sounds/feels right in non-graphical browsers?

Additionally, how does the stylist know if a given styling is going to cause
too much visual noise? Or be too visually monotone? They <i>can't</i>,
because it's <i>completely</i> dependent on the text that the
<b><i>author</i></b> writes. It might be too much visual stuff for one
article and just right for another. Only the text's author can know what's
appropriate, not the stylesheet.

If the author is overusing emphasis, manually setting font weights and stuff to
compensate seems to me to be trying to fix the wrong problem.
Stewart.

But if it's <i>just</i> ordinary text that simply needs to be <b>bolded</b>
or <i>italicized</i>, then handling it in any roundabout way like that is
just <i>ridiculous</i> (and "self-documenting" would be completely
inapplicable).

You miss the point - why would you need to bold or italicise "ordinary text"?

To be clear, I didn't mean that as in "plaintext"...if that's what you
meant...? I meant like the examples in that paragraph (not all of which were
literal examples of bold/italic).

If the point is to illustrate what bold looks like, or what italics look
like, _then_ it might make sense to use presentational markup....

Only "might"? ;)

In such a situation, replacing hardcoded bold or italic with some vague
concept of "emphasis" (old-school example: the <em> tag)

Ok. It was a dedicated HTML tag instead of a span/div with class attribute.
Seems like most of those are non-kosher these days.

or
"extra-emphasis", etc, is not only a useless abstraction merely for the sake
of abstraction, it <b><i>can</i></b> subtly change meaning/interpretation of
the actual <i>content</i> because only the <i>author</i>, not the stylist,
is able to look at the final result and know whether the result
<b><i>correctly</i></b> depicts the amount/type of emphasis intended.

It seems to me that the essence of what you're saying is that the choice
of <em> and <strong> is too coarse-grained for your purposes.

Yes. Well, too vague, really.

I'm not sure how best to deal with this either.

It's easy to deal with: You just say "Fuck dat 'purity' booshit, I'm usin'
<b> and <i>!!" :)
And as far as inferring semantic meaning, I think it's pretty obvious that
<b> and <i> imply "this text is emphasised". (Not that I can imagine any
realistic use for being able to identify what text is emphasised.)

Moreover, what markup are you going to use so that it looks/sounds/feels
right in non-graphical browsers?

Non-graphical browsers are going to result in a *lot* of difference from the
original style/layout anyway. There's a lot of stuff that's going to be
wrong. If you're using one, it's just understood that you're merely viewing
an approximation.

Additionally, how does the stylist know if a given styling is going to cause
too much visual noise? Or be too visually monotone? They <i>can't</i>,
because it's <i>completely</i> dependent on the text that the
<b><i>author</i></b> writes. It might be too much visual stuff for one
article and just right for another. Only the text's author can know what's
appropriate, not the stylesheet.

If the author is overusing emphasis, manually setting font weights and
stuff to compensate seems to me to be trying to fix the wrong problem.

Not necessarily. Imagine a paragraph that uses a fair amount of italic, but
not quite an overuse of italic, so it still looks fine. If that's done with,
say <em>, and the stylist changes <em> from italic to either bold or
bold+italic, it's suddenly going to look like shit. It'll *become* an
overuse, and the only way for the stylist to fix it is to just let the
author choose bold/italic/etc on their own.
Maybe I'm just atypical as an author, but when I write something and use
emphasis, I take into account things like bold/italic and how it'll look
when I decide what to emphasise, how, and how much. If I *do* use things
like <em>, I inevitably end up choosing them based *not* on "level of
emphasis" but on whether they end up being bold/italic/underline/etc...Which
obviously defeats the whole damn point of <em>, etc. I'd be surprised if
most people do it any different from that. Heck, I almost always end up
changing my emphasis/bold/italic/etc after writing+previewing it because it
never looks right until I've tweaked it *taking into account* the final
presentation. Honestly, I can't imagine how anyone could do it effectively
without having direct control over such things (even if it's by abusing
levels of emphasis as euphemisms for more specific stylings). I think
there's good reason wiki markups invariably have syntax for "bold" and
"italic" rather than "emphasis".
There are two basic problems with the idealistic separation of presentation
from content:
1. (X)HTML and CSS are just simply not very good as "(X)HTML is content" and
"CSS is presentation". You can get by in *some* cases, but in general
they're just poorly suited for it. I think that *part* of the problem may be
that it's like ColdFusion: A mediocre Model and a mediocre View hooked
directly together with basically no Controller.
2. Content and presentation *are not always separable*. There *is*
interplay. And this makes a strict and complete separation of content and
presentation nothing more than yet another example in programming's long
history of idealistic dreams (like Java's "everything must be OO" purity,
Haskell's "everything must be functional" purity, etc.) As always, puritism
sucks and needs to be tempered with pragmatism.

Ok. It was a dedicated HTML tag instead of a span/div with class attribute.
Seems like most of those are non-kosher these days.

<snip>
I think half these tags just fell out of fashion when somebody invented the
likes of <b> and <i>. It was probably for a combination of reasons:
- fewer characters to type
- it's just one tag to remember for bold, and one for italics, rather than lots
of different ones for emphasis, terms being defined, book titles, addresses,
variables in mathematical expressions, biological taxa, foreign words/phrases,
etc.
- no discrete set of semantic elements can be sure of covering _all_ possible
things bold and italics may be used to denote.
- people wanted, in a time before CSS, to be able to "force" certain rendering,
as opposed to the potentially application-dependent rendering of semantic
elements.
And so it stuck. It's perhaps as a concession to these that HTML 4.01 and
XHTML 1.0 have kept <b> and <i> even in strict mode.
I recall reading somewhere that in HTML5, they are redefined along the lines of
"stuff that is typically printed in bold" and "stuff that is typically printed
in italics". But just looking at the current working draft:
http://www.w3.org/TR/html5/text-level-semantics.html#the-i-element
"The i element represents a span of text in an alternate voice or mood, or
otherwise offset from the normal prose in a manner indicating a different
quality of text, such as a taxonomic designation, a technical term, an
idiomatic phrase from another language, a thought, or a ship name in Western
texts."
"The b element represents a span of text to which attention is being drawn for
utilitarian purposes without conveying any extra importance and with no
implication of an alternate voice or mood, such as key words in a document
abstract, product names in a review, actionable words in interactive
text-driven software, or an article lede."
Stewart.

Yes:
- it's a useful step in diagnosing problems with a webpage
- it helps with cross-browser compatibility
- it helps syntax-highlighting and code-folding editors
- it enables code-manipulation tools to work correctly
- it's good for your public image

I used to do it, but it prohibits things that are useful
and work fine in practice* without offering much, if anything,
in return.

What are these "things that are useful" to which you refer?
<snip>

* An example being custom attributes. The html5 validator will
allow some of them, but the other ones won't.

Are custom attributes distinguished from standard attributes in some way, in
order to keep attributes invented by different browser manufacturers from
clashing with each other and with attributes that become part of a later HTML
standard?
Stewart.

I'll agree that some of the validator's things help
with that, but not all of it.
If you write <a href="#"><div>block in inline</div></a>,
the validator will reject it, but it works... and that's
a useful thing when doing drag+drop applications (since
older IE doesn't let you drag other elements).
Though, I can see your point with cross-browser
compatibility, to an extent, as that code sometimes (not
always... it can change across refreshes of the same
page...) brings out bugs in Firefox 3.6.
Anyway, though, the specific doctype still isn't terribly
important, since, in practice, tools tend to ignore it
anyway. Browsers see its presence as an on/off switch
with standards compliance mode vs quirks mode; declaring
the wrong one doesn't break anything. (Indeed, HTML5
has agreed to use the common, previously wrong, shorthand
of <!DOCTYPE html> as the new standard!)
Stuff like improperly closed tags or bad entity
encoding can break, but that's pretty well independent
of doctype validation. That's simply a matter of the
document being well-formed.

What are these "things that are useful" to which you refer?

There's the drag and drop issue from above, the custom
attributes thing from below, and sometimes, using certain tags
or generally accepted shorthand. (For instance, <script>
used to require a type, but it worked without it anyway.
Again, the html5 folks decided to adjust the standard to
fit the practice - something I actually like about them. This
is a really minor thing, though.)

Are custom attributes distinguished from standard attributes in
some way, in order to keep attributes invented by different
browser manufacturers from clashing with each other and with
attributes that become part of a later HTML standard?

They are in html5 - the data- prefix is allowed and reserved
for the user. In older versions of the html standard, they
weren't allowed at all, whether prefixed or not. (They did
work in practice, though.)
I like custom attributes a lot, since they add a richness
that Javascript (and CSS too) can exploit in interactive
pages.
They're the main thing I miss if I validate with one of the
other DTDs.
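As a small illustration of the data- prefix, the sketch below mirrors how the
DOM's `dataset` property names such attributes: the prefix is dropped and each
hyphen followed by a lowercase letter becomes an uppercase letter. The helper
names and the sample attributes (data-contact-id and so on) are invented for
the example, and attributes are modelled as a plain object so it runs outside
a browser.

```javascript
// Returns the dataset-style key for a data- attribute, or null for anything
// else (standard attributes, or unprefixed custom ones).
function datasetKey(attributeName) {
  if (!attributeName.startsWith("data-")) return null;
  return attributeName
    .slice("data-".length)
    .replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

// Picks only the author-reserved data- attributes out of an attribute map.
function collectCustomData(attributes) {
  const data = {};
  for (const [name, value] of Object.entries(attributes)) {
    const key = datasetKey(name);
    if (key !== null) data[key] = value;
  }
  return data;
}

// Hypothetical attributes on a draggable contact link.
const attributes = {
  href: "/contacts/42",
  "data-contact-id": "42",
  "data-list": "newsletter",
};
console.log(collectCustomData(attributes));
```

In a browser the same information is available directly as
element.dataset.contactId, with no helper needed.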

I'll agree that some of the validator's things help
with that, but not all of it.
If you write <a href="#"><div>block in inline</div></a>,
the validator will reject it, but it works... and that's
a useful thing when doing drag+drop applications (since
older IE doesn't let you drag other elements).

What built-in support does HTML/JS/CSS have for dragging of elements? I always
understood that it had to be explicitly implemented in JS in terms of
onmousedown/onmousemove/onmouseup or something like that, and therefore cannot
in itself be something that some browsers support and others don't.
Moreover, dummy hrefs are an abomination. Not just compatibility when JS is
disabled - this link is also the one followed when you open a link in a new
window/tab. This regularly bites me.
<snip>

Anyway, though, the specific doctype still isn't terribly
important, since, in practice, tools tend to ignore it
anyway. Browsers see its presence as an on/off switch
with standards compliance mode vs quirks mode; declaring
the wrong one doesn't break anything. (Indeed, HTML5
has agreed to use the common, previously wrong, shorthand
of <!DOCTYPE html> as the new standard!)

Strange. I don't recall ever seeing <!DOCTYPE html> before HTML5 came along.
But I am made to wonder why. What will happen when HTML6 comes out? Or have
they decided that validators are just going to update themselves to the new
standard rather than keeping separate HTML5/HTML6 DTDs (or whatever the HTML5+
equivalent of a DTD is)?
PNG's reason for not including a version number in the file is to avoid the
scenario where a program knows only of PNG up to version 1.2, and rejects a
file as being in PNG 1.3 even though all the critical chunks conform to the
PNG 1.0 spec. See
http://www.libpng.org/pub/png/spec/1.2/PNG-Rationale.html#R.Chunk-naming-conventions
But I have trouble believing anybody would make a web browser that rejects HTML
files as being in too new a version of HTML.

Stuff like improperly closed tags or bad entity
encoding can break, but that's pretty well independent
of doctype validation. That's simply a matter of the
document being well-formed.

No, because in order to determine whether it's well-formed, one must know
whether it's meant to be in SGML-based HTML, HTML5 or XHTML.
<snip>

Are custom attributes distinguished from standard attributes in some way, in
order to keep attributes invented by different browser manufacturers from
clashing with each other and with attributes that become part of a later HTML
standard?

They are in html5 - the data- prefix is allowed and reserved
for the user. In older versions of the html standard, they
weren't allowed at all, whether prefixed or not. (They did
work in practice, though.)

<snip>
So it's something that web authors can use to store custom data in an element
for scripting purposes, but browsers aren't supposed to have any built-in
handling of them?
Stewart.

http://dev.w3.org/html5/spec/dnd.html
It started as an IE5 feature, and is now being expanded
to everyone else. In the old IE, it only worked on some
text, links, and images, but the new standard says you
can set it on whatever you want. Still, if you want to
support them all, <a> is the way to do it.

Moreover, dummy hrefs are an abomination. Not just
compatibility when JS is disabled - this link is also the one
followed when you open a link in a new window/tab. This
regularly bites me.

Yea, I hate them too. In practice, I try to put them somewhere
useful, if I can. (In my app with the drag drop, the links lead
to the contact profile page; this is a mailing list/CRM app.)

But I am made to wonder why. What will happen when HTML6 comes
out?

I guess the idea is there won't be a html6; instead they'll just
keep <s>breaking</s> evolving the current thing and expect
everyone to keep up.

No, because in order to determine whether it's well-formed, one
must know whether it's meant to be in SGML-based HTML, HTML5 or
XHTML.

Meh, it works anyway. One reason is websites tend to be so poorly
written that if you tried to be strict, you'd just break most of
them!
Anyway, this said, if dpl.org wanted to validate, I don't think
it'd be a *bad* thing. (I'd say go with xhtml; I feel dirty
saying this, but I almost.... like..... xml for this kind of
thing.)

So it's something that web authors can use to store custom data
in an element for scripting purposes, but browsers aren't
supposed to have any built-in handling of them?

Right. You aren't even supposed to use them with other third
party tools; the idea is that area is completely open for the
page author and his scripts to do with as he pleases.

No, because in order to determine whether it's well-formed, one must know
whether it's meant to be in SGML-based HTML, HTML5 or XHTML.

Meh, it works anyway. One reason is websites tend to be so poorly
written that if you tried to be strict, you'd just break most of
them!
Anyway, this said, if dpl.org wanted to validate, I don't think
it'd be a *bad* thing. (I'd say go with xhtml; I feel dirty
saying this, but I almost.... like..... xml for this kind of
thing.)

Yea, HTML looks, acts and feels like XML so it may as well actually *be*
XML. Plus, transformations to/from HTML are one of the main reasons for XML
anyway. So they *should* be compatible.
('Course there's *technically* SGML too, but honestly, HTML is the only
reason anyone's ever cared about or even known about SGML. It may as well
not exist.)

Strange. I don't recall ever seeing <!DOCTYPE html> before HTML5 came along.
But I am made to wonder why. What will happen when HTML6 comes out?
Or have they decided that validators are just going to update
themselves to the new standard rather than keeping separate HTML5/HTML6
DTDs (or whatever the HTML5+ equivalent of a DTD is)?

Thing is, if they could have removed the doctype completely they would
have done so. The doctype doesn't tell anything meaningful to a
browser, except that today's browsers use the presence of a doctype to
switch between quirks mode and standards mode. <!DOCTYPE html> was
the shortest thing that'd make every browser use standards mode.
The problem was that forcing everyone to specify either one or another
HTML version is just an exercise in pointlessness. Most people get the
doctype wrong, either initially or over time when someone updated the
site to add some new content. If you're interested in validating your
web page, likely you'll know which version you want to validate against
and you can tell the validator.

Stuff like improperly closed tags or bad entity
encoding can break, but that's pretty well independent
of doctype validation. That's simply a matter of the
document being well-formed.
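
A rough illustration of what "well-formed" means in practice, assuming
Python's standard library as a stand-in for real browser parsers (which it is
not): a strict XML parser rejects an improperly closed tag or a bare
ampersand, while a lenient HTML parser reads straight through them. The helper
names below are made up for this sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagLister(HTMLParser):
    """Record start tags; this lenient parser does not reject bad nesting."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(markup):
    """Return the start tags seen by the lenient HTML parser."""
    parser = TagLister()
    parser.feed(markup)
    parser.close()
    return parser.tags


print(is_well_formed_xml("<p><b>bold</b></p>"))   # properly nested
print(is_well_formed_xml("<p><b>bold</p>"))       # <b> never closed
print(is_well_formed_xml("<p>Fish & chips</p>"))  # bare ampersand
print(html_start_tags("<p><b>bold</p>"))          # no error raised here
```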

No, because in order to determine whether it's well-formed, one must
know whether it's meant to be in SGML-based HTML, HTML5 or XHTML.

Perhaps it matters for validation if you don't say which spec to
validate against, but validating against a spec doesn't always reflect
reality either. There is no SGML-based-HTML-compliant parser used by a
browser out there. Browsers have two parsers: one for HTML and one for
XML (and sometimes the HTML parser behaves slightly differently in quirks
mode, but that's not part of any spec).
And whether a browser uses the HTML or the XML parser has nothing to do
with the doctype at the top of the file: it depends on the MIME types
given in the Content-Type HTTP header or the file extension if it is a
local file. HTML 5 doesn't change that.
Almost all web pages declared as XHTML out there are actually parsed
using the HTML parser because they are served with the text/html
content type and not application/xhtml+xml. A lot of them are not well
formed XML and wouldn't be viewable anyway if parsed according to their
doctype.
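
As a sketch of the rule described above (not any browser's actual code), the
parser choice can be written as a function of the Content-Type value alone,
with the doctype never consulted. The function name and the "other" fallback
are assumptions made for illustration.

```python
XML_MIME_TYPES = {"text/xml", "application/xml"}


def choose_parser(content_type):
    """Return "html", "xml" or "other" for a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    mime = content_type.split(";", 1)[0].strip().lower()
    if mime == "text/html":
        return "html"
    # application/xhtml+xml and other "+xml" types go to the XML parser.
    if mime in XML_MIME_TYPES or mime.endswith("+xml"):
        return "xml"
    return "other"


# An XHTML doctype served as text/html still gets the HTML parser.
print(choose_parser("text/html; charset=utf-8"))
print(choose_parser("application/xhtml+xml"))
```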
--
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

No, because in order to determine whether it's well-formed, one must know
whether it's meant to be in SGML-based HTML, HTML5 or XHTML.

Perhaps it matters for validation if you don't say which spec to validate
against, but validating against a spec doesn't always reflect reality either.
There is no SGML-based-HTML-compliant parser used by a browser out there.
Browsers have two parsers: one for HTML and one for XML (and sometimes the
HTML parser behaves slightly differently in quirks mode, but that's not part
of any spec).

But there is a subset of HTML that is likely to be parsed correctly by
browsers' HTML parsers, and this subset is all the HTML you're likely to need
to use most of the time.
On the other hand, the interpretation of tag soup is undefined and liable to
vary from browser to browser. So validation certainly helps you out here.

And whether a browser uses the HTML or the XML parser has nothing to do with
the doctype at the top of the file: it depends on the MIME types given in the
Content-Type HTTP header or the file extension if it is a local file. HTML 5
doesn't change that.
Almost all web pages declared as XHTML out there are actually parsed using the
HTML parser because they are served with the text/html content type and not
application/xhtml+xml. A lot of them are not well formed XML and wouldn't be
viewable anyway if parsed according to their doctype.

But does any pre-HTML5 spec stipulate that HTML parsers accept tag soup in the
first place? ISTM this is all down to a tendency of browser/engine authors to
implement fallback for malformed HTML but not for malformed XML.
Stewart.