Monthly Archives: October 2006

Earlier today, Joe Clark responded to Tim Berners-Lee’s announcement of the W3C’s new plan for HTML. Joe has argued that fixing HTML is not as important as accessibility, and that the WCAG working group is in much more serious trouble than the HTML working group. On these grounds, he seems to be criticizing the W3C’s decision.

He has also criticised the WHATWG and their work on HTML 5, claiming that
they’re making the same mistakes as the W3C, and raised several issues and suggestions
for improving HTML.

Working Groups

I’m not disputing his claims about the problems with the WCAG working group
or the WCAG 2.0 guidelines. In fact, I agree with him in that regard. But
to imply that the W3C doesn’t need to do anything about (X)HTML or the
HTML working group is misguided. Something desperately needs to be done with
each of the HTML, XForms and WCAG working groups; they’re all in serious trouble
and their problems need to be fixed.

The W3C’s decision to deal with the HTML working group is at least a step
in the right direction. Whether the chosen path is right or wrong is
yet to be seen and I’m trying to reserve judgment until more information
is available. But I promise that, as soon as the information I’ve asked for becomes available, I’ll review it thoroughly.

The Problems with HTML

HTML is a topic of interest. But it isn’t an outright fiasco. HTML, in large
part, works fine right now.

It is true that HTML 4.01 itself isn’t a total fiasco. In many ways, it works.
We’ve been using it for years without too many problems and will continue to
do so for many years to come. However, there are many problems and limitations
with it that need to be addressed.

There are also many problems with the W3C’s HTML working group, who have been
ignoring the needs of real world developers and pushing ahead with a
language that is doomed to failure.
They’ve not only completely ignored
backwards compatibility issues, but have failed to adequately address many
other issues that have been raised.

The Process

…which means that, with the W3C’s glacial processes, we’ll have a spec document
to look at in 2010 and actual browser support in 2012.

Yes, it is true that specs take a long time to write and a long time to
implement, but there are reasons for that. The process involves many steps,
designed to help ensure the quality of the final specification. But the process
also requires actual browser implementations before the spec can be finished,
so implementations should arrive alongside the spec, not years afterwards.

HTML has samp, var, and kbd. I use all of them and I am pretty much the
only one who does.

How is HTML 5 repeating the same mistakes? HTML 5 did not introduce code,
kbd, samp and var; they have been retained from HTML 4,
and there is no reason to remove them.
HTML 5 has not yet introduced any more elements specifically for computer
science and mathematics. In fact, proposals specifically for mathematics
have so far been rejected, mostly due to backwards compatibility
and complexity issues.

The following is a list of all of the elements introduced in the current drafts
of HTML 5 (I hope I didn’t miss any).

Of all of those new additions, which ones specifically fall into
the category of computer science and/or mathematics?

But, true to member biases, “HTML5” bans the use of dl–dt/dd for dialogue,
a usage permitted by the HTML spec and in wide use by intelligent developers
like me who have to mark up documents unrelated to computer science. (They’d
prefer you use a thicket of blockquotes and cites. And, presumably, nullify
all the indention and italicization that browsers will do by default.)

This is a complex issue. Many people have argued in the past that definition
lists are strictly for marking up definitions, and that the example in the
HTML 4.01 spec of using them to mark up dialogue is a mistake.
But, in my view, such arguments are based mostly on the name of the element,
rather than its actual definition. The definition list, which I think should
be called a description list (or association list), has proven much more
useful in the real world as a generic structure for many kinds of name/value,
term/description, or key/value pairs, and reserving it strictly for definitions
is not very practical.

This is a classic problem in HTML development: The people doing the work
are geeks with computer-science interests who do not understand, for example,
newspapers, or screenplays, or, really, print publishing in general. In some
obscure way, they disdain print publishing, as the Web is not print. Indeed
it isn’t, but print has structures the Web doesn’t, and it doesn’t have them
because people like these refuse to acknowledge they exist or simply refuse
to consider them.

I am not going to disagree with that. But, Joe, since you clearly do know about
such things, why don’t you get involved? If you, or anyone else, presents use
cases and other real world evidence that supports the need for something to
be added, and it can be shown that existing markup is inadequate, we can develop
something to solve the problem together.

This attitude – still present in WHAT WG, though it is separate and was formed
later – can be summed up as “Until we decide you are using our computer-science
tags adequately, we won’t even consider the semantic needs of your documents.”

The WHATWG is not just trying to solve problems for marking up computer-science
documents. What we need is to document use cases and other evidence to show
that something will be useful. We don’t need to see people using non-existent
markup before we’ll consider it. We need to look at what people are using
and the kind of content they are publishing to see where the limitations lie
with existing markup. There is no point introducing new markup if existing
markup is already suitable or if people aren’t going to use it anyway, so we
need evidence that they will.

Markup Suggestions

For “HTML5” and the new HTML variants, why can’t we just adopt what’s already
been done in other namespaces, like the Text Encoding Initiative and tagged
PDF? Yes, I really mean the latter.

We can. We just need to document things like use cases to show how such markup
would be used, real world example content (from any media) for which it would
be suitable, how authors are already marking up such content, the limitations
of existing markup and an explanation of why new markup would be useful to authors
and what benefits it provides to users.

We assuredly could use elements from tagged PDF like:

annotation

That’s a reasonable suggestion. Why is existing markup inadequate for this?
What benefit would it provide to authors and users?

note and reference for footnotes, endnotes, and sidenotes (not aside in
“HTML5”)

There are many examples of footnotes used on the web, Wikipedia for instance,
and I do believe footnote markup would be a valuable addition. The difficulty is
in working out the best way to mark it up.

As for notes, the XHTML Role Attribute
module already has a note value for
such things. Whether or not the role attribute will be included in HTML
5 is not yet clear. It’s been discussed before, but I can’t recall if the issue
has been resolved or not.

A large-scale division of a document. This type of element is appropriate
for grouping articles or sections.

It’s not clear to me when I would use it or why it would be useful to do so.
Could you provide a use case?

caption generically applicable to tables and figures

This has been discussed before and the issue is still open. The major problem
is related to backwards compatibility. Unfortunately, it’s not as simple as
just using the caption element because when a caption element occurs outside
of table markup, current browsers do not include the element itself within the
DOM.

There is also the issue of how to associate the caption with the figure.
Unlike table, img is an empty element, so the caption can’t be included
within it. It also can’t be included inside the object element, because it
would be considered fallback content and would not be visible in current browsers.

bibliographies, tables of contents, and indices (some in “HTML5”)

For tables of contents, isn’t existing list markup good enough? Would it
be beneficial to explicitly mark the content as the TOC? Could the role attribute
address this problem?

I don’t know much about bibliographies and indices, so no comment.

nonstruct for generic groupings

Why is this useful? I don’t understand how it is different from div; the
definition given in the PDF reference was not clear to me.

A grouping element having no inherent structural significance; it serves
solely for grouping purposes. This type of element differs from a division
(structure type Div; see above) in that it is not interpreted or exported to
other document formats; however, its descendants are to be processed normally.

Could you provide a use case?

formula

Similar concepts have been discussed before. As far as I know, the issue
is still open. But doesn’t that fit into the category of science and mathematics
that you had issues with earlier?

Proposed Fixes

Nonetheless, aren’t the easiest fixes those that would make many nominally
invalid documents valid and help accessibility?

Ban tables for layout.

This will no doubt be done when the table section is written. The last I
heard about this was that it’s scheduled for later this year or early next year.

Allow fragment identifiers to start with any ASCII character, not just
a letter. Suddenly hundreds of millions of Blogger comment URLs become valid.

That was a limitation of the SGML heritage of HTML, which has unfortunately
been carried over into XML as a validity constraint for attributes of type
ID. Note that HTML 5 is no longer considered an application of SGML; it
has its own syntax requirements. XHTML 5, however, is based on XML.

However, regardless of that validity constraint, (X)HTML 5 (as I understand it)
effectively dispenses with DTD-based validity in favour of much more rigorous
conformance requirements, and the draft doesn’t say what constitutes a valid
ID attribute value. I believe that, as long as it’s well-formed and doesn’t
contain whitespace, it will be considered conforming (though I’d need to confirm that).
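To make the difference concrete, here is a minimal Python sketch. It compares the HTML 4.01 ID token rule with the looser rule described above. The HTML 4.01 rule is a letter followed by letters, digits, hyphens, underscores, colons and periods. The looser rule is only an unconfirmed reading of the draft, not a published requirement. The function names and sample values are hypothetical; `116123` simply stands in for an identifier that starts with a digit.

```python
import re

# HTML 4.01, section 6.2: ID and NAME tokens must begin with a letter
# ([A-Za-z]), followed by letters, digits, hyphens, underscores, colons
# and periods.
HTML4_ID = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def valid_html4_id(value):
    """True if value matches the HTML 4.01 ID token rule."""
    return HTML4_ID.fullmatch(value) is not None


def no_whitespace_id(value):
    """Hypothetical looser rule: non-empty and free of whitespace.

    Models the unconfirmed reading of the HTML 5 draft described above,
    not a published requirement. Python's str.isspace() is used as an
    approximation of HTML's whitespace characters.
    """
    return value != "" and not any(ch.isspace() for ch in value)


for candidate in ["comments", "c116", "116123", "my comment"]:
    print(f"{candidate!r}: HTML 4.01 {valid_html4_id(candidate)}, "
          f"no-whitespace rule {no_whitespace_id(candidate)}")
```

Only the digit-initial value separates the two rules. It fails the HTML 4.01 rule but passes the no-whitespace rule.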

Give us actual rowgroups (not just tbody) along with colgroups in tables
and maybe browsers will begin to support both of them. (Table headers also
badly need fixing.)

Could you elaborate a little? What are the problems and limitations with
thead, tfoot and tbody? As far as I can tell, the HTML table row group model
is the same as that in Tagged PDF, so that didn’t give me any clues as to what
you mean.

Let us nest certain block-level elements in certain other ones right away,
à la XHTML 2. A p really should be able to contain an ol.

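This is already allowed. However, in HTML (not XHTML), there are backwards compatibility
issues with making the DOM match. Browsers implicitly close an open p element when they
encounter the start tag of a block-level element such as ol, so the list ends up as a
sibling of the paragraph rather than inside it. If browsers were suddenly to allow such
elements to appear within p elements in the DOM, it could potentially break millions
of pages. This is not an issue in XHTML 5, where it is already explicitly allowed.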
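The difference can be illustrated with a deliberately simplified Python model. `TinyTreeBuilder` is a hypothetical helper built on the standard library's `html.parser`. It applies only one rule: the start tag of a block-level element closes an open p. It is nothing like a complete browser parser, and real browsers may handle the leftover `</p>` differently, for example by creating an extra empty paragraph. The XHTML side uses `xml.etree.ElementTree`, which keeps the tree exactly as written.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical subset of block-level elements whose start tag implicitly
# closes an open p element in HTML.
CLOSES_P = {"p", "ol", "ul", "dl", "div", "table", "blockquote", "pre"}


class TinyTreeBuilder(HTMLParser):
    """Toy tree builder that applies only the implied </p> rule.

    No void elements, no error recovery beyond ignoring unmatched end tags,
    and text content is discarded.
    """

    def __init__(self):
        super().__init__()
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P and self.stack[-1][0] == "p":
            self.stack.pop()  # implied </p>
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; unmatched end tags
        # (like the </p> left over after an implied close) are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def html_top_level(markup):
    builder = TinyTreeBuilder()
    builder.feed(markup)
    builder.close()
    return [tag for tag, _children in builder.root[1]]


def xhtml_top_level(markup):
    root = ET.fromstring("<root>" + markup + "</root>")
    return [child.tag for child in root]


markup = "<p>Steps:<ol><li>One</li><li>Two</li></ol></p>"
print("HTML model:", html_top_level(markup))
print("XML parse: ", xhtml_top_level(markup))
```

In the HTML model, the list becomes a sibling of the paragraph. The XML parse keeps it inside the p element.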

Make embed legal. Give it up, people: object doesn’t work and never will.

This is already planned to be introduced; it just hasn’t made it into the spec
yet.

Give us back dir and menu. They used to be in HTML before the W3C decided
that CERN physics papers never need directories and menus.

The menu element has already been brought back. What’s the use case
for dir? The HTML 2.0, 3.2 and 4.01 specs are incredibly vague on this issue
and seem to only indicate presentational differences, which aren’t even visible
in current browsers. How is it different from ul? What problem does it solve?
What benefits does it provide?

Exactly one year ago today, I published part 1 in a series of articles about
CSS cascading and inheritance. However, due to various factors (mostly laziness),
the sequel never got published… Until now! Today, I’m going to take a break
from the XBL series of articles (which will resume in a day or so) and
finally publish the long-awaited conclusion to this series. If you haven’t
read part 1 (or even if you did, but a long time ago), I suggest you do so now
before continuing.

Following on from the first article, in which we looked at how to find all
the style declarations that apply to each element, we’re now going to look at
how these are sorted into order of precedence to determine which ones are
applied to the element.

Sorting

Steps 2, 3 and 4 of the algorithm deal with sorting the declarations into
the order of precedence. From the exercise in part 1, we were left with
4 rule sets which applied to the p element in the sample document. For the
purpose of this exercise, I’m just going to add a few more declarations and
annotate them with their origin.

If we discard the selectors for now (they’ll be needed again in step
3 below), we’re
left with a list of declarations. A declaration is a property and its associated
value. We can then proceed to sort them into the order specified.
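Here is a rough illustration of step 2: a Python sketch that ranks declarations by origin and importance. It uses the five levels of ascending precedence listed in CSS 2.1: user agent, user normal, author normal, author important and user important. The `Declaration` class and the sample declarations are hypothetical; they are not the rule sets from this example. Python's `sorted` is stable, so declarations that tie keep their original order, which step 4 relies on.

```python
from dataclasses import dataclass

# Ascending precedence by origin and importance (CSS 2.1, section 6.4.1,
# step 2). CSS 2.1 defines no !important level for user agent
# declarations, so there is no ("user agent", True) entry.
PRECEDENCE = {
    ("user agent", False): 1,
    ("user", False): 2,
    ("author", False): 3,
    ("author", True): 4,
    ("user", True): 5,
}


@dataclass
class Declaration:
    prop: str
    value: str
    origin: str      # "user agent", "user" or "author"
    important: bool  # True if the declaration is marked !important


def sort_by_origin(declarations):
    """Sort into ascending precedence; sorted() is stable, so ties keep their order."""
    return sorted(declarations, key=lambda d: PRECEDENCE[(d.origin, d.important)])


# Hypothetical declarations, listed in source order.
sample = [
    Declaration("color", "black", "user agent", False),
    Declaration("text-align", "left", "user", False),
    Declaration("line-height", "1.2", "author", False),
    Declaration("color", "white", "user", True),
    Declaration("text-align", "justify", "user", False),
]

for d in sort_by_origin(sample):
    level = "important" if d.important else "normal"
    print(f"{d.origin} {level}: {d.prop}: {d.value};")
```

The user important `color` declaration moves to the end, the highest-precedence position. The two user normal `text-align` declarations stay in source order.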

Step 3

This step is important for cases where two declarations for the same property
have the same importance and origin. In this example, this occurs for both the
user normal declarations (two text-align declarations) and the author
normal declarations (two line-height declarations).

As you can see, both user normal text-align declarations have the same selector (p),
which has a specificity of 0,0,1, so sorting by specificity makes no difference in this
case. However, for the author normal declarations, these are the rule sets involved:

These two rule sets use selectors with different specificities.
The selector #content p has a specificity of 1,0,1, while the selector p has a specificity
of 0,0,1. Since 1,0,1 is a higher specificity than 0,0,1, the former takes
precedence. So the order of the author normal
declarations is changed to the following:

margin: .8em 0;

line-height: 1.4;

line-height: 1.2;
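The specificity comparison can be sketched in Python as well. This hypothetical `specificity` function only understands type, `#id` and `.class` selectors. It returns the three-component a,b,c form used above; CSS 2.1 also has a leading component for style attributes, which is omitted here. Tuples compare element by element, so (1, 0, 1) > (0, 0, 1) in the same way that 1,0,1 beats 0,0,1.

```python
import re


def specificity(selector):
    """Return (a, b, c): counts of ID, class and type selectors.

    Simplified sketch: handles only type, #id and .class simple selectors
    joined by descendant, child or sibling combinators. Attribute
    selectors, pseudo-classes and pseudo-elements are not counted.
    """
    ids = classes = types = 0
    for compound in re.split(r"\s*[\s>+~]\s*", selector.strip()):
        ids += len(re.findall(r"#[\w-]+", compound))
        classes += len(re.findall(r"\.[\w-]+", compound))
        if re.match(r"[A-Za-z]", compound):  # a leading element name
            types += 1
    return (ids, classes, types)


for sel in ["p", "#content p"]:
    print(sel, specificity(sel))

# Tuples compare element by element, just like specificity values.
print(specificity("#content p") > specificity("p"))
```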

Step 4

The fourth and final step of the sorting process involves sorting declarations
which have the same importance, origin and specificity by the order they are
specified in the CSS. This is where the order of the user
normal declarations from step 3 is resolved. In this example, given that I listed the declarations
in the order in which they appeared, no change needs to be made to the above
list.

In cases where there is more than one declaration for a property, the one with
the highest precedence wins, and the others are effectively discarded. This leaves
the following list of declarations to be applied to the element.

text-align: justify;

margin: .8em 0;

line-height: 1.4;

text-indent: 0;

background: blue none;

color: white;
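Steps 2 to 4 can be modelled together with a single sort key of (origin and importance, specificity, source order). After sorting, later entries overwrite earlier ones for the same property. The sample below is a hypothetical subset loosely based on the properties in this example. It assumes the #content p rule set holds the line-height: 1.4 declaration and leaves the other properties out. This is a sketch, not how any browser actually implements the cascade.

```python
# Ascending precedence by origin and importance (CSS 2.1, section 6.4.1).
PRECEDENCE = {
    ("user agent", False): 1,
    ("user", False): 2,
    ("author", False): 3,
    ("author", True): 4,
    ("user", True): 5,
}

# Hypothetical declarations:
# (property, value, origin, important, specificity, source order)
sample = [
    ("text-align", "left", "user", False, (0, 0, 1), 0),
    ("text-align", "justify", "user", False, (0, 0, 1), 1),
    ("margin", ".8em 0", "author", False, (0, 0, 1), 2),
    ("line-height", "1.4", "author", False, (1, 0, 1), 3),
    ("line-height", "1.2", "author", False, (0, 0, 1), 4),
]


def cascade(declarations):
    """Sort by steps 2-4, then keep the last (highest-precedence) value per property."""

    def sort_key(decl):
        _prop, _value, origin, important, spec, order = decl
        return (PRECEDENCE[(origin, important)], spec, order)

    winners = {}
    for prop, value, *_ in sorted(declarations, key=sort_key):
        winners[prop] = value  # later entries overwrite earlier ones
    return winners


for prop, value in cascade(sample).items():
    print(f"{prop}: {value};")
```

In this sample, line-height: 1.4 wins even though it appears earlier in the source than line-height: 1.2. Specificity is compared before source order, so source order only breaks ties such as the two text-align declarations.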

This concludes the series about the cascade, but the related issue of inheritance
still needs to be addressed and I intend to do so at some point in the future.
However, I don’t expect that it will take another year before I do… But who
knows?