Quoting the Quotes

Introduction

This article describes a client-side JScript workaround for a feature missing
from Microsoft Internet Explorer: adding language-dependent quotation marks
around <q> elements in particular, and also around <blockquote> HTML
elements.

Motivation

It happens frequently in the daily life of a webmaster that you need to cite
an external source, write spoken dialogue, or do something similar.
HTML 4.01 and, because of it, XHTML 1.0 define the elements
<q>, <blockquote>, and <cite> to accomplish these tasks.

Visual user agents must ensure that the content of the Q element is rendered
with delimiting quotation marks. Authors should not put quotation marks at the
beginning and end of the content of a Q element.

User agents should render quotation marks in a language-sensitive manner (see
the lang attribute). Many languages adopt different quotation styles for outer
and inner (nested) quotations, which should be respected by
user-agents.

So, according to the specification, the user agent (in our case Microsoft
Internet Explorer 6) should automatically add quotation marks before and after
the text, and even make them language-dependent. Alas, life isn't always as rosy
as the specifications picture it to be: Microsoft Internet Explorer 6 doesn't
support this behaviour at all. Not only does it ignore the language-dependence,
it doesn't add quotation marks in the first place.

There are, of course, ways to remedy this shortcoming, but they all require
some work, and they may not always be equally useful, and some may even cause
inconsistencies with other user agents, e.g. Mozilla and Opera. The list below
sums up a few different approaches.

Manually add quotation marks at the beginning and end of the
quote. This would interfere with user agents adhering to the
specification in that they will display two sets of quotation marks: the ones
the specification requires them to place around such elements, and the ones
that have been placed manually.

Perform the addition of quotes server-side based
on user agent info submitted to the server. This requires that the
hosting provider of your site supports a server-side scripting language of some
kind, for instance PHP, ASP.NET, or JSP. However, there are still
numerous hosting providers that do not provide such a solution, so while it may
be useful for those who have the means, it is not for everyone.

Perform the addition of quotes client-side using
a script. This is the solution I have opted for in this article. While it
does not work for every single user of Microsoft Internet Explorer (it does not
work if the user has turned off JScript support), it will work for everyone
else. More importantly, it will work for everyone who hasn't actively made
changes to their Internet Explorer settings, as JScript support is activated by
default.

Force Microsoft to remedy the
oversight. This is hardly a viable solution, even for a larger group of
people.

Locating the Hunting Grounds

Before we can start building the script we need to be aware of a few things
from the specification:

By default <q> should have quotation marks added.

By default <blockquote> should not have
quotation marks added as it has previously been used to inset text. This
behaviour has been deprecated.

The specification states that it should be possible
to add quotation marks to blockquote elements using Cascading Style Sheets, however,
conveniently, Microsoft Internet Explorer doesn't support this either.

The quotation marks added should be
language-dependent. This means that we need to pay attention to the lang and
xml:lang attributes of elements, and to process them in order of
precedence. According to the XHTML
1.0 specification, §C.7, xml:lang always takes precedence over lang.

As supporting content generation with Cascading Style Sheets would require
writing a script which accurately parses CSS and applies formatting and
content generation to the document elements, we will settle for a simpler
solution: being able to specify in the script whether to add quotation marks to
blockquotes. This should pose no great problem for the webmaster, but you lose
a bit of flexibility.

Investigating the Document Structure

With what we've summarised so far, we should be able to figure out what the
script should do in relation to <q> and <blockquote> elements, but
we also need to get to the elements somewhere, somehow. If we, for a moment,
presume that our webpage is well-formed XML then we might have a structure much
like this:

The more programming inclined of us will invariably recognize this as a tree,
and what better way is there to traverse a tree than to use recursive functions?
In particular, I will be using a preorder traversal of the tree.
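
A minimal sketch of a recursive preorder traversal, assuming nodes expose `nodeName` and `childNodes` the way DOM nodes do. The names `preorder` and `visit` are illustrative and not taken from the script itself, and plain objects stand in for real DOM nodes so that the example can run outside a browser.

```javascript
// Visit a node first, then each of its children from left to right.
// Works on anything shaped like a DOM node: { nodeName, childNodes }.
function preorder(node, visit, depth) {
    depth = depth || 0;
    visit(node, depth);
    var children = node.childNodes || [];
    for (var i = 0; i < children.length; i++) {
        preorder(children[i], visit, depth + 1);
    }
}

// Hypothetical stand-in for a small document tree.
var tree = {
    nodeName: 'BODY',
    childNodes: [
        { nodeName: 'P', childNodes: [{ nodeName: 'Q', childNodes: [] }] },
        { nodeName: 'BLOCKQUOTE', childNodes: [{ nodeName: 'IMG', childNodes: [] }] }
    ]
};

var order = [];
preorder(tree, function (node) { order.push(node.nodeName); });
// order now holds the node names in preorder: BODY, P, Q, BLOCKQUOTE, IMG
```

Because a parent is always visited before its children, any state decided at the parent (such as the current language) can simply be passed down as an argument.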

The diagram above doesn't depict the internal document tree that Microsoft
Internet Explorer generates from the page source entirely accurately, as the
real tree also has text-nodes for text in elements (as far as I can tell,
text-nodes are only generated for block elements, so they should never be
generated for <q>, <a>, etc.). These text-nodes are characterized by having
their nodeName property set to #text.

As we can see from the diagram, an image may be the first element of a
blockquote. An image element cannot contain HTML, so we need to take an
alternate course of action in this case: inserting an extra text-node before
the image. Fortunately, this can be achieved easily using methods on the
blockquote element. The same applies, with the text-node inserted after the
image, if the image element is the last child of the blockquote element.
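
The text-node fallback might look roughly like the sketch below. It relies on the standard DOM methods `createTextNode`, `insertBefore`, and `appendChild`; the helper names `prepend_text` and `append_text` are made up for illustration, and the document object is passed in as `doc` so that the functions do not depend on a global.

```javascript
// Insert a new text node holding `text` as the first child of `parent`.
// `doc` is the owning document (window.document in a browser).
function prepend_text(parent, text, doc) {
    var node = doc.createTextNode(text);
    // With a null reference node, insertBefore appends, which covers empty parents.
    parent.insertBefore(node, parent.firstChild);
    return node;
}

// Insert a new text node holding `text` as the last child of `parent`.
function append_text(parent, text, doc) {
    var node = doc.createTextNode(text);
    parent.appendChild(node);
    return node;
}
```

In a page this would be called with the blockquote element and `document`, for example `prepend_text(blockquote, '\u201C', document)`.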

Languages/Sprache/Sprog

The next big deal to cover before we go overboard and code happily through
the night, is the tiny little phrase in the specification: User agents should
render quotation marks in a language-sensitive manner.

What language-dependence is there to this? Quotation marks are just "..." and
'...', are they not?

It would be much too simple if all languages used the same quotation marks —
life just doesn't work like that! It is, of course, easier for those of us who
speak more than one language to notice this difference in behaviour between
languages.

For instance, in Denmark text is quoted like this: „At være eller
ikke at være.”, or using one of the alternative forms: »At være eller ikke at
være.« In French, guillemets are used to quote text: «Le roi est mort,
vive le roi!» Moving on to other languages, the quotation marks keep
changing. I have not attempted to construct an exhaustive list of
quotation marks for the various languages, nor have I made the script support
languages that are written right-to-left.[1]

It is also possible to have quotations inside quotations. In general this
means using a single-sign version of the outer quotation (except in English). To
simplify matters I have chosen just to alternate the quotation marks as quotes
are nested, and not to support any of the alternate quotation styles for the
various languages. For instance, Danish and Norwegian both have two commonly used
alternatives to the one presented in the table below.

Language          Begin outer  End outer  Begin inner  End inner
American (en-us)  “            ”          ‘            ’
Dansk (da)        „            ”          ‚            ’
Deutsch (de)      „            “          ‚            ‘
English (en)      ‘            ’          “            ”
Français (fr)     «            »          ‹            ›
Norsk (no)        „            ”          ‚            ’
Svenska (se)      ”            ”          ’            ’

The table above has been constructed from the following references: English/American,
Norwegian, German/French, Swedish. Only the
Norwegian reference is an official reference; most language councils do not
publish the language's grammar and usage online (at least not that I was able to
locate). The Danish quotation marks have been taken from the official publication
of the Danish Language Council. If you want to make corrections, give references
to further languages, etc., feel free to contact me.

Harvesting the Fruits

Now that we have come all this way, from reading the specification to
linguistic analysis, we are finally able to construct the script. There are a few
things that we would like to keep optional, so the script is configured through
a few global variables placed at its top. These cover whether to use xml:lang
(so the script can be used with HTML 4.01 as well), which elements to modify
(whether to add quotation marks to both <q> and <blockquote>), what the default
language should be, and finally whether to reset the quotation depth when the
language changes somewhere down the document tree. These four settings
are kept in the variables use_xml_lang, modify_elements,
default_language, and reset_level_on_new_lang.
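
As a rough sketch, the configuration block could look like the following. The variable names and default values are the ones mentioned in this article (see "Customizing the script"); the exact form of the declarations and the comments are my assumption, not a copy of the script.

```javascript
// Reset the nesting depth to the outer level when the language changes (1 = yes, 0 = no).
var reset_level_on_new_lang = 1;

// Honour xml:lang in addition to lang (1 = yes, 0 = no; use 0 for plain HTML 4.01).
var use_xml_lang = 1;

// Elements that receive quotation marks.
var modify_elements = new Array('q', 'blockquote');

// Language assumed when no lang or xml:lang attribute is found.
var default_language = 'en';
```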

Apart from the configuration, the script isn't much more than three
functions: get_quotes, which gets the quotation mark characters
based on a language string; parse_element, which is the work-horse
of the program and takes care of nearly everything (I will cover it in greater
detail in a few moments); and q_fix, which is the entry-level function
that sets up the initial language and begins the tree descent.

get_quotes

get_quotes is by and large fairly uninteresting, as it merely builds
an array with begin/end quotes for both nesting levels and returns it.
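
A possible shape for such a function is sketched below, using the index layout described under "Adding Languages" (begin outer, begin inner, end outer, end inner) and some of the values from the table above. Lowercasing the language code and falling back to English for unknown codes are assumptions of this sketch, not documented behaviour of the script.

```javascript
// Returns [begin outer, begin inner, end outer, end inner] for a language code.
function get_quotes(lang) {
    var quotes = new Array(4);
    switch (String(lang).toLowerCase()) {
    case 'da':
    case 'no':
        quotes[0] = '\u201E'; quotes[1] = '\u201A';
        quotes[2] = '\u201D'; quotes[3] = '\u2019';
        break;
    case 'de':
        quotes[0] = '\u201E'; quotes[1] = '\u201A';
        quotes[2] = '\u201C'; quotes[3] = '\u2018';
        break;
    case 'fr':
        quotes[0] = '\u00AB'; quotes[1] = '\u2039';
        quotes[2] = '\u00BB'; quotes[3] = '\u203A';
        break;
    case 'en-us':
        quotes[0] = '\u201C'; quotes[1] = '\u2018';
        quotes[2] = '\u201D'; quotes[3] = '\u2019';
        break;
    case 'en':
    default:
        // Assumed fallback: unknown languages get the English marks.
        quotes[0] = '\u2018'; quotes[1] = '\u201C';
        quotes[2] = '\u2019'; quotes[3] = '\u201D';
        break;
    }
    return quotes;
}
```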

q_fix

As I have only had the time to test the script with Microsoft Internet
Explorer 6 the function will limit the script to work with this. It should be
fairly straightforward to extend it to other versions if they support the full
range of methods and properties as well.

Next, it queries whether the <html> element has the
xml:lang (if used) or lang attributes set and uses them in order of
precedence. Then it proceeds to examine the <body> tag for the same.
Lastly, it passes the <body> element to parse_element.
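
The precedence rule can be captured in a small helper along these lines. `resolve_lang` is a hypothetical name, and the sketch assumes that `getAttribute` returns null or an empty string for a missing attribute; how Internet Explorer 6 reports xml:lang through `getAttribute` has not been verified here.

```javascript
// Returns the language declared on `element`, or `inherited` if it declares none.
// xml:lang wins over lang when use_xml_lang is enabled.
function resolve_lang(element, inherited, use_xml_lang) {
    var value = null;
    if (use_xml_lang) {
        value = element.getAttribute('xml:lang');
    }
    if (!value) {
        value = element.getAttribute('lang');
    }
    return value ? value : inherited;
}

// In q_fix this would be applied first to <html> and then to <body>, e.g.:
//   var lang = resolve_lang(html_element, default_language, use_xml_lang);
//   lang = resolve_lang(body_element, lang, use_xml_lang);
```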

parse_element

This is probably the most interesting part of the script, as this is the thing
that resolves all elements, places all quotation marks, and well... you get the
picture.

The first part examines the language of the passed element. If it is
different from the language of the parent the new language will be used
(xml:lang or lang, in order of precedence).

The second part examines whether the current element is one of the elements
listed in the modify_elements variable at the top of the script. If
it is we roll out the core logic. Providing it is a <q> element we just
add the begin and end quotation mark to its innerHTML property. The
benefit of <q> is that its contents are severely limited by the DTD (I am
presuming that we are using a strict document model, I haven't tested how well
it holds up to more relaxed DTDs).

<blockquote>, on the other hand, is a great deal trickier, as it is a
block element and as such can contain a lot of elements, including elements that
cannot contain HTML/text themselves, e.g. <img>. The problems with
placing the first quotation mark are mirrored in placing the last quotation
mark within a <blockquote> element, so I will settle for explaining the
first. If the element has no children, its innerHTML property
will have the beginning quotation mark added; else if the first child is a
text-node, it will have the quotation mark added; else if the first child element
can contain HTML, it will have the quotation mark added. As a last resort we will
add a text-node as the first child of the <blockquote> element.
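
The cascade for the opening mark can be written as a small decision function. This is only a sketch: `begin_quote_strategy`, the strategy labels, and the list of elements that cannot hold text are all assumptions made for illustration, and the list is not exhaustive.

```javascript
// Element names that cannot hold text (an assumed, non-exhaustive list).
var EMPTY_ELEMENTS = { IMG: 1, BR: 1, HR: 1, INPUT: 1 };

// Decide where the opening quotation mark of a <blockquote> should go.
// Returns one of: 'own-html', 'text-node', 'child-html', 'new-text-node'.
function begin_quote_strategy(blockquote) {
    var first = blockquote.firstChild;
    if (!first) {
        return 'own-html';        // no children: prepend to the blockquote's innerHTML
    }
    if (first.nodeName === '#text') {
        return 'text-node';       // prepend to the existing text node
    }
    if (!EMPTY_ELEMENTS[String(first.nodeName).toUpperCase()]) {
        return 'child-html';      // prepend to the first child's innerHTML
    }
    return 'new-text-node';       // last resort: insert a text node before the child
}
```

The closing mark would follow the same cascade with `lastChild` in place of `firstChild` and the mark appended rather than prepended.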

Lastly the quotation level will be increased if the element was in
modify_elements. Regardless we will continue with the child
elements of the current element with the newest language and quotation
level.
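
To make the interplay of language and nesting level concrete, here is a simplified model of the three steps. It does not touch the DOM: elements are plain objects of the form `{ name, lang, children }`, text is a plain string, and the function builds a result string instead of modifying innerHTML. All names (`render`, `QUOTES`, `MODIFY`, `RESET_LEVEL_ON_NEW_LANG`) are hypothetical stand-ins for the script's own functions and settings.

```javascript
// QUOTES maps a language to [begin outer, begin inner, end outer, end inner].
var QUOTES = {
    en: ['\u2018', '\u201C', '\u2019', '\u201D'],
    da: ['\u201E', '\u201A', '\u201D', '\u2019']
};
var MODIFY = { q: 1, blockquote: 1 };
var RESET_LEVEL_ON_NEW_LANG = true;

function render(node, lang, level) {
    if (typeof node === 'string') {
        return node;
    }
    // Part 1: pick up a language change on this element.
    if (node.lang && node.lang !== lang) {
        lang = node.lang;
        if (RESET_LEVEL_ON_NEW_LANG) {
            level = 0;
        }
    }
    // Part 2: quoted elements get marks; even levels are outer, odd levels inner.
    var quoted = MODIFY[node.name] === 1;
    var marks = QUOTES[lang] || QUOTES.en;
    var begin = quoted ? marks[level % 2] : '';
    var end = quoted ? marks[2 + (level % 2)] : '';
    // Part 3: children of a quoted element are one level deeper.
    var child_level = quoted ? level + 1 : level;
    var inner = '';
    var children = node.children || [];
    for (var i = 0; i < children.length; i++) {
        inner += render(children[i], lang, child_level);
    }
    return begin + inner + end;
}

var sample = {
    name: 'p', children: [
        'He said ',
        { name: 'q', children: [
            'she said ',
            { name: 'q', children: ['yes'] },
            ' and in Danish ',
            { name: 'q', lang: 'da', children: ['ja'] }
        ] }
    ]
};
var text = render(sample, 'en', 0);
```

In this sample the inner English quote gets the inner marks, while the Danish quote starts again at the outer level because the language changed.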

That is all there is to it, really.

Integrating the Script

Integrating the script into your own pages is fairly painless. All it takes
is an extra line added to your <head> section and calling
q_fix in the onload event of <body>. The
following excerpt of an HTML file shows this:

That should be doable even for the most JScript-phobic webmasters out there
(I hope).

Customizing the script

If you do not wish to reset the quotation nesting if you change language
somewhere down through the document, then find the variable
reset_level_on_new_lang and replace the 1 with a
0.

If you are only using HTML 4.01 and thus don't want to support the
xml:lang attribute then find use_xml_lang and change
1 to 0.

If you do not wish to have quotation marks added to <blockquote> then
find the line modify_elements = new Array('q', 'blockquote'); and
change it to modify_elements = new Array('q');

Lastly, if you write your pages in a different language than English and
don't want to place manual lang attributes everywhere you can find
default_language and change en to the language code of
your choice.

Adding Languages

If the need arises you can manually add language definitions to the script,
or change existing ones. If you navigate to the get_quotes function
you should be able to see something like this:

First off you will want to copy this to a new block and change
'en' to the language code of the language you wish to add, for
instance es for Spanish. quotes[0] defines the beginning outer
quotation mark, quotes[1] the beginning inner quotation mark,
quotes[2] the finishing outer quotation mark, and
quotes[3] the finishing inner quotation mark.

The '\uXXXX' refers to a Unicode character definition. The Unicode
site contains charts which list
the various characters and their numbers. If you have a new language to add, find
the characters in the Unicode charts and then copy their numbers over the
existing numbers.

The break; statement must remain there. It tells the script not
to overwrite your settings for that language with the settings of the next
language.
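
The effect of a missing break; can be seen in the following toy example. The function names are made up, only the outer opening mark is set to keep it short, and the Spanish guillemet is used purely as an illustration of a newly added language.

```javascript
// Deliberately broken: the 'es' case has no break, so execution falls
// through into the 'en' case and the Spanish mark is overwritten.
function outer_begin_without_break(lang) {
    var mark = '';
    switch (lang) {
    case 'es':
        mark = '\u00AB';   // left-pointing guillemet
    case 'en':
        mark = '\u2018';   // left single quotation mark
        break;
    }
    return mark;
}

// Correct: each case ends with break, so the 'es' setting survives.
function outer_begin_with_break(lang) {
    var mark = '';
    switch (lang) {
    case 'es':
        mark = '\u00AB';
        break;
    case 'en':
        mark = '\u2018';
        break;
    }
    return mark;
}
```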

Future Pursuits

There are, of course, always things to improve, always things to add, always
things to do, and never really enough time to do it in (ah, the joys of having
a job). I rarely work with JScript, so I can only guess at the efficiency of
the script, but as far as I can reckon it touches each element only
once, so it should be fairly efficient (we do need to touch every element down
the tree to see whether the language changes). This might be very
inefficient if you have only a few quotes on a page; in that case it might be
more efficient to find just the quotation elements and walk up the document tree
from each one to determine the language.
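
The alternative of walking up the tree could be sketched as follows. `find_lang` is a hypothetical helper that is not part of the script; it assumes each node has a `parentNode` property and that nodes without attributes (such as the document node) lack a `getAttribute` method.

```javascript
// Walk from an element towards the root until a language attribute is found.
function find_lang(element, fallback) {
    for (var node = element; node; node = node.parentNode) {
        if (!node.getAttribute) {
            continue;   // e.g. the document node carries no attributes
        }
        var lang = node.getAttribute('xml:lang') || node.getAttribute('lang');
        if (lang) {
            return lang;
        }
    }
    return fallback;
}

// In a browser this might be combined with getElementsByTagName, e.g.:
//   var quotes = document.getElementsByTagName('q');
//   for (var i = 0; i < quotes.length; i++) {
//       var lang = find_lang(quotes[i], 'en');
//   }
```

Note that this only recovers the language. The nesting level would have to be derived separately, for instance by counting quoted ancestors on the same walk.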

The next step would be to add automatic support for alternate quotation marks
for various languages, and also to expand the list of quotation marks for
languages. The current number of languages is still fairly limited, but with a
bit of luck it can increase steadily. If you want to contribute knowledge of
quotation marks for some language, please include a book and/or web reference so
that I can validate your claims.

Of course, the big pursuit would be to write a custom CSS parser in JScript
that will override the computations by IE so that we can support the content
generation capabilities of CSS2, in particular the :before and :after pseudo
elements. This is, however, a large endeavour to take on and not one that I am
prepared to spend a lot of time on.

Notes and Acknowledgements

[1] Technically speaking we can circumvent this by specifying the
end quotes as the begin quotes in the script, and specifying the begin quotes as
the end quotes in the script. This might, depending on your point of view, be a
slight hack, but as far as I can see, it should work.

History

25th Nov. 2003: Initial release.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Comments and Discussions

Good article, as you point out it is not an ideal solution but then it is not an ideal situation.

There may be a better way of finding all q elements though; getElementsByTagName is very handy. I have not tried it for inserting text before and after the returned elements but I have used it for other things (e.g. stripping, replacing etc.) and it works well.

getElementsByTagName
Returns a NodeList of all the Elements with a given tag name in the order in which they would be encountered in a preorder traversal of the Document tree.
Parameters
tagname
The name of the tag to match on. The special value "*" matches all tags.
Return Value
A new NodeList object containing all the matched Elements.
This method raises no exceptions.

Thanks for your comments. As I point out in the first section of "Future Pursuits" then finding all the "q" elements solely, requires that you for each element walk up the document tree to figure out the language in the current context and propagate this information down to the actual "q" element.

Supposing that many "q" elements reside inside the same block-level element it might be a further optimisation to do write-back with language information on each level down through the tree (that way you have to walk a shorter distance up the tree for the consecutive "q" elements). This is also the optimisation behaviour employed by some disjoint set data structure implementations, for those interested.

So to reiterate, sure, getElementsByTagName is a very good choice if you only want to touch the "q" elements, but since we need to support complete language support, we would have to walk up the tree for each such "q". I guess the article isn't quite clear about my motivation for not using getElementsByTagName.