Tech Notebook

Tech talk alert

These are some technical notes about creating a standards-based genealogy web site, for those with an interest in such things. These news items are not about genealogy, but about web standards, browsers, search engines, software and such that relate to genealogy, standards-based web sites in general and this site in particular. The background knowledge assumed by the individual tech notes varies wildly; each post generally assumes enough background knowledge to keep the post relatively short.

Safari for Windows Public Beta Font Fix

2007Jul12

After some brief analysis of the Safari font problem, I can explain the problem in broad terms and, more importantly, present a Quick and Dirty fix you can apply immediately.

the problem

The problem is that Safari's current font handling is not just broken, but broken by design. The Apple programmers made several rather basic user interface and programming mistakes.

They do not respect the user's default font settings, but hardcode Lucida Grande as the font for menus. That is not their only violation of good user interface design in general and the Windows User Interface Guidelines in particular, but I for one am willing to forgive all of it in a Beta. The current Safari for Windows seems a rather straight port that simply still needs to be adapted to the Windows platform conventions.

They apparently do not bother to check font handles for validity. Surely they would have defaulted to the system menu font if they did? That they do not even check handles makes you really wonder how reliable the rest of the code is...

They do include the necessary font files in the distribution, but do not install them. That is the kind of problem you do expect in hardly tested Beta code, but then again, not the kind of problem you expect in a Public Beta.

the real fix

The real fix is not for Apple to install the font files properly - although they should certainly do so - but to respect the user's font settings, and default to the Windows system font if they cannot get a valid font handle.

the Quick and Dirty fix

There are two things you need to do: you need to install the fonts and, unbelievably enough, you also need to make Safari understand that you did.

Installing the fonts

The font files are in the Safari resources directory, %PROGRAMFILES%\Safari\Safari.resources.
Find the files “Lucida Grande.ttf” and “Lucida Grande Bold.ttf” in that directory. Both are about 1 MB in size.
Copy them into the Windows font folder (%FONTS%, a.k.a. %WINDIR%\Fonts, typically C:\Windows\Fonts); you will see a copy dialog appear that confirms that Windows is not just copying files, but in fact installing fonts.

make Safari understand

Alas, even with the two fonts properly installed, Safari still does not display any text. Safari comes with its own font list file, a file that lists fonts, and that list does not mention these fonts. You must add the fonts to Safari's font list yourself.

Safari's font list is in a file named Fonts.plist. Its location depends on your user name. You can find it in the directory C:\Documents and Settings\%USERNAME%\Local Settings\Application Data\Apple Computer\Safari. The Fonts.plist file is an XML file. If you do not have an XML editor, you can view and edit it with Notepad. When you view it, you will see a long list of fonts, in alphabetical order. What you need to do to make the Safari Public Beta work is insert two lines in that file for the two Lucida Grande font files, and insert them in their proper alphabetical position.

If your font folder is C:\Windows\Fonts, the two lines you need to insert are:

Now, if you are a developer, you will probably just insert these two lines manually. But for everyone else, there is an easier way, a method that does not require you to edit any XML file.

After some experimentation, I figured that the easiest way to get it right without risking messing up the font resource file would be to deinstall and reinstall Safari; now that the fonts are installed in Windows, the Safari install will create a new Fonts.plist file that includes those two fonts. And once that's done, the Safari menus actually work. In fact, you do not even need to deinstall it; you can just run the installation program again and choose the “Repair” option. It will recreate the Fonts.plist file; you just need to either delete or rename the old one first.
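The rename step before the Repair run can also be scripted. Here is a minimal sketch in Python, assuming the font list is named Fonts.plist and lives in the directory described above. The helper name set_aside and the .bak suffix are my own choices for illustration, not anything Safari prescribes.

```python
import os
from pathlib import Path


def set_aside(plist: Path, suffix: str = ".bak") -> Path:
    """Rename a file out of the way and return the new path.

    The original is kept as a backup rather than deleted, so the old
    font list can be restored if the Repair run goes wrong.
    """
    backup = plist.with_name(plist.name + suffix)
    if backup.exists():
        # Refuse to overwrite an earlier backup.
        raise FileExistsError(f"backup already exists: {backup}")
    plist.rename(backup)
    return backup


if __name__ == "__main__":
    # Assumed location and file name; adjust them to your own setup.
    safari_dir = Path(os.path.expandvars(
        r"C:\Documents and Settings\%USERNAME%\Local Settings"
        r"\Application Data\Apple Computer\Safari"))
    font_list = safari_dir / "Fonts.plist"
    if font_list.exists():
        print("Set aside as", set_aside(font_list))
    else:
        print("No font list found at", font_list)
```

Renaming instead of deleting costs nothing and leaves a way back; after running it, start the Safari installer and choose Repair as described.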

share and enjoy

The use of a font list file is hard to understand. What is it good for? Windows offers font enumeration functions. Surely Apple does not expect me to delete their file and "Repair" Safari every time I install or remove a font? I know, I know, it is a Beta, but it is a Public Beta. More than that, it is a Public Beta for which Apple is deliberately drumming up a lot of media attention. This Beta is their first version ever for the Windows platform. There is no earlier non-Beta release for curious consumers to default to...

I feel quite sure that Apple is already regretting the decision to push this early version out as a Public Beta. As I write this, the net is already buzzing about multiple security flaws in Safari for Windows that all seem to result from sloppy coding, and Steve Jobs's claims that Safari is considerably faster than Opera, Firefox and Internet Explorer are getting slammed left and right.

I dare say that not everyone who downloads the Safari for Windows Public Beta will be prepared for a genuine Beta experience. Still, if you are a Windows-based web developer, you may want to give it a whirl to check your own web site out in Safari.

Safari is Apple's web browser, and WebKit is the browser engine it is built on. The WebKit engine was originally developed for Mac OS X, but has now been ported to Windows. There are bound to be some small rendering differences between Safari for Mac OS and Safari for Windows, but you can now test your website against Safari for Windows on Windows XP. It is even Windows Vista compatible, or so Apple claims. My experience with the current Beta is that it does not even work on Windows XP.

Like Opera and Firefox, Safari is a standards-based browser, and Apple offers a Safari developer page to help you with Safari-specific stuff, such as user-agent strings and the current level of CSS support, but also more general stuff such as web development best practices.

There are two editions to choose from: with or without Apple QuickTime. I opted for the full package. When you try to install, Apple will offer to install Apple Software Update and their Bonjour Service.
According to the installation dialog box, the Bonjour service detects shared resources on my network and makes them available in Safari. Hm, I thought Windows Explorer did that already. I may have a look at it later, but it does not sound like I really need or want this, so I bonjoured the Bonjour check box.

The first time I started Safari for Windows, it displayed the apple.com web site just fine, but all the menu text was missing, and blindly entering another web address did not work either. Even after a restart of Windows, Safari was still not displaying any text...
I have already seen various public complaints about Safari font behaviour, some of these blaming Apple for rushing alpha code out of the door just to be in time for the WWDC keynote. Oh well, it is a Beta product, and with a flood of complaints about the broken font support heading Apple's way, it is likely to be fixed soon.

Rating labels update

2007Apr14

I have updated the rating labels. An update was overdue. The Ladies Jolink page is about World War II persecution and the Geerhard Jan Jolink page is about his senseless murder. In rating systems, these topics are considered technical and news references to intolerance and violence.

I originally created a SafeSurf label for the domain. I have updated this SafeSurf label, and added ICRA labels.

All pages share the same ICRA label, even though they do not all share the same rating. The nice thing about the ICRA method is that it allows you to specify a label for most of your site and then specify the exceptions, all in one questionnaire. You can even give helpful names to these exceptions. The other often used system, SafeSurf, wants me to either label my entire domain or generate an individual label for each page. It does allow you to override the site rating for individual pages, and I have now taken advantage of that capability.

Understandably, neither system works well with the cheap frameset redirection I use. Neither organisation provides guidance on what to do. I decided to update the SafeSurf label for my domain and apply an ICRA label for its actual location.

The really nice thing about the ICRA system is that they generate two labels for you, a modern Content Label and a traditional PICS label. The Content Label is an RDF-based format being developed by the W3C to replace the older PICS standard. Although not a standard yet, several browsers support it already. The PICS label is provided for browsers that do not support the newer Content Label system yet.

Tip: The tags that SafeSurf generates do not validate! The meta tags they generate are not closed. You should correct this before applying the tag to your site. The ICRA meta tags are fine.
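Closing the generated meta tags by hand is easy enough, but it can also be done mechanically. The following is a rough Python sketch, assuming XHTML-style mark-up in which empty elements must be self-closed. The function name close_meta_tags is made up for this example, and the meta tag in the usage note is a placeholder, not an actual SafeSurf label. It also assumes that attribute values contain no ">" character.

```python
import re

# Matches a <meta ...> tag that does not already end in "/>".
# Assumption: attribute values contain no ">" character.
_UNCLOSED_META = re.compile(r"<meta\b([^>]*?)\s*(?<!/)>", re.IGNORECASE)


def close_meta_tags(markup: str) -> str:
    """Rewrite every unclosed <meta ...> as a self-closed <meta ... />.

    Tags that are already self-closed are left untouched. The tag name
    is written in lower case, as XHTML requires.
    """
    return _UNCLOSED_META.sub(r"<meta\1 />", markup)


if __name__ == "__main__":
    # Placeholder tag for illustration only.
    print(close_meta_tags('<meta name="example" content="placeholder">'))
```

Run the result through a validator afterwards; a regular expression is a blunt tool for mark-up, and this one is only meant for the simple generated tags discussed here.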

Genealogie Werkbalk Alert

2007Apr11

I think I first tried the Genealogie Werkbalk more than a year ago when its version number was still 0.x and it was for Internet Explorer only. Nowadays there is a Firefox variant as well. It is a toolbar-sized add-on that offers a menu of links to various, mostly Dutch, genealogy web sites, most notably the various web sites of its creator.

A few weeks ago I tried version 1.1 for Firefox. It was a total disaster; it was hard to click a button, as I had to click close to the right edge of a button to register a click at all. As soon as I clicked anywhere within the browser window, it turned grey, and whatever I tried in the window or the menu, whenever I clicked, a message stating “TypeError: window._content has no properties” popped up. I could not even enter a new web address; all I could do was terminate the browser session and restart it. A few tries confirmed that this defect happened every time, so I did the only sensible thing and deinstalled it.

toolbar impacts browser behaviour

I did not give it much thought again until a few days ago, when a fellow genealogist told me the menu on my web site does not work for him. I was rather surprised. After all, I use nothing but web standards on this site, the CSS hover technique used for the menu is reliable and even quite common by now, I do not rely on JavaScript, and I have taken steps to ensure that the hover works in Internet Explorer. I inquired after his browsing setup and promptly got this reply: Internet Explorer 6 with the Adobe Acrobat plug-in and the Genealogie Werkbalk.

I installed version 1.1 of the IE variant in Internet Explorer 7 and noticed no immediate problems. Things seemed to slow down a bit, but the browser seemed otherwise unaffected. Using Microsoft's Virtual PC image, I then installed it in Internet Explorer 6 - and it promptly messed up the browser's behaviour; the web site menu no longer worked. When I deactivated the add-on, the browser behaved normally again. I used the same Virtual PC image to install Firefox 2.0.0 again without downgrading my real setup - and confirmed the problems I experienced earlier.

You may never have heard of this particular toolbar, but it is fairly popular with Dutch genealogists, and fellow researchers are part of this site's audience, so its defects are a real issue.

The toolbar defects do not seem to impact Internet Explorer version 7. This is probably because the Internet Explorer team has overhauled the Add-On Manager in Internet Explorer 7 to improve browser stability. I am guessing it is now compensating for several common programming mistakes known to impact earlier versions.

update: cursory analysis of the defect

The Firefox variant can easily be inspected. The manifest file install.rdf contains the GUID daf44bf7-a45e-4450-979c-91cf07434c3d, which seems to be an example value from the Mozilla documentation that should probably never be used for an actual toolbar, as it leads to multiple toolbars having the same identifier...

A quick browse of the sources suggests that these were created with Softomate ToolBar Studio, version 2.17.0.4. ToolBar Studio is a product for quick creation of browser toolbars, first released in 2005. Originally aimed at the creation of Internet Explorer toolbars, it now comes with limited Firefox support. This tool makes creating toolbars very easy, perhaps too easy, as there is no need to learn anything or understand what you are doing. The regname value XBTB06823 found in Genealogie Werkbalk appears in toolbars based on the included like_google.cab example. When I created a toolbar based on that sample, using the current version of ToolBar Studio, version 3.0.1.0, I experienced no problems in Internet Explorer 6, Internet Explorer 7 or Firefox 2.0. So it seems the Genealogie Werkbalk defects originate with either the older version of the tool or the modifications made to the example.

Visual Link Check

2007Apr10

Over the past few days, I have been doing a visual link check. Automated tools are happy when they find a page, any page, so a visual check is necessary from time to time. While following each and every link on the One Name Research page, I found myself correcting about a dozen edit errors I had made over time, some as serious as not using the right link. I updated another half dozen links, including one old link that now pointed to a sex portal that's abusing the popularity built by the former owner of the domain. I emailed two webmasters with quick fixes for issues that prevented display of their site. I finished with a quick count and an automated check that all the current links work fine; there are 442 links to mostly Dutch one name studies there - and they all lead to working websites.

NoScript adds blacklist

2007Mar25

NoScript has just been updated. NoScript already offered a whitelist of sites you trust to execute scripts. It now also features a blacklist, for those annoying domain names that keep popping up again and again, all over the web. Now you can block those ad kings once and for all, and never be bugged about them again.

Tidy as Mark-up Validator

2007Mar22

You might think the HTML Validator for Firefox gives you a good idea of the quality of the web, but it does not. The actual quality of a page is often considerably worse than Tidy indicates.

Tidy was not created as a validator, but as a utility to tidy up sloppy code, to turn a legacy of tag soup into valid mark-up. It was conceived as an auto-fixing syntax checker, not as a mark-up validator. The problem is not that Tidy's fixes are best guesses and should be checked by both a human and a real validator - for a published page, all such problems should be in the past already. The real problem is a design decision that seriously reduces Tidy's usability as a validator.

errors are warnings

The problem is that Tidy classifies all the errors it can fix as warnings. It makes perfect sense to classify the errors it can fix as Tidy-fixable errors, to have you concentrate your efforts on the non-fixable errors. It is great that Tidy can fix erroneous mark-up, but it makes no sense to classify fixable errors as mere warnings. That Tidy can fix many errors is cool, but that does not mean those errors are not errors anymore. Errors and warnings are still two different things. Mark-up is either in error or it is not.

When you are just using Tidy to clean up your legacy mark-up, you are probably happy to have the tool do much of the fixing for you. Yet the problems start there already, as Tidy's error and warning counts mislead you about the actual quality of the legacy mark-up. Still, the erroneous classification is no problem if you let it fix all it can fix. It is merely ironic that Tidy belittles its own abilities by suggesting that it never fixes any errors.

warnings are errors

Once you start using Tidy as a validator, as HTML Validator for Firefox does, the design mistake becomes apparent. Because Tidy lumps many actual errors in with the warnings, the extension's overall status for a page will often be warning when it should really be error. The practical upshot is that you come to associate the warning icon with erroneous mark-up.

deliberately empty tag

If you visited my valid pages with the extension installed a few weeks ago, and saw the warning status, you were likely to think the page has errors. Tidy used to complain about the following bit of mark-up with the message Warning: trimming empty <...>.

<span class="checkmark"></span>

That mark-up is fine. It is a deliberately empty tag. The stylesheet uses the CSS pseudo-element :after to insert a checkmark (✔) there. That's admittedly a roundabout way of adding it, but it is perfectly legal. In fact, if the extension were just a bit smarter, it might suppress the empty element warning, not just because it is a <span> tag, but because it looks for the :before or :after pseudo-element on the applied class.

quick fix: invisible filler

I wanted a quick fix, to suppress the warning message without rethinking the style sheet. I did so by putting something innocuous into the empty tag. An encoded space (&#032;) is nearly perfect; nearly, because a space has width, and to make matters worse, just what the width of a space is can vary with the font, the surrounding characters and the browser.

The ideal filler character is code point U+200B, the Zero Width Space, but that character is not included in the Arial font and, well, the brief version of the story is that Internet Explorer 5 and 6 have problems dealing with characters that are not in Arial. Within the WGL4 character set supported by Arial 2.0 and later, code point U+00AD, the Soft Hyphen, seems the best choice - and good enough to do without a separate Internet Explorer style sheet. Inserting that character into the previously empty tag solved the issue; the page looks no different, but Tidy is happy now.

<span class="checkmark">&#173;</span>
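If you want to double-check which character a numeric reference stands for, Python's standard library will tell you. This small sketch looks up the three filler candidates discussed here (the space, the soft hyphen and the zero width space) and confirms that &#173; decodes to the soft hyphen. The helper name describe is only for this example.

```python
import html
import unicodedata

# Filler candidates for a deliberately empty element:
# space, soft hyphen, zero width space.
CANDIDATES = ["\u0020", "\u00ad", "\u200b"]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for ch in CANDIDATES:
        print(describe(ch))
    # The decimal reference used in the mark-up is the soft hyphen.
    print(html.unescape("&#173;") == "\u00ad")
```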

Still, all is not well. The problem remains that even brief experience with Tidy's erroneous classification soon makes you think of its warnings as errors, and of all pages with warnings as broken pages.

Tidy-as-validator

Only when I "fixed" the perfectly fine mark-up by adding a superfluous character to get rid of the warning did HTML Validator for Firefox tell me that I could use the logos for valid mark-up. The decision behind that particular logic is quite understandable; most of Tidy's apparent warnings are really errors, and you do not want to allow valid mark-up logos on erroneous pages. Then again, not all of Tidy's warnings are fixable errors that have been misclassified. Some of its warnings really are warnings and it is wrong to "fail" mark-up validation and withhold a valid mark-up logo because of a mere warning. Every valid page deserves to be recognised as such, and that is not what Tidy-as-validator is doing right now.

not always a "fix"

An empty tag is not an error, but it is a good thing to warn about during testing. I demonstrated that the empty tag warning is one Tidy warning I can suppress, but not all of Tidy's warnings can be suppressed. One of my pages has a nested quote. It is a perfectly valid nested quote. There is nothing wrong with it. Tidy quite reasonably warns: nested <q> elements, possible typo. There is nothing wrong with that either. There seems to be nothing I can do to get rid of that warning. Tidy-as-validator treats the warning as yet another misclassified error and withholds its approval from the mark-up. HTML Validator for Firefox not only withholds the logo, but also shows the warning icon, which is likely to make you think the page has errors. All this is not how things should be.

tidy up Tidy

I modified my mark-up to suppress a Tidy-as-validator warning to demonstrate a "fix", but I have a good mind to change it back. After all, the original mark-up is valid and that soft hyphen does not belong there. Creating working valid mark-up and style sheets is hard enough without trying to accommodate a tool that is broken by design. It is not the original mark-up that needs to be fixed. The original mark-up is valid, and a validator has to recognise that.

It is Tidy-as-validator that should be fixed. That fix has to start at the Tidy design, by proper classification of all issues into unfixable errors, fixable errors, and warnings. Once the issues are properly classified, Tidy-as-validator can show icons for pages with errors, pages with Tidy-fixable errors, and error-free pages. It can continue to show warnings, but it can stop treating all warnings as misclassified errors, and make a proper validation decision.
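The decision logic proposed above fits in a few lines. This Python sketch is only an illustration of the idea; the names Kind, Status, Issue and page_status are made up here and are not part of Tidy or the extension. The one thing it is meant to show is that warnings alone never fail a page.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    UNFIXABLE_ERROR = "unfixable error"
    FIXABLE_ERROR = "Tidy-fixable error"
    WARNING = "warning"


class Status(Enum):
    ERRORS = "errors"
    FIXABLE_ERRORS = "Tidy-fixable errors"
    ERROR_FREE = "error-free"


@dataclass(frozen=True)
class Issue:
    kind: Kind
    message: str


def page_status(issues: list[Issue]) -> Status:
    """Pick the page status from properly classified issues."""
    kinds = {issue.kind for issue in issues}
    if Kind.UNFIXABLE_ERROR in kinds:
        return Status.ERRORS
    if Kind.FIXABLE_ERROR in kinds:
        return Status.FIXABLE_ERRORS
    # Warnings may still be listed, but they do not fail validation.
    return Status.ERROR_FREE


if __name__ == "__main__":
    nested_quote = Issue(Kind.WARNING, "nested <q> elements, possible typo")
    print(page_status([nested_quote]).value)
```

With this split, a page that only triggers the nested quote warning keeps its error-free status, while the warning itself remains visible during testing.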

This free Firefox extension checks every page you load into Firefox. It displays an icon on the status bar that indicates the quality of the page. The three status icons are briefly described as a green checkmark, a yellow exclamation mark and a red cross. The extension shows the green checkmark if your site checks out fine, it shows the yellow exclamation mark if there are warnings, and it shows the red cross if there are mark-up errors. You can switch the status bar display from just the icon to the icon with the error and warning counts, so that you don't need to click anything at all. Whenever you load pages for a visual check, you get to see Tidy's judgement of the code - and so does everyone else who uses it.

Because it works locally, on the page loaded into the browser, the Tidy extension is hardly a drain on system resources. Tidy's small status bar icon does not distract either. There simply is no reason to ever uninstall it, or even switch the status bar display back from an icon with counts to just an icon. In fact, it is kind of interesting to see the mark-up quality of the web:

HTML Tidy for Firefox validation results

website | Tidy errors | Tidy warnings | SGML errors | SGML warnings | Combined errors | Combined warnings

Results obtained with HTML Validator 0.839 on 2007 Mar 20. You can click the sites to see how well they do today.

Yes, Google did manage to cram that many errors and warnings into its signature near-empty page. I only used google.us, because Google has the annoying habit of switching me to google.nl when I specify google.com. This link should provide the same results for you and me.

HTML Validator 0.839 beta

The current version of the validator extension is 0.795. That version uses the Tidy validator exclusively. I made the above table using the beta for HTML Validator 0.839. The 0.8x beta adds OpenSP validation. OpenSP is the SGML parser used by the W3C mark-up validator. The validator extension now gives you a choice of validators: Tidy, OpenSP or serial: Tidy followed by OpenSP.

Although its name has always been HTML Validator for Firefox, it has often been called Tidy for Firefox. Now that it supports two different validators, that just isn't right anymore.

The current beta does not add up the Tidy and OpenSP counts as I did in the table. When you opt for serial execution of Tidy and OpenSP, it simply continues to display the OpenSP counts once it's done. It would be better to display a combined count. Simply adding the two counts is not perfect, but it sure beats having a good OpenSP validator score hiding a bad Tidy score.
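To make the difference concrete, here is a tiny Python sketch of the combined count. The Counts type and the numbers in the usage example are invented for illustration; they are not results from any real page or from the extension itself.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    errors: int
    warnings: int


def combined(tidy: Counts, opensp: Counts) -> Counts:
    """Add the two validators' counts, as in the Combined table columns."""
    return Counts(tidy.errors + opensp.errors,
                  tidy.warnings + opensp.warnings)


if __name__ == "__main__":
    # Invented example: Tidy finds problems, OpenSP finds none.
    tidy, opensp = Counts(3, 12), Counts(0, 0)
    print("last validator only:", opensp)
    print("combined:", combined(tidy, opensp))
```

Showing only the last validator's counts would present this invented page as clean; the sum does not.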

fast validation

The inclusion of OpenSP, the W3C validator code, in the validator extension is a welcome relief. All my pages display a link to the W3C validator, and I always used that to validate them. Lately however, the W3C validator service seems to be down a lot. I hope the frequent downtime is caused by its growing popularity, but now that I have this extension I do not have to worry about it anymore. The extension performs all its validation locally for instant results. You don't need the W3C validator service, you don't need to upload anything, you don't even need a live Internet connection; you can use this on a disconnected laptop.

Starting with this version, the validator extension comes with a choice of icons to display on pages that validate without errors or warnings. Unlike most such images, these are not 88 x 31 pixels, but 78 x 32 pixels. In addition to the W3C validation logos for passing the OpenSP validation, it offers a “Tidy” logo for pages that pass Tidy's validation, and an “HTML” logo for pages that pass both Tidy's and OpenSP's validation.

Internet Explorer 5 is dead

2007Mar05

Interestingly, the blog entry about the updated Windows XP with Internet Explorer 6 image notes that no Internet Explorer 5.5 image will be made available, because browsers older than Internet Explorer 6 are hardly used. All versions of Internet Explorer before version 6 constitute less than 1 percent of all browser usage.

That is an important remark. Microsoft is effectively, but quite officially, saying that we do not need to support Internet Explorer 5 anymore. The Internet Explorer team is stating that Internet Explorer 6 has become the minimum version to test with. That amounts to an official declaration that Internet Explorer 5 is dead now.

Updated Windows XP with Internet Explorer 6 image

2007Mar05

Following the earlier release of a free Virtual PC hard disk image of Windows XP with Internet Explorer 6 that bombs on 2007 April 1, the Internet Explorer team has released a new hard disk image. No other official solution has been made available yet, and the latest post in the Internet Explorer team blog suggests that they will just continue to release images. That is disappointing.

The World Wide Web Consortium has announced the creation of a new HTML Working Group to create a successor to HTML 4.01. That successor may very well receive version number 6 to avoid confusion with the HTML 5 specification put forward by the WHATWG. It is apparently in response to that group, that the W3C is now putting resources behind classic “tag-soup” HTML as well as the XML-based XHTML.

The Architectural vision for HTML/XHTML2/Forms admits that there has been little incentive for web developers to improve their code and leave tag-soup behind. It is true that XHTML itself offers few immediate benefits, but it provides a platform that other technologies build on. The increasing adoption of Cascading Style Sheets for flexible browser-independent layouts is a major reason for many web developers to shape up and start creating well-formed code, simply because it is hard to apply a style to some element, sequence of elements, or combination of elements if you do not make clear where those elements start or end. The complete omission of even the briefest mention of CSS, even though its existence is the reason for the deprecated status of many HTML elements, suggests unawareness of this growing impetus behind the adoption of XHTML.

The W3C will continue to develop the much criticised XHTML 2 alongside the new HTML version, but is likely to develop it more along the practical lines of the WHATWG instead of the theoretical lines pursued in the past. Hopefully, a serious effort will be made to fold the somewhat renegade HTML 5 specification into the new W3C specifications, and keep the strong correspondence between clean HTML and XHTML (well-formed HTML 4.01 and XHTML 1.0 Transitional are so similar that conversion from one to the other is relatively trivial). If the new working group is successful in pulling this off, the future specifications are likely to be known as HTML 6 and XHTML 6.

New Year 2007: site refresh

2007Jan07

The site has been refreshed. It took hard work throughout December to make this happen.

There are small changes to the layout.

There are still very few bitmaps, but all bitmaps auto-resize with the size of the browser or the size of the font.

All links have been checked and many have been updated, sometimes with links to the Internet Archive.

Many items have been given identifiers so you can link directly to those items instead of just the page.

To support resizing of bitmap images with the font size, most bitmaps are currently twice as high and twice as wide as the size they initially display in.

Specifying the CSS version

2007Jan06

The current style sheet uses a few CSS3 features. If a browser does not support those, they will be ignored. That's how CSS has been designed. There is no way to specify the CSS version used, because you do not need to specify it. At least, that's the official theory.

A small practical problem is that the W3C's CSS Validator defaults to the latest official version, which is still CSS 2.1, and does not ignore, but complains about newer features. As a result, use of CSS 3 features - even when these are no longer expected to change, and some browsers already implement them - results in your cutting-edge style sheets being flagged as invalid CSS.

The trick to validating CSS3 already is to specify version 3 when validating. The validator does support version 3. The validator pages do not tell you how to do this, but after a bit of experimenting I figured out that you can specify version 3 like this:

http://jigsaw.w3.org/css-validator/check/referer?profile=css3
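If you generate your pages with a script, the link can be built rather than typed. Here is a minimal Python sketch, assuming only the base address and the profile parameter shown in the link above; the function name validator_link is made up for this example.

```python
from typing import Optional
from urllib.parse import urlencode

# Base address of the referer-based check, taken from the link above.
VALIDATOR = "http://jigsaw.w3.org/css-validator/check/referer"


def validator_link(profile: Optional[str] = None) -> str:
    """Build the validation link, optionally pinning a CSS profile."""
    if profile is None:
        # No profile: the validator falls back to its own default.
        return VALIDATOR
    return VALIDATOR + "?" + urlencode({"profile": profile})


if __name__ == "__main__":
    print(validator_link("css3"))
```

Keeping the profile in one place means the version can be changed or dropped everywhere at once.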

There is one small issue with this approach: the version number is now embedded in the validation link. This is not an immediate problem; the link will continue to work fine when CSS 3 becomes official. It will even continue to work fine when the latest version is 3.1 or 4.0. I can change the links back as soon as the validator defaults to version 3. The validation will only fail when I try to use even newer features, and by that time, the problem is easily solved by changing the links again.

Free Windows XP with Internet Explorer 6 Virtual PC image

2006Nov30

A major problem with Internet Explorer 7 is that it replaces Internet Explorer 6. You cannot keep Internet Explorer 6 installed when you install Internet Explorer 7. The average user does not care, but web developers have complained loudly. They want to install both versions on the same system, to quickly switch between versions to make sure their web site works fine in both.

Microsoft has acknowledged the problem and given in to their demands. Microsoft now offers the Internet Explorer 6 Application Compatibility VPC Image. It is a half gigabyte download, which you need to unpack into a 1½ gigabyte Virtual PC hard disk image. That virtual hard disk actually contains a full, licensed and activated copy of Windows XP Service Pack 2 (and high-priority fixes through 2006 November) with both Internet Explorer 6 and the Internet Explorer 7 Readiness Kit installed on it.

According to the readme.txt, the license expires on 2007 Mar 30, and you are recommended to back up any important files before April 1. The readme.txt is probably in error. According to the download page and the IE blog, the image expires on April 1, so "March 30" should probably read "March 31".

You will need Virtual PC 2004 or later to use the image, and that is a free download too.

Tips: You will probably want to create a new virtual machine and then add the virtual hard disk to it. Choose to use at least 256 MB of RAM instead of the ridiculously low 128 MB the Virtual PC wizard recommends. You do not need to log in, but you probably want to create a shared folder to access local files from within the virtual PC. Make sure to create a permanent mapping to that folder instead of a temporary one.

If you do not want to sacrifice so much hard disk and RAM to Internet Explorer 6, you can also try the unofficial standalone Internet Explorer 6.
Manfred Staudinger's Taming Your Multiple IE Standalones explains how it is done. Be warned that it works up to a point, and is not without problems. I just went ahead, and found that this works for a quick check of the layout, but that the hover effect was broken; when you run Internet Explorer 6 standalone alongside version 7, IE 6 seems to think it is IE 7, and processes conditional comments accordingly, so my hover fix is not loaded. You can fix that by removing Internet Explorer registry keys…

I decided to download and install Internet Explorer 6 on an older machine. I had already upgraded that machine to Internet Explorer 7, but this kind of problem is exactly why I keep older machines around.

Google Site Map becomes de facto standard

2006Nov15

Google's Grace Kwak announces that both Yahoo! and Microsoft are now supporting the Google Sitemaps specification.
With this support from Google's major competitors, Sitemaps (sans Google) is now a de facto industry standard.

Google's sitemap site still exists, but there is a new Sitemaps site now.

Microsoft creates Outlook HTML problems

2006Nov08

Microsoft has replaced Outlook's HTML rendering engine. Until now, Outlook always used Internet Explorer's rendering engine. Trident, Internet Explorer's rendering engine, is available to other applications as the WebBrowser component. Several Microsoft and third-party applications use this, including Microsoft Outlook. The practical upshot until now was that the Trident engine was the de facto standard across many Windows applications, and that if something worked in Internet Explorer you could expect it to work in Outlook. That is no longer the case. You can no longer rely on Internet Explorer to test HTML emails meant for Outlook users.

Instead of relying on the improved engine available in Internet Explorer 7, Microsoft Outlook 2007 uses the Office HTML rendering engine. The practical upshot is that Outlook 2007 is less HTML-capable than previous versions. When it comes to HTML rendering, Outlook 2007 is not backwards compatible.

why oh why

Why oh why did the Microsoft Office team do this? Briefly, because Outlook is no longer using its own editor at all, but is using a stripped-down variant of Word. The stripped-down Word is better than the old editor. Moreover, you do not need to have Word installed to have the familiar Word interface and features in Outlook, and you no longer have to wait while Word takes its sweet time to load. The stripped-down variant really is part of Outlook, and loads quickly. It provides Word's HTML capabilities to Outlook 2007. So, the reasoning apparently goes, because the editor already has HTML capabilities, Outlook does not need to rely on the Internet Explorer engine. Now, avoiding the Internet Explorer engine makes some sense considering its infamously spotty security record, but Internet Explorer 7 is supposed to solve all that, and its engine would maintain backwards compatibility.

One alleged benefit of this switch to the Office HTML rendering engine is that the HTML rendering no longer depends on the particular version of Internet Explorer that has been installed. That is a vacuous argument. It is a drawback as well, with Outlook no longer taking advantage of Internet Explorer engine updates and fixes, but requiring separate fixes. Besides, you still have to deal with different versions of Outlook. In fact, you now have to deal with different versions of Outlook in addition to different versions of Internet Explorer. And, oh, multiple versions of Outlook in addition to multiple versions of Word. This change is definitely not an improvement from a version management viewpoint.

This change effectively pushes Yet Another Broken HTML engine (YABHE) onto the web, and one that is clearly less capable than Internet Explorer 5. The Microsoft Office team may not be out to destroy web standards, but it sure seems that they do not care much about the quality of the Office HTML rendering engine. Perhaps one reason for that is they do not consider it important, because they see Office's OpenXML and XPS, codename Metro, as the electronic document formats of choice.

Internet Explorer team

By the way, judging by this unedited Q & A excerpt from the 2006 June 8 Internet Explorer chat, the Internet Explorer team seems to have been painfully unaware of this major Outlook change during most of the development of Outlook 2007:

Max Stevens [MSFT] (Expert):

Q: I installed Outlook 2007 beta2 along with IE7 beta2. I noticed that some HTML messages appear different than it was in outlook 2003 with IE7. Do outlook 2007 makes different use of IE engine and how?

A: One of the largest changes that you might have seen with Outlook 2007 is that they install a new font, segou (to be honest, I forget the exact spelling at the moment). Perhaps that's the difference in rendering that you're seeing?

Q: [Q11] no it's not different font (font is nice though :)). The borders are different too, and it feels like some CSS and CSS2 stuff was scraped from the message during rendering, so CSS decorated text and other objects are shown different.

A: At least from an IE perspective, we don't treat office 2007 differently from any other version of Office. Office themselves might have updated how they integrate with IE, but the office team are the experts there.

limitations overview

The step back in HTML capabilities is a serious one. Here is a brief overview, deliberately broken into two columns:

Left column:

no background images

poor background colour

no images for list bullets

no CSS position, float or clear

broken box model

Right column:

no deprecated HTML elements

no animated GIFs

no JavaScript

no applets

no forms

no Flash (or other plug-ins)

no frames

Now, I am no big fan of HTML email in general. Many email clients quite correctly refuse to display any HTML email until you green-list the sender or at least decide to allow that particular email right now, for security reasons. Then again, HTML is a widely supported standard for richly formatted documents, so it is a good choice if you want to break out of the text-only mold. A major benefit over other formats such as Adobe's PDF is that it will display in many email clients.

The limitations in the right column are things you should generally not be doing anyway. These practices are annoying, bad design, an accessibility issue, a security risk or perhaps several of these at once. That Outlook 2007 forces you to reconsider these practices instead of continuing to blindly support them is to be softly applauded. Softly, not loudly, as blocking them by default but offering the user the ability to green-list senders would be better than bluntly failing to display what used to work before. The list of HTML tags that are not supported has a lot in common with the list of elements not supported in XHTML, so that's good too, but I fail to understand why <q> is among the unsupported tags. That is as unnecessarily frustrating as not supporting <abbr> in Internet Explorer.

The limitations in the left column represent a real and serious step backwards. This breaking change will seriously slow down the adoption of CSS and send designers back to abusing tables for layout.

The better solution is to focus on your newsletter instead of Outlook 2007's limitations. If you have a newsletter, you are probably offering a text-only variant already. On top of that, readers can upgrade to other email clients and you can send HTML documents as an attachment to be opened in their browser of choice.

It is hard to believe Microsoft would stick by this limited rendering engine, so it seems a safe bet to assume that the criticism will be addressed relatively soon, perhaps by allowing the Internet Explorer 7 rendering engine in the upgrade, together with a green list.

New HTML Working Group

2006Oct27

Tim Berners-Lee announced the creation of a new HTML Working Group in his blog post Reinventing HTML. The idea is to have a group to evolve HTML independently of the already existing XHTML 2 Working Group, and to work on forms as well.

This sounds remarkably similar to what the WHATWG is doing with its Web Applications 1.0 (a.k.a. HTML 5) and Web Forms 2.0 specifications, which should be compatible with HTML as well as XHTML.

This is more or less an official announcement that HTML 4.01 should no longer be considered as the final HTML specification before the XHTML 1.0 specification. It seems to imply that there will be an official HTML 5 from W3C.

The XHTML 2 draft specification has received lots of criticism, and perhaps a less ambitious XHTML 1.2 is called for, but reviving a pre-XML mark-up language while the W3C's own increasingly popular CSS is driving home the real-world importance of correct mark-up does not provide a clear and consistent vision of the future web.

<abbr> versus <acronym>

2006Oct25

HTML 4 supports both <abbr> and <acronym>. The idea behind this is that abbreviations and acronyms are not the same thing. There are two problems with this approach, a semantic and a technical one.

semantical

Semantically, the division into two tags is problematic in more than just one way. Many people do not know the difference between an abbreviation and an acronym. In fact, the definitions used in the HTML specification itself are a subject of debate, so it might be better to avoid the distinction altogether and use just one tag. Then again, an initialism is not the same as an acronym, so perhaps there should in fact be three instead of two tags. Perhaps even a fourth, to support truncations as well. And what about the many border cases, such as abbreviations that have become words; does anyone really think of Benelux or radar as an abbreviation any more?

A practical solution, geared towards accessibility, would be to redefine, eh, clarify the difference between <abbr> and <acronym> thus: use <abbr> when the abbreviation must be spoken normally and use <acronym> when it must be spelled out. However, even that will not work. Consider the actual pronunciation of CD-ROM; the CD part of this abbreviation is pronounced letter-by-letter as an initialism, while the ROM part is pronounced as a word, like an acronym. So what does that make CD-ROM in HTML parlance? An acronym or an abbreviation?

technical

The other problem is a technical one: Internet Explorer does not support <abbr>.

The obvious technical solution to the situation sketched so far would be to ignore the existence of <abbr> and use <acronym> exclusively. However, you would be lying to your readers; all acronyms are abbreviations, but not all abbreviations are acronyms, and the difference matters. When you explicitly inform the UA that an abbreviation is in fact an acronym, a screen reader is likely to take you at your word, and pronounce it as an acronym, even when it isn't one, and that is not desirable. Thus, marking all abbreviations as if they are acronyms just because Internet Explorer does not support the HTML standard is wrong; you may want to support Internet Explorer, but you should not allow Internet Explorer to make you do the wrong thing, only to have other browsers respond correctly to erroneous mark-up…

abbr only

Abbreviation is the general concept; initialisms and acronyms are particular kinds of abbreviations. We do not really need separate tags for initialisms or acronyms if there already is a tag for abbreviations. Therefore, the draft XHTML 2 deprecates the specific <acronym> tag in favour of the general <abbr> tag. You can still inform a browser that some abbreviation is in fact an acronym, using the CSS 2 style speak: spell-out to override the default speak: normal. This is the superior solution; it allows you to apply different speaking styles to different parts of an abbreviation, thus allowing you to provide correct speaking instructions for compound abbreviations.

Thus, responsible forward-compatible mark-up avoids <acronym> and uses <abbr> exclusively. I try to create XHTML 1.1 that will be easily upgraded to XHTML 2 when it becomes a standard, so I have always used <abbr> exclusively. The real benefit of making this choice early on is not having to worry about changing every <acronym> into <abbr> later.

the problem: Internet Explorer

The only problem with that is that Internet Explorer 5 and 6 do not support it, but Internet Explorer 7 has supported <abbr> since Beta 1. So, this Internet Explorer problem has, as expected, been solved by the free upgrade. Still, what about those visitors who have not upgraded yet, and visit the site using Internet Explorer 5 or 6?

The simplest solution is to use <abbr> and be done with it. Real browsers will do the right thing, and it degrades gracefully in almost-browsers that do not really support HTML; Internet Explorer will still display the body text, so what is the big deal? Internet Explorer does not show your abbreviations, but all the text is there, and you have more things to do than support all of Internet Explorer's failings. A free upgrade is available, and users will upgrade over time.

solutions

A simple but cumbersome solution is to add additional tags, for example a <span> around or inside each <abbr> to associate a class with it that allows you to style the content. Another solution is to simulate proper behaviour through JavaScript, but that would require scripts to be turned on. Early in 2004, Dean Edwards came up with a simply brilliant solution: use <html:abbr> instead of <abbr>. The html: prefix is perfectly legal, and it makes Internet Explorer recognise <abbr> as an HTML tag. He explains the details in the source code of his abbr-cadabra page.
Alas, it turned out that this confuses validators and does not work in the Safari or Konqueror browsers - and knowingly breaking the page in several other browsers just to fix it in Internet Explorer is not the way to go, not even if those browsers are wrong too; it would be replacing one problem with another, and that ain't much progress.

Internet Explorer component

I have created an IE HTC that replaces all <abbr> with <acronym> when the page is loaded into Internet Explorer 5 or 6. The HTC is loaded through the Internet Explorer-specific style sheet. Teaching the old dog a new trick is rounded off by adding a style for acronym to the IE-specific style sheet, to make it show a dotted underline and the question mark cursor. The beauty of this solution is that it uses an Internet Explorer-specific technique to solve an Internet Explorer-specific problem, in a way that does not alter the page. Other browsers do not see any of it, and therefore cannot be affected by it either.

I have added speaknormal, speaknone and speakspellout classes to the style sheet, and used these classes in this note. The need for the speaknone class may not be immediately obvious; it has been applied to the dash between the CD and ROM parts of CD-ROM, to prevent it from being spoken aloud.
I've styled the abbreviations in this note this way just to show how the <abbr> tag and the CSS 2 speak property do not just obviate the need for the <acronym> tag, but are in fact a superior solution. You should generally be able to rely on screen readers having a large dictionary of common abbreviations and their proper pronunciations, but this is ideal for specifying the correct pronunciation of uncommon abbreviations. Always using it is not a good idea; users may have set preferred pronunciations, and you should avoid meddling with that, and only override those when it really makes sense to do so.

Internet Explorer 7 PNG and hover

2006Oct22

transparent PNG

One recurring web developer complaint about Internet Explorer 6 is that it does not support PNG transparency. Internet Explorer 7 finally provides proper support for PNG transparency.

This site uses filter:progid:DXImageTransform.Microsoft.AlphaImageLoader, an Internet-Explorer-specific graphics filter, in its separate Internet-Explorer style sheet to make Internet Explorer 5 and 6 support PNG transparency. That technique will remain in place for a long time to come; it will be many years before all the CSS hacks needed to work around the limitations of IE 5 and 6 can be considered obsolete.

hover effects

IE 5 and 6 do not support the :hover pseudo-class, IE 7 does. The Jolink site includes an HTML Component, an Internet Explorer extension, to make Internet Explorer do the right thing. The HTC is specified in the Internet Explorer style sheet. It is not necessary for Internet Explorer 7, but it remains in place to support Internet Explorer 5 and 6.

Internet Explorer 7 Developer extras

2006Oct19

There are many add-ons available for Internet Explorer, and Microsoft has collected them on the Add-ons for Internet Explorer site. A particularly useful example is ieSpell, a spell-check add-on that supports English, Amglish and Canglish, without requiring Microsoft Office. Guess I am all out of excuses regarding spelling errors now.

Alert: The install program starts by recommending a backup. That is always good advice, but perhaps a bit of overkill for a mere browser install. It also recommends disabling antivirus and antispyware software, and enabling it again once the install is done. I object to this Microsoft advice, and strongly recommend that you disregard it. You do not need to turn antivirus or antispyware off for a mere application install. Do not endanger your machine just because Microsoft asks. Tell them, politely, to get stuffed. Install with protection or don't install. I had the Release Candidate installed, and even in that more complex situation, installation worked without disabling protection.

Google Base

2005Jun2

Google's Bindu Reddy announced Google Base.
Google Base is a database of user-provided content; we provide the data, and Google may include it in their search index. It is a database for which you can define your own items, that comes with several predefined data formats, for such things as real estate and products. It does not seem of obvious and immediate benefit to genealogists. One predefined category is "people profiles", but that is for posting your own profile, and you are not likely to find many family members there yet. Still, with anyone able to define their own records it is a service to keep an eye on.

Tip: Meanwhile, Google Base looks like a good way to inform the world about a limited-run privately published genealogical publication. Details of private publications are probably not posted anywhere by your printer. You can submit a book record through Google Base, with or without an image of the book cover, and include ordering information or just a link to your website. It might be included in Froogle as well as Google.

Google Site Map

2005Jun2

Google's Shiva Shivakumar announced Google Sitemaps.
Google Sitemaps is an XML-based file format that allows you to specify information about your site. Google Sitemaps is different from robots.txt; it does not replace robots.txt but complements it. Sitemaps is still experimental, and Google expects to expand the specification in the future.
Sitemaps is not a proprietary specification. Google offers the specification under the Attribution-ShareAlike 2.0 license.

I have created a sitemap to help Google index the site.

Tip: You can use sitemaps to make sure that Google finds your orphan pages; pages with no other page linking to them. You can of course already do that by going to Google and manually submitting the pages, but using Google Sitemaps avoids having to keep typing the web page address and is likely to work for additional search engines in the future. An orphan page is in fact a great way to find lost family. You create a page with the full name you are looking for and some other details they, and preferably only they, are likely to put into a search engine, and they will find the page you created just for them. Once the page is indexed, everyone might find it, so you must think carefully about what you post. Just a name, a place of birth, perhaps a birth date, together with some sentence including words like "looking for" and "lost" might help you find each other, especially if you know of something unique.
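
To make the orphan page tip concrete, here is a minimal sketch of generating such a sitemap file in Python. It is only an illustration, not the way the sitemap for this site was made. The page addresses are hypothetical, and the namespace is an assumption: it is the one used by the later Sitemaps 0.9 protocol, whereas the original Google Sitemaps files used an older Google-specific namespace.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the Sitemaps 0.9 protocol; the first Google
# Sitemaps release used an older, Google-specific namespace instead.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap document (as a str) that lists the given page URLs."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical addresses; the second is an orphan page that no other page
# links to, so a crawler that only follows links would never reach it.
pages = [
    "http://www.example.org/index.html",
    "http://www.example.org/looking-for-lost-family.html",
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Listing the orphan page next to the regular pages is all it takes; the search engine learns the address from the sitemap instead of from a link.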

There is no doubt that the W3C has drifted from HTML as its central specification to XML as its central specification, but seeing a group of browser manufacturers wanting to take standards development in another direction than the organisation they are part of is disconcerting.

The overview of partner names has been generated with a custom .NET application written in C#, which I designed specifically to generate that overview.

The custom application parses the nearly ten megabyte GEDCOM file, finds all the Jolink, Joolink, Jolinck etcetera, then gets the names of all their partners, extracts the surnames, removes duplicates, sorts these surnames smartly, and finally writes out the XHTML page complete with headings and all, in UTF-8.

That the overview of partner names contains Jolink is not a mistake. There have been several Jolink-Jolink marriages.
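
The steps described above can be sketched roughly as follows. This is a simplified illustration in Python, not the actual C# application. The GEDCOM fragment, the name-matching pattern and the prefix-aware sort are all assumptions made for the example.

```python
import re
from html import escape

# Tiny, hypothetical GEDCOM fragment; all names and ids are made up.
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME Jan /Jolink/
0 @I2@ INDI
1 NAME Maria /te Winkel/
0 @I3@ INDI
1 NAME Hendrik /Joolink/
0 @I4@ INDI
1 NAME Anna /Jolink/
0 @I5@ INDI
1 NAME Piet /Jolinck/
0 @I6@ INDI
1 NAME Geertje /Arentsen/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
0 @F2@ FAM
1 HUSB @I3@
1 WIFE @I4@
0 @F3@ FAM
1 HUSB @I5@
1 WIFE @I6@
0 TRLR
"""

# Assumed spelling variants; the real application may match differently.
FAMILY_NAME = re.compile(r"^Joo?linc?k$")
# Assumed meaning of "smart" sorting: ignore common Dutch surname prefixes.
PREFIXES = ("van der ", "van den ", "van ", "de ", "ter ", "te ")


def sort_key(surname):
    lowered = surname.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            return lowered[len(prefix):]
    return lowered


def partner_surnames(gedcom_text):
    """Return sorted, de-duplicated surnames of partners of family members."""
    surname_of = {}  # individual id -> surname
    couples = []     # one dict per FAM record, e.g. {"HUSB": id, "WIFE": id}
    record_id = record_type = None
    for line in gedcom_text.splitlines():
        fields = line.split(" ", 2)
        if len(fields) < 2:
            continue
        if fields[0] == "0":
            # Level-0 record lines look like "0 @I1@ INDI".
            record_id = fields[1]
            record_type = fields[2] if len(fields) > 2 else None
            if record_type == "FAM":
                couples.append({})
        elif fields[0] == "1" and len(fields) > 2:
            tag, value = fields[1], fields[2]
            if record_type == "INDI" and tag == "NAME":
                match = re.search(r"/([^/]*)/", value)  # surname sits between slashes
                if match:
                    surname_of[record_id] = match.group(1)
            elif record_type == "FAM" and tag in ("HUSB", "WIFE"):
                couples[-1][tag] = value

    partners = set()  # a set removes duplicates
    for couple in couples:
        husband = surname_of.get(couple.get("HUSB"), "")
        wife = surname_of.get(couple.get("WIFE"), "")
        if FAMILY_NAME.match(husband) and wife:
            partners.add(wife)
        if FAMILY_NAME.match(wife) and husband:
            partners.add(husband)
    return sorted(partners, key=sort_key)


def to_xhtml_list(names):
    """Write the surnames out as a bare XHTML list fragment."""
    items = "\n".join(f"<li>{escape(name)}</li>" for name in names)
    return f"<ul>\n{items}\n</ul>"


names = partner_surnames(SAMPLE)
print(to_xhtml_list(names))
```

In the sample, the second family is a marriage between two family members, which is why the family name itself turns up in the list of partner names.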

GEDCOM files use the ANSEL character set. Microsoft .NET uses 16-bit Unicode. This site is in UTF-8. Care has been taken to ensure that accented characters survived the character set conversions and display correctly.
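
The trickiest part of that conversion is that ANSEL writes a diacritic before the letter it modifies, while Unicode puts a combining mark after it. Here is a minimal sketch of the idea in Python, again not the C# code actually used. The mapping table is deliberately partial, and the sample name is hypothetical.

```python
import unicodedata

# Partial table: ANSEL combining-diacritic byte -> Unicode combining character.
# Only a few marks are listed; a real converter needs the full ANSEL table.
ANSEL_COMBINING = {
    0xE1: "\u0300",  # grave accent
    0xE2: "\u0301",  # acute accent
    0xE3: "\u0302",  # circumflex
    0xE8: "\u0308",  # diaeresis (umlaut)
}


def ansel_to_unicode(data):
    """Decode ANSEL bytes (ASCII plus the marks above) into a Unicode string."""
    result = []
    pending = []  # diacritics read so far, waiting for their base letter
    for byte in data:
        if byte in ANSEL_COMBINING:
            pending.append(ANSEL_COMBINING[byte])
        elif byte < 0x80:
            # ANSEL: diacritic first, then letter. Unicode: letter first.
            result.append(chr(byte))
            result.extend(pending)
            pending.clear()
        else:
            raise ValueError(f"ANSEL byte 0x{byte:02X} is not in this partial table")
    # NFC composes letter plus mark into one precomposed character if it exists.
    return unicodedata.normalize("NFC", "".join(result))


name = ansel_to_unicode(b"J\xe8olink")  # hypothetical spelling with o-diaeresis
utf8_bytes = name.encode("utf-8")       # what ends up in the UTF-8 page
print(name, utf8_bytes)
```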

Jolink site goes live

2004May28

The Jolink site goes live. The site is frames-free, JavaScript-free and entirely standards-based; the site is based on XHTML 1.1 with CSS and uses the PNG image format exclusively.

XHTML

I have decided to do the right thing from the start. The site validates as XHTML instead of HTML.
I did not choose to merely go with XHTML 1.0 Transitional instead of HTML 4. I decided to go all the way. The site is not just XHTML 1.0 Strict compliant, but XHTML 1.1 compliant. I am keeping an eye on XHTML 2.0, specifically by avoiding the use of <br />.

Microsoft Internet Explorer does not understand "application/xhtml+xml", but the XML prolog sends it into Quirks Mode anyway, and that works fine. Well, it works for static pages served by my web host at least, which is probably sending HTTP headers telling your browser to expect an HTML page before it in fact receives an XHTML 1.1 page.

XML prolog

All pages do include the XML prolog, that is the <?xml version="1.0" encoding="UTF-8"?> bit on the very first line. The XML prolog is strongly recommended for XHTML, though strictly speaking it is optional for UTF-8 documents; XHTML is an XML application, and the prolog is what declares the page to be XML. Many web authors leave it out, as Internet Explorer 5 and 6 switch to Quirks Mode upon encountering it. They try hard to keep Internet Explorer in Standards Mode, but IE switches to Quirks Mode so easily that keeping it in Standards Mode as you expand or change your style sheet ain't easy. I decided to go with both full standards-compliance and the certainty that Internet Explorer is in Quirks Mode.

character set

The site does not use any operating-system specific character set, but uses Unicode throughout, specifically UTF-8, as recommended by the W3C. I deliberately err on the safe side by encoding all characters as if I were using UTF-7; this allows editing without having to worry about their treatment due to small but annoying differences between Windows Latin-1 (code page 1252) and Unicode.
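
One way to keep page source strictly 7-bit is to write every non-ASCII character as a numeric character reference. That this is the technique used on this site is an assumption. The Python sketch below only illustrates the idea, with a made-up sample string.

```python
def to_ascii_markup(text):
    """Replace every non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


sample = "caf\u00e9 \u2013 na\u00efve"  # e-acute, en dash, i-diaeresis
encoded = to_ascii_markup(sample)
print(encoded)

# The en dash shows why this is the safe side: code page 1252 stores it as
# the single byte 0x96, while its Unicode code point is U+2013.
cp1252_dash = "\u2013".encode("cp1252")
```

The resulting text contains only ASCII characters, so it reads the same whether an editor treats the file as code page 1252 or as UTF-8.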