Web designers and developers are often heard to pour vitriol in the direction of Internet Explorer. I personally find myself cursing its name perhaps once a week. It's always difficult to believe this is a product Microsoft is still trying to promote.

Contemporary GUI development toolkits require an HTML rendering component: Java has javax.swing.JEditorPane; KDE has KHTML; Gtk has GtkHTML. MFC and .NET have Internet Explorer. In this task it is arguably successful: despite being too heavyweight, buggy, and non-portable, it is at least fast and very simple to embed. The applications which use Internet Explorer tend to use it in restricted circumstances, so the largest class of failings - how it measures up to the wild wild web - is not encountered.

With Firefox 3 right around the corner and a total radio silence on a potential IE8, it's worth considering how much Microsoft has to do to maintain its web browser's competitiveness.

Standards

Internet Explorer is, of course, lacking in standards support. XHTML and SVG are hugely significant formats entirely unsupported by Internet Explorer. That Internet Explorer now supports RSS is immaterial while nobody knows exactly what RSS is for, but at least it's not a misfeature. New standards are coming thick and fast, faster than Microsoft has been implementing them. The formats Internet Explorer does support are not fully or correctly integrated with one another, and as the collection of web formats continues to flourish this integration problem becomes harder and harder.

For example, Internet Explorer boasts a near-perfect XSL implementation. However, it serialises the result document and reparses it, and because it doesn't support XHTML anyway, the resulting tree is interpreted as tag soup. All of this decouples the XML processing from the final DOM and introduces an array of bugs.

It would be hard for a non-programmer to imagine how much functionality this kind of integration entails. It's relatively easy to write or license code which implements individual standards, but integrating all of them to form one coherent whole - as is invariably implied in the specifications themselves - is much more work. The following diagram is admittedly somewhat arbitrary in scope, but may illustrate my point:

Bugs

What Internet Explorer does implement is riddled with bugs. These bugs are easy to fix, but there are a huge number of them. Whereas the Mozilla project employs Bugzilla to allow users to report bugs, Internet Explorer relies on a testing team, and this is bound to result in fewer bugs being spotted. CSS is most often cited as an example because bugs there produce immediately visible results, such as the striking failure of Internet Explorer to render the Acid2 test, which stems not only from missing features but from incorrect implementations of many others.

However, there are similar bugs throughout the software, documented in various sources around the Internet - alas, none comprehensive enough to be a first port of call. My assertion here is that while Microsoft has made some improvement between IE6 and IE7, they are unlikely even to be aware of all of the bugs that their software exhibits.

Misfeatures

In many cases Microsoft has deliberately violated the specifications. This is worse than simple omission: Microsoft has started down some paths from which it would be hard to return to a position of standards compliance. Even if they can appreciate that these features are unnecessary and undesirable, they will be disinclined to remove features which they have previously promoted and whose removal could break existing applications and websites. This actually makes Microsoft's work harder, because they have to implement more than just standards compliance; they have to support their own misfeatures too, which are certainly not guaranteed to interoperate with accepted web standards.

Examples include Moniker, the unsolicited part of the Microsoft netcode which guesses resource MIME types rather than obeying the HTTP Content-Type header; and conditional comments, under which CSS and HTML comments are not ignored but interpreted, and certain statements direct Internet Explorer to include or omit sections of markup.

Embedding

A disproportionate amount of Microsoft's development budget for Internet Explorer has to be spent on continued support of ActiveX. ActiveX connects the browser to arbitrary code. This is useful for the GUI toolkit aspect of Internet Explorer but it is detrimental for websites - not only because it is a source of security vulnerability, but because it is not portable so alternatives have to be found. ActiveX is not in very significant use in the wild, except as a route to browser compromise of course. This is both because of Microsoft's own security restrictions (which have neutered it) and other browsers' refusal to support it (which means it's not portable anyway). Meanwhile Javascript has become much more portable and powerful, such that ActiveX is no longer necessary for most rich web applications.

Developers, Developers, Developers

Web developers want standalone and even cross-platform versions of Internet Explorer. This is getting less likely: Microsoft is aiming to make new software compatible only with Windows Vista. Microsoft may be entitled not to want to play catch up in this area, but the approach could start to alienate them from web developers (a demographic skewed towards Linux), designers (whose demographic is skewed towards Mac), and everyone else (skewed towards Windows XP... for the moment).

The internal inconsistencies in Internet Explorer also rule out an extension for developers as good as Firebug, for the foreseeable future. There is an Internet Explorer developer toolbar which looks similar but which is vastly inferior and only serves to highlight Internet Explorer's bugginess.

Failing to cater to developers doesn't remove the requirement for developers to target Internet Explorer... yet. But it has an effect. Personally, I have converted more than 25 people to Firefox.

Conclusions

Until we can gauge the rate of progress being made towards Internet Explorer 8, it's hard to assess if Microsoft is taking their web browser seriously. Five years of neglect leaves a lot of ground to make up, as described above. But I'd go so far as to predict that Microsoft won't keep up with the competition. They don't have the inclination, and although they have the money it's impossible to instantly convert money into good software (especially when you've got no experience at doing so... ;-). Internet Explorer 7 is really not all that different to Internet Explorer 6.

If the gulf does continue to widen, some sites - at first intranet sites, and sites developed by Firefoxies and non-Windows users - may drop Internet Explorer support to focus on real browsers. At that moment, Internet Explorer's days as a desktop web browser really will be numbered.

Most mistakes web designers make stem from the assumption that the way they see a site in their own web browser is the way everyone else sees it. By using uncommon defaults in your web browser, you can ensure that the only aspects of a page left to default values are the ones you intended to leave. I call this "misconfiguring" because it is intentionally configuring a web browser to display pages wrongly. If the page looks correct with such settings, it is probable nothing has been left to chance.

It is common practice to test a web site in various web browsers, but this testing can give a false impression of compatibility with different default settings, because almost all browsers have settled on a common out-of-the-box set of defaults. Users are allowed to override those defaults. Fail to anticipate this and your website risks visual problems for some users (perhaps 1% or so, to take a wild guess). Problems include:

Character set issues, such as £ rendering as Â£ or, the opposite way round, a broken-character symbol (the question-mark diamond in Firefox) rather than £.

Specifying a font colour but not a background colour, causing clashes or even invisible text. Perhaps 10% of websites fail to specify a page background colour but assume that it will default to white.

Linking images which are intended to composite onto white. These look dreadful against most other colours.

Copy looking illegible due to tiny serif fonts (sans-serif is more legible on low-resolution screens).

Pale-coloured boxouts. What appears pale is in fact a function of the background colour. A light pink box appears pale on a white background, but very bright on a black background, for which the corresponding effect would be a dark red box.
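The character-set failure in the first item is easy to reproduce. A minimal sketch in Python, showing both directions of the mismatch:

```python
# A pound sign stored as UTF-8 but decoded by a browser configured
# for ISO-8859-1 appears as "Â£".
pound = "£"

utf8_bytes = pound.encode("utf-8")         # b'\xc2\xa3'
mangled = utf8_bytes.decode("iso-8859-1")  # what a Latin-1 reader shows
print(mangled)  # → Â£

# The opposite direction: Latin-1 bytes are simply invalid UTF-8,
# which is where Firefox's question-mark diamond comes from.
latin1_bytes = pound.encode("iso-8859-1")  # b'\xa3'
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as e:
    print("broken character:", e.reason)
```

The asymmetry is why testing with an unusual default character set catches pages that never declare their encoding.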

The browser defaults which you may want to change include

Character set

Background colour

Text colour

Font style

Font size, although in theory you should avoid specifying this for accessibility reasons.
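In Firefox, the colour and font defaults above can be changed in the preferences dialogue, or set in userContent.css in your profile's chrome directory (the character set is a separate preference). As an illustration - the colours here are arbitrary choices, not a recommendation - such a fragment might look like this:

```css
/* userContent.css - unusual defaults that apply only where a page
   fails to specify its own. Colours are illustrative. */
body {
    background-color: #777;   /* mid-grey rather than white */
    color: #c60;              /* coppery orange rather than black */
    font-family: serif;       /* most designs assume sans-serif */
}
```

Note the absence of !important: without it, user stylesheet rules lose to author rules in the cascade, so they behave exactly like defaults and only show through where the page author specified nothing.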

Misconfiguring a web browser is an art. It is perfectly acceptable to use the defaults a user specifies, as long as all of them are respected. While you should feel free to wholly b0rk a browser which you use purely for testing, it is more beneficial to misconfigure the web browser that you use for primary development. For me, this is the same web browser that I use for everything else, Firefox. Therefore the misconfiguration has to be something that isn't wholly unusable to me. More importantly, the new defaults have to be ones that I would be unlikely to use for a website, otherwise I could still rely on my defaults.

Because I tend to use sans-serif fonts, white backgrounds, black, grey or blue text, and the UTF-8 character set, my browser is set to default to serif, coppery-orange text on a mid-grey background. I'm experimenting with ISO-8859-11 (Thai) as my default character set, because it isn't compatible with either UTF-8 or ISO-8859-15 in the most common problem area: the £ and € symbols. It is compatible for ASCII-range symbols, so try UTF-16 if you are aiming for perfect incompatibility. I don't specify font-size, but I Ctrl-roll my mousewheel on occasion to watch how the site changes at different font sizes.

So my default browser settings look like this. If I ever see these styles on a page, I know I've failed to specify something.

For most of us, visualising date and time comes very naturally. I'm sure if we surveyed how people visualise date and time there would be some similarity among a plethora of different answers. However, one form stands alone for its ubiquity: a year-planner-style calendar. Twelve grids of numbers, each seven columns wide, four to six rows deep. Click on a number to do something with that date.

Calendars are so simple for me and - statistically - you that it's easy to forget that a server-generated calendar like this is actually not accessible at all. The issue here is linearisation: flattening out the days of the calendar to a script that can be read - aloud, in braille... or by a search engine. A calendar in this form linearises to a script that reads like this:

January. One. Two. Three. Four. Five... and so on. It's not useful to read out 365 numbers (366 next year) and expect users simply to wait and respond when the day of the year they are looking for comes up.

What is the accessible version? Well, the approach to take is to work out the linearisation we would like first. For example,

The following periods are currently available: the fifteenth of October to the twenty-ninth of November, then the eleventh of December to the seventeenth. Which date are you interested in?

Now, for a web page we might not expect to follow this pattern exactly, but the model is clear: list the calendar information, then query for a date. It looks something like this:

Periods Available

15th October - 29th November

11th - 17th December

What date are you interested in?
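The linearisation itself is cheap to generate. A sketch in Python - the date ranges are the hypothetical ones above, and ordinal suffixes are omitted for brevity:

```python
from datetime import date

def linearise(periods):
    """Flatten (start, end) date ranges into a readable schedule."""
    lines = ["Periods Available"]
    for start, end in periods:
        if start.month == end.month:
            # e.g. "11 - 17 December"
            lines.append("%d - %d %s" % (start.day, end.day,
                                         start.strftime("%B")))
        else:
            # e.g. "15 October - 29 November"
            lines.append("%d %s - %d %s" % (start.day, start.strftime("%B"),
                                            end.day, end.strftime("%B")))
    return "\n".join(lines)

print(linearise([(date(2007, 10, 15), date(2007, 11, 29)),
                 (date(2007, 12, 11), date(2007, 12, 17))]))
```

The point is that the accessible representation is the list of ranges, not the grid; the grid is just one possible rendering of it.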

This is not bad even in a graphical user agent; describing calendars as schedules is not hard to visualise. Note that it would be perfectly possible to use Javascript to convert this calendar to a visual representation. Whether using Javascript in this way is accessible is open to debate. There is no reason screen readers can't execute Javascript. Some, I believe, already do. But there is a trend for graphical browsers to provide Javascript and for non-graphical browsers not to.

If you are willing to forgo an approach that caters ideally to non-visual users, I'd suggest it's possible to do better. We can almost cheat the system by using alt attributes on images and image maps to provide the schedule version as above, while the images and maps supply the visual layout. In my latest project, I'm also using line boxes full of images rather than a table. This allows a single <a> tag to span a whole month's worth of dates. But what it gains in semantics, it loses in file size. A cheaper approach might be client-side image maps, especially the lesser-known form involving <a shape=""> tags rather than <area shape=""> elements, or similar features in SVG.

However, even with the approaches I've explored above, this is one area where I suspect there may not be a solution that serves visual and non-visual users equally well. 365 dates' worth of information is difficult to represent in a concise form. That's why we use calendars.

In a previous post I discussed the method I used to integrate Paypal's
Encrypted Web Payments in generic SSL terms I hoped would make it easy to
implement from scratch in any language. I've had a request from Ross Poulton to
share the Python code that makes it work using the M2Crypto wrapper. So, here
it is:

from M2Crypto import BIO, SMIME, X509
from django.conf import settings

class PaypalOrder(dict):
    """Acts as a dictionary which can be encrypted to Paypal's EWP service"""

    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)
        self['cert_id'] = settings.MY_CERT_ID

    def set_notify_url(self, notify_url):
        self['notify_url'] = notify_url

    # snip more wrapper functions

    def plaintext(self):
        """The plaintext for the cryptography operation."""
        s = ''
        for k in self:
            s += u'%s=%s\n' % (k, self[k])
        return s.encode('utf-8')

    __str__ = plaintext

    def encrypt(self):
        """Return the contents of this order, encrypted to Paypal's
        certificate and signed using the private key configured in the
        Django settings."""
        # Instantiate an SMIME object.
        s = SMIME.SMIME()

        # Load signer's key and cert.
        s.load_key_bio(BIO.openfile(settings.MY_KEYPAIR),
                       BIO.openfile(settings.MY_CERT))

        # Sign the buffer.
        p7 = s.sign(BIO.MemoryBuffer(self.plaintext()),
                    flags=SMIME.PKCS7_BINARY)

        # Load target cert to encrypt the signed message to.
        x509 = X509.load_cert_bio(BIO.openfile(settings.PAYPAL_CERT))
        sk = X509.X509_Stack()
        sk.push(x509)
        s.set_x509_stack(sk)

        # Set cipher: 3-key triple-DES in CBC mode.
        s.set_cipher(SMIME.Cipher('des_ede3_cbc'))

        # Create a temporary buffer.
        tmp = BIO.MemoryBuffer()

        # Write the signed message into the temporary buffer.
        p7.write_der(tmp)

        # Encrypt the temporary buffer.
        p7 = s.encrypt(tmp, flags=SMIME.PKCS7_BINARY)

        # Output p7 in mail-friendly format.
        out = BIO.MemoryBuffer()
        p7.write(out)
        return out.read()

The settings required are as follows:

# path to keypair in PEM format
MY_KEYPAIR = 'keys/keypair.pem'
# path to merchant certificate
MY_CERT = 'keys/merchant.crt'
# code which Paypal assign to the certificate when you upload it
MY_CERT_ID = 'ASDF12345'
# path to Paypal's own certificate
PAYPAL_CERT = 'keys/paypal.crt'

Do you expect web developers to hold qualifications in computer science? By the same account, you should expect search engine optimisation (SEO) specialists to hold a degree in statistics or game theory. Or computer science, in fact.

Ever since I set up Mauve Internet, it has been asserted on the website that SEO is a myth. In recent weeks I have brushed up on my understanding of the realm of SEO so as to defend Mauve Internet's practices. What I have encountered could reasonably be described as a religion. Scant evidence is mused over, formulated into doctrine, and memorised by rote. The priests of SEO wield power in the eyes of the faithful; they preach their beliefs to others and have heated religious debates about which beliefs are important.

Building a site which is genuinely more popular than the competition is the crux of search engine ranking, and the responsibility for that lies entirely with the site owner. There is also a wealth of accessibility techniques for removing barriers to spidering, and there are some common-sense techniques, like canonicalising URLs so as not to divide the weight of a page. But these are within the remit of the developer, who, if they are any good, will have done them as standard. More importantly, these are done once and for all. They do not yield incremental improvements and they do not need to be continually revised.
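Canonicalising URLs is representative of the once-and-for-all developer work I mean. A minimal sketch of the idea in Python - the rules shown are common conventions, not an exhaustive list:

```python
from urllib.parse import urlsplit, urlunsplit

def canonicalise(url):
    """Reduce equivalent URLs to one canonical form, so that inbound
    links all count towards the same page rather than dividing its
    weight across several spellings."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    netloc = netloc.lower()
    # Strip the default port.
    if scheme == "http" and netloc.endswith(":80"):
        netloc = netloc[:-3]
    # Collapse the empty path to "/".
    if not path:
        path = "/"
    # Fragments never reach the server, so drop them.
    return urlunsplit((scheme, netloc, path, query, ""))

print(canonicalise("HTTP://Example.COM:80"))  # → http://example.com/
```

A real deployment would pair this with redirects (or a canonical URL in the markup) so that only one form is ever linked.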

I don't believe SEO specialists stick to this territory, although hopefully many now pay attention to it. SEO specialists I have corresponded with carve out a niche where they can remain unchallenged, a territory of keyword density, meta tags, link depth, link penalties and link juice shaping, the application of ill-defined theories which are unproven (in some cases, disproven) and which they can continue to charge for as they tweak in response to the latest webstats.

The assertion that SEO is a racket can be easily substantiated. If website owners could, by invoking SEO voodoo, position themselves arbitrarily highly in the natural listings of search engines, then the search results would be determined by website owners as a function of time and money. The usefulness of the search would quickly degenerate and users would migrate to other search engines who provide better quality results. Therefore, search engines would not make as much money from sponsored links. Search engines like making money from sponsored links, so they won't allow this to happen.

This isn't some abstract scenario I've imagined. It actually happened in the late 1990s to the search engine Altavista. Altavista's search results had become a free-for-all and it haemorrhaged users, primarily to Google, whose search results were vastly superior and clean of link farms. I watched it happen; in fact I was one of Altavista's users who switched to Google.

The one thing we know for certain about the ranking systems of search engines is that they are extremely complex and closely guarded secrets. They don't have to be scrutable or even produce optimal results: they merely need to produce good results - which implies being hardened against exploitation.

There are several situations in web application programming where it is necessary to schedule events to happen in the future, outside of the request driven model. Some of the most common are these:

Expiring static files from the webserver. Some data can be cleaned up whenever a page is requested. On occasion, though, the application establishes the contract that a file will stay around for a fixed period of time. When access to these files is provided by the webserver (not through the application itself) then the files need to be deleted at a given future moment.

Time-based notifications. For example, if you deal with dates and times in your web application it's sometimes necessary to actually notify users (most often, via email) at a given time. It's clearly not acceptable to wait until someone hits a page (possibly hours or days later) to issue these notifications.

Syndication. Polling data on remote servers has to be done in advance, ready for when a user hits a page; doing it on demand can introduce an unacceptable delay while remote hosts of varying reachability are queried.
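The first of those scenarios is representative of what the scheduled job itself looks like. A sketch in Python, with a hypothetical directory and lifetime:

```python
import os
import time

EXPORT_DIR = "/var/www/exports"   # hypothetical webserver-served directory
MAX_AGE = 24 * 60 * 60            # promised lifetime: one day

def expire_static_files(directory=EXPORT_DIR, max_age=MAX_AGE):
    """Delete files whose modification time is older than max_age.
    Intended to be run periodically, e.g. from cron."""
    cutoff = time.time() - max_age
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
```

The job is trivial; the hard part, as the rest of this post argues, is arranging for it to run at the right moments.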

In several of my web applications now I've come to a sticking point when it comes to scheduling events. As far as I know, frameworks always leave this up to the developer to arrange; scheduling events is considered outside their remit.

There are a few solutions I know of.

The application can provide a script which the administrator must schedule to be run periodically at install time. Drupal, for example, recommends adding a crontab entry which periodically wgets a script on the web site. In redistributable apps, many users will obliviously skip this step and wonder why the application won't work.

Run scheduled tasks after serving each page. This approach doesn't solve the above problems. In mod_php/perl/python applications this hogs a webserver thread too, which could degrade performance.

There are websites like webcron.org that will fetch a script on your server at intervals. It would be madness to rely on this in your own applications or to suggest it as a solution for redistributable applications, so it's only suitable as a fallback if all else fails.

The application may be able to use the system scheduler (cron/at on Posix, the Task Scheduler service on Windows). While it should be possible for a PHP application to enqueue things into the webserver user's crontab (as long as PHP isn't restricted to "safe mode"), I'm not sure that this is advisable. Most offline applications I know of that need to schedule something spawn their own daemon to handle scheduled events, even if it sits idle most of the time.

I can't see why the frameworks shouldn't provide an API for scheduling tasks. This would have the advantage of being simple, integrated and portable, and it could negotiate to use the platform scheduler or fall back to spawning a daemon to dispatch events.
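To illustrate, here's a minimal sketch of what such an API might look like - the class and method names are invented, and a production version would persist tasks and negotiate with the platform scheduler rather than holding timers in process memory:

```python
import threading
import time

class Scheduler:
    """Minimal in-process fallback: dispatch callables at future times.
    A framework version would persist tasks so they survive restarts."""

    def __init__(self):
        self._timers = []

    def call_at(self, when, func, *args):
        """Schedule func(*args) to run at Unix timestamp `when`."""
        delay = max(0, when - time.time())
        t = threading.Timer(delay, func, args)
        t.daemon = True   # don't keep the process alive just for timers
        t.start()
        self._timers.append(t)
        return t

    def call_later(self, delay, func, *args):
        """Schedule func(*args) to run after `delay` seconds."""
        return self.call_at(time.time() + delay, func, *args)
```

An application would then write, say, `scheduler.call_later(3600, expire_export, path)` and leave the dispatch mechanism to the framework.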

PHP4 is apparently going to be supported only until the end of the year. The idea is to push developers towards PHP5. Matt Mullenweg notes that PHP4 is adequate for a lot of developers, but also claims that PHP5 adoption is poor because PHP5 hasn't been marketed properly to developers. I don't believe this. PHP5 is patently a better language, resolving the single most dreadful language problem that PHP4 exhibits: objects being copied on assignment rather than handled by reference. Expert developers know this; amateur developers are unaware of the problem and adopt PHP5 with a kind of religious zealotry.

This migration poses a particular challenge. Rarely does a language change so drastically without offering a simple migration strategy. Vast amounts of legacy code simply don't work on PHP5. mod_php4 and mod_php5 don't run in the same Apache instance so it's not trivial to configure a box to serve some sites with PHP4 and some with PHP5. There is no solution that does not require a lot of sysadmin work setting up proxies, or even virtual machines, and of course, this is not the kind of thing distros do out of the box.

The PHP documentation, meanwhile, suggests that because many PHP programmers aren't even aware of the copying quirks of the old object model, the majority of PHP applications will work out of the box, or with very few modifications.

In fact, that entire subsection of the documentation carries the tone that compatibility problems are inconsequential. I think it's telling that the PHP documentation can't produce a complete list of reserved keywords. As the user-submitted comments note, there are at least half a dozen missing from the list.

I've also discovered that PHP4 is simply not available for Ubuntu Feisty. While I can understand that there is a genuine desire to move the PHPosphere forward, it's incredibly dumb to gauge whether people are ready to ditch PHP4 by looking at supported, off-the-shelf web applications, rather than considering the volume of cheap legacy applications. Many people simply need both.

For myself, I'm happy to carry out the migration, but it's annoying that it's been handled so badly that my job in doing so is so very much harder. Hard enough that I've already put it off for years.

I am an XML addict. XML has that simplicity and elegance that programmers crave. XML represents a flow of structured data between applications in a form that is an ideal blend of computer-readability and human readability, and that makes profound sense to a lot of people.

XHTML bottles that for web markup. HTML does not.

I have been using XHTML exclusively since before I started Mauve Internet. The transition was not hard because I had already been working for a long time in the kind of rigid mindset that XHTML mandates. My HTML was not tag soup, and this was instinctive, because I'm a perfectionist and not a pragmatist. There are actually advantages to this anyway; for example it's possible to relocate <P>'s that are explicitly closed anywhere within an HTML document without changing their semantics. This is not possible with implicitly-closed elements, because implicit closing is context-sensitive. Also DOM scripting makes much more sense if the UA's DOM matches the apparent source structure.

Anyway, publishing XHTML requires these steps:

Change the DOCTYPE and add the XHTML namespace.

Get your markup to validate as XHTML. This is simple, because XML is simple.

Get rid of the inelegant commenting you've been using to hide styles and scripts from old browsers. This always made me queasy anyway. So link scripts and stylesheets instead.

Negotiate on the HTTP Accept header (because not having a working website in IE is not usually acceptable). I prefer to procedurally convert XHTML to real HTML rather than use the XHTML compatibility provisions. This requires maybe 30 lines of code in Python but obviously adds a small overhead in extra processing.

Make any scripts XHTML/HTML agnostic. document.write(), the function that largely guarantees your pages won't degrade gracefully without Javascript, must go. document.createElementNS(), where it exists, should now replace document.createElement().

Make any styles XHTML/HTML agnostic. The big catch is the difference in body versus html element semantics.
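The negotiation step can indeed be sketched in a few lines of Python. This deliberately ignores q-values, so it's a simplified reading of the header rather than full RFC 2616 negotiation:

```python
def prefers_xhtml(accept_header):
    """True if the client explicitly accepts application/xhtml+xml.
    Simplified: a complete implementation would compare q-values."""
    return "application/xhtml+xml" in accept_header

def content_type(accept_header):
    """Choose the Content-Type to serve for a given Accept header."""
    if prefers_xhtml(accept_header):
        return "application/xhtml+xml"
    # Fall back to HTML for IE, which never lists the XHTML type.
    return "text/html"

# Firefox advertises application/xhtml+xml; IE sends */* at best.
print(content_type("text/html,application/xhtml+xml,*/*;q=0.8"))
# → application/xhtml+xml
print(content_type("*/*"))
# → text/html
```

When the answer is text/html, the response body is run through the XHTML-to-HTML conversion mentioned above before being sent.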

Serving XHTML gains you XML elegance, an extensive suite of tools, embedding XML from other namespaces, embedding XHTML in other XML, custom extensions (useful for scripting), DOM libraries and easier processing, screen-scraping and so on. You lose very little: there are a few niggles involved in serving it, and Mozilla won't display it incrementally until Firefox 3.
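The "easier processing" claim is concrete: because XHTML is well-formed XML, any stock XML parser can consume it. A sketch using Python's standard library, pulling the links out of a made-up XHTML fragment:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p><a href="/about">About</a> and <a href="/contact">Contact</a></p>
  </body>
</html>"""

# No tag-soup heuristics needed: the parser sees the same tree the
# browser's DOM exposes.
tree = ET.fromstring(page)
hrefs = [a.get("href") for a in tree.iter("{%s}a" % XHTML_NS)]
print(hrefs)  # → ['/about', '/contact']
```

Try the equivalent against typical tag-soup HTML and the parser rejects it at the first unclosed element, which is exactly the discipline XHTML buys you.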

Other than that, and as I've already implied, XHTML codifies the best practice for web page design. Much that was inelegant and hard to maintain in HTML is banned or really inconvenient in XHTML, as a direct consequence of XML being rigidly elegant and hard to shoehorn sloppiness into. You should treat XHTML conformance not as conformance to a different markup language, but to a best-practice, maintenance-friendly school of thought.

I briefly mentioned Internet Explorer's lack of support. Poor, dear old Internet Explorer, being a shit, as ever, like a bigoted, racist, unintelligible old man whom you'd rather not converse with any longer than you have to and who you secretly hope would just die. IE got left behind with XHTML, or rather it got left behind entirely for five years before it had cottoned onto XHTML. IE has its little crowd of web developers who prioritise it, treating IE's behaviour as the standard rather than... well, the standards. Similarly, there are those people who just don't know or don't care but use software which embeds the IE-powered MFC CHTMLView and therefore targets IE.

Obviously nobody would reasonably hold up the corpus of websites which aren't using XHTML yet, and that small collection of compatibility problems, as evidence that XHTML is dead, in the face of the overwhelming value to its users, would they?