The big news in the Welsh-language IT world this week was Microsoft's announcement of a Welsh-language pack for Windows XP and Office.

Or was it? According to this morning's edition of the most widely read Welsh national newspaper, the Western Mail, the big news isn't so much that Microsoft are now supporting Welsh, but that they're playing catch-up by supporting Welsh.

Linux, of course, is ahead of the game.

Those of you who read Telsa's diary and a few other sources are probably quite familiar with the shenanigans of various Welsh open-source translation teams. There are now countless pieces of Welsh-language open-source software. Getting one up on Microsoft was less a reason for these efforts than was promoting Welsh in the first place, but somehow our various translation teams have managed to turn what should have been a glorious PR day for Microsoft into something of an own goal.

But the Western Mail piece does imply that it would be futile to try and play down the significance of a Windows XP in Welsh. And they're right: this is unquestionably a great leap forwards for Welsh-language IT, and gives further credibility to Welsh as a language of commerce and business. It's not all good news, though: the current understanding is that a Welsh Windows will always be a Welsh Windows [EDIT: unless you laboriously uninstall the LIP each time you want to switch back to US English], a great marketing move in a bilingual nation. <STRIKE>which means I will probably never buy Welsh Windows - I can't afford two licenses.</STRIKE>

Quite apart from licensing constraints, I don't need to tell Advogators about the dangers of relying on one company to provide you with Welsh-language support in perpetuity, rather than an open system within which literally dozens of people are working on Welsh translations. It's not even clear whether Welsh will make it into Longhorn - though I'd be surprised if Windows' internal language support changes sufficiently by then for it not to.

It remains unlikely, though, that anything would have happened about this had Microsoft not been at least aware of the existence of Welsh Linux. At the first LREC in 1998, I listened to a Microsoft spokesman outline a vague roadmap to make Windows available in more languages. Nowhere, not once, was Welsh mentioned by him. Other European minority languages made an appearance (notably Catalan - well, the conference was in Granada), but Welsh might as well not even have existed.

Then, prodded and buoyed no doubt by EBLUL and the WLB, the rest of the software world started to take notice. Translations started first on a relatively small scale, with a Welsh Opera being a good Christmas present in 2000. Then things snowballed until, significantly, August 2003's National Eisteddfod saw a demonstration of a Welsh-language OpenOffice running on a Welsh-language KDE/GNOME desktop. Microsoft appear to have been planning a Welsh Windows since... September 2003. Draw your own conclusions.

It's not as if yesterday's announcement has made translators rest on their laurels, though. In the very immediate future lies a Welsh Evolution, thanks to yesterday's GNOME announcements. [EDIT: later removed from GNOME 2.6 essentials, but there will be a Welsh Evolution soon.] And as the Welsh open-source community now seems to have a stable structure to accept and create translations for most major packages, who knows what might be in store further ahead?

For today, though, good morning. And it is a very good morning in Wales.

Thought some here might like to know that University of Wales Bangor are holding an e-Welsh day on Saturday November 30th. This is to celebrate the formation of their new 'e-Welsh: Terminology and Language Engineering' unit. I mention it here only because the day 'will be concentrating especially on what open source software has to offer small languages such as Welsh, with the intention of creating an e-Welsh network of contacts to promote and give direction to this work.'

1030-1400 in Bangor, simultaneous English translation provided. Further details are available.

One, though, had to be mentioned here: a Ty penguin beanie. Unfortunately for the present-givers, and hilariously for me, the nametag of the penguin wasn't checked before it was handed over. Which really should have been done...

Finally managed to get some thoughts together on the spam/non-spam issue, a mere fortnight behind pretty much everybody else.

I've focused on the corpus collection side of things, since I worked on the SpeechDat(II) project for a while (the link via the Welsh flag on that page is long down, sorry). I could've written more about lexical model adaptation, but chose not to in the end.

I have this account's passphrase back (it was obvious when I saw it, but then these things always are, I guess). Thanks to Telsa and to yosh for their help.

I've been wondering about the way that the current group of probabilistic spam-filters, from Vipul's Razor via SpamAssassin to those inspired by Paul Graham's work, actually collect their spam/non-spam corpora and, where appropriate, adapt their n-gram and other lexical analyses. I'm putting that here in order to embarrass myself into writing something about it in the very near future.

A lot's happened in the past month.
My PhD grinds on, very slowly - current deadline for completion is March 31st. I have a Real Job for when I finish that. And I've almost completely neglected Advogato (sorry), but I'm glad I'm not a Journeyer any more.
Later then...