
Wednesday, 28 October 2009

Paul on Liberal Burblings has drawn my attention to a letter written by Californian Governor Arnold Schwarzenegger to the members of the Californian State Assembly refusing to sign an assembly bill. The thing is, there is an interesting message if you just read the first letter of each line in the letter's two main paragraphs. They are, in order, F, u, c, k, Y, o and u. Is Mr Schwarzenegger a fan of acrostics, I wonder?

A rather crude calculation by Gary Langer from ABC News puts the odds of this happening by chance at one in 10 billion. He is assuming a uniform distribution of letters throughout the language and simply using (1/26)^7, which of course is a bit simplistic (and actually comes to around one in 8 billion, so he's rounding up by 2 billion).
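For anyone who wants to check the uniform figure, here is a minimal sketch in Python. The function and constant names are my own, and the only assumption is the crude one described above: each of the 7 line-initial letters is drawn independently and uniformly from 26 letters.

```python
ALPHABET_SIZE = 26    # assumes every letter is equally likely
ACROSTIC_LENGTH = 7   # F, u, c, k, Y, o, u


def uniform_odds(length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Return N such that the chance of one specific sequence is 1 in N."""
    return alphabet_size ** length


odds = uniform_odds(ACROSTIC_LENGTH)
print(f"one in {odds:,}")  # one in 8,031,810,176
```

So the uniform model gives a little over 8 billion to one, not 10 billion.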

A slightly more sophisticated way of doing this would be to weight the letters using letter frequency within English. According to this Wikipedia article, the letter frequencies are: f = 2.228%, u = 2.758%, c = 2.782%, k = 0.772%, y = 1.974%, o = 7.507%.
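As a rough sketch of that weighted approach, the snippet below multiplies together the frequencies quoted above, counting "u" twice because it appears twice. The names are my own, and it assumes that each line's first letter is independent and follows overall English letter frequency, which is itself a simplification.

```python
import math

# Letter frequencies as quoted in the post (percentages converted to fractions).
LETTER_FREQ = {
    "f": 0.02228,
    "u": 0.02758,
    "c": 0.02782,
    "k": 0.00772,
    "y": 0.01974,
    "o": 0.07507,
}


def acrostic_probability(phrase: str, freq: dict) -> float:
    """Probability of the phrase, assuming independent letters drawn by frequency."""
    return math.prod(freq[ch] for ch in phrase.lower())


p = acrostic_probability("fuckyou", LETTER_FREQ)
print(f"probability: {p:.3e}")
print(f"about one in {1 / p:,.0f}")
```

On these figures the odds come out at something in the region of one in 185 billion, considerably longer than the uniform estimate, mostly because "k" is such a rare letter.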

4 comments:

Your calculation gives the probability of producing the sequence 'fuckyou'. However, a new paragraph was inserted between the 'k' and the 'y' to create 'fuck you'. So the probability of having a new paragraph in this position also needs to be taken into account.

I don't know how you would include this in the equation, but it would definitely make a chance occurrence even less likely.

However, there is another factor. Even if a random letter turned out like that, you might notice it, and not send it for obvious reasons. So the chances of getting such a letter, without it having been constructed, would be even smaller than you have calculated.

You asked here how I got to a trillion to one? Well, it is, as Dr Jansons says, a matter of the frequency of initial letters in words within the working vocabulary of the Swearminator. Words starting with all the letters apart from U and K are reasonably numerous, though in a couple of missives whose initial letters I quote in my post I got 4 As, 4 Os, 2 Is, 2 Ys and an E in the vowels. And an assortment of consonants including 4 Cs and 1 F.

With not even the slightest instance of any sensible hidden words. Oh, ha, yo, id, oid (which may appeal to Dr Jansons), ac, ba and ad being your lot as far as I can see.

A factor of just 5 is I suspect a rather low one to allow for frequency of *initial* letters within the Swearminator's vocabulary. So the good doctor may be wrong strictly speaking re order of magnitude.

I'd also stick another potential weighting factor in the mix; that is that long words may be more likely to appear at the start of lines than short ones.

And an observation also. The Swearminator's other letters in the public domain - there are loads, he/the post holder is prolific - show a significant resistance to "widows and orphans" - i.e. the odd single words that often arise when paras are allowed to break as they naturally do on a particular measure.

In other words someone on the staff is either casting off carefully and tweaking point size to avoid, or redrafting to avoid these - I often do the latter myself on my blog for my preferred browser. A little anal actually. I publish a post, look at it, go back and try to get rid of widows and orphans. Comes from being an ex sub and typographer.