What actually is the longest word (in any language) encoded by the reference human genome? If I had the time and computer power I’d have a look…

Guesstimate – it’ll be somewhere in the 4-5 letter range, depending on letter frequency in the target language.

Bear in mind the rules of this game…the letters are the one-letter codes for the amino acids specified by codons (three bases of DNA). There are 20 amino acids in most living things, so you can’t spell every word (there’s no B, J, O, U, X, or Z) unless you allow substitutions, like using V for U. (Here’s a table.)
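For anyone who wants to try the rules on a short stretch of DNA before tackling a whole genome, here is a minimal Python sketch. It assumes the standard genetic code and reads only the three forward reading frames (the reverse strand is ignored). The function names `translate`, `spellable`, and `find_word` are made up for this illustration and are not part of BLAST or any other tool.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The 20 letters available for spelling words.
LETTERS = set(AMINO) - {"*"}


def translate(dna, frame=0):
    """Translate DNA into one-letter amino acid codes, starting at `frame`."""
    dna = dna.upper()
    # Unknown codons (e.g. containing N) become "?" so they never match a word.
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "?")
        for i in range(frame, len(dna) - 2, 3)
    )


def spellable(word):
    """True if every letter of `word` is one of the 20 amino acid codes."""
    return set(word.upper()) <= LETTERS


def find_word(dna, word):
    """Return (frame, DNA offset) for each place `word` appears, forward strand only."""
    word = word.upper()
    hits = []
    for frame in range(3):
        protein = translate(dna, frame)
        pos = protein.find(word)
        while pos != -1:
            hits.append((frame, frame + 3 * pos))
            pos = protein.find(word, pos + 1)
    return hits
```

With this sketch, `translate("GATGCTCGTTGGATTAAT")` spells `DARWIN`, while `spellable("ZIMMER")` comes back `False` because no amino acid is assigned the letter Z.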

Knock yourself out. I do have vague recollections of someone doing something similar a long time ago, when the database was much, much smaller.

I had not heard about anyone trying this before, but it sounds like a lot of fun. I’m a complete novice when it comes to searching genomes with BLAST, so I won’t try. But if anyone wants to post the longest word they can find, let’s see what you get. (Maybe I’ll get my word-guru brother to team up with a geneticist…that would be interesting.)

If you think about it, life on Earth is probably coming up with stray words in its many genomes, which then turn to gibberish (to our eyes), only to produce new words for us to find. The four-billion-year word search, as it were.

Update: Stephen Matheson offers easy step-by-step instructions. Thanks! Without a Z in the genetic code, I can’t make an egotistic search for Zimmer. But here’s Darwin lurking in bacteria.