How many tweets are possible?

Randall Munroe's latest "What If?" explores the total number of possible English-language tweets:

Based on the rates of correct guesses—and rigorous mathematical analysis—Shannon determined that the information content of typical written English was around 1.0 to 1.2 bits per letter. This means that a good compression algorithm should be able to compress ASCII English text—which is eight bits per letter—to about 1/8th of its original size. Indeed, if you use a good file compressor on a .txt ebook, that’s about what you’ll find.

If a piece of text contains n bits of information, in a sense it means that there are 2^n different messages it can convey. There’s a bit of mathematical juggling here (involving, among other things, the length of the message and the concept of unicity distance), but the bottom line is that it suggests there are on the order of about 2^(140×1.1) ≈ 2×10^46 meaningfully different English tweets, rather than 10^200 or 10^800.
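The order-of-magnitude arithmetic is easy to check; a minimal sketch of the calculation (my numbers plugged in, not Munroe's actual derivation):

```python
# Rough check of the estimate above: 140 characters at ~1.1 bits of
# information each gives 2^154 distinct messages.
bits_per_char = 1.1        # Shannon's entropy estimate for written English
chars = 140                # the classic tweet length

total_bits = chars * bits_per_char   # 154 bits of information
n_tweets = 2 ** total_bits           # 2^154 possible messages

print(f"{n_tweets:.1e}")             # → 2.3e+46, i.e. on the order of 2×10^46
```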

Now, how long would it take the world to read them all out?

Reading 2×10^46 tweets would take a person nearly 10^47 seconds. It’s such a staggeringly large number of tweets that it hardly matters whether it’s one person reading or a billion—they won’t be able to make a meaningful dent in the list in the lifetime of the Earth.
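The reading-time figure follows from simple division; a sketch assuming roughly five seconds to read a tweet aloud (my assumption, chosen to match the ~10^47-second figure):

```python
n_tweets = 2e46
seconds_per_tweet = 5        # assumed reading pace
readers = 10 ** 9            # even a billion simultaneous readers

total_seconds = n_tweets * seconds_per_tweet     # 1e47 seconds of reading
age_of_universe = 4.35e17                        # seconds (~13.8 billion years)

# Universe-lifetimes of reading left per person, even split a billion ways:
print(total_seconds / readers / age_of_universe)  # → ~2.3e+20
```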

The number of possible 160-character ASCII messages is not the number of possible tweets in English. If you constrain it to the actual vocabulary of English, including slang and misspelled words, you cut down the possibilities enormously. It would still be a rough go to read them all, but I suspect the number of messages that are arguably English sentences is much lower.

yeah, y’all’s thought was my thought. i was wondering how Munroe was arriving at “meaningful English tweets” using his methodology versus what you and chaircrusher suggest. Munroe’s idea sounds more like Borges’ Library of Babel to me.

Randall used the concept of informational entropy, which is basically how many bits you need in order to fully specify something under a given set of assumptions. Two coin flips have two bits of entropy, but two coin flips where you know at least one is heads have about 1.6 bits. This amount also happens to be the theoretical lower bound on how small you can compress something.
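Both coin-flip figures fall out of the standard Shannon entropy formula, H = −Σ p·log₂(p); a quick check:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Two fair coin flips: four equally likely outcomes.
print(entropy([0.25] * 4))   # → 2.0

# Two flips given at least one is heads: HH, HT, TH equally likely.
print(entropy([1/3] * 3))    # ≈ 1.585 bits, the "about 1.6" above
```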

He cited a fairly well-known paper by the father of information theory which says that, given the assumption that a string of characters is a grammatical English sentence, each character has an entropy of around 1.1 bits. This is how he accounted for the fact that most ASCII strings are not grammatical English: normally, a random ASCII string will have about 7 bits per character.

You’ve just described a very simplistic, if computationally expensive, pre-computed compression algorithm. After all, that’s what you get when you run a compression program: a binary number for which there exists a corresponding piece of data.
You recover that data through a process called decompression, which in the case of your index isn’t very computationally expensive, but is instead absurdly costly in storage space.
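A toy version of that index scheme, with hypothetical `compress`/`decompress` helpers of my own; the point it illustrates is that the "compressed" integer takes just as many bits as the original data, so nothing is saved without a model:

```python
def compress(data: bytes) -> int:
    # The "compressed" form is the data's position in an enumeration of
    # all byte strings; a sentinel byte preserves leading zeros.
    return int.from_bytes(b"\x01" + data, "big")

def decompress(index: int) -> bytes:
    # Recover the bytes from the index, then strip the sentinel.
    raw = index.to_bytes((index.bit_length() + 7) // 8, "big")
    return raw[1:]

msg = b"hello"
assert decompress(compress(msg)) == msg
print(compress(msg).bit_length())   # → 41: five data bytes plus the sentinel bit
```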

In a similar vein, I’ve had to design a number of small 16×16 pixel icons used for web site favicons, application drop down menus, file lists and the like.

It’s quite an art to make a 16×16 icon that’s colorful, attractive and meaningful and doesn’t look like something out of a broken video card or a child’s stick-figure drawing.

How many such icons are possible? Assuming 24-bit pixels (8 bits each for red, green and blue) gives 2^24, or 16,777,216, colors per pixel. With 256 pixels per icon you get (2^24)^256: essentially a 256-digit, base-16,777,216 number, or 2^6,144 different icons. We shouldn’t run out for a while.
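Python's arbitrary-precision integers make that arithmetic easy to verify exactly:

```python
colors_per_pixel = 2 ** 24        # 16,777,216 (8 bits each of R, G, B)
pixels = 16 * 16                  # 256 pixels per icon

n_icons = colors_per_pixel ** pixels
assert n_icons == 2 ** 6144       # 24 bits/pixel × 256 pixels

print(len(str(n_icons)))          # → 1850: about 3×10^1849 possible icons
```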

I’ve always wondered what kind of fabulous icon a real artist such as a Picasso, Warhol or Pollock could produce on such a tiny canvas.

Using the Shannon entropy of common English gives a low estimate for this, because all the uncommon sentences are added back into the corpus. Shannon measured the ability of humans to guess the next character in a string, but the guesses relied on the fact that, say, “I am reading a book” is more common than “I am reading a boot”. In an “all possible sentences” corpus, both are equally likely; we lose the ability to down-rate the “long tail” of correct but unlikely sentences.

The long tail is fine, as Shannon’s 1.1–1.2 bits per letter is an average encoding length, with more frequent letters encoded with fewer bits.

You bring up a great point, though: it looks like the encoding is based on first-order character frequencies. Once you bring in word frequencies and word-transition frequencies (both also discussed by Shannon), the average encoding bit count drops further.
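For a sense of what a first-order character model by itself gives, here's a sketch that computes the entropy of a string's single-character frequency distribution, with no word or transition statistics (the sample sentence is my own):

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Entropy in bits/char of the text's first-order character distribution."""
    counts = Counter(text)
    n = len(text)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

sample = "the quick brown fox jumps over the lazy dog"
# Roughly 4 bits/char for letters plus spaces -- well above the ~1.1 bits
# Shannon got once word and context statistics are taken into account.
print(round(char_entropy(sample), 2))
```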