In 1948, when Claude Shannon was inventing information science [pdf] (and, I’d say, information itself), he took as an explanatory example a simple algorithm for predicting the next element of a sentence. For example, treating each letter as equiprobable, he came up with sentences such as:

XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.

If you instead use the average frequency of each letter, you come up with sentences that seem more language-like:

IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.

Then Shannon changes his units from triplets of letters to triplets of words, and gets:

THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.

Pretty good! But still gibberish.
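Shannon’s successive approximations are easy to sketch in code. The following is a minimal illustration, not his original procedure; the sample corpora and function names are my own stand-ins for his hand-tabulated frequency tables:

```python
import random
from collections import Counter, defaultdict

def zero_order(alphabet, n):
    """Zero-order approximation: every symbol equiprobable (XFOML RXKHRJ...)."""
    return "".join(random.choice(alphabet) for _ in range(n))

def first_order(corpus, n):
    """First-order approximation: sample letters by their observed frequency."""
    counts = Counter(corpus)
    letters = list(counts)
    weights = [counts[c] for c in letters]
    return "".join(random.choices(letters, weights=weights, k=n))

def word_trigrams(corpus, n):
    """Word-level approximation: each new word is drawn from the words that
    actually followed the preceding pair of words somewhere in the corpus."""
    words = corpus.split()
    table = defaultdict(list)
    for a, b, c in zip(words, words[1:], words[2:]):
        table[(a, b)].append(c)
    a, b = random.choice(list(table))
    out = [a, b]
    while len(out) < n:
        followers = table.get((a, b))
        if not followers:                  # dead end: restart from a random pair
            a, b = random.choice(list(table))
            out.extend([a, b])
            continue
        nxt = random.choice(followers)
        out.append(nxt)
        a, b = b, nxt
    return " ".join(out)
```

Feed `word_trigrams` a large enough corpus and you get exactly the locally coherent, globally meaningless prose of Shannon’s third sample: each three-word window is plausible English; the whole is gibberish.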

Now jump ahead seventy years and try to figure out which pieces of the following story were written by humans and which were generated by a computer:

In a shocking finding, scientist discovered a herd of unicorns living in a remote, previously unexplored valley, in the Andes Mountains. Even more surprising to the researchers was the fact that the unicorns spoke perfect English.

The scientist named the population, after their distinctive horn, Ovid’s Unicorn. These four-horned, silver-white unicorns were previously unknown to science.

Now, after almost two centuries, the mystery of what sparked this odd phenomenon is finally solved.

Dr. Jorge Pérez, an evolutionary biologist from the University of La Paz, and several companions, were exploring the Andes Mountains when they found a small valley, with no other animals or humans. Pérez noticed that the valley had what appeared to be a natural fountain, surrounded by two peaks of rock and silver snow.

Pérez and the others then ventured further into the valley. “By the time we reached the top of one peak, the water looked blue, with some crystals on top,” said Pérez.

Pérez and his friends were astonished to see the unicorn herd. These creatures could be seen from the air without having to move too much to see them – they were so close they could touch their horns.

While examining these bizarre creatures the scientists discovered that the creatures also spoke some fairly regular English. Pérez stated, “We can see, for example, that they have a common ‘language,’ something like a dialect or dialectic.”

Dr. Pérez believes that the unicorns may have originated in Argentina, where the animals were believed to be descendants of a lost race of people who lived there before the arrival of humans in those parts of South America.

While their origins are still unclear, some believe that perhaps the creatures were created when a human and a unicorn met each other in a time before human civilization. According to Pérez, “In South America, such incidents seem to be quite common.”

However, Pérez also pointed out that it is likely that the only way of knowing for sure if unicorns are indeed the descendants of a lost alien race is through DNA. “But they seem to be able to communicate in English quite well, which I believe is a sign of evolution, or at least a change in social organization,” said the scientist.

The answer: The first paragraph was written by a human being. The rest was generated by a machine learning system trained on a huge body of text. You can read about it in a fascinating article (pdf of the research paper) by its creators at OpenAI. (Those creators are: Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever.)

There are two key differences between this approach and Shannon’s.

First, the new approach analyzed a very large body of documents from the Web. It ingested 45 million pages linked in Reddit comments that got more than three upvotes. After removing duplicates and some other cleanup, the data set was reduced to 8 million Web pages. That is a lot of pages. Of course the use of Reddit, or any one site, can bias the dataset. But one of the aims was to compare this new, huge, dataset to the results from existing sets of text-based data. For that reason, the developers also removed Wikipedia pages from the mix since so many existing datasets rely on those pages, which would smudge the comparisons.

(By the way, a quick Google search for any page from before December 2018 mentioning both “Jorge Pérez” and “University of La Paz” turned up nothing. The AI is constructing, not copy-pasting.)

The second distinction from Shannon’s method: the developers used machine learning (ML) to create a neural network, rather than relying on a table of frequencies of words in triplet sequences. ML creates a far, far more complex model that can assess the probability of the next word based on the entire context of its prior uses.

The results can be astounding. While the developers freely acknowledge that the examples they feature are somewhat cherry-picked, they say:

When prompted with topics that are highly represented in the data (Brexit, Miley Cyrus, Lord of the Rings, and so on), it seems to be capable of generating reasonable samples about 50% of the time. The opposite is also true: on highly technical or esoteric types of content, the model can perform poorly.

There are obviously things to worry about as this technology advances. For example, fake news could become the Earth’s most abundant resource. For fear of its abuse, its developers are not releasing the full dataset or model weights. Good!

Nevertheless, the possibilities for research are amazing. And, perhaps most important in the long term, one by one the human capabilities that we take as unique and distinctive are being shown to be replicable without an engine powered by a miracle.

That may be a false conclusion. Human speech does not consist simply of the utterances we make but the complex intentional and social systems in which those utterances are more than just flavored wind. But ML intends nothing and appreciates nothing. Nothing matters to ML. Nevertheless, knowing that sufficient silicon can duplicate the human miracle should shake our confidence in our species’ special place in the order of things.

(FWIW, my personal theology says that when human specialness is taken as conferring special privilege, any blow to it is a good thing. When that specialness is taken as placing special obligations on us, then at its very worst it’s a helpful illusion.)

Robert Epstein argues in Aeon against the dominant assumption that the brain is a computer, that it processes information, stores and retrieves memories, etc. That we assume so comes from what I think of as the informationalizing of everything.

The strongest part of his argument is that computers operate on symbolic information, but brains do not. There is no evidence (that I know of, but I’m no expert. On anything) that the brain decomposes visual images into pixels and those pixels into on-offs in a code that represents colors.

In the second half, Epstein tries to prove that the brain isn’t a computer through some simple experiments, such as drawing a dollar bill from memory and while looking at it. Someone committed to the idea that the brain is a computer would probably just conclude that the brain isn’t a very good computer. But judge for yourself. There’s more to it than I’m presenting here.

Back to Epstein’s first point…

It is of the essence of information that it is independent of its medium: you can encode it into voltage levels of transistors, magnetized dust on tape, or holes in punch cards, and it’s the same information. Therefore, a representation of a brain’s states in another medium should also be conscious. Epstein doesn’t make the following argument, but I will (and I believe I am cribbing it from someone else but I don’t remember who).

Because information is independent of its medium, we could encode it in dust particles swirling clockwise or counter-clockwise; clockwise is an on, and counter is an off. In fact, imagine there’s a dust cloud somewhere in the universe that has 86 billion motes, the number of neurons in the human brain. Imagine the direction of those motes exactly matches the on-offs of your neurons when you first spied the love of your life across the room. Imagine those spins shift but happen to match how your neural states shifted over the next ten seconds of your life. That dust cloud is thus perfectly representing the informational state of your brain as you fell in love. It is therefore experiencing your feelings and thinking your thoughts.

That by itself is absurd. But perhaps you say it is just hard to imagine. Ok, then let’s change it. Same dust cloud. Same spins. But this time we say that clockwise is an off, and the other is an on. Now that dust cloud no longer represents your brain states. It therefore is both experiencing your thoughts and feelings and is not experiencing them at the same time. Aristotle would tell us that that is logically impossible: a thing cannot simultaneously be something and its opposite.

Anyway…

Toward the end of the article, Epstein gets to a crucial point that I was very glad to see him bring up: Thinking is not a brain activity, but the activity of a body engaged in the world. (He cites Anthony Chemero’s Radical Embodied Cognitive Science (2009) which I have not read. I’d trace it back further to Andy Clark, David Chalmers, Eleanor Rosch, Heidegger…). Reducing it to a brain function, and further stripping the brain of its materiality to focus on its “processing” of “information” is reductive without being clarifying.

I came into this debate many years ago already made skeptical of the most recent claims about the causes of consciousness by having some awareness of the series of failed metaphors we have used over the past couple of thousand years. Epstein puts this well, citing another book I have not read (and have consequently just ordered):

In his book In Our Own Image (2015), the artificial intelligence expert George Zarkadakis describes six different metaphors people have employed over the past 2,000 years to try to explain human intelligence.

In the earliest one, eventually preserved in the Bible, humans were formed from clay or dirt, which an intelligent god then infused with its spirit. That spirit ‘explained’ our intelligence – grammatically, at least.

The invention of hydraulic engineering in the 3rd century BCE led to the popularity of a hydraulic model of human intelligence, the idea that the flow of different fluids in the body – the ‘humours’ – accounted for both our physical and mental functioning. The hydraulic metaphor persisted for more than 1,600 years, handicapping medical practice all the while.

By the 1500s, automata powered by springs and gears had been devised, eventually inspiring leading thinkers such as René Descartes to assert that humans are complex machines. In the 1600s, the British philosopher Thomas Hobbes suggested that thinking arose from small mechanical motions in the brain. By the 1700s, discoveries about electricity and chemistry led to new theories of human intelligence – again, largely metaphorical in nature. In the mid-1800s, inspired by recent advances in communications, the German physicist Hermann von Helmholtz compared the brain to a telegraph.

Maybe this time our tech-based metaphor has happened to get it right. But history says we should assume not. We should be very alert to the disanalogies, which Epstein helps us with.

Getting this right, or at least not getting it wrong, matters. The most pressing problem with the informationalizing of thought is not that it applies a metaphor, or even that the metaphor is inapt. Rather it’s that this metaphor leads us to a seriously diminished understanding of what it means to be a living, caring creature.

Noam Chomsky and Barton Gellman were interviewed at the Engaging Big Data conference put on by MIT’s Senseable City Lab on Nov. 15. When Prof. Chomsky was asked what we can do about government surveillance, he reiterated his earlier call for us to understand the NSA surveillance scandal within an historical context that shows that governments always use technology for their own worst purposes. According to my liveblogging (= inaccurate, paraphrased) notes, Prof. Chomsky said:

Governments have been doing this for a century, using the best technology they had. I’m sure Gen. Alexander believes what he’s saying, but if you interviewed the Stasi, they would have said the same thing. Russian archives show that these monstrous thugs were talking very passionately to one another about defending democracy in Eastern Europe from the fascist threat coming from the West. Forty years ago, RAND released Japanese docs about the invasion of China, showing that the Japanese had heavenly intentions. They believed everything they were saying. I believe this is universal. We’d probably find it for Genghis Khan as well. I have yet to find any system of power that thought it was doing the wrong thing. They justify what they’re doing for the noblest of objectives, and they believe it. The CEOs of corporations as well. People find ways of justifying things. That’s why you should be extremely cautious when you hear an appeal to security. It literally carries no information, even in the technical sense: it’s completely predictable and thus carries no info. I don’t doubt that the US security folks believe it, but it is without meaning. The Nazis had their own internal justifications. [Emphasis added, of course.]

I was glad that Barton Gellman — hardly an NSA apologist — called Prof. Chomsky on his lumping of the NSA with the Stasi, for there is simply no comparison between the freedom we have in the US and the thuggish repression omnipresent in East Germany. But I was still bothered, albeit by a much smaller point. I have no serious quarrel with Prof. Chomsky’s points that government incursions on rights are nothing new, and that governments generally (always?) believe they are acting for the best of purposes. I am a little bit hung up, however, on his equivocating on “information.”

Prof. Chomsky is of course right in his implied definition of information. (He is Noam Chomsky, after all, and knows a little more about the topic than I do.) Modern information is often described as a measure of surprise. A string of 100 alternating ones and zeroes conveys less information than a string of 100 bits that are less predictable, for if you can predict with certainty what the next bit will be, then you don’t learn anything from that bit; it carries no information. Information theory lets us quantify how much information is conveyed by streams of varying predictability.
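The point about the alternating bit string can be made concrete. Here is a toy sketch (mine, not Shannon’s full apparatus) that measures how surprising each bit is given the bit before it — the conditional entropy of a first-order model of the stream:

```python
import math
from collections import Counter

def conditional_entropy(bits):
    """Average surprise, in bits, of each symbol given the one before it.
    A perfectly predictable stream scores 0; a fair coin flip scores 1."""
    pairs = Counter(zip(bits, bits[1:]))   # counts of (previous, next) bit pairs
    prev = Counter(bits[:-1])              # counts of each previous bit
    total = sum(pairs.values())
    h = 0.0
    for (a, b), n in pairs.items():
        p_pair = n / total                 # probability of seeing this pair
        p_given = n / prev[a]              # probability of b given a
        h -= p_pair * math.log2(p_given)
    return h

alternating = "01" * 50                    # 100 bits, perfectly predictable
print(conditional_entropy(alternating))    # → 0.0: each bit is certain, no information

mixed = "0011011000" * 10                  # less predictable: genuinely positive entropy
print(conditional_entropy(mixed) > 0.0)    # → True
```

The alternating string “carries no information” in exactly Chomsky’s technical sense: once you’ve seen one bit, the rest tell you nothing.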

So, when U.S. security folks say they are spying on us for our own security, are they saying literally nothing? Is that claim without meaning? Only in the technical sense of information. It is, in fact, quite meaningful, even if quite predictable, in the ordinary sense of the term “information.”

Second, I disagree with Prof. Chomsky’s generalization that governments always justify surveillance in the name of security. For example, governments sometimes record traffic (including the movement of identifiable cars through toll stations) with the justification that the information will be used to ease congestion. Tracking the position of mobile phones has been justified as necessary for providing swift EMT responses. Governments require us to fill out detailed reports on our personal finances every year on the grounds that they need to tax us fairly. Our government hires a fleet of people every ten years to visit us where we live in order to compile a census. These are all forms of surveillance, but in none of these cases is security given as the justification. And if you want to say that these other forms don’t count, I suspect it’s because it’s not surveillance done in the name of security…which is my point.

Third, governments rarely cite security as the justification without specifying what the population is being secured against; as Prof. Chomsky agrees, that’s an inherent part of the fear-mongering required to get us to accept being spied upon. So governments proclaim over and over what threatens our security: Spies in our midst? Civil unrest? Traitorous classes of people? Illegal aliens? Muggers and murderers? Terrorists? Thus, the security claim isn’t made on its own. It’s made with specific threats in mind, which makes the claim less predictable — and thus more informational — than Prof. Chomsky says.

So, I disagree with Prof. Chomsky’s argument that a government that justifies spying on the grounds of security is literally saying something without meaning. Even if it were entirely predictable that governments will always respond “Because security” when asked to justify surveillance — and my second point disputes that — we wouldn’t treat the response as meaningless but as requiring a follow-up question. And even if the government just kept repeating the word “Security” in response to all our questions, that very act would carry meaning as well, like a doctor who won’t tell you what a shot is for beyond saying “It’s to keep you healthy.” The lack of meaning in the Information Theory sense doesn’t carry into the realm in which people and their public officials engage in discourse.

Here’s an analogy. Prof. Chomsky’s argument is saying, “When a government justifies creating medical programs for health, what they’re saying is meaningless. They always say that! The Nazis said the same thing when they were sterilizing ‘inferiors,’ and Medieval physicians engaged in barbarous [barber-ous, actually – heyo!] practices in the name of health.” Such reasoning would rule out a discussion of whether current government-sponsored medical programs actually promote health. But that is just the sort of conversation we need to have now about the NSA.

Prof. Chomsky’s repeated appeals to history in this interview cover up exactly what we need to be discussing. Yes, both the NSA and the Stasi claimed security as their justification for spying. But far from that claim being meaningless, it calls for a careful analysis of the claim: the nature and severity of the risk, the most effective tactics to ameliorate that threat, the consequences of those tactics on broader rights and goods — all considerations that comparisons to the Stasi and Genghis Khan obscure. History counts, but not as a way to write off security considerations as meaningless by invoking a technical definition of “information.”

Yesterday I tried to explain my sense that we’re not really suffering from information overload, while of course acknowledging that there is vastly more information out there than anyone could ever hope to master. Then a comment from Alex Richter helped me clarify my thinking.

We certainly do at times feel overwhelmed. But consider why you don’t feel like you’re suffering from information overload about, say, the history of stage costumes, Chinese public health policy, the physics of polymers, or whatever topic you would never have majored in, even though each of these topics contains an information overload. I think there are two reasons those topics don’t stress you.

First, and most obviously, because (ex hypothesi) you don’t care about that topic, you’re not confronted with having to hunt down some piece of information, and that topic’s information is not in your face.

But I think there’s a second reason. We have been taught by our previous media that information is manageable. Give us 22 minutes and we’ll give you the world, as the old radio slogan used to say. Read the daily newspaper — or Time or Newsweek once a week — and now you have read the news. That’s the promise implicit in the old media. But the new medium promises us instead edgeless topics and endless links. We know there is no possibility of consuming “the news,” as if there were such a thing. We know that whatever topic we start with, we won’t be able to stay within its bounds without doing violence to that topic. There is thus no possibility of mastering a field. So, sure, there’s more information than anyone could ever take in, but that relieves us of the expectation that we will master it. You can’t be overwhelmed if whelming is itself impossible.

So, I think our sense of being overwhelmed by information is an artifact of our being in a transitional age, with old expectations for mastery that the new environment gives the lie to.

No, this doesn’t mean that we lose all responsibility for knowing anything. Rather, it means we lose responsibility for knowing everything.

On a podcast today, Mitch Joel asked me something I don’t think anyone else has: Are we experiencing information overload? Everyone else assumes that we are. Including me. I found myself answering no, we are not. There is of course a reasonable and valid sense in which we are. But I think there’s also an important way in which we are not. So, here goes:

There are more things to see in the world than any one human could ever see. Some of those sights are awe-inspiring. Some are life-changing. Some would bring you peace. Some would spark new ideas. But you are never going to see them all. You can’t. There are too many sights to see. So, are you suffering from Sight Overload?

There are more meals than you could ever eat. Some are sooo delicious, but you can’t live long enough to taste them all. Are you suffering from Taste Overload?

Or, you’re taking a dip in the ocean. The water extends to the horizon. Are you suffering from Water Overload? Or are you just having a nice swim?

That’s where I think we are with information overload. Of course there’s more than we could ever encounter or make sense of. Of course. But it’s not Information Overload any more than the atmosphere is Air Overload.

It only seems that way if you think you can master information, or if you think there is some defined set of information you can and must have, or if you find yourself repeating the mantra of delivering the right information to the right people at the right time, as if there were any such thing.

The ordinary language use of “information” in some ways is the opposite of the technical sense given the term by Claude Shannon — the sense that kicked off the Information Age.

Shannon’s information is a measure of surprise: the more unexpected is the next letter a user lays down in Scrabble, the more information it conveys.

The ordinary language use of the term (well, one of them) is to refer to something you are about to learn or have just learned: “I have some information for you, sir! The British have taken Trenton.” The more surprising the news is, the more important the information is. So, so far ordinary language “information” seems a lot like Shannon’s “information.”
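Shannon’s “measure of surprise” has a simple formula: an event with probability p carries −log₂(p) bits. A quick sketch, using rough English letter frequencies (the numbers are approximate, for illustration only):

```python
import math

def surprisal_bits(p):
    """Shannon's self-information: the less probable the event,
    the more bits of information it carries."""
    return -math.log2(p)

# Approximate English letter frequencies (rounded, illustrative only).
print(round(surprisal_bits(0.127), 2))    # 'e', common:  ~2.98 bits
print(round(surprisal_bits(0.00074), 2))  # 'z', rare:   ~10.4 bits
```

Laying down a Z in Scrabble tells you more, in Shannon’s sense, than laying down an E — precisely because you expected the E.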

But we use the term primarily to refer to news that’s not all that important to us personally. So, you probably wouldn’t say, “I got some information today: I’m dying.” If you did, you’d be taken as purposefully downplaying its significance, as in a French existentialist drama in which all of life is equally depressing. When we’re waiting to hear about something that really matters to us, we’re more likely to say we’re waiting for news.

Indeed, if the information is too surprising, we don’t call it “information” in ordinary parlance. For example, if you asked someone for your doctor’s address, what you learned you might well refer to as “information.” But if you learned that your doctor’s office is in a dirigible constantly circling the earth, you probably wouldn’t refer to that as information. “I got some information today. My doctor’s office is in a dirigible,” sounds odd. More likely: “You’ll never guess what I found out today: My doctor’s office is in a dirigible! I mean, WTF, dude!” The term “information” is out of place if the information is too surprising.

And in that way the ordinary language use of the term is the opposite of its technical meaning.

I know many others have made this point, but I think it’s worth saying again: We are the medium.

I don’t mean this in the sense that we are the new news media, as when Dan Gillmor talks about “We the Media.” I cherish Dan’s work (read his latest: Mediactive), but I mean “We are the medium” more in McLuhan’s “The medium is the message” sense.

McLuhan was reacting against information science’s view of a medium as that through which a signal (or message) passes.

Information science purposefully abstracted itself from every and any particular medium, aiming at theories that held whether you were talking about tin can telephones or an inter-planetary Web. McLuhan’s pushback was: But the particularities of a medium do count. They affect the message. In fact, the medium is the message!

I mean by “We are the medium” something I think we all understand, although the old way of thinking keeps intruding. “We are the medium” means that, quite literally, we are the ones through whom information, messages, news, ideas, videos, and links of every sort move — and they move through this “channel” because we decide to move them. Someone sends me a link to a funny video. I tweet about it. You see it. You send a Facebook message to your friends. One of them (presumably an ancient) emails it to more friends. The video moves through us. Without us, the transport medium — the Internet — is a hyperlinked collection of inert bits. We are the medium.

Which makes McLuhan’s aphorism more true than ever. In tweeting about the video, I am also tweeting about myself: “This is the sort of thing I find funny. Don’t I have a great sense of humor? And I was clever enough to find it. And I care enough about you — and about my reputation — to send it out to you.” That’s 51 characters over the Twitter limit, but it’s clearly embedded in my tweet.

Although there are a thousand ways “We are the medium” is wrong, I think what’s right about it matters:

Because we are the medium, one-way announcements, such as a tweet to thousands of followers, still have a conversational element. We may not be able to tweet back and expect an answer, but we can pass it around, which is a conversational act.

Because we are the medium, news is no longer mere information. In forwarding the item about the Egyptian protestor or about the Navy dealing well with a gay widower, I am also saying something about myself. That’s why we are those that formerly were known as the audience: not just because we can engage in acts of journalism without a newspaper behind us, but because in becoming the medium through which news travels, some of us travels with every retweet.

Because we are the medium, fame on the Net is not simply being known by many because your image was transmitted many times. Rather, if you’re famous on the Internet, it’s because we put ourselves on the line by forwarding your image, your video, your idea, your remix. We are the medium that made you famous.

It is easy to slip back into the old paradigm in which there is a human sender, a message, a medium through which it travels, and a human recipient. It’s easy because that’s an accurate abstraction that is sometimes useful. It’s easy because the Internet is also used for traditional communication. But what is distinctive and revolutionary about the Internet is the failure of the old diagram to capture what so often is essential: We are not users of the medium, and we are not outside of the medium listening to its messages. Rather, we are the medium.

Think about cooking as the predigesting of food — making it easier for food to be digested. Cooks prepare food in external stomachs. Our brains evolved because we discovered how to cook. Can we look at information that way?

We talk about info overload, but not food overload. Having too much food isn’t a problem so long as we make sure that people have access to the excess. As JP thought through the further analogies between info and food, he realized there were three schools of how to prepare food. 1. The extraction school divides food into its components and serves them separately. 2. Another ferments food: you put foods together, and something new occurs. 3. Raw food is like the Maker generation of information: I want to fiddle with it myself, and I need to know that it came without additives.

We can think about what we do with information using these three distinctions. Some of us will work with the raw data. Some of us will prefer that others do that for us. Information should learn from food that it needs a sell-by date. E.g., look at how the media use Twitter. Twitter is a different type of food — more like raw — than you get through the institutional delivery methods.

Should we have an information diet? Would watching a single news outlet be the intellectual equivalent of the Morgan Spurlock “Super Size Me” movie? Maybe information overload is a consumption problem. We need to learn what is good for us, what is poison, what will make us unhealthy…

In his important 1996 book, Using Language, Herbert H. Clark opens Chapter 7 by analyzing two lines of conversation between “a British academic” and “a prospective student”:

When Arthur says “u:h what modern poets have you been reading -” he doesn’t want Beth merely to understand what he means — that he wants to know what modern poets she has been reading. He wants her to take up his question, to answer it, to tell him what modern poets she has been reading. She could refuse even though she has understood. To mean something, you don’t have to achieve uptake, and to understand something, you don’t have to take it up. Still, Beth’s uptake is needed if she and Arthur are to achieve what Arthur has publicly set out for them to do at this point in their interview. p. 191

My first response, and probably yours, is: Well, duh. But that’s the point. The fact that Clark has to explicitly state that we ask questions usually in order to get a response is evidence of just how deeply we’ve adopted the information-based paradigm that says that communication consists of the transfer of messages from one head to another. Language is a social tool used by embodied creatures to accomplish complex and emergent projects in a shared world. The transfer of messages is the least of it.