Demystifying the Turing Test

Alan Turing’s article “Computing Machinery and Intelligence” (1950), which inadvertently introduced the standard test for machine intelligence, also marked the birth of a new branch of philosophy, centered on one question: can “thought” be reduced to the execution of an algorithm in the brain?
The British philosopher John Lucas published "Minds, Machines and Goedel" (1959), a "proof", based on Goedel's incompleteness theorem, that machines cannot become more intelligent than us; the argument was later retold by the British physicist Roger Penrose in his book “The Emperor's New Mind” (1989).
Goedel’s theorem states that every consistent formal system powerful enough to express arithmetic contains a statement that can be neither proved nor disproved within the system. Indirectly, the argument goes, Goedel's theorem establishes the preeminence of the human mind over the machine: some mathematical problems are not computable, and yet the human mind can deal with them (at least to the point of proving that they are not computable). Humans can realize what Goedel’s theorem states, whereas a machine, limited to mathematical reasoning, would never realize what it states. We can intuitively grasp a truth that the computer can only try (and, in this case, fail) to prove. Therefore no mathematical system can fully express the way Kurt Goedel's mind thinks.

One counterargument came from the philosopher Hilary Putnam: a computer can observe the failure of “another” computer’s formal system, just as a human mind can. A computer can easily prove the proposition “if the theory is consistent, then it contains at least one undecidable proposition”; and this is exactly all that the human mind is capable of doing.
Goedel’s theorem sets a limit, not to the intelligence of machines, but to the human mind: the human mind will never be capable of building a machine that can think. This does not prove that machines cannot think.
Rudy Rucker, for example, believes that we cannot build a machine that has our mathematical intuition, but that such a machine can exist: it cannot be built by humans and its functioning cannot be understood by humans, but it could arise by Darwinian evolutionary steps starting from a man-made machine. What Goedel's theorem asserts is that "the human mind is not capable of formulating all of its mathematical intuitions" (quoting Goedel himself).
The British physicist Stephen Hawking notes that the behavior of earthworms can probably be simulated adequately with a computer, because they do not worry about Goedel sentences. Darwinian evolution can generate human intelligence from earthworm intelligence through a process (natural selection) for which Goedel's theorem is also irrelevant. Therefore, Goedel's theorem does not forbid the advent of an intelligent computer.
Finally, Aaron Sloman pointed out that Goedel’s theorem is false in some nonstandard mathematical systems. Goedel’s theorem applies to mathematical systems that are consistent (i.e., that do not contain a contradiction), but consistency can be settled only by adding the undecidable statement to the system, assumed either true or false; nonstandard models assume that it is false. Goedel’s theorem, because of the way Goedel carried out his proof (by employing infinite sets of formulas), leaves the illusion of proving a truth which in reality is never proved, cannot be proved, and must be arbitrarily decided (“The Emperor's Real Mind”, 1992).

John Searle's article "Minds, Brains, and Programs" (1980) opened another front by using the thought experiment of the "Chinese room" to expose the behaviorist fallacy of Artificial Intelligence. A man locked in a room, a man who knows absolutely nothing of Chinese and may not even know that it is Chinese, follows rules to write down the answers corresponding to the questions posed to him; to anyone who cannot see how the answers are produced, he would appear to be a fluent Chinese speaker. That man does not know Chinese, no matter how good the answers are, and, by analogy, a computer does not "think", no matter how well it does what it is programmed to do. Paraphrasing Fred Dretske: a computer does not know what it is doing, therefore “that” is not what it is doing.

The simplest counter-argument to Searle's argument is that the man may not “know” Chinese, but the room (i.e., the man plus the rules to speak Chinese) does qualify as a fluent Chinese speaker, as someone who "understands" Chinese.
It is also not clear what we really mean by "understanding". In a sense, Searle simply slowed down and broke down the process of understanding: what we do when we understand something is precisely what the man does in the room. Searle's objection sounds more like: if you can tell what the mechanism is that produces "understanding", then that cannot be true "understanding". Basically, the question is whether the simulation of a mind is itself a mind or not.
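The setup lends itself to a caricature in code: the "rules" are a pure symbol-to-symbol mapping, and whoever (or whatever) applies them needs no notion of what the symbols mean. (A minimal sketch; the placeholder strings below stand in for Chinese characters and are not from Searle's paper.)

```python
# A caricature of Searle's Chinese room: the rulebook maps incoming
# symbol strings to outgoing symbol strings. The operator applying it
# needs no understanding of what any string means.
RULEBOOK = {
    "SYMBOLS-FOR-QUESTION-1": "SYMBOLS-FOR-ANSWER-1",  # placeholders
    "SYMBOLS-FOR-QUESTION-2": "SYMBOLS-FOR-ANSWER-2",
}

def room(symbols: str) -> str:
    # The operator mechanically looks up the incoming symbols and
    # copies out the prescribed reply, understanding nothing.
    return RULEBOOK.get(symbols, "SYMBOLS-FOR-I-DO-NOT-UNDERSTAND")
```

To an outside observer who sees only inputs and outputs, `room` is indistinguishable from a speaker; Searle's point is that nothing inside it understands, and the "systems" reply is that understanding, if it is anywhere, belongs to the whole rulebook-plus-operator.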
Inspired by Edmund Husserl's phenomenology, another US philosopher, Hubert Dreyfus, launched a third line of attack. Dreyfus pointed out that intelligent behavior (notably, comprehending a situation) cannot ignore the context (in which the situation occurs) and the body. The information in the environment is fundamental for a being's intelligence, as is the fact that it needs to be organized as a situation in which the body operates ("What Computers Can't Do", 1979). But Dreyfus was mainly criticizing the symbolic, knowledge-based school of A.I. that relied on encoding rules.
Influenced by the German philosopher Martin Heidegger, Terry Winograd argued that intelligent beings act, don't think: people are "thrown" into the real world. They "think" only when action does not yield the desired result. Only then do they pause to picture the situation in its complexity, decompose it into its constituents, and try to infer action from knowledge.
Similarly, Rodney Brooks argued that intelligence cannot be separated from the body: intelligence is not only a process of the brain, it is embodied in the physical world. Every part of the body is performing an action that contributes to the overall "functioning" of the organism in the environment. Intelligence "is" about moving in a physical world and cannot exist without a physical world ("A Robust Layered Control System for a Mobile Robot", 1986).
The Turing Test was basically about building a machine that can answer questions in a manner indistinguishable from the manner in which a typical human being answers.

If you ask the questions that make us human, all computer programs fail the Turing Test, and they fail in awkward ways.

Linguists like to talk about the difficulty of understanding ambiguous sentences such as "Prostitutes appeal to Pope" and "Iraqi head seeks arms". But the job of a machine gets even more difficult when common sense is involved. In the sentence "Carl, who died last year, was a great scientist, and his son Dale has fond memories, and he now takes care of the center" it is pretty clear to whom the "he" refers, because one of the two men is dead and therefore he cannot take care of the center (or of anything else). This is not obvious to a machine that doesn't know what "dying" implies.

Ask the machine "The doll will not fit in the box because it is too big: which one is too big, the doll or the box?" If you ask questions like this one, a human being will get them right almost 100% of the time, but the machine will only get them right 50% of the time, because it will simply be guessing (like flipping a coin). Ask just two questions like this and, most likely, you will know whether you are talking to a machine or to a human being. The machine has no common sense: it doesn't know that, in order to fit inside a box, an object has to be smaller than the box. This is the essence of the Winograd Schema Challenge, devised by Hector Levesque at the University of Toronto in 2011.
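The arithmetic of the challenge can be sketched in a few lines: a system with no world knowledge can only flip a coin over the two candidate referents, while a single hand-coded piece of common sense (an object must be smaller than its container to fit inside it) resolves the pronoun every time. (An illustrative sketch, not Levesque's actual dataset; the schema format is an assumption of this example.)

```python
import random

# A Winograd-style schema pair: flipping one word flips the referent
# of the ambiguous pronoun "it". By convention of this sketch, the
# contained object is listed first and the container second.
schemas = [
    {"sentence": "The doll will not fit in the box because it is too big.",
     "candidates": ("doll", "box"), "answer": "doll"},
    {"sentence": "The doll will not fit in the box because it is too small.",
     "candidates": ("doll", "box"), "answer": "box"},
]

def guess_randomly(schema):
    # No common sense: a coin flip, right about 50% of the time.
    return random.choice(schema["candidates"])

def guess_with_common_sense(schema):
    # One fact about the world: to fit inside a container, an object
    # must be smaller than the container. So "too big" points to the
    # contained object and "too small" to the container.
    contained, container = schema["candidates"]
    return contained if "too big" in schema["sentence"] else container

correct = sum(guess_with_common_sense(s) == s["answer"] for s in schemas)
print(correct, "of", len(schemas), "resolved with one common-sense rule")
```

Without some such encoding of what "fit", "big" and "small" entail, the program's accuracy collapses to chance, which is exactly why a couple of these questions suffice to unmask it.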
A problem with the Turing Test is that it is not clear what it is supposed to "measure": cognition or consciousness? Nowhere does Turing bother to distinguish between them. It is also a little unfair to make it a "yes/no" question: people's intelligence can vary greatly, from Einstein down to some politicians. Intelligence comes in degrees. Animals are intelligent, to some degree, and it is debatable whether they are capable of thinking (i.e., conscious). A mentally retarded person may not be intelligent, but she is presumably conscious, to some degree. Consciousness too may come in degrees. Turing does not discriminate and therefore does not tell us what his test is supposed to measure; and his test is a "zero/one" binary test that does not reflect the continuum of intelligence, from idiot to genius, that we experience in our reality.

The Turing Test needs a better definition, starting with the setting itself:
which instruments must be used? Turing’s test uses a human being (let's call him the "observer" or "judge", who is really the instrument of this measurement) to decide whether a machine is as good as a fellow human being (let's call him the "reference"). Thus both the instrument and the reference are humans. Turing does not provide a prescription for what the observer and the reference must be. Can a mentally retarded person be the reference? Can somebody under the influence of drugs be the reference? Or does it have to be the most intelligent human? The result of the test will obviously vary wildly depending on whom we choose.
As for the judge, Turing's article doesn't specify which type of human he wants to preside over the test: a priest, an attorney, an Australian aborigine, an avid reader of pornographic magazines, a librarian, a mathematician, an economist, his close friends, a gullible person, a skeptic...? Clearly, the kind of questions asked by the "judge" depends on who and what she is.

The judge has to determine whether the answers to her questions come from a human or a machine. If the judge cannot tell the difference, and mistakes the machine for the human, then the machine passes the test. But what conclusions should we draw about the human who failed the test? In other words, if a machine fails the test, then the judge may conclude that it is not intelligent: but what is the judge entitled to conclude if a human fails the test? That humans are not intelligent?

Depending on your definition, computers and robots may already be "cognitive" systems: they are capable (to some degree) of remembering, learning and reasoning.
But they can usually do it only in a narrow domain.

As the computer scientist Stuart Russell remarked, Turing's definition is at the same time too weak and too strong: too weak because it does not include “intelligent” behavior such as dodging bullets, and too strong because it does include unintelligent beings such as the operator of Searle's Chinese room. Most children, who cannot answer many of the questions that an adult could answer, would not pass the test, but that does not make them machines.

Philosophers can split hairs as much as they like, but the Turing test simply measures how good the software is at answering questions, and nothing more.
Answering questions is no more a sign of "thinking" than washing dishes is, although both are among the many operations that thinking beings can do.

We don't normally ask if dead people think, or if furniture thinks: we assume that being alive is a precondition to thinking. Before we ask whether machines can think, we should therefore ask whether they can be alive.