The machine's operation is defined by a finite set of transitions. A transition is specified by four items: a state, a symbol, another state, and an action.
The transition is applicable whenever the machine is in the first state, and the
head is on the specified symbol. It is applied by moving into the second state and performing the action. The action is either a move to the left, or a move to the right, or a symbol (to be written on the tape, overwriting whatever symbol was there).

That's it. Anything that behaves according to these specifications is a Turing machine. The transitions completely define the machine's operation. Different Turing machines differ by having different transition tables. The set of transitions must be finite, implicitly restricting the set of internal states and the set of different symbols the machine can read or write to be finite as well.

If no two transitions share the same first state,symbol pair, which ensures that at most one transition is applicable at any point in the computation, the Turing machine is said to be deterministic.

A computation of a Turing machine is a (finite) sequence of valid applications of transitions. Clearly, the set of possible computations depends on the initial tape content. The Turing machine may end up in a state, above a symbol, for which no applicable transitions exist. The computation is finished: the Turing machine halts. It is also possible for a Turing machine never to halt.

For example, consider the Turing machine without any transitions. This machine, regardless of the tape contents, will always halt immediately. Now consider the machine with one transition: in state s, on symbol a, it will go to state s and write symbol a. This Turing machine will always halt whenever it is started with its tape head on a symbol other than a; but when started on an a, it will never halt.

The next section is going to be tricky: it deals with a pet peeve of mine. I know it's a pet peeve because whenever I bring it up people will disagree with me, refuse to understand the point, or refuse to understand its relevance. So I'm interested in your views on the matter.

Note that a Turing machine is a simple device that could actually be built: its mathematical model is realistic. Clearly, a real-life Turing machine could malfunction or break down, but it is often argued that Turing machines are inherently unrealistic because they contain an infinite tape. This is false.

It is false for two reasons:

the tape isn't part of the machine

the tape doesn't need to be infinite

As you can see in the definition above, the Turing machine itself consists of a tape head, some way of holding a state, and a mechanism to impose a set of transition rules. It assumes the presence of a tape on which it can move its head and on which it can read and write its symbols, but the tape is not actually part of the machine. It is neither part of it mathematically (there is no mention of the tape in the mathematical definition of a TM) nor conceptually.
Sometimes it is argued that a Turing machine models computers; certainly, with computers, we consider internal memory and disk space part of the machine, so analogously, the tape should be considered part of a Turing machine. But a Turing machine doesn't model computers, it models the activity of computing. When we think of that activity, we do not consider it to be restricted to a fixed amount of working memory. When we think of a person conducting computations, we do not assume the person is operating with a given, fixed amount of resources such as working memory or scrap paper. What is more, we do not consider scrap paper to be part of the person. Likewise, a Turing machine does not contain the tape it uses.

From the definition above you can also see that the computation of a Turing machine will never use more than a finite amount of tape. To be exact, after n steps it can never have strayed further than n positions from where it started. It has never looked anywhere else on the tape. So the tape doesn't need to be infinite: in actual practice, it can be extended whenever the tape head reaches an end.

However, while we never actually need an infinite amount of tape, it is fundamental to computing with Turing machines that we can supply arbitrary amounts of tape as the need arises. For some Turing machine configurations, it is possible to calculate a maximum stretch of tape in advance such that the machine's operation will never run off of it, no matter what the initial tape contents are; but this is not true in general. As a matter of fact, some computational tasks can only be implemented by Turing machines with the property that, no matter how much tape you provide initially, even if you only decide on the amount after looking at the initial tape contents, the machine may still turn out to run off the end of it when put to work. There is a hard mathematical proof of this fact. So it is fundamental to the computing power of Turing machines that they can write on an arbitrarily large amount of tape; that we cannot even predict how much tape will be needed. This is what books and articles about Turing machines (such as the writeups below) really mean to say when they (again, incorrectly, in my opinion) write that Turing machines "have" an "infinite" amount of tape.

Many things can be said about the details of computing with Turing machines. This leads to general conclusions about the power of computing in general (computability and decidability), but we can also study certain classes of Turing machines, restricted in the way they use their tape, and the expressive power that results from these restrictions: the restrictions on which most research has focused are summarized in the Chomsky hierarchy.

A Turing machine is capable of performing any mechanical sequence of steps (any algorithm) but I do not believe it is fair to say that they can do everything that a general purpose computer can do.

By definition, you cannot examine a Turing machine's tape until the machine has halted. However, an operating system (OS) on a computer can run indefinitely but that doesn't stop us seeing it's state. In fact, it would be impossible to use an OS without being able to examine it's state while it is running.

Another difference that computers have over Turing machines is user input. Here, I am using user to mean any agent external to the processor. It can be a human user or a piece of hardware. A program sometimes has to halt and wait for input, which may never come. In this situation, a computer can timeout but a Turing machine has no concept of time.

I do not dispute that the core of a computer can be represented by a Turing machine. Any sequence of statements that run uninterrupted can be performed by a Turing machine but I think that a computer is more than simply a machine to step through mechanical procedures. It allows interaction and alteration of those procedures as they run.

Why are Turing Machines important?

It has been mentioned that Turing Machines only emulate one aspect of a computer. So why are they so important to the field of computer science?
The answer is that Turing Machines are the simplest construct that is able to implement algorithms. Obviously algorithms can include certain things which we don't think of as being able to be written on a tape solved symbolically. Yet, if there exists a problem of any type that is difficult to solve, and if that problem can be solved by an algorithm (ie. a sequence of finite steps), then a Turing Machine can represent the difficult part.

The practical significance is that all programming languages are capable of solving the same set of a problems. A Turing Machine is simply easier to theorize about because it has the simplest possible definition. By proving that a problem is Turing Decidable, Turing Recognizable or neither, we prove things about the limitations of computers. Proving things directly about a computer or programming language would be infinitely harder and not any more useful.

What about unusual electronic constructs like neural nets?

To the best of my knowledge, Neural Networks are still built entirely out of hardware that is predictable. Anything that's behaviour can be predicted (even if only on a theoretical level) can be represented as a Turing Machine.

Why are Turing Machines important? - Part 2

An important point that many of the above writers make is that if that problem can be solved by an algorithm (ie. a sequence of finite steps), then a Turing Machine can solve the problem. However, it is also very important to note that the opposite is also true. That is: Any problem of computation that CANNOT be solved by a Turing machine (this includes nondeterministic Turing machines as well as the simpler deterministic systems described above) CANNOT be solved by any known computer. The most famous example of an unsolvable problem is the Halting Problem.

This is the main reason that Turing Machines are so important to the field of Computer Science; They define the ultimate criteria of whether or not a problem is solvable in the general sense.

Of course, the ability to solve a problem is this case assumes alot of things (an arbitrarily large amount of time, a nondeterministic automa, etc). Therefore, it is important to note that while a problem may be solvable in the Turing Machine sense, the reality may be that the problem will not be solvable in any useful manner.

"I should like to eliminate once and for all the questions regarding the foundations of mathematics, in the form in which they are now posed, by turning every mathematical proposition into a formula that can be concretely exhibited and strictly derived, thus recasting mathematical definitions and inferences in such a way that they are unshakable and yet provide an adequate picture of the whole science." - David Hilbert

Description

A Turing Machine (TM) is an imaginary machine, capable of performing computations.
It is generally used as a theoretical tool to prove points about computational problems.
More on that later. One can think of the Turing Machine as a computer.
It is neither a very fast nor a very efficient computer, but, because of its simplicity, it is a practical tool for the analysis of more complex, real life computers and computing problems.
Many real-life implementations of Turing Machines do exist an can be found on-line as Java applets.

A general description of the architecture of a TM is as follows: the TM is defined as a machine with an infinitely long tape and a read/write head.
The tape is divided into squares or regions of equal size, that can contain exactly one symbol (possibly blank).
The choice of symbols is up to whomever defines the TM.
For example, one might use the Latin alphabet or perhaps the numbers 0 and 1.
The read/write head is always on top of one of the squares.
It can move at most one square to the left or to the right.
Aside from these physical properties, the machine always has an internal 'state', which usually has a number (q0, q1, q2, etc..).
At any time, the TM is in one specific state.
The use of this architecture and the state concept will become apparent later on.
For now, it is important that there is one 'halt' state that will stop the machine.

The machine is only capable of performing one elementary operation on the tape.
This operation consists of four steps:

Read the symbol on the tape under the head.

Based on the symbol read and the current state, overwrite the square with a new symbol (possibly the same symbol).

Based on the symbol read and the current state, go to a new state (possibly the same state).

Based on the symbol read and the current state, go one square left or right.
The 'current' state here being the state at the start of the operation, not the 'new' state.

Later on, we will see how it is possible to compute things with the Turing Machine.
For now, it is of the utmost importance to understand that the TM is not complete without a set of transition rules.
If you have read the above description of the elementary operation carefully, you will have noticed that the new symbol, the new state and the new position are all dependent on the current state and the current symbol.
This means that the behaviour of the TM is governed by a set of rules.
These rules define a new state, direction and symbol for all possible current state/symbol combinations.
If we call the state-space Q, and the alphabet used Σ, these rules are in essence functions from (Q,Σ) to (Q,Σ,D), where D is a direction: <- or ->).
As a final note, input into a TM is performed by pre-writing the tape with some symbols, before starting the machine. The output is what is left on the tape when halt is reached.

Intuitively, the list of transition functions is the 'program' and the Turing Machine is the computer.
For historical reasons, which we will see later on, the rules are however considered part of the machine itself.
Not a separate program.
So, infinitesimally many Turing Machines exist and the tape and read/write head alone are not a complete Turing Machine.
There is another good reason for not calling the rule-set a program.
We will also see that later on.

Historical background

In the early twentieth century, mathematicians had an assumption about the power of mathematics that is still persistent amongst non-mathematicians today.
Mathematicians basically assumed that, given unlimited time and mathematical skills, it would be possible to solve any problem.
In other words, theorists assumed that if an assertion was true, then it would always be possible to prove that it was true.
To give an impression of how deeply engraved this belief was, the famous mathematician David Hilbert once said:

"If mathematical thinking is defective, where are we to find truth and certitude?"

This persistent idea eventually gave rise to two very important, yet simple questions:

Is there a sequence of very simple, formal steps (i.e. a 'method') to prove any true assertion in finite time?

Is every true assertion indeed provable?

These questions are a slight restatement of two questions formulated by Hilbert. They are known as the Hilbert Programme. Since it did not even occur to Hilbert at that time that either of these questions might be unanswerable, he thought of them as the last two formal bumps between mankind and a perfect mathematical system. Too bad..

The first of these questions, known as the Entscheidungsproblem (Decision Problem), is interesting, because it would allow us to build a machine that can systematically check all assertions and return a 'true' or a 'false'.
In essence, any mathematical theorem could be put into this machine and it could then be proven.
Fermat's Last Theorem for example.
It is obvious that such a method would have propelled us into an everlasting Golden Age of mathematics.
The second of these questions was considered by Hilbert and his peers to be an annoying little formality.
Everybody knew this had to be true, but a nice proof had just been evasive until then.

In 1931, Kurt Gödel struck a terrible blow to the mathematical community: he found a proof alright.
He proved that Hilbert's assumptions were wrong!
Gödel first proved that there are truths in mathematics (Theorems) that cannot be proved.
More horrifying was the fact that he proved that this problem could not be solved, because it was not a flaw of our mathematical system.
It was a fundamental flaw of any imaginable mathematical system.
Gödel showed that a mathematical system, any system, is either incorrect or incomplete.
This means that, if you create a mathematical system, there are either truths that cannot be proved within that system or there are things that are not true that can, incorrectly, be proved.
Gödel's Theorem deserves a node in itself.
As an overly bold simplification, it has to do with the fact that one can or cannot prove that "this statement is false".

Before Gödel delivered his striking blow, however, mathematicians looked at the first of these two questions.
This question can be rephrased as: "Is there a 'method' to prove things, that will always finish?"
For the word method, one can informally substitute 'recipe' or 'algorithm'.
In order to find such a method, it was necessary to define the concept of a method or an algorithm in a mathematical sense.
Mathematicians needed a mathematical model of the act of systematic problem solving.
A difficult philosophical question indeed: how do you make a model of a mathematician?

The model of the Mathematician

The young and talented British mathematician Alan Turing had a thorough understanding of both the abstract and the real world.
His solution to the question posed above was not the only one ever devised, but is is generally considered to be the most elegant one.
It still has many useful applications in theoretical computer science today.

Turing literally made a model out of a working mathematician, sitting behind a desk, contemplating, writing down things on a piece of paper, looking up facts in notes, correcting earlier notes, etc.
Turing stipulated that the first of Hilbert's questions actually was:
"Can a person with an infinite amount of paper, erasers and time, prove any theorem?"
He then systematically removed all irrelevant details from this picture.
First, he figured that the fact that a piece of paper may contain more than one symbol is irrelevant.
If necessary, one could use a great amount of separate sheets of paper for each symbol.

Then, Turing figured that a mathematician drew conclusions from the symbols he or she had jotted down, in combination with some idea in his or her head (the 'state') and, on the basis of his/her mental state and the symbols on the paper containing earlier results, the mathematician could write down new results or alter old ones.
The mathematician should be able to go backwards or forwards in to his/her notes if necessary, and consult or alter them, based on the mathematician's state of mind.

To form a more concrete picture of this, consider a mathematician, yourself for instance, adding up two large numbers on paper.
Usually, you would start on the right side of both the numbers.
You first read the symbol on the right of one number an memorise that (you enter a state describing that number) and move on to read the rightmost digit of the other number.
Then, based on the digit you are remembering and the digit you are currently looking at (current state and symbol), you draw a conclusion as to what digit should be written under the dotted line.
If the numbers add up to something higher than nine, you should 'remember' to add one when you add the next two digits.
This means that you should be in the 'remember one' state before you proceed to add the rest of the digits.
Your mode of operation for the remaining digits will have to be different when you are in this 'remember one' state.

At this point, it should be obvious that the tape machine described in the first section is exactly the same as the mathematician sitting behind a desk. The 'current' square is the desk, the tape is the infinite supply of paper, the read/write head is the mathematician and the 'state' is the mathematician's mental state.

The Turing Machine: a universal model of computation?

An obvious question to ask is wether this model of computation, the Turing Machine, is actually a universal model.
That is: can the Turing Machine be used to model all methods or, equivalently, is there a model that can solve more problems than the Turing Machine?
That question is very difficult to anwer.
But, as stated earlier, several other models were devised for the concept of a 'method' or algorithm.
These methods are, roughly, Lambda Calculus (by Alonzo Church), Partial Recursive Functions (Kurt Gödel and Stephen Kleene) and General Recursive Functions (Gödel, Kleene and Jacques Herbrand).
These four methods have all been proven to be equivalent.
Anything that can be proven with one of them can be proven with any of them.
This at least suggests strongly that they are indeed universal models of computation. Further, these methods are all self descriptive. One can make a Turing Machine on a Turing Machine. This property is called Turing Completeness. It is something any programming language should be tested for.

Since anyone with imagination can invent a model of computation and a concept of what computation is, this matter is by definition impossible to resolve.
One interesting note is the fact that, of these four methods, the Turing Machine is by far the most elegant model, because it is very comprehensible and little mathematical skill is required to understand it.
(Challenge to other noders: Try writing a node as comprehensible as this one about Lambda Calculus!)

Type classification

There are different types of Turing Machines and different grounds on which to classify them. Here is a summary:

Normal vs. Universal Turing Machines:
As stated earlier, the transition rules are considered part of the machine itself.
That means that one Turing Machine is only capable of solving one problem (for instance, finding the prime numbers between 1 and 100).
Such a Turing Machine is called a Normal Turing Machine.
Early on, Turing wondered if it would be possible to construct a set of rules (that is, one Turing Machine) that could perform a function that was specified on the tape itself, instead of in the predefined functions.
In essence, one would encode a program on the tape, next to the input data, and this special Turing Machine would read the program and perform it.
This is indeed possible.
This type of Turing Machine is called a Universal Turing Machine, since it can perform all sorts of functions.
It is more like a stored program computer than the normal Turing Machine.
The existence of this type of Turing Machine is a good reason not to call the state transition functions a 'program'.
Note that this idea was very revolutionary, since ordinary computers did not exist in the 1930's.
Modern computers are, roughly, just sophisticated Universal Turing Machines.
This property makes Turing Machines so valuable for the general analysis of computing problems.

Single vs. Multiband Turing Machines:
Another distiction to make is one between Turing Machines with one tape and one head and a Turing Machine with more tapes and heads.
The single most interesting question here is: is a multiband Turing Machine able to compute more things than a single band machine?
The answer is, of course, no.
Otherwise, we would have proven that the Turing Machine is not a universal model of computation.
In fact, it has been proven that any multiband Turing Machine can be simulated on a single band Turing Machine with at most a quadratic increase in time complexity.
This is a great exercise for the reader!

Deterministic vs. Non-deterministic Turing Machines
A TM with exactly one transition rule for every given symbol/state combination is said to be a Deterministic Turing Machine.
The one that we will look at in the example below is such a machine.
When we use the TM as a theorerical tool, it is often practical to define TM's with more than one rule per symbol/state pair when modelling computations.
Imagine, for instance, a Turing Machine that explores all possible cycles in an undirected graph.
Such a machine might start in one vertex and decide to go either left or right.
To find all cycles, the TM needs to do both.
You can think of this as having two duplicate machines running off in each direction to find a cycle.
It has been proven that every Nondeterministic Turing Machine can also be modelled using a Deterministic Turing Machine.
NDTM's are only useful as a philosophical tool.

Scientific applications

Since nobody actually uses Turing Machines to compute things, what are their practical purposes?
We already saw that Turing Machines are capable of performing the same functions as modern processors.
The advantage of the Turing Machine is that it is so simple to understand and that all steps take the same amount of time.
This property makes the Turing Machine an excellent tool for calculating the time complexity of algorithms.
(For a detailed description of time complexity, I gladly refer to my writeup there)

Aside from time complexity calculations, Turing Machines have one other important application.
Ever since Gödel shot down the idea that everything can be proved with a machine, people started to wonder what can and cannot be computed with a machine.
The Turing Machine is a theoretical aid in the determination of the type of problems that can be solved on computers, regardless of their architecture.
This is incredibly important to computer scientists, because it allows them to analyse beforehand if a given problem is at all solvable or not.
Needless to say, that is an interesting thing to check before one begins to write a program to solve such a problem.

Example of a Turing Machine

To demonstrate and illustrate the concept of the Turing Machine, we will look at an example.
Hopefully, you will see what a simple and elegant mechanism it is.
We have already seen that a TM is defined by a set of transition functions from a state/symbol pair to a new state/symbol pair and a direction in which to move.
Using only these types of functions, we will define a Turing Machine that can increment any binary number by one and then halt.
The number that is to be incremented will be on the tape prior to the start of the machine and the head will be on the leftmost digit.
At the time the system halts, the head will be on the leftmost bit of the result, and the rest of the tape will be empty.

Strategy outline:
We will first direct the head to the rightmost bit, by scanning until we find a space and then going one square to the left.
Then, we will add one to the rightmost bit. If we have a carry bit, we will remember that and move on to the next digit to the left.
To that, we will add one in case we have a carry bit or we will leave it if there is no carry bit.
If there was already a one on the new position and we do have a carry, we will set it to zero and remember a new carry when processing the next digit. The details will be specified below.

Notation:
We will use the following notation for transition rules: (qn,s)->(qm,t,d). Here, qn is the current state, s is the current symbol, qm is the new state, t is the new symbol and d is a direction.
The direction is either L, or R (Left or Right) or - (no move). The symbols will be 0, 1 or _ for space.

The Turing Machine:

(q0,0)->(q0,0,R) //q0 is the 'searching' state. If we read a 0 keep searching..

(q0,1)->(q0,1,R) //If we read a 1, keep searching as well..

(q0,_)->(q1,_,L) //If we find the first space, go left and enter the 'adding state' q1

(q1,0)->(q2,1,L) //If we find a zero while adding, add one and go to the 'done' state q2

(q1,1)->(q1,0,L) //If we find a 1, make it 0 and keep on adding to the left..

(q1,_)->(halt,1,-) //If we find a space while adding, make it 1 and halt.

(q2,0)->(q2,0,L) //If we are 'done' we keep looking for the leftmost bit.

(q2,1)->(q2,1,L) //Same here..

(q2,_)->(halt,_,R) //If we are 'done' and find a space on the left, we move right and halt.

..and that, as they say, is it!

Additional information and suggested reading

On Gödel's Incompleteness Theorem
There are books abound describing the proof of Gödel's incompleteness theorem. The proof in itself is not very difficult, but formal logic requires very strict notation. This makes most literature a difficult read. A very informal and famous treatment is given in Gödel, Escher, Bach: An Eternal Golden Braid, by Douglas R. Hofstadter. The author introduces the concepts of Godel's Theorem by comparing them to the 'impossible' graphics of M.C. Escher and J.S. Bach's mathematically structured compositions.

Turing Machine implementations
Turing Machines have been implemented in many remarkable ways (usually making finite assumptions for the infinite tape). Java implementations are abound on the web. A particularly interesting implementation is the one within Conway's Game of Life. More about this can be found here.

A note to the purists
Gödel did not actually prove that any mathematical system is either incorrect or incomplete. Obviously, this node is not the place for a complete coverage, but to be somewhat more exact he proved that any Theory at least as complex as the theory of Peano arithmetics cannot be fully described by a finite number of axioms. Peano arithmetics is more commonly known as the theory of natural numbers. As you can imagine, almost all branches of mathematics are at least as complex as number theory. Understand that the above exactly means: given any finite number of axioms, there are truths that cannot be derived from this set of axioms, meaning that these truths are themselves axioms. An example of a theory that is both correct and complete is that of Propositional Logic. Fitch's system of Natural Deduction and the Tree Method can be used to prove all theorems within this system. In turn, these methods can be proven to be both correct and complete. The Prolog programming language is in effect a proof checker for theorems in Predicate Logic; an even more complex Theory.