Monday, February 27, 2006

There has been quite a bit of reporting about Counterfactual Computation in the popular press. This work is essentially a realisation of a technique first published in this paper. I think it's worth looking at the process a little more clearly. Let's see if I can first cram a course in quantum mechanaics and quantum computing into a couple of paragraphs. (Physicists like to obfuscate these subjects but the principles are very simple. But I'm not sure if they're simple enough for two paragraphs however...)

Firstly, let's review classical computing. In a classical computer you have a machine in a state, call it s. You let a bit of time pass and it ends up in a new state, call it t. We can write this using the notation

|s> => |t>

That's basically all there is to know about computing, the rest is details. Let's look at an example. Suppose we have a computer with two one bit registers. The first is an on-off switch and the second will contain the result of the computation. For technical reasons we'll make it so that the answer doesn't just go into the second register but is exclusived-ored with it. We initialise the second register to zero so it's effectively the same thing as storing the result there. Write the state as |s,x> where s is the state of the switch and x is the second register. Suppose the result of the computation is r - this is the thing we are actually interested in. Then we have

|s,x> => |s,x^(s&r)>

In other words, this is saying that we only exclusive-or x with r if s is 1. (I used '^' to mean boolean 'exclusive-or' and '&' to mean 'and'.) For example

|0,0> => |0,0> and |1,0> => |1,r>.

Obviously s must be 1 in order to get a useful value of x.

Now let's move onto quantum computing. Now we no longer have a discrete set of states but instead a vector space. The time evolution of a quantum computer is given by a unitary linear operator. If the states evolves to state t we write this as

|s> => |t>

where we now interpret |s> and |t> as vectors in a normed vector space. The first important fact to know about quantum computers are that if |a> and |b> are possible states then so is any complex linear combination of these states. The second (fact(2)) is that if system A has states in a vector space V(A), andsystem B has states given by vector space V(B), then the combined system is given by V(A) (x) V(B), where (x) is the usual tensor product over C. The third important fact (fact (3)) is about the interpretation of the state |a>+|b>. This makes no sense classically, but in QM linear combinations (called superpositions) are allowed. The rule is that if you observe this state to see whether it's in state |a> or state |b> then it 'collapses' into state |a> or |b>. The probability of state |s> collapsing into state |a> is given by |<a|s>|^2/|<a|a>||<s|s>>| where <x|y> is fancy notation for the inner product of |x> and |y>. (There's another rule of QM, if |a> and |b> are distinct possible observable outcomes of an experiment, then <a|b>=0.)

We now move our classical computer into Quantumland by considering the states |s,x> to be the basis elements of a vector space. Using fact (2) above we can consider these to be |s> (*) |x>. The rule

|s,x> => |s,x^(s&r)> (rule A)

defines what happens to each basis element and hence defines a linear map. For basis elements it acts just like a quantum system. But with a quantum computer we have available all kinds of interesting linear operators that make no sense for a classical computer. In particular, given a register we can apply this operator to it:

|0> => a*(|0>+|1>) |1> => a*(|0>-|1>)

where a = 1/sqrt(2). This is linear and unitary. Notice also that if we do it twice we get back to where we started from. To make things easy we'll drop the constant a and work with linear operators that are unitary up to multiplication by a constant. The state |0>+|1> is a kind of weird mixture of on and off-ness.

So the trick to counterfactual computation is that instead of switching on the computer you instead 'half' switch it on using the linear operator above. Note that if you 'half' switch it on twice it just ends up being off again. But instead of 'half' switching it on twice in a row, we half switch it on, allow the quantum computer to evolve by rules (A) and then half switch it on again. Let's work through the details. First note that we are only 'half' switching on the first register. This means the linear operator applies only to the first factor in the tensor product |s>(x)|x>. So we get:

|0,0> => |0,0> + |1,0>

Now consider the result of this followed by allowing the computer to evolve by (A)

|0,0> => |0,0> + |1,r>.

And now follow this by another 'half' switch on:

|0,0> => |0,0> + |1,0> - |1,r> + |0,r>.

If r = 0 then:

|0,0> => |0,0> (modulo a constant multiple)

In other words, the state is unchanged.

If r = 1 then:

|0,0> => |0,0>+|1,0>+|0,1>-|1,1>

We no longer have the cancellation. Following fact (3), because of the |0,1> term above we have a non-zero probability of finding that the register x contains the answer r and yet the on-off switch is off.

And that's all there is to it. With a bit more cunning you can actually get the probability of finding r as high as you like. There are some great potential applications here. We have effectively left x untouched and this translates into practical 'interaction-free' experiments that allow us to make certain kinds of measurement leaving the measure system unchanged. (You may see why I used 'exclusive-or' above - just copying the result into the second register wouldn't have given me a unitary operator.)

BUT, and this is a big but, I see no valid way to interpret this as 'counterfactual computation'. In other words, I completely disagree with the interpretation of this result. In particular, suppose r equals zero. Then the reason we end up with the result |0,0> is that we have had a fortuitious cancellation. But we have actually passed the register |0>+|1> through the system. We only get |0,0> at the end because our second 'half' switching operation has made the effects of this 'destructively' interfere. In other words, it's just like we put on noise-cancelling headphones and then claim that there was no sound at all. That's silly, there was lots of sound, we just arranged for it to cancel at our ears. In our quantum computer both |0> and |1> waves passed through the system, but the second 'half' switch operation made them cancel. In the r=1 case you can thing of the shapes of these waves being messed up so they no longer cancel. (By the way, the talk of waves is quite fair here, you can think of the elements of these state vector spaces in fact being wavefunctions and the linear operators being time evolution of differential equations remarkably similar to the wave equation.)

In other words, I'm calling these physicists' bluff. There is nothing counterfactual about this computation at all. Though we might be able to get the result of our computation with the switch in the off state, that's only because it was partly on during the computation and we arranged for the state of that register to be destructively cancelled out at the end. I claim that the descriptions given of the experiment are completely bogus.

In fact, if you read Jozsa and Mitchison's paper you'll see that in his discussion his language is quite guarded. I think they realise that this talk of counterfactuality is a little bogus.

Update: Scott Aaronson rants about this too over here. Make sure you read Kwiat's response.

Sunday, February 26, 2006

In 1963 John McCarthy, the inventor of Lisp, published the paper A Basis for a Mathematical Theory of Computation in which he proposed the function (in the computer program sense of the word) amb(.,.). The idea is that amb(x,y) is first equal to x. But if later in the computation it is found that this leads to some sort of contradiction the value of x is retracted and replaced with y. This is a much more complex business than it may seem to be at first. Retracting a value essentially means winding back the entire state of the computation to where it was when amb returned the value x, and then slipping in the value of y. This means somehow freezing and copying the entire state when x was first returned. When a contradiction is found the entire state of the program is discarded and replaced with the frozen version which is reactivated. These frozen states are known as continuations. In many ways it's like a GOTO statement on acid. It can cause a jump to an arbitrary spot on your code. But continuations are nicer than GOTOs because they are more amenable to logical reasoning.

There are a number of languages that have support for continuations built in, including Scheme with its call-with-current-continuation function. But it takes a little care to understand exactly what is meant by "wind back" in this context. Although the state of the current computation is wound back, including things like local variables and the state of the functions currently being executed, global variables remain unchanged. So even though we wind back time and throw away much of the current state some information can still "leak through" from the path that we are now pretending never happened.

So, to get back to the devilish story of part 1: suppose we have a C keywords called TRY and FAIL with the bizarre property that TRY functions just like the return statement except that if a later FAIL statement is met your program winds back to the last TRY statement, undoing any effects caused by the previous return value, and continues execution after the TRY statement. The ambiguous operator would be implemented like this:

int amb(int a,int b){ TRY(a); return b;}

amb(a,b) would return a. But if a later FAIL is met it would unwind the computer back to its previous state and then continue execution with the return b line. Global variables would remain unundone. Given such keywords we can now define:

But C has no such keywords and there's no way to write such functions - unless you cheat a bit. The required code is on my web site here. The trick is to freeze the state of the program by literally reaching in and taking a copy of the C stack. Remarkably this code runs fine, without modification, when compiled with gcc, mipspro or MSVC on IA32, AMD64, PowerPC and mips under Windows, MacOS X, Irix and Linux for all legal combinations of those compilers, processors and operating systems.

But despite the implementation of TRY and FAIL being an ugly hack they really are nice operators to have. Papers like Wadler's lend a little legitimacy to continuation shenanigans by showing there is a nice theory underlying them, and that by allowing them you bridge the gap between intuitionistic and classical logic in the Curry-Howard isomorphism. They can also make for some surprisingly simple implementations of problems. For example, using TRY you can write an elegant non-brute force C sudoku solver in a couple of lines. In fact, you find yourself able to write code that is declarative rather than imperative, even though you are working in C. The backtracking is handled behind the scenes by TRY and FAIL which can be nested recursively (as suggested in McCarthy's original paper) without any difficulty. I originally wrote that code in order to implement a pattern matcher similar to Mathematica's over ten years ago. It took me several years to figure out how to rewrite it without using continuations - using continuation passing style instead. (I had no Computer Science training, I hadn't even heard of this stuff.) Nowadays I'd just use a backtracking monad in Haskell...

For an implementation in Ruby see randomhacks. For code similar to my C code see Aubrey Jaffer's implementation (in C) of Scheme. You can also do the same thing semilegally through the setcontext() and getcontext() functions if they are available for your operating system. You can probably do something similar using threads or fork().

Saturday, February 25, 2006

This was originally intended as my programming and mathematics blog but I never got around to any programming here, until today.

My recent interest in logic is partly motivated by a desire to understand the paper Call by Name is Dual to Call by Value. I suspect the paper is almost trivial, but that you have to be very familiar with the prerequisites before you can appreciate its triviality. (Ie. I'm using the mathematician's definition of trivial of course :-) )

Now I don't understand that paper yet, but on the seventh page there is a story about the Devil. Below I have a computational representation of the story written in C. The idea is, what must the functions devilChoice() and devilCheats() do in order to allow the devil to capture this soul? devilChoice() must return A or B.

printf("Devil says:\n"); printf("Here's my offer.\n"); printf("I will choose either A or B\n"); printf("If A I'll give you $1000000000\n"); printf("If B I'll give you any wish for $1000000000\n"); printf("Do you accept\n");

printf("Sucker is about to make wish...\n"); printf("'I wish to go to heaven'\n"); devilCheats(); printf("Sucker goes to heaven and devil loses a soul\n"); exit(1);break; }

if (suckerKarma<0) { printf("Sucker goes to hell with $%d\n",suckerMoney); }}

(Hmmm...syntax colouring didn't work out quite as well as I wanted.)

In fact, I'm going to leave this as an exercise and give a possible solution another day. But the title of this post gives a clue - and of course the paper proposes a solution too. The code doesn't need to be 100% portable.

PS You don't need to ask. I have a solution in not quite 100% portable C. But of course I cheat. But is there an interesting cheat?

Friday, February 24, 2006

Despite being moderately good at mathematics, even managing to scrape together a PhD, there are certain topics that are always brick walls to me so that I find it hard to get started even at the most elementary level. In algebraic topology I always had problems with spectral sequences but they're not so elementary and are notoriously tricky. But in logic I can barely get off the ground. Here's an example of a sentence from an introduction to linear logic that baffles the hell out of me "One of the most important properties of the proof-rules for Classical Logic is that the cut-rule is redundant". This is one of the most ridiculous things I have read in mathematics writing. If it's redundant then don't study it. Excise it from the list of derivation rules and don't bother with it every again.

I'm sure than when set theorists first tried to write down the axioms that became ZF they found lots of redundant axioms. Over the years they were whittled them down to the list we have today so that I bet you can't even name the axioms that were jettisoned for redundancy. Not so in "Gentzen style" logic. Every document ever written on the subject seems to introduce this rule and then with a flourish they show how it can be eliminated. They develop the whole subject and then proceed to demolish the earlier work by showing how they can rewrite everything they did earlier without using this rule. The only explanation I can come up with is that authors of books on logic are paid by the word and that this allows them a few extra chapters for very little work.

Of course the problem here must be me. I'm sure there's a perfectly good reason for harping on about the cut rule, I just don't see it. And I think this points to a difficulty with trying to read mathematics texts outside of academia. When you're a student you're sharing extratextual material all the time. People tell you that such and such a result is interesting because it has an application somewhere and then you go and read the formal details knowing what the motivation was. Or someone will give an informal seminar where they write equations on the board, but speak much more informally between the equations. But most mathematics texts hide this material from you and present the bare facts. This is fine when you're in an academic environment, but I have to confess to finding it difficult when working on my own.

One thing I'd love to see online is the equivalent of the reading seminars we did during my PhD work. Each week we'd read a chapter and then have a seminar to discuss what we'd just read. Does anyone do this? Blogs seem like a great way to do this but I've seen no evidence of groups working like this.

Friday, February 17, 2006

Mathematics is the lingua franca of science. Except that Bernard Chazelle now claims that there are two such languages - mathematics and computer science. It's a curious claim. He likens mathematics to "epithets" and computer science to "novels" and claims that a computer science background is imperative to studying biology. All he thinks we need are a great populariser of computer science (wasn't that what Wolfram tried to be?) and an Einstein.

I'm inclined to ask, with the article's author, "Isn't computer science really just a stepchild of mathematics?"

Thursday, February 16, 2006

Of course it's not clear that this is "counting" in any conventional sense. Additionally, some monkeys can count this well (and I think also many other mammals). So I'm not sure what conclusions can be drawn from this experiment.

Wednesday, February 08, 2006

Some would say the Banach-Tarski paradox but my attitude has always been that all bets are off when dealing with unmeasurable sets. I think the existence of a number satisfying the main property of Khinchin's constant is much less plasuble.

Addendum: According to this paper, even though the Khinchin property is true for almost all real numbers, nobody has explicitly found a number for which it is true! You can artificially construct a number for which it is true but that doesn't give you an explicit result, and anyway, that's cheating. Experimentally, π gives something near the Khinchin constant, but there's no proof of what happens in the limit.

Monday, February 06, 2006

Over the years many people have felt that mathematicians have made life too easy for themselves by using axioms that seem much like the proverbial sledgehammer cracking open a nut. Look at the way analysts and algebraists will wield the Axiom of Choice when they can prove what they need with much weaker axioms. Wouldn't it be more enlightening to carry out these proofs using the weakest axiom system possible so we can see exactly what is needed to make them provable?

It turns out that this is more or less what reverse mathematicians do. They take theorems such as the well known Bolzano-Weierstrass theorem and try to figure out precisely how much mathematical machinery is required to prove them. There is no need to use all of ZFC, or even ZF, to prove such a result.

So reverse mathematicians sometimes start with something simpler like second order arithmetic. In its universe are integers and sets of integers, and nothing else. It can still talk about real numbers by encoding them as Cauchy sequences in the usual way but there is no way to encode subsets of the reals in this way. Maybe surprisingly, it is possible to represent continuous functions because the rationals are dense in the reals so continuous functions are defined by the values they take at rational numbers. It turns out that almost all classical mathematics can be encoded in this system even though it's much weaker than ZF.

But usually reverse mathematicians like to go even weaker still and work with recursive comprehension which is essentially the Peano axioms combined with induction and a restricted comprehension axiom. Using these weak tools it's still possible to prove things like the Intermediate Value Theorem, the uncountability of the reals and even results about metric spaces such as the Baire category theorem (but only for separable spaces).

One thing that has always intrigued me about number theory is how many theorems are proved by making excursions into analysis. For example, think about results in analytic number theory that can easily be stated using the language of Peano's axioms, and yet whose proofs require techniques like contour integration that make reference to infinite sized objects and limits. In 1988 Stephen G Simpson proved that such theorems can in fact be reduced to "primitive recursive arithmetic" and hence turned into "finitistic" proofs. On the other hand, I'm not sure that these proofs are necessarily going to give any insight into why the theorems are true. They probably end up being as unwieldy as Borbaki's definition of one.

Sunday, February 05, 2006

Error correcting codes have inspired some beautiful mathematics. One of my favourite mathematical constructions, the binary Golay code, is an error correcting code.

The idea behind the theory is simple. Suppose there is a set of M messages, one of which you want to send over a noisy communication channel. If you simply send one of the messages it might be corrupted and interpreted as another. So instead you embed the set M inside a larger set C and send an element of C as a message. If the message is corrupted then they might receive a message in C but not in M. They'll know right away that the message has been corrupted. But even better, if the embedding of M in C is designed cunningly enough they'll be able to make a good guess as to what the original message was assuming some probability model for the corruption. You pay a price of course, the elements of C typically take longer to transmit than the elements of M, reducing the rate of data transmission.

There have been lots of proposed schemes over the years since Shannon started the field. They use a variety of methods from discrete mathematics and algebraic geometry. There are nice connections between the sporadic groups and coding theory. The Mathieu groups arise directly as symmetries of the Golay code and the Golay code can itself be derived from the Leech lattice whose symmetries are described by the Monster. A common feature of these codes is that they are linear. The set C is a vector space over F2, the field with two elements. M is a subspace of this space. M is the kernel of an F2-linear map represented by a matrix called the syndrome. If the syndrome, when applied to a message, gives zero, then it's likely that the message has arrived uncorrupted. Much of the work in coding theory has been about generating suitable syndrome matrices.

But in the sixties Robert Gallager looked at generating random sparse syndrome matrices and found that the resulting codes, called Low Density Parity Check (LDPC) codes, were good in the sense that they allowed messages to be transmitted at rates near the optimum rate found by Shannon - the so-called Shannon limit. Unfortunately the computers of the day weren't up to the task of finding the most likely element of M from a given element of C. But now they are. We now have near-optimal error correcting codes and the design of these codes is ridiculously simply. There was no need to use exotic mathematics, random matrices are as good as almost anything else. The past forty years of coding theory has been, more or less, a useless excursion. Any further research in the area can only yield tiny improvements.

A good place to learn about these codes is in the book by one of the 'rediscoverers' of Gallager's codes, David McKay. There's also the Wikipedia article. I originally heard of these codes from a friend of mine and was sceptical at first. Then I read this Science News article and I'm now a bit embarassed about my scepticism.