Saturday, November 30, 2002

Manuel Blum won his Turing Award in 1995 for his work in computational complexity and its applications to cryptography.
The theory of much of modern-day cryptography uses the assumption that certain problems are not easily computable to create unbreakable codes and protocols.

These days Blum is working on another project that also uses the assumption that some problems are computationally hard. The idea is to use
problems that humans can solve more easily than computers can to prevent automated registration, voting, etc. Check out the
CAPTCHA project web site.

Monday, November 25, 2002

Dan Boneh gave a talk at Princeton today about some recent developments in cryptography based on algebraic geometry. One of these tools is identity-based encryption which is public-key encryption where the public key is just a user's identity such as their email address.

Dan's group has an implementation of the system for Outlook, Yahoo Mail and some other systems. If you want to be the first on your block using the latest and greatest encryption or just want more information check out the IBE web site.

Personally, I send all my email in cleartext. If anyone goes through the hassle of capturing it they will only discover what a boring person I am.

The faculty member in computer science mentioned in the article is
complexity theorist Avi Wigderson. It was a coup for
complexity and all of computer science when he was appointed a faculty
member in 1999. With a large collection of postdocs, visitors
and students he has made the institute quite an exciting place for
theoretical computer science
and discrete math.

Saturday, November 23, 2002

The FOCS Conference, the major fall theory conference held last week in Vancouver,
sounded like a complete success. According to PC Chair Bernard Chazelle there were
320 registrants--quite a healthy number for this conference. Most encouraging were the large number of students attending as well as a number of strong student papers, indicating a solid future for theoretical computer science.

The 320 does not count another 50 "registrants" from Nigeria. They registered with fake credit card numbers in order to obtain letters from the conference organizers to help them obtain visas, in this case to Canada. Whether they got the visas is unclear; they, of course, never showed up at the conference.

The temptation to help those from Africa is strong, especially since that continent is woefully underrepresented in computer science. However, we must as a community be vigilant against those who use our
conferences as a way to get around standard immigration laws. Whether or not we agree with those laws, if abuses of this nature continue it becomes harder to bring in legitimate scientists, a problem I discussed in my last post.

Friday, November 22, 2002

Andris Ambainis was supposed to spend the fall at MSRI in Berkeley but
instead is enjoying his native Latvia. When Manindra Agrawal came to
Boston last month to talk about his primality algorithm, he was
supposed to bring along his student co-authors. Instead he came
alone.

Worries about terrorism have made the US government more cautious
about issuing visas, and this has slowed down the visa process
tremendously. Visa problems have always been a thorn for academics,
but this fall seems particularly bad.

I understand the need to be careful but when science is hindered by
politics nobody is a winner.

Wednesday, November 20, 2002

In 1971, Steve Cook gave the conference presentation that showed that SAT is
NP-complete. The talk did not immediately stir up much excitement but it is, in
retrospect, the single most important conference talk in the history
of theoretical computer science. So when and where was this talk?

Funda Ergun, a professor at Case Western Reserve, just purchased a
house in Shaker Heights and wondered where exactly the conference took
place. We got the answer from Bill Rounds, who
was one of the local organizers of that conference.

It was at (I think) Stouffer's hotel at the intersection of
Warrensville Center Road and Chagrin Boulevard, in the Van Aken center
district. The hotel is now gone.

Tuesday, November 19, 2002

In past lessons, we have studied the computable functions. Computable
functions can take an arbitrary amount of time: What good is a program
that will eventually give the correct answer but might not finish
before the universe collapses?

Somehow we want to limit the amount of time or memory that a computer
can use. Just giving a fixed bound does not work well. As technology
improves and computers get faster and better, we expect to solve
larger and larger problems in a reasonable amount of time. Hartmanis
and Stearns, in their seminal paper on computational complexity, turn
this around to come up with the right idea: Consider time and memory
as functions of the size of the input.

The time a machine M takes on input x is just the number of
computation steps that it takes before halting starting with input
x. I am being very informal about what a "step" is. In a later lesson
we will get into formal definitions of computers and steps but for now
just use the idea of implementing one instruction.

The memory or space as we theorists call it is just the number
of bits of storage used by M on a given input.

Edmonds gave an algorithm for the matching
problem that ran in polynomial time: The number of steps used by the
algorithm on a graph of size n is n^k for some constant k. He
suggested that polynomial time captures efficient computation.

We now define our first complexity class P as the set of all languages L
for which a machine exists that determines whether x is in L and halts in
time polynomial in the length of x. The class P has many nice properties: A
polynomial-time algorithm that uses a polynomial-time subroutine
remains polynomial-time. Also P is robust, that is, the class is the
same no matter how you formally define your machines.

In these lessons, we will treat P as the class consisting of
efficiently computable problems. More classes will come.
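To make these definitions concrete, here is a small example of my own (not from the lesson): a decider for the language of palindromes that counts its own steps, with one "step" per comparison. The count is at most n/2 on inputs of length n, so the language is in P. Python stands in for a formal machine model here.

```python
def is_palindrome(x):
    """Decide whether x is in L = {w : w is a palindrome}, counting
    one 'step' per character comparison. The step count is at most
    n/2 for inputs of length n -- polynomial, so L is in P."""
    steps = 0
    i, j = 0, len(x) - 1
    while i < j:
        steps += 1
        if x[i] != x[j]:
            return False, steps
        i += 1
        j -= 1
    return True, steps
```

Running it on longer and longer inputs shows the step count growing linearly with the input length, which is exactly the Hartmanis-Stearns view of time as a function of input size.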

Thursday, November 14, 2002

In my daughter's second grade math homework there was an interesting
variation of Tic-Tac-Toe designed to teach addition and
subtraction. Take a 3 x 3 grid and randomly give each square a
different number between 2 and 18. We have two players X and O. Play
goes as follows:

1. Player X chooses a number from 1 to 9.

2. Player O chooses a number from 1 to 9 that she had not picked before.

3. Player O adds that number and the last number picked by X and, if that square is on the board and unmarked, that square is marked O.

4. Player X chooses a number from 1 to 9 that he had not picked before.

5. Player X adds that number and the last number picked by O and, if that square is on the board and unmarked, that square is marked X.

6. Go to step 2.

Play ends when either X or O has three in a row and is declared a
winner or when all the numbers run out and the game is declared a
draw.
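To pin down the rules, here is a minimal referee for the game, a sketch of my own rather than anything from the homework. The grid is a list of nine distinct numbers in row-major order, and the two arguments are the sequences of numbers each player picks in order.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def play(grid, x_picks, o_picks):
    """Referee the addition Tic-Tac-Toe game. grid holds nine distinct
    numbers between 2 and 18, row-major. Returns 'X', 'O' or 'draw'."""
    marks = [None] * 9

    def wins(p):
        return any(all(marks[i] == p for i in line) for line in LINES)

    last = x_picks[0]                       # X's first pick marks nothing
    players = [('O', o_picks), ('X', x_picks[1:])]
    turn = 0
    while True:
        player, picks = players[turn % 2]
        if turn // 2 >= len(picks):         # numbers have run out
            return 'draw'
        pick = picks[turn // 2]
        target = pick + last                # own pick + opponent's last pick
        if target in grid and marks[grid.index(target)] is None:
            marks[grid.index(target)] = player
            if wins(player):
                return player
        last = pick
        turn += 1
```

With a referee like this, one could in principle search the game tree for forced wins from a given grid, though the naive search is large.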

Despite the simplicity this is quite a challenging game. For every
initial configuration, is there always a forced draw like in real
Tic-Tac-Toe or do some configurations have a forced win for X or O?
How complicated is it to compute an optimal strategy?

My daughter was frustrated at how hard it is to win this game but she
shouldn't be ashamed--I couldn't figure out the best strategy
either. Amazing what complicated things can come out of a second-grade
class.

Tuesday, November 12, 2002

Can't get enough Kolmogorov complexity? Check out Marcus Hutter's site on Kolmogorov Complexity and Solomonoff Induction. The site is a bit dated but contains many useful links and information about the Kolmogorov mailing list, which still seems quite active.

We often see the intersection of two classes as an interesting class
in and of itself. For example factoring is in NP∩co-NP. In some
cases you get interesting equalities, like that ZPP is equal to
RP∩co-RP. But we rarely see the union of two classes. Ever wonder
why?

In fact, no complexity class can be the nontrivial union of two other
classes. To formalize and prove this statement we need some
definitions.

Let A and B be subsets of {0,1}*. We define the join,
A⊕B, as the union of {0x | x is in A} and {1y | y is in B}. Given
a set C we define the 0-projection of C as {x | 0x is in C} and the
1-projection of C as {y | 1y is in C}. Note that the 0-projection of
A⊕B is just A and the 1-projection is just B.
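These operations are easy to state in code. Here is a finite-set sketch of my own (function names are mine, purely illustrative):

```python
def join(A, B):
    """The join A ⊕ B: members of A prefixed with '0',
    union members of B prefixed with '1'."""
    return {'0' + x for x in A} | {'1' + y for y in B}

def projection(C, bit):
    """The bit-projection of C: strings in C that start with bit,
    with that first bit stripped off."""
    return {w[1:] for w in C if w.startswith(bit)}
```

For any A and B, projection(join(A, B), '0') recovers A and projection(join(A, B), '1') recovers B, exactly the property the lemma's proof uses.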

Essentially every complexity class is closed under joins and
projections. For example if A and B are in NP then A⊕B is also
in NP. The fact that no complexity class is the nontrivial union of
other classes follows from the following Lemma.

Lemma: Let E, F and G be classes of
languages that are closed under joins and projections and G =
E∪F. Then either G = E or G =
F.

Proof: Suppose the lemma is false. Let A be a set in
G-E and B be a set in
G-F. Let C = A⊕B. We have that C is in G
since G is closed under joins. Thus C is in either E or
F. Suppose C is in E. Since E is closed under
projections, we have A is in E, a contradiction. If C is in
F then B is in F, also a contradiction.

Monday, November 11, 2002

Here we are in Lesson 7 and have not yet talked about complexity per
se. I felt it important to give some background on computability
theory not only for the importance of the results but also to
introduce the basic concepts of Turing machines, diagonalization and
reducibility.
We will start complexity in the next lesson.

Let me end the discussion
of computability with one of my favorite theorems. Suppose you wanted to
create the ultimate computer virus that attacked any program and made
it change its behavior. The recursion theorem states that no
matter how powerful the virus, some program will remain unscathed. At
first this seems impossible just by considering the function that
simulates a program and then adds one to the answer. But this process
will not affect the machine that never halts.

Theorem: Let f be any computable function. There is some Turing
machine M such that

L(M) = L(f(<M>)),

that is, M and the machine described by f(<M>) accept the same language.

The recursion theorem, sometimes called the fixed-point theorem, has
one of the most unintuitive proofs where I cannot explain why it
works, only that it does.

Proof: Fix a computable function
f. For each machine N, construct a Turing machine <R> that on input x,
simulates N(<N>) to produce the description of a machine and
simulates that machine on x. Let g(<N>) be the function that
outputs <R>. Note that if N(<N>) halts then the programs
described by g(<N>) and N(<N>) accept the same language.

Note that g is computable even if N(<N>) does not halt. Let T(x)
be the machine that computes f(g(x)). We will let M be the machine
described by g(<T>). Then we have that M accepts input x
if and only if the machine described by g(<T>) accepts input
x if and only if the machine described by T(<T>) accepts
input x if and only if the machine described by f(g(<T>))
accepts input x (since T(x)=f(g(x))) if and only if the machine
described by f(<M>) accepts input x. QED

As an application, consider the function f(x) that outputs the
description of a machine that accepts {x}. By the recursion theorem there must
be some M that accepts exactly <M>. As an experiment, pick
your favorite programming language and find a program that outputs
its own code. By an argument based on the recursion theorem, such a task is
always possible but it is trickier than it seems.
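Here is one solution in Python, in the self-reproducing spirit the recursion theorem guarantees; it is a standard quine and by no means the only one.

```python
# A quine: running this two-line program prints exactly its own source.
# The string s is a template for the whole program; formatting s with
# its own repr() reproduces the source text.
s = 's = %r\nprint(s %% s)'
print(s % s)
```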

This ends the section on computability theory which is an exciting
area of research in and of itself. For further reading, the book of
Homer and Selman goes into these ideas in more detail with
examples. For more advanced concepts I recommend the books of
Soare, Odifreddi
or Shoenfield.

Friday, November 08, 2002

The STACS Conference has just posted the list of accepted papers for their 20th conference. STACS alternates between France and Germany (and only some truth to the rumor that it alternates between great food and great organization). The upcoming 2003 conference will be held in Berlin, February 27 to March 1.

I have always considered STACS, the Symposium on Theoretical Aspects of Computer Science, the best venue for computational complexity in Europe. I have attended the conference many times and they consistently have several strong papers in the area as well as good attendance by complexity theorists from both Europe and America.
You can see the weight complexity gets on the web page where "Computational and structural complexity" gets the same weight as "Algorithms and data structures, including: parallel and distributed algorithms, computational geometry, cryptography, algorithmic learning theory".

The ICALP
conference has a longer history, a larger audience, more traditions and does a better job representing Europe as a whole. But the scope in ICALP is quite large and computational complexity
often gets lost in the shuffle.

Wednesday, November 06, 2002

Last
week we gave the history of the complexity class SPP and described
GapP functions. This week we will give a definition of SPP and many of
the class' amazing properties.

A language L is in SPP if there is a GapP function f such that

If x is in L then f(x)=1.

If x is not in L then f(x)=0.

That is if x is in L there is one more accepting than rejecting
path. If x is not in L there are the same number of each.
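To see the condition concretely, here is a toy brute-force gap computation of my own, under the illustrative assumption that the machine is given by a witness predicate with at most one witness per input. The trick is that every non-witness path branches into one accepting and one rejecting leaf, so it contributes zero to the gap.

```python
from itertools import product

def gap(x, m, is_witness):
    """Brute-force #accepting - #rejecting paths of a toy machine.
    Each m-bit guess w is a path: a witness path accepts; a non-witness
    path splits into one accepting and one rejecting continuation, so
    it contributes 0. With at most one witness per input, the gap is
    1 if x is in the language and 0 otherwise -- the SPP condition."""
    g = 0
    for bits in product('01', repeat=m):
        w = ''.join(bits)
        if is_witness(x, w):
            g += 1          # the unique witness path accepts
        else:
            g += 1 - 1      # accept/reject pair cancels out
    return g
```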

If we used #P functions instead of GapP functions we would have the
definition of UP. SPP contains UP since every #P function is a GapP
function. In fact SPP contains FewP and even Few, where we don't
believe such languages are in UP.

SPP is the smallest Gap-definable class, i.e., the smallest class that
can be defined by GapP functions as above. There are a number of
common Gap-definable classes, for example from the Zoo:
⊕P, AWPP, C=P, ModP, Mod_kP, MP, AmpMP, PP,
WPP and of course SPP. SPP is contained in all of these classes. AWPP
is the smallest classical class known to contain BQP, the class of
problems with efficient quantum algorithms, though it is not known if
BQP is itself Gap-definable.

SPP is exactly equal to the low sets for GapP, i.e., SPP is exactly
the set of oracles A such that for any NP machine M, the number of
accepting minus the number of rejecting paths of M^A(x) is still an
(unrelativized) GapP function. This means that SPP is low for all of
the Gap-definable classes, for example ⊕P^SPP =
⊕P. This also means that SPP is self-low: SPP^SPP =
SPP, which means SPP is closed under union, complement and in fact any
Turing reduction.

Köbler, Schöning and Torán showed that graph automorphism is in SPP and
very recently Arvind and Kurur have shown that graph isomorphism is in
SPP. This means that graph isomorphism sits in, and is in fact low for,
every Gap-definable class.

The decision tree version of SPP is interesting. A function f on n
bits is in this class if there is a polynomial g of polylog degree
such that f(x)=g(x) for all x in {0,1}^n. All such functions
have low deterministic decision tree complexity--the first complexity
application of a combinatorial lemma of Nisan and
Szegedy. Applications of this result include relativized worlds where
SPP does not have complete sets or where P = SPP and the
polynomial-time hierarchy is infinite.

Monday, November 04, 2002

Last lesson we learned about using reductions to show problems are
hard. Now consider the most famous of undecidable problems, the
halting problem:

L_H = {<M> | M eventually halts with blank tape as input}

We will now show that L_H is not computable. We do this by
reducing the universal language L_U to L_H, where
L_U is the set of pairs (<M>,x) such that M(x) accepts.

Given <M> and x, consider the following program:

1. Replace the input with x.
2. Simulate M on x.
3. If M(x) accepts then halt.
4. If M(x) does not accept then go into an infinite loop.

Let us call this program N. Note that M(x) accepts if and only if N
halts on blank tape.

Now here is the important point. Consider the function f that given <M>
and x, will produce the program N. Even though M(x) and N may not halt
the actual procedure that converts <M> and x to N is computable. This
is just converting one program to another.
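In Python pseudocode, with machines represented as source text defining a function M (an encoding of my own, purely illustrative), the reduction f might look like this:

```python
def f(M_source, x):
    """Given the source of a machine M (text defining a function M)
    and an input x, produce the source of the program N: N ignores
    its own input, runs M on x, halts if M accepts, and loops
    forever otherwise."""
    return (M_source + "\n"
            "x = " + repr(x) + "\n"
            "if not M(x):\n"
            "    while True:\n"
            "        pass\n")
```

Note that f only manipulates text; it never runs M, so f is computable even when M(x) does not halt.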

So we have that (<M>,x) is in L_U if and only if M(x)
accepts, if and only if N=f(<M>,x) halts on blank tape, if and only if N
is in L_H. Thus f reduces L_U to L_H and
thus, by the Lemma of Lesson 5, L_H is not
computable.

I consider the noncomputability of the halting problem to be the
single most important result in theoretical computer science.
There are some programs, of course, for which it is easy to determine whether or not they will halt. But in general,
no matter how smart you are or how fast your computer, it is simply
impossible to analyze a piece of code and determine whether it will terminate.

Using similar techniques one can prove a general result known as
Rice's Theorem: Every nontrivial property of the computably enumerable
languages is undecidable. More formally:

Rice's Theorem: Let P be any non-empty proper subset of the computably
enumerable languages. Then the language {<M> | L(M) is in P} is not
computable.

Friday, November 01, 2002

November is a month for conference deadlines. The STOC conference
has a submission deadline of November 6. STOC and FOCS, which is being held
November 16-19, are the two major theoretical computer science
conferences.

In computer science in general and theoretical computer science in particular,
conferences are the primary outlet for announcement and publication of
results. Since computer science is a relatively young discipline, the
field changes dramatically from year to year, and the long process of
journal publication would often mean work is outdated by the time it
appears. More mature fields like mathematics and physics use journals
as the primary source of publication.

The main disadvantage of the computer science system is that while
computer scientists are encouraged to submit their work to refereed
journals, many of the important papers in the area never make it that
far.

There have been at least two recent major exceptions to this
process. Alexander Razborov wrote a paper last
spring on lower bounds on quantum communication complexity that would
have been the best quantum paper in FOCS if not the best
paper. Instead he chose to submit it directly to a journal, Izvestiya
of the Russian Academy of Science: Mathematics. The
Agrawal-Kayal-Saxena Primality
Paper which would easily be the best paper at the upcoming STOC is
not being submitted to a conference either but directly to Annals of
Mathematics. "Why should I send it to a conference," Manindra Agrawal
asks, "when everyone already knows the result?"

Are these two papers a trend? Are conferences less important as papers
are easily available online? Or is computer science finally becoming a
mature field?