Thursday, August 31, 2006

The Szemeredi Regularity Lemma

This semester I will mostly be securing the cyberspace, but I also plan to learn more about additive combinatorics and its applications to computer science.

In the last few days I have been trying to understand the proof of the Szemeredi Regularity Lemma. It is actually not as hard as I thought.

The Szemeredi Regularity Lemma states that every graph can be approximately described by an object whose size is a constant that depends only on the accuracy of the approximation, and not on the number of vertices of the graph.

This remarkable ``approximate classification theorem for graphs'' has several applications in extremal combinatorics, additive combinatorics, and computer science.

(Indeed, besides such applications, one may imagine using the Regularity Lemma as follows: to prove an approximate quantitative statement about arbitrary graphs, we only need to prove a related statement for the finite set of objects that approximate arbitrary graphs up to a certain level of accuracy. The latter, completely finite, goal may be performed via a computer search. Unfortunately, the size of the objects arising in the Regularity Lemma grow so astronomically fast in the approximation that this approach is completely impractical.)

Roughly speaking, the Lemma says that for every graph G=(V,E) and every parameter eps, we can find a partition of V into equal-size subsets V1,...,Vk such that, for almost all pairs i,j, the edges between Vi and Vj form a dense bipartite "expander." Furthermore, k depends only on eps.

Let us define the notion of "expansion" that we will use. Let G=(U,V,E) be a bipartite graph, where U and V are sets of vertices and E is the set of edges. Let A be a subset of U and B be a subset of V. Define the density d(A.B) of the edges between A and B as e(A,B)/|A|*|B|, that is, the number of edges between A and B divided by the maximum number of possible edges. If G were a random graph, we would expect the density d(A,B) to be approximately the same for all large enough sets A and B, and to always be approximately |E|/|U|*|V|. We say that U and V are eps-regular if for every A of size at least eps*|U| and every B of size at least eps|V| the density d(A,B) is equal to |E|/|U|*|V| plus or minus eps.

Szemeredi Regularity Lemma For every eps there is a constant c(eps) such that for every t and every graph G=(V,E) we can find a partitio of V into at least t and at most t*c(eps) sets of equal size V1...Vk, such that for at least a 1-eps fraction of the pairs i,j, the sets Vi and Vj are eps-regular.

[A minor technical point: the number of vertices of the graph may not be divisible by the number of sets in the partition, so we allow one of the sets in the partition to be slightly smaller than the other k-1 sets.]

The idea of the proof is to start with an arbitrary partition of the sets of vertices into t sets. If the partition satisfies the lemma, we are done. Otherwise, for many pairs Vi,Vj we can find a subset Aij of Vi and a subset Aji of Vj such that both subsets are large but the density of the edges between them is not close to the density of the edges between Vi and Vj. We then refine our partition using these sets. This means that we take every set in the partition and we further subdivide it into subsets (in a way that we describe shortly). The refinement is chosen in such a way that the densities in the new partition are more "disordered" or have more "variance" than they had before. In particular, after the refinement, we show that a function that measures such "disorder" increases by at least poly(eps). Such function can never be more than 1, so we never need to refine the partition more than poly(1/eps) times. At the end, we are left with a partition that satisfies the Lemma and that contains a number of sets that depends only on eps.

It now remains to: (i) define the function that measures disorder; (ii) describe the refinement step; (iii) prove that the refinement increases the disorder function by at least poly(eps).

Let V1,....,Vk be a partition of V, and let pi=|Vi|/|V| be the fraction of elements in Vi. We define the variance of the partition as

f(V1,...,Vk):= sumij pi*pj*(d(Vi,Vj))2

We can interpret this quantity as follows. Consider the random variable X defined by the following experiment: we pick at random, independently, two vertices u,v, we let Vi be the set containing u and Vj be the set containing v, and we define X as d(V_i,V_j). Then f(V1,...,V_k) is the expectation of X2. Also, observe that the expectation of X is simply |E|/|V|2, independent of the partition, and so f(V1,...,Vk) indeed measures, up to additive factors, the variance of X.

It is now an exercise in the use of Cauchy-Schwartz to prove that f() cannot decrease if we refine the partition by splitting a set into two or more subsets.

Now, suppose that Vi and Vj are not an eps-regular pair, and let A and B be subsets witnessing this fact. Split Vi into A,Vi-A and Vj into B,Vj-B. This is now a more "disordered" partition, because the density d(A,B) is noticeably more (or noticeably less) than d(Vi,Vj), and so it also has to be noticeably different from at least one of the three other densities d(A,Vj-B), d(Vi-A,B), d(Vi-A,Vj-B). In fact, we can show that the contribution of the pair Vi,Vj to the variance grows from pipjd2(Vi,Vj) to at least pipjd2(Vi,Vj) + const*pipj*eps4.

If our partition is not eps-regular, then at least an eps fraction of all pairs is not eps-regular, and, by splitting each pair we can increase the variance by at least const*eps5.

There is, however, a difficulty. Let's say that the pair V1 and V2 is not eps-regular, (as witnessed by sets A,B) and neither is V1 and V3 (as witnessed by sets A' and C). What do we do if A and A' are different? Do we split V1 according to A or to A'.

The answer is both, that is, we split V1 into four sets so that A is the union of two of them, and A' is also the union of two of them. In general, if V1 participates into k-1 non-eps-regular pairs, then we split V1 into 2k-1 subsets. For each non-eps-regular pair Vi,Vj, where A,B are the sets witnessing the non-regularity, this refinement is finer than the refinement that simply breaks Vi into A,Vi-A and Vj into B,Vj-B, and so it increases the contribution of Vi,Vj to the variance by at least as much.

This refinement creates subsets of different sizes, while we want the sets to have the same size. We can adjust the size by further refining the large sets and possibly merging small sets, a process that can slightly reduce the variance but, if done properly, by less than half of the amount we gained previously.

Overall, the number of sets in the partition increases exponentially, and the variance increases by at least const*eps5. This means that we reach an eps-regular partition after no more than O(1/eps5) steps. The number of sets in the partition, unfortunately, can now be as large as a tower of exponentials of height O(1/eps5).

Monday, August 28, 2006

Did all those trees die in vain?

The first computer science section is Saturday afternoon, and it is all about Israeli theory: Omer Reingold and I are speaking, and Avi Wigderson chairs the section. I talk about the complexity-theoretic approach to derandomization and pseudorandomness, and about the coding theory / average-case complexity and the extractors / pseudorandom generators connections. Omer talks about expanders, pseudorandom walks, and L=SL. Here are the slides of my talk. (Based on this paper.)

The attendance is very good, both in quantity and quality, but nobody asks questions after the talk.

Invited speakers are presented with a gift from the organizers. It is an exquisite, two-volume, leather-bound edition of the work of Archimedes. One volume contains a reproduction of the original hand-written ancient Greek. In recognition of the fact that not many people can read ancient Greek, the second volume helpfully contains a Spanish translation.

When I made travel plans for the conference, I assumed that I would start teaching tomorrow. Instead, thanks to a timely approval of my green card, I will spend the semester securing the cyberspace at IPAM, in Los Angeles. So, even though I could have stayed longer, I left on Sunday morning and I am going to miss Ben Green's talk and the rest of the computer science program, not to mention Random-Approx in Barcelona. Between the two-volume Archimedes thing, the two-volume proceedings, and the volume of abstracts, it is a miracle that I am able to close my bag. At the airport, they find my bag to be overweight, and I have to pay 19 Euros.

Saturday, August 26, 2006

Derandomization in functional analysis

Thursday evening there is a party at the American Embassy. The invitation, mysteriously, requires "cocktail attire." The clothes that people wear in San Francisco while drinking cocktails in bars do not seem appropriate for an Embassy, so I have to search on google what this means. (Men have to wear jacket and tie.) Ronitt Rubinfeld isn't here yet, Avi Wigderson has a strict no-jacket policy, and so at the party I recognize less than a handful of people. After entering, there is a receiving line. A mathematician from UC San Diego (who, I believe, was one of the organizers of the party) meets the guests first; then he introduces each guest to a jovial fellow whom I assume to be the Ambassador (he is, instead, a government official, something like the science advisor to the Secretary of State). Finally, a woman says "welcome to the United States embassy." It's my turn. Hello, I say to the mathematician, I am Luca Trevisan from U.C. Berkeley. Oh, he says, you are a graduate student. Once we are inside, the three of them give one short speech each. The woman regrets to inform us that the Ambassador had to be somewhere else in Spain and so he could not be with us. He is hiding in the next room, is my comment. He is on vacation, someone else says.

On Friday, Jon Kleinberg gives his plenary talk. He frames his work by considering two ways of viewing computers (and hence, computer science, and hence theoretical computer science). One is that we have inputs, on which we have a computation that gives an output, and we are interested in the efficiency of computation. This would be the design of algorithms and many of the problems studied in computational complexity. In the other view, computers are "mediators" between people, and they are the means of interactions that happen in various forms (emails, electronic markets, blogs,...).

Studying the information exchanges of the second type one is led to study networks, and an important point is that such networks are not artifacts that have been designed; they are phenomena that happen. After giving some concrete examples, he focuses on his results on "small world" networks.

After the break, I go to see Emmanuel Candes's talk on "compressive sampling sensing." Here the idea, not unlike Goldreich-Levin (or Kushilevitz-Mansour) is that we want to reconstruct the Fourier coefficients of a function f:{1,...,N} -> R by looking at only at a small number of values of f, under the promise that only a bounded number of Fourier coefficients of f are non-zero. It turns out that, if k Fourier coefficients are non-zero, it is sufficient to read f at O(klog N) random points and then find the function that, subject to having those values at those points, has the smallest possible l1 spectral norm. Such minimization can be found in time poly(N) using linear programming. Similar ideas also lead to LP-based algorithms for error-correcting codes. Amusingly, compressive sampling sensing is commonly abbreviated CS, and one slide reads

CS: a new paradigm for analog to digital.

Then, Avi takes me to a lecture by Stanislaw Szarek on geometric functional analysis. The beginning of the lecture, devoted to putting the problems in context is too abstract for me to follow. But then come the concrete problems. Surprisingly (to me) the probabilistic method is used routinely in functional analysis, and, in finite dimensional cases, they are interested in explicit constructions. As partial results, they are interested in constructionts that use few random bits, and one results that is mentioned involves random walks on expander graphs. Some of the derandomization open questions that he discusses have applications in compressive sampling sensing, the area earlier described by Candes.

Wednesday, August 23, 2006

Pseudorandomness and more pseudorandomness

The first talk of the morning is by Terry Tao. Tim Gowers, in his essay The Two Cultures of Mathematics explains the difference between depth and generality in combinatorics versus other areas of pure math. In combinatorics, one is unlikely to find deep definitions (have you ever tried to understand the statements of the conjectures in the Langlands program?) and very general theorems. In combinatorics, it is more usual to find generality in the form of principles that the practictioners of the subject are aware of and that are exemplified in the proofs of certain theorems. (I hope I have not distorted and oversimplified Gowers's point too much.)

Tao's talk very explicitly points out the principle that emerges from his work on arithmetic progressions. The idea is that one can often define notions of pseudorandomness and structure such that one has a theorem of the following kind:

Dichotomy: if an object is not pseudorandom, then it has large stuctured component.

For example, in the setting of looking for arithmetic progressions of length 3, a subset of {1,...,N} is "structured" if it has a large Fourier coefficient and it is pseudorandom if it has approximately the same number of length-3 progressions of a random subset of {1,...,N} with the same density.

By iterative arguments, one then has theorems of the form

Decomposition: every object can be written as a "sum" of a pseudorandom part and a structured part

This is very important in the proof of the Green-Tao theorem on arithmetic progressions in the primes, but it can be seen, implicitly, in the Szemeredi regularity Lemma and in all proofs of Szemeredi's theorem.

The proof typically goes by: if our object is pseudorandom, we are done; otherwise we find a large structured "piece." We "subtract" it. If what remains is pseudorandom, we are fine, otherwise we subtract another structured part, and so on.

The advantage of this type of decomposition (whether it is done explicitely or it is implicit in the proof) is that one can use techniques from analysis and probability to analyse the pseudorandom part and (depending on the definition of "structure") techniques from analysis and geometry to analyse the structured part. This is how the proof of the Green-Tao theorem ends up using analysis, ergodic theory, combinatorics, and so on.

Often, one can use geometry and algebra only provided that the object is "exactly" structured, while results of the above type may only give "approximate" structure. To bridge this gap one employs results of the type:

Rigidity: an object that is "approximately structured" is "close" to an object that is structured

These are results in the spirit of property testing. (And he mentions property testing in his talk.)

This is all quite abstract, but reading papers in the area (or approaching new problems) with this point of view in mind is very helpful. He also proceeds to summarize the proof of the Green-Tao theorem in terms of the above perspective.

By the way, earlier in the talk, about the general importance of understanding pseudorandomness, Tao talks mentions expander graphs and the P versus BPP question. (As a context, not as specific problems that he is going to work on. For the time being, we are safe.)

Overall, the talk is a model of clarity and insight.

After the break, Avi Wigderson gives his talk on P, NP and Mathematics. Yesterday, Avi told me that, of all the material that he cover in his 50-page proceedings paper, he had decided to talk about pseudorandomness and derandomization, from the perspective that randomness seems to help in computation, but complexity theory suggests that it does not. I despaired, because this is exactly my talk, but luckily Avi's talk has a somewhate complementary emphasis on topics. (So I won't have to spend the next two days remaking my slides and I can go see the Prado tomorrow.)

Avi explains the "hardness versus randomness" principle in terms that could not be clearer, and one can see heads nodding as he goes along (nodding in agreement, not nodding off). He ends up with the following question about the complexity of the permanent: is there a linear transformation L of nXn matrices into kXk matrices such that, (i) for every matrix M, the determinant of L(M) equals the permanent of M and (ii) k is only polynomial in n? (The answer is supposed to be NO.) From known results one can derive that k need not be more than exp(O(n)) and a Omega(n2) lower bound is also known.

This is to say that one can work on complexity theory without having to learn about such things as algorithms, Turing machines and so on, and that purely algebraic questions are completely open and very related to complexity questions.

Tuesday, August 22, 2006

Laudationes

After a lunch-time "reception" (the less said, the better), the first afternoon session is dedicated to the laudationes. An expert in the area speaks on the work of each Field Medalist and of Jon Kleinberg.

First, a mathematician from ETH Zurich (who is, however, Italian) speaks on the work of Andrei Okounkov. He studies "partitions," which, in the simplest case, are ways of writing an integer as the sum of a non-decreasing sequence of integers. Partitions are related to the representation theory of the symmetric group, and apply in a variety of problems that were too technical for me to follow.

John Lott spoke on Perelman's work. Lott is part of one of the three teams that have been working out complete expositions of Perelman's work. His talk is extremely clear. The Poincare conjecture states that

every 3-dimensional manifold that is simply connected and compact is diffeomorphic to the boundary of a sphere ball in R4.

Where "simply connected" means that a simple loop can always be shrunk to a point via a continous transformation. "Diffeomorphic" means, probably, "equivalent under the appropriate type of nice continuous transformations." The lower dimensional analog is that any surface with "no holes" can be "stretched" into the surface of a sphere ball. Even though this is a purely topological problem (the assumption is a topological property and the conclusion is a topological property), the big idea was to use methods from analysis. Specifically, Hamilton suggested in the 80s to consider a physics-inspired continuous transformation (the "Ricci flow") on the manifold until it "rounds up" and becomes a sphere. (The continous transformation maps the starting manifold into a "diffeomorphic" one.) Unfortunately, the process produces singularities, and Hamilton had found ways to remove them in certain nicely-behaved special case. Perlman's work shows how to always remove such singularities. This was a major tour de force. Before Perelman's work, apparently, the experts in the area considered the known obstacles to eliminating singularities in all cases to be almost insurmountable, and did not consider Hamilton's program to be too promising. Hamilton said that he was "as surprised as anybody else" by the fact that Perelman made the Ricci flow approach work.

Next, Fefferman delivers the laudatio for Tao. It is a daunting task given the variety of areas that Tao has worked on. Fefferman choses to talk about the Kakeya problem, about non-linear Schrodinger equations and about arithmetic progressions in the primes. By the way, both his work on the Kakeya problem and the work about arithmetic progressions (though not the work on the primes in particular) have applications in theoretical computer science.

The laudatio for Werner is a complete disaster. The speaker reads from a prepared speech, and the big screen (that, for the previous speakers, had shown slides of the presentation) shows the text of the speech. This doesn't do justice to Werner's work on proving rigorous results in statistical physics which were previously derived via non-rigorous methods. His work includes problems of the type that computer scientists work on. In his interview, Werner says that he feels he shares the medal with Oded Schramm and Greg Lawler. Oded Schramm, in particular, was a leading figure in this line of work, but he was already over the age limit in 2002, and this work was done after 1998, so he could never be considered for the award.

John Hopcroft delivers the laudatio for Jon Kleinberg. He emphasizes five results: the hubs-authority idea for web search ranking, his work on small-world models and algorithms, the work on ``bursts'' in data, the work on nearest neighbor data structures and the work on collaborative filtering. Like Fefferman, Hopcroft runs out of time.

In the late afternoon (this is a LONG day), Hamilton gives a plenary talk on his work on the Poincare conjecture. Despit Lott's nice earlier introduction, I get rapidly lost in what seems a series of unrelated technical statements. Avi seems to follow. "No, no, they are not unrelated, he is building up an exposition of the proof by explaining all the problems and how they are overcome." He starts late because of projector problems, and he runs enormously out of time. At one point the session chair timidly stands up, not quite sure what to do, then walks towards the podium, then walks back, apparently with Hamilton not noticing. Where is Johan Hastad when you need him?

Congratulations, Jon!

There is a long line to enter the conference center this morning. Only at the end, I realize that invited speakers (but of course!) can skip the line and stride in. The line is because of security checks. Everybody passes through a metal detector, and bags are checked. So much security for a math conference? (Later we understand why.)

We start with a string trio (guitar, violin and cello) playing live. The sound of the violin is butchered by the sound system. Then the VIPs come on stage. Chairing the award cerimony is His Majesty Juan Carlos I, King of Spain. This is not going to be like a STOC business meeting. The King is accompanied by a person in a white military uniform with lots of stripes on his shoulders. The King sits in the middle of a table on stage, flanked by congress organizers and Spanish politicians. (The minister of Research and the mayor of Madrid.) The guy with the white uniform takes a seat behind the King. Then, one by one, everybody gets up and gives a speech.

John Ball starts his speech by explaining how mathematicians talk about their work freely, without fear that their work will be stolen, and how work is appreciated solely based on its merits, not on the way it is promoted. This is how the vast majority of mathematicians work, he continues, and exceptions are rare, and noted. It might be a reference to some of the recent events around the proof of the Poincare conjecture.

The mayor of Madrid starts what seems like a canned speech that he always gives at scientific/ technological events. Towards the end, however, he talks quite eloquently about the virtues of mathematics, its ability to make sense of a complex world, its commitment to truth, its trascendence of religion, race, and so on. Finally, he says, it is not just mathematicians that have to make an effort to come closer to the everyman and to explain the practical applications of mathematics. It is everybody's civic duty to learn more about math and science. (He said it better.)

Griffiths starts by saying "One of the main activities of the IMU [the International Mathematical Union] in the last few years has been the selection of a new logo." Hilarity ensues in the audience. (He seemingly did not mean it as a joke.) A documentary on the new logo is played.

More and more people talk, and at long last we come to the award of the Fields Medals. As expected, Terry Tao and Grisha Perelman receive the award, plus two other mathematicians that work in areas that I am not familiar with.

Perelman, is announced, has declined the Fields Medal. Someone from the back claps.

The Nevanlinna prize goes to Jon Kleinberg. This is definitely the computer science award with the most distinguished record, and I want to congratulate the committee for, once more, making an excellent choice. Congratulations to Jon too, of course.

Monday, August 21, 2006

Hey, you look like a mathematician!

ICM is held in a fairly remote conference center near the airport. The organizers had the good sense of not recommending the hotels next to the conference center but rather a number of hotels midway between there and the city center, so one can get to the conference with a short subway ride and also not be in the middle of nowhere. The recommended hotels, however, were still quite out of the way, and, by the information I found online, they sounded big and anonymous. So I got a reservation in a nice smaller hotel that is not far from Plaza Real and other touristic attractions, and it is also in what seems one of the cool neighborhoods, with a lot of bars and restaurants, including many gay bars. It takes, unfortunately, a good 45 minutes by subway to reach the conference center.

So this afternoon, as I was checking in at the hotel, being very proud of my choice, wearing an A&F shirt, faded jeans and Kenneth Cole shoes, and feeling as non-nerdy as I can get, the person behind me in line waves hello and asks: "Are you here for ICM too?"

Madrid is full of mathematicians

I have just arrived in Madrid for the International Congress of Mathematicians. Held every four years for more than a century, it features a number of plenary talks (Avi Wigderson will talk about P versus NP and computational complexity) and a number of talks in parallel sections devoted to specific topics (one of the parallel sections is on computer science).

Being used to the small scale, and the biannual frequency, of STOC and FOCS, it's hard to adjust to the size and the self-importance here. Tomorrow, for example, we start with an opening cerimony. Four years ago, in Beijing, the opening cerimony was held in the Great Hall of the People, and President Jiang Zemin attended. That's like having the conference in Washington D.C. and holding the opening cerimony in the Capitol, with George Bush coming to give a speech and congratulate the new Field Medalists. One can almost imagine the speech, ``I want to congratulate these good people, because God made them do good work, and we appreciate their good work. September 11 changed everything, so we must stay the course, just like they stayed their course when doing their good work.''

If I understand the program correctly, the Field Medals and the Nevanlinna Price will be announced during the opening cerimony. As part of taking itself very seriously, it is a tradition of the conference that the identities of the recipients are kept secret until then. When I registered, for example, I received a big bag with two huge and heavy books, which are the proceedings. (There are a lot of talks, and for each talk there is a paper of up to 25 pages.) It turns out that those two books, however (each heavier than a FOCS/STOC proceedings), are just Volume II and Volume III of the proceedings. Volume I is distributed later, because it contains articles about the prize winners (plus all the articles of the plenary speakers; Volume II and III contain the articles of the parallel sections).

Since the conference is held only once every four years, and Mathematics is a really vast subject, with thousands of researchers, there is a big fuss made over who gives the invited talks.

At the registration desk, invited speakers have a separate line. At the opening cerimony, the best seats are for press, IMU functionaries and ''authorities'' (that's what it says in the program), then there are very good seats for the speakers who give plenary talks, then seats in the back for the speakers who talk in the parallel sessions, and then, well behind, are all the others. And in uncharacteristic (for the US government) official show of support for the sciences, the American Embassy in Spain is holding a party for the American invited speakers.

Given what is at stake (I mean, who wouldn't want to stand in shorter lines, to see the opening cerimony up close, and to meet a US Ambassador?), the program committee that makes the invitations is kept secret, to protect the committee members from too much external pressure. I wonder how this ``double blind'' system would work at STOC and FOCS: the program committee invites speakers without knowing what paper they will present, and nobody knows who is on the program committee.

The big story of this conference will probably be the official announcement that Perelman's proof of the Poincare conjecture has been verified. The story has a fascinating human element, from the excentricity of Perelman to the rush of various teams to complete the verification, and it has already been the subject, last week, of a beautiful New York Times article by Dennis Overbye. In this week's New Yorker there is another article by Sylvia Nasar, of Beautiful Mind fame. More tomorrow on Perelman, on whether, as expected, he will be awarded the Fields Medal, and whether, as expected, he will fail to show up.

Flying on day 3

It's day three of the Great Liquid Scare, and I have a flight to catch. I arrive at the airport one hour and 40 minutes early, that is, one hour and ten minutes before boarding, fearing that I'll miss my flight because of the long lines.

Most people are checking their bags, slowing down the check-in line, which is, indeed, very, very long. After about five minutes, however, a person comes to direct me to an unused self-serve kiosk for people with no bags to check. This is a nice pay-off for my earlier decision to throw away my toothpaste, deodorant and sunscreen.

With an hour and five minutes to go until boarding, I reach the security check, hoping that I have enough time. (Waits of up to four hours were reported on Thursday.)

I find three people ahead of me, and by the time I put my wallet, cell phone and ipod in my bag, take out the laptop, and take off my shoes, I am holding up the line. There is a woman with two small children behind me, and she is too kind to complain. After we collect our bags, she and the small children are detained for further inspection, and I am free to go, with an hour to wait until boarding.

I browse the "bookstore." I notice two books, one that explains women to men, and one that explains men to women. The subtitles are different in a subtly sexist way. "All you need to know about the inner lives of men" versus "A straightforward guide to the inner lives of women." Women are needy, I understand, while men need directions. There is also a book on how to change your life in 24 hours. I have 55 minutes to kill and perhaps browsing through the book will change that. One chapter is devoted to changing your appearence. It relates the story of a person who, at an airport, walks up to a complete stranger and asks "what do you think is wrong with my appearence?" The stranger criticizes the other guy's eyewear and sideburns, and the fellow makes an appointment with a barber and an optician after returning home. The lesson, I suppose, is that a stranger can judge us with more impartiality than we can judge ourselves. Who buys these books? I fantasize about trying this approach, but the fear of being arrested for disorderly conduct (the loudspeakers invite to "report suspicious behavior") is too strong. Besides, I don't see anybody of trustworthy fashion sense. And I surely can judge them impartially.

Moving along, there is a book about the Rapture, and a book with Anderson Cooper's face on the cover. As I am warming up to the prospect of reading the in-flight magazine, I find Barbara Ehrenreich's latest book Bait and Switch. You may remember her from her book Nickel and Dimed, where she goes undercover to report on the lives of minimum-wage workers, or from the time she took over Maureen Dowd's column in The New York Times, and we had the rare experience of reading brilliant woman-authored op-eds in the Times. In Bait and Switch she goes undercover as an unemployed white-collar middle-aged woman looking for a corporate job.

The book makes for a fascinating reading on the downward mobility of the American middle class. Unfortunately (spoilers ahead!), after an almost year-long search, she is unable to land a job, and so the book is devoid of what might have been the most interesting part.

Thursday, August 10, 2006

Fighting the last battle

What security measures to take against the threat of terrorism? The problem is that there is an almost endless number of ways in which a terrorist act can be carried out, and it is very hard, and probably impossible, to find ways of preventing every possible such plot. Perhaps inevitably, security measures are always reactive. Terrorists hijack four planes, kill three thousand people, and make George Bush win an election, all just with box cutters. Hence all box cutters and pointy objects are banned from flights. A would-be suicide attacker tries to detonate explosive hidden in his shoe. Hence we must X-ray all passengers' shoes. Female terrorists hide explosive in their bras. Hence the stories of passengers being indicently patted during security checks. Today we learn of the explosives to be made out of liquid chemicals. Now no liquids can be taken aboard US flights.

It is only a matter of time until a terrorist tries to smuggle explosive on an airplane by hiding it up his ass. And that's when I am going to stop flying.

Sunday, August 06, 2006

Where do you come from?

The top searches were less predictable: after various combinations of "Luca", "Trevisan", "blog" and "In Theory," the top search phrases were "Welcome Yonatan," "Fourier analysis of boolean functions," "Aridatece la Gioconda" and "Poincare conjecture." I wish good luck to the person who searched "How to get faculty housing at Columbia," and the one who searched "how to be dreamy." I hope never to meet the person who searched "'try it yourself' zidane." But it's "pizza hut food stand theories" and "theory of club floors" that I really can't understand.

USA (more than half of the total), Canada, Italy, Israel, and Germany were the nations with the most readers.

Cheers to the Hong Kong readers, who have come here a total of 68 times in July putting Hong Kong ahead of Japan and Australia (but behind Taiwan!) in the statistics that really matters. See you soon: I'll visit the Chinese University of Hong Kong in January.

Thursday, August 03, 2006

Natural Proofs

If we pick at random a function f:{0,1}n->{0,1}, then, very likely, f has circuit complexity at least about 2n/n. Can we derandomize this application of the probabilistic method, and construct such a function in time polynomial in the length of its representation, 2n? This is, of course, the question of whether we can prove exponential circuit lower bounds in E (the complexity class of problems solvable in time 2O(n)).

Consider now the following problems. (1) Given a multivariate polynomial p of degree d (specified as an algebraic circuit) defined over a field F of size > 2d, and assuming that p is not always zero, find an input on which p is not zero. We know that p is non-zero on at least half of the inputs, but is there an efficient deterministic algorithm for this problem? (2) With high probability, a random graph on n vertices has no clique and no independent set of size more than 2log n. Can we deterministically construct such a graph? (3) With high probability, a random d-regular graph on n vertices has very good expansion. Can we deterministically construct such graphs? (4) If we pick a random n x (rho * n) matrix over a finite field, it defines with high probability a linear error-correcting code with rate rho and with good minimum distance. Can we deterministically construct such codes?

There is a superficial similarity to all these problems. We are interested in combinatorial objects with a certain property, we know that a random object has the property with high probability, and we want to deterministically construct an object with the property. (In time polynomial in its size.)

One important distinction is whether or not the property that we are interested (or some good "approximation") is efficiently computable. For example, in the case of the polynomial identity testing problem, we know when an input makes a polynomial non-zero, and, regarding expanders, computing the eigenvalue gap of a given regular graph gives an approximation of its expansion.

If very strong pseudorandom generators (or even "hitting set" generators) exist, then one can solve these explicit construction problems simply by looking at all the possible outputs of the generators. One of them must satisfy the property. This is the argument that proves that all probabilistic algorithms (including, for example, polynomial identity testing) can be derandomized if very strong generators exist.

It would require major algorithmic breakthroughs, however, to certify strong upper bounds to the independence number of a random graph (I think that currently the best upper bound is sqrt(n), via spectral methods), or to certify that a random matrix defines a good linear code, or to certify that a random graph has expansion beyond the eigenvalue bound. It is unclear how to use a generator to solve these explicit construction problems. (Under suitable assumptions, we would still be guaranteed that one possible output of the generator has the property, but we would not know which one.)

What about checking whether a given function f:{0,1}n->{0,1} (think of it as a binary string of length N=2n) has high circuit complexity?

Razborov and Rudich prove the remarkable result that if suitably strong one-way functions (and, hence, pseudorandom generators) exist, then there is a distribution of functions f:{0,1}n->{0,1} (again, think of them as binary strings of length N=2n) concentrated on functions of circuit complexity at most nlog n, that is indistinguishable from the uniform distribution over all boolean functions f:{0,1}n->{0,1} for algorithms of running time poly(N) = 2O(n). This means that (assuming the existence of the appropriate one-way functions), the problem "given the truth table of a boolean function, compute its circuit complexity" is intractable and, in fact, even hard to approximate. Even more remarkably, the problem is hard-on-average, even just to approximate. It is perhaps the only natural problem that is known to be hard-on-average under standard assumptions.

To come back to our discussion, if strong pseudorandom generators exists, we cannot "derandomize" the problem of constructing a function of high circuit complexity by simply using a strong pseudorandom generator and then picking oneof the outputs. There is, however, a way to show that the existence of strong pseudorandom generators does indeed imply the ability to construct functions of high circuit complexity. Here the circularity may make you feel a bit dizzy, so let's move on to the reason why Razborov and Rudich proved their remarkable result.

Their observation was that in all the circuit lower bound arguments known at the time (for example, in the proof that parity has no AC0 circuit) one can recognize a certain pattern: one defines a property of functions which implies the circuit lower bound (in the case of the AC0 lower bound, the property of remaining non-constant after a large number of variables have been randomly fixed) and then proves that a specific function satisfies the property. Razborov and Rudich call such an argument a "natural proof" if (i) the property is computable in time 2O(n) and (ii) the property is true with noticeable probability for a random function. Their result rules out super-polynomial circuit lower bounds proved via a natural proof (assuming strong enough one-way functions). This is because the property used in the natural proof would be an efficiently computable distinguisher between the Razborov-Rudich distribution (which is made of functions of low circuit complexity, and so satisfies the property with probability zero) and the uniform distribution over functions (which is assumed to satify the property with noticeable probability).

More than ten years later, the only candidate technique for non-natural lower bounds is a combination of diagonalization (which tends to lead to non-natural arguments) and of non-relativizing characterizations of complexity classes (because diagonalization alone cannot prove non-relativizing results, and EXP has polynomial size circuits relative to an oracle). In particular, these are the ingredients used by Buhrman et al. in proving that there are problems in MAEXP and PEXP of superpolynomial circuit complexity, and by Vinodchandran to prove that, for every fixed k, PP has problems of circuit complexity more than nk.

It should be noted that we do know explicit constructions of expanders, Ramsey graphs and error-correcting codes whose properties are stronger than the properties that we know how to certify for random graphs and random matrices. Even though the analogy may be faulty, I think we should remain optimistic that we will eventually find a way to explicitly construct functions of high circuit complexity in EXP.