Author Archive

The method I described last time started giving some very strange numbers for the hourly PSI yesterday June 21. Negative numbers! It turns out the 3-hour average PSI is not the average of the PSI for the last 3 hours. Huh?

Update: I see that someone else had the same idea here. Copying that plot, here is a plot of the 3 hour averages (in red) and 1 hour inferred averages (in blue) for June 20 from 6am to 5pm. Though the 3 hour average high of 371 at 1pm set a new record, the peak 1 hour average wasn’t that much worse than yesterday, about 467 by my calculation.

3 hour PSI averages and inferred one hour averages for June 20

——————

Singapore is obsessed with numbers these days. Well, one number in particular: the pollutant standards index or PSI. PSI measures the concentration of particles of diameter 10 micrometres or less in the air, that is about one-seventh the width of a human hair. A PSI value of 50 is fairly normal in a city these days. Over 100 starts to become noticeable and unhealthy, and above 200 feels like sitting next to a campfire with the wind blowing in your face.

The National Environment Agency hourly publishes PSI values for Singapore averaged over a 3 hour period. Because of fires in Sumatra the PSI levels have been elevated all this week, and last night (June 19) the PSI reading at 10pm hit a Singaporean record high of 321 (update: since broken with a 371 today). What I found even more remarkable, however, was that from 8pm to 9pm the 3-hour average jumped from 190 to 290. Since these readings share two data points, this means the jump from the 6pm reading to the 9pm reading must have been 300 points! The averages throughout the afternoon were mid-100′s, so it stands to reason that the 9pm reading was in the mid-400′s.

For some more fun, we can try to work out all the hourly averages from the 3 hour averages. Here is a table of the 3 hour averages from yesterday June 19 (currently found on the wikipedia page about PSI).

3 hour PSI averages in Singapore on June 19, 2013

We have more unknowns than equations, so we will have to make some assumptions to calculate the hourly values. Notice that in the morning the PSI readings are fairly flat: 77, 78, 80 at 6am, 7am, and 8am. Here the 6am value is the average PSI in the time period 3-6am. The two assumptions I will make are that the hourly values from 3-4am and 4-5am were 77. Of course, this means that the 5-6am value is also 77 since we know that is the 3 hour average. With this assumption we can set up a set of linear equations and work out the hourly averages for the rest of the day.

For example, to figure out the hourly values from 6am to 11am we have the following matrix equation.

Solving the analogous equation for the whole day gives the following values.

Inferred 1 hour PSI averages on June 19, 2013

The hourly PSI average from 8-9pm, where we saw the spike in the 3-hour average, was a whopping 452. This is in line with the prediction from our simple reasoning above. The results are quite robust to assuming different values in the early morning—if the 3 hour averages are correct, it looks like in any case we must have gotten over 400 in the 8-9pm period!

A favorite open problem recently returned to my head. For every boolean function \(f:\{0,1\}^n \rightarrow \{0,1\}\) is the deterministic query complexity at most the square of the bounded-error quantum query complexity, that is \(D(f) \le Q(f)^2\)? We know a function where this square gap is realized, the OR function which Grover’s algorithm can solve in \(O(\sqrt{n})\) quantum queries but for which deterministically in the worst case you have to query the entire input to tell if there is a 1. We also know that for partial functions (functions not defined over all of \(\{0,1\}^n\)) quantum and randomized query complexity can be exponentially far apart, for example for Simon’s problem.

Currently we know that \(D(f) \le Q(f)^6\) for total functions, a relation which has seen no improvement since the classic 1998 paper of Beals et al. that first showed a polynomial relationship. It is time for some improvement, no?

Every time this problem comes back to me, I end up spending a couple of days on it. The proof of this result, and similar statements, all use a well-defined proof paradigm we might call reduce and repeat. It always seems that if one can simply apply reduce and repeat with some other complexity measures, in a smarter way, improvement will follow.

You might use reduce and repeat when you lose your keys somewhere in your house: Pick a room you have not searched yet, and look there. If you find your keys, well, then you are done. If not, you have reduced the number of rooms remaining to search by one. So if searching any room takes you at most 10 minutes you will know that in time at most #rooms*10 minutes you will have found your keys or found that they are lost. Well, progress is progress.

The algorithm for deterministic query complexity is similar, except that instead of searching a room we make a few queries, and instead of looking for keys, we are looking for a certificate that the function value is 1 (defined formally below). If the round of queries reveals a one certificate, then we know the function value is 1. If they don’t, then the reduce step argues that the new function restricted to the query values we have seen is closer to the constant function.

As there can be exponential gaps for partial functions, we must somewhere use the fact that the function is total. Many potential proof ideas are spoiled by asking: “Where am I making use of the fact that the function is total?” Thus we will pay careful attention in these proofs to how totality is used.

“Is the task impossible or does it only seem that way? The beauty is: Nobody knows.”

Someone talking about solving the Traveling Salesman Problem in polynomial time? No, though I wouldn’t blame you for thinking that. This is from a new book of wire walker extraordinaire Philippe Petit, “Cheating the Impossible: Ideas and recipes from a rebellious high-wire artist“. It is a short TED book, available as an Amazon single. Here is the related TED talk, though I found the book better than the talk (and does not take much longer to read).

Note that the normal rank of \(M_n\) is \(3\) for any \(n\). It was conjectured (and even claimed to be proven) that the nonnegative rank of \(M_n\) is \(\Omega(n)\). Pavel Hrubešrecently (article behind paywall) gave a surprising answer.

Pavel’s proof is quite simple and elegant—simple enough to go through here. It is a proof by induction. In the base case, note that the rank of \(M_1\) is \(1\). For the induction step he shows that \(\mathrm{rk}_+(M_{2n}) \le \mathrm{rk}_+(M_n) + 2\). This will give the theorem.

Note that this matrix has a lot of structure. It is constant on diagonals, and symmetric. Moreover, the top left \(3\)-by-\(3\) block and bottom right \(3\)-by-\(3\) blocks are the same and are equal to \(M_3\).

Now we are going to rearrange the rows and columns. This will not change the nonnegative rank. Instead of labeling the rows and columns by \(1,2,3,4,5,6\) we will label them as \(1,2,3,6,5,4\). This gives the new matrix \(R_6\) (think of \(R\) as rearranged) where
\[
R_6 =
\begin{bmatrix}
0&1&4&25&16&9\\
1&0&1&16&9&4\\
4&1&0&9&4&1\\
25&16&9&0&1&4\\
16&9&4&1&0&1\\
9&4&1&4&1&0
\end{bmatrix}
\]

If we write this in terms of \(3\)-by-\(3\) blocks this is
\[
R_6 =
\begin{bmatrix}
M_3 & B_3 \\
B_3 & M_3
\end{bmatrix}
\]
Moreover, if you look at it, \(B_3\) is entrywise larger than \(M_3\). Thus we can express \(B_3 = M_3 + C_3\) where \(C_3\) is non-negative and hopefully has small non-negative rank! It turns out that in general \(C_n=B_n-M_n\) is rank one (and so is non-negative rank one as well). Let’s check that in our small example.

The general case proceeds in the same fashion. But now let me explain things in a slightly different, more geometrical way. We want to express all the entries of \(M_{2n}\) in terms of entries of \(M_n\), plus possibly some correction. To do this, let us define a mapping \(f: \{1, \ldots, 2n\} \rightarrow \{1, \ldots, n\}\). Then we want to write \((i-j)^2 = (f(i) – f(j))^2 + c_{i,j}\) where the corrections \(c_{i,j}\) hopefully form a matrix of low non-negative rank (actually rank one).

Imagine the numbers \(1, \ldots, n\) laid out on the \(x\)-axis. The mapping \(f\) will be conditional reflection in the line \(x=n+1/2\). That is, if \(1 \le i \le n\) then \(f(i)=i\) and if \(i > n+1/2\) then \(f(i)= i – 2(i-(n+1/2))=2n+1-i\).

In the example where \(n=6\) we conditionally reflect in the line \(x=3.5\). Note that this sends \(4 \rightarrow 3, 5 \rightarrow 2, 6 \rightarrow 1\). This is what the rearrangement of the rows and columns above did—we put column \(i > n+1/2\) into position \(n+f(i)\).

Clearly, if \(1 \le i,j \le n\) then \((i-j)^2 = (f(i)-f(j))^2\) and also as reflection preserves distance if \(n+1/2 < i,j \le 2n\) then \((i-j)^2 = (f(i)-f(j))^2\). This is why the two diagonal blocks of \(R_6\) are equal to \(M_3\).

The interesting case is when, say, \(i \le n\) and \(j > n+1/2\). In this case, you can check that \(|f(i)-f(j)| \le |i – j|\), so the points get closer together. This means that \((i-j)^2 = (f(i) – f(j))^2 + c_{i,j}\) where \(c_{i,j}\) is non-negative. A good start.

What is \(c_{i,j}\)? Let’s do the computation. We are going to use the fact that \((a-b)^2 – (a+b)^2 = -4 ab\).

Remember that \(i < n+1/2\) so now we get that \(c_{i,j}\) is the product of two positive numbers one which only depends on \(i\) and the other which only depends on \(j\) — thus the matrix \(C(i,j)=c_{i,j}\) is rank one!

That’s the end of the proof. I think there is more potential in this proof technique. Essentially the same proof idea is used by Fiorini, Rothvoß, and Tiwary to show that the non-negative rank of the slack matrix of a regular \(n\)-vertex polygon is \(O(log n)\). This has apparently been known for some time—the authors say it is implicit in the work of Ben-Tal and Nemirovski from 1999, though I cannot see this. The Fiorini et al. paper is highly readable and gives an explicit non-negative factorization of the slack matrix very similar in spirit to the above proof.

What is the slack matrix of an \(n\)-vertex polygon? It is an \(n\)-by-\(n\) matrix with rows labeled by vertices and columns labeled by faces. The \((i,j)\) entry is the distance of the \(i\)th vertex to the \(j\)th face. The same idea of conditional reflection can be used to map the vertices on one half of the polygon to those on the other half, and the same for faces. This sets up the same kind of inductive proof, where the correction term again turns out to be rank one.

Now the question of the non-negative rank of the “standard” linear Euclidean distance matrix has been resolved. Are there linear Euclidean distance matrices that have non-negative rank \(\Omega(n)\)? In general, a linear Euclidean distance matrix is of the form \(M(i,j) = (a_i-a_j)^2\) for some numbers \(a_1, \ldots, a_n\). Pavel’s proof shows that if \(a_1, \ldots, a_n \ge 0\) are contained in an arithmetic progression of length \(m\) then the non-negative rank of \(M\) is at most \(\log(m)\). So if a linear Euclidean distance matrix is going to have large non-negative rank it is going to have to contain large numbers. I spoke with Pavel about this and he suggested the matrix where formed by \(a_i=2^{2^i}\) as an interesting candidate to have large non-negative rank.

In the last post, I mentioned an open problem. Last week, I solved it.

I had the idea of something to try for a few days and finally got around to it on Monday. I didn’t think it would work, but thought it would be interesting to see why it failed. By late afternoon, nothing had failed and I became very excited.

For me, this is one of the best moments in research. It is a walk along a cliff edge—the exhilirating height of discovery and danger that at any moment a hole can appear that sends you and your proof tumbling down. Sometimes I just sit on the cliff and enjoy the view—purposefully delaying writing up the proof because I know it can all come crashing down, and I want to prolong those moments of joy a little while longer.

In this case, the initial verifications passed. I worked out a small example that also checked out. I was becoming fairly convinced.

I thought about canceling my plans for the evening to write up the proof straight away. But again I thought why not wait and enjoy these moments of new proof a little while longer.

I got back home around 11pm and started in on the proof, typing everything up. As darkness lifted in the early morning hours, the proof was done and there were no bugs. For this problem, I could write a program to check the algorithm, which also worked. At this point all doubts had evaporated.

It was late. I was tired. I was just putting the final touches on the writeup, adding some references so I could send it to some colleagues. I was googling to get the bibliographical data. Then a paper appeared in the list of search results that I had not seen before. I clicked on it. It solved the same problem! I went to bed depressed.

In a previous post, we solved a small instance of the traveling salesman problem (TSP) around Singapore’s hawker centers using linear programming. Today, we will talk about limitations of the linear programming approach to TSP, namely a recent result showing a super-polynomial lower bound on the number of linear constraints needed to characterize the set of valid tours. The paper is “Linear vs. Semidefinite Extended Formulations: Exponential Separation and Strong Lower Bounds” by Fiorini, Massar, Pokutta, Tiwary, and de Wolf (hereafter referred to as FMPTW) and shared the best paper award at the STOC 2012 conference.

A key tool in this lower bound is another concept we havediscussed on this blog, the non-negative rank. Believe it or not, those posts were written to prepare for this one! Even so, this post will be more technical than usual. Before we get started, let me thank Ronald de Wolf for his comments on this post, and for explaining this work to me when he visited CQT at the end of last year.

To put the FMPTW work in context, let me first recap the approach we took to finding the shortest tour through Singapore’s hawker centers. We considered variables \(x_{i,j}\) meant to represent the presence or absence of an edge between city \(i\) and \(j\) in a tour. We then brainstormed linear constraints on the \(x_{i,j}\) satisfied by true tours—starting out with basic constraints like all edge weights should be non-negative and the total weight of edges incident to any city should be two. Finding the shortest “tour” with respect to these linear constraints can be done efficiently, as it is a linear program. We saw, however, that the solution was not a true tour but had problems like non-integral weight edges and small cycles. To combat this, we did several iterations of adding more linear constraints in an attempt to rub out spurious solutions. A natural question is: when will this process stop? How many linear constraints are needed to describe the set of valid tours? The work of FMPTW gives the first super-polynomial lower bound on the number of linear constraints needed to characterize the set of valid tours: in fact, they show a bound of \(2^{n^{1/4}}\) for tours on \(n\) cities. (Ronald tells me they have now improved this to \(2^{\sqrt{n}}\).)

I want to clear up one thing that might be confusing with this introduction to their result. In the hawker center problem, we did not develop a set of linear constraints to characterize all valid tours on 18 cities—we just needed to eliminate false solutions near the optimal tour for our particular problem. To do this, we chose the constraints adaptively, meaning that when the linear program returned a spurious solution, we scratched our heads and identified a fault of this solution (like having a small cycle) and added a linear constraint to eliminate this fault. For a different configuration of hawker centers, our
final linear program might still return a spurious solution. The FMPTW result, on the other hand, speaks about the number of linear inequalities needed to characterize the set of valid tours. Such a characterization is not dependent on the objective function (the configuration of the cities), and so would universally allow you to solve a TSP problem by optimizing with respect to these inequalities.

The paper actually deals with more general linear programming formulations called extended formulations that allow the introduction of extra variables. In addition to the variable \(x \in R^{n \choose 2}\) whose \((i,j)\) entry represents the “strength” of the edge between cities \(i\) and \(j\) in a tour
(and ideally would be either 0 or 1), we are now also allowed some extra variables \(y \in R^k\) that don’t enter in the objective function. An extended formulation for TSP expresses the convex hull of the set of valid tours by the set \(\{x : \exists y \ge 0 : Ax +By = b\}\). The size of an extended formulation is the number of constraints, that is the number of rows in \(A\) and \(B\).

While it may be surprising that allowing extra variables can make a difference, it can. For example, in solving the hawker tour problem a key role was played by subtour inequalities to eliminate small cycles—inequalities of the form \(\sum_{i<j \in S} x_{i,j} \le |S|-1\) for a set \(S \subset [n]\). We did not add all the subtour inequalities, though, because there are exponentially many! The extended formulation size of the linear programming relaxation of TSP with all subtour inequalities, however, is still polynomial, in fact \(O(n^3)\).

The connection between extended formulation size and non-negative rank goes back to a beautiful result by Yannakakis over 20 years ago. We first need a couple of definitions. For our purposes a (convex) polytope \(P\) is a bounded set defined by linear inequalities \(P = \{x : Ax \le b\}\). A slack matrix \(S\) for \(P\) is a matrix with rows labeled by constraints (i.e. rows of \(A\), which we denote by \(A_i\) for the \(i\)th row) and columns labeled by points in \(P\) whose convex hull is all of \(P\). The \((i,j)\) entry corresponding to a row labeled by \(A_i\) and column by a point \(v_j\) is \(S(i,j)=b_i – A_i v_j\). That is, the \((i,j)\) entry of \(S\) indicates the slack of \(v_j\) with respect to the \(i^{th}\) constraint. As \(v_j \in P\) it must be the case that \(A_iv_j \le b_i\), so all the entries of the slack matrix are non-negative. An entry is \(0\) exactly when \(v_j\) satisfies the corresponding constraint with equality.

Note that there can be many different slack matrices for a polytope \(P\) as we have some freedom in choosing \(A,b\) and the points in \(P\). The extension complexity of \(P\) is the size of a smallest extended formulation defining \(P\). Yannakakis showed the following.

Theorem 1 (Yannakakis)Let \(P\) be a polytope. For any slack matrix \(S\) of \(P\), the non-negative rank of \(S\) is equal to the extension complexity of \(P\).

Thus in particular the non-negative rank of all slack matrices of \(P\) is the same.

Showing lower bounds on non-negative rank is very difficult, and we have few strong lower bounds for concrete examples. For example, even the following is open:

Question 2: Consider an \(n\)-by-\(n\) matrix \(M\) where \(M(i,j) = (i-j)^2\) for \(i,j \in [n]\). The rank of \(M\) is \(3\). What is the non-negative rank of \(M\)?

It is conjectured that the non-negative rank is \(\Omega(n)\), yet the best lower bound is \(\Omega(\log n)\).

One of the most interesting lower bounds on non-negative rank is for the unique disjointness matrix
from communication complexity. Consider a \(2^n\)-by-\(2^n\) matrix \(M\) indexed by \(n\)-bit strings. If \(x \cap y = \emptyset\) then \(M(x,y)=1\). If \(|x \cap y| =1\) then \(M(x,y)=0\). No matter how you fill out the rest of \(M\) with non-negative numbers, the non-negative rank will be \(2^{\Omega(n)}\). This lower bound follows from the key lemma in Razborov’s \(\Omega(n)\) lower bound on the randomized communication complexity of the disjointness function.

As unique disjointness provides a non-trivial family of matrices for which there is strong lower bound on the non-negative rank, it would be great to embed it in the slack matrix for an interesting combinatorial problem. This is exactly what FMPTW do.

Instead of directly working with TSP, they instead work with the correlation polytope, the convex hull of \(\{ bb^t : b \in \{0,1\}^n\}\). While not as famous as TSP, this is another important and well studied polytope. FMPTW provide a reduction to show that lower bounds on the extension complexity of the correlation polytope provide (slightly weaker) lower bounds on extension complexity of the TSP polytope. We will just focus on showing the lower bound on the extension complexity of the correlation polytope.

Now what matrices can appear as slack matrices of the correlation polytope? Here is a more general, and I think simpler, construction than that given by FMPTW to show a useful class of matrices that are slack matrices for the correlation polytope.

Lemma 3: Let \(p(z) = a + bz+cz^2\) be a single variate degree 2 polynomial that is non-negative on non-negative integers. Consider the matrix \(M(x,y) = p(| x \cap y)|)\) for \(x, y \in \{0,1\}^n\). Then \(M\) is a submatrix of a slack matrix for the correlation polytope.

The matrix used by FMPTW is \(F(x,y) = p(|x \cap y|)\) where \(p(z) = (z-1)^2\). By Lemma 3 this matrix is a slack matrix for the correlation polytope and moreover \(F\) is an instance of unique disjointness. Thus we get an exponential lower bound on the non-negative rank of \(F\), and hence the extension complexity of the correlation polytope.

The way that FMPTW came up with this matrix was through—of all things—quantum communication complexity! That story will have to wait for a future post.

My father is coming to visit Singapore next week. Of course one thing I have to do is take him to sample some of the great food at the hawker centres here. Say we are very ambitious and want to visit 18 of the best hawker centres on the island…in order to intelligently engage in the raging debate on which stall has the best char kway teow. As my Dad’s stay is limited, we want the shortest tour visiting all these hawker centres.

Here is a google map of the 18 chosen hawker centres. What do you think the shortest tour is?

I have to finish reviewing two papers today. Posting about the irrationality of reviewing seems like a great way to procrastinate.

Why do we review papers? As pointed out in the boycott against Elsevier, with the big commercial publishers, reviewers are doing free work for very profitable companies. Could it actually be that we are more likely to do reviews because they are not paid?

In Predictably Irrational, Dan Ariely distinguishes actions we undertake as part of social norms versus market norms. When we help a friend move a sofa, we would probably be offended if he offered to pay us something at the end—this action takes place in the social realm, not the market realm. Ariely did experiments (with college students of course) where he found subjects worked harder when asked to do a simple task as a favor to the experimenter than when they were paid some nominal amount. Similarly, he relates the story of a daycare center that found that the tardiness of parents picking up their kids increased when the center started imposing fines for lateness. Before the fines, parents avoided being late as a courtesy to the teachers who they were keeping from going home. With fines this moved to a market interaction, where you could pay to be late.

Reviews largely work in the realm of social norms. Many review requests I receive are from colleagues I know well, making them hard to turn down. It is much easier to say no to form letters from people I do not know, and it would be similarly easier to say no if it became a (low) paid job rather than a favor.

While on the topic of reviewing irrationality, I remember an old post of Lance Fortnow that when evaluating results in a paper, sometimes less is more. He related the story of a student who had a paper rejected from STOC, then removed one of the two main theorems and won best student paper at FOCS.

Being on a pop economics kick, I just finished reading Thinking, Fast and Slow by Daniel Kahneman, who discusses this very “less is more” phenomenon. This book was very enjoyable, full of interesting examples of how real human behavior differs from that of the Homo Economicus usually assumed by economists.

Kahneman describes an experiment where subjects are asked to put a price on a set of dinnerware. Set A has 24 plates and bowls. Set B is set A with the addition of 8 cups and 8 saucers, a few of which are broken.

In single evaluation, where subjects are presented only one of set A or set B, set A was valued at $33 while set B was valued at $23. In joint evaluation, where both sets are presented simultaneously, subjects acknowledge that set B can only be better than set A, and valued set B at $32 and set A at $30.

I think the implication for paper submission is clear: if in doubt, submit multiple versions and force your reviewers to behave rationally with joint evaluation.