As an example, let us consider the following linear program (taken from
Vašek Chvátal's book):

subject to for each and:

To prove a lower bound on the value of this program, it suffices to
find a feasible solution: one particular solution is feasible and has value 9, so the optimum is at least 9.

Can we prove an upper bound on the optimum? Yes! Multiply the second
constraint by two. The resulting LHS is
at least the value of the objective (since the variables are nonnegative).
In other words, the second constraint, multiplied by two, has its
coefficients larger than in the objective function. So twice its right-hand side is an upper bound on the optimum.

We may try some other combinations. Adding constraints (2) and (3) gives a proof of another upper bound.

More generally, consider a nonnegative linear combination of the
constraints, with nonnegative multipliers. The condition that all coefficients are larger than in the
objective function can be rewritten as a system of linear inequalities in the multipliers.

As long as these inequalities are satisfied, the combined right-hand side is an upper bound on the optimum. So we want to minimize this bound subject to these
constraints. This is a linear program in the multipliers, called the dual linear program. Of course, taking the dual of the dual gives back
the initial linear program.
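The construction above can be summarized by a generic primal-dual pair; as a sketch in our own notation (the symbols $A$, $b$, $c$ are not from the notes):

```latex
% Generic primal-dual pair; A, b, c are illustrative notation.
\[
\begin{array}{llll}
\text{(P)} & \max\; c^{\top} x & \text{s.t.} & Ax \le b,\; x \ge 0,\\[2pt]
\text{(D)} & \min\; b^{\top} y & \text{s.t.} & A^{\top} y \ge c,\; y \ge 0.
\end{array}
\]
```

Each dual variable is the multiplier of one primal constraint, and the condition $A^{\top} y \ge c$ says that the combined coefficients dominate those of the objective.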

Theorem 1

If both values of the primal and dual LP are finite, then these
values are equal.

If the primal is unbounded, then the dual is infeasible.

Dually, if the primal is infeasible, then the dual is unbounded.

No case distinction is needed under the usual conventions assigning $+\infty$ and $-\infty$ to the values of unbounded and infeasible programs.

Proof: One direction (weak duality) is easy; we did it on the example above. The other direction is
harder, and we will skip it.
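In the generic notation above (ours, not the notes'), weak duality is a one-line computation: for any primal-feasible $x$ and dual-feasible $y$,

```latex
\[
c^{\top} x \;\le\; (A^{\top} y)^{\top} x \;=\; y^{\top} (A x) \;\le\; y^{\top} b,
\]
```

where the first inequality uses $x \ge 0$ and $A^{\top} y \ge c$, and the second uses $y \ge 0$ and $Ax \le b$.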

2. Semi-definite programming duality

In general, in approximation algorithms, we follow these steps:

solve a linear programming relaxation (say of a maximization problem);

round the solution to obtain an integer solution;

for the analysis, argue that the value of the output is at least
some factor times the value of the linear program. This factor is then the
approximation ratio.

Duality gives a new possibility for approximation algorithms: if we can
find one feasible solution of the dual such that the value of the
output is at least some factor times the value of this feasible solution, then we
know that the optimum is at most the output divided by that factor.
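For a maximization problem this can be sketched as follows, writing $\rho \le 1$ for the factor and $y$ for the dual-feasible solution we exhibit (notation ours):

```latex
\[
\mathrm{ALG} \;\ge\; \rho \cdot b^{\top} y \;\ge\; \rho \cdot \mathrm{OPT}_{\mathrm{LP}} \;\ge\; \rho \cdot \mathrm{OPT},
\]
```

since by weak duality any dual-feasible value $b^{\top} y$ is an upper bound on the LP optimum, which in turn bounds the integer optimum. Note that we never need to solve the dual to optimality: one feasible solution suffices.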

SDPs also have duals, so this proof technique can in principle be used.
I will illustrate this with an example on correlation clustering, but SDP
duality potentially goes far beyond.

Let us consider the following SDP:

subject to: linear constraints for every pair of indices,

and the SDP constraint: the matrix of variables is positive semidefinite.

(This presentation differs from the one used in previous lectures, but it
is more convenient to demonstrate duality.)

We rewrite the SDP constraint as: for every vector $v$, $v^{\top} X v \ge 0$ (writing $X$ for the matrix of variables). This
constraint is linear in the entries of $X$. In this sense the SDP program is
just an LP with an infinite number of constraints.
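As a small numerical illustration (the function names, matrices, and tolerances below are ours), the "infinitely many linear constraints" view and the eigenvalue view of positive semidefiniteness agree: sampling quadratic forms gives evidence, while eigenvalues give a certificate.

```python
import numpy as np

def quad_form_nonneg(X, trials=1000, seed=0):
    """Sample random vectors v and test v^T X v >= 0.

    Each v gives one of the infinitely many linear constraints;
    finitely many samples are only a necessary-condition check."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(trials):
        v = rng.standard_normal(n)
        if v @ X @ v < -1e-9:
            return False
    return True

def is_psd(X):
    """Exact check: all eigenvalues of the symmetric matrix X are >= 0."""
    return bool(np.all(np.linalg.eigvalsh(X) >= -1e-9))

X = np.array([[2.0, -1.0], [-1.0, 2.0]])  # PSD (eigenvalues 1 and 3)
Y = np.array([[1.0, 2.0], [2.0, 1.0]])    # not PSD (eigenvalue -1)
```

For the non-PSD matrix, a random sample quickly finds a violating vector, i.e. a violated linear constraint.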

We can write the dual as for a linear program. Eventually, we find a constraint of the form: for every vector $v$, $v^{\top} M v \ge 0$, where $M$ is a matrix that
depends on the dual variables and the parameters of the primal. We recognize that this is
equivalent to having $M$ positive semidefinite.

Interlude by Nati Linial: This works for convex programming also, via
polarity.

More precisely, let $K$ be a convex body containing the origin in its interior, and let $K^{\circ}$ be the polar
of~$K$. Observe that if $K$ is the cone of PSD matrices, then $K$ is its own dual cone.

Indeed, a symmetric matrix $A$ is PSD if and only if, for every PSD matrix $B$, we have $\operatorname{Tr}(AB) \ge 0$. (This
is not entirely trivial but easy.)

Thus a condition imposed for each vector is equivalent to a condition imposed for each PSD matrix. This is the same phenomenon as LP duality.

3. Correlation clustering

Today we consider a graph with similarity or dissimilarity constraints, and
we want to partition the vertex set so as to minimize the number of
disagreements. (This is more difficult than the clustering
problem we studied previously: the exact problems coincide, but their
approximation versions differ.)

More precisely, given the complete graph where every edge is labeled either '+' (similar) or '−' (dissimilar), find a partition of the vertex set that
minimizes the number of inconsistent edges: an edge is inconsistent if it
is labeled '+' but its two endpoints are in different pieces of the
partition, or it is labeled '−' but its two endpoints are in the same piece
of the partition.

The problem is of course NP-hard, so let us look for an approximation
algorithm. If the optimum is zero, there exists a partition satisfying all the
constraints. In this case, the partition is easy to find using a simple
greedy algorithm: iteratively take a vertex~$v$ and put together with~$v$
every vertex $u$ such that $uv$ is a '+' edge. That's one cluster. Set it aside and
repeat.

A simple randomized variant provides a good approximation for the problem
(even if the optimum is nonzero): pick the vertex~$v$ uniformly at random among the remaining vertices. Let us
call this variant "Randomized greedy".
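A minimal sketch of "Randomized greedy", assuming the input is given as the set of '+' pairs on vertices 0, ..., n-1, every absent pair being implicitly labeled '−' (the function names and input encoding are ours):

```python
import random

def randomized_greedy(n, plus_edges, seed=0):
    """Pick a random remaining vertex, cluster it with all its remaining
    '+' neighbors, set the cluster aside, and repeat."""
    rng = random.Random(seed)
    plus = {frozenset(e) for e in plus_edges}
    remaining = set(range(n))
    clusters = []
    while remaining:
        pivot = rng.choice(sorted(remaining))
        cluster = {pivot} | {u for u in remaining
                             if u != pivot and frozenset((pivot, u)) in plus}
        clusters.append(cluster)
        remaining -= cluster
    return clusters

def disagreements(n, plus_edges, clusters):
    """Count inconsistent edges: '+' edges cut, or '-' edges inside a cluster."""
    plus = {frozenset(e) for e in plus_edges}
    where = {v: i for i, c in enumerate(clusters) for v in c}
    return sum(1 for u in range(n) for v in range(u + 1, n)
               if (where[u] == where[v]) != (frozenset((u, v)) in plus))
```

On a consistent input (optimum zero) every pivot gathers exactly its true cluster, so the output has no disagreements, matching the greedy argument above.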

Theorem 2
Randomized greedy is a 3-approximation for this problem.

Here is an example showing why randomization is necessary. Let the vertex set be $\{v_0, v_1, \dots, v_n\}$. Put a '+' label on edge $uv$ iff $u$ or $v$ is equal to~$v_0$ (a star of '+' edges centered at $v_0$). If $v_0$ is picked first by the algorithm, then we are really
far from the optimum: we get a solution of cost $\binom{n}{2}$, while the optimum is $O(n)$. On the
other hand, if we use "Randomized greedy", then $v_0$ is picked first with
probability $1/(n+1)$, and in the remaining cases the solution has cost $O(n)$. On average, we obtain a solution with cost $O(n)$.

Proof:
The analysis uses the notion of "bad triangle". A triangle is
bad if exactly one of its edges is labeled '−' (so no partition can satisfy all three of its edges). Certainly, the optimum is at
least the maximum number of edge-disjoint bad triangles.

More generally, to every bad triangle~$T$, we associate a value $y_T \ge 0$. We consider nonnegative combinations of bad triangles
satisfying the following condition: for each edge $e$, the sum of the $y_T$,
over every bad triangle $T$ containing~$e$, is at most one. The same
argument shows that the optimum is at least $\sum_T y_T$.
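In LP form (our notation), this fractional packing of bad triangles reads:

```latex
\[
\max \sum_{T\ \text{bad}} y_T
\quad\text{s.t.}\quad
\sum_{T \ni e} y_T \le 1 \ \text{for every edge } e,
\qquad y_T \ge 0,
\]
```

and any feasible $y$ has value at most the optimum, since every clustering must pay at least one disagreement inside each bad triangle and the packing constraints prevent charging the same edge more than once.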

Analysis: let $p_T$ be the probability that, during a random execution of
"Randomized greedy", one of $T$'s vertices is chosen before $T$
disappears. Then I claim that the values $p_T/3$ form a feasible solution of
the dual.

Proof: let $e$ be an edge, contained in $k$ bad triangles. Consider the
first vertex of the union of these triangles picked by the algorithm. If
it is an endpoint of~$e$, then the algorithm destroys the $k$ bad triangles.
If it is one of the other vertices, it destroys a single triangle.
Thus the sum of the $p_T$, where $T$ ranges over all bad triangles
containing~$e$, is at most~3.

The rest of the proof is left as exercise.

Theorem 3
Consider now the complete graph with edge weights $w^{+}_{uv}$ and $w^{-}_{uv}$
such that $w^{+}_{uv} + w^{-}_{uv} = 1$ (instead of '+' and '−', we have weights).
Then "Randomized greedy" (where you put the neighbor $u$ of the picked
vertex~$v$ in the same cluster as~$v$ iff $w^{+}_{uv}$ is at least $1/2$) is
a 5-approximation.

Kleinberg proved that the problem does not admit a PTAS (unless P=NP).
Which variant of the problem admits a PTAS? Let us change the model and
move away from the worst case to the “planted model”.\footnote{The name
comes from the original study of such models in the max-clique problem;
cliques were “planted”, imposed on a random graph. See Uriel Feige’s
work.}

4. Another model

Assume that we are given a (possibly non-complete) graph with '+' and '−' labels
on the edges. Assume that there is a perfect clustering, but that the
input is generated by a random perturbation: flip independently the label
of every edge with some probability~$p$.
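A sketch of this generative process, restricted to a complete graph for simplicity (the function name and encoding are ours): label every pair '+' iff its endpoints share a cluster, then flip each label independently with probability p.

```python
import random

def planted_instance(clusters, p, seed=0):
    """Generate noisy '+'/'-' labels from a planted clustering.

    clusters: list of disjoint sets of vertices (the perfect clustering).
    p: independent flip probability for each edge label."""
    rng = random.Random(seed)
    where = {v: i for i, c in enumerate(clusters) for v in c}
    vertices = sorted(where)
    labels = {}
    for i, u in enumerate(vertices):
        for v in vertices[i + 1:]:
            label = '+' if where[u] == where[v] else '-'
            if rng.random() < p:
                label = '+' if label == '-' else '-'  # flip the label
            labels[(u, v)] = label
    return labels
```

With p = 0 the instance is perfectly consistent; as p grows, the input drifts away from the planted clustering.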

Theorem 4
There is an algorithm that achieves an -approximation in expectation.

This slightly improves the state of the art, but with very different
(SDP) techniques.

\note{From here on, refer to Claire’s slides also.}

View the solution as a set of vectors $x_i$, such that the $c$th coordinate of $x_i$ is $1$ iff
the $i$th vertex is in cluster~$c$ (and $0$ otherwise). In this sense we want to minimize the
Hamming distance between the input and the matrix of inner products $(\langle x_i, x_j \rangle)_{ij}$.

The algorithm is now very simple: solve the dual SDP relaxation using the
ellipsoid method. This defines a fractional value for every edge, which we
interpret as a similarity weight. This gives another (weighted) clustering problem. We solve
it using the 5-approximation algorithm described before.

We assume here that the flip probability $p$ is a constant.

Let $X^{*}$ denote the "true" clustering before applying the noise, $Y$ the
input, and $X$ the solution of the SDP.

Lemma 5.

Proof:
Assume not. Imagine that we add to the initial SDP, called SDP1, the
constraint that the solution be far from the true clustering. Let SDP2 be the resulting
SDP. We get a new solution that must have a larger value.

\note{To be filled…}

We obtain that a certain matrix, of the form structure plus noise,
must be positive semidefinite, where the noise comes from the perturbation of the input.
For this step, we find an oracle (a mathematician) and ask them about the
eigenvalues of random matrices. It turns out that the eigenvalues of the
noise matrix are small in expectation, so all eigenvalues of the sum are
nonnegative, so the matrix is positive semidefinite (with high probability). So with
high probability, the dual is feasible, so the two SDPs are different.

In applications, $w^{+}$ and $w^{-}$ do not sum up to one. This extension
of the model has also been studied. The problem becomes more difficult to
approximate (the best known is an $O(\log n)$-approximation or worse).

If both weights of an edge are zero, it is just as if the edge were not there.
And if the input is not really a complete graph, things become much more
complicated.

References:

3-approximation and 5-approximation: Ailon, Charikar, Newman, 2005.

Inapproximability: Jon Kleinberg, 1990s.

SDP duality has been used in problems related to the Unique Games
Conjecture. Maybe this will prove useful for other problems as well.
There are also recent algorithms for solving SDPs that take advantage of the
dual, by Arora and one of his PhD students.

Next course in March: detailed presentation of the subexponential algorithm
for UGC presented by Arora and Steurer.