We consider the following general correlation-clustering
problem [1]: given a graph with real nonnegative edge
weights and a 〈+〉/〈−〉 edge labeling,
partition the vertices
into clusters to minimize the total weight of cut 〈+〉 edges and
uncut 〈−〉 edges. Thus, 〈+〉 edges with large weights
(representing strong correlations between endpoints) encourage those
endpoints to belong to a common cluster while 〈−〉 edges with
large weights encourage the endpoints to belong to different
clusters. In contrast to most clustering problems, correlation
clustering specifies neither the desired number of clusters nor a
distance threshold for clustering; both of these parameters are
effectively chosen to be the best possible by the problem
definition.

Correlation clustering was introduced by Bansal, Blum, and
Chawla [1], motivated by both document clustering and
agnostic learning. They proved NP-hardness and gave constant-factor
approximation algorithms for the special case in which the graph is
complete (full information) and every edge has the same weight.
We give an O(log n)-approximation algorithm
for the general case
based on a linear-programming rounding and the “region-growing”
technique. We also prove that this linear program has a gap of
Ω(log n), and therefore our approximation is tight under
this approach. We also give an
O(r3)-approximation algorithm
for Kr, r-minor-free graphs.
On the other hand, we show that the problem is equivalent
to minimum multicut, and therefore APX-hard and difficult to
approximate better than Θ(log n).