I have two partitions of $[1 \ldots n]$ and am looking for the edit distance between them.

By this I mean the minimal number of single-element moves, each taking one node into a different (possibly new) group, that are necessary to go from partition A to partition B.

For example, the distance from {0 1} {2 3} {4} to {0} {1} {2 3 4} would be two.
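To make the definition concrete, a brute-force sketch (feasible only for very small $n$) can search over single-element moves breadth-first. The helper names `canon` and `move_distance` are made up for illustration, and the sketch assumes an element may also move into a new, empty group, as the example above requires.

```python
from collections import deque

def canon(blocks):
    # Order-independent form: sorted tuple of sorted blocks, empty blocks dropped.
    return tuple(sorted(tuple(sorted(b)) for b in blocks if b))

def move_distance(start, goal):
    """Fewest single-element moves from `start` to `goal` (breadth-first search)."""
    start, goal = canon(start), canon(goal)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        state, d = queue.popleft()
        if state == goal:
            return d
        for i, block in enumerate(state):
            for x in block:
                # Targets: every other existing block, plus a new singleton (None).
                targets = [j for j in range(len(state)) if j != i] + [None]
                for j in targets:
                    nxt = [set(b) for b in state]
                    nxt[i].discard(x)
                    if j is None:
                        nxt.append({x})
                    else:
                        nxt[j].add(x)
                    c = canon(nxt)
                    if c not in seen:
                        seen.add(c)
                        queue.append((c, d + 1))
    return None  # unreachable when both arguments partition the same set

print(move_distance([{0, 1}, {2, 3}, {4}], [{0}, {1}, {2, 3, 4}]))  # 2
```

The state space is the set of all partitions of the ground set, so this is only a sanity check for tiny inputs, not a practical algorithm.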

After searching I came across this paper, but (a) I am not sure whether their distance takes the ordering of the groups into account (something I don't care about), (b) I am not sure how it works, and (c) there are no references.

What would you consider the distance to be between {0 1 2 3} and {0 1} {2 3}? Would it be 2? Secondly, I don't see why "graphs" come into the picture at all. It sounds like you have two partitions of [n] and want to compute a distance between them.
– Suresh Venkat, May 14 '11 at 5:18

Yes, it would be two. Indeed these are set partitions on the nodes of a graph (i.e. a graph partition). This is likely not important to the solution, but this is the problem I am trying to solve, which is why I mentioned it.
– zenna, May 14 '11 at 11:36

If the graph is irrelevant, please remove all references to "graphs" and "nodes" from your question; it does not help, it distracts.
– Jukka Suomela, May 14 '11 at 20:57

Can't the edit distance be defined in terms of the distance on the partition lattice?
– Tegiri Nenashi, May 17 '11 at 17:26

@Tegiri - It is indeed the geodesic distance on the lattice of partitions. Unfortunately, computing that lattice for any set of cardinality much greater than 10 is intractable.
– zenna, Aug 31 '11 at 10:16

3 Answers

This problem can be transformed into the assignment problem, also known as the maximum-weight bipartite matching problem.

Note first that the edit distance equals the number of elements that need to move from one set to another. This equals the total number of elements minus the number of elements that do not need to move. So finding the minimum number of elements that move is equivalent to finding the maximum number of elements that stay put.

This is exactly the assignment problem where the vertices are $A_1, \ldots, A_k$ and $B_1, \ldots, B_k$ (if one partition has fewer sets, pad it with empty sets) and the edges are the pairs $(A_i, B_j)$ with weight $|A_i \cap B_j|$. It can be solved in $O(|V|^2 \log |V| + |V||E|)$ time.
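As a rough illustration of this reduction, here is a minimal Python sketch. The function names `min_cost_assignment` and `partition_distance` are made up for this example and do not come from any library. It pads the shorter partition with empty sets, negates the overlaps so that a textbook $O(k^3)$ Hungarian algorithm can minimise, and assumes both inputs really are partitions of the same ground set. Note that this is a simpler routine than one achieving the running time quoted above.

```python
def min_cost_assignment(cost):
    """Hungarian algorithm on a square matrix; returns the minimum total cost."""
    k = len(cost)
    INF = float("inf")
    u = [0] * (k + 1)    # row potentials (1-indexed)
    v = [0] * (k + 1)    # column potentials (1-indexed)
    p = [0] * (k + 1)    # p[j] = row matched to column j, 0 = unmatched
    way = [0] * (k + 1)  # back-pointers along the augmenting path
    for i in range(1, k + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (k + 1)
        used = [False] * (k + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, k + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(k + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0 != 0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sum(cost[p[j] - 1][j - 1] for j in range(1, k + 1))

def partition_distance(A, B):
    """Number of elements minus the maximum total overlap of matched blocks."""
    A = [set(b) for b in A]
    B = [set(b) for b in B]
    n = sum(len(b) for b in A)
    k = max(len(A), len(B))
    A += [set() for _ in range(k - len(A))]  # pad with empty blocks
    B += [set() for _ in range(k - len(B))]
    cost = [[-len(a & b) for b in B] for a in A]  # negate: maximise overlap
    return n + min_cost_assignment(cost)

print(partition_distance([{0, 1}, {2, 3}, {4}], [{0}, {1}, {2, 3, 4}]))  # 2
```

For the question's example, the best matching pairs {2 3} with {2 3 4} (overlap 2) and {0 1} with {0} or {1} (overlap 1), so 3 of the 5 elements stay put and the distance is 2.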

The definition of edit distance in there is exactly what you need, I think. The 'reference' partition would be (an arbitrary) one of your two partitions; the other would simply be the other one. It also contains relevant citations.

Thanks Rob. However, unless I am missing something, this is an edit distance defined in terms of split-merge moves. These are well studied and, as the paper points out, the variation of information is an information-theoretic measure of this. I am interested, however, in single-element move transitions.
– zenna, Aug 31 '11 at 12:58

WLOG, let $P_1$ be the partition with more sets and $P_2$ the other. First, assign pairwise different names $n_1(S) \in \Sigma$ to the sets of $P_1$. Then, find a best naming $n_2(S)$ for the sets of $P_2$ by the following rules:

If now $n_2(S) = n_2(S')$ for some $S \neq S'$, let $S''$ be the set in $P_1$ with $n_1(S'') = n_2(S)$. Assign whichever of $S, S'$ shares fewer elements with $S''$ the name of the set in $P_1$ with which it shares the second most elements, i.e. have it compete for that set's name.

If the former rule cannot be applied, check for both sets whether they can compete for the name of other sets they share fewer elements with (they might still have more elements from some $S'' \in P_1$ than the sets that got assigned its name!). If so, assign that name to the one of $S, S'$ that shares more elements with the respective set whose name they can compete for; the other keeps the formerly conflicting name.

Iterate this procedure until all conflicts are resolved. Since $P_1$ does not have fewer sets than $P_2$, there are enough names.