The Greedy Algorithm for the Minimum Common String Partition Problem

Abstract

In the Minimum Common String Partition problem (MCSP) we are given two strings on input, and we wish to partition them into the same collection of substrings, minimimizing the number of the substrings in the partition. Even a special case, denoted 2-MCSP, where each letter occurs at most twice in each input string, is NP-hard. We study a greedy algorithm for MCSP that at each step extracts a longest common substring from the given strings. We show that the approximation ratio of this algorithm is between Ω(n0.43) and O(n0.69). In case of 2-MCSP, we show that the approximation ratio is equal to 3. For 4-MCSP, we give a lower bound of Ω(log n).