Minimum Common String Partition Problem: Hardness and Approximations

Abstract

String comparison is a fundamental problem in computer science, with applications in areas such as computational biology, text processing or compression. In this paper we address the minimum common string partition problem, a string comparison problem with tight connection to the problem of sorting by reversals with duplicates, a key problem in genome rearrangement.

A partition of a string A is a sequence \({\mathcal P}=(P_{1},P_{2},...P_{m})\) of strings, called the blocks, whose concatenation is equal to A. Given a partition \({\mathcal P}\) of a string A and a partition \({\mathcal Q}\) of a string B, we say that the pair \(\langle\mathcal{P,Q}\rangle\) is a common partition of A and B if \({\mathcal Q}\) is a permutation of \({\mathcal P}\). The minimum common string partition problem (MCSP) is to find a common partition of two strings A and B with the minimum number of blocks. The restricted version of MCSP where each letter occurs at most k times in each input string, is denoted by k-MCSP.

In this paper, we show that 2-MCSP (and therefore MCSP) is NP-hard and, moreover, even APX-hard. We describe a 1.1037-approximation for 2-MCSP and a linear time 4-approximation algorithm for 3-MCSP. We are not aware of any better approximations.