In this paper, the authors solve the open problem of extracting the maximum number of loop iterations that can be executed in parallel on Chip Multi-Processors (CMPs). Their algorithm solves the problem optimally by migrating, in two phases, the weights of parallelism-inhibiting dependences along dependence cycles. They model dependence migration with retiming and formulate this classic loop parallelization problem as a graph optimization problem: find retiming values for the graph's nodes so that the minimum non-zero edge weight in the graph is maximized.
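To make the objective concrete, the following is a minimal sketch of retiming on a tiny dependence cycle. It is not the authors' two-phase algorithm: it uses an exhaustive search that is only feasible for toy graphs. The function names (`retime`, `min_nonzero_weight`, `best_retiming`), the search bound, and the sign convention w_r(u, v) = w(u, v) + r(v) - r(u) are assumptions made for illustration; some formulations use the opposite sign.

```python
from itertools import product

def retime(edges, r):
    """Apply retiming r to every edge (u, v, w).

    Assumed convention: w_r(u, v) = w(u, v) + r[v] - r[u].
    The total weight of any cycle is unchanged by this transformation.
    """
    return [(u, v, w + r[v] - r[u]) for u, v, w in edges]

def min_nonzero_weight(edges):
    """Return the smallest positive edge weight, or None if there is none."""
    positive = [w for _, _, w in edges if w > 0]
    return min(positive) if positive else None

def best_retiming(nodes, edges, bound):
    """Brute-force search over retiming values in [0, bound].

    Illustrative only: this enumerates (bound + 1) ** len(nodes) candidates
    and is a stand-in for, not a reproduction of, the paper's algorithm.
    """
    best_r, best_score = None, -1
    for values in product(range(bound + 1), repeat=len(nodes)):
        r = dict(zip(nodes, values))
        retimed = retime(edges, r)
        if any(w < 0 for _, _, w in retimed):
            continue  # a negative dependence distance is not legal
        score = min_nonzero_weight(retimed)
        if score is not None and score > best_score:
            best_r, best_score = r, score
    return best_r, best_score

# Hypothetical two-node dependence cycle with distance 1 on each edge.
nodes = ["A", "B"]
edges = [("A", "B", 1), ("B", "A", 1)]
r, score = best_retiming(nodes, edges, bound=2)
print(r, score, retime(edges, r))
```

In this toy cycle the total weight is 2 and cannot change under retiming, so the search moves the whole weight onto one edge and leaves the other at zero, which raises the minimum non-zero weight from 1 to 2.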