A New Algorithm for Minimum Cost Linking

M. Sreenivas
Alluri Institute of Management Sciences, Hanamkonda, AP, INDIA

Dr. T. Srinivas
Department of Mathematics, Kakatiya University, Warangal, AP, INDIA

Abstract

This paper considers a special case of the link distance where both point sets lie on the real line and the cost of matching two points is the distance between them in the L1 metric. An O(n^2) algorithm for this problem is presented, improving the previous best known complexity of O(n^3).

1 Introduction

Given two finite point sets S and T, let n = |S| + |T|. A linking between S and T is a matching, L, between the sets in which every element of S and T is matched to at least one element of the other set. The link distance between the two sets is defined as the cost of a minimum-cost linking according to some distance function. This distance measure was originally proposed in 1997 by Eiter and Mannila [5] in the context of measuring the relationship between theories expressed in a logical language. Eiter and Mannila show that this problem can be solved in O(n^3) time via a reduction to the computation of a minimum-weight perfect matching in a suitable bipartite graph.

The link distance can also be expressed as the minimum-weight bipartite edge cover problem. For a weighted bipartite graph G = (S ∪ T, w, E), the edge cover problem asks for a subset, E', of edges such that each vertex is the endpoint of at least one edge in E'. The minimum-weight edge cover problem finds the set E' of minimum weight. This problem may be solved in O(n^3) time using the Hungarian method [8].

Let D = (V, E) be a directed graph, and let V be partitioned into two disjoint sets, the source vertices S and the target vertices T. A bibranching in D with respect to S is a set of edges B ⊆ E such that for each v in S, B contains a directed path from v to a vertex in T, and for each v in T, B contains a directed path from a vertex in S to v. For the special case when D is a bipartite graph with color classes S and T, and all the edges in D are directed from S to T, the bibranching is a bipartite edge cover.
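To make the definition of a linking concrete, the following Python sketch finds a minimum-cost linking by exhaustive search over all subsets of S × T. It is an illustration for very small inputs only (the search is exponential in |S|·|T|) and is not the algorithm of this paper. The function name min_cost_linking and the choice of |s - t| as the cost of a link are assumptions made for the example, and the elements within each set are assumed to be distinct.

```python
def min_cost_linking(S, T):
    """Exhaustive search for a minimum-cost linking between S and T.

    Assumptions for this sketch: points are numbers on the real line,
    elements within each set are distinct, and a link (s, t) costs |s - t|.
    """
    pairs = [(s, t) for s in S for t in T]
    best_cost, best_links = None, None
    # Try every non-empty subset of candidate links, encoded as a bitmask.
    for mask in range(1, 1 << len(pairs)):
        chosen = [pairs[k] for k in range(len(pairs)) if (mask >> k) & 1]
        covers_s = {s for s, _ in chosen} == set(S)
        covers_t = {t for _, t in chosen} == set(T)
        if not (covers_s and covers_t):
            continue  # some point is unlinked, so this is not a linking
        cost = sum(abs(s - t) for s, t in chosen)
        if best_cost is None or cost < best_cost:
            best_cost, best_links = cost, chosen
    return best_cost, best_links


if __name__ == "__main__":
    cost, links = min_cost_linking([2, 6, 7, 8], [1, 3, 7])
    print(cost, links)
```

For S = {2, 6, 7, 8} and T = {1, 3, 7} the search returns cost 4 with the links (2, 1), (2, 3), (6, 7), (7, 7), (8, 7).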

In this note we address the special case where both point sets lie on the real line and the distance is measured with the L1 metric. This version of the problem has applications to sequence comparison in bioinformatics as well as to measuring music similarity in music information retrieval. Throughout, d(s, t) denotes the distance between s and t.

Lemma 1.1. Let S and T be two sets of points, and L a minimum-cost linking between them. For s ≠ s', if (s, t) ∈ L and (s', t) ∈ L, then the distance from s to t is not more than the distance from s to any other element of T.

Proof. Assume that for some minimum-cost linking L the above property does not hold. This implies that there is some s with (s, t) ∈ L such that d(s, t) > d(s, t') for some t' ∈ T, and there is s' ≠ s with (s', t) ∈ L. Consider L*, which differs from L only in that s is linked to t' instead of t. L* is still a linking, because t remains linked to s'. The cost of L* is less than the cost of L, which contradicts the assumption that L is a minimum-cost linking.

Lemma 1.2. Let S and T be two sets of points and L be a minimum-cost linking between them. Then, for any relation (s, t) ∈ L, either s or t has degree 1.

Proof. Assume that there exists a minimum-cost linking L for which the Lemma does not hold. This implies that for some s, t with (s, t) ∈ L there exist s', t' such that the relations (s, t') and (s', t) are also in L. But then we can construct a new linking L* that differs from L only in the exclusion of (s, t). L* is also a linking, since all elements are linked to at least one element of the other set; yet the cost of L* is less than the cost of L, a contradiction.

Eiter and Mannila proceed by constructing a bipartite graph in which the matching is performed. The key concept is the creation of dummy nodes to handle the case when an element, x, of either set is linked to an element of the other set which has degree more than one and hence, by Lemma 1.1, is x's nearest neighbor in that set. From Lemma 1.2 we know that any such x must have degree one. Therefore we need only create one copy for each element of S ∪ T, with the weight of an edge between an element and its dummy node equal to the distance between that element and its nearest neighbor.

The desired bipartite graph consists of two complete bipartite subgraphs: one representing the elements of S and T, with weights equal to the distances between the elements; the other a complete zero-weight dummy subgraph. An element in the dummy subgraph is connected to its corresponding element in the first subgraph with a weight equal to the distance to that element's nearest neighbor.

For sets S and T create a graph G = (A ∪ B, E, w) in the following manner. For each s ∈ S create an a ∈ A, and for each t ∈ T create a b ∈ B. For each edge e = (a_i, b_j), w(e) = d(s_i, t_j). Next, create a zero-weight copy of the graph, G' = (A' ∪ B', E', w').

From these two graphs we construct a graph G'' = G ∪ G' that contains additional edges from a_i to a_i' and from b_j to b_j', with weight equal to the minimum distance from s_i to an element of T and from t_j to an element of S, respectively.

Illustration 1: A graph corresponding to S = {2, 6, 7, 8}, T = {1, 3, 7}. The minimum-weight perfect matching is given in bold.

Lemma 1.3. Let M be a minimum-weight perfect matching in G'' and L a minimum-cost linking between S and T. Then w(M) = c(L).

Proof. From a minimum-cost linking L we construct a matching M*. For every pairing (s_i, t_j) ∈ L we perform one of three actions. If neither s_i nor t_j has yet been matched, we add the edges (a_i, b_j) and (a_i', b_j') to M*. Otherwise, if s_i is already matched, we add (b_j, b_j') to the matching. Finally, if t_j is already matched, we add (a_i, a_i') to the matching. Note that, by Lemma 1.2, s_i and t_j cannot both already be matched. Since each element s_i or t_j is linked in L, the corresponding nodes in G'' must be matched. Therefore M* is a perfect matching. Furthermore, the weight of the matching M* is equal to the cost of L, because for each individual link (s_i, t_j) there is a corresponding edge in M* of equal weight. This follows from the definition of the weight function and Lemma 1.1. Thus w(M*) = c(L), establishing that w(M) ≤ c(L).

From a minimum-weight matching M we construct a linking L*. For every edge m ∈ M, if m = (a_i, b_j) we add (s_i, t_j) to L*. Otherwise, if m = (a_i, a_i'), we add a link between s_i and a t ∈ T that minimizes d(s_i, t). Likewise, for m = (b_j, b_j') we add a link between t_j and an s ∈ S that minimizes d(s, t_j). This must be a linking, since each element in S and T is represented by some a_i or b_j, which must occur in some edge of the matching. The cost of L* is no more than the weight of M, because the cost of each link created is equal to the weight of the corresponding matching edge by virtue of the manner in which the weight function is defined. We conclude that c(L) ≤ c(L*) ≤ w(M). Thus w(M) = c(L).
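The reduction above can be checked on small inputs with the following Python sketch. It builds the weights of G'' as a square cost matrix, using the bipartition (A ∪ B') versus (B ∪ A'), and finds a minimum-weight perfect matching by trying every permutation. The exhaustive search stands in for the Hungarian method and is an assumption of this sketch, chosen only to keep the example free of external libraries; the name matching_weight and the cost |s - t| are likewise illustrative.

```python
from itertools import permutations
import math


def matching_weight(S, T):
    """Weight of a minimum-weight perfect matching in G'' (brute force).

    Rows:    a_1..a_p (copies of S), then b'_1..b'_q (dummy copies of T).
    Columns: b_1..b_q (copies of T), then a'_1..a'_p (dummy copies of S).
    """
    S, T = list(S), list(T)
    p, q = len(S), len(T)
    n = p + q
    # Distance from each point to its nearest neighbour in the other set.
    nn_s = [min(abs(s - t) for t in T) for s in S]
    nn_t = [min(abs(s - t) for s in S) for t in T]
    cost = [[math.inf] * n for _ in range(n)]  # inf marks a missing edge
    for i in range(p):
        for j in range(q):
            cost[i][j] = abs(S[i] - T[j])   # edge (a_i, b_j) of G
        cost[i][q + i] = nn_s[i]            # edge (a_i, a'_i)
    for j in range(q):
        cost[p + j][j] = nn_t[j]            # edge (b_j, b'_j)
        for i in range(p):
            cost[p + j][q + i] = 0          # zero-weight edge (a'_i, b'_j) of G'
    # Each permutation assigns one column to every row: a perfect matching.
    return min(
        sum(cost[r][perm[r]] for r in range(n))
        for perm in permutations(range(n))
    )


if __name__ == "__main__":
    print(matching_weight([2, 6, 7, 8], [1, 3, 7]))
```

For the sets of Illustration 1 the matching has weight 4, which agrees with the cost of the linking (2, 1), (2, 3), (6, 7), (7, 7), (8, 7).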

Corollary 2.2. Let S and T be sets of points on the real line. Then there exists a minimum-cost linking L*: S → T such that for all s_i < s_j, L*(s_i) ≤ L*(s_j).

This observation implies that in a minimum-cost linking, if we know that (s_i, t_j) ∈ L, then one of (s_i, t_{j+1}), (s_{i+1}, t_j), or (s_{i+1}, t_{j+1}) must also be in L. Using this information it is not hard to reduce the problem to finding the shortest path through a weighted directed acyclic graph. The construction of this graph differs from the previous directed acyclic graph in that it is expanded to allow multiple linkings not only from elements of S to elements of T, but also from elements of T to elements of S.

Let S and T be sets of integers on the interval (0, X). Let s_i denote the i-th element of S and t_j the j-th element of T. We construct a directed acyclic graph G in the following way. For all pairs of elements s_i, t_j where s_i ∈ S and t_j ∈ T, construct a vertex v_{i,j}. From each vertex v_{i,j} add an edge to v_{i,j+1}, v_{i+1,j}, and v_{i+1,j+1} with weights |s_i - t_{j+1}|, |s_{i+1} - t_j|, and |s_{i+1} - t_{j+1}|, respectively, provided that each vertex exists. Finally, create a vertex labeled start and insert an edge from it to v_{1,1} with weight |s_1 - t_1|.

Illustration 2: A directed acyclic graph corresponding to the sets S = {2, 6, 7, 8}, T = {1, 3, 7}. The bold edges indicate the minimum-weight path through the graph.

Lemma 2.3. Let P be a minimum-weight path from start to v_{|S|,|T|} in G and L a minimum-cost linking between S and T. Then w(P) = c(L).

Proof. Consider a minimum-cost linking L on S and T. Then we can create a path P' from start to v_{|S|,|T|} in G in the following way. First, the edge from start to v_{1,1} is inserted. Next, for each (s_i, t_j) in L other than the last, we know that one and only one of (s_i, t_{j+1}), (s_{i+1}, t_j), (s_{i+1}, t_{j+1}) is also in L. Therefore we add to P' the edge from v_{i,j} to whichever of the three corresponding vertices represents a link in L. Since such an edge must exist in the graph, the path is connected.

Note that there is a one-to-one correspondence between edges and links and, by construction, each edge has the same weight as its corresponding link. Therefore we conclude that w(P') = c(L) and hence w(P) ≤ c(L).

From a minimum-weight path P from start to v_{|S|,|T|} in G, we create a linking, L', on the corresponding sets S and T. For any vertex v_{i,j} through which P passes, add (s_i, t_j) to L'. This is a valid linking because every path from start to v_{|S|,|T|} touches all rows and columns, and thus each element of S ∪ T is included in some link in L'. Since the weight of each edge used is equal to the cost of the corresponding link, we conclude that w(P) = c(L'), which yields w(P) ≥ c(L). Thus the desired result is w(P) = c(L).

As for the complexity of this method, let |S| = n and |T| = m. In the construction |V| = O(nm), and since there are at most three edges leaving each vertex, |E| = O(|V|). Thus the construction takes O(|V|) = O(nm) time. The algorithm for finding the shortest path between two vertices in a directed acyclic graph takes O(|V| + |E|) time, which in this case is O(|V|) = O(nm) [3]. Thus the total time complexity of the method is O(nm), which is O(n^2) when m ≤ n.

References

[1] Alok Aggarwal, Amotz Bar-Noy, Samir Khuller, Dina Kravets, and Baruch Schieber. Efficient minimum cost matching and transportation using the quadrangle inequality. Journal of Algorithms, 19(1).
[2] J. Colannino and G. Toussaint. An algorithm for computing the restriction scaffold assignment problem in computational biology. Information Processing Letters, 95(4).
[3] Thomas C. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. MIT Press (second edition), Cambridge, Mass.
[4] J. Edmonds and R. M. Karp. Theoretical improvements in algorithmic efficiency for network flow problems. Journal of the Association for Computing Machinery, 19.
[5] Thomas Eiter and Heikki Mannila. Distance measures for point sets and their computation. Acta Informatica, 34(2).
[6] M. L. Fredman and R. E. Tarjan. Fibonacci heaps and their uses in improved network optimization algorithms. Journal of the Association for Computing Machinery, 34.
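The shortest-path computation behind Lemma 2.3 can be sketched in Python as a dynamic program over the vertices v_{i,j}, processed row by row, which is one valid topological order of the graph. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name link_distance_line is hypothetical, both inputs are sorted before use, and every weight is taken to be the absolute difference |s_i - t_j|.

```python
def link_distance_line(S, T):
    """Minimum linking cost for two non-empty point sets on the real line.

    dist[i][j] holds the weight of a lightest path from `start` to the
    vertex v_{i+1,j+1} (indices are 0-based here, 1-based in the text).
    """
    s, t = sorted(S), sorted(T)
    n, m = len(s), len(t)
    inf = float("inf")
    dist = [[inf] * m for _ in range(n)]
    dist[0][0] = abs(s[0] - t[0])  # edge from start to v_{1,1}
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best = inf
            if i > 0:
                best = min(best, dist[i - 1][j])      # arrived from v_{i-1,j}
            if j > 0:
                best = min(best, dist[i][j - 1])      # arrived from v_{i,j-1}
            if i > 0 and j > 0:
                best = min(best, dist[i - 1][j - 1])  # arrived from v_{i-1,j-1}
            # The weight of an edge into v_{i,j} is the cost of link (s_i, t_j).
            dist[i][j] = best + abs(s[i] - t[j])
    return dist[n - 1][m - 1]


if __name__ == "__main__":
    print(link_distance_line([2, 6, 7, 8], [1, 3, 7]))
```

For the sets of Illustration 2 the lightest path visits v_{1,1}, v_{1,2}, v_{2,3}, v_{3,3}, v_{4,3} and has weight 4. The table has |S|·|T| entries and each is filled in constant time, matching the O(nm) bound stated in the complexity analysis.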
