Category Archives: Maximum Diversity

Martí, Gallego and Duarte (2010)Optsicom project, University of Valencia (Spain)

Problem Description

The maximum diversity problem (MDP) consists of selecting a subset of m elements from a set of nelements in such a way that the sum of the distances between the chosen elements is maximized. The definition of distance between elements is customized to specific applications. In most applications, it is assumed that each element can be represented by a set of attributes. Let sik be the state or value of the kth attribute of element i, where k = 1, …, K. Then the distance between elements i and j may be defined as:

In this case, dij is simply the Euclidean distance between i and j. The distance values are then used to formulate the MDP as a quadratic binary problem, where variable xi takes the value 1 if element i is selected and 0 otherwise, i = 1, …, n:

Kuo, Glover and Dhir (1993) use this formulation to show that the clique problem (which is known to be NP-complete) is reducible to the MDP.

State of the Art Methods

The most relevant approximated methods developed to solve this problem, categorized by methodology, are:

GRASP based methdos

GRASP_D2+I_LS: Constructive method coupled with a first-improvement local search. Duarte and Martí (2007).

MA: Memetic Algorithm that combines a population based approach with a local search approach. Katayama and Narihisa (2005).

Instances (MDPLIB)

We have compiled a comprehensive set of benchmark instances representative of the collections previously used for computational experiments in the MDP. We call this benchmark MDPLIB. Furthermore we have included new hard instances. A brief description of the origin and the characteristics of the set of instances follows:

SOM: This data set consists of 70 matrices with random numbers between 0 and 9 generated from an integer uniform distribution.

SOM-a: These 50 instances were generated by Martí et al. (2010) with a generator developed by Silva et al. (2004). The instance sizes are such that for n = 25, m = 2 and 7; for n = 50, m = 5 and 15; for n = 100, m = 10 and 30; for n = 125, m = 12 and 37; and for n = 150, m = 15 and 45.

GKD: This data set consists of 145 matrices for which the values were calculated as the Euclidean distances from randomly generated points with coordinates in the 0 to 10 range.

GKD-a: Glover et al. (1998) introduced these 75 instances in which the number of coordinates for each point is generated randomly in the 2 to 21 range. The instance sizes are such that for n = 10, m = 2, 3, 4, 6 and 8; for n = 15, m = 3, 4, 6, 9 and 12; and for n = 30, m = 6, 9, 12, 18 and 24.

GKD-b: Martí et al. (2010) generated these 50 matrices for which the number of coordinates for each point is generated randomly in the 2 to 21 range and the instance sizes are such that for n = 25, m = 2 and 7; for n = 50, m = 5 and 15; for n = 100, m = 10 and 30; for n = 125, m = 12 and 37; and for n = 150, m = 15 and 45.

GKD-c: Duarte and Martí (2007) generated these 20 matrices with 10 coordinates for each point and n = 500 and m = 50.

MDG: This data set consists of 100 matrices with real numbers randomly selected between 0 and 10 from a uniform distribution.

MDG-a: Duarte and Martí (2007) generated these 40 matrices, 20 of them with n = 500 and m = 50 and the other 20 with n = 2000 and m = 200. These instances were used in Palubeckis (2007).

MDG-b: This data set consists of 40 matrices generated by Duarte and Martí (2007). 20 of them have n = 500 and m = 50, and the other 20 have n = 2000 and m = 200. These instances were used in Gallego et al. (2009) and Palubeckis (2007).

MDG-c: This is a new data set with 20 matrices with n = 3000 and m ∈ {300, 400, 500, 600}. These are the largest instances reported of the MDPLIB. They are similar to those used in Palubeckis (2007).

The MDPLIB contains 315 instances and the best known values for the largest ones (SOM-b, GKD-c, MDG-a, MDG-b and MDG-c). They have been obtained by running G_SS and ITS for 2 hours of CPU time (in each instance) in a workstation Intel Core 2 Quad CPU Q 8300 with 6 GiB of RAM and Ubuntu 9.04 64 bits OS.

Computational Experience

We have performed a computational comparison of the state of the art methods on several of these the instances. We have considered a time limit in our comparative of 10 minutes per instance. In this experiment we compute for each instance and each method the relative deviation Dev (in percent) between the best solution value, Value, obtained with the method and the best known value of this instance, BestValue. For each method, we also report the number of instances #Best for which the method obtains the best value.

The experiment was performed in a Intel Core 2 Quad CPU Q 8300 with 6 GiB of RAM and Ubuntu 9.04 64 bits OS.