This problem has probably already been asked here, but I could not find the right words to search for it.

I have a list of 1700 points (geographic coordinates) and need to separate them into 17 groups of 100 points each, with nearby points grouped together. I modeled this as a graph where each node is a point and each edge weight is the real distance between the two points it connects. The problem is then how to partition this graph into 17 groups of 100 elements each so as to minimize the sum of the inter-partition edge weights.

Does anyone have a better idea for how to model this problem?

I tried METIS; the best result came from its edge-cut objective, which minimizes the sum of the weights of the removed edges (for this case I used the inverse of the distance as the weight). I also tried k-means, but it does not guarantee the number of elements in each group.

I suspect that a small variation on $k$-means clustering would work. You might want to try asking on stats.stackexchange.com where there may be more people with experience with this type of problem.
– Douglas Zare, Aug 21 '12 at 18:40

1 Answer

I would suggest using a local search algorithm that maintains the size of the sets at 100. You seek a clustering that consists of a partition of the vertices into 17 clusters, each of size 100. The cost of each cluster is the sum, over all pairs of vertices in the cluster, of the distance between the two vertices. The cost of a clustering is the sum of the 17 cluster costs.
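
Written out as a formula, the objective might look as follows. The notation is my own and not from the question: $C_1, \dots, C_{17}$ are the clusters and $d(u,v)$ is the distance between points $u$ and $v$.

$$
\operatorname{cost}(C_1,\dots,C_{17}) \;=\; \sum_{i=1}^{17} \; \sum_{\{u,v\} \subseteq C_i,\; u \neq v} d(u,v),
\qquad |C_i| = 100 \ \text{for all } i .
$$

Each unordered pair inside a cluster is counted once, and pairs that lie in different clusters do not contribute.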

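A move in the local search space should exchange a vertex in one cluster with a vertex in another cluster, which keeps every cluster at size 100. The change in cost that results from such an exchange can be computed from the distances between the two vertices and the members of the two affected clusters, without recomputing the cost of the whole clustering.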
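
To make the exchange move concrete, here is a minimal Python sketch. It is only an illustration under assumptions I am adding: the distances are given as a symmetric NumPy matrix `dist` with a zero diagonal, candidate pairs are picked uniformly at random, and a swap is accepted only if it lowers the cost. The function names (`clustering_cost`, `swap_delta`, `local_search`) and the parameters `iters` and `seed` are hypothetical. If $u$ is in cluster $A$ and $v$ is in cluster $B$, the change in cost from swapping them is $\sum_{w \in A \setminus \{u\}} (d(v,w) - d(u,w)) + \sum_{w \in B \setminus \{v\}} (d(u,w) - d(v,w))$.

```python
import numpy as np


def clustering_cost(dist, labels, k):
    """Sum of within-cluster pairwise distances, each pair counted once."""
    total = 0.0
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        # The submatrix counts every pair twice (dist is symmetric).
        total += dist[np.ix_(idx, idx)].sum() / 2.0
    return total


def swap_delta(dist, labels, u, v):
    """Change in cost if u and v, in different clusters, trade places."""
    in_a = labels == labels[u]  # members of u's cluster, including u
    in_b = labels == labels[v]  # members of v's cluster, including v
    # The masks include u and v themselves, which adds d(u, v) twice
    # (the diagonal is assumed to be zero), so subtract it back out.
    return (dist[v, in_a].sum() - dist[u, in_a].sum()
            + dist[u, in_b].sum() - dist[v, in_b].sum()
            - 2.0 * dist[u, v])


def local_search(dist, k, iters=20000, seed=0):
    """Random-swap local search over equal-size clusterings (a sketch)."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    assert n % k == 0, "number of points must be divisible by k"
    # Start from a random partition into k clusters of equal size.
    labels = rng.permutation(np.repeat(np.arange(k), n // k))
    for _ in range(iters):
        u, v = rng.integers(n, size=2)
        if labels[u] == labels[v]:
            continue
        # Accept only improving swaps; cluster sizes never change.
        if swap_delta(dist, labels, u, v) < -1e-12:
            labels[u], labels[v] = labels[v], labels[u]
    return labels
```

For the problem in the question this would be called as `local_search(dist, 17)` with a 1700 × 1700 distance matrix. Each delta evaluation touches only the two affected clusters, so it costs on the order of 200 distance lookups rather than a full recomputation. Random pair selection is just one possible move-selection rule; it can get stuck in a local optimum like any pure descent.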

Depending on how you select a move, this may be very time-consuming. If you try this kind of approach, you might want to run it on 10% of the data set first, just to make sure it is not going to take an absurd amount of time to find a decent solution.