Friday, May 20, 2016

geometric median - linyx - 博客园

The geometric median of a discrete set of sample points in a Euclidean space is the point minimizing the sum of distances to the sample points. This generalizes the median, which has the property of minimizing the sum of distances for one-dimensional data, and provides a central tendency in higher dimensions.
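In other words, the median is the point for which the sum of distances to all the data points in the set is smallest. The same idea carries over to n dimensions.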
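The one-dimensional median has this property, which can be proved by contradiction: moving slightly to the left or to the right of the median never decreases the sum of distances (with an odd number of points it strictly increases), so the median is the point with the smallest sum of distances to the other points. Formally:
$$\text{Geometric Median} = \underset{y \in \mathbb{R}^n}{\arg\min} \sum_{i=1}^{m} \lVert x_i - y \rVert_2$$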
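As a quick sanity check of the one-dimensional claim, here is a minimal sketch that compares the sum of distances at the median with the sums a little to either side. The sample values and the helper name `sum_of_distances` are made up for illustration.

```python
def sum_of_distances(points, y):
    """Sum of absolute distances from y to every sample point (1-D)."""
    return sum(abs(x - y) for x in points)


samples = [1, 2, 4, 8, 16]  # hypothetical data with an odd number of points
median = sorted(samples)[len(samples) // 2]

cost_at_median = sum_of_distances(samples, median)
cost_left = sum_of_distances(samples, median - 0.5)
cost_right = sum_of_distances(samples, median + 0.5)

# Stepping away from the median in either direction raises the cost.
print(median, cost_at_median, cost_left, cost_right)  # 4 21 21.5 21.5
```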
And now, here comes the problem...
Q: Given a set of points in 2-D grid space, find a grid point such that the sum of distances from all the points to this common point is minimal.
eg: p1: [0, 0] p2: [3, 0] p3: [0, 3]
ans: r: [0,0]
sum: 0 + 3 + 3 = 6
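The naive approach to this problem is O(n^2): for every point, compute the sum of its distances to all the other points, then take the minimum.
Distance here means Manhattan distance. The Euclidean version is hard to solve; the solutions I found online simply use k-means.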
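The naive approach can be sketched roughly as follows. This is only an illustration: it assumes that the candidates are the input points themselves and that points are `(x, y)` tuples; the function names are hypothetical.

```python
def manhattan(p, q):
    """Manhattan distance between two 2-D points."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def best_point_naive(points):
    """Return (point, cost) for the input point with the smallest total
    Manhattan distance to all input points. Runs in O(n^2)."""
    best, best_cost = None, None
    for candidate in points:
        cost = sum(manhattan(candidate, p) for p in points)
        if best_cost is None or cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost


# The example from the question: p1, p2, p3.
print(best_point_naive([(0, 0), (3, 0), (0, 3)]))  # ((0, 0), 6)
```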
Reference:

Note that in 1-D the point that minimizes the sum of distances to all the points is the median.

In 2-D the problem can be solved in O(n log n) as follows:

Create a sorted array of x-coordinates and for each element in the array compute the "horizontal" cost of choosing that coordinate. The horizontal cost of an element is the sum of distances to all the points projected onto the X-axis. This can be computed in linear time by scanning the array twice (once from left to right and once in the reverse direction). Similarly create a sorted array of y-coordinates and for each element in the array compute the "vertical" cost of choosing that coordinate.

Now, for each point in the original array, we can compute the total cost to all other points in O(1) time by adding its horizontal and vertical costs, so the optimal point can be found in O(n). The total running time is therefore O(n log n), dominated by the two sorts.
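The steps above could be sketched like this. It is one possible reading of the description, not a definitive implementation: the names `axis_costs` and `best_point` are hypothetical, points are assumed to be `(x, y)` tuples, and a dictionary is used for the per-coordinate lookup (O(1) on average).

```python
def axis_costs(values):
    """Map each coordinate to the sum of |v - other| over all values.

    One sort, then a left-to-right scan and a right-to-left scan.
    """
    ordered = sorted(values)
    n = len(ordered)
    # left[i]: total distance from ordered[i] to the elements before it.
    left = [0] * n
    for i in range(1, n):
        left[i] = left[i - 1] + i * (ordered[i] - ordered[i - 1])
    # right[i]: total distance from ordered[i] to the elements after it.
    right = [0] * n
    for i in range(n - 2, -1, -1):
        right[i] = right[i + 1] + (n - 1 - i) * (ordered[i + 1] - ordered[i])
    return {ordered[i]: left[i] + right[i] for i in range(n)}


def best_point(points):
    """Return (point, cost) minimizing total Manhattan distance, O(n log n)."""
    x_cost = axis_costs([x for x, _ in points])  # "horizontal" costs
    y_cost = axis_costs([y for _, y in points])  # "vertical" costs
    # Each candidate is evaluated with two lookups and one addition.
    return min(((p, x_cost[p[0]] + y_cost[p[1]]) for p in points),
               key=lambda pair: pair[1])


print(best_point([(0, 0), (3, 0), (0, 3)]))  # ((0, 0), 6)
```

Because Manhattan distance splits into an x part and a y part, the two axes can be handled independently, which is what makes the O(1) combination step possible.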