The brute force approach to the closest pair problem (i.e. checking every
possible pair of points) takes quadratic time. We would now like to
introduce a faster divide-and-conquer algorithm for solving the closest
pair problem.
Given a set of points in the plane S, our approach will be to split
the set into two roughly equal halves (S1 and
S2) for which we already have the solutions, and then to
merge the halves in linear time to yield an O(nlogn)
algorithm.
However, the actual solution is far from obvious. It is possible that
the desired pair has one point in S1 and one in S2. Does this not force
us once again to check all possible pairs of points? The
divide-and-conquer approach presented here generalizes directly from the
one-dimensional algorithm we presented in the previous section.
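
For reference, the quadratic brute-force baseline mentioned above can be
sketched as follows. This is only an illustration: the function name
closest_pair_brute_force and the representation of points as (x, y)
tuples are assumptions made here, and math.dist requires Python 3.8 or
later.

```python
from itertools import combinations
from math import dist, inf


def closest_pair_brute_force(points):
    """Return (distance, (p, q)) for the closest pair, checking every pair.

    `points` is assumed to be a sequence of (x, y) tuples. With fewer than
    two points the result is (inf, None).
    """
    best_distance, best_pair = inf, None
    # combinations() yields each unordered pair exactly once: n(n-1)/2 pairs.
    for p, q in combinations(points, 2):
        d = dist(p, q)
        if d < best_distance:
            best_distance, best_pair = d, (p, q)
    return best_distance, best_pair
```

A baseline like this is also handy for checking a faster implementation
on small inputs.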

Alright, we'll generalize our 1-D algorithm as directly as possible (see
figure 3.2). Given
a set of points S in the plane, we partition it into two subsets
S1 and S2 by a vertical line l
such that the points in S1 are to the left of l
and those in S2 are to the right of l.
We now recursively solve the problem on these two sets obtaining minimum
distances of d1 (for S1), and
d2 (for S2). We let d be the
minimum of these.

Now, identical to the 1-D case, if the closest pair of the whole set
consists of one point from each subset, then these two points must be
within d of l. This area is represented as the two strips
P1 and P2 on either side of l
(as shown in the figure).
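
The partition and the two strips can be sketched in a few lines. The
names partition and strips are hypothetical, points are assumed to be
distinct (x, y) tuples, and the line l is assumed to pass through the
leftmost point of S2; the text does not fix any of these details.

```python
def partition(points):
    """Split points into a left half S1 and a right half S2.

    Returns (s1, s2, line_x), where line_x is the x-coordinate of the
    vertical line l. Assumes at least two points.
    """
    pts = sorted(points)          # sort by x (ties broken by y)
    mid = len(pts) // 2
    s1, s2 = pts[:mid], pts[mid:]
    line_x = s2[0][0]             # every point of s1 has x <= line_x
    return s1, s2, line_x


def strips(s1, s2, line_x, d):
    """Return (p1, p2): the points of each half lying within d of l."""
    p1 = [p for p in s1 if line_x - p[0] < d]
    p2 = [p for p in s2 if p[0] - line_x < d]
    return p1, p2
```

Only points in p1 and p2 can belong to a pair that straddles l and is
closer than d.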

Figure 3.2: Divide-and-conquer in
two dimensions.

Up to now, we are completely in step with the 1-D case. At this point,
however, the extra dimension causes some problems. We wish to determine
if some point in say P1 is less than d away from
another point in P2. However, in the plane, we don't
have the luxury that we had on the line when we observed that only one
point in each set can be within d of the median. In fact, in two
dimensions, all of the points could be in the strip! This is
disastrous, because we would have to compare up to n^2 pairs of
points to merge the set, and hence our divide-and-conquer algorithm wouldn't
save us anything in terms of efficiency.

Thankfully, we can make another life-saving observation at this point. For
any particular point p in one strip, only points that meet the
following constraints in the other strip need to be checked:

those points within d of p in the direction of the other
strip

those within d of p in the positive and negative
y directions

This is simply because points outside of this bounding box cannot be less
than d units from p (see figure 3.3). It just so happens that because
any two points in this box are at least d apart (they all lie in the
same half, whose closest pair is d or more apart), there can be at most
six points within it (I won't let myself get away with that scot-free;
the proof is given separately).
Well, this is simply fantastic news, because now we don't need to check
all n^2 pairs of points. All we have to do is sort the points in the
strip by their y-coordinates and scan the points in order, checking
each point against a maximum of 6 of its neighbors. This means at most
6n comparisons are required to check all candidate pairs.
However, since we sorted the points in the strip by their
y-coordinates, the process of merging our two subsets is not
linear, but in fact takes O(nlogn) time.
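
The scan of the strip might look like the sketch below. The function
name scan_strip is hypothetical and points are assumed to be (x, y)
tuples. Rather than hard-coding a neighbor count, this version stops as
soon as the next point is at least d higher up; the packing argument
above is what keeps that inner loop down to a constant number of
iterations.

```python
from math import dist


def scan_strip(strip_points, d):
    """Return the smallest distance below d found in the strip, else d.

    `strip_points` are assumed to be the points within d of the line l.
    """
    by_y = sorted(strip_points, key=lambda p: p[1])   # the O(n log n) part
    best = d
    for i, p in enumerate(by_y):
        j = i + 1
        # Points at least `best` above p cannot improve on `best`,
        # and neither can anything after them in y order.
        while j < len(by_y) and by_y[j][1] - p[1] < best:
            best = min(best, dist(p, by_y[j]))
            j += 1
    return best
```

Checking only the points above p is enough, since every pair is then
examined from its lower point.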

Hence our full algorithm is not yet O(nlogn), but it is still
an improvement on the quadratic performance of the brute force approach (as
we shall see in the next section). In section 3.4,
we will demonstrate how to make this algorithm even more efficient by
strengthening our recursive sub-solution.

We present here a step by step summary of the algorithm presented in the
previous section, followed by a performance analysis. The algorithm is
simply written in list form because I find pseudo-code to be burdensome
and unnecessary when trying to understand an algorithm. Note that we
pre-sort the points according to their x coordinates which in
itself takes O(nlogn) time.

ClosestPair of a set of points:

1. Divide the set into two equal sized parts by the line l, and
recursively compute the minimal distance in each part.

2. Let d be the smaller of the two minimal distances.

3. Eliminate points that lie farther than d from l.

4. Sort the remaining points according to their y-coordinates.

5. Scan the remaining points in the y order and compute the
distances of each point to its neighbors (at most six, as argued in the
previous section).

6. If any of these distances is less than d then update d.
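
For readers who do want to see code, the six steps above can be put
together roughly as follows. This is a sketch under stated assumptions,
not a reference implementation: the names closest_pair_distance and
_solve_sorting are made up here, points are (x, y) tuples, and only the
distance is returned, not the pair itself.

```python
from math import dist, inf


def closest_pair_distance(points):
    """Smallest pairwise distance, following the six steps listed above."""
    return _solve_sorting(sorted(points))     # pre-sort by x-coordinate


def _solve_sorting(pts):
    n = len(pts)
    if n < 2:
        return inf                            # no pair to measure
    if n == 2:
        return dist(pts[0], pts[1])
    mid = n // 2
    line_x = pts[mid][0]                      # the dividing line l
    # Steps 1-2: solve both halves and keep the smaller distance.
    d = min(_solve_sorting(pts[:mid]), _solve_sorting(pts[mid:]))
    # Step 3: discard points that are d or more away from l.
    strip = [p for p in pts if abs(p[0] - line_x) < d]
    # Step 4: sort the survivors by y-coordinate.
    strip.sort(key=lambda p: p[1])
    # Steps 5-6: compare each point with the few points just above it.
    for i, p in enumerate(strip):
        j = i + 1
        while j < len(strip) and strip[j][1] - p[1] < d:
            d = min(d, dist(p, strip[j]))
            j += 1
    return d
```

The sort in step 4 is performed at every level of the recursion, which
is what the analysis below charges for.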

Steps 2-6 define the merging process, which must be repeated at each of
the logn levels of recursion because this is a divide-and-conquer
algorithm:

Step 2 takes O(1) time

Step 3 takes O(n) time

Step 4 is a sort that takes O(nlogn) time

Step 5 takes O(n) time (as we saw in the previous section)

Step 6 takes O(1) time

Hence the merging of the sub-solutions is dominated by the sorting at step
4, and hence takes O(nlogn) time.
This must be repeated once for each level of recursion in the
divide-and-conquer algorithm, and there are O(logn) such levels, so the
whole algorithm takes O(n(logn)^2) time.
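
The running time implied by this analysis can be written as a
recurrence. The derivation below is a sketch that assumes n is a power
of two and uses c for the hidden constant of the merge step; neither
symbol appears in the text above.

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + c\,n\log n \\
     &= \sum_{i=0}^{\log n - 1} 2^{i}\cdot c\,\frac{n}{2^{i}}\log\frac{n}{2^{i}} + O(n) \\
     &\le c\,n\log n \cdot \log n + O(n) \\
     &= O(n\log^{2} n).
\end{align*}
```

Each of the log n levels contributes at most c n log n work, which is
where the extra logarithmic factor comes from.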

We can improve on this algorithm slightly by reducing the time it takes
to achieve the y-coordinate sorting in Step 4. This is done by
asking that the recursive solution computed in Step 1 returns the points
in sorted order by their y coordinates. This will yield two sorted
lists of points which need only be merged (a linear time operation)
in Step 4 in order to yield a complete sorted list. Hence the revised
algorithm involves making the following changes:

Step 1: Divide the set into..., and recursively compute the distance
in each part, returning the points in each set in sorted order by
y-coordinate.

Step 4: Merge the two sorted lists into one sorted list in O(n)
time.

Hence the merging process is now dominated by the linear time steps thereby
yielding an O(nlogn) algorithm for finding the closest pair
of a set of points in the plane.