I have been following Coursera's course on Algorithms and had a thought about the divide-and-conquer algorithm for the closest pair problem that I would like clarified.

As per Prof Roughgarden's algorithm (which you can see here if you're interested):
For a given set of points P, of which we have two copies - sorted in X and Y direction - Px and Py, the algorithm can be given as

closestPair(Px,Py):

1. Divide points into left half - Q, and right half - R, and form sorted copies of both halves along x and y directions - Qx,Qy,Rx,Ry
2. Let closestPair(Qx,Qy) be points p1 and q1
3. Let closestPair(Rx,Ry) be p2,q2
4. Let delta be minimum of dist(p1,q1) and dist(p2,q2)
5. This is the unfortunate case, let p3,q3 be the closestSplitPair(Px,Py,delta)
6. Return the best result
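For reference, the steps above can be sketched in Python roughly as follows. This is only a sketch under some assumptions that are not in the lecture summary: points are distinct `(x, y)` tuples, small inputs fall back to a brute-force helper, and the names `closest`, `closest_pair`, `closest_split_pair` and `brute_force` are made up for illustration.

```python
from math import dist, inf

def brute_force(points):
    # Base case: compare every pair directly.
    best = (inf, None, None)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = dist(points[i], points[j])
            if d < best[0]:
                best = (d, points[i], points[j])
    return best

def closest_split_pair(py, x_mid, delta):
    # Step 5: keep only points within delta of the dividing line (still sorted by y).
    strip = [p for p in py if abs(p[0] - x_mid) < delta]
    best = (delta, None, None)
    for i, p in enumerate(strip):
        # Look at no more than the next 7 points in y order.
        for q in strip[i + 1:i + 8]:
            d = dist(p, q)
            if d < best[0]:
                best = (d, p, q)
    return best

def closest_pair(px, py):
    if len(px) <= 3:
        return brute_force(px)
    mid = len(px) // 2
    qx, rx = px[:mid], px[mid:]           # step 1: left half Q, right half R
    x_mid = px[mid][0]
    left = set(qx)                        # assumes distinct points
    qy = [p for p in py if p in left]     # y-sorted copies, built in linear time
    ry = [p for p in py if p not in left]
    best = min(closest_pair(qx, qy),      # steps 2-4
               closest_pair(rx, ry), key=lambda t: t[0])
    split = closest_split_pair(py, x_mid, best[0])
    return min(best, split, key=lambda t: t[0])   # step 6

def closest(points):
    px = sorted(points)
    py = sorted(points, key=lambda p: (p[1], p[0]))
    return closest_pair(px, py)
```

Calling `closest([(0, 0), (5, 5), (1, 1), (9, 9), (5, 6), (20, 3)])` returns the distance together with the two points, here the split pair `(5, 5)` and `(5, 6)`.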

Now, the clarification that I want is related to step 5.
I should say this beforehand, that what I'm suggesting, is barely any improvement at all, but if you're still interested, read ahead.

Prof Roughgarden says that since the points are already sorted in X and Y directions, to find the best pair in step 5 we need to iterate over the points in the strip of width 2*delta, from bottom to top, and in the inner loop we need only 7 comparisons. Can this be reduced to just one?

How I think is possible seemed a little difficult to explain in plain text, so I drew a diagram and wrote it on paper and uploaded it here:

Since no one else came up with this, I'm pretty sure there's some error in my line of thought.
But I have literally been thinking about this for HOURS now, and I just HAD to post this. It's all that is in my head.

Welcome to Programmers Stack Exchange, Programming Noob. Would you consider transcribing your notes into your question? Also, check the Theoretical Computer Science FAQ to see if your question is suitable for cstheory, as you might get more traction there, plus that site supports MathJax.
– Mark Booth, Jul 4 '12 at 14:12

2 Answers

Your simplification that lets you only do one comparison implicitly involves calculating the distance between points. I haven't watched the lecture you're referring to, but this is a well known algorithm; the assumptions and reasoning behind the step you're talking about go something like this:

For each point, a, within δ of the boundary, you can draw a box of size δ in the x dimension and 2δ in the y dimension. By the assumption that no two points in the same region are within δ of each other, that box contains at most 6 points, so you only need to compare a with at most 6 other points. The critical part that you're missing is that this works because it's a box. You already have the x and y coordinates of each point, so it's easy to tell if it's within some arbitrary rectangle. By turning the box into a circle, you actually create more work. How can you tell if a point is inside the circle? You have to calculate the distance to the center of the circle. You can limit the work by using the trick above, but then you're back where you started.
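To make the difference concrete, here is a small sketch of the two membership tests. The function names and the `x_mid` parameter (the x coordinate of the dividing line) are assumptions for illustration, not something from the lecture. The box test needs only coordinate comparisons, while the circle test is itself a distance computation, which is exactly the work being counted.

```python
def in_box(a, p, x_mid, delta):
    # Box on the far side of the dividing line from a:
    # delta wide in x, 2*delta tall in y (delta above and below a).
    if a[0] <= x_mid:
        x_ok = x_mid <= p[0] <= x_mid + delta
    else:
        x_ok = x_mid - delta <= p[0] <= x_mid
    return x_ok and abs(p[1] - a[1]) <= delta

def in_circle(a, p, delta):
    # Circle of radius delta around a: this is a (squared) distance evaluation.
    dx, dy = p[0] - a[0], p[1] - a[1]
    return dx * dx + dy * dy <= delta * delta

a = (-0.5, 0.0)
corner = (0.9, 0.9)   # inside the box, outside the circle
near = (0.2, 0.1)     # inside both
```

With `x_mid = 0` and `delta = 1`, `corner` passes `in_box` but fails `in_circle`, so deciding that it can be skipped already costs one distance computation.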

If you don't mind an unsolicited suggestion, I'd recommend coding up the algorithms you're thinking of, until you get to the point where implementing algorithms is a trivial exercise. If you tried to code up your algorithm, you'd quickly realize that there's no way to find which points are within the circle you're describing without doing more comparisons. After a while, you'll get to the point where you aren't tripped up by assumptions like that anymore.

There is a paper called A Note Concerning The Closest Pair Problem by Martin Richards (IPL 82 (2002) p193-195) which gets the number of comparisons down to 2 on the other side.

It's fairly easy to arrange 2 points from the other side and the current point so they're all d units apart and in the square half box. You just make them an equilateral triangle. Now by sliding either of these microscopically up towards the current point, you can make either one the closest point. So they're both candidates.
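A quick numeric check of that configuration, as a sketch only. It assumes the current point sits on the dividing line at the origin, and it reads "square half box" as the d by d square on the other side, above the current point; both are assumptions of this example, as are the concrete coordinates.

```python
from math import cos, dist, isclose, radians, sin

d = 1.0                 # hypothetical current best distance
a = (0.0, 0.0)          # current point, assumed to lie on the dividing line x = 0
# Two points on the other side, 60 degrees apart as seen from a.
b = (d * cos(radians(15)), d * sin(radians(15)))
c = (d * cos(radians(75)), d * sin(radians(75)))

def in_square(p):
    # d by d square to the right of the line and above a
    return 0 <= p[0] <= d and 0 <= p[1] - a[1] <= d

sides = [dist(a, b), dist(a, c), dist(b, c)]
print(sides, in_square(b), in_square(c))
```

All three side lengths come out equal to d up to rounding, and both b and c lie inside the square, so neither can be ruled out without a distance computation.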

One of the interesting points in the paper was that empirical testing on the normal algorithm showed that there was almost always one or zero points in the candidate box. So you could get the standard algorithm to perform as well as yours by pruning out points which got pushed outside the box, rather than just blindly comparing against the last 7 points within the strip.
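One possible reading of that pruning, sketched in Python. The function name `scan_strip`, the `prune` flag and the sample strip are made up for illustration, and pruning on the y gap alone is an assumption of this sketch. Since the strip is sorted by y, the inner loop can stop as soon as the y gap reaches the best distance so far, because every later point is at least that far away.

```python
from math import dist

def scan_strip(strip, delta, prune):
    """strip must be sorted by y. Returns (best distance, number of dist() calls)."""
    best, calls = delta, 0
    for i, p in enumerate(strip):
        for q in strip[i + 1:i + 8]:       # the usual 7-point look-ahead
            if prune and q[1] - p[1] >= best:
                break                      # q and everything after it is out of the box
            calls += 1
            best = min(best, dist(p, q))
    return best, calls

strip = [(0.1, 0.0), (-0.2, 3.0), (0.3, 6.0), (-0.1, 6.5), (0.0, 10.0)]
blind = scan_strip(strip, 1.0, prune=False)
pruned = scan_strip(strip, 1.0, prune=True)
```

On this sparse strip both scans find the same best distance, but the blind scan evaluates 10 distances while the pruned one evaluates 1.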