Closest pair algorithm from (n log^2 n) to (n log n) time

Question

I am trying to understand how to go from n log^2 n time to n log n time for the closet pair algorithm. I get the below part (from http://www.cs.mcgill.ca/~cs251/ClosestPair/ClosestPairDQ.html)

1. Divide the set into two equal sized parts by the line l, and recursively compute the minimal distance in each part.
2. Let d be the minimal of the two minimal distances.
3. Eliminate points that lie farther than d apart from l
4. Sort the remaining points according to their y-coordinates
5. Scan the remaining points in the y order and compute the distances of each point to its five neighbors.
6. If any of these distances is less than d then update d.

Step 4 is a sort that takes O(n log n) time, which dominates all other steps and this is the one that needs to be reduced to O(n) in order for the overall algorithm to achieve O(n log n) time. And this is the part I am having a hard time understanding. The author proposes

Step 1: Divide the set into..., and recursively compute the distance in each part, returning the points in each set in sorted order by y-coordinate. Step 4: Merge the two sorted lists into one sorted list in O(n) time.

You still have to sort the points by the y-coordinate in the recursive step, which takes O(n log n) time. How can we avoid this? The merging is O(n), but we still have to sort somewhere.

Show source

Answers to Closest pair algorithm from (n log^2 n) to (n log n) time ( 3 )

1. Their proposal is that you have two (already sorted lists) A and B. Combining those into one sorted list can be done using merge sort (just thing of step (4) as the merge step in merge sort).

The result of merge sort is one sorted list with all members of both A and B. Once merged, there is no need to sort anything again.

2. The reason that O(n log n) is a problem is that we do it over and over and over again: if we partition the set into two subsets, and partition each of those into two subsets, then that involves seven sorts (one over the whole set, two over halves of it, four over quarters of it).

So the proposal fixes this by reusing the results of previous sorts: so we do full merge-sorts of the smallest partitions (each of which is O(1), totaling O(n)), but for larger partitions we just need to do a single O(n) merge pass to combine the results of those. So we only pay the O(n log n) price a total of once, which is fine.

3. I provide an alternate solution which is maybe easier to understand. At first sort all points w.r.t their y coordinate. This is one time using O (n log n) . There are n points, and in the sorted array each of them has some index which is at most n. Save index of each point (add integer for index to point data structure). Then run the original algorithm. This time, at the moment which we want to sort points, do not sort them with normal comparision sort. But sort them according to their index. We can sort them according to their index with radix sort in O (n). So the total process is O (n log n), as we used the comparision sort only once and the rest is T (n)=2T (n/2)+O (n). But the constant is not as good as modification suggested in the question.

The modified procedure suggested in the question works like the merge in merge sort: when we have two sorted lists, we do not need to sort them again by using normal sort, we can sort them by merging them in O (n).