shear sort

Assuming that the number of elements to be sorted N is a perfect square, we can place an element on every processor on an N-processor 2D mesh (with sqrt(n) rows and sqrt(n) columns).

The algorithm is thus: On odd time steps, we sort along the row, on even time steps, we sort along the column. When sorting along the row, we also consider the row number. If the row is odd-numbered, we sort left to right. If the row is even, we sort right to left.

Sorting along the row/column is equivalent to sorting on a single dimensional array, and thus will take O(sqrt(n)) time (see Odd-Even Transpose). It turns out that we only need to take O(log(n)) iterations for the whole list to be sorted. (the proof is left to the reader)