Here the head of a list is its first element and its tail is the rest of the list, and Insert(element, List) is a function that inserts the element into its correct position in the sorted List and returns the new sorted list.
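A minimal sketch of that recursive formulation in Python (my own rendering; the names insert and insertion_sort are mine, not the original poster's, and the accumulator makes the outer call tail recursive):

```python
def insert(element, sorted_list):
    """Insert element into its correct position in sorted_list.
    Walks the list front to back, so O(n) comparisons in the worst case."""
    if not sorted_list or element <= sorted_list[0]:
        return [element] + sorted_list
    return [sorted_list[0]] + insert(element, sorted_list[1:])

def insertion_sort(lst, acc=None):
    """Tail-recursive insertion sort: one Insert per element, N calls total."""
    if acc is None:
        acc = []
    if not lst:
        return acc
    return insertion_sort(lst[1:], insert(lst[0], acc))

print(insertion_sort([3, 1, 2]))  # [1, 2, 3]
```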

This leads to N tail recursive calls. As I understand it, the argument that the algorithm has O(N^2) complexity comes from the assumption that the Insert function runs in O(N) worst case.

But shouldn't you be able to write an insert function that runs in O(Log N) time, since Insert is only called with a sorted list? For example:

The insertion cannot be implemented in O(log n). How do you represent the list? Sorting algorithms are usually designed to work on arrays. If you are doing that, you can use binary search to find the correct position of the element, but in order to insert it you have to shift O(n) additional elements in the worst case. If you choose to represent the list as a linked list, you can insert an element into the middle of the list in O(1), but you cannot use binary search to find the correct position.
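To make that concrete, a sketch of the array case using Python's standard-library bisect: the search is O(log n) comparisons, but list.insert still shifts everything after the insertion point (binary_insertion_sort is my name, not anything from the thread):

```python
import bisect

def binary_insertion_sort(items):
    """Insertion sort with binary search over an array."""
    out = []
    for x in items:
        pos = bisect.bisect_left(out, x)  # O(log n) comparisons
        out.insert(pos, x)                # but O(n) element shifts
    return out

print(binary_insertion_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```

The comparison count drops to O(n log n), but each insert is still O(n) because of the shifts, so the total work stays O(n^2).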

Ah OK, so the explanation is that even if you only need O(N log N) comparisons, other operations on common data structures grow faster and become dominant when comparisons aren't the bottleneck. That makes sense. I guess it also explains why insertion sort works so well for non-computing applications like sorting playing cards, since shifting costs nothing there.

Not exactly: you're saying that other operations keep the whole algorithm from being O(n * log n). You can achieve that bound using the right data structures, such as self-balancing binary search trees (e.g. red-black trees). That kind of sort is, however, inferior to other approaches in practice because of the overhead of such structures. Insertion combined with a binary search is sometimes also called binary insertion sort. Normal insertion sort is traditionally implemented over an array, which is why O(N^2) comes up when discussing it.

Boson Collider wrote:This leads to N tail recursive calls. As I understand it, the argument that the algorithm has O(N^2) complexity comes from the assumption that the Insert function runs in O(N) worst case.

But shouldn't you be able to write an insert function that runs in O(Log N) time, since Insert is only called with a sorted list? For example:

Alternative answer: Yes you can. That's called HeapSort. A "Heap" is a partially ordered structure (usually stored in an array) that supports O(log N) insertion and O(log N) removal of the smallest element.

Spoiler:

Sorta. An optimized heapsort will use "heapify" to build the heap in O(n) time. Academically, however, building the heap with n * log(n) insertion steps makes "Heapsort" easier to explain.

If you have an "insertion-sort" that uses a Heap, it's called HeapSort.
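A sketch of that heap-based sort using Python's standard-library heapq module (heap_sort is my name for it): each push and pop is O(log n), giving O(n log n) overall.

```python
import heapq

def heap_sort(items):
    """Push every element onto a heap, then pop them back in order."""
    heap = []
    for x in items:
        heapq.heappush(heap, x)  # O(log n) per insertion
    # O(log n) per removal of the minimum
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```

Replacing the push loop with heapq.heapify would build the heap in O(n), matching the optimized variant mentioned in the spoiler.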

korona wrote:If you choose to represent the list as a linked list you can insert an element in the middle of the list in O(1) but you cannot use binary search to find the correct position.

Suppose you partially index the list? For example, you store (in an array) the address and value of every (say) tenth element. You can use a binary search on this array to jump into the linked list, so you don't have to traverse the whole thing, just the last mile. Yes, there's some overhead in maintaining the array, but the array needn't be perfect in order to speed the thing up. Would this speedup not be enough to reduce the O factor? What if it were done recursively (that is, index the index as n gets bigger)?

Jose

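The partial-index suggestion above can be sketched roughly as follows (all names here are invented for illustration; the hard part, maintaining the index as elements are inserted, is glossed over):

```python
import bisect

class Node:
    """Singly linked list node."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def build_index(head, k=10):
    """Record (value, node) for every k-th node of a sorted list."""
    values, nodes = [], []
    i, node = 0, head
    while node is not None:
        if i % k == 0:
            values.append(node.value)
            nodes.append(node)
        i, node = i + 1, node.next
    return values, nodes

def search(head, values, nodes, target):
    """Binary-search the index, then walk at most ~k nodes.
    Returns the first node with value >= target (or None)."""
    i = bisect.bisect_right(values, target)
    node = nodes[i - 1] if i > 0 else head
    while node is not None and node.value < target:
        node = node.next
    return node

# Build a sorted list 0..99, index every 10th node, find 42.
head = None
for v in range(99, -1, -1):
    head = Node(v, head)
values, nodes = build_index(head)
print(search(head, values, nodes, 42).value)  # 42
```

The lookup walks only the "last mile" after the binary search, which is the intuition behind the question; the follow-up below the post explains what this structure amounts to.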

I believe there are four characteristics of insertion sort:
a) you're taking the input elements one by one.
b) you're inserting the elements directly into the output data structure.
c) after each insert, the output data structure is sorted.
d) the memory overhead is O(1).

If you push the elements into a heap, then empty the heap into the output structure, it's not insertion sort because of b). As soon as you add an index on top of the output data structure, it's not insertion sort because of d).

And I think that's an important point to make: adding an index is not improving insertion sort. It's making a different tradeoff, memory vs. speed.

About your actual suggestion: the index you're thinking of is called a skip list. If done correctly, it has O(log(n)) insertion cost on average, but the worst-case insertion cost is O(n). So - unless you can prove otherwise via amortization - the worst-case complexity for the whole sort must still be assumed to be O(n^2). It's possible to reduce this to O(n*log(n)) with a different index structure (a B+ tree comes to mind; it keeps the leaf nodes in a linked list anyway), but again, this requires additional memory, so it's not insertion sort.

Also, these things only work if your output structure is a list. On an array, I'm not aware of any way to get insertion sort below O(n^2) while keeping characteristics a, b and c.


Alternatively, you can insert the elements into a binary search tree. This is called Tree Sort.
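A sketch of tree sort with a plain (unbalanced) binary search tree, my own illustration: insert every element, then read them back with an in-order traversal. Average case is O(n log n), but an already-sorted input degenerates the tree into a linked list and gives O(n^2); a self-balancing tree such as a red-black tree avoids that.

```python
class TreeNode:
    """Plain binary search tree node."""
    def __init__(self, value):
        self.value = value
        self.left = None
        self.right = None

def bst_insert(root, value):
    """Insert value into the tree rooted at root; returns the new root."""
    if root is None:
        return TreeNode(value)
    if value < root.value:
        root.left = bst_insert(root.left, value)
    else:
        root.right = bst_insert(root.right, value)
    return root

def in_order(root, out):
    """Append the tree's values to out in sorted order."""
    if root is not None:
        in_order(root.left, out)
        out.append(root.value)
        in_order(root.right, out)

def tree_sort(items):
    root = None
    for x in items:
        root = bst_insert(root, x)
    out = []
    in_order(root, out)
    return out

print(tree_sort([5, 2, 4, 1, 3]))  # [1, 2, 3, 4, 5]
```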