Heapsort is a sorting algorithm built on the binary heap data structure. It relies on the heap property to find the largest (or smallest) remaining element quickly, and on the heap's efficient insertion and deletion operations to keep that property intact as elements are removed.

Heap data structure

The binary heap is one implementation of the heap data structure. It is usually stored as an array that can be viewed as a nearly complete binary tree built out of a given set of data. The heap data structure is also used in the construction of a priority queue. The tree structure is mapped onto array indices as shown in the figure below, with each array index representing a node.
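
The mapping between tree positions and array indices can be sketched in Python. This is a minimal illustration that assumes a 0-based array, the same convention used by the implementation later in this wiki; the helper names `parent`, `left`, and `right` are chosen here for illustration only.

```python
def parent(i):
    """Index of the parent of node i (not meaningful for the root, i = 0)."""
    return (i - 1) // 2

def left(i):
    """Index of the left child of node i."""
    return 2 * i + 1

def right(i):
    """Index of the right child of node i."""
    return 2 * i + 2

# A small array viewed as a nearly complete binary tree:
#          A[0]
#        /      \
#     A[1]      A[2]
#     /  \      /
#  A[3] A[4]  A[5]
A = [16, 14, 10, 8, 7, 9]
for i in range(1, len(A)):
    p = parent(i)
    print(f"node {i} (value {A[i]}) has parent {p} (value {A[p]})")
```

A child index that is greater than or equal to the length of the array simply means that the node has no such child.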

There are two kinds of binary heaps: max-heaps and min-heaps. Each kind satisfies its own heap property.

Max-heap property

If \(A\) is an array representation of a heap, then in a max-heap:

\[A[\text{parent}(i)]\geq A[i],\]

which means that a node can't have a greater value than its parent. In a max-heap, the largest element is stored at the root, and the smallest element is found among the leaves.
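
The property can be checked directly on the array. The function below is a small sketch, assuming 0-based indexing so that the parent of node \(i\) is at index \((i-1)/2\) rounded down; the name `is_max_heap` is hypothetical.

```python
def is_max_heap(A):
    """Return True if every non-root node is no larger than its parent."""
    return all(A[(i - 1) // 2] >= A[i] for i in range(1, len(A)))

print(is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1]))  # True
print(is_max_heap([16, 4, 10, 14, 7, 9, 3, 2, 8, 1]))  # False: A[1] = 4 < A[3] = 14
```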

Min-heap property

Similarly, if \(A\) is an array representation of a heap, then in a min-heap:

\[A[\text{parent}(i)]\leq A[i],\]

which means that a parent node can't have a greater value than its children. Thus, the minimum element is located at the root, and the maximum element is found among the leaves.

Both min-heaps and max-heaps can be used to implement heapsort, but this wiki discusses heapsort in terms of max-heaps. To use a min-heap instead, reverse each comparison so that the min-heap property, rather than the max-heap property, is maintained.
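
As an illustration of the min-heap side, Python's standard `heapq` module maintains a min-heap inside a list. The sketch below is not the in-place max-heap procedure this wiki describes: it copies the input and uses \(O(n)\) extra space, and the names `is_min_heap` and `heapsort_with_min_heap` are hypothetical.

```python
import heapq

def is_min_heap(A):
    """Return True if every non-root node is no smaller than its parent."""
    return all(A[(i - 1) // 2] <= A[i] for i in range(1, len(A)))

def heapsort_with_min_heap(items):
    heap = list(items)      # copy, so the caller's list is left untouched
    heapq.heapify(heap)     # rearrange the copy into a min-heap
    assert is_min_heap(heap)
    # Each pop removes and returns the smallest remaining element.
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort_with_min_heap([2, 8, 1, 4, 14, 7, 16, 10, 9, 3]))
# [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```

Note that running the in-place swap procedure with a min-heap moves the smallest element to the end of the array on each pass, so it leaves the array in descending order.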

Maintaining a max-heap

In order to maintain the max-heap property, heapsort uses a procedure called max_heapify(A,i). It takes an array \(A\) and an index in the array \(i\) as input. Maintaining the max-heap property is a vital part of the heapsort algorithm.

Essentially, if an element \(A[i]\) violates the max-heap property, max_heapify will correct it by trickling the element down the tree, swapping it with its larger child at each step, until the subtree rooted at index \(i\) is a max heap (and therefore the violation is corrected).
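
A worked example may make the trickling-down concrete. The sketch below is an iterative variant written for illustration, assuming 0-based indexing and an explicit `heap_size` argument; it also assumes that the subtrees rooted at the children of \(i\) are already max heaps.

```python
def max_heapify(A, heap_size, i):
    """Sift A[i] down until the subtree rooted at i is a max heap."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and A[left] > A[largest]:
            largest = left
        if right < heap_size and A[right] > A[largest]:
            largest = right
        if largest == i:
            return  # A[i] is at least as large as both children: done
        # Swap with the larger child and continue from that child's position.
        A[i], A[largest] = A[largest], A[i]
        i = largest

A = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]   # A[1] = 4 is smaller than its child A[3] = 14
max_heapify(A, len(A), 1)
print(A)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

Here the value 4 is first swapped with 14 (index 3), then with 8 (index 8), where it becomes a leaf.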

Heapsort algorithm

The heapsort algorithm has two main parts (that will be broken down further below): building a max heap and then sorting it. The max heap is built by repeated calls to max_heapify, described in the section above. Then, heapsort produces a sorted array by repeatedly removing the largest element from the heap (which is the root of the heap) and placing it in its final position in the array. The heap is updated after each removal. Once all elements have been removed from the heap, the result is a sorted array.

The heapsort algorithm uses the max_heapify function, and all put together, the heapsort algorithm sorts an array \(A\) like this:

Build a max heap from the unordered array.

Find the maximum element, which is located at \(A[0]\) because the heap is a max heap.

Swap elements \(A[n-1]\) and \(A[0]\), where \(n\) is the current heap size, so that the maximum element moves to the end of the heap, where it belongs in sorted order.

Decrement the heap size by one (this discards the node we just moved to the bottom of the heap, which was the largest element). In a manner of speaking, the sorted part of the list has grown and the heap (which holds the unsorted elements) has shrunk.

Run max_heapify on the root of the heap in case the new root causes a violation of the max-heap property. (Its children will still be max heaps.)

Repeat the previous four steps until only one element remains in the heap.
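
The steps above can be traced on a small input. The following is a sketch for illustration rather than the wiki's own implementation: `sift_down` and `heapsort_trace` are hypothetical names, indexing is 0-based, and the build loop starts at the last non-leaf node, index \(n/2 - 1\) rounded down.

```python
def sift_down(A, heap_size, i):
    """Move A[i] down until the subtree rooted at i is a max heap."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < heap_size and A[left] > A[largest]:
            largest = left
        if right < heap_size and A[right] > A[largest]:
            largest = right
        if largest == i:
            return
        A[i], A[largest] = A[largest], A[i]
        i = largest

def heapsort_trace(A):
    n = len(A)
    # Build a max heap from the unordered array.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(A, n, i)
    print("heap built:", A)
    # 'end' is the index of the last heap element; the heap size shrinks with it.
    for end in range(n - 1, 0, -1):
        A[0], A[end] = A[end], A[0]   # move the current maximum into place
        sift_down(A, end, 0)          # restore the heap on the first 'end' elements
        print("heap:", A[:end], "sorted:", A[end:])

A = [4, 10, 3, 5, 1]
heapsort_trace(A)
```

Each printed line shows the shrinking heap on the left and the growing sorted suffix on the right.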

Implementation of Heapsort

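    def max_heapify(A, heap_size, i):
        # Indices of the left and right children of node i (0-based array).
        left = 2 * i + 1
        right = 2 * i + 2
        largest = i
        if left < heap_size and A[left] > A[largest]:
            largest = left
        if right < heap_size and A[right] > A[largest]:
            largest = right
        if largest != i:
            # Swap the node with its larger child, then continue down the tree.
            A[i], A[largest] = A[largest], A[i]
            max_heapify(A, heap_size, largest)

    def build_heap(A):
        heap_size = len(A)
        # Integer division (//) keeps the range bound an int in Python 3.
        for i in range(heap_size // 2, -1, -1):
            max_heapify(A, heap_size, i)

    def heapsort(A):
        heap_size = len(A)
        build_heap(A)
        # print(A)  # uncomment this print to see the heap it builds
        for i in range(heap_size - 1, 0, -1):
            # Move the current maximum to the end, then shrink the heap.
            A[0], A[i] = A[i], A[0]
            heap_size -= 1
            max_heapify(A, heap_size, 0)

    # A = [2, 8, 1, 4, 14, 7, 16, 10, 9, 3]
    # heapsort(A)
    # print(A)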

Complexity of Heapsort

Heapsort has a running time of \(O(n\log n)\).

Building the max heap from the unsorted list requires \(O(n)\) calls to the max_heapify function, each of which takes \(O( \log n)\) time. Thus, the running time of build_heap is \(O(n \log n)\).

Note: while it is true that build_heap has a running time of \(O(n \log n)\), a tighter bound of \(O(n)\) can be proved by analyzing the height of the tree where max_heapify is called. However, this does not change the overall running time of heapsort, and since the explanation of this is quite involved, it has been omitted.
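
Although the full proof is left out, its outline can be sketched. The sketch assumes two standard facts: an \(n\)-element heap has at most \(\lceil n/2^{h+1} \rceil\) nodes of height \(h\), and max_heapify called on a node of height \(h\) takes \(O(h)\) time. Summing the cost over all heights gives

\[\sum_{h=0}^{\lfloor \log_2 n \rfloor} \left\lceil \frac{n}{2^{h+1}} \right\rceil O(h) = O\left(n \sum_{h=0}^{\infty} \frac{h}{2^{h}}\right) = O(n),\]

since \(\sum_{h=0}^{\infty} h/2^{h} = 2\). Intuitively, most nodes sit near the bottom of the tree, where max_heapify has very little work to do.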

Heapsort as a whole has a running time of \(O(n\log n)\): the call to build_heap takes \(O(n \log n)\) time (or \(O(n)\) using the tighter bound), and the sorting loop makes \(O(n)\) calls to max_heapify, each of which takes \(O(\log n)\) time.

Heapsort has a worst-case and average-case running time of \(O(n \log n)\), like mergesort, but heapsort uses \(O(1)\) auxiliary space (since it is an in-place sort) while mergesort takes up \(O(n)\) auxiliary space. If memory is a concern, heapsort can therefore be a good, fast choice of sorting algorithm. Quicksort also has an average-case running time of \(O(n \log n)\), and its constant factors are notably smaller, which usually makes it faster in practice than other \(O(n \log n)\) sorting algorithms. However, quicksort has a worst-case running time of \(O(n^2)\) and needs \(O(\log n)\) space for its recursion stack even when implemented carefully, so heapsort is a strong option when a guaranteed worst-case running time and minimal space usage matter most. Note, though, that heapsort is slower than quicksort on average in most cases.