The Personal Blog of Todd Sharp

A Closer Look At Sorting Algorithms

Posted By: Todd Sharp on 3/27/2017 12:05 UTC

As I mentioned in a previous post, sorting algorithms typically play a large role in programming interviews. Those who follow the traditional path into the programming world and obtain a CIS degree are usually exposed to algorithms along the way; those of us who took a less traditional path are less familiar with them.

I decided to take a deeper dive into sorting algorithms and implement some of them to see:

How difficult they are to implement

How performant they are against varying sizes of datasets

How they compare to the native sorting in Java

As usual, I've implemented my code in Groovy. Let's take a look at some algorithms, shall we?

Bubble Sort

The first sort I decided to look at is a basic bubble sort. Wikipedia defines it as:

Bubble sort, sometimes referred to as sinking sort, is a simple sorting algorithm that repeatedly steps through the list to be sorted, compares each pair of adjacent items and swaps them if they are in the wrong order. The pass through the list is repeated until no swaps are needed, which indicates that the list is sorted. The algorithm, which is a comparison sort, is named for the way smaller or larger elements "bubble" to the top of the list.

I've implemented all of the examples in this post as methods on the Java List class itself using metaprogramming. Here's the bubble sort:
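The original snippets were Groovy methods metaprogrammed onto List and aren't reproduced here; as a stand-in, this is a minimal Java sketch of the same bubble sort over an int[] (the class and method names are mine):

```java
import java.util.Arrays;

public class BubbleSort {
    // Repeatedly pass over the array, swapping adjacent out-of-order pairs.
    // Stop as soon as a pass makes no swaps: the array is sorted.
    static void sort(int[] list) {
        int unsorted = list.length;
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 1; i < unsorted; i++) {
                if (list[i - 1] > list[i]) {
                    int tmp = list[i - 1];
                    list[i - 1] = list[i];
                    list[i] = tmp;
                    swapped = true;
                }
            }
            unsorted--; // the largest element has bubbled to the end
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 4, 5, 8]
    }
}
```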

Pretty simple. Loop over the list, compare each pair of adjacent elements, and swap them if they're out of order. For small datasets the performance is fine, but as we get into larger sets the expense really starts to add up. How much does it add up? Well, let's take a look at how it performs against random lists ranging from 1,000 elements up to 10,000:

Selection Sort

The algorithm divides the input list into two parts: the sublist of items already sorted, which is built up from left to right at the front (left) of the list, and the sublist of items remaining to be sorted that occupy the rest of the list. Initially, the sorted sublist is empty and the unsorted sublist is the entire input list. The algorithm proceeds by finding the smallest (or largest, depending on sorting order) element in the unsorted sublist, exchanging (swapping) it with the leftmost unsorted element (putting it in sorted order), and moving the sublist boundaries one element to the right.

This sort should be more efficient than a bubble sort in practice, as the definition above illustrates. Instead of a full nested loop over the entire list, selection sort's inner scan only covers the unsorted items remaining in the list, and it makes at most one swap per pass. Here's the implementation:
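A Java sketch of the selection sort just described (again over an int[] rather than the post's Groovy List methods): the sorted prefix grows one element per pass, each pass swapping the minimum of the unsorted suffix into place.

```java
import java.util.Arrays;

public class SelectionSort {
    static void sort(int[] list) {
        for (int i = 0; i < list.length - 1; i++) {
            // find the smallest element in the unsorted suffix [i..end]
            int min = i;
            for (int j = i + 1; j < list.length; j++) {
                if (list[j] < list[min]) min = j;
            }
            // swap it into position i, extending the sorted prefix
            int tmp = list[i];
            list[i] = list[min];
            list[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 4, 5, 8]
    }
}
```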

And the results, which illustrate it as a more efficient sort than the bubble sort:

Insertion Sort

Insertion sort iterates, consuming one input element each repetition, and growing a sorted output list. Each iteration, insertion sort removes one element from the input data, finds the location it belongs within the sorted list, and inserts it there. It repeats until no input elements remain.

The implementation looks like this:
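A plain-Java sketch of the insertion sort described above (the original was a Groovy List method): each iteration takes the next input element and shifts larger sorted elements right until its slot opens up.

```java
import java.util.Arrays;

public class InsertionSort {
    static void sort(int[] list) {
        for (int i = 1; i < list.length; i++) {
            int key = list[i]; // next element to place
            int j = i - 1;
            // shift larger elements of the sorted prefix one slot right
            while (j >= 0 && list[j] > key) {
                list[j + 1] = list[j];
                j--;
            }
            list[j + 1] = key; // insert into the opened slot
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 4, 5, 8]
    }
}
```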

Based on the nested loop I'd anticipate similar performance to the bubble sort, and if we look at the results we can see that the performance is indeed quite comparable. Side note: at this point you should be realizing that all of the sorts we've looked at so far are about as useful as a fork in a sugar bowl.

Heap Sort

Heapsort can be thought of as an improved selection sort: like that algorithm, it divides its input into a sorted and an unsorted region, and it iteratively shrinks the unsorted region by extracting the largest element and moving that to the sorted region. The improvement consists of the use of a heap data structure rather than a linear-time search to find the maximum.

Like merge sort, the implementation is more complex, but the performance gains over the selection sort that it improves upon are impressive.
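The heap sort described above can be sketched in Java like so (a stand-in for the post's Groovy version): first sift down every non-leaf node to build a max-heap in place, then repeatedly swap the root (the maximum) to the end of the shrinking unsorted region.

```java
import java.util.Arrays;

public class HeapSort {
    static void sort(int[] a) {
        int n = a.length;
        // build a max-heap by sifting down every non-leaf node
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(a, i, n);
        // repeatedly move the current max to the sorted region at the end
        for (int end = n - 1; end > 0; end--) {
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;
            siftDown(a, 0, end); // restore the heap over the unsorted region
        }
    }

    // push the value at 'root' down until the max-heap property holds
    static void siftDown(int[] a, int root, int size) {
        while (2 * root + 1 < size) {
            int child = 2 * root + 1;
            if (child + 1 < size && a[child + 1] > a[child]) child++;
            if (a[root] >= a[child]) return;
            int tmp = a[root]; a[root] = a[child]; a[child] = tmp;
            root = child;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 4, 5, 8]
    }
}
```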

Quick Sort

Quicksort is a comparison sort, meaning that it can sort items of any type for which a "less-than" relation (formally, a total order) is defined. In efficient implementations it is not a stable sort, meaning that the relative order of equal sort items is not preserved. Quicksort can operate in-place on an array, requiring small additional amounts of memory to perform the sorting.

Quick sort is a fast sort, but it is not "stable". In fact, a dual-pivot variant of it is what Java uses when you call Arrays.sort on an array of primitives (object arrays get the stable, merge-based TimSort instead). The implementation that I used looked like so:
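A Java sketch of quicksort, here using the simple Lomuto partition scheme with the last element as the pivot (the post doesn't say which partition scheme its Groovy version used, so this is one common choice): partition around the pivot, then recurse on the two halves.

```java
import java.util.Arrays;

public class QuickSort {
    static void sort(int[] a) {
        quicksort(a, 0, a.length - 1);
    }

    static void quicksort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        quicksort(a, lo, p - 1);  // elements less than the pivot
        quicksort(a, p + 1, hi);  // elements greater than or equal to it
    }

    // Lomuto partition: a[hi] is the pivot; returns its final index
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo; // boundary of the "less than pivot" region
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                i++;
            }
        }
        int tmp = a[i]; a[i] = a[hi]; a[hi] = tmp;
        return i;
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 4, 5, 8]
    }
}
```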

And the performance was the most impressive of all the implementations that I have looked at.

The native Java sort performed better than all of the implementations that we've looked at. Of course, I would never assume that I could implement a more efficient sort than the one that the JDK provides. And that's quite the point - the standard library has grown and evolved over the years. The people who've contributed to the Java language have been down this road before. They've given us the best solution out of the box, and unless we're looking at serious edge cases, the default sort() method is going to be the best one to use.
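For reference, leaning on the standard library is a one-liner in either the primitive or the object case:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class NativeSort {
    public static void main(String[] args) {
        // primitives: dual-pivot quicksort under the hood
        int[] primitives = {5, 1, 4, 2, 8};
        Arrays.sort(primitives);
        System.out.println(Arrays.toString(primitives)); // [1, 2, 4, 5, 8]

        // objects: TimSort under the hood, which is stable
        List<String> words = new ArrayList<>(Arrays.asList("pear", "apple", "fig"));
        Collections.sort(words);
        System.out.println(words); // [apple, fig, pear]
    }
}
```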

The moral of the story is, I trust my standard library until an edge case proves that I shouldn't. And no, I still can't memorize these and whiteboard them.