João Vieira

Longest Increasing Subsequence

November 30, 2017
·
João VieiraThe goal is to build the longest subsequence of sorted numbers from a given
sequence that can be in any order. For each number in the original sequence you can either
choose to have it in the subsequence or not. Once you choose to add a number to
the subsequence it will be appended and from that moment on you can only
append numbers of greater value.

The greedy approach, of picking every possible number
that is bigger than the one chosen before it, will
produce an increasing subsequence but likely not the longest. As an example consider the sequence
(10, 9, 2, 5, 3, 7, 101, 18). If we start by picking 10, we won’t be able to
pick 9, 2, 5, 3 or 7, otherwise our subsequence would no longer be increasing.
We can, however, pick one last number, either 101 or 18, ending with the
subsequence (10, 101) which has length 2.

In this example, however, a valid solution to the problem will have 4 elements
in increasing order. Valid solutions would be (2, 3, 7, 18) or
(2, 5, 7, 101), among others with length 4.

As we now know, to arrive at a valid solution, we have to consider more than
one greedily built subsequence.

However, considering all possible subsequences, would require traversing a
tree with n branches at first level (one for each number in the sequence),
n-1 at second (all numbers in the sequence except for the one already picked), n-2 at third level and so on, which would be O(n!).

A few key observations will allow us to drastically reduce the amount of
subsequences to consider:

We can stop considering any subsequence
as soon as appending an element to it would result in decreasing, e.g.,
say we have (2, 5), appending 3 would cause the subsequence to no
longer be increasing, so we can stop right there and ignore that branch.

For each number we get by iterating over the original sequence, we search our
pool of subsequences for the longest to which we can append the number we got, i.e.,
the longest having a last element smaller than the number we have, and append the current
number to it.

At most one subsequence of length L is needed among our
pool of candidates. Say we have (2,) and (2, 5) in our pool and we
find a 3. We cannot append 3 to (2, 5) as that would cause the subsequence to
decrease, but we can append to (2,) resulting in the following pool of
candidates (2,), (2, 3), (2, 5). However, any number that can be appended to
(2, 5) can also be appended to (2, 3), as 3 < 5. As such, there will never
be a subsequence starting with (2, 5) that has greater length than the one
starting with (2, 3) so (2, 5) can simply be replaced with (2, 3), i.e., our
pool would be (2,), (2, 3).

Let’s look at how our pool of candidates evolves over time while we iterate over
the sequence (10, 9, 2, 5, 3, 7, 101, 18).

We start with 10. The next number is 9, which is less than 10 and as such cannot be appended to (10,). As the subsequences (9,) and (10,) have the same length (1),
we can only keep the one which has the smallest last value, i.e., 9. Then we get 2, which cannot be appended to (9,). As both (2,) and (9,) have the same length we discard (9,) and keep only (2,).

We then find a 5 which can be appended to (2,),
producing a new subsequence (2, 5). Our pool has
now two subsequences. We cannot get rid of (2,), although it is a smaller subsequence it can still produce other promising candidates if we find any numbers between 2 and 5, which we do because the next number in the sequence is 3. It cannot be appended to (2, 5), but it can be appended to (2,), producing a new subsequence (2, 3). Both (2, 3) and (2, 5) have length 2 so we keep the one with the smallest last value, i.e., (2, 3). Any number that can be appended to (2, 5) can also be appended to (2, 3) so it will never produce
a shorter subsequence.

When we get 7, our pool has (2,), (2, 3) so the longest subsequence to
which we can append is (2, 3) producing a new subsequence (2, 3, 7) which is added to the pool.

Other iterations follow just the same algorithm, which can be written in Python 3:

This algorithm has O(n²) time complexity as we iterate the original sequence once and for each element iterate the list of candidates once.

It can be made a little better if we keep the list of candidates sorted
by its last element. Notice that we only have one
candidate for each given length and at the end of
each iteration the candidate with the biggest last
element will also be the candidate of greatest length. We can then use binary search to find the right candidate to append an element in O(log n) thus reducing
overall complexity to O(n log n).