By the way, pretty much every algorithm described here turns into O(n^2) or O(n log n) when k==n. That is, I don't think a single one of them is O(n) for all values of k. I got modded down for pointing this out but thought you should know anyway.
–
Kirk Strauser, Nov 4 '08 at 22:09

Selection algorithms can be O(n) for any fixed value of k. That is, you can have a selection algorithm for k=25 that is O(n) for any value of n, and you can do this for any particular value of k that is unrelated to n. The case in which the algorithm is no longer O(n) is when the value of k has some dependency on the value of n, such as k=n or k=n/2. This doesn't, however, mean that if you happen to run the k=25 algorithm on a list of 25 items that it is suddenly no longer O(n) because the O-notation describes a property of the algorithm, not a particular run of it.
–
Tyler McHenry, Jul 31 '09 at 16:58

I was asked this question in an Amazon interview as a general case of finding the second greatest element. Because of the way the interviewer led the interview, I didn't ask whether I could destroy the original array (i.e. sort it), so I came up with a complicated solution.
–
Sambatyon, May 9 '11 at 17:43

@Sambatyon, could you please share your complicated solution? It seems pretty easy to me: keep two variables holding the max and second max, traverse the array once, and get both in O(n). Am I missing something?
–
Hengameh, Jul 24 at 10:34

24 Answers
This is called finding the k-th order statistic. There's a very simple randomized algorithm (called quickselect) taking O(n) average time, and a pretty complicated non-randomized algorithm taking O(n) worst case time. There's some info on Wikipedia, but it's not very good.

Everything you need is in these PowerPoint slides. Just to extract the basic idea of the O(n) worst-case algorithm:
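The slides themselves aren't reproduced in this excerpt, so here is a hedged sketch (not the slides' own code) of the standard median-of-medians selection they describe, in Python:

```python
def select(a, k):
    """Return the kth smallest element of a (k is 1-indexed), worst-case O(n)."""
    if len(a) <= 5:
        return sorted(a)[k - 1]
    # Split into groups of 5 and take the median of each group.
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    # Recursively find the median of the medians to use as a good pivot.
    pivot = select(medians, (len(medians) + 1) // 2)
    smaller = [x for x in a if x < pivot]
    larger = [x for x in a if x > pivot]
    if k <= len(smaller):
        return select(smaller, k)
    if k > len(a) - len(larger):
        return select(larger, k - (len(a) - len(larger)))
    return pivot  # the kth element equals the pivot
```

The median-of-medians pivot guarantees that each recursive call discards a constant fraction of the elements, which is what makes the worst case linear.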

If you want a true O(n) algorithm, as opposed to O(kn) or something like that, then you should use quickselect (it's basically quicksort where you throw out the partition that you're not interested in). My prof has a great writeup, with the runtime analysis: (reference)

The QuickSelect algorithm quickly finds the k-th smallest element of an unsorted array of n elements. It is a RandomizedAlgorithm, so we compute the worst-case expected running time.

Here is the algorithm.

QuickSelect(A, k)
  let r be chosen uniformly at random in the range 1 to length(A)
  let pivot = A[r]
  let A1, A2 be new arrays
  # split into a pile A1 of small elements and A2 of big elements
  for i = 1 to length(A)
    if A[i] < pivot then
      append A[i] to A1
    else if A[i] > pivot then
      append A[i] to A2
    else
      # do nothing
  end for
  if k <= length(A1):
    # it's in the pile of small elements
    return QuickSelect(A1, k)
  else if k > length(A) - length(A2)
    # it's in the pile of big elements
    return QuickSelect(A2, k - (length(A) - length(A2)))
  else
    # it's equal to the pivot
    return pivot
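As a sanity check, here is a direct Python translation of the pseudocode above (k is 1-indexed, as in the pseudocode):

```python
import random

def quickselect(a, k):
    """Return the kth smallest element of a (1-indexed); expected O(n) time."""
    pivot = a[random.randrange(len(a))]  # pivot chosen uniformly at random
    a1 = [x for x in a if x < pivot]     # pile of small elements
    a2 = [x for x in a if x > pivot]     # pile of big elements
    if k <= len(a1):
        return quickselect(a1, k)                       # it's in the small pile
    if k > len(a) - len(a2):
        return quickselect(a2, k - (len(a) - len(a2)))  # it's in the big pile
    return pivot                                        # it's equal to the pivot
```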

What is the running time of this algorithm? If the adversary flips coins for us, we may find that the pivot is always the largest element and k is always 1, giving a running time of

T(n) = Theta(n) + T(n-1) = Theta(n^2)

But if the choices are indeed random, the expected running time is given by

T(n) <= cn + (1/n) ∑i=1 to n T(max(i, n-i-1))

where we are making the not entirely reasonable assumption that the recursion always lands in the larger of A1 or A2.

Now somehow we have to get the horrendous sum on the right of the plus sign to absorb the cn on the left. If we just bound it as 2(1/n) ∑i=n/2 to n an, we get roughly 2(1/n)(n/2)an = an. But this is too big - there's no room to squeeze in an extra cn. So let's expand the sum using the arithmetic series formula:
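The excerpt cuts off here; a hedged sketch of how the arithmetic-series expansion closes the argument (using the induction hypothesis T(i) <= ai):

2(1/n) ∑i=n/2 to n ai = (2a/n) · (n/2 + 1)(n/2 + n)/2 ≈ (3/4)an

so T(n) <= cn + (3/4)an <= an whenever a >= 4c, which completes the induction and gives T(n) = O(n).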

Quickselect is only O(n) in the average case. The median-of-medians algorithm can be used to solve the problem in O(n) time in the worst case.
–
John Kurlak, Jan 19 '13 at 5:01

@MrROY Given that we split A into A1 and A2 around the pivot, we know that length(A) == length(A1)+length(A2)+1. So, k > length(A)-length(A2) is equivalent to k > length(A1)+1, which is true when k is somewhere in A2.
–
Filipe Gonçalves, Aug 17 '14 at 19:28

You do it like quicksort. Pick an element at random and partition everything into a higher pile and a lower pile. At this point you'll know which element you actually picked; if it is the kth element you're done, otherwise you repeat with the bin (higher or lower) that the kth element would fall in. Statistically speaking, the expected time to find the kth element grows linearly with n: O(n).

Not really true in all cases. I have implemented median-of-medians and compared it to the built-in Sort method in .NET, and the custom solution really did run faster by an order of magnitude. Now the real question is: does that matter to you in the given circumstances? Writing and debugging 100 lines of code instead of a one-liner pays off only if that code is going to be executed so many times that the user starts noticing the difference in running time and feels discomfort waiting for the operation to complete.
–
Zoran Horvat, Jul 19 '13 at 9:14

No, it has an expected O(n) runtime. For example, quicksort is O(n log n) on average with a worst case of O(n^2). Wow, something straight up factually wrong!
–
Kirk Strauser, Oct 30 '08 at 23:30

No, there's nothing factually wrong with this answer. It works and the C++ standard requires an expected linear run time.
–
David Nehme, Oct 31 '08 at 0:21

I was asked in an interview to assume O(k) available space and that n is very huge. I couldn't give the O(n) solution, as I thought nth_element would need O(n) space. Am I wrong? Isn't the underlying algorithm for nth_element quicksort-based?
–
Manu, Sep 13 '11 at 17:27

I implemented finding the kth minimum among n unsorted elements using dynamic programming, specifically the tournament method. The execution time is O(n + k log(n)). The mechanism used is listed as one of the methods on the Wikipedia page about the selection algorithm (as indicated in one of the postings above). You can read about the algorithm and also find code (Java) on my blog page Finding Kth Minimum. In addition, the logic can do partial ordering of the list - return the first k min (or max) in O(k log(n)) time.

Though the code provided returns the kth minimum, similar logic can be employed to find the kth maximum in O(k log(n)), ignoring the pre-work done to create the tournament tree.
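The blog code isn't included here; as a hedged stand-in (a heap rather than the blog's tournament tree, but with the same O(n + k log n) shape: heapify is O(n), each of the k pops is O(log n)):

```python
import heapq

def kth_min(a, k):
    """Return the kth smallest element (1-indexed): O(n) heapify + k pops."""
    h = list(a)
    heapq.heapify(h)           # O(n)
    for _ in range(k - 1):     # discard the k-1 smallest elements
        heapq.heappop(h)       # O(log n) each
    return h[0]
```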

Read Chapter 9, Medians and Order Statistics, from Cormen's "Introduction to Algorithms", 2nd Ed. It has an expected linear time algorithm for selection. It's not something that people would randomly come up with in a few minutes.
A heap sort, by the way, won't work in O(n); it's O(n lg n).

Find the median of the array in linear time, then use the partition procedure exactly as in quicksort to divide the array into two parts: values to the left of the median less than (<) the median, and values to the right greater than (>) the median. That, too, can be done in linear time. Now recurse into the part of the array where the kth element lies.
The recurrence becomes:
T(n) = T(n/2) + cn
Unrolling it, T(n) = cn + cn/2 + cn/4 + ... <= 2cn, which gives O(n) overall.

Iterate through the list. If the current value is larger than the stored largest value, store it as the largest value and bump values 1-4 down; value 5 drops off the list. If not, compare it to number 2 and do the same thing. Repeat, checking it against all 5 stored values. This should do it in O(n).

That "bump" is O(n) if you're using an array, or down to O(log n) (I think) if you use a better structure.
–
Kirk Strauser, Oct 30 '08 at 21:11

It needn't be O(log k) - if the list is a linked list then adding the new element to the top and dropping the last element is more like O(2)
–
Alnitak, Oct 30 '08 at 21:14

The bump would be O(k) for an array-backed list, O(1) for an appropriately-linked list. Either way, this sort of question generally assumes it to be of minimal impact compared to n and it introduces no more factors of n.
–
bobince, Oct 30 '08 at 21:16

it would also be O(1) if the bump uses a ring-buffer
–
Alnitak, Oct 30 '08 at 21:18

Anyhow, the comment's algorithm is incomplete: it fails to consider an incoming element which is the new (e.g.) second-largest. Worst-case behaviour, where each element of n must be compared with each in the highscore table, is O(kn) - but that still probably means O(n) in terms of the question.
–
bobince, Oct 30 '08 at 21:21
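The "high score table" idea from this answer and its comments can be sketched with a bounded min-heap, giving O(n log k) in the worst case:

```python
import heapq

def kth_largest(a, k):
    """Keep a min-heap of the k largest values seen; its root is the kth largest."""
    heap = []
    for x in a:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:               # x beats the current kth largest
            heapq.heapreplace(heap, x)  # pop the root and push x in one step
    return heap[0]
```

This also addresses the incompleteness noted in the comments: an element smaller than the current maximum but larger than the table's minimum still enters the table.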

Take the first k elements and sort them into a linked list of k values.

Now, for each of the remaining n-k values, do an insertion into that sorted list. Even in the worst case the number of comparisons is k*(n-k), plus k*(k-1) to sort the first k values, which comes out to O(nk) - and that is O(n) when k is a fixed constant.

Here is a C++ implementation of randomized QuickSelect. The idea is to randomly pick a pivot element. To implement randomized partition, we use a random function, rand(), to generate an index between l and r, swap the element at the randomly generated index with the last element, and finally call the standard partition process, which uses the last element as pivot.

The worst-case time complexity of the above solution is still O(n^2): in the worst case, the randomized function may always pick a corner element. The expected time complexity of the above randomized QuickSelect is Θ(n).
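The C++ source isn't reproduced in this excerpt; here is a hedged in-place Python sketch of the same randomized-partition scheme (random index swapped to the end, then the standard Lomuto partition):

```python
import random

def partition(a, l, r):
    """Standard partition using a[r] as pivot; returns the pivot's final index."""
    pivot = a[r]
    i = l
    for j in range(l, r):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[r] = a[r], a[i]
    return i

def randomized_select(a, l, r, k):
    """Return the kth smallest (1-indexed) element of a[l..r]; expected O(n)."""
    s = random.randint(l, r)     # pick a random pivot index...
    a[s], a[r] = a[r], a[s]      # ...and swap it with the last element
    p = partition(a, l, r)
    rank = p - l + 1             # pivot's rank within a[l..r]
    if rank == k:
        return a[p]
    if k < rank:
        return randomized_select(a, l, p - 1, k)
    return randomized_select(a, p + 1, r, k - rank)
```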

Nice solution, except that this returns the kth smallest element in an unsorted list. Reversing the comparison operators in the list comprehensions, a1 = [i for i in arr if i > arr[r]] and a2 = [i for i in arr if i < arr[r]], will return the kth largest element.
–
gumption, Apr 22 at 22:03

Below is the link to a full implementation with quite an extensive explanation of how the algorithm for finding the kth element in an unsorted array works. The basic idea is to partition the array as in QuickSort. But in order to avoid extreme cases (e.g. the smallest element being chosen as pivot in every step, so that the algorithm degenerates into O(n^2) running time), a special pivot selection is applied, called the median-of-medians algorithm. The whole solution runs in O(n) time in both the worst and the average case.

Here is link to the full article (it is about finding Kth smallest element, but the principle is the same for finding Kth largest):

initialize empty doubly linked list l
for each element e in array
    if e larger than head(l)
        make e the new head of l
    if size(l) > k
        remove last element from l
the last element of l should now be the kth largest element

You can simply store pointers to the first and last element in the linked list. They only change when updates to the list are made.

Update:

initialize empty sorted tree l
for each element e in array
    if e between head(l) and tail(l)
        insert e into l // O(log k)
    if size(l) > k
        remove last element from l
the last element of l should now be the kth largest element

What if e is smaller than head(l)? It could still be larger than the kth largest element, but it would never get added to that list. You would need to keep the list sorted in ascending order for this to work.
–
Elie, Oct 30 '08 at 21:22

You are right, guess I'll need to think this through some more. :-)
–
Jasper Bekkers, Oct 30 '08 at 21:27

The solution would be to check if e is between head(l) and tail(l) and insert it at the correct position if it is. Making this O(kn). You could make it O(n log k) when using a binary tree that keeps track of the min and max elements.
–
Jasper Bekkers, Oct 30 '08 at 21:30

There is also Wirth's selection algorithm, which has a simpler implementation than QuickSelect. Wirth's selection algorithm is slower than QuickSelect, but with some improvements it becomes faster.

In more detail: using Vladimir Zabrodsky's MODIFIND optimization and median-of-3 pivot selection, and paying some attention to the final steps of the partitioning part of the algorithm, I've come up with the following algorithm (imaginatively named "LefSelect"):
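The LefSelect code itself is not included in this excerpt. For reference, here is a hedged Python sketch of Wirth's basic selection routine (without the MODIFIND or median-of-3 improvements the answer mentions):

```python
def kth_smallest(a, k):
    """Wirth's selection: repeatedly partition around a[k] until it settles.

    k is 0-indexed; the input is copied because the algorithm reorders it.
    """
    a = list(a)
    l, m = 0, len(a) - 1
    while l < m:
        x = a[k]
        i, j = l, m
        while i <= j:
            while a[i] < x:
                i += 1
            while x < a[j]:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        if j < k:
            l = i
        if k < i:
            m = j
    return a[k]
```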