$\begingroup$Yes, you are missing a more elementary approach. I think it is best if you try to find it by yourself. Hint: $O(k\log k)$ means you can clearly sort all the unique integers if they're in a single list. How can sorting this list help you sort the full list?$\endgroup$
– Discrete lizard♦Apr 24 at 7:01

$\begingroup$Distinct values. Ok, so I'm thinking of just sorting the distinct list, then inserting the duplicates where needed. There are however multiple ways of doing this, and to the best of my knowledge list iteration (with insertions) won't make the cut for linear time. Storing the duplicates in a hash table would work, but this would essentially wrap up in a counting sort if I'm not mistaken, leading to $O(n\log{n})$ time?$\endgroup$
– AndreasApr 24 at 11:47

2 Answers

Create a hash map (dictionary in python) with keys being the elements of the vector and the corresponding values being the number of times the element occurs in the vector, i.e. its frequency. Time complexity for this would be $O(n)$ (How? Try to find out and let me know in the comments).

Sort all the keys of the map. Time complexity $= O(k\log (k))$ as there are $k$ distinct elements.

In a new vector, push each sorted value as many times as its frequency. A lookup in the map takes constant time, so this step takes linear time.
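The three steps above can be sketched in Python, using `collections.Counter` as the hash map:

```python
from collections import Counter

def sort_with_few_distinct(values):
    """Sort a list containing k distinct values in O(n + k log k) time."""
    freq = Counter(values)       # step 1: value -> frequency, O(n)
    result = []
    for key in sorted(freq):     # step 2: sort only the k distinct keys, O(k log k)
        result.extend([key] * freq[key])  # step 3: O(1) lookup per key, O(n) total
    return result
```

When $k \ll n$, the $O(k \log k)$ sorting term is dominated by the two linear passes.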

$\begingroup$Thanks for the explanation, this is the method I've implemented. The time complexity in step 1 originates from looping through the vector, since insertion in hash tables takes (amortised) constant time. The result looks quite neat compared to what I expected!$\endgroup$
– AndreasApr 24 at 14:09

$\begingroup$Excellent. Hashing comes in handy most of the time when dealing with time optimizations.$\endgroup$
– SiluPandaApr 24 at 14:35

The short answer is no for worst-case comparison-based algorithms, for the reasons stated here.

Using a counting technique will take $O(n \log n)$ in the worst case, or $O(n \log k)$ if you do the counting with a BST. Here I'll give a variant of quick-sort that also achieves $O(n \log k)$, via a slight modification and a more careful analysis. We change the "pivot" step of quick-sort so that duplicate elements are never put into separate sub-problems. The partition procedure is as follows:
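The BST-counting alternative can be sketched as follows. For brevity this uses a plain, unbalanced BST; in practice a balanced tree (e.g. red-black or AVL) is what actually guarantees $O(\log k)$ per insert, and hence $O(n \log k)$ overall:

```python
class Node:
    __slots__ = ("key", "count", "left", "right")
    def __init__(self, key):
        self.key, self.count = key, 1
        self.left = self.right = None

def bst_sort(values):
    """Counting with a BST: n inserts/lookups into a tree of at most k nodes.
    With a balanced tree each operation is O(log k), giving O(n log k) total.
    (This unbalanced sketch can degrade to O(n*k) on adversarial input.)"""
    root = None
    for v in values:
        if root is None:
            root = Node(v)
            continue
        node = root
        while True:
            if v == node.key:          # seen before: bump the frequency
                node.count += 1
                break
            side = "left" if v < node.key else "right"
            child = getattr(node, side)
            if child is None:          # new distinct value: insert a node
                setattr(node, side, Node(v))
                break
            node = child
    out = []
    def inorder(n):                    # in-order walk emits keys sorted
        if n:
            inorder(n.left)
            out.extend([n.key] * n.count)
            inorder(n.right)
    inorder(root)
    return out
```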

partition(list A of size n):
    x = SELECT(A, n/2)                         # median of A
    left = list of all elements less than x
    right = list of all elements greater than x
    center = list of all elements equal to x
    return (left, center, right)

Then the overall quick-sort looks like this:

quick-sort(list A of size n):
    if all elements in A are the same:
        return A
    left, center, right = partition(A)
    left = quick-sort(left)
    right = quick-sort(right)
    return concat(left, center, right)
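As a concrete sketch, here is the same algorithm in Python. `statistics.median_low` stands in for a linear-time SELECT (it sorts internally, so it is $O(n \log n)$ per call; substitute a median-of-medians selection to actually preserve the $O(n \log k)$ bound):

```python
import statistics

def partition(A):
    # Stand-in for linear-time SELECT: median_low returns an actual
    # element of A, so `center` is guaranteed to be non-empty.
    x = statistics.median_low(A)
    left = [a for a in A if a < x]
    center = [a for a in A if a == x]   # ALL duplicates of the pivot
    right = [a for a in A if a > x]
    return left, center, right

def quick_sort(A):
    # Base case: at most one unique element (covers the empty list too).
    if len(set(A)) <= 1:
        return A
    left, center, right = partition(A)
    return quick_sort(left) + center + quick_sort(right)
```

Because every copy of the pivot lands in `center`, duplicates are never split across sub-problems, which is what caps the recursion tree at $k$ leaves.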

The important thing here is the base case: when we have only one unique element in the list, we just return it. This ensures that we have exactly $k$ leaf nodes in our recursion tree. At face value, the recurrence is:

$$T(n) \leq \max_{i,j} \{T(i) + T(n - i - j)\} + O(n)$$

Obviously with some constraints on $i$ and $j$ (here $i$ is the size of the left sub-problem and $j \geq 1$ the size of the center), but that's the idea. Another thing to note is that via SELECT we ensure that center contains all of the median elements, which implies that we only recurse on two sub-problems each of size less than $n/2$, leading us to:

$$T(n) \leq T(n/2 - l) + T(n/2 - r) + O(n)$$

where $l$ is the overlap of the median on the left side and $r$ is the overlap of the median on the right side. Note that $r + l$ is the number of elements equal to the median. This already gives a clear upper bound of $O(n \log n)$. However, once we include the base cases the bound improves. If we assume that each unique element is repeated $c = n/k$ times (i.e. distributed evenly), we get the recurrence:

$$T(n) \leq 2\,T\!\left(\frac{n}{2} - \frac{c}{2}\right) + O(n)$$

A recursion-tree analysis of this recurrence shows it is $O(n \log k)$.

We could also take advantage of the fact that we do not do $n$ work at every level of the recursion: at level 0 we do $O(n)$, at level 1 we do $O(n - n/k)$, at level 2 we do $O(n - 3n/k)$, at level 3 we do $O(n - 7n/k)$, and so on. Without our assumption the analysis becomes trickier and would need to be done in the average case. However, I would claim that this assumption is the worst case. Intuitively, if the assumption did not hold, then some element $x_i$ has fewer repetitions and some element $x_j$ has more, making $x_j$ more likely to end up in one of the centers without going too deep in the recursion tree. This reduces the work on the remaining sub-problem(s) by removing more than $n/k$ elements from the list.
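Under the same even-distribution assumption, the level-by-level accounting can be made explicit: level $d$ of the recursion tree does $n - (2^d - 1)\,n/k$ total work, and the tree bottoms out at depth $O(\log k)$, so the total is

$$\sum_{d=0}^{O(\log k)} \left( n - (2^d - 1)\frac{n}{k} \right) \;\leq\; n\,(\log_2 k + 1) \;=\; O(n \log k).$$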

$\begingroup$Nice quicksort-based algorithm! Strictly speaking, the "no" only applies to comparison-based algorithms. Algorithms that use other methods could potentially be faster (though I agree that I can't see how to do it, and it might not be possible).$\endgroup$
– D.W.♦Apr 24 at 19:38

$\begingroup$Very interesting approach. This can also be done via a data structure that can perform search in logarithmic time (such as a BST): for each element in the array, add it to said structure if it is new, otherwise increase its count by 1.$\endgroup$
– loxApr 25 at 11:13

$\begingroup$@lox that's a good point too, I added a blurb about that.$\endgroup$
– ryanApr 25 at 16:56


$\begingroup$Exercise: Show that this algorithm takes $O(nH)$ time where $H$ is the entropy of the key space.$\endgroup$
– PseudonymApr 26 at 2:01