I've written an algorithm to find duplicate integers in a sequential list of integers, but I'm running into issues when trying to calculate its worst-case complexity.

Given a sequential list of integers the algorithm is as follows:

1. Get the length of the list of integers and assign it to list_length - O(1)
2. While list_length is greater than 0 - O(n):
   1. Remove the item at index 0 from the list and assign it to checking - O(1)
   2. Set list_length to the current length of the list
   3. Perform a binary search of the remaining list for the value assigned to checking - O(log n), where n is the current length of the list
   4. If the item is in the list, return True
3. If none of the binary searches found a match, return False
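The steps above can be sketched in Python as follows. This is only an illustrative sketch, assuming the input list is already sorted (binary search requires it); the name `has_duplicate` is mine, not from the question.

```python
from bisect import bisect_left

def has_duplicate(xs):
    """Sketch of the algorithm: pop the front item, binary-search the rest."""
    xs = list(xs)  # work on a copy; assumes xs is sorted, as binary search requires
    while len(xs) > 0:
        checking = xs.pop(0)           # step 2.1: remove the item at index 0
        i = bisect_left(xs, checking)  # step 2.3: binary search the remainder
        if i < len(xs) and xs[i] == checking:
            return True                # step 2.4: a later element matches
    return False                       # step 3: no binary search found a match
```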

My understanding is that the complexity of a binary search is $O(\log n)$ and if I were not to remove an item from the list on each iteration, the complexity of my algorithm would be $O(n \log n)$ (as I am running an $O(\log n)$ algorithm $n$ times).

What I am not sure of is how the shortening of $n$ will affect the binary search, as on each iteration the complexity of the binary search will change: $O(\log (n - 1))$ on the first iteration, then $O(\log (n - 2))$ on the second.

Note that asking for "the big-O notation of an algorithm" is like asking for the decimal notation of a person. Big-O is notation for the asymptotic behaviour of a (mathematical) function; that function could be used to measure anything. Just like decimal is notation for numbers, which could be used to represent anything.
– David Richerby, Dec 2 '18 at 21:55

I wasn't asking for the actual notation, just wanted to understand the impact of removing items from the list versus not removing them. Also whether there was any benefit to the initial segment of the binary search being smaller on each iteration. I overthought the algorithm, and thanks to @sergGr I realise that now.
– onmylemon, Dec 2 '18 at 22:00


Yes, and in asking for that, you used the phrase "the big-O notation of [an algorithm]". I'm pointing out that this statement doesn't type-check, just like "the decimal notation of onmylemon" doesn't type-check.
– David Richerby, Dec 2 '18 at 22:03


So, just out of interest, how would you have asked the same question? I lack the vocabulary to explain what I meant concisely.
– onmylemon, Dec 3 '18 at 9:15


You're asking for the running time of the algorithm. The running time is almost always expressed using big-O (it's usually not worth calculating the exact number of steps), so that part is implicit.
– David Richerby, Dec 3 '18 at 10:35

2 Answers

As for the complexity of your algorithm, I believe it is still O(N log N). Consider the first half of the checks: for each of them you do a binary search over a list of size at least N/2, which is O(log(N/2)). Since log(N/2) = log N − log 2, i.e. log N minus a constant, that is still O(log N). And you perform at least N/2 such checks, so the total is O(N log N).

If your list is not sorted, just sorting it so that binary search works at all already takes Ω(N log N), so that is a lower bound on the whole approach anyway. And if it is sorted, your algorithm is very inefficient: you can find a duplicate in just O(N) by comparing each pair of adjacent elements, since duplicates must be next to each other in a sorted list.
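The O(N) adjacent-comparison approach for a sorted list can be sketched like this (the name `has_adjacent_duplicate` is hypothetical, chosen for illustration):

```python
def has_adjacent_duplicate(xs):
    # single O(N) pass: in a sorted list, equal values must sit next to each other,
    # so it suffices to compare each element with its successor
    return any(a == b for a, b in zip(xs, xs[1:]))
```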

The total time for the binary searches, over lists of shrinking length, is
$$T(n) = \sum_{i=1}^{n-1}\log i < n\log n\,,$$
so $T(n) = O(n\log n)$. This sum also contains $n/2$ terms that are at least $\log(n/2)$, so we have
$$T(n) > \frac{n}2\log\frac{n}2 = \frac{n}2\log n - \frac{n}2\log 2 = \Omega(n\log n)\,,$$
and we can conclude that $T(n) = \Theta(n\log n)$.