-1: Sigh, another abuse of BigOh. BigOh is just an asymptotic upper bound, can be used for anything, and is not just CS related. Talking about BigOh as if there were one unique BigOh is meaningless (a linear-time algorithm is also O(n^2), O(n^3), etc.). Saying it helps us measure efficiency is misleading too. Also, what is with the link to the complexity classes? If all you are interested in is techniques to compute the running times of algorithms, how is that relevant?
–
AryabhattaMay 23 '11 at 17:35


Big-O does not measure efficiency; it measures how well an algorithm scales with size (it could apply to things other than size too, but that's likely what we are interested in here) - and that only asymptotically, so if you are out of luck, an algorithm with a "smaller" big-O may be slower (if the Big-O applies to cycles) than a different one until you reach extremely large numbers.
–
ILoveFortranSep 5 '11 at 17:05

22 Answers

I'm a teaching assistant for the Data Structures and Algorithms course at my local university. I'll do my best to explain it here in simple terms, but be warned that this topic takes my students a couple of months to finally grasp. You can find more information in Chapter 2 of the Data Structures and Algorithms in Java book.

As a "cookbook", to obtain the BigOh from a piece of code you first need to realize that you are creating a math formula to count how many steps of computations get executed given an input of some size.

The purpose is simple: to compare algorithms from a theoretical point of view, without the need to execute the code. The fewer the steps, the faster the algorithm.
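Consider a function like the following sketch (written here in Java, reconstructed to match the description below; the lines of interest are numbered 1 to 4 in comments):

int sum(int[] data) {
    int result = 0;                            // 1
    for (int i = 0; i < data.length; i++) {    // 2
        result += data[i];                     // 3
    }
    return result;                             // 4
}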

This function returns the sum of all the elements of the array, and we want to create a formula to count the computational complexity of that function:

Number_Of_Steps = f(N)

So we have f(N), a function to count the number of computational steps. The input of the function is the size of the structure to process. It means that this function is called like this:

Number_Of_Steps = f(data.length)

The parameter N takes the data.length value. Now we need the actual definition of the function f(). This is done from the source code, in which each interesting line is numbered from 1 to 4.

There are many ways to calculate the BigOh. From this point forward we are going to assume that every sentence that doesn't depend on the size of the input data takes a constant number C of computational steps.

We are going to add the individual number of steps of the function, and neither the local variable declaration nor the return statement depends on the size of the data array.

That means that lines 1 and 4 each take C steps, and the function looks somewhat like this:

f(N) = C + ??? + C

The next part is to define the value of the for statement. Remember that we are counting the number of computational steps, meaning that the body of the for statement gets executed N times. That's the same as adding C, N times:

f(N) = C + (C + C + ... + C) + C = C + N * C + C

There is no mechanical rule to count how many times the body of the for gets executed; you need to count it by looking at what the code does. To simplify the calculations, we are ignoring the variable initialization, condition and increment parts of the for statement.

To get the actual BigOh we need the asymptotic analysis of the function. This is roughly done like this: take away all the constants C, get the polynomial in its standard form, and keep the term that grows biggest as N approaches infinity. For our f(N) = C + N * C + C, that term is N * C, so this function is O(N).
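For a trickier example, consider a nested loop like the following sketch (reconstructed to match the analysis below; foo() stands for some function whose cost is considered separately):

for (int i = 0; i < 2 * n; i += 2) {    // sentence number one
    for (int j = n; j > i; j--) {       // sentence number two
        foo();
    }
}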

The first thing to ask is the order of execution of foo(). While it is usually O(1), you need to ask your professors about it. O(1) means (almost, mostly) constant C, independent of the size N.

The for statement on sentence number one is tricky. While the index ends at 2 * N, the increment is done by two. That means that the first for gets executed only N times, and we need to divide the count by two.

Sentence number two is even trickier, since it depends on the value of i. Take a look: the index i takes the values 0, 2, 4, 6, 8, ..., 2 * N, and the second for gets executed N times on the first outer iteration, N - 2 on the second, N - 4 on the third... up to the N / 2 stage, from which point on the second for never gets executed.

On formula, that means:

f(N) = Summation(i from 1 to N)( Summation(j = ???)( ) )

Again, we are counting the number of steps. And by definition, every summation should always start at one, and end at a number greater than or equal to one.

We have a problem here: when i takes the value N / 2 + 1 or upwards, the inner Summation ends at a negative number! That's impossible and wrong. We need to split the summation in two, the pivotal point being the moment i takes the value N / 2 + 1. Working through the two halves yields a polynomial whose dominant term is quadratic, so this piece of code is O(N^2).

I would give a +2 if I could. This is one of the most-complete answers I've ever seen on SO
–
Martin ThurauJan 31 '11 at 16:00


Extreme helpfulness of the answer aside, I laughed aloud at the second paragraph: "There is no mechanical procedure that can be used to get the BigOh."
–
ProblematicJul 18 '11 at 23:40

But how would you deal with a multidimensional array? Isn't that more than one input? What would you do for an int[x][y] array when printing all its values? What is the O() of that kind of algorithm?
–
arthurAug 25 '12 at 10:27


@arthur That would be O(N^2) because you would require one loop to read through all the columns and one to read all rows of a particular column.
–
Abhishek Dey DasNov 14 '14 at 20:34

@arthur: It depends. It's O(n) where n is the number of elements, or O(x*y) where x and y are the dimensions of the array. Big-oh is "relative to the input", so it depends on what is your input.
–
Mooing DuckApr 6 at 22:55

Big O gives the upper bound for time complexity of an algorithm. It is usually used in conjunction with processing data sets (lists) but can be used elsewhere.

A few examples of how it's used in C code.

Say we have an array of n elements

int array[n];

If we wanted to access the first element of the array this would be O(1) since it doesn't matter how big the array is, it always takes the same constant time to get the first item.

x = array[0];

If we wanted to find a number in the list:

for (int i = 0; i < n; i++) {
    if (array[i] == numToFind) {
        return i;
    }
}

This would be O(n) since at most we would have to look through the entire list to find our number. The Big-O is still O(n) even though we might find our number the first try and run through the loop once because Big-O describes the upper bound for an algorithm (omega is for lower bound and theta is for tight bound).

Small reminder: the big O notation is used to denote asymptotic complexity (that is, when the size of the problem grows to infinity), and it hides a constant.

This means that between an algorithm in O(n) and one in O(n^2), the fastest is not always the first one (though there always exists a value of n such that for problems of size > n, the first algorithm is the fastest).

Note that the hidden constant very much depends on the implementation!

Also, in some cases, the runtime is not a deterministic function of the size n of the input. Take sorting using quick sort for example: the time needed to sort an array of n elements is not a constant but depends on the starting configuration of the array.

There are different time complexities:

Worst case (usually the simplest to figure out, though not always very meaningful)

Average case (usually much harder to figure out...)

...

A good introduction is An Introduction to the Analysis of Algorithms by R. Sedgewick and P. Flajolet.

As you say, premature optimisation is the root of all evil, and (if possible) profiling really should always be used when optimising code. It can even help you determine the complexity of your algorithms.

In mathematics, O(.) means an upper bound, and theta(.) means you have a bound above and below. Is the definition actually different in CS, or is it just a common abuse of notation? By the mathematical definition, sqrt(n) is both O(n) and O(n^2) so it is not always the case that there is some n after which an O(n) function is smaller.
–
Douglas ZareFeb 26 at 13:59

Seeing the answers here, I think we can conclude that most of us do indeed approximate the order of the algorithm by looking at it and use common sense instead of calculating it with, for example, the master method as we were taught at university. With that said, I must add that even the professor encouraged us (later on) to actually think about it instead of just calculating it.
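Let's apply this to the example this answer works with: a recursive factorial (a minimal sketch of such a function):

long fact(long n) {
    if (n <= 1) {
        return 1;            // base case: just return the value 1
    }
    return n * fact(n - 1);  // one multiplication, then a recursive call
}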

The first step is to try to determine the performance characteristic of the body of the function only. In this case, nothing special is done in the body, just a multiplication (or the return of the value 1).

So the performance for the body is: O(1) (constant).

Next try and determine this for the number of recursive calls. In this case we have n-1 recursive calls.

So the performance for the recursive calls is: O(n-1) (order is n, as we throw away the insignificant parts).

Then put those two together and you then have the performance for the whole recursive function:

O(1) * (n - 1) = O(n)

Peter, to answer the issues you raised: the method I describe here actually handles this quite well. But keep in mind that this is still an approximation and not a full mathematically correct answer. The method described here is also one of the methods we were taught at university, and if I remember correctly it was used for far more advanced algorithms than the factorial I used in this example.
Of course it all depends on how well you can estimate the running time of the body of the function and the number of recursive calls, but that is just as true for the other methods.

Sven, I'm not sure that your way of judging the complexity of a recursive function is going to work for more complex ones, such as doing a top to bottom search/summation/something in a binary tree. Sure, you could reason about a simple example and come up with the answer. But I figure you'd have to actually do some math for recursive ones?
–
PeteterAug 15 '08 at 5:46


+1 for the recursion... Also this one is beautiful: "...even the professor encouraged us to think..." :)
–
TT_Jan 19 '14 at 3:09

Yes, this is so good. I tend to think of it like this: the higher the term inside O(..), the more work you (or the machine) are doing. Thinking about it in relation to something concrete might be an approximation, but so are these bounds. They just tell you how the work to be done increases when the number of inputs is increased.
–
Abhinav GauniyalApr 22 at 16:54

I think about it in terms of information. Any problem consists of learning a certain number of bits.

Your basic tool is the concept of decision points and their entropy. The entropy of a decision point is the average information it will give you. For example, if a program contains a decision point with two branches, its entropy is the sum of the probability of each branch times the log2 of the inverse probability of that branch. That's how much you learn by executing that decision.

For example, an if statement having two branches, both equally likely, has an entropy of 1/2 * log(2/1) + 1/2 * log(2/1) = 1/2 * 1 + 1/2 * 1 = 1. So its entropy is 1 bit.
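As a quick sketch of that formula in code (entropyBits is a hypothetical helper, not from the original answer):

static double entropyBits(double[] branchProbabilities) {
    // Sum of p * log2(1/p) over the branches of a decision point.
    double bits = 0.0;
    for (double p : branchProbabilities) {
        if (p > 0) {
            bits += p * (Math.log(1.0 / p) / Math.log(2.0));
        }
    }
    return bits;
}

// entropyBits(new double[] { 0.5, 0.5 })              -> 1.0 bit
// entropyBits(new double[] { 1.0/1024, 1023.0/1024 }) -> about 0.011 bits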

Suppose you are searching a table of N items, like N=1024. That is a 10-bit problem because log(1024) = 10 bits. So if you can search it with IF statements that have equally likely outcomes, it should take 10 decisions.

That's what you get with binary search.

Suppose you are doing linear search. You look at the first element and ask if it's the one you want. The probabilities are 1/1024 that it is, and 1023/1024 that it isn't. The entropy of that decision is 1/1024*log(1024/1) + 1023/1024 * log(1024/1023) = 1/1024 * 10 + 1023/1024 * about 0 = about .01 bit. You've learned very little! The second decision isn't much better. That is why linear search is so slow. In fact it's exponential in the number of bits you need to learn.

Suppose you are doing indexing. Suppose the table is pre-sorted into a lot of bins, and you use some or all of the bits in the key to index directly to the table entry. If there are 1024 bins, the entropy is 1/1024 * log(1024) + 1/1024 * log(1024) + ... for all 1024 possible outcomes. This is 1/1024 * 10 times 1024 outcomes, or 10 bits of entropy for that one indexing operation. That is why indexing search is fast.

Now think about sorting. You have N items, and you have a list. For each item, you have to search for where the item goes in the list, and then add it to the list. So sorting takes roughly N times the number of steps of the underlying search.

So sorts based on binary decisions having roughly equally likely outcomes all take about O(N log N) steps. An O(N) sort algorithm is possible if it is based on indexing search.

I've found that nearly all algorithmic performance issues can be looked at in this way.

Wow. Do you have any helpful references on this? I feel this stuff is helpful for me to design/refactor/debug programs.
–
aitchnyuSep 11 '11 at 8:43


@aitchnyu: For what it's worth, I wrote a book covering that and other topics. It's long since out of print, but copies are going for a reasonable price. I've tried to get GoogleBooks to grab it, but at the moment it's a little hard to figure out who's got the copyright.
–
Mike DunlaveySep 11 '11 at 13:47

First of all, accept the principle that certain simple operations on data can be done in O(1) time, that is, in time that is independent of the size of the input. These primitive operations in C consist of arithmetic operations (e.g. + or %), logical operations (e.g. &&), comparison operations (e.g. <=), structure-accessing operations (e.g. array indexing like A[i] or pointer following with the -> operator), and simple assignment such as copying a value into a variable.

The justification for this principle requires a detailed study of the machine instructions (primitive steps) of a typical computer. Each of the described operations can be done with some small number of machine instructions; often only one or two instructions are needed.
As a consequence, several kinds of statements in C can be executed in O(1) time, that is, in some constant amount of time independent of input. These simple statements include:

Assignment statements that do not involve function calls in their expressions.

Read statements.

Write statements that do not require function calls to evaluate arguments.

The jump statements break, continue, goto, and return expression, where expression does not contain a function call.

In C, many for-loops are formed by initializing an index variable to some value and incrementing that variable by 1 each time around the loop. The for-loop ends when the index reaches some limit. For instance, consider the for-loop sketched below: it uses index variable i, increments i by 1 each time around the loop, and the iterations stop when i reaches n − 1.
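A minimal sketch of such a loop (the body is hypothetical; only the loop header matters for the counting argument):

for (i = 0; i < n - 1; i++) {
    /* some O(1) body */
}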

However, for the moment, focus on the simple form of for-loop, where the difference between the final and initial values, divided by the amount by which the index variable is incremented, tells us how many times we go around the loop. That count is exact, unless there are ways to exit the loop via a jump statement; it is an upper bound on the number of iterations in any case.

For instance, the for-loop above iterates ((n − 1) − 0)/1 = n − 1 times, since 0 is the initial value of i, n − 1 is the highest value reached by i (i.e., when i reaches n − 1, the loop stops and no iteration occurs with i = n − 1), and 1 is added to i at each iteration of the loop.

In the simplest case, where the time spent in the loop body is the same for each iteration, we can multiply the big-oh upper bound for the body by the number of times around the loop. Strictly speaking, we must then add O(1) time to initialize the loop index and O(1) time for the first comparison of the loop index with the limit, because we test one more time than we go around the loop. However, unless it is possible to execute the loop zero times, the time to initialize the loop and test the limit once is a low-order term that can be dropped by the summation rule.

Now consider this example:

(1) for (j = 0; j < n; j++)
(2)     A[i][j] = 0;

We know that line (1) takes O(1) time. Clearly, we go around the loop n times, as we can determine by subtracting the lower limit from the upper limit found on line (1) and then adding 1. Since the body, line (2), takes O(1) time, we can neglect the time to increment j and the time to compare j with n, both of which are also O(1). Thus, the running time of lines (1) and (2) is the product of n and O(1), which is O(n).

Similarly, we can bound the running time of the outer loop consisting of lines (2) through (4), shown below.
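Based on the line numbers referenced in this discussion, the fragment is presumably the full matrix-zeroing nest (a reconstruction; whatever occupies line (1) is omitted):

(2) for (i = 0; i < n; i++)
(3)     for (j = 0; j < n; j++)
(4)         A[i][j] = 0;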

We have already established that the loop of lines (3) and (4) takes O(n) time. Thus, we can neglect the O(1) time to increment i and to test whether i < n in each iteration, concluding that each iteration of the outer loop takes O(n) time.

The initialization i = 0 of the outer loop and the (n + 1)st test of the condition i < n likewise take O(1) time and can be neglected. Finally, we observe that we go around the outer loop n times, taking O(n) time for each iteration, giving a total O(n^2) running time.

If you want to estimate the order of your code empirically rather than by analyzing the code, you could stick in a series of increasing values of n and time your code. Plot your timings on a log-log scale. If the code is O(x^n), the values should fall on a line of slope n.

This has several advantages over just studying the code. For one thing, you can see whether you're in the range where the run time approaches its asymptotic order. Also, you may find that some code that you thought was order O(x) is really order O(x^2), for example, because of time spent in library calls.
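A sketch of the idea in Java (runAlgorithm is a hypothetical stand-in for the code being measured):

for (int n = 1_000; n <= 1_000_000; n *= 2) {
    long start = System.nanoTime();
    runAlgorithm(n);                                   // hypothetical code under test
    long elapsed = System.nanoTime() - start;
    // On a log-log plot, an O(n^k) running time falls on a line of slope k.
    System.out.println(Math.log(n) + "\t" + Math.log((double) elapsed));
}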

Basically, the thing that crops up 90% of the time is just analyzing loops. Do you have single, double, triple nested loops? Then you have O(n), O(n^2), O(n^3) running time.

Very rarely (unless you are writing a platform with an extensive base library, like for instance the .NET BCL or C++'s STL) will you encounter anything that is more difficult than just looking at your loops (for statements, while, goto, etc...).
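A sketch of that rule of thumb (doConstantWork() is a hypothetical O(1) body; the counts assume each loop runs up to n):

for (int i = 0; i < n; i++)                // O(n)
    for (int j = 0; j < n; j++)            // O(n^2)
        for (int k = 0; k < n; k++)        // O(n^3)
            doConstantWork();              // hypothetical O(1) body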

Less useful generally, I think, but for the sake of completeness there is also a Big Omega Ω, which defines a lower-bound on an algorithm's complexity, and a Big Theta Θ, which defines both an upper and lower bound.

Big O notation is useful because it's easy to work with and hides unnecessary complications and details (for some definition of unnecessary). One nice way of working out the complexity of divide and conquer algorithms is the tree method. Let's say you have a version of quicksort with the median procedure, so you split the array into perfectly balanced subarrays every time.

Now build a tree corresponding to all the arrays you work with. At the root you have the original array, the root has two children which are the subarrays. Repeat this until you have single element arrays at the bottom.

Since we can find the median in O(n) time and split the array in two parts in O(n) time, the work done at each node is O(k) where k is the size of the array. Each level of the tree contains (at most) the entire array so the work per level is O(n) (the sizes of the subarrays add up to n, and since we have O(k) per level we can add this up). There are only log(n) levels in the tree since each time we halve the input.

Therefore we can upper bound the amount of work by O(n*log(n)).
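Written as a recurrence, this is the same argument, with c * n covering the median-finding and splitting work for a subarray of size n:

T(n) = 2 * T(n/2) + c * n,  with T(1) = c

Unrolling it gives c * n of work at each of the log2(n) levels, so T(n) = c * n * log2(n) + c * n = O(n*log(n)).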

However, Big O hides some details which we sometimes can't ignore. Consider computing the Fibonacci sequence with

a = 0;
b = 1;
for (i = 0; i < n; i++) {
    tmp = b;
    b = a + b;
    a = tmp;
}

and let's just assume a and b are BigIntegers in Java or something that can handle arbitrarily large numbers. Most people would say this is an O(n) algorithm without flinching. The reasoning is that you have n iterations in the for loop and O(1) work inside the loop.

But Fibonacci numbers are large; the n-th Fibonacci number is exponential in n, so just storing it will take on the order of n bytes. Performing addition with big integers will take O(n) amount of work. So the total amount of work done in this procedure is 1 + 2 + ... + n = n*(n + 1)/2 = O(n^2), not O(n).

As to "how do you calculate" Big O, this is part of Computational complexity theory. For some (many) special cases you may be able to come with some simple heuristics (like multiplying loop counts for nested loops), esp. when all you want is any upper bound estimation, and you do not mind if it is too pessimistic - which I guess is probably what your question is about.

If you really want to answer your question for any algorithm, the best you can do is to apply the theory. Besides simplistic "worst case" analysis, I have found Amortized analysis very useful in practice.

It does. Think of them as two separate statements, the latter explaining the former. Reworded: because premature optimization is the root of all evil, we should forget about small efficiencies about 97% of the time.
–
Robert GrantJan 6 at 13:23

For the 1st case, the inner loop is executed n - i times, so the total number of executions is the sum, for i going from 0 to n - 1 (because lower than, not lower than or equal), of n - i. You finally get n*(n + 1)/2, so O(n²/2) = O(n²).

For the 2nd loop, i is between 0 and n included for the outer loop; the inner loop is then executed only when j is strictly greater than n, which is impossible.

Familiarity with the algorithms/data structures I use and/or quick glance analysis of iteration nesting. The difficulty is when you call a library function, possibly multiple times - you can often be unsure of whether you are calling the function unnecessarily at times or what implementation they are using. Maybe library functions should have a complexity/efficiency measure, whether that be Big O or some other metric, that is available in documentation or even IntelliSense.

In addition to using the master method (or one of its specializations), I test my algorithms experimentally. This can't prove that any particular complexity class is achieved, but it can provide reassurance that the mathematical analysis is appropriate. To help with this reassurance, I use code coverage tools in conjunction with my experiments, to ensure that I'm exercising all the cases.

As a very simple example say you wanted to do a sanity check on the speed of the .NET framework's list sort. You could write something like the following, then analyze the results in Excel to make sure they did not exceed an n*log(n) curve.
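The original example used the .NET framework's list sort; here is an analogous sketch in Java, counting the comparisons the library sort makes for growing n so the counts can be checked against an n*log(n) curve:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

public class SortScaling {
    public static void main(String[] args) {
        Random rng = new Random(42);
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            List<Integer> data = new ArrayList<>(n);
            for (int i = 0; i < n; i++) data.add(rng.nextInt());
            AtomicLong comparisons = new AtomicLong();
            Collections.sort(data, (a, b) -> {
                comparisons.incrementAndGet();       // count every comparison the sort makes
                return Integer.compare(a, b);
            });
            System.out.println(n + "\t" + comparisons.get()); // paste into a spreadsheet
        }
    }
}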

In this example I measure the number of comparisons, but it's also prudent to examine the actual time required for each sample size. However then you must be even more careful that you are just measuring the algorithm and not including artifacts from your test infrastructure.

Don't forget to also allow for space complexity, which can also be a cause for concern if one has limited memory resources. So, for example, you may hear someone wanting a constant-space algorithm, which is basically a way of saying that the amount of space taken by the algorithm doesn't grow with the input.

Sometimes the complexity can come from how many times something is called, how often a loop is executed, how often memory is allocated, and so on; that is another part of answering this question.

Lastly, big O can be used for worst case, best case, and amortized cases, where generally it is the worst case that is used to describe how bad an algorithm may be.

What often gets overlooked is the expected behavior of your algorithms. It doesn't change the Big-O of your algorithm, but it does relate to the statement "premature optimization...".

Expected behavior of your algorithm is -- very dumbed down -- how fast you can expect your algorithm to work on data you're most likely to see.

For instance, if you're searching for a value in a list, it's O(n), but if you know that most lists you see have your value up front, typical behavior of your algorithm is faster.

To really nail it down, you need to be able to describe the probability distribution of your "input space" (if you need to sort a list, how often is that list already going to be sorted? how often is it totally reversed? how often is it mostly sorted?). It's not always feasible to know that, but sometimes you do.

For code A, the outer loop will execute n + 1 times, the extra 1 being the final check that i no longer meets the condition. The inner loop runs n times, then n - 2 times, and so on. Thus, 0 + 2 + ... + (n - 2) + n = (0 + n)*(n/2 + 1)/2 = O(n²).

For code B, though the inner loop would never step in and execute foo(), its test is still reached once per outer loop iteration, i.e. n times, so the whole thing is O(n).

I don't know how to solve this programmatically, but the first thing people do is sample the algorithm for certain patterns in the number of operations done, say 4n^2 + 2n + 1. We have 2 rules:

If we have a sum of terms, the term with the largest growth rate is kept, with other terms omitted.

If we have a product of several factors, constant factors are omitted.

If we simplify f(x), where f(x) is the formula for the number of operations done (4n^2 + 2n + 1, explained above), we obtain the big-O value, O(n^2) in this case. But this would have to account for Lagrange interpolation in the program, which may be hard to implement. And what if the real big-O value was O(2^n), and we might have something like O(x^n)? So this algorithm probably wouldn't be programmable. But if someone proves me wrong, give me the code...