Algorithm analysis
Analysis is the process of estimating the number of computational steps performed by a program, usually as a function of the input size. It is useful for comparing different approaches, can be done before coding, and does not require a computer (paper and pencil suffice). An order-of-magnitude (rough) estimate is often enough. We will introduce a notation, the O notation, to convey such estimates.

Analysis of selection sorting
Consider the program to find the index of the minimum number in an array:

min = 0;
for (j = 1; j < n; ++j)
    if (A[j] < A[min])
        min = j;

The number of comparisons performed is n – 1: the loop starts with j = 1 and ends with j = n, so the number of iterations is n – 1, and each iteration performs one comparison.

Selection sorting – analysis
The inner loop performs:
n – 1 comparisons during the first iteration of the outer loop,
n – 2 comparisons during the second iteration of the outer loop,
...
1 comparison during the last iteration of the outer loop.
Total number of comparisons = 1 + 2 + … + (n – 1) = n(n – 1)/2 (in the best as well as the worst case).

O (order) notation
Definition: Let f(n) and g(n) be two functions defined on the set of integers. If there is a constant c > 0 such that f(n) <= c·g(n) for all large enough n, then we say f(n) = O(g(n)).
Example: n^2 + 2n – 15 is O(n^2).
Rule: When the expression involves a sum, keep only the term with the highest power and drop the rest. You can also drop constant multiplying factors.
(3n^2 + 2n + 1)(4n – 5) is O(n^3).

How to Measure Algorithm Performance
What metric should be used to judge algorithms?
– Length of the program (lines of code), since personnel cost is related to this
– Ease of programming (bugs, maintenance)
– Memory required
– Running time
Running time is the dominant standard:
– Quantifiable and easy to compare
– Often the critical bottleneck
– Particularly important when real-time response is expected

Average, Best, and Worst Case
On which input instances should the algorithm's performance be judged?
Average case:
– Real-world input distributions are difficult to predict
Best case:
– Unrealistic; rarely occurs in practice
Worst case (most commonly used):
– Gives an absolute guarantee
– Easier to analyze

Simplifying the Bound
T(n) = c_k n^k + c_{k-1} n^{k-1} + c_{k-2} n^{k-2} + … + c_1 n + c_0
– Too complicated: too many terms
– Difficult to compare two expressions, each with 10 or 20 terms
Do we really need all the terms? For approximation, we can drop all but the biggest term. When n is large, the first term (the one with the highest power) is dominant.

Simplifications
Keep just one term: the fastest-growing term, which dominates the runtime. No constant coefficients are kept, since constant coefficients are affected by machines, languages, etc. The order of magnitude (as n gets large) is captured well by the leading term.
Example: T(n) = 10n^3 + n^2 + 40n + 800.
If n = 1,000, then T(n) = 10,001,040,800; the error is about 0.01% if we drop all but the n^3 term.


Problem size vs. time taken
Assume the computer does 1 billion operations per second.

Basic rules and examples about the magnitude and growth of functions
Constant = O(1) refers to functions f for which there is a constant c such that f(n) < c for all n. Ex: accessing an array element A[j] given j.
log n grows much slower than n: log2 n < 40 even when n is a trillion. Ex: binary search on an array of size n takes O(log n) time. Query systems are usually expected to answer queries about, or update, a database of size n in O(log n) time.

Magnitudes and growth functions
O(n) may be acceptable for off-line processing but not for on-line (real-time) processing. When the data is unstructured (not preprocessed), it usually takes O(n) time to give any non-trivial answer. Ex: the maximum of a given collection of keys, or a key in the top 10%. Algorithms whose time complexity is O(2^n) are totally impractical.