Algorithms/Mathematical Background

Before we begin learning algorithmic techniques, we take a detour to give ourselves some necessary mathematical tools. First, we cover mathematical definitions of terms that are used later on in the book. By expanding your mathematical vocabulary you can be more precise and you can state or formulate problems more simply. Following that, we cover techniques for analysing the running time of an algorithm. After each major algorithm covered in this book we give an analysis of its running time as well as a proof of its correctness.

In addition to correctness, another important characteristic of a useful algorithm is its time and memory consumption. Time and memory are both valuable resources, and there are important differences (even when both are abundant) in how we can use them.

How can you measure resource consumption? One way is to create a function that describes the usage in terms of some characteristic of the input. One commonly used characteristic of an input dataset is its size. For example, suppose an algorithm takes an input as an array of n integers. We can describe the time this algorithm takes as a function f written in terms of n. For example, we might write:

f(n) = n^2 + 3n + 14

where the value of f(n) is some unit of time (in this discussion the main focus will be on time, but we could do the same for memory consumption). Rarely are the units of time actually in seconds, because that would depend on the machine itself, the system it's running, and its load. Instead, the units of time typically used are in terms of the number of some fundamental operation performed. For example, some fundamental operations we might care about are: the number of additions or multiplications needed; the number of element comparisons; the number of memory-location swaps performed; or the raw number of machine instructions executed. In general we might just refer to these fundamental operations performed as steps taken.
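To make "steps taken" concrete, here is a minimal sketch (the function and counter names are illustrative, not from the text) that counts element comparisons in a linear search:

```python
def linear_search(arr, target):
    """Return (index of target in arr, comparisons performed).

    The comparison count is our "fundamental operation" here.
    """
    comparisons = 0
    for i, value in enumerate(arr):
        comparisons += 1          # one fundamental operation: a comparison
        if value == target:
            return i, comparisons
    return -1, comparisons

# Searching an n-element array performs at most n comparisons.
index, steps = linear_search([4, 8, 15, 16, 23, 42], 23)
```

Here f(n) would be at most n for this algorithm, regardless of how fast each individual comparison runs on a given machine.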

Is this a good approach to determine an algorithm's resource consumption? Yes and no. When two different algorithms are similar in time consumption, a precise function might help to determine which algorithm is faster under given conditions. But in many cases it is either difficult or impossible to calculate an analytical description of the exact number of operations needed, especially when the algorithm performs operations conditionally on the values of its input. Instead, what really matters is not the precise time required to complete the function, but rather the degree to which resource consumption changes depending on its inputs. Concretely, consider these two functions, representing the computation time required for each size of input dataset:

f(n) = n^3 - 12n^2 + 20n + 110

g(n) = n^3 + n^2 + 5n + 5

They look quite different, but how do they behave? Let's look at a few plots of the functions (f(n) is in red, g(n) in blue):

Plot of f and g, in range 0 to 5

Plot of f and g, in range 0 to 15

Plot of f and g, in range 0 to 100

Plot of f and g, in range 0 to 1000

In the first, very limited plot the curves appear quite different. In the second plot they begin to behave similarly, in the third there is only a very small difference between them, and in the last they are virtually identical. In fact, both approach n^3, their dominant term. As n gets larger, the other terms become much less significant in comparison to n^3.
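This convergence is easy to check numerically. A quick sketch (the function definitions simply transcribe the two formulas above):

```python
def f(n):
    return n**3 - 12*n**2 + 20*n + 110

def g(n):
    return n**3 + n**2 + 5*n + 5

# Dividing by the dominant term n^3 shows both ratios approach 1
# as n grows: the lower-order terms fade into insignificance.
for n in (10, 100, 1000, 10000):
    print(n, round(f(n) / n**3, 4), round(g(n) / n**3, 4))
```

By n = 10000, both ratios are within a fraction of a percent of 1.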

As you can see, modifying a polynomial-time algorithm's low-order terms doesn't help much. What really matters is the highest-order term. This is why we've adopted a notation for this kind of analysis. We say that:

f(n) = O(n^3)

This gives us a way to more easily compare algorithms with each other. Running an insertion sort on n elements takes steps on the order of O(n^2). Merge sort sorts in O(n log n) steps. Therefore, once the input dataset is large enough, merge sort is faster than insertion sort.
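The crossover is easy to see numerically; a quick sketch comparing the two growth rates (constant factors are ignored, as the notation allows):

```python
import math

# n^2 (insertion sort's order) versus n*log2(n) (merge sort's order):
# the gap widens rapidly as n grows.
for n in (8, 64, 1024, 65536):
    print(n, n**2, round(n * math.log2(n)))
```

At n = 1024 the n^2 count is already more than a hundred times larger, which is why the asymptotic comparison is what matters for large inputs.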

That is, f(n) = O(g(n)) holds if and only if there exist constants c and n0 such that for all n > n0, f(n) is positive and less than or equal to c·g(n).
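The definition can be checked numerically for a particular choice of witnesses. A sketch for the f from earlier against g(n) = n^3 (the values c = 2 and n0 = 10 are one valid choice, not the only one, and the finite range stands in for "for all n > n0"):

```python
def f(n):
    return n**3 - 12*n**2 + 20*n + 110

def g(n):
    return n**3

# Witnesses for f(n) = O(g(n)): with c = 2 and n0 = 10,
# f(n) is positive and bounded by c*g(n) for every n > n0 checked.
c, n0 = 2, 10
holds = all(0 < f(n) <= c * g(n) for n in range(n0 + 1, 5000))
```

Note that the check only samples a finite range; the actual definition quantifies over all n > n0, which here follows because n^3 dominates the remaining terms.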

Note that the equal sign used in this notation describes a relationship between f(n) and g(n) instead of reflecting a true equality. In light of this, some define Big-O in terms of a set, stating that:

f(n) ∈ O(g(n))

when f(n) belongs to the set of all functions that, beyond some point n0, are bounded above by c·g(n) for some constant c.

Note that a function f is in o(g(n)) when, for every positive constant c, c·g(n) eventually gets larger than f(n), while for O(g(n)) there only has to exist a single constant c for which c·g(n) eventually gets at least as big as f(n).

[TODO: define what T(n,m) = O(f(n,m)) means. That is, when the running time of an algorithm has two dependent variables. Ex, a graph with n nodes and m edges. It's important to get the quantifiers correct!]

Merge sort of n elements: T(n) = 2T(n/2) + c·n. This describes one iteration of the merge sort: the problem space n is reduced to two halves (2T(n/2)), and the results are then merged back together at the end of all the recursive calls (c·n, the linear cost of merging). This notation system is the bread and butter of algorithm analysis, so get used to it.

There are some theorems you can use to estimate the big-O running time of a function if its recurrence relation fits a certain pattern.

T(n) = a·T(n/b) + O(n^k)

for a ≥ 1, b > 1 and k ≥ 0. Here, a is the number of recursive calls made per call to the function, n is the input size, b is the factor by which the input size shrinks in each recursive call, and k is the polynomial order of an operation that occurs each time the function is called (except for the base cases). For example, in the merge sort algorithm covered later, we have

T(n) = 2T(n/2) + O(n)

because two subproblems are called for each non-base case iteration, and the size of the array is divided in half each time. The O(n) at the end is the "conquer" part of this divide and conquer algorithm: it takes linear time to merge the results from the two recursive calls into the final result.

Thinking of the recursive calls of T as forming a tree, there are three possible cases to determine where most of the algorithm is spending its time ("most" in this sense is concerned with its asymptotic behaviour):

the tree can be top heavy, and most time is spent during the initial calls near the root;

the tree can have a steady state, where time is spread evenly; or

the tree can be bottom heavy, and most time is spent in the calls near the leaves.

Depending upon which of these three states the tree is in, T will have different complexities:

if a < b^k, the tree is top heavy and T(n) = O(n^k);

if a = b^k, the tree is in a steady state and T(n) = O(n^k log n); or

if a > b^k, the tree is bottom heavy and T(n) = O(n^(log_b a)).
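The three cases can be captured in a small helper (an illustrative sketch of the simplified master theorem, comparing a against b^k; the function name is made up):

```python
import math

def master_theorem(a, b, k):
    """Return the asymptotic class of T(n) = a*T(n/b) + O(n^k),
    for a >= 1, b > 1, k >= 0 (simplified master theorem)."""
    if a < b**k:
        # top heavy: the work at the root dominates
        return f"O(n^{k})"
    if a == b**k:
        # steady state: work is spread evenly across levels
        return f"O(n^{k} log n)"
    # bottom heavy: the many small calls near the leaves dominate
    return f"O(n^{math.log(a, b):.2f})"

# Merge sort: a = 2 subproblems, input halved (b = 2), linear merge (k = 1).
# 2 == 2^1, so merge sort lands in the steady-state case: O(n log n).
print(master_theorem(2, 2, 1))
```

Plugging in merge sort's recurrence T(n) = 2T(n/2) + O(n) hits the steady-state case, recovering the O(n log n) bound.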

[Start with an adjacency list representation of a graph and show two nested for loops: one for each node n, and nested inside that one loop for each edge e. If there are n nodes and m edges, this could lead you to say the loop takes O(nm) time. However, only once could the inner loop take that long, and a tighter bound is O(n+m).]
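A sketch of the loop described in the note above (the graph is a made-up example; the step counter is illustrative):

```python
# A small directed graph as an adjacency list: node -> list of neighbours.
graph = {0: [1, 2], 1: [2], 2: [0, 3], 3: []}

steps = 0
for node in graph:                  # executes once per node: n iterations total
    steps += 1
    for neighbour in graph[node]:   # executes deg(node) times: m iterations
        steps += 1                  # in total across the whole outer loop

n = len(graph)
m = sum(len(neighbours) for neighbours in graph.values())
# Each edge is visited exactly once over the entire run,
# so the total work is n + m, not n * m.
```

Summing the inner loop's cost per node (its degree) rather than taking the worst case per iteration is what yields the tighter O(n+m) bound.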