In an amortized analysis, the time required to perform a sequence of
data structure operations is averaged over all the operations
performed.

Amortized analysis can be used to show that the average cost of an
operation is small, if one averages over a sequence of operations,
even though a single operation within the sequence might be expensive.

Amortized analysis differs from average case analysis in that
probability is not involved; an amortized analysis guarantees the
average performance of each operation in the worst case.

I thought amortized analysis was for an algorithm. But the above
definition says it is for a sequence of (data structure) operations.
So I wonder: is the sequence of operations as a whole an algorithm,
or is each operation an algorithm?

Can you formulate the definition of amortized analysis in terms of
mathematics in order to

distinguish it from average case analysis,

emphasize that there is no probability of the input involved, and

show that it is the average performance of each operation in the worst case? (Where does it say anything about the worst case in amortized
analysis?)

2 Answers

I'm going to give an example, because I think that will clear up your questions better than yet more verbiage. Suppose we have a language that has linked lists. A linked list $L$ supports three operations, each of which takes a constant unit of time:

$empty(L)$, which yields true or false depending on whether $L$ is empty.

$L' = add(h, L)$ which creates a new list, $L'$; the first element of $L'$ is $h$ and the rest of the elements are $L$.

$(h, L') = remove(L)$, which removes the first element from a nonempty list $L$, yielding the first element, $h$, and the rest of $L$, which is $L'$.

In particular, $remove(add(x, L))$ yields $(x, L)$ for any $x$ and $L$.
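A minimal Python sketch of these three operations (the representation, `None` for the empty list and a `(head, rest)` tuple for a cell, is my own choice; the answer treats the operations abstractly):

```python
def empty(L):
    # True exactly when L is the empty list.
    return L is None

def add(h, L):
    # Cons: build a new list whose head is h and whose rest is L.
    # The old list L is unchanged, so this is O(1).
    return (h, L)

def remove(L):
    # Split a nonempty list into its head and the rest, in O(1).
    assert not empty(L), "remove() requires a nonempty list"
    h, rest = L
    return h, rest
```

With tuples, the identity $remove(add(x, L)) = (x, L)$ holds by plain equality.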

It is easy to implement a stack data structure using these three operations: you use $add$ for pushing an element onto the top of the stack, and $remove$ for popping it off again. Whenever you pop the stack, you get the element that was most recently pushed that has not yet been popped.

But what if you want to implement a queue? A queue has a push and pop operation, but when you pop the queue you should get the element that was least recently pushed and not yet popped. Or put another way, it has a front and a back, and the push operation adds an element to the back of the queue while the pop operation removes the element at the front.

One implementation is like this: You represent a queue $q$ as a pair of lists, called $q.f$ and $q.b$, and there is a $Queue(f, b)$ function which takes lists $f$ and $b$ and makes a new queue with that $f$ and $b$. The $q.f$ part contains the items at the front of the queue, and the $q.b$ part contains the items at the back, in reverse order. For example, if $q.f = [1,2,3,4]$ and $q.b = [8,7,6,5]$ then the elements of the queue $q$ are $[1,2,3,4,5,6,7,8]$. An empty queue has both $f$ and $b$ empty, so:

qempty(q) = empty(q.f) AND empty(q.b)

This takes at most two units of time, so it runs in constant time.
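In Python this representation might look like the following (a sketch; the `Queue` record and the linked-list encoding are my own, as above):

```python
from collections import namedtuple

# A queue is a pair of linked lists: f holds the front of the queue,
# b holds the back in reverse order. Lists use None for empty and
# (head, rest) tuples for cells.
Queue = namedtuple("Queue", ["f", "b"])

def empty(L):
    return L is None

def qempty(q):
    # Two constant-time emptiness checks, so qempty is O(1).
    return empty(q.f) and empty(q.b)
```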

It's easy to implement the $qpush$ operation: it just inserts a new item into $q.b$. Since $q.b$ is backwards, the first element of $q.b$ is the end of the queue, and the queue push operation can just put the new element onto the beginning of list $q.b$ with the list $add$ operation:

qpush(h, q) = Queue(q.f, add(h, q.b))

This also takes a constant amount of time: one unit for the $add$, and maybe a unit to construct the new queue object.
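In the same Python sketch, `qpush` is a one-liner:

```python
from collections import namedtuple

Queue = namedtuple("Queue", ["f", "b"])

def add(h, L):
    # O(1) list cons.
    return (h, L)

def qpush(h, q):
    # One add plus one Queue construction: constant time.
    return Queue(q.f, add(h, q.b))
```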

Popping the queue is easy too: since $f$ is the front of the queue, the first element of $f$ is the one we want to remove, and we can remove it with $remove$:

qpop(q) =
    let (h, f') = remove(q.f)
    return (h, Queue(f', q.b))

Wait, no, that's wrong! Before we can call $remove(q.f)$, we have to make sure that $q.f$ is nonempty! Otherwise the $remove(q.f)$ call is erroneous.

What if $q.f$ is empty, though? Then we have a queue that looks like $([], [8,7,6,5])$, where the $[8,7,6,5]$ part is backwards—the next element to pop off is the 5. So what we need to do, if we're asked to pop a queue with an empty $f$ part, is to reverse the back part and put it on the front. In this case we would turn $([], [8,7,6,5])$ into $([5,6,7,8], [])$ and then pop the 5 off, leaving $([6,7,8], [])$. So the correct code for $qpop$ looks like this:
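The code block appears to have been lost here; the following Python sketch is a plausible reconstruction from the description above (if $q.f$ is empty, reverse $q.b$ onto the front first, then remove):

```python
from collections import namedtuple

Queue = namedtuple("Queue", ["f", "b"])

def empty(L):
    return L is None

def remove(L):
    h, rest = L
    return h, rest

def reverse(L):
    # O(n) reversal; see the discussion of reverse below.
    R = None
    while not empty(L):
        h, L = remove(L)
        R = (h, R)
    return R

def qpop(q):
    f, b = q.f, q.b
    if empty(f):
        # Front is exhausted: move the (reversed) back to the front.
        f, b = reverse(b), None
    h, f2 = remove(f)
    return h, Queue(f2, b)
```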

How long does this take to run? The $remove$ call runs in constant time, and so does the $return$. The $empty$ test is constant time, as are the assignments. But what about $reverse$? Unfortunately, $reverse(q.b)$ takes time proportional to $q.b$; there is no way to reverse a linked list in constant time. The code looks like this:
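The reverse code is also missing from the answer as reproduced here; a straightforward version using only the three list operations might be:

```python
def empty(L):
    return L is None

def add(h, L):
    return (h, L)

def remove(L):
    h, rest = L
    return h, rest

def reverse(L):
    # Pop each element off L and push it onto R; after the loop, R
    # holds the elements of L in reverse order. Each iteration removes
    # one element, so reversing an n-element list takes n iterations.
    R = None
    while not empty(L):
        h, L = remove(L)
        R = add(h, R)
    return R
```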

and the loop removes one element from $L$ on each iteration, and so must iterate $n$ times to reverse an $n$-element list.

So sadly, $qpop$ does not run in $O(1)$ (constant) time. Most of the time it does, but sometimes you hit that if-block and it has to reverse $b$ and that could take a long time, depending on how long $b$ is.

But you can make a claim that's almost as good: Each element that you push on the queue is moved from $b$ to $f$ at most once. So if you push $n$ items into a queue and then pop them off, in any order, the total time spent reversing $b$ cannot be more than a constant amount of time per item.

That is what amortized time is about. Although any particular call to $qpop$ might take a long time, there is a limit to how long any sequence of calls to $qpop$ can take: if you pop $n$ items from a queue, most of those pops will be very fast indeed, and the others will be few enough or fast enough that the total time for the $n$ pops will be something like $Cn$ for some constant $C$, just as if each one took $C$ time. And this is true for any sequence of queue operations involving pops: $n$ pops cannot take more than $Cn$ time.
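The claim can be checked concretely. The following instrumented version (my own sketch, not from the answer) counts every list cell touched by a reversal over a full push-then-pop sequence:

```python
from collections import namedtuple

Queue = namedtuple("Queue", ["f", "b"])
reversal_steps = 0  # total work done inside reverse()

def reverse(L):
    global reversal_steps
    R = None
    while L is not None:
        h, L = L
        R = (h, R)
        reversal_steps += 1  # one unit of work per cell moved
    return R

def qpush(h, q):
    return Queue(q.f, (h, q.b))

def qpop(q):
    f, b = q.f, q.b
    if f is None:
        f, b = reverse(b), None
    h, f2 = f
    return h, Queue(f2, b)

# Push n items, then pop them all. Each item crosses from b to f at
# most once, so total reversal work is at most n, even though a single
# qpop (the first one here) pays for all of it at once.
n = 1000
q = Queue(None, None)
for i in range(n):
    q = qpush(i, q)
for _ in range(n):
    _, q = qpop(q)
assert reversal_steps <= n
```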

This is different from saying that the average running time of $qpop$ is $O(1)$, which is not actually true; the average running time of $qpop$ depends on the size of $q.b$, and is $O(size(q.b))$. But the cost of $qpop$ amortized over the lifetime of a queue, and over any sequence of queue operations, even the worst possible sequence, is $O(1)$.

That came out a lot longer than I expected, but I think the example is a fundamental one, and I hope it makes the point clearly.

+1! Thanks! My question is similar to the one for Jeremy. Usually average-case and worst-case analysis are defined for an algorithm and its varying input. Here they are defined for a data structure, its operations, and its states. I would like to find the link between these two formulations.
– Tim, Nov 13 '12 at 13:37

@Tim Algorithms and data structures are two sides of the same coin. When we want an algorithm to solve some problem, it is always an algorithm that represents the problem as a certain data structure and then manipulates that data structure. And an operation that manipulates a data structure is itself an algorithm. By choosing a good data structure for a problem, we enable the use of an efficient algorithm for solving the problem; choose the data structure badly, and the algorithms will be slow.
– MJD, Nov 13 '12 at 14:03

This is a fairly terse mathematical definition of worst-case, average-case, and amortized analysis. Please see @MJD's answer for a prose example and explanation.

Let $S$ be the state of a data structure (such as a list or tree), and let $\Omega$ be a set of operations that can be applied to $S$. We assume that the data structure starts in an initial state $S_0$ (such as the empty list).

Let $T(\omega(S))$ be the time required to complete an operation $\omega\in\Omega$, given the current state $S$.

Worst-case takes the maximum over all possible operations and all possible states:
$$T_{\text{worst}} = \max_{S}\,\max_{\omega\in\Omega}\; T(\omega(S)).$$

Average-case (Version 1) takes the expectation over all operations, given the worst-case state:
$$T_{\text{avg}} = \max_{S}\; \mathbb{E}_{\omega}\bigl[T(\omega(S))\bigr].$$
Average-case (Version 2) takes the expected average cost of a random sequence of operations $\omega_1,\dots,\omega_n$, starting from the initial state $S_0$ (with $S_i = \omega_i(S_{i-1})$):
$$T_{\text{avg}}' = \mathbb{E}\left[\frac{1}{n}\sum_{i=1}^{n} T(\omega_i(S_{i-1}))\right].$$

Amortized takes the maximum average over all sequences of operations, starting from the initial state $S_0$:
$$T_{\text{amortized}} = \max_{\omega_1,\dots,\omega_n}\; \frac{1}{n}\sum_{i=1}^{n} T(\omega_i(S_{i-1})).$$
Note that no probability distribution appears in the amortized definition: the maximum is taken over all sequences of operations, which is what makes it a worst-case guarantee.

+1! Thanks! (1) in all the definitions, how can they be interpreted in terms of "algorithm" and its input with varying size? In other words, is an algorithm and its input related to an operation and the state of the data structure somehow? (2) How are the probability measures defined: the probability measure on the operations in average case (version 1), and the joint probability measure on the set of sequences of operations in average case (version 2) and amortized case? (3) I am very happy to see the mathematical formulations. I wonder if there are references providing similar treatment?
– Tim, Nov 13 '12 at 13:21

add to (1), usually average case and worst case analysis are defined for an algorithm and its varying input. Here they are defined for a data structure, its operations and its states. I would like to find the link between these two formulations.
– Tim, Nov 13 '12 at 13:28

add to (2) are the probability measures assumed to be uniform distributions if without being mentioned explicitly?
– Tim, Nov 13 '12 at 13:48

add to (1) again. Does the set of operations $\Omega$ consist of the operations that perform the same task but with different implementations (hence different running time), not the operations with different tasks (such as adding and deleting an item are different tasks)?
– Tim, Nov 13 '12 at 14:25

add to (1) another time. Does "a sequence of operations" in average case (version 2) and the amortized case need to satisfy that each operation provides its output as the input to the next operation? Or are the operations in the same sequence unrelated?
– Tim, Nov 13 '12 at 14:36