Preface and Notation

This document contains the text of a first course in algorithms and data structures.

As html is not suited for mathematical formulas, some additional notation is used (borrowed from the typesetting package LaTeX). a_i denotes a with subscript i. a^i denotes a to the power i. <= and >= are used as in most computer languages. Curly brackets, "{" and "}", are used to group things. sum_{i = 0}^j denotes the sum for i running from 0 to j. The same may also be written sum_{0 <= i <= j}. ~= stands for "approximately equal" and ~ for "proportional to". Greek letters are written out. For logical expressions we either use the notation from C or the operators are written out in text. So, "a && !b" is the same as "a and not b". sqrt stands for the square root function, and log without a specified base stands for the logarithm to base 2. There are a few more notations, but they should be understood easily. New notions are printed in bold where they are defined. Inside the chapters, particularly important ideas and short key notes are highlighted without introductory text. There are many pictures. These are intended to be self-explanatory. Typically they are placed just after the text fragment to which they belong; generally there is no direct reference to them in the text.

There are very few references to the literature. Clearly most of the presented material is not new. A large part is even common knowledge. Next to several other lecture notes, the following book has been used as a source:

Several students have contributed by pointing out errors and spots which required better explanation.

Notice: the following text may be overcomplete. At the examination it is expected that the students only know all that has been presented during the lectures.

Introduction

Covered Material

In this lecture elementary algorithms and data structures are presented and analyzed. An algorithm is a systematic, step-by-step description of how to solve a problem. Of course we will consider algorithms to solve problems from the domain of computer science, but the word algorithm might in a wider sense also refer to a step-by-step description of how to solve problems from other domains (cooking, repairing bicycles, ...). A data structure is a specific way to organize data so that certain operations can be executed efficiently.

In the course of the lecture we will encounter the following data structures:

We will also study alternative techniques for searching and maintaining subsets: From the algorithms domain we will consider algorithms for solving the following problems: But on the way we will also solve many smaller problems in an algorithmic way.

Algorithms can be formulated in many ways. Consider the problem of finding the maximum from a set of n numbers which is stored in an array a[]. The algorithm can be formulated as follows:

Start by initializing a variable, which we will call M hereafter, with the value of a[0]. Then check the numbers a[i] starting from a[1] and going all the way to a[n - 1] in this order. Each of these numbers is compared with the current value of M, and if a[i] happens to be larger than M, then a[i] is assigned to M.
This is definitely an algorithm. All the important details are mentioned. This is about how a (good) cooking recipe might be given or how one may be instructed to repair a punctured tire.

However, in computer science texts, it is common to use a slightly more formalized way of expression, using pseudo-code. Pseudo-code intermixes fragments that resemble a programming language with fragments in natural language. This is convenient, because it allows loops and tests to be expressed more clearly. On the other hand, there is no need to declare variables. So, we might also write:

  function maximum(array a[] of length n)
    begin 
      if (n equal to 0) then 
        return some special value or generate an exception
      else
        M = a[0];
        for (all i from 1 to n - 1) do
          if (a[i] > M) then
            M = a[i];
        return M;
      fi;
    end.
In such a more formal notation one might also more easily detect that we should also handle the case of an empty set correctly. In this simple case, we could just as well have written C or Java code, but in general this may distract from the essentials of the algorithm.
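For comparison, the pseudo-code above might be rendered in C roughly as follows. This is only a sketch: the choice of INT_MIN as the "special value" for an empty array is an assumption made here, since the pseudo-code deliberately leaves that decision open.

```c
#include <limits.h>
#include <stddef.h>

/* Return the largest of the n numbers stored in a[].
   Assumption of this sketch: for an empty array the special value
   INT_MIN is returned; the pseudo-code leaves this choice open. */
int maximum(const int *a, size_t n) {
    if (n == 0)
        return INT_MIN;
    int M = a[0];                      /* current candidate for the maximum */
    for (size_t i = 1; i < n; i++)     /* all i from 1 to n - 1 */
        if (a[i] > M)
            M = a[i];
    return M;
}
```

Compared with the pseudo-code, the C version has to fix the element type, the type of the index, and the behavior for n == 0, which illustrates the kind of detail that can distract from the algorithm itself.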

Asymptotical Notation

For estimating and comparing the time consumption of programs, we need some notions to estimate the time consumption of algorithms in a computer independent way. We want to make statements like "algorithm A is better than algorithm B". By this we mean: "for sufficiently large inputs a program that implements algorithm A will run faster than a program that implements algorithm B". The following notation is standard and widely used among all computer scientists:

Actually all these notions should be written with the "element-of symbol", so one should say "T(n) = 6 * n^2 + 2 * n^3 + 345 is an element of O(n^3)", but it is very common to use the equality symbol. Nevertheless one should not be fooled by this: T(n) = O(f(n)) does not imply that O(f(n)) = T(n) (this is not even defined) or that f(n) = O(T(n)) (which might be true, but which does not need to be true).

By far the most common of these symbols is O(). This symbol gives us an upper bound on the rate of growth of a function, allowing us to make an overestimate. In the chapter on union-find we will give an analysis of the time consumption and we will conclude that certain operations can be performed very efficiently, stating T(n) = O(f(n)) for some function f() which is almost constant. However, the actual result is even sharper, even in an asymptotic sense. Most other algorithms and data structures we study in this lecture are so simple that the time consumption of operations can be specified exactly. In that case we might even write T(n) = Theta(f(n)), but because it is typically the upper bound which interests us most, we will even in these cases mostly use O().

It is easy to show, just arguing formally, that if T_1(n) = O(f_1(n)) and T_2(n) = O(f_2(n)), that then T_1(n) + T_2(n) = O(max{f_1(n), f_2(n)}). As an example we will prove this. Other relations can be proven similarly. We know T_1(n) < c_1 * f_1(n) for all n > n_1, and T_2(n) < c_2 * f_2(n) for all n > n_2. This implies T_1(n) + T_2(n) < (c_1 + c_2) * max{f_1(n), f_2(n)}, for all n > max{n_1, n_2}.

Common terminology:

It happens, even in the literature, that people mix up Omega, Theta and O. You will see that O is used where clearly one of the other two is meant. The most common usage of the above notation is that one tries to express the time complexity of an algorithm concisely as O(f(n)) for some suitable and simple function f(). For an expression f(n) + g(n), f(n) is said to be the leading term and g(n) a lower-order term when g(n) = o(f(n)). Typically the time complexity involves a number of contributions. In that case only the leading term should be retained, while the lower-order terms are dropped. The leading term should be given without constants. For example, 23 * n * log n + 2 * n^2 + 100 * n is normally written as O(n^2). In some contexts it may make sense to specify the leading constant, the constant with which the leading term is multiplied. Then this same result can be given as 2 * n^2 + o(n^2), as (2 + o(1)) * n^2, or as 2 * n^2 + O(n * log n).

In order to determine the leading term, it is handy to know a few rules which all can be proven by using the definition above:

  1. c * f(n) = Theta(f(n)), for any constant c > 0.
  2. n^a = o(n^b), for any a < b.
  3. a^n = o(b^n), for any 0 < a < b.
  4. log^c(n) = o(n), for any constant c.
  5. n^c = o(a^n), for any constant c and any a > 1.
The first is trivial (use alpha = c and alpha = 1 / c, respectively). The second follows because n^a / n^b = 1 / n^{b - a}. If b > a, b - a > 0. In that case lim_{n -> infinity} 1 / n^{b - a} = 0, because lim_{n -> infinity} 1 / n^c = 0, for any constant c > 0. The third can be proven analogously and is left as an exercise.

The last two relations can be proven conveniently using l'Hopital's rule. This rule states that for differentiable functions f() and g(), with lim_{n -> infinity} f(n) = infinity and lim_{n -> infinity} g(n) = infinity, lim_{n -> infinity} f(n) / g(n) = lim_{n -> infinity} f'(n) / g'(n), provided the limit on the right exists. We now prove the fourth relation for constants c > 0 (for c <= 0 it is immediate, because then log^c(n) <= 1 for all n >= 2), leaving the last as an exercise. lim_{n -> infinity} log^c(n) / n = 0 <=> (lim_{n -> infinity} log^c(n) / n)^{1/c} = 0 <=> lim_{n -> infinity} log(n) / n^{1/c} = 0. The last transition is not trivial but may be found in textbooks on analysis. To prove this last limit we use l'Hopital's rule. First we notice that both f(n) = log(n) and g(n) = n^{1/c} are differentiable for n > 0 and go to infinity with n. Ignoring the constant factor 1 / log_e 2 that comes from taking the logarithm to base 2, f'(n) = 1 / n, and g'(n) = 1/c * n^{1/c} / n. So, f'(n) / g'(n) = c / n^{1/c}. For any constant c > 0, lim_{n -> infinity} f'(n) / g'(n) = 0, and therefore we conclude lim_{n -> infinity} f(n) / g(n) = 0, as was to be shown.

Computer and Cost Model

It is common to work under the so-called RAM cost model, also called the "von Neumann" model, although other assumptions are made as well. This means that we assume that all basic instructions take equally long, and that we take the total number of basic instructions as our cost measure. So, time, measured in terms of basic instructions, and expressed in O() notation, is our main concern.

One should be aware that this model is only a coarse approximation of the reality that holds for modern computer systems. Of course, at least in theory, this is dealt with by the O() assumption: all operations are constant time, but the constants may be quite considerable. The most important aspect is the non-uniform cost of memory access. The following picture gives a very high-level view of a modern computer system.

Sketch of a memory hierarchy

There are a few registers, there is 16-64 KB of first-level cache, there is 256-1024 KB of second-level cache, there is 64-16384 MB of main memory and a hard disk with storage for 20-200 GB. Each higher level of memory has higher access costs. The registers and the first-level cache may be assumed to be accessible in 1 clock cycle. The second-level cache costs several clock cycles to access. Upon a cache miss (that is, when the required data are not available in the cache) the data are fetched from the main memory, which currently costs on the order of 200 clock cycles. This cost is partially amortized by look-ahead and loading a cache line consisting of 64 bytes, but will nevertheless slow down the computation noticeably. Much worse is a page-fault (that is, when the required data are not available in the main memory). In that case, the data are fetched from the hard disk. This is a terribly expensive operation, costing about 10 ms. Again it is attempted to amortize these costs by delivering a large page of 10 to 100 KB, but this only helps if one is indeed using the data on the page. Random access to the secondary memory (a more general name for memory like hard disks) is devastatingly slow.

Other factors may, however, be equally important. The main other factor is space. There are examples of problems that can be solved faster if we are willing to use more space. So, in that case we find a space-time trade-off. A trivial (but extreme) example is to create a dictionary for all words of at most five letters with 27^5 storage. After initialization, we can perform insertions, deletions and look-ups in constant time. Actually, applying the idea of virtual initialization (treated in one of the later chapters), the prohibitively expensive initialization does not need to be performed.

A further point that should be stressed here, is that when evaluating the performance of an algorithm, we consider the time the algorithm might take for the worst of all inputs. That is, we perform a worst-case analysis. Thus, if we state "this sorting algorithm runs in O(n * log n)", we mean that the running time is bounded by O(n * log n) for all inputs of size n. If on the other hand we say "this sorting algorithm has quadratic running time", we do not mean that it takes c * n^2 time for all inputs of size n, but that there are inputs (maybe not even for all values of n) for which the algorithm takes at least c * n^2 time for some constant c. This type of analysis is by far the most common in the study of algorithms, but occasionally one gets the feeling that this does not reflect the "real" behavior in practice. Maybe there are only some rare very artificial instances for which the algorithm is slow. In that case one may also perform some kind of average case analysis. The problem is to define what the average case is. For example, when sorting n numbers, can one assume that all n! arrangements are equally frequent? In the real practice there may be a tendency to have some kind of clustering. The strong point of a worst-case analysis is that it leaves no room for such discussions. Furthermore, one can easily find many contexts, where one wants guaranteed performance.

Mathematical Background

Studying algorithms and data structures also means analyzing algorithms and data structures mathematically. You may already have noticed it, otherwise you will notice it here: there is more mathematics in computer science than you may have imagined or maybe even than you like. Mathematics from several branches is needed. The following topics are of central importance in algorithmics:

The mathematics in this lecture will mostly be of the elementary type, but we will also encounter a little bit of probability theory and we will deal with graphs, but not in a very theoretical way.

Useful Expressions

For reference purposes we list some of the most useful relations, inequalities and limits.

log_a n = log_b n / log_b a, for all positive n and all positive a, b != 1.
This relation is mostly used with b = 2, showing that for any constant a > 1, log_a n = Theta(log n): logarithms to different bases differ only by a constant factor.

(1 + x)^a ~= 1 + a * x,
1 / (1 + x) ~= 1 - x,
log_e (1 + x) ~= x,
e^x ~= 1 + x,
sqrt(1 + x) ~= 1 + x / 2.
These are so-called first-order Taylor approximations, which are accurate only for x close to zero (positive or negative). All of the above approximate relations can be made more accurate by using the second-order Taylor approximations:
(1 + x)^a ~= 1 + a * x + a * (a - 1) * x^2 / 2,
1 / (1 + x) ~= 1 - x + x^2,
log_e (1 + x) ~= x - x^2 / 2,
e^x ~= 1 + x + x^2 / 2,
sqrt(1 + x) ~= 1 + x / 2 - x^2 / 8.
The size of the second-order term gives insight on how accurate the first-order estimate is. These first- and second-order approximations are particularly useful for obtaining inequalities. For example, 1 + x / 2 - x^2 / 8 <= sqrt(1 + x) <= 1 + x / 2, for all x >= 0.

(1 - 1 / n)^n < 1 / e < (1 - 1 / (n + 1))^n, for all n >= 1.
In particular this shows that (1 - 1 / n)^n converges to 1 / e for large n. In most applications the expression to estimate is slightly less beautiful. It is common to have a relation like (1 - a / n)^{b * n}. Substituting m = n / a, gives (1 - a / n)^{b * n} = (1 - 1 / m)^{a * b * m} = ((1 - 1 / m)^m)^{a * b} < 1 / e^{a * b}. The notation suggests that a and b are constants, but this is not essential.

n! ~= sqrt(2 * pi * n) * (n / e)^n.
The above formula, which is known as Stirling's Formula, is quite exact. As a consequence, n! = Theta(sqrt(n) * (n / e)^n). Essentially n! grows as (n / e)^n. Another consequence is that log_2(n!) ~= log_2((n / e)^n) = n * (log_2 n - log_2 e) ~= n * log_2 n - 1.44 * n.

(n / k)^k <= (n over k) <= (n * e / k)^k.
This estimate is strongly related to Stirling's formula: (n over k) = n! / ((n - k)! * k!) ~= sqrt(2 * pi * n) * (n / e)^n / (sqrt(2 * pi * (n - k)) * ((n - k) / e)^{n - k} * sqrt(2 * pi * k) * (k / e)^k) = sqrt(n / (2 * pi * (n - k) * k)) * 1 / (1 - k / n)^{n - k} * (n / k)^k. For small k, this essentially grows as (n / k)^k, for large k, we must also take into account the factor 1 / (1 - k / n)^{n - k}. Using m = n / k, we get 1 / (1 - k / n)^{n - k} = 1 / (1 - 1 / m)^{k * m - k} = 1 / (1 - 1 / m)^{m * k * (1 - 1 / m)} ~= e^{k * (1 - 1 / m)}.

sum_{k = 0}^n (n over k) = 2^n, for all n >= 0.
This says that the sum of the values in row n of Pascal's triangle equals 2^n. Realizing that (n over k) gives the number of ways to select k numbers out of n numbers, this expression is equivalent to the statement that in total there are 2^n possible outcomes when flipping n coins. From this it immediately follows that (n over k) <= 2^n, for all n >= 0 and 0 <= k <= n. Because the sum has only n + 1 terms, it also follows that there are values of k for which (n over k) >= 2^n / (n + 1).
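The row-sum identity can be checked numerically. The following sketch builds row n of Pascal's triangle with the rule (n over k) = (n - 1 over k - 1) + (n - 1 over k) and adds up its entries; the function names and the limit n <= 62 (chosen so that everything fits in 64 bits) are assumptions of this example.

```c
#include <stdint.h>

/* Fill row[0..n] with (n over 0), ..., (n over n) using Pascal's rule.
   The caller must provide room for n + 1 entries. */
void pascal_row(unsigned n, uint64_t *row) {
    row[0] = 1;
    for (unsigned m = 1; m <= n; m++) {
        row[m] = 1;
        /* Update in place from right to left, so that row[k - 1] still
           holds the value of the previous row when row[k] is computed. */
        for (unsigned k = m - 1; k >= 1; k--)
            row[k] += row[k - 1];
    }
}

/* Sum of the entries of row n of Pascal's triangle, for n <= 62. */
uint64_t row_sum(unsigned n) {
    uint64_t row[63];
    uint64_t sum = 0;
    pascal_row(n, row);
    for (unsigned k = 0; k <= n; k++)
        sum += row[k];
    return sum;
}
```

For example, row 4 is 1, 4, 6, 4, 1, and its entries sum to 16 = 2^4.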

sum_{k >= 0} 1 / k! = e.
This goes back to the fact that the derivative of e^x is e^x itself again. A consequence of the above is the slightly surprising fact that sum_{k >= 0} k / k! = sum_{k >= 1} k / k! = sum_{k >= 1} 1 / (k - 1)! = sum_{k >= 0} 1 / k! = e. So, the value is the same (analogously, it follows that sum_{k >= 0} k * (k - 1) * ... * (k - l) / k! = e for any constant l). This can be used in the following computation: sum_{k >= 1} (k - 1) / k! = sum_{k >= 1} k / k! - sum_{k >= 1} 1 / k! = e - (e - 1) = 1. Here we used that under conditions which are mostly satisfied for the functions we are considering, the limit of a sum equals the sum of the limits.

Sums of Geometric Progressions and Proof by Induction

Not so widely used outside computer science are the sums of the elements of a geometric progression. That is, sum_{i = 0}^infinity a^i = 1 / (1 - a), for all a with |a| < 1. This can be proven by mathematical induction, which is also called complete induction or just induction. We will show that sum_{i = 0}^n a^i = (1 - a^{n + 1}) / (1 - a), for all n >= 0 and all a != 1. This is good enough, because for |a| < 1 clearly lim_{n -> infinity} (1 - a^{n + 1}) / (1 - a) = 1 / (1 - a).

What does proving something by induction mean?

You must be especially careful not to forget proving a base case. Even scientists sometimes overlook this and "prove" things that are plainly false. This procedure of proving may be applied to any countable domain. The positive integers are the major example. It is an axiom of mathematics that this approach gives a proof.

We now apply this proof method to the formula above. First we notice that sum_{i = 0}^0 a^i = 1 = (1 - a^{0 + 1}) / (1 - a). This gives our base case. Now assume that we have proven the equality for some arbitrary n >= 0. Then we can write

  sum_{i = 0}^{n + 1} a^i               =def of sum=
  sum_{i = 0}^n a^i + a^{n + 1}         =induction hypothesis=
  (1 - a^{n + 1}) / (1 - a) + a^{n + 1} =computation=
  (1 - a^{n + 2}) / (1 - a).
So, using the claimed property for n, we have proven it for n + 1. Together with the basis, the axiom of proving by induction now allows us to conclude that the property holds for all n >= 0.

In a similar way many more of these relations can be proven. Often one needs a small twist that is not entirely obvious. For example,

The second formula answers questions like: how much is 1/2 + 2/4 + 3/8 + 4/16 + 5/32 + ... ? The answer, we have proven now, is 2, which, by coincidence, is the same as 1 + 1/2 + 1/4 + 1/8 + ... .

Sums of Arithmetic Progressions and Guessing Expressions

Consider the sum of the arithmetic progression: f(n) = sum_{i = 0}^n i. A geometric argument, drawing rectangles of width 1 and height i next to each other for i = 0 to n, immediately shows that this sum approximately equals 1/2 * n^2. This same approximate value can also be found by replacing the sum by an integral: realizing that an integral is nothing else but an infinitely refined sum, it is no surprise that for well-behaved functions the values of sums and their corresponding integrals often do not differ too much. Because the linear function is smooth, we may assume that sum_{i = 0}^n i ~= integral_0^n x dx = 1/2 * x^2 |_0^n = 1/2 * n^2 - 0 = 1/2 * n^2.

The above arguments suggest that f(n) ~= 1/2 * n^2. What is the exact value? A reasonable guess is that f(n) == g(n) for some polynomial of degree two g(n) = 1/2 * n^2 + b * n + c. We know that f(0) = 0 and that f(1) = 1. So, if f(n) == g(n), then g(0) = 0 gives c = 0 and g(1) = 1 gives b = 1/2, and we must have g(n) = 1/2 * n^2 + 1/2 * n + 0 = 1/2 * n * (n + 1). Notice that we do not yet claim that this is true: we have only explained how one can guess an expression and, assuming its correctness, determine the occurring parameters. Proving that indeed f(n) == g(n) for all n can be done with induction. The base case, n == 0, is ok because f(0) == sum_{i = 0}^0 i == 0 == 1/2 * 0 * (0 + 1) == g(0). So, assume f(n) == g(n) for some n >= 0 and all smaller values, then the following completes the proof:

   f(n + 1)                    =def of f=
   sum_{i = 0}^{n + 1} i       =def of sum=
   sum_{i = 0}^n i + (n + 1)   =induction hypothesis=
   1/2 * n * (n + 1) + (n + 1) =computation=
   1/2 * (n + 1) * (n + 2)     =def of g=
   g(n + 1).

More generally, you should know that similar formulas exist for f_k(n) = sum_{i = 0}^n i^k, for any constant k. Again approximating the sum by an integral, we find f_k(n) ~= n^{k + 1} / (k + 1). Guessing that this constitutes the leading term in a polynomial of degree k + 1, the expression can be derived: in general, a polynomial of degree d is uniquely determined when we know its value in d + 1 points: using these values, the coefficients can be determined by solving a system with d + 1 unknowns and d + 1 equations. For f_k the leading coefficient 1 / (k + 1) is already known, so only k + 1 coefficients remain to be determined, and we can take the values f_k(0), ..., f_k(k).
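As a worked instance of this guessing method, take k = 2. The guess is g(n) = n^3 / 3 + b * n^2 + c * n + d, and the values f_2(0) = 0, f_2(1) = 1 and f_2(2) = 5 give d = 0, b + c = 2/3 and 4 * b + 2 * c = 7/3, hence b = 1/2 and c = 1/6. So the candidate is g(n) = n^3 / 3 + n^2 / 2 + n / 6 = n * (n + 1) * (2 * n + 1) / 6. The following sketch (function names are hypothetical) compares this guess with the sum computed term by term; such a check does not replace the induction proof, but it quickly exposes a wrong guess.

```c
/* f_k(n) = 0^k + 1^k + ... + n^k for k >= 1, computed term by term. */
unsigned long long power_sum(unsigned k, unsigned n) {
    unsigned long long sum = 0;
    for (unsigned i = 0; i <= n; i++) {
        unsigned long long term = 1;
        for (unsigned j = 0; j < k; j++)   /* term = i^k */
            term *= i;
        sum += term;
    }
    return sum;
}

/* Guessed closed form for k = 2: n * (n + 1) * (2 * n + 1) / 6. */
unsigned long long squares_closed_form(unsigned n) {
    unsigned long long m = n;
    return m * (m + 1) * (2 * m + 1) / 6;
}
```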

Proof by Contradiction

Another very common proof technique is to show that the opposite assumption leads to a contradiction. For example, how could one ever prove that there are an infinite number of primes? This is hard, until one reverses the argument: Let us assume there is a finite number. Say, p_1, ..., p_k. So, we assume that the set P = {p_1, ..., p_k} of finite cardinality k contains all prime numbers. What do we know about the number n = prod_i p_i + 1? Clearly, none of the numbers p_i divides n, because n mod p_i == 1. So, n is not divisible by any of the primes in P, and because every divisor larger than 1 has a prime divisor itself, n has no divisors other than 1 and n. The conclusion is that n is a prime number itself. Because n is larger than any of the p_i, n is not an element of P. This contradicts the assumption that P contains all prime numbers. The conclusion is that the assumption that there is a finite number of primes must be wrong, and that hence the number of primes must be infinite.
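It is instructive to try the construction on actual lists of primes. Outside the proof, where the list is not assumed to be complete, n need not be prime itself: 2 * 3 * 5 * 7 * 11 * 13 + 1 = 30031 = 59 * 509. What always holds is that the smallest prime factor of n is missing from the list. The helper names in the sketch below are hypothetical.

```c
/* Smallest divisor d > 1 of n, for n >= 2. This divisor is always prime,
   because any proper divisor of d would be a smaller divisor of n. */
unsigned long smallest_prime_factor(unsigned long n) {
    for (unsigned long d = 2; d * d <= n; d++)
        if (n % d == 0)
            return d;
    return n;   /* no divisor up to sqrt(n), so n itself is prime */
}

/* The number p[0] * p[1] * ... * p[k - 1] + 1 from the proof.
   The product must fit in an unsigned long. */
unsigned long product_plus_one(const unsigned long *p, int k) {
    unsigned long n = 1;
    for (int i = 0; i < k; i++)
        n *= p[i];
    return n + 1;
}
```

For the list 2, 3, 5 the construction yields the prime 31; for the list 2, 3, 5, 7, 11, 13 it yields 30031, whose smallest prime factor 59 is not in the list.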

This proof method can also be used to prove that any algorithm for determining the maximum element in an array a[] with n elements must inspect all n positions of the array. Assume the opposite, so assume there is a correct algorithm which does not inspect all positions for a certain input array a[]. Let a[i] be a position which is not inspected. Let m be the value which is output by the algorithm. If this is not the maximum, the algorithm is not correct for the given array. Otherwise, we change a[] by setting a[i] = m + 1. Running the algorithm on the modified array will output the same value m, because position i is not inspected and for the algorithm there is thus no way to notice the difference. But m is not the maximum of the modified array, showing that the algorithm is not correct for all inputs. We conclude that there is no correct maximum algorithm that does not inspect all positions, and thus that any algorithm must inspect all positions of the array at least once. This implies that any maximum algorithm has running time Omega(n). At the same time, the simple algorithm presented at the beginning of this chapter has running time O(n) because it consists of a loop which is executed fewer than n times while each pass of the loop takes O(1) time. So, the running time is at most n * O(1) = O(n). By the above argument we now know that there is very little room for improvement: up to constant factors this algorithm is optimal. The proven lower bound was not for the worst-case running time of a particular algorithm, but for all possible algorithms. That is, we have shown that the time complexity of the problem of computing the maximum of an array with n elements is Omega(n), and because there is an algorithm with matching time complexity, we can say that the complexity of maximum computation is Theta(n). There are not many problems for which matching upper and lower bounds are known.

Recursion

A function is said to be recursive, if it is defined in terms of itself. The following gives a recursive definition of the factorial function:
0! = 1,
n! = n * (n - 1)!, for all n > 0.
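This definition translates almost literally into a recursive C function. The sketch below assumes n <= 20, because 21! no longer fits in 64 bits; that limit and the function name are choices made for this example.

```c
#include <stdint.h>

/* n! computed by following the recursive definition.
   Assumes n <= 20, the largest n for which n! fits in 64 bits. */
uint64_t factorial(unsigned n) {
    if (n == 0)
        return 1;                    /* base case: 0! = 1 */
    return n * factorial(n - 1);     /* n! = n * (n - 1)! */
}
```

Each call reduces the argument by one, so starting from any n >= 0 the base case n == 0 is reached after exactly n recursive calls.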

The general idea of a recursive definition of a function f is that

Clearly there is a problem here: one should assure that for all values n, the recursion terminates. That is, one should make sure that expanding the recursion, going from f(n) to f(n') to f(n''), ..., one ultimately reaches one of the base cases.

Not only functions can be recursive, but also programs and algorithms (in the light of the existence of functional programming this is not surprising). In a recursive algorithm, it is fixed how some special cases are treated, and for all other cases it is specified how they can be solved using the solutions of some other cases. As for recursive functions, the problem is to assure that for any possible input ultimately a base case is reached. That is, one should make sure that for any finite input the computation reaches one of the base cases in a finite number of steps and thus terminates.

A problem which can very well be solved in a recursive way is sorting. The task is to design an algorithm that sorts sets of numbers. Let n denote the cardinality (the number of elements) of the set to sort. As base case we take n <= 1. In that case the algorithm simply outputs the set as it is (a single element or nothing at all), which constitutes a sorted set. For any n > 1, it performs the following steps:

  1. Select an arbitrary element x from the set S (at this point there is no need to specify how this selection is performed).
  2. Split the set S in three mutually disjoint subsets S_0, S_1 and S_2, containing the elements smaller than x, equal to x and larger than x, respectively.
  3. Sort S_0 and S_2 recursively.
  4. Output the elements of S_0 in sorted order, followed by the elements of S_1, followed by the elements of S_2 in sorted order.
In this case termination is guaranteed, because recursion is performed only for S_0 and S_2, which are strictly smaller than S itself (because x is in neither S_0 nor S_2). Thus, the size of the sets to sort decreases with every level of recursion and eventually drops to 1 or 0 (S_0 and S_2 may be empty), where the recursion stops.
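The four steps above can be sketched in C as follows. Several things here are assumptions of the sketch rather than part of the description: the set is stored in an array and sorted in place, the selected element x is simply the middle one, and inputs of size 0 or 1 are treated as already sorted.

```c
#include <stddef.h>

static void swap_ints(int *p, int *q) { int t = *p; *p = *q; *q = t; }

/* Sort a[0..n-1] in increasing order by splitting around an element x. */
void recursive_sort(int *a, size_t n) {
    if (n <= 1)
        return;                       /* nothing to do */
    int x = a[n / 2];                 /* step 1: select some element */
    /* Step 2: rearrange so that a[0..lt-1] < x (S_0),
       a[lt..gt-1] == x (S_1) and a[gt..n-1] > x (S_2). */
    size_t lt = 0, i = 0, gt = n;
    while (i < gt) {
        if (a[i] < x)
            swap_ints(&a[lt++], &a[i++]);
        else if (a[i] > x)
            swap_ints(&a[i], &a[--gt]);
        else
            i++;
    }
    recursive_sort(a, lt);            /* step 3: sort S_0 ... */
    recursive_sort(a + gt, n - gt);   /* ... and S_2 */
    /* Step 4 is implicit: S_0, S_1 and S_2 already stand in this order. */
}
```

Because x itself always ends up in the middle part, both recursive calls work on strictly fewer than n elements, which is exactly the termination argument given above.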

The correctness of recursive algorithms is proven by induction. Mostly this is an induction over the size of the input. This works in the case of our sorting algorithm: for sets of size at most 1 the correctness is obvious. This is the basis of our inductive proof. So, let us assume the algorithm is correct for all sets of size n and smaller, for some n >= 1. Consider a set of size n + 1. The algorithm splits this set in S_0, S_1 and S_2. S_0 and S_2 have size at most n, and thus we may assume by the induction hypothesis, that they are correctly sorted. But then the whole algorithm is correct, because all values in S_0 are strictly smaller than all those in S_1, which in turn are strictly smaller than all those in S_2.

Recurrence Relations

The time consumption of recursive algorithms is typically described by recursive formulas, also called recurrence relations.

As an example we consider the time consumption of the recursive sorting algorithm presented above. We can estimate

T(n) == T(n_0) + T(n_2) + c * n,
for some constant c, where n_0 = |S_0| and n_2 = |S_2|. The reason behind this estimate, is that splitting a set as specified can somehow be performed by considering all elements once and then putting them in the right bag. In any reasonable implementation, this takes linear time, covered by the term c * n. Sorting S_0 takes T(n_0) and sorting S_2 takes T(n_2). As long as we do not spend too long on selecting x, all other operations take constant or at most linear time.

In this case the problem is that we do not know the values of n_0 and n_2, because these depend on the selected item x. The only thing we know is that n_0 + n_2 <= n - 1. In a later chapter this algorithm is considered again. Here we will only analyze its worst-case. The worst that can happen, is that we split the set very unevenly, for example by selecting the largest element. In that case n_0 == n - 1 and n_2 == 0. If this unlucky situation happens again and again, then the time consumption is given by the following recurrence relation:

T(1) = c,
T(n) = T(n - 1) + c * n, for all n > 1.
Here c is chosen so large that the expressions on the right actually give an upper bound on the time consumption on the left. Expanding the recurrence relation several steps, one soon discovers that T(n) == f(n) for f(n) == sum_{i = 1}^n (c * i).

The formal proof that T(n) == f(n) goes by induction. The basis of the induction is given by T(1) = c = f(1). Assuming that T(n) == f(n) for some value n and all smaller values, it follows that

  T(n + 1)           =def of T()= 
  T(n) + c * (n + 1) =induction assumption=
  f(n) + c * (n + 1) =def of f(n)=
  f(n + 1).
But now we can use our knowledge on the sum of an arithmetic progression to conclude that for this function T(), T(n) = c/2 * n * (n + 1) = Theta(n^2).
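A solution of a recurrence relation can also be checked numerically before one attempts a proof. The following sketch (with hypothetical function names) evaluates the worst-case recurrence literally and compares it with the closed form c/2 * n * (n + 1).

```c
/* T(1) = c and T(n) = T(n - 1) + c * n for n > 1, evaluated by literally
   following the recurrence. Requires n >= 1. */
unsigned long worst_case_cost(unsigned long n, unsigned long c) {
    if (n == 1)
        return c;
    return worst_case_cost(n - 1, c) + c * n;
}

/* The closed form c/2 * n * (n + 1); n * (n + 1) is always even,
   so the integer division is exact. */
unsigned long worst_case_closed_form(unsigned long n, unsigned long c) {
    return c * (n * (n + 1) / 2);
}
```

For c = 1 both functions give 1, 3, 6, 10, ... for n = 1, 2, 3, 4, ..., the sums of the first n positive integers.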

As soon as one believes to know the solution of a recurrence relation, it is normally not hard to verify the correctness of this belief. However, in general, it may be very hard to find such solutions. There are mathematical methods which allow one to solve several classes of common recurrence relations. Here we will not discuss these methods (which are similar to solving differential equations). Rather we list some of the most common types for reference purposes.

  Recurrence                                           Solution
  T(n) <= T(alpha * n) + c                             T(n) <= c * (log_{1/alpha} n + 1), for alpha < 1
  T(n) <= T(alpha * n) + c * n                         T(n) <= c * n / (1 - alpha), for alpha < 1
  T(n) <= T(n') + T(n - n') + c                        T(n) <= c * (2 * n - 1), for all 0 < n' < n
  T(n) <= T(alpha * n) + T((1 - alpha) * n) + c * n    T(n) <= c * n * (log_{1/alpha} n + 1), for 1/2 <= alpha < 1
  T(n) <= T(n - a) + c                                 T(n) <= c * n / a
  T(n) <= T(n - a) + c * n                             T(n) <= c * n^2 / a
  T(n) <= a * T(n / b)                                 T(n) <= c * n^{log_b a}
  T(n) <= a * T(n - 1) + c                             T(n) <= c * (a^n - 1) / (a - 1), for a > 1
Here a, b and c are arbitrary positive constants, mostly integers. For all recurrences we assumed that T(1) = c. The last recurrence is the worst: if we are trying to solve a problem of size n by reducing it to two subproblems of size n - 1, then we get four subproblems of size n - 2, ..., and finally 2^{n - 1} subproblems of size 1. Fortunately, none of the algorithms presented in this course works so inefficiently, but there are many problems for which such behavior cannot be excluded.
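To get a feeling for how quickly the last recurrence grows, one can simply evaluate it. The sketch below (function names are hypothetical) computes T(n) = a * T(n - 1) + c with T(1) = c directly, and separately counts the subproblems of size 1 that are generated. For a = 2 and c = 1 the values of T are 1, 3, 7, 15, ..., so the cost roughly doubles with every increase of n.

```c
/* T(1) = c and T(n) = a * T(n - 1) + c for n > 1, evaluated directly.
   Requires n >= 1; the result overflows quickly for large n. */
unsigned long long branching_cost(unsigned a, unsigned long long c, unsigned n) {
    if (n == 1)
        return c;
    return a * branching_cost(a, c, n - 1) + c;
}

/* Number of subproblems of size 1 that arise when every problem of
   size n > 1 is reduced to a subproblems of size n - 1. */
unsigned long long base_cases(unsigned a, unsigned n) {
    if (n == 1)
        return 1;
    return a * base_cases(a, n - 1);
}
```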

Basic Algorithms

Binary Search

Suppose we have an array int[n] a. How can we test whether there is an i with a[i] = x, for a given number x? The easiest way is to write
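  #include <stdbool.h>
  typedef bool boolean; /* assumed definition: the notes use boolean undeclared */
  boolean linear_search(int* a, int n, int x) {
    /* Test whether x occurs in the array a[] of length n. */
    int i;
    /* Advance i until x is found or the end of the array is reached. */
    for (i = 0; i < n && a[i] != x; i++)
      ; /* empty loop body: all the work is done in the loop header */
    return i < n; }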
This algorithm is clearly correct and takes O(n) time: each pass through the loop takes constant time, and the loop is traversed at most n times. For the general case, this is optimal, because if one is looking for a number x, one cannot conclude that x does not occur before having inspected all n numbers. Each inspection takes at least a constant amount of time, so Omega(n) is a lower bound on the set-membership problem for sets without special structure. Thus, in that case the complexity of this problem is Theta(n), because we have matching upper and lower bounds.

On the other hand, if the elements stand in sorted order, that is a[i] <= a[i + 1], for all 0 <= i <= n - 2, this lower-bound argument does not apply: as soon as one finds an i so that a[i] < x and a[i + 1] > x, then there is no need to inspect any further number, so, we cannot argue that all numbers must be inspected in case x does not occur. And indeed, for sorted arrays, there is a much faster method for testing membership:

  boolean binary_search_1(int* a, int n, int x) {
    /* Test whether x occurs in the array a[] of length n. */
    int i;
    while (n > 0) {
      i = n / 2;
      if (a[i] == x)
        return true;
      if (a[i] < x) { /* Continue in right half */
        a = &a[i + 1];
        n = n - i - 1; }
      else /* Continue in left half */
        n = i; }
    return false; }

The time consumption of this algorithm follows from the following claim. Let i_low and i_hgh denote the bounds of the index range of the original array that is still under consideration, so that in the code above a points to position i_low and the current value of n equals i_hgh - i_low. Then, after k passes of the loop, i_hgh - i_low <= n / 2^k, for all k >= 0, where n now stands for the original length. Proving this claim is left as an exercise. The claim implies that after more than log n passes the difference becomes smaller than 1. Because these numbers are integral, they must then be equal, and the remaining range is empty. So, after at most log n + 1 passes we will either have found x, or we can conclude that x does not occur. Because each pass of the loop takes constant time, the time consumption of the complete algorithm is bounded by O(log n). Under the assumption that the algorithm works by performing comparisons and that in constant time at most constantly many numbers can be compared, O(log n) is optimal. We do not prove this here; the topic of lower bounds is addressed again in the chapter on sorting.
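The claim is stated in terms of index bounds, whereas the routine above adjusts the pointer a and the length n. As a minimal sketch, the same search can be written with explicit bounds. The name binary_search_bounds and the use of bool from stdbool.h are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Test whether x occurs in the sorted array a[] of length n.
   Invariant: if x occurs at all, it occurs in a[i_low .. i_hgh - 1]. */
bool binary_search_bounds(const int *a, int n, int x)
{
    int i_low = 0, i_hgh = n;                 /* half-open range [i_low, i_hgh) */
    assert(n >= 0);
    while (i_low < i_hgh) {
        int i = i_low + (i_hgh - i_low) / 2;  /* middle of the remaining range */
        if (a[i] == x)
            return true;
        if (a[i] < x)
            i_low = i + 1;                    /* continue in right half */
        else
            i_hgh = i;                        /* continue in left half */
    }
    return false;
}
```

In every pass the remaining range i_hgh - i_low shrinks to at most half its previous size, rounded down, which is the quantity bounded by the claim.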

The above algorithm also has a very simple recursive formulation:

  boolean binary_search_2(int* a, int n, int x) {
    /* Test whether x occurs in the array a[] of length n. */
    int i;
    if (n > 0) {
      i = n / 2;
      if (a[i] == x)
        return true;
      if (a[i] < x)
        return binary_search_2(&a[i + 1], n - i - 1, x);
      return binary_search_2(a, i, x); }
    return false; }

The three subroutines for searching an element in an array integrated in a running C program can be downloaded here. Here the array a[] is initialized with increasing random numbers. On average every second number occurs. Testing for increasing values of n clearly shows that the two versions with O(log n) running time are always ready immediately, while for large n linear search starts to take noticeable time.

Exponentiation

Suppose we want to compute x^n. How do we do this? Clearly the following works:
  // In the following we have as invariant that at the
  // beginning of each pass through the loop c == x^i,
  // so in the end c = x^n.

  for (c = 1, i = 0; i < n; i++)
    c *= x;

Assuming that all the multiplications can be performed in unit time, this algorithm has complexity O(n). However, we can do this much faster! Supposing, for the time being, that n = 2^k, the following is also correct:

  // In the following we have as invariant that at the
  // beginning of each pass through the loop c == x^i,
  // so in the end c = x^n.

  for (c = x, i = 1; i < n; i *= 2)
    c *= c;
Here the number of passes through the loop is equal to the number of times we must double i to reach n. That is exactly k = log_2 n times. This algorithm is of the same type as binary search: there is some notion of repeated halving/doubling, which leads to logarithmic time, whereas doing the operation in a linear way gives linear time.

Now, we consider the general case. Assume that n has binary expansion (b_k, b_{k - 1}, ..., b_1, b_0). Then we can write n = sum_{i = 0 | b_i == 1}^k 2^i. So, x^n = x^{sum_{i = 0 | b_i == 1}^k 2^i} = prod_{i = 0 | b_i == 1}^k x^{2^i}. If we now first perform the above computation and store all intermediate c values in an array of length k, then x^n can be computed from them with at most log n additional multiplications and a similar number of additions. That is, the whole algorithm has running time O(log n). Actually it is not necessary to store the c-values: the final value can also be computed by taking the interesting factors when they are generated. The complete routine may look as follows:

  int exponent_1(int x, int n) 
  {
    int c, z;
    for (c = x, z = 1; n != 0; n = n >> 1) 
    {
      if (n & 1) /* n is odd */
        z *= c;
      c *= c; 
    }
    return z; 
  }
It is a good idea to try how the values of z, c and n develop for x = 2 and n = 11.
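For reference, here is the trace of exponent_1 for x = 2 and n = 11 (binary 1011). Initially c = 2 and z = 1. Each row gives the value of n at the start of a pass and the values of z and c at its end:

  pass   n   n & 1      z       c
    1   11     1        2       4
    2    5     1        8      16
    3    2     0        8     256
    4    1     1     2048   65536

After the fourth pass n becomes 0 and the routine returns z = 2048 = 2^11. The factors multiplied into z are x^1, x^2 and x^8, corresponding to the one-bits of 11 = 8 + 2 + 1.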

A slightly different idea works as well. The idea is to start from the top-side: x^99 = x * x^98, x^98 = x^49 * x^49, x^49 = x * x^48, x^48 = x^24 * x^24, x^24 = x^12 * x^12, x^12 = x^6 * x^6, x^6 = x^3 * x^3, x^3 = x * x^2, x^2 = x * x. This idea can be turned into code most easily using recursion:

  int exponent_2(int x, int n) 
  {
    if (n == 0) /* terminal case */
      return 1;
    if (n & 1) /* n is odd */
      return x * exponent_2(x, n - 1);
    return exponent_2(x, n >> 1) * exponent_2(x, n >> 1); 
  }

As usual, the recursive algorithm is easy to understand and its correctness is obvious, while the iterative algorithm was rather obscure. How about the time consumption? Check what happens for n = 32. Formally the time consumption can be analyzed by writing down a recurrence relation. For numbers n = 2^k for some positive k, the time consumption T(n) is given by

T(1) = c_2,
T(n) = 2 * T(n / 2) + c_1, for all n > 1.
The solution of this is given by T(n) = (c_1 + c_2) * n - c_1. Once this relation has been found somehow, for example by intelligent guessing after trying small values of n, it can be verified using induction. So, define the function f() by f(n) = (c_1 + c_2) * n - c_1. Then T(1) = c_2 = (c_1 + c_2) * 1 - c_1 = f(1). This gives a base case. Now assume the relation holds for some n. Then we get
  T(2 * n)                          =def T()= 
  2 * T(n) + c_1                    =induction assumption=
  2 * f(n) + c_1                    =def f()=
  2 * ((c_1 + c_2) * n - c_1) + c_1 =computation=
  (c_1 + c_2) * (2 * n) - c_1       =def f()=
  f(2 * n).
Thus, assuming the equality for n = 2^k, we can prove it for 2 * n = 2^{k + 1}. Because it also holds for n = 1 = 2^0, it holds for all n which are powers of two.

So, the running time of exponent_2 is at least linear! What went wrong? The problem is that we are recursively splitting one problem of size n in two subproblems of size n / 2. At the bottom of the recursion this inevitably leads to a linear number of subproblems. For other problems this may be inevitable, but here there is an easy solution:

  int exponent_3(int x, int n) 
  {
    int y;
    if (n == 0) /* terminal case */
      return 1;
    if (n & 1) /* n is odd */
      return x * exponent_3(x, n - 1);
    y = exponent_3(x, n >> 1);
    return y * y; 
  }

Algorithm exponent_3 performs the same number of multiplications as exponent_1 (the exact analysis is left as an exercise). Nevertheless, even though the difference will not be large, it will be somewhat slower because every recursive step means that the whole state vector must be pushed on the stack.
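The difference between exponent_2 and exponent_3 can be made concrete by counting multiplications. The following sketch mirrors the recursive structure of both routines but returns the number of multiplications instead of the power. The names mults_exponent_2 and mults_exponent_3 are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>

/* Number of multiplications performed by exponent_2 for exponent n. */
long mults_exponent_2(int n)
{
    assert(n >= 0);
    if (n == 0)                                   /* terminal case */
        return 0;
    if (n & 1)                                    /* x * exponent_2(x, n - 1) */
        return 1 + mults_exponent_2(n - 1);
    return 1 + 2 * mults_exponent_2(n >> 1);      /* two recursive calls */
}

/* Number of multiplications performed by exponent_3 for exponent n. */
long mults_exponent_3(int n)
{
    assert(n >= 0);
    if (n == 0)                                   /* terminal case */
        return 0;
    if (n & 1)                                    /* x * exponent_3(x, n - 1) */
        return 1 + mults_exponent_3(n - 1);
    return 1 + mults_exponent_3(n >> 1);          /* one recursive call, y * y */
}
```

For n = 32 the first function returns 63 and the second returns 6, which illustrates the linear versus logarithmic behavior derived above.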

Let us now assume that the time for the multiplications increases with the size of the number. This is reasonable, because unless the numbers are small, c^n will soon become a very large number. In order to get an easy comparison, we assume that multiplying an n_1-digit number and an n_2-digit number costs O(n_1 * n_2). Under this assumption, the conventional algorithm takes

  sum_{i = 0}^{n - 1} log c * log c * i = O(log^2 c * n^2).
The cost of the improved exponentiation can be estimated as
  sum_{i = 0}^{log n - 1} (log c * 2^i)^2 = O((log c * n)^2).
So, even though we have reduced the number of products to compute from linear to logarithmic, we have not gained much when we look at the time for the whole computation, because the last few products dominate the cost.

If we are computing expo_mod, that is (c^n) mod m, then the situation is much better: generally we have (a * b) mod m = ((a mod m) * (b mod m)) mod m, and thus we can compute modulo after each multiplication: the numbers do not grow beyond the size of m, and therefore, all products cost the same (possibly with the exception of the first few). So, for expo_mod, the reduction of the number of the products gives the performance one would hope to achieve.
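A minimal sketch of expo_mod along these lines is given below. It follows the structure of exponent_1 and reduces modulo m after every multiplication. The signature and the restriction to a modulus that fits in 32 bits (so that every product of two residues fits in 64 bits) are assumptions of this sketch, not part of the original text.

```c
#include <assert.h>
#include <stdint.h>

/* Compute (c^n) mod m by repeated squaring, reducing modulo m after
   every multiplication. Assumes 0 < m < 2^32, so that the product of
   two residues always fits in 64 bits. */
uint32_t expo_mod(uint32_t c, uint64_t n, uint32_t m)
{
    assert(m > 0);
    uint64_t z = 1 % m;        /* result so far; 1 % m covers m == 1 */
    uint64_t s = c % m;        /* s == c^(2^i) mod m in pass i */
    for (; n != 0; n >>= 1) {
        if (n & 1)             /* this bit of n is set */
            z = (z * s) % m;
        s = (s * s) % m;
    }
    return (uint32_t) z;
}
```

For example, expo_mod(2, 10, 1000) yields 24, because 2^10 = 1024. All operands stay below m, so every pass costs the same, independent of n.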

Multiplication

Assume we are writing a library for handling arbitrarily large numbers. Because the arithmetic operations (addition, subtraction, multiplication, division, comparison, etc.) are defined only for numbers with 32 or 64 bits, these must be programmed by the user. Addition and subtraction are rather straightforward: applying the elementary methods taught at primary school leads to algorithms running in O(n) time for operations on two n-digit numbers.
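As an illustration of the linear-time school method for addition, here is a small sketch. It assumes a hypothetical representation in which a number is an array of decimal digits with digit 0 being the least significant one. The name add_digits and this representation are choices made for the example only.

```c
#include <assert.h>

/* Add two n-digit decimal numbers x and y, stored with digit 0 as the
   least significant one. The array sum must have room for n + 1 digits;
   its last digit is the final carry (possibly 0). */
void add_digits(const int *x, const int *y, int n, int *sum)
{
    int carry = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++) {      /* one pass, constant work per digit */
        int s = x[i] + y[i] + carry;
        sum[i] = s % 10;
        carry = s / 10;
    }
    sum[n] = carry;
}
```

Each digit position is handled once with a constant number of operations, which gives the O(n) bound.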

School Algorithm

Multiplication is much more interesting. The school method is correct. Let us consider it with an example (assuming that our computer can only handle one digit at a time). Then
  83415 * 61298 =
    6 * 83415 shifted left 4 positions +
    1 * 83415 shifted left 3 positions +
    2 * 83415 shifted left 2 positions +
    9 * 83415 shifted left 1 positions +
    8 * 83415 shifted left 0 positions

How long does this take? When multiplying two n-digit numbers, there are n multiplications of a 1-digit number with an n-digit number, n shifts and n additions. Each such operation takes O(n) time, thus the total time consumption can be bounded by 3 * n * O(n) = O(n^2). Clever tricks may reduce the time in practice quite a bit, but this algorithm appears to really need Omega(n^2). This quadratic complexity is precisely the reason that it is so tedious to multiply two 4-digit numbers. There is an alternative method. It is a pearl of computer science, surprisingly simple and, for sufficiently long numbers, considerably faster, even in practice.
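Before turning to that alternative, the school method itself can be sketched as follows. The sketch again assumes numbers stored as arrays of decimal digits with digit 0 the least significant one. The name school_multiply is hypothetical. The two nested loops make the quadratic cost visible.

```c
#include <assert.h>

/* Multiply the nx-digit number x by the ny-digit number y using the
   school method. Digits are decimal, digit 0 is the least significant.
   The array prod must have room for nx + ny digits. */
void school_multiply(const int *x, int nx, const int *y, int ny, int *prod)
{
    assert(nx > 0 && ny > 0);
    for (int k = 0; k < nx + ny; k++)
        prod[k] = 0;
    for (int j = 0; j < ny; j++) {         /* one row per digit of y */
        int carry = 0;
        for (int i = 0; i < nx; i++) {     /* y[j] * x, shifted left j positions */
            int t = prod[i + j] + x[i] * y[j] + carry;
            prod[i + j] = t % 10;
            carry = t / 10;
        }
        prod[j + nx] = carry;              /* this position is still 0 here */
    }
}
```

In contrast to the description above, this version adds each row to the result while it is being computed, which saves the separate shift and addition passes but does not change the O(n^2) bound.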

Recursive Algorithm

Assume we are multiplying two n-digit numbers, for some even n (one can always add a leading dummy digit with value 0 to achieve this). Let m = n / 2. The following description is for decimal numbers, but can easily be generalized to any radix (in practice it is efficient to work with a radix of 2^16 or 2^32). Let the numbers be x = x_1 * 10^m + x_0 and y = y_1 * 10^m + y_0. That is, x_1 and y_1 are the numbers composed of the leading m digits, while x_0 and y_0 are the numbers composed of the trailing m digits. So far this is just an alternative writing, nothing deep. A correct way to write the product is now
  x * y = (x_1 * 10^m + x_0) * (y_1 * 10^m + y_0)
        = x_1 * y_1 * 10^{2 * m} + x_1 * y_0 * 10^m + x_0 * y_1 * 10^m + x_0 * y_0.

This formula suggests the following recursive algorithm:

  superlong prod(superlong x, superlong y, int n) {
    /* add(x, y) adds x to y,
       shift(x, n) shifts x leftwards n positions */

    if (n == 1)
      return x * y; /* Product of ints */

    if (n is odd)
      add a leading 0 to x and y and increase n by 1;

    compute x_1, x_0, y_1, y_0 from x and y;

    xy_11 = prod(x_1, y_1, n / 2);
    xy_10 = prod(x_1, y_0, n / 2);
    xy_01 = prod(x_0, y_1, n / 2);
    xy_00 = prod(x_0, y_0, n / 2);

    xy = xy_00;
    xy = add(xy, shift(xy_01, n / 2));
    xy = add(xy, shift(xy_10, n / 2));
    xy = add(xy, shift(xy_11, n)); 

    return xy; }

How long does this take? Is it faster than before? Let us look what happens. Instead of one multiplication of two numbers of length n, we now have 4 multiplications of numbers of length m = n / 2 plus 3 shifts plus 3 additions. The additions and the shifts take time linear in n. So, all together, the second part takes linear time. That is, there is a d, so that the running time for this part is bounded by d * n. The first part is formulated recursively. So, it makes sense to formulate the time consumption as a recurrence relation:

  T_prod(n) = 4 * T_prod(n / 2) + d * n
  T_prod(1) = 1

To solve recurrence relations it often helps to try a few values in order to get an idea:

  T(1)  = 1
  T(2)  = 4 *    1 + 7 *  2 =    18
  T(4)  = 4 *   18 + 7 *  4 =   100
  T(8)  = 4 *  100 + 7 *  8 =   456
  T(16) = 4 *  456 + 7 * 16 =  1936
  T(32) = 4 * 1936 + 7 * 32 =  7968
  T(64) = 4 * 7968 + 7 * 64 = 32320

Here we assumed d = 7 (estimating 1 * n for each linear time operation, counting the construction of the numbers x_0, x_1, y_0 and y_1 as one operation). Quite soon one starts to notice that the additional term d * n does not matter a lot. The main development is determined by the factor 4. Which function returns a four times larger value when taking twice as large an argument? How about n^2? Indeed, this algorithm has running time O(n^2).

Let us try to prove this, taking T(n) = a * n^2, and see whether it works, and for which value of a. Our estimate should be exact or an overestimate, so substitution should give:

  a * n^2 >= 4 * a * (n / 2)^2 + 7 * n = a * n^2 + 7 * n.
This does not work! There is no choice for the parameter a for which this relation is true. Bad luck, we apparently estimated T(n) wrong. Still we feel that the quadratic development is essentially correct.

The second thing one might try, motivated by the nature of the recurrence relation, is a polynomial of degree 2. That is, an expression of the form f(n) = a * n^2 + b * n + c, for some constants a, b and c. Any polynomial of degree d is entirely determined by its values in d + 1 points. So, assuming that T(n) = f(n), the parameters of f() can be determined by using T(1) = 1, T(2) = 18 and T(4) = 100. This gives three equations with three unknowns:

1 * a + 1 * b + 1 * c = 1,
4 * a + 2 * b + 1 * c = 18,
16 * a + 4 * b + 1 * c = 100.
The solution is a = 8, b = -7, c = 0, so f(n) = 8 * n^2 - 7 * n. This does not yet prove that T(n) = f(n), for all n, but if T(n) is given by a polynomial of degree two at all, then it must be given by this function f(n). Checking that T(n) = f(n) for all n which are powers of two goes by induction. The basis is satisfied, because (due to the choice of the parameters) T(1) = 1 = f(1). So, assume T(n) = f(n) for some n, then T(2 * n) = 4 * T(n) + 7 * (2 * n) = 4 * f(n) + 7 * (2 * n) = 32 * n^2 - 28 * n + 14 * n = 8 * (2 * n)^2 - 7 * (2 * n) = f(2 * n). This completes the proof.
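The induction proof can be complemented by a quick numerical check. The sketch below evaluates the recurrence with d = 7 directly and compares it with the closed form f(n) = 8 * n^2 - 7 * n. The names t_prod and f_closed are hypothetical and introduced only for this check.

```c
#include <assert.h>

/* T(1) = 1, T(n) = 4 * T(n / 2) + 7 * n, for n a power of two. */
long long t_prod(long long n)
{
    assert(n >= 1 && (n & (n - 1)) == 0);   /* n must be a power of two */
    if (n == 1)
        return 1;
    return 4 * t_prod(n / 2) + 7 * n;
}

/* Closed form found above: f(n) = 8 * n^2 - 7 * n. */
long long f_closed(long long n)
{
    return 8 * n * n - 7 * n;
}
```

Both functions give 32320 for n = 64, in agreement with the table of values above.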

Karatsuba's Algorithm

Now we have found a good estimate for the time consumption, but what is this all about? Where is the gain? Performing the multiplication this way, there is no gain. However, we can also do the following:
  superlong prod(superlong x, superlong y, int n) {
    /* add(x, y) adds x to y,
       shift(x, n) shifts x leftwards n positions */

    if (n == 1)
      return x * y; /* Product of ints */

    if (n is odd)
      add a leading 0 to x and y and increase n by 1;

    compute x_1, x_0, y_1, y_0 from x and y;

    xy_11 = prod(x_1, y_1, n / 2);
    x_sum = add(x_1, x_0);
    y_sum = add(y_1, y_0);
    xy_ss = prod(x_sum, y_sum, n / 2);
    xy_00 = prod(x_0, y_0, n / 2);

    xy = xy_00;
    xy = add(xy, shift(xy_ss, n / 2));
    xy = subtract(xy, shift(xy_00, n / 2));
    xy = subtract(xy, shift(xy_11, n / 2));
    xy = add(xy, shift(xy_11, n)); 

    return xy; }

So, we compute x * y as x_0 * y_0 + (x_1 + x_0) * (y_1 + y_0) * 10^m - x_0 * y_0 * 10^m - x_1 * y_1 * 10^m + x_1 * y_1 * 10^n, which is just right. Clever or not? Let us write the time expression again.

  T_prod(n) = 3 * T_prod(n / 2) + c * n
  T_prod(1) = 1
Here c is somewhat larger than before. Estimating the cost of all linear-time operations as before gives c = 11. What matters much more is that now there are only three calls of the form prod(..., n / 2), giving 3 * T_prod(n / 2) instead of 4 * T_prod(n / 2).

Let us look at a few numbers again:

  T(1)  = 1
  T(2)  = 3 *    1 + 11 *  2 =    25
  T(4)  = 3 *   25 + 11 *  4 =   119
  T(8)  = 3 *  119 + 11 *  8 =   445
  T(16) = 3 *  445 + 11 * 16 =  1511
  T(32) = 3 * 1511 + 11 * 32 =  4885
  T(64) = 3 * 4885 + 11 * 64 = 15359

Again the development is dominated by the multiplication, so essentially, when doubling n, the time is multiplied by three. That is just what happens for the function n^{log_2 3}. Inspired by the solution of the previous recurrence relation, we guess that the solution has the form f(n) = a * n^{log_2 3} + b * n, where a and b are determined by T(1) = 1 and T(2) = 25. This gives
1 * a + 1 * b = 1,
3 * a + 2 * b = 25.
The solution is a = 23 and b = -22, that is, f(n) = 23 * n^{log_2 3} - 22 * n. Proving that T(n) = f(n) for all n which are powers of two goes by induction again. The assumption is ok for n = 1, because T(1) = 1 = f(1). Assuming T(n) = f(n), and using 3 * n^{log_2 3} = (2 * n)^{log_2 3}, we get T(2 * n) = 3 * T(n) + 11 * (2 * n) = 3 * f(n) + 11 * (2 * n) = 23 * 3 * n^{log_2 3} - 66 * n + 22 * n = 23 * (2 * n)^{log_2 3} - 22 * (2 * n) = f(2 * n).

Using the improved recursive algorithm, known as Karatsuba's algorithm, two n-digit numbers can be multiplied in O(n^{log_2 3}) = O(n^{1.58...}) time.
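To see the three-product scheme at work on actual numbers, here is a small sketch that applies it to machine integers instead of superlong numbers. This is only an illustration of the arithmetic, not an efficient multiplication routine: it assumes both factors are below 2^32, so that all intermediate values fit in 64 bits, and it splits in decimal as in the description above. The names karatsuba and num_digits are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of decimal digits of v (at least 1). */
static int num_digits(uint64_t v)
{
    int d = 1;
    while (v >= 10) {
        v /= 10;
        d++;
    }
    return d;
}

/* Product of x and y with three recursive products per level.
   Assumes x, y < 2^32, so that all intermediate values fit in 64 bits. */
uint64_t karatsuba(uint64_t x, uint64_t y)
{
    assert(x <= UINT32_MAX && y <= UINT32_MAX);
    if (x < 10 || y < 10)               /* a one-digit factor: multiply directly */
        return x * y;

    int nx = num_digits(x), ny = num_digits(y);
    int m = (nx > ny ? nx : ny) / 2;    /* split position */
    uint64_t p = 1;                     /* p = 10^m */
    for (int i = 0; i < m; i++)
        p *= 10;

    uint64_t x_1 = x / p, x_0 = x % p;  /* leading and trailing parts */
    uint64_t y_1 = y / p, y_0 = y % p;

    uint64_t xy_11 = karatsuba(x_1, y_1);
    uint64_t xy_00 = karatsuba(x_0, y_0);
    uint64_t xy_ss = karatsuba(x_1 + x_0, y_1 + y_0);

    /* xy_ss - xy_11 - xy_00 equals x_1 * y_0 + x_0 * y_1 */
    return xy_11 * p * p + (xy_ss - xy_11 - xy_00) * p + xy_00;
}
```

For example, karatsuba(1234, 5678) splits into 12, 34, 56 and 78 and returns 7006652. A real implementation for long numbers would operate on digit arrays and switch to the school method below some threshold length.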

Comparison

Even though the leading constant (based upon the coarse estimates made above) hidden in the estimate O(n^{log_2 3}) is almost three times as large as the leading constant in the O(n^2) complexity of the basic recursive algorithm, n^{1.58} is so much smaller than n^2 for large n, that Karatsuba's multiplication algorithm may be expected to be faster than conventional multiplication already for moderate values of n.

An experiment is the only way to verify such an expectation. We have implemented several variants:

The program is written in C. All algorithms are somewhat optimized, but not exhaustively. Divisions and modulo computations are replaced by bitwise operations which can be performed in one clock cycle. Particularly for Karatsuba's algorithm the time might be further reduced by reducing the amount of copying. Tests have been performed on a computer with a 2.66 GHz Pentium IV processor running Linux.

  log_2 n  school  opt. school  opt. recrs  Karatsuba
     10     16E-3      17E-3       13E-3      11E-3
     11     48E-3      41E-3       44E-3      16E-3
     12    192E-3     150E-3      189E-3      44E-3
     13    781E-3     608E-3      809E-3     191E-3
     14   4644E-3    5220E-3     4625E-3     605E-3
     15  24894E-3   19073E-3    23936E-3    2255E-3
     16    176E+0     109E+0      118E+0    7634E-3
     17    660E+0     454E+0      486E+0   25619E-3
     18   4527E+0    2652E+0     1965E+0   72144E-3
     19  17394E+0   12740E+0     7818E+0     218E+0
     20  70936E+0   53073E+0    31616E+0     787E+0
The recursive algorithm without optimization is several times slower than any of the other algorithms. This is mainly due to the additional overhead at the lowest levels of the recursion. Karatsuba's algorithm is the best of all. For small n the difference is not that big, but for n = 2^20 it is almost 100 times faster than the standard implementation of the school algorithm. For n which are so large that n integers do not fit in the first- or second-level cache, the performance of all algorithms deteriorates. This effect is strongest for the school algorithm because it does not have a local character. For large n the improved method is 30% faster. This difference is mainly due to the reduced number of cache faults. The optimized recursive algorithm and the blocked algorithm are about equally good, for n >= 2^18 they are more than twice as fast as the standard algorithm.

As usual, the times increase slightly faster than predicted theoretically. This cannot be explained by considering the lower-order terms, but is rather caused by slower memory access when working on larger data sets. For the school method the development of the running time can be estimated as t(n) ~ n^{2.18}. For Karatsuba we get t(n) ~ n^{1.66}.

Integer Product

Factorials

A well-known elegant expression for n! is recursive:
0! = 1
n! = n * (n - 1)!, for all n > 0.

This formulation can immediately be turned into a recursive algorithm:

  int fac_1(int n) 
  {
    if (n == 0)
      return 1;
    return n * fac_1(n - 1);
  }

This is elegant and correct, but not terribly efficient because of the overhead due to making procedure calls. The time consumption T_fac(n) is given by the recurrence relation

T_fac(0) = b,
T_fac(n) = T_fac(n - 1) + a, for all n > 0.
It is easy to guess and prove that the solution of this is given by T(n) = a * n + b. Thus, T(n) = O(n).

Linear time consumption is also achieved by the following simple iterative implementation:

  int fac_2(int n) 
  {
    int f = 1;
    for (int i = 2; i <= n; i++)
      f *= i;
    return f; 
  }

Fibonacci Numbers

Another well-known example of a function that is mostly defined recursively is the row of Fibonacci numbers (named after a 13th century Italian mathematician):
fib(0) = 0,
fib(1) = 1,
fib(n) = fib(n - 1) + fib(n - 2), for all n > 1.

Turning this formulation into a recursive algorithm gives

  int fib_1(int n) 
  {
    if (n == 0)
      return 0;
    if (n == 1)
      return 1;
    return fib_1(n - 1) + fib_1(n - 2);
  }

This program clearly computes the function correctly. The time consumption T_fib(n) is given by the recurrence relation

T_fib(0) = b,
T_fib(1) = b,
T_fib(n) = T_fib(n - 1) + T_fib(n - 2) + a, for all n > 1.
Because the constants a and b are positive, it can be easily shown by induction that T_fib(n) >= fib(n), which is not good, because fib(n) is exponential. More precisely, fib(n) ~= alpha^n / sqrt(5), for alpha = (1 + sqrt(5)) / 2 ~= 1.618. So, for n = 100, there are already more than 3 * 10^20 calls, which take years to perform.
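The exponential growth can be observed directly by counting the calls. The following sketch is the simple recursive routine with a global counter added. The names fib_counted, count_fib_calls and fib_calls are hypothetical and used only for this experiment.

```c
#include <assert.h>

static long long fib_calls;   /* number of calls made so far */

/* Same structure as fib_1, but every call is counted. */
int fib_counted(int n)
{
    assert(n >= 0);
    fib_calls++;
    if (n == 0)
        return 0;
    if (n == 1)
        return 1;
    return fib_counted(n - 1) + fib_counted(n - 2);
}

/* Number of calls needed to compute fib(n) recursively. */
long long count_fib_calls(int n)
{
    fib_calls = 0;
    fib_counted(n);
    return fib_calls;
}
```

Computing fib(10) = 55 this way already takes 177 calls, and fib(20) = 6765 takes 21891 calls: the number of calls grows by roughly the factor alpha when n is increased by one.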

The following simple non-recursive algorithm has linear running time. Computing fib(100) with it goes so fast that you cannot even measure it (but the result does not fit in a normal 32-bit integer):

  int fib_2(int n)
  {
    int x, y, z, i;
    if (n == 0)
      return 0;
    if (n == 1)
      return 1;
    for (x = 0, y = 1, i = 1; i < n; i++)
    // Invariant: y = fib(i); x = fib(i - 1)
    {
      z = x + y;
      x = y;
      y = z;
    }
    return y;
  }

Even a recursive algorithm can be made to run in linear time, but the formulation lacks the typical elegance of recursive algorithms:

  int rec_fib(int* f0, int n) 
  {
    int f1, f2;
    if (n == 1)
    {
      *f0 = 0;
      return 1;
    }
    f1 = rec_fib(f0, n - 1);
    f2 = *f0 + f1;
    *f0 = f1;
    return f2; 
  }

  int fib_3(int n) 
  {
    int f;
    if (n == 0)
      return 0;
    return rec_fib(&f, n);
  }

The three subroutines for computing Fibonacci numbers integrated in a running C program can be downloaded here. Testing for increasing values of n clearly shows that the two versions with O(n) running time are ready immediately, while for n > 30, the simple recursive version starts to take noticeable time.

The examples in this section show that often recursion is elegant, and sometimes iterative alternatives are much harder to write and understand. However, using recursion always implies a certain extra overhead and a careless use of recursion may even be fatal for the performance.

Exercises

  1. Prove that sum_{k = 1}^n (k - 1) / k! = 1 - 1 / n!, for all n >= 1. Notice that this provides an independent proof that sum_{k >= 1} (k - 1) / k! = 1.

  2. Let the operators "<<<" and "===" between functions be defined as follows: f <<< g if and only if f = o(g) and f === g if and only if f = Theta(g). Use this notation to order the following functions by growth rate: e^n, n, sqrt(n), n^3, 800 * n^2, n * sqrt(n), log^2 n, loglog n, n * log n, 8^{sqrt(n)}, 2^n, 160, 2^n - n^3.

  3. Suppose the functions T_1 and T_2 are bounded by the same function f, that is, T_1(n) = O(f(n)) and T_2(n) = O(f(n)). Which of the following relations are true?
    1. T_1(n) - T_2(n) = o(f(n))
    2. T_1(n) + T_2(n) = O(f(n))
    3. T_1(n) / T_2(n) = O(1)
    4. T_1(n) * T_2(n) = O(f^2(n))
    5. T_1(n) = O(T_2(n))
    For each relation either give a proof or counter example. Hint: first realize well what exactly O() means.

  4. Which of the following two functions grows faster: 10 * n * log n, or n^{1 + eps / sqrt(log n)}, for any constant eps > 0?

  5. Find two functions f(n) and g(n) so that neither f(n) = O(g(n)), nor g(n) = O(f(n)). The functions must be such that for all n > 0, f(n), g(n) >= 1.

  6. Prove the following relations
    1. n^c = o(a^n), for any constant c and any a > 1.
    2. a^n = o(b^n), for any a < b.

  7. Give an exact expression for f_a(m, n) = sum_{i = m}^n a^i which is valid for all n >= m and prove it directly by using induction. Now compute lim_{n -> infty} f_a(m, n). Perform the same tasks for g_a(n) = sum_{i = 0}^n i * a^i. Here m and n are positive integers and 0 < a < 1.
  8. Give an exact expression for f(n) = sum_{i = 0}^n i^2 and prove the correctness of this expression using induction.

  9. Consider the non-recursive version of the binary-search algorithm. Prove that after k passes of the loop i_hgh - i_low <= n / 2^k, for all k >= 0.

  10. Rewrite the non-recursive version of the binary-search algorithm without the additional variables i_low and i_hgh. Instead n should indicate the number of elements on which the algorithm is still working and a[], interpreted as a pointer, is adjusted, just as in the recursive algorithm.

    Create two programs: one based on the non-recursive and one based on the recursive variant of binary search. In each program, generate arrays of length 2^k, for k = 12, 16 and 20, containing all numbers starting from 0. Test for all numbers in the range whether they occur, measuring how long this takes in total. Compute the time per operation and compare the times for the recursive and non-recursive variants.

  11. Consider the recursive algorithm exponent_3 for computing exponents. Let f(n) denote the number of multiplications for computing x^n. Let z(n) and o(n) denote the number of zeroes and ones in the binary expansion of n, respectively. Prove, using induction, an exact expression of f(n) in terms of z(n) and o(n).

  12. We consider integer division. For example we want to compute 1,876,223,323 / 566,264 without using the built-in division routine. On the other hand, we may use comparison, addition, subtraction and multiplication with one-digit numbers, which are substantially easier.

    1. Write down a detailed algorithm for the school method for division (long division) in terms of the above primitives. For computing x / y, one repeatedly takes a new digit from the most significant side of x until the number z we have is larger than y. Then we try how many times y fits in z. This number, u, is written away and z is reduced by u * y. We continue until there are no more digits in x to process.
    2. Let n and m be the number of digits of x and y, respectively. What is the complexity of your algorithm? In your analysis, you may assume that multiplying a one-digit number with a number of length l takes O(l) time and that also adding, subtracting or comparing two numbers with at most l digits takes O(l) time. The expression certainly involves n but may also involve m. So, you are asked to give an expression of the form O(f(n)) or of the form O(f(n, m)), for some suitable function f() or f(,).
    3. In the above algorithm you are probably computing the product of y with the numbers 1 to 9. For numbers with radix 10 this is not serious but if we were working with radix b, for some large number b (on a 32-bit computer b = 2^16 is a good choice), it would matter a lot. Modify the cost expression so that it takes a possible dependence of b into account.
    4. In the following we consider how to reduce the dependence of the time consumption on b. A first idea is to compute the numbers i * y, 0 <= i < b, once at the beginning of the algorithm and to store these in a table. Upon need these can be accessed in constant time. This important idea is called table look-up, which is a general technique to prevent frequent recomputation.
    5. Now the numbers i * y can be obtained efficiently, but they still have to be compared with z. How about comparing two large numbers? Describe an algorithm for determining whether z < v or not. Only in exceptional cases the algorithm should use time linear in the length of these. Here we assume that we have access to the individual digits of the numbers in constant time and that we know how many non-zero digits each of them has. So, we assume that for a number w, w.length denotes its length and that w_i denotes the value of digit i, digit 0 being the least significant one. Notice that in the case of our application, comparing the numbers i * y with z, there is at most one value of i for which the comparison is expensive, because i * y and (i + 1) * y are rather different.
    6. Add the modifications to the above algorithm and give the new expression for its complexity in terms of n, m and b.
    7. Suggest a further improvement of the procedure for determining how many times y fits in z, leading to a final reduction of the dependence on b.

  13. We have seen that two n-digit numbers can be multiplied in O(n^{1.58}) time. The above division algorithm requires O(n^2). Is division really so much harder than multiplication? The answer is: no. One clever algorithm for division is quite different and exploits our multiplication skill. The most important reduction is to rewrite x / y as x * (1 / y). Here 1 / y is computed as a floating point number in sufficient precision to assure that multiplying it by x gives a deviation of less than 1. This idea reduces the problem of general division to computing reciprocals.

    The problem of computing reciprocals can be solved by a method called Newton iteration. It starts by making a coarse estimate e_0 of the value 1 / y to compute. Then it computes e_1, e_2, ..., giving better and better estimates of 1 / y. These e_i are computed according to the following rule (which is presented here in an ad hoc way, but which is actually a special case of a more general method for computing functional inverses):

    e_{i + 1} = 2 * e_i - e_i * e_i * y.
    It can be shown that the number of correct positions doubles in every iteration (but starting with a very bad initial value it may take a while before we have one correct digit or it may not converge at all).

    1. Describe how to obtain a value e_0 satisfying e_0 <= 1 / y < 10 * e_0. Your procedure should run in O(n) time, where n gives the number of decimal positions in y.
    2. Give the values of e_0, ..., e_7 and 1 / y for y = 362.
    3. How many iterations must be performed to approximate 1 / y sufficiently accurately? Your answer should have the form O(f(n, m)), for some suitable function f(,).
    4. Give the complete algorithm for computing x / y in C-like pseudocode. You may use all the needed subroutines (addition, multiplication, comparison, ...) without specifying them.
    5. Let T_prod(l) denote the time for multiplying two numbers of maximum length l. Give an expression of the time consumption of your algorithm in terms of T_prod.
    6. How much harder is division than multiplication?

  14. Let T(n) be defined by T(0) = T(1) = b and T(n) = T(n - 1) + T(n - 2) + a, for all n > 1, where a, b > 0. Prove that T(n) >= fib(n) for all n >= 0.





Structured Programming: C

In this chapter we consider some of the most important aspects of the programming language C. Some of these are very typical for C, most of them appear explicitly or implicitly in all other modern programming languages. Both C++ and Java can be viewed as extensions of C, though in these languages there is a different conception of good style. We do not strive for completeness in any way. We mention only the most important data types and commands. The purpose of this chapter is to provide a basic subset of C which allows one to write simple programs. Complete programs are provided as examples and these can be used to formulate your own programs by modification.

Any program is created inside an editor, it does not matter which one is used. It is common, maybe even required, to give programs names which end in ".c", so for example "my_program.c". Then within a unix-like environment, the program is compiled by calling one of the available compilers followed by the name of the program and compiler options. A compiler is a program which tests the source code and, when it does not find obvious errors, translates it into machine code.

Available compilers are at least gcc and cc. gcc is a real C compiler, cc is a C++ compiler. These compilers do not generate the same code and there is no guarantee that a program which runs when compiled with gcc also runs when compiled with cc or vice versa. There are two main reasons for this:

Compiler options can tell the compiler all kinds of useful things: the degree of optimization, the file the code should be written to, the amount of warnings that should be printed, how tolerant the compiler should be, which libraries should be loaded, etc. A possible compile command looks like
  gcc my_program.c -O3 -Wall -ansi -o my_executable
Here "-O3" means that we want optimized code, "-Wall" means that we want to hear all warnings, "-ansi" means that we want strict enforcement of the ansi rules, "-o my_executable" indicates that the compiled code should be written to the file my_executable.

Of course, most programs, especially those of beginning programmers, contain syntactical errors when they are compiled for the first time. Syntactical errors are deviations from the syntax. The compiler checks whether all syntactical rules have been applied, and only when there are no violations is an executable generated. If there are no syntactical errors, there may nevertheless be all kinds of other errors. Possibly the program crashes at runtime, because it performs a division by zero or runs past the end of an array. In this case we say that the program has runtime errors. But even if the program has neither syntactical nor runtime errors, it does not need to be correct. Turning on the warnings with -Wall will find some of the non-syntactical errors, for example it detects that one is using an uninitialized variable, but no software, however clever, can detect that the program is not fulfilling the specification (unless the program is fed with the specification).

With help of a compiler the source code of a program is translated to executable machine code.

Primitive Data Types

The most important primitive data types are
char:
one byte, can also be used in a numerical way. The constants of this type, that is the symbols, are written between a pair of single-quote symbols, "'": 'f', 'g', '8', ... .
int:
integral numbers in two's complement, mostly of size four bytes. The constants of this type are written as usual numbers: 567227, -12, 0, ... . With special prefixes it can be specified that a number is given in a different number system than decimal.
long:
integral numbers in two's complement, of size four or eight bytes. The constants are written in the same way as ints.
float:
floating point numbers, typically of size four bytes. The constants of this type can be written in many different formats. For example as 1.2345, 1., 786.34e-25, ... . If one wants a number to be interpreted as a floating point number (in a division for example), it should not look like a correct int, so it must contain a "." or an "e".
double:
floating point numbers, typically of size eight bytes. The constants are written in the same way as floats.
string:
Text sequences. This is not really a type, but there are constants of this kind. These look like "hello what is your name?", so a sequence of chars enclosed by a pair of double-quote symbols, """. However, one cannot declare a variable of type string. Instead one should use an array of char.
There is no special type "boolean". Any of the numerical types can be used for this; char, interpreted as a number, is most suitable. False corresponds to 0, true to all other values. The types char, int and long also have unsigned versions. So, one can write "unsigned char c = 178".
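The following is a minimal sketch of how these types can be inspected on the machine at hand. The names is_true and print_sizes are chosen for illustration only. Apart from sizeof(char), which is always 1, the printed sizes depend on the machine and the compiler.

```c
#include <stdio.h>

/* Illustrative helper: an int used as a truth value.
   0 counts as false, every other value counts as true. */
int is_true(int value) {
  if (value)
    return 1;
  return 0;
}

/* Illustrative helper: prints the size in bytes of the primitive types.
   Only sizeof(char) == 1 is guaranteed, the rest is machine dependent. */
void print_sizes(void) {
  printf("char %lu, int %lu, long %lu, float %lu, double %lu\n",
    (unsigned long) sizeof(char), (unsigned long) sizeof(int),
    (unsigned long) sizeof(long), (unsigned long) sizeof(float),
    (unsigned long) sizeof(double));
}
```

Calling print_sizes() on the machine one is working on shows which of the "typical" sizes mentioned above actually apply there.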
C provides several primitive numerical types and characters, but no primitive types for strings or booleans.

Trivial Program

The following program reads two values from input, computes their product and prints the result.
  #include <stdio.h>
  int main() {
    int a, b, c;
    printf("Give the value of a   >>>   ");
    scanf("%d", &a);
    printf("Give the value of b   >>>   ");
    scanf("%d", &b);
    c = a * b;
    printf("The product of a and b equals %1d\n", c);
    return 1; }
Except for the (conditional) jump statements and calls to subroutines which are discussed later, a program is executed in linear order, starting with the first line of the section called "main".

The first line tells that before the actual compilation is started, routines from the library stdio.h must be loaded. This library contains IO routines. Other libraries which one may need are math.h (mathematical routines such as exp, cos and sqrt), sys/time.h (routines and types for determining and setting clocks), string.h (routines for manipulating strings). Using mathematical routines does not only require the inclusion of math.h, but also the compiler option -lm. So, then the compilation command might look like "gcc my_prog.c -lm".

Then the program starts in the section main. The "{" indicates the beginning of the text. The end is indicated by "}". In between we find lines of code, giving several instructions. Each instruction is ended with a ";". The piece of code "int main()" is called the program header. It tells us that the return value of the program is an int, and, in this case, that the program does not have external parameters.

Following the header, it is told which variables we are going to use: "int a, b, c;". Such a statement is called a declaration. This means that we declare three integer variables. Later we can use these names to store and retrieve integer values. Variables of other types are declared in a similar way. It is quite common to write the declarations at the beginning, but in C one can also write the declarations where they are needed. In any case: a variable must be declared before its first usage; otherwise the compiler would not know how much memory to allocate for the variable.

After the variable declaration, we find a statement producing a line of output. The format of IO statements varies strongly from language to language. C is quite convenient. In a print statement, the text to print is enclosed between a pair of """ symbols. Possibly we want to print the value of a variable. In that case a combination like "%1d" indicates what kind of variable it is and how much space may be used for it. Here the "%" indicates "take care, here a variable value must be inserted". The "1" is the minimum number of positions to use; if the number requires more positions, it takes as many as it needs. The "d" indicates that here we are going to print an integer. For a float one uses "f", for a char one uses "c", for a string one uses "s".
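The effect of the format combinations can be checked with a small sketch. It uses snprintf, which works like printf but writes into an array of char instead of to the screen (it is available from the C99 standard onwards). The name format_line and the printed values are chosen for illustration only.

```c
#include <stdio.h>

/* Illustrative helper: writes one formatted line into buffer, using at
   most size bytes, and returns the number of characters of the line
   (not counting the terminating '\0'). */
int format_line(char buffer[], size_t size) {
  /* %5d   : an integer using at least 5 positions
     %1d   : an integer using as many positions as it needs
     %5.2f : a floating point number, at least 5 positions, 2 decimals
     %c    : a single char
     %s    : a string */
  return snprintf(buffer, size, "%5d|%1d|%5.2f|%c|%s",
                  42, 567, 49.5, 'x', "hi");
}
```

With a sufficiently large buffer the resulting text is "   42|567|49.50|x|hi": the 42 is padded with blanks to five positions, while 567 simply takes the three positions it needs.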

Reading a value is analogous. Only now we must precede the name of the variable by the symbol "&". In the section on procedures we will see that this symbol means that we are passing the address of the variable, which allows the reading routine to store a value in it. Several values can be read in a single statement: "scanf("%d%f", &a, &x);" can be used to read an integer a and a float x.
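As a hedged sketch of reading several values at once, the routine below uses sscanf, the variant of scanf which reads from a string instead of from the keyboard, so that it can be tried without typing input. The name read_int_and_float is an assumption made for this example. Its parameters a and x are already addresses, which is why no "&" appears inside the routine.

```c
#include <stdio.h>

/* Illustrative helper: reads an integer and a float from text.
   Returns the number of values that were read successfully,
   or EOF if the text ends before anything could be read. */
int read_int_and_float(const char text[], int* a, float* x) {
  return sscanf(text, "%d%f", a, x);
}
```

Checking the returned count is a simple way to detect that the input did not have the expected form: for the text "12 3.5" the routine returns 2, for the text "abc" it returns 0.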

The statement "c = a * b" is an example of a simple computation followed by an assignment. In an assignment the value of an expression on the right is assigned to a variable on the left. All of the usual operators are available in C: +, -, *, /, % for manipulating numbers; |, & and ^ for bitwise operations; ||, && and ! for boolean operations; <, <=, ==, !=, >= and > for comparison. The first four numerical operators are defined for all numerical types, but the division has a different meaning on integers: there "/" is division while throwing away the non-integral part of the result. So, 11 / 3 == 3. "%" is only defined for integers. It returns the remainder of the division, 11 % 3 == 2. Said otherwise, "%" corresponds to a modulo computation. Finally there are the shift operators >> and <<. They work on integers. a >> b returns the value of a when shifting its bit pattern rightwards over b positions, throwing away the least significant b bits. Thus, 55 >> 3 == 6, because 55 == 110111 in binary and 110 == 6. For non-negative a, a >> b is equivalent to a / 2^b, but it can be performed faster because it is one of the primitive operations of most processors. In the same way a << b returns the value of a shifted leftwards over b positions. It is equivalent to a * 2^b. 6 << 3 == 48, because 6 == 110 and 110000 == 48. Notice that (a >> b) << b in general does not give a again. On the other hand, (a << b) >> b == a, provided that no overflow occurs.
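The numerical examples of this paragraph can be reproduced with a few small routines. This is only a sketch; the routine names are chosen for illustration and all arguments are assumed to be non-negative.

```c
/* Integer division: the non-integral part is thrown away. */
int quotient(int a, int b) {
  return a / b;
}

/* Remainder of the integer division. */
int remainder_of(int a, int b) {
  return a % b;
}

/* Shifting right and then left again clears the lowest b bits of a,
   so in general the result is not a itself. */
int clear_low_bits(int a, int b) {
  return (a >> b) << b;
}
```

For example, clear_low_bits(55, 3) gives 48 and not 55, because the three bits 111 at the end of 110111 are lost in the right shift.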

There are even some special operators which are merely shorthands for combinations of the above operators:

  i++;      <->  i = i  + 1;
  i--;      <->  i = i  - 1;
  i  += j;  <->  i = i  + j;
  i  -= j;  <->  i = i  - j;
  i  *= j;  <->  i = i  * j;
  i  /= j;  <->  i = i  / j;
  i  %= j;  <->  i = i  % j;
  i  &= j;  <->  i = i  & j;
  i  ^= j;  <->  i = i  ^ j;
  i  |= j;  <->  i = i  | j;
  i <<= j;  <->  i = i << j;
  i >>= j;  <->  i = i >> j;
One may have different opinions on whether it is good style or not to use these. However, one should never believe that writing the source code more compactly will lead to shorter and faster compiled code: for the compiler there is really no difference between "i += 4" and "i = i + 4", and therefore this will result in exactly the same code.

All operators belong to 1 of 15 priority levels. Any book on C provides a complete table. Here we give a shortened version:

  Priority  Class               Operators                                     Order
  15        bracketlike         ()  []  .                                     normal
  14        unary operators     ++  --  !  +  -  *  &                         reversed
  13        multiplicationlike  *  /  %                                       normal
  12        additionlike        +  -                                          normal
  11        shifts              <<  >>                                        normal
  10        comparisons         <  <=  >  >=                                  normal
  9         equalitylike        ==  !=                                        normal
  8 ... 4   logical             &  ^  |  &&  ||                               normal
  2         assignment          =  +=  -=  *=  /=  %=  &=  ^=  |=  <<=  >>=   reversed
  1         comma               ,                                             normal
Here in every row we first indicate the priority (higher priority operators are executed first), then the operators and finally the execution order for operators in the same priority class. "Normal" indicates execution from left to right, "reversed" indicates execution from right to left. One does not need to know all this by heart. In case of doubt one should rather use brackets than look it up: the compiled code does not become longer because of this, but it becomes much easier to understand the program!
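The effect of priorities and execution order can be seen in the following sketch. The routine names are chosen for illustration only.

```c
/* "*" has higher priority than "+": this is a + (b * c). */
int mixed(int a, int b, int c) {
  return a + b * c;
}

/* Brackets force the addition to be executed first. */
int bracketed(int a, int b, int c) {
  return (a + b) * c;
}

/* "-" is executed from left to right: this is (a - b) - c. */
int chained_minus(int a, int b, int c) {
  return a - b - c;
}

/* "=" is executed from right to left: first y gets the value v,
   then x gets the value of y. The routine returns x + y. */
int chained_assign(int v) {
  int x, y;
  x = y = v;
  return x + y;
}
```

So mixed(2, 3, 4) gives 14 while bracketed(2, 3, 4) gives 20, and chained_minus(10, 4, 3) gives 3 and not 9.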

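The last line of the program is "return 1;". Not all compilers require this, but some of them insist that "main" returns an integer value. That is why we have written "int main()". As far as the compiler is concerned any value will do. The value is handed back to the environment that started the program and can be used to flag where the program was left; by convention 0 signals a successful run and any other value signals a problem.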

The execution of any C program starts at "main". The usage of any variable must be preceded by a declaration. In expressions it is essential to assure the correct execution order.

Structured Data Types

The most important structured data types are given by arrays. An array a of int of length n is declared by writing "int a[n];". Upon declaration the value of n must be known. n can also be replaced by a constant. So, the array construction is indicated by the bracket pair "[", "]". Examples of the usage of arrays will be given further down. If we have two arrays declared by "int a[100], b[100]", then one might think that writing "a = b" copies the values of b[] into a[]. This is not the case: the compiler does not even accept this assignment. The reason is that in most expressions the name of an array stands for a pointer to its first position rather than for the array as a whole. This issue is discussed in more detail further down. All manipulations on an array are done by manipulating the individual positions. So, if we want to copy the values of b[] into a[], we must write
  a[0] = b[0];
  a[1] = b[1];
    ...
  a[99] = b[99];
In the following section we will see how to write this more concisely.

Compound types can also be declared. In general this is a very important construction, but in our applications of C we will rarely encounter it: we use C for simple programs which should be ready as soon as possible and/or which should run very fast. For more elaborate programming an object-oriented language, such as C++ or Java, is much more suitable. Therefore, we only indicate a few aspects of this construction; further details can be found in any book on C.

For example for playing cards we can write

  struct {
    char color;
    int  value; } my_card;
This declares a variable my_card with two fields: a character color and an integer value. A correct assignment is "my_card.value = 11", setting the value field of my_card to 11. Accessing the fields of a compound variable is done in most languages with help of the dot operator ".".

This idea becomes much more useful, when we define a new compound type, so that we can reuse it for several declarations:

  struct playing_card {
    char color;
    int  value; };
Hereafter, we can write:
  struct playing_card my_first_card;
  struct playing_card my_second_card;

We can go one step further by using the keyword "typedef" for defining special types. This can be integrated into the struct definition, but there is no need to do so. If we write

  typedef struct playing_card playing_card_type;
then we can write later in the program
  playing_card_type my_card;
  playing_card_type deck_of_cards[10];
The latter declares an array of playing_card. After this declaration we can write "deck_of_cards[8].value = my_card.value;".

Unlike with arrays, and this is not very intuitive, writing "deck_of_cards[8] = my_card" copies the values of the fields of my_card into the values of the fields of deck_of_cards[8]. Consider the following program:

  #include <stdio.h>
  int main() {
    struct playing_card {
      char color;
      int  value; };
    typedef struct playing_card playing_card_type;
    playing_card_type my_card;
    playing_card_type deck_of_cards[10];
    my_card.color = 's';
    my_card.value = 11;
    deck_of_cards[8] = my_card;
    my_card.color = 'h';
    my_card.value = 6;
    /* Addresses are printed with %p, which expects a pointer to void */
    printf("deck0 = %10p, deck8 = %10p, my_card = %10p\n", 
      (void*) &deck_of_cards[0], (void*) &deck_of_cards[8], (void*) &my_card);
    printf("deck_of_cards[8]: (%1c, %1d)\n", 
      deck_of_cards[8].color, deck_of_cards[8].value);
    return 1; }
The output has the following form (the addresses in the first line, and the exact way they are printed, depend on the machine and the compiler):
  deck0 =   FFBEECD0, deck8 =   FFBEECC8, my_card =   FFBEECC
  deck_of_cards[8]: (s, 11)

Nevertheless, one should be very careful with such assignments. In Java this would not have worked! There a variable of a compound type is actually a pointer to an object, and the assignment would result in deck_of_cards[8] pointing to the same object as my_card. The memory space which deck_of_cards[8] was originally pointing to would have become unreachable by this.

Explicitly defining types with help of "typedef" makes it much easier to work with more complex types. Consider for example the declaration of the array deck_of_cards. It is now entirely analogous to declaring an array of int by writing "int a[10]".
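The difference between assigning structs and assigning arrays can be summarized in a small sketch. The routine names are chosen for illustration; the loop in copy_array is a for statement, one of the repetitive statements of C.

```c
struct playing_card {
  char color;
  int  value;
};

/* Returns 1 if assigning a struct produced an independent copy. */
int struct_copy_is_independent(void) {
  struct playing_card my_card, copy;
  my_card.color = 's';
  my_card.value = 11;
  copy = my_card;       /* copies the values of both fields */
  my_card.value = 6;    /* changes my_card only, not copy   */
  return copy.color == 's' && copy.value == 11;
}

/* Arrays cannot be assigned as a whole: the n values are copied
   position by position. */
void copy_array(int to[], const int from[], int n) {
  int i;
  for (i = 0; i < n; i++)
    to[i] = from[i];
}
```

With "int a[100], b[100]" one would call copy_array(a, b, 100) where one might have hoped to write "a = b".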

Arrays and structs are the most important derived data types. One should be very careful with assignments of derived types. Explicitly defining new types is an important structuring step.

Basic Instructions

A program consists of a specification of the data and data structures that are going to be used and of the operations performed on them. Assignment is the most important category of instructions. The second most important category are conditional statements: testing whether the value of a boolean clause is true or false, and proceeding in different ways in reaction to the outcome. The third thing one definitely needs is some kind of jump statement: an instruction to proceed at another place in the program. These three kinds of statements, "=", "if-then-else" and "goto", are all one needs to write programs.

Working with gotos however tends to lead to programs which are very hard to follow (so-called "spaghetti code"). Therefore, this instruction has been banned (though it is still available even in most modern languages). Instead of the unconditional goto jumps, there are conditional jumps provided by the if-then-else mechanism and by the repetitive statements: statements telling that a block of code must be executed one or more times repeatedly until a boolean condition at the beginning or end becomes false. In the following we describe these instructions for C; with tiny variations they can be found in other computer languages.

If Statement

The if statement has two forms. The first only contains a then-part:
  if (a == 1000)
    a = 0;
The keyword "then" is not written (C likes to save writing, even when this goes at the expense of clearness). The boolean condition is written between round brackets.

The operational semantics of the if statement is as follows: if evaluating the boolean condition results in true (in the example this happens when the variable a equals 1000), then the following statement is executed; otherwise this statement is skipped.

The if statement only applies to the statement immediately following it. If one wants to apply it to several statements, these should be compounded by writing them between curly brackets:

  if (a == 1000) {
    a = 0;
    b++; }
The whole section of the program within the curly brackets is also referred to as a compound statement.

The spacing and the placement of the brackets is not prescribed (though clearly it has a strong impact on the readability). The above code can also be written as

  if (a == 1000) 
  {
    a = 0;
    b++; 
  }
Generally one should choose one fixed way of writing and stick to it. But there are some conventions which are widely accepted. The most important is indentation, which means that for every level the program opens (like going into an if statement), the code is shifted right a fixed number of positions. Two positions is a convenient choice. It is also a good idea to surround all binary operators, including "=", by a blank and to add a blank between if and other keywords and the following conditional. It is a very good idea to put matching curly brackets above each other as is done in the last example: this highlights the structure of the program and you will never have to search for the corresponding bracket. Some people even write curly brackets around a single statement. Even this is a good idea, because if later you (or someone else editing your program!) are adding a statement you cannot forget to add the brackets. Consider the following:
  if (a == 1000) 
    a = 0;
    b++; 
Even this is a correct fragment of C code. The compiler is not clever enough to notice the indentation and to guess that you actually forgot to write the brackets. However, the meaning of this is different from the fragment above: here the statement "b++;" is always executed.

The second form of the if statement also has an else-part:

  if (a == 1000)
    a = 0;
  else {
    a += 4;
    s += a; }
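As a further illustration, here is a minimal sketch of a routine built from nested if statements with else-parts. The name sign is chosen for illustration; curly brackets are written around every part, also where a single statement follows, in line with the advice given above.

```c
/* Illustrative helper: returns -1, 0 or 1 depending on the sign of x. */
int sign(int x) {
  int s;
  if (x < 0) {
    s = -1;
  }
  else {
    if (x == 0) {
      s = 0;
    }
    else {
      s = 1;
    }
  }
  return s;
}
```

Exactly one of the three assignments is executed for any value of x, so s always has a value when it is returned.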
The if statement is the main conditional statement.

While Statement

An if statement is executed once, and then the program goes on with the statements following the end of the else-part (or following the end of the then-part if there is no else-part). The (compound) statement following a while statement is executed as long as evaluating a boolean condition returns true. An example is the following piece of code, which computes the position i of the most significant non-zero bit of a number n, counting the positions from 1:
  i = 0;
  while (n > 0) {
    i++;
    n >>= 1; }

Notice that the (compound) statement following the while ( ... ) may be executed zero times. If we start with n <= 0, the compound statement is not executed at all. If there is something important happening there, like a declaration or an initialization, we may run into trouble further down in the program.
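Wrapped into a routine, the loop can be tried for several values. This is a sketch; the name bit_length is chosen for illustration. Because i is initialized before the loop, the result is well defined even when the body is executed zero times.

```c
/* Illustrative helper: number of bits needed to write n in binary,
   that is, the position of its most significant 1 bit counted from 1.
   For n <= 0 the loop body is never executed and the result is 0. */
int bit_length(int n) {
  int i = 0;
  while (n > 0) {
    i++;
    n >>= 1;
  }
  return i;
}
```

For example bit_length(55) gives 6, because 55 == 110111 has six bits, and bit_length(0) gives 0.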

The while statement has a second form with the conditional at the end of the iterated statement:

  #include <stdio.h>
  int main() {
    int a, b;
    do {
      printf("\nGive the next a and b value to multiply   >>>   ");
      scanf("%d%d", &a, &b);
      printf("a * b = %8d\n", a * b);
      printf("Continue? (type 0 to stop, 1 to continue) >>>   ");
      scanf("%d", &a); }
    while (a == 1); 
    printf("\n"); 
    return 1; }
A possible session looks like

  Give the next a and b value to multiply   >>>   24 5
  a * b =      120
  Continue? (type 0 to stop, 1 to continue) >>>   1    
  
  Give the next a and b value to multiply   >>>   7 8
  a * b =       56
  Continue? (type 0 to stop, 1 to continue) >>>   1
  
  Give the next a and b value to multiply   >>>   4 5
  a * b =       20
  Continue? (type 0 to stop, 1 to continue) >>>   0

Here we really want the loop to be executed at least once, and therefore the usage of the do-while construction is sensible.
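Another case where the body must run at least once is counting the decimal digits of a number: even the number 0 has one digit. The following sketch, with the illustrative name count_digits, uses do-while for this reason.

```c
/* Illustrative helper: number of decimal digits of n.
   The body is executed at least once, so 0 has one digit. */
int count_digits(unsigned int n) {
  int digits = 0;
  do {
    digits++;
    n /= 10;
  }
  while (n > 0);
  return digits;
}
```

With a plain while loop testing n > 0 at the beginning, the result for n == 0 would have been 0 digits.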
The while statement offers an elegant way to realize a conditional repetition.

For Statement

Any iterative statement is informally called a loop. They are distinguished by prefixes. So we may for example speak of a while loop or a for loop.

The for statement is a variant of the while statement most appropriately used in cases where a variable is repeatedly increased by a fixed amount, especially in combination with operations on arrays. The following program initializes all fields of an array and then computes the sum of all values.

  #include <stdio.h>
  #define SIZE 100
  int main() {
    int i, a[SIZE], sum;
    for (i = 0; i < SIZE; i++)
      a[i] = i;
    for (i = 0, sum = 0; i < SIZE; i++)
      sum += a[i];
    printf("The average value is %5.2f\n", (float) sum / SIZE);
    return 1; }
The output is
  The average value is 49.50

The above program contains several new concepts. First we use "#define" to set the value of a constant. The preprocessor simply replaces all occurrences of SIZE by the value 100. So, at runtime this is just as fast as when writing 100 everywhere, but it nevertheless gives us the possibility to later change the array size in a simple way. We also see how to print a floating point number. Here we specify that it should take at least 5 positions in total, of which exactly 2 are used for the decimal positions. Then in "(float) sum / SIZE" we are dividing two integers. The default way of doing this is to use the integer division, which means that the result is rounded down. Here we want the exact answer (within the limits of the precision of floats), so we want the floating-point division to be applied. Therefore we write "(float)". This is called a cast, which means a forced type conversion: for the duration of this statement the value of the variable sum is converted to a float. The rule is that any operation is performed at the most precise level of any of its arguments: int plus int gives int; int + float gives float; int + double gives double; float + double gives double; ... .

Let us now consider the actual for loops. The first is the simplest. Between the round brackets we find three parts separated by semicolons. The first part is executed once before anything else. The second part is a boolean condition which is evaluated before each execution of the following (compound) statement. The third part is executed after each execution of the compound statement. Thus

    for (i = 0; i < SIZE; i++)
      a[i] = i;
is equivalent to
    i = 0;
    while (i < SIZE) {
      a[i] = i;
      i++; }
The first and the third part may also be empty, thus we might also write
    i = 0;
    for ( ; i < SIZE; ) {
      a[i] = i;
      i++; }
though this cannot be called an improvement. Sometimes it is mainly a matter of taste whether one wants to use a for loop or a while loop. In the second for loop we see that the first part may actually consist of several instructions. These instructions must be separated by the comma operator. The same applies to the third part. Thus, we might also have written
    for (i = 0, sum = 0; i < SIZE; sum += a[i], i++);
This is nice and short, but does not necessarily express so clearly what is happening. Here we see that the statement following the for ( ... ) may in particular also be empty: the empty statement is also a statement! We also see the extreme importance of being precise: it is the semicolon which marks the end of the for loop; without it, the following statement, if any, would have been considered to constitute the body of the loop, giving a quite different computation.
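The counting loop and the cast can be combined in a few small routines. This is a sketch under the assumption that n > 0; the routine names are chosen for illustration.

```c
/* Sum of the n values of a, computed with a counting loop. */
int sum_of(const int a[], int n) {
  int i, sum;
  for (i = 0, sum = 0; i < n; i++)
    sum += a[i];
  return sum;
}

/* Exact average: the cast forces a floating-point division. */
double average(const int a[], int n) {
  return (double) sum_of(a, n) / n;
}

/* Integer average: without a cast the integer division is used and
   the non-integral part of the result is thrown away. */
int integer_average(const int a[], int n) {
  return sum_of(a, n) / n;
}
```

For the array with values 1, 2, 3, 4 the routine average gives 2.5, whereas integer_average gives 2.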
The for statement should be used instead of a while statement when the repetition has a counting nature.

Comments

The last important type of instructions are comments. They are not important for the functioning of the program (they are simply removed at some stage of the compilation, and therefore they have no influence on the length of the compiled code or the speed of it), but they are essential for understanding the program. This is particularly important for programs which you or someone else is going to use again after some time: even the best code is hard to understand without hints on the purpose and functioning. In C comments are enclosed between matching pairs of "/*" and "*/" symbols. Inside you can write anything except "*/". Thus, we might have
  /* Written by Jop Sibeyn, 25.10.2002 
     This program computes the average value of the elements 
     of an array */

  #include <stdio.h>
  #define SIZE 100 /* The length of the array used */

  int main() {
    int i, a[SIZE], sum; 

    /* Initialization */
    for (i = 0; i < SIZE; i++)
      a[i] = i;

    /* The main part of the program */
    for (i = 0, sum = 0; i < SIZE; i++)
      sum += a[i];

    /* Printing the results */
    printf("The average value is %5.2f\n", (float) sum / SIZE);

    /* Finishing */
    return 1; }

You should not write books, but concise explanatory phrases are essential. Any complete program of more than 50 lines should be commented.

Comments are essential for understanding a program.

Procedures

A procedure is a subsection of a program performing a specific subtask. Especially if the same subtask is performed at several places in the program, using a procedure helps to save code. Saving code helps to save typing and even the compiled code becomes shorter. This implies that a larger part of the compiled code can be maintained in the cache, so that the program may run faster. Using a procedure at several places also assures that at each of these places the used code is exactly the same and will remain the same even when performing updates. Even more important is that using procedures allows one to structure a program, like dividing a book in chapters and the chapters in sections. Procedures may be called with parameters and they may terminate by handing back a return value. Procedures returning a value are sometimes called functions. All modern imperative programming languages provide some kind of procedure mechanism. These are very similar.

In C, procedures returning a value have the following form:

  return_type procedure_name(parameter_type parameter_name, ... )
  {
    instruction;
    ...
    instruction;
    return variable_of_return_type;
  }
Here the first line is called the header. Procedures not returning a value look slightly different:
  void procedure_name(parameter_type parameter_name, ... )
  {
    instruction;
    ...
    instruction;
  }
The special word void stands for no type and the return statement is omitted. Inside the procedure new variables may be declared and used in the same way as the parameters.

A procedure is called by writing its name and specifying all parameters:

  my_procedure(parameter_name, ... );
For procedures returning a value it is most common to use this value in some way, for example in an assignment:
  variable_of_return_type = my_procedure(parameter_name, ... );
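A concrete instance of both forms might look as follows. This is only a sketch; the names maximum and print_maximum are chosen for illustration.

```c
#include <stdio.h>

/* A procedure returning a value: the larger of its two parameters. */
int maximum(int a, int b)
{
  int m;
  if (a > b)
    m = a;
  else
    m = b;
  return m;
}

/* A procedure not returning a value: it only produces output.
   It calls maximum and uses the returned value directly. */
void print_maximum(int a, int b)
{
  printf("The maximum of %d and %d is %d\n", a, b, maximum(a, b));
}
```

A call looks like "m = maximum(x, y);" for the first procedure and like "print_maximum(x, y);" for the second.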
Procedures offer the possibility to reuse the same piece of code at several places in the program and are the main way of structuring a program in C.

The most important non-trivial aspect of handling procedures is the status of the parameters. What happens with them inside the procedure? In any case their value outside is available within the procedure, so they are more than just variables: they have a value from the start. Can we assign values to the parameters and what happens then? In most languages one can assign to all parameters. But it depends on the kind of variables whether this effect becomes visible outside the procedure or not. At this point there are differences between computer languages. In many languages, C is among these, the user can specify this; in some languages, Java is one of them, the status of the parameters only depends on their type.

Consider the following two C procedures:

  #include <stdio.h>

  void local_swap(int lx, int ly) {
    int z;
     z = lx;
    lx = ly;
    ly =  z;
    printf("Values in local_swap:\n");
    printf("  x = %10d,   y = %10d\n",  lx,  ly);
    printf(" ax = %10p,  ay = %10p\n", (void*) &lx, (void*) &ly); }

  void global_swap(int* rx, int* ry) {
    int z;
      z = *rx;
    *rx = *ry;
    *ry =   z;
    printf("Values in global_swap:\n");
    printf("  x = %10d,   y = %10d\n", *rx, *ry);
    printf(" ax = %10p,  ay = %10p\n", (void*) rx, (void*) ry); }

  int main() {
    int x, y;
    x = 17;
    y = 23;
    printf("Values in main at beginning:\n");
    printf("  x = %10d,   y = %10d\n",  x,  y);
    printf(" ax = %10p,  ay = %10p\n", (void*) &x, (void*) &y);
    local_swap(x, y);
    printf("Values in main after local_swap:\n");
    printf("  x = %10d,   y = %10d\n",  x,  y);
    printf(" ax = %10p,  ay = %10p\n", (void*) &x, (void*) &y);
    global_swap(&x, &y);
    printf("Values in main after global_swap:\n");
    printf("  x = %10d,   y = %10d\n",  x,  y);
    printf(" ax = %10p,  ay = %10p\n", (void*) &x, (void*) &y);
    return 1; }
It does not matter if not all details are clear. Actually we are not doing more than calling two simple procedures and generating some formatted output. Running the program gives output of the following form:
  Values in main at beginning:
   x =         17,  y =         23
  ax =   FFBEED2C, ay =   FFBEED28
  Values in local_swap:
   x =         23,  y =         17
  ax =   FFBEED0C, ay =   FFBEED10
  Values in main after local_swap:
   x =         17,  y =         23
  ax =   FFBEED2C, ay =   FFBEED28
  Values in global_swap:
   x =         23,  y =         17
  ax =   FFBEED2C, ay =   FFBEED28
  Values in main after global_swap:
   x =         23,  y =         17
  ax =   FFBEED2C, ay =   FFBEED28
Integer values are given in decimal notation, addresses are printed hexadecimally.

Values of the variables x, xl and xr

Apparently, the exchange of x and y in local_swap does not become visible outside the procedure itself. The reason is that in this case the procedure is called by value: the parameters are nothing else than local variables which are initialized with the values passed when calling the procedure.

On the other hand, the exchange of x and y in global_swap does become visible outside the procedure. The reason is that in this case the procedure is called by reference: when calling the procedure we do not pass the values of the variables, but their addresses. This is indicated by the address operator "&" (the same symbol is also used as the binary operator bitwise and). In the header of the procedure this is reflected in the usage of the symbol "*": the parameter rx is not an integer, but a pointer to an integer, also called a reference to an integer. The value a pointer variable points to is accessed by preceding the variable with the dereferencing operator "*", also called indirection operator. Strictly speaking C passes every parameter by value; here the value that is passed is an address, which has the same effect as a call by reference. In the above example the printed values of the addresses depend on the computer the program is run on and the compiler used.
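The same distinction can be shown in an even smaller sketch. The routine names are chosen for illustration. The last routine shows a common use of pointer parameters: handing back more than one result.

```c
/* Called by value: only the local copy v is changed. */
void increment_copy(int v) {
  v++;
}

/* The address is passed, so the variable of the caller is changed. */
void increment_in_place(int* v) {
  (*v)++;
}

/* Two results are handed back through pointer parameters.
   It is assumed that b is not 0. */
void divide(int a, int b, int* quotient, int* remainder) {
  *quotient  = a / b;
  *remainder = a % b;
}
```

After "x = 5; increment_copy(x);" the variable x still has the value 5, whereas after "increment_in_place(&x);" it has the value 6.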

There is an essential difference between call-by-value and call-by-reference.

Arrays == Pointers ?

In C there is no very strict type management. This is certainly one of the reasons why C leads to somewhat faster programs than most other languages, and it also creates opportunities. At the same time it leads to errors which are hard to detect. A clever programmer therefore handles the offered possibilities very carefully.

Address Arithmetic

An array in C is closely related to a pointer. There are some differences though, which we will discuss here. The typical way of defining an array is as follows:
  int a[100];
This instruction creates space for storing 100 integers, which amounts to 400 bytes if an int takes four bytes, as we assume in the following. This space can be accessed with the help of the name a. Actually there is an internal mechanism, which is called address arithmetic, which is used to access the stored values: if we write "b = a[23]", then the value of a is fetched, this is a memory address, namely the address of a[0], then 23 * 4 is added to this value, and the integer at this address (the value represented by the 4 bytes starting at the specified address) is returned. Therefore, the above assignment is equivalent to writing "b = *(a + 23)". The most remarkable feature is that in the source code we only need to add 23 and not 4 * 23, because the compiler knows that a is an array of integers.
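Address arithmetic can be made visible with the following sketch. The routine names are chosen for illustration. The cast to a pointer to char is used to measure distances in bytes, because a char occupies exactly one byte.

```c
#include <stddef.h>

/* The same as a[i], written with explicit address arithmetic. */
int element_at(const int* a, int i) {
  return *(a + i);
}

/* Distance in bytes between a[0] and a[i]. Adding i to a moves the
   address by i times the size of an int, not by i bytes. */
size_t byte_offset(const int* a, int i) {
  return (size_t) ((const char*) (a + i) - (const char*) a);
}
```

For an array of int, byte_offset(a, 23) equals 23 * sizeof(int), which is 92 on a machine with four-byte ints.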

The header of a procedure with an integer array argument may be written in two ways. At first glance the most natural way is to write

  int sum(int a[], int n) {
    int i, s;
    for (i = s = 0; i < n; i++)
      s += a[i]; 
    return s; }
In this way it is stated explicitly that the argument must be an integer array. Alternatively, one may write
  int sum(int* a, int n) {
    int i, s;
    for (i = s = 0; i < n; i++)
      s += a[i]; 
    return s; }
Both procedures are correct C, and they are equivalent: in a parameter list "int a[]" means exactly the same as "int* a".
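Because the procedure only receives an address, it can just as well be started in the middle of an array. A small sketch, in which sum_range is a hypothetical helper of our own:

```c
/* Same procedure as in the text, with the pointer form of the header. */
int sum(int* a, int n) {
  int i, s;
  for (i = s = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Hypothetical helper: sum of a[lo], ..., a[hi - 1]. The expression
   a + lo is the address of a[lo], so sum starts reading there. */
int sum_range(int a[], int lo, int hi) {
  return sum(a + lo, hi - lo);
}
```

With a[] holding the values 1, 2, 3, 4, 5, the call sum_range(a, 1, 4) adds up 2, 3 and 4.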

Exchanging the Values of Arrays

Suppose now that we want to exchange two arrays a[] and b[]. We do not care where the data are stored; we only want that afterwards a[i] has the original value of b[i] and vice versa, for all 0 <= i < n.

A possible way of doing this is by simply exchanging all values:

  #include <stdio.h>

  void initialize(int a[], int b[], int n) {
    int i;
    for (i = 0; i < n; i++) {
      a[i] =  i; 
      b[i] = -i; } }

  void simple_exchange(int a[], int b[], int n) {
    int i, c;
    for (i = 0; i < n; i++) {
      c    = a[i];
      a[i] = b[i];
      b[i] = c; } }

  void print_arrays(int a[], int b[], int n) {
    int i;
    printf("\n");
    for (i = 0; i < n; i++)
      printf("a[%2d] = %4d, b[%2d] = %4d\n", i, a[i], i, b[i]); }

  int main() {
    int n = 10, a[n], b[n];
    initialize(a, b, n);
    print_arrays(a, b, n);
    simple_exchange(a, b, n);
    print_arrays(a, b, n); }
This procedure takes time proportional to n. One might fear that it has the same problem as local_swap above, but that is not the case. The reason is that an array which is passed as an argument is handed over as a pointer to its first element, so the procedure works on the data of the caller and not on a copy. The output of the program is:
  a[ 0] =    0, b[ 0] =    0
  a[ 1] =    1, b[ 1] =   -1
  a[ 2] =    2, b[ 2] =   -2
  a[ 3] =    3, b[ 3] =   -3
  a[ 4] =    4, b[ 4] =   -4
  a[ 5] =    5, b[ 5] =   -5
  a[ 6] =    6, b[ 6] =   -6
  a[ 7] =    7, b[ 7] =   -7
  a[ 8] =    8, b[ 8] =   -8
  a[ 9] =    9, b[ 9] =   -9
  
  a[ 0] =    0, b[ 0] =    0
  a[ 1] =   -1, b[ 1] =    1
  a[ 2] =   -2, b[ 2] =    2
  a[ 3] =   -3, b[ 3] =    3
  a[ 4] =   -4, b[ 4] =    4
  a[ 5] =   -5, b[ 5] =    5
  a[ 6] =   -6, b[ 6] =    6
  a[ 7] =   -7, b[ 7] =    7
  a[ 8] =   -8, b[ 8] =    8
  a[ 9] =   -9, b[ 9] =    9

This is nice, but not very efficient. The operation can also be performed in constant time: we do not have to exchange all elements; it is sufficient to exchange the values of a and b. Afterwards a points to the first position of the original b and vice versa. That is, we want to perform a procedure like global_swap with parameters of type "array of integer". For more complex operations like this, it becomes much more convenient not to mix the array notation with the pointer notation:

  #include <stdio.h>
  #include <stdlib.h>

  void initialize(int a[], int b[], int n) {
    int i;
    for (i = 0; i < n; i++) {
      a[i] =  i; 
      b[i] = -i; } }

  void fast_exchange(int** aa, int** ab) {
    int* c ;
      c = *aa;
    *aa = *ab;
    *ab =   c; }

  void print_arrays(int a[], int b[], int n) {
    int i;
    printf("\n");
    for (i = 0; i < n; i++)
      printf("a[%2d] = %4d, b[%2d] = %4d\n", i, a[i], i, b[i]); }

  int main() {
    int n = 10; 
    int* a = (int*) malloc(n * sizeof(int));
    int* b = (int*) malloc(n * sizeof(int));
    initialize(a, b, n);
    print_arrays(a, b, n);
    fast_exchange(&a, &b);
    print_arrays(a, b, n);
    free(b);
    free(a); }
This produces the same output as the program above.

Values and addresses of arrays

Understanding and Working With Pointers

It is important to understand the above program in detail. This time a and b are declared of type int*. Exchanging two variables of type int* is performed entirely analogously to exchanging two variables of type int in global_swap. The only difference is that here all variables have one extra *.
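That only the two pointers are exchanged, and not a single array element, can be made visible as follows. This is a sketch; pointers_were_exchanged is a hypothetical helper of our own.

```c
/* Same procedure as fast_exchange in the text. */
void fast_exchange(int** aa, int** ab) {
  int* c = *aa;
  *aa = *ab;
  *ab = c;
}

/* Hypothetical check: exchanges the two pointers and reports whether each
   of them now holds the address the other one held before the call. */
int pointers_were_exchanged(int** aa, int** ab) {
  int* old_a = *aa;
  int* old_b = *ab;
  fast_exchange(aa, ab);
  return *aa == old_b && *ab == old_a;
}
```

Note that this only works for variables of type int*. For an array declared as "int a[100]" the name a cannot be assigned to, and &a is not of type int**, so such arrays cannot be exchanged in this way.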

Different from an array declaration, declaring a variable of type int* does not immediately allocate a whole lot of memory. A variable of type int* has a size of four (possibly eight) bytes. The standard procedure malloc is used to allocate memory. The number of bytes is passed as an argument. We might write 4 * n, but then we would explicitly use that integers are four bytes long, and the program would not work on a system where integers consist of eight bytes. The procedure malloc returns a typeless pointer, void*. In C a void* is converted implicitly when it is assigned to an int*, so the cast is not strictly required there (in C++ it is). Here we nevertheless precede the call by "(int*)", making the type conversion of the result explicit. Said otherwise, the result is cast to int*.
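A call to malloc may fail, in which case it returns NULL. A careful allocation therefore looks roughly as follows; the name make_int_array and the choice to return NULL for n <= 0 are our own assumptions.

```c
#include <stdlib.h>

/* Hypothetical helper: allocates n ints and sets them to 0, 1, ..., n - 1.
   Returns NULL when n is not positive or when the allocation fails. */
int* make_int_array(int n) {
  int i;
  int* a;
  if (n <= 0)
    return NULL;
  a = malloc((size_t) n * sizeof *a); /* sizeof *a is the size of one int */
  if (a == NULL)
    return NULL;
  for (i = 0; i < n; i++)
    a[i] = i;
  return a;
}
```

Writing "sizeof *a" instead of "sizeof(int)" has the small advantage that the expression stays correct when the element type of a is changed later. The caller is responsible for passing the returned pointer to free.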

At the end of the main procedure above we find the calls to the standard procedure free. This procedure deallocates the memory a pointer is pointing to. Of course, at the end of a program all memory is deallocated, so in this case these statements are superfluous. However, in general it is important to manage the memory carefully, making sure that the program does not create garbage: allocated memory which cannot be reached anymore by following any of the pointers. Garbage would be created if at some stage in the program we wrote "a = b;" or if we had a second malloc statement for a. Forgetting to free memory is an important source of problems. Suppose that instead of simple_exchange we were doing the following:

  void stupid_exchange(int a[], int b[], int n) {
    int i;
    int* c = (int*) malloc(n * sizeof(int));
    for (i = 0; i < n; i++)
      c[i] = a[i];
    for (i = 0; i < n; i++)
      a[i] = b[i];
    for (i = 0; i < n; i++)
      b[i] = c[i]; } /* c is never freed */
Constructions of this kind, reducing the number of different variables in each loop, might have advantages when the cache associativity is low (stupid_exchange is guaranteed to work fine already for a two-way associative cache). However, this procedure leaves behind n * sizeof(int) bytes of garbage. If we call this procedure many times, we will run out of memory, even when the program actually needs only a small fraction of it. Every good programmer is disciplined enough to match each malloc (or similar operation) with a corresponding free, in the same way as any "{" is matched by a corresponding "}".
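For comparison, a variant which matches its malloc with a free might look as follows. This is a sketch under our own assumptions: the name buffered_exchange and the return value signalling success are not part of the original program.

```c
#include <stdlib.h>

/* Hypothetical variant of stupid_exchange which releases its buffer.
   Returns 1 on success and 0 if the buffer could not be allocated. */
int buffered_exchange(int a[], int b[], int n) {
  int i;
  int* c;
  if (n <= 0)
    return 1;                 /* nothing to exchange */
  c = malloc((size_t) n * sizeof *c);
  if (c == NULL)
    return 0;
  for (i = 0; i < n; i++)
    c[i] = a[i];
  for (i = 0; i < n; i++)
    a[i] = b[i];
  for (i = 0; i < n; i++)
    b[i] = c[i];
  free(c);                    /* matches the malloc above: no garbage */
  return 1;
}
```

Calling this procedure any number of times leaves the amount of allocated memory unchanged.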

Now one might fear that the following procedure leaves garbage behind as well:

  void not_so_stupid_exchange(int a[], int b[], int n) {
    int i;
    int c[n];
    for (i = 0; i < n; i++)
      c[i] = a[i];
    for (i = 0; i < n; i++)
      a[i] = b[i];
    for (i = 0; i < n; i++)
      b[i] = c[i]; }
But here we touch on the counterpart of the automatic memory allocation of an array: just as the memory is allocated implicitly, it is also automatically deallocated at the end of the procedure.

In particular this means that one should not assign a local array to a pointer variable which is going to be used outside the procedure. Consider the following program:

  void incorrect_initialize(int a[], int n) {
    int i;
    int b[n];
    for (i = 0; i < n; i++)
      b[i] = i;
    a = b; } /* b ceases to exist when the procedure returns */

  void print_array(int a[], int n) {
    int i;
    printf("\n");
    for (i = 0; i < n; i++)
      printf("a[%2d] = %4d\n", i, a[i]); }

  int main() {
    int n = 10;
    int* a;
    incorrect_initialize(a, n);
    print_array(a, n); }
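This program is syntactically correct. However, when running it, it will probably crash with "segmentation fault", one of the most common errors. It means that the program is trying to access memory which it does not own. Here the assignment "a = b" only changes the local copy of the pointer, so the variable a in main is never set; and even if the address of b were passed back, the memory of b is released when the procedure returns. Another common error is "bus error", which in most cases also indicates an invalid memory access. So, notwithstanding its syntactical correctness, this program has runtime errors.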
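One possible way to repair the initialization is to pass the address of the pointer and to take the memory from malloc, so that it survives the end of the procedure. The following is a sketch; the name correct_initialize and the return value signalling success are our own choices.

```c
#include <stdlib.h>

/* Hypothetical corrected version: the caller passes the address of its
   pointer, and the memory comes from malloc, so it outlives the call.
   Returns 1 on success, 0 if n is not positive or the allocation fails. */
int correct_initialize(int** a, int n) {
  int i;
  int* b;
  if (n <= 0)
    return 0;
  b = malloc((size_t) n * sizeof *b);
  if (b == NULL)
    return 0;
  for (i = 0; i < n; i++)
    b[i] = i;
  *a = b;   /* changes the pointer of the caller, not a local copy */
  return 1;
}
```

In main one would then declare "int* a;", call correct_initialize(&a, n), and finish with free(a) once the array is no longer needed.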

Working With Very Large Arrays

Above we considered the program for computing the average value of the elements in an array. The array length was specified using the command "#define SIZE 100". Alternatively, one may want to read the size of the array at runtime: otherwise the program must be recompiled for any size we want to test the program for. One possible solution is to define the array only after its size has been read:
  #include <stdio.h>

  int main() {
    int i, size, sum;
    printf("Give size   >>>   ");
    scanf("%d", &size);
    int a[size];
    for (i = 0; i < size; i++)
      a[i] = i;
    for (i = 0, sum = 0; i < size; i++)
      sum += a[i];
    printf("The average value is %5.2f\n", (float) sum / size);
    return 0; }

Nothing speaks against this except the practice: a local array is typically allocated on the stack, and the stack is limited in size, so one cannot allocate very large arrays in this way. On my computer the maximum size is only about 8 MB. This problem can be remedied in two possible ways. The easiest is to declare the array outside any procedure, before main, and to use only the fraction which is needed:

  #include <stdio.h>
  #include <stdlib.h>
  #define MAXSIZE 10000000
  int a[MAXSIZE];
  int main() {
    int i, size, sum;
    printf("Give size   >>>   ");
    scanf("%d", &size);
    if (size > MAXSIZE) {
      printf("Size is too large, exiting the program!\n");
      exit(1); }
    for (i = 0; i < size; i++)
      a[i] = i;
    for (i = 0, sum = 0; i < size; i++)
      sum += a[i];
    printf("The average value is %5.2f\n", (float) sum / size);
    return 0; }

Here we touch on one more important aspect of variables: their visibility. A variable is visible, that is, it can be used, only within a certain scope. A variable which is declared within a procedure can be used anywhere within this procedure, but not in another procedure: it is a local variable. This also means that it is no problem to use the same variable names in several procedures. This is essential! Otherwise, it would be very hard to add a new procedure at a later time to an existing program: one would need a complete overview of all names used anywhere in the program. Variables can be even more local than the procedure: new variables may be defined inside an if, for, while or compound statement. These are invisible outside it. Even these variables may have the same names as other variables within the same procedure. In that case we say that these other variables are shielded. In general shielding a variable is not a good idea, because it is confusing. The other extreme is variables which are not local to any procedure: global variables. In C these are declared outside the procedures, typically at the beginning of the program, like the array a[] above. These are visible everywhere. Not only their visibility is different: because the compiler already knows from the start that these variables are going to be there, the memory for them is allocated in a different way than for local variables. As a consequence, in the above program it is no problem to declare a global array with 10.000.000 ints, whereas this is not possible inside main.
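The visibility rules can be illustrated with a small sketch in which the same name is used at three levels. All names here are hypothetical; shielding is often also called shadowing.

```c
int counter = 100;            /* global: visible in every procedure */

int shielding_demo(void) {
  int counter = 1;            /* local: shields the global counter */
  {
    int counter = 2;          /* even more local: shields the previous one */
    counter = counter + 10;   /* changes only the innermost variable */
  }
  return counter;             /* the local variable, which is still 1 */
}

int read_global(void) {
  return counter;             /* the global variable, which is still 100 */
}
```

The three variables are entirely independent of each other, which is exactly why such code is confusing to read.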

So, the program with the global array works fine, but it is kind of stupid to declare an array of maximum size just in case. Using malloc gives us the best of both: we can declare very large arrays, without wasting memory when size is chosen small.

  #include <stdio.h>
  #include <stdlib.h>

  int main() {
    int i, size, sum;
    printf("Give size   >>>   ");
    scanf("%d", &size);
    int* a = (int*) malloc(size * sizeof(int));
    if (a == NULL) {
      printf("Not enough memory, exiting the program!\n");
      exit(1); }
    for (i = 0; i < size; i++)
      a[i] = i;
    for (i = 0, sum = 0; i < size; i++)
      sum += a[i];
    printf("The average value is %5.2f\n", (float) sum / size);
    free(a);
    return 0; }

The above program works fine except for possible number overflow. The reason is that the final value of sum is size * (size - 1) / 2, which no longer fits in a 32-bit int once size exceeds 65536. A possibility is to use a long for sum. However, a long is not guaranteed to be longer than an int: on some current systems both have 32 bits. An ugly, but practical, solution is to use a double. On practically all current computers a double has 52 bits for its mantissa, so that integers of this magnitude are still represented exactly.
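Another way around the overflow, sketched here under the assumption that a C99 compiler is available, is to accumulate the sum in a long long, which has at least 64 bits. The names sum_below and average_below are our own.

```c
/* Hypothetical helper: computes 0 + 1 + ... + (size - 1), accumulated in
   a long long, which is at least 64 bits wide in C99 and later. */
long long sum_below(int size) {
  long long sum = 0;
  int i;
  for (i = 0; i < size; i++)
    sum += i;
  return sum;
}

/* Hypothetical helper: the average value; size must be positive. */
double average_below(int size) {
  return (double) sum_below(size) / size;
}
```

For size = 100000 the sum is 4999950000, which is more than twice the largest value of a 32-bit int.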

There is a close but imperfect relation between arrays and pointers. Arrays are convenient, but often it is better to use pointers and allocate the memory with malloc and deallocate it with free.

Booleans == Bytes ?

As pointed out above, in C there is no special data type for booleans. Of course this can be repaired by defining a type boolean; if in addition constants true and false are defined, we can work with logic variables in a convenient way. Adding the following lines to the program is sufficient:
  #define false 0
  #define true  1
  typedef char  boolean;
Here a boolean is defined to be a char. In this way every boolean is stored in one byte (8 bits). Because booleans are bi-valued, in principle a single bit would be sufficient. For single boolean variables this does not matter. But, arrays of booleans offer a natural way of realizing a set data structure: if set[i] is true, then the element with index i is an element of the set, otherwise it is not. If the set is large, then it is undesirable to use a byte for each possible element. The underlying problem is that the byte is the smallest addressable memory unit: the memory is organized as a set of bytes.

The solution is to view a byte not as a number from 0 to 255, but as 8 bits packed together. So, for storing an array of n booleans, we use an array of n / 8 (rounded up) bytes, and store 8 booleans in each of them. The values of the individual bits can be set and read in a constant number (one or two) of clock cycles using the bitwise operations: in most programming languages there are not only instructions to perform operations on booleans, characters and numbers, but one can also perform 8-, 32- or even 64-bit operations in one stroke. Because there is a one-to-one correspondence between bit operations and boolean operations, this feature makes it possible to perform 32 or even 64 boolean operations in one clock cycle, provided that all these operations are of the same kind. The language C provides such bitwise operations: bitwise-and ("&"), bitwise-or ("|") and bitwise-exor ("^"). Bitwise-not can be obtained by computing bitwise-exor with FFFFFFFF; C also offers the operator "~" for this.
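The bitwise operations can be tried out with a few one-line procedures. This sketch assumes that an unsigned int has 32 bits; the procedure names are our own.

```c
/* Hypothetical helpers: each performs 32 boolean operations at once.
   Assumes that unsigned int is 32 bits wide. */
unsigned int bits_and(unsigned int x, unsigned int y)  { return x & y; }
unsigned int bits_or(unsigned int x, unsigned int y)   { return x | y; }
unsigned int bits_exor(unsigned int x, unsigned int y) { return x ^ y; }

/* Two ways of inverting all 32 bits; they give the same result. */
unsigned int invert_with_exor(unsigned int x) { return x ^ 0xFFFFFFFFu; }
unsigned int invert_with_not(unsigned int x)  { return ~x; }
```

For example, for x = F0F0 and y = FF00 (hexadecimal) bitwise-and gives F000, bitwise-or gives FFF0 and bitwise-exor gives 0FF0.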

Memory-efficient computation requires that arrays of booleans are packed with 8 booleans per byte. This feature is normally not provided by default.

Packing 8 booleans in one byte

In C all this is easy to realize:

  #define ALL_ONE 0xFFFFFFFFu
  typedef unsigned int boolean;

  void set_value(boolean* a, int i, boolean x) {
    /* Assign value x to boolean i of the array. */
    int j = i >> 5; /* This is equivalent to j = i / 32. */
    int k = i & 31; /* This is equivalent to k = i % 32. */
    boolean l = 1u << k; /* This is equivalent to l = 2^k. */
    if (x)
    {
      a[j] = a[j] | l;
    }
    else
    {
      l = l ^ ALL_ONE; /* This inverts all bits in l. */
      a[j] = a[j] & l;
    } }
    
  boolean get_value(boolean* a, int i) {
    /* Return the value of boolean i of the array. */
    int j = i >> 5; /* This is equivalent to j = i / 32. */
    int k = i & 31; /* This is equivalent to k = i % 32. */
    boolean l = 1u << k; /* This is equivalent to l = 2^k. */
    return a[j] & l; }
For a boolean variable x it is ugly to write something like "if (x == true)": x has a truth-value itself, so one should rather write "if (x)". With get_value as given above this is even necessary, because it returns some non-zero value for true, which is not necessarily 1. In the above we heavily use the bitwise operations. If applicable, these are much faster than the equivalent formulations with divisions: all bitwise operations may be assumed to take a single clock cycle, while division is not an elementary operation. The subroutines can be written more compactly as follows:
  #define ALL_ONE 0xFFFFFFFFu
  typedef unsigned int boolean;

  void set_value(boolean* a, int i, boolean x) {
    /* Assign value x to boolean i of the array. */
    if (x)
      a[i >> 5] |= 1u << (i & 31);
    else
      a[i >> 5] &= (1u << (i & 31)) ^ ALL_ONE; }
    
  boolean get_value(boolean* a, int i) {
    /* Return the value of boolean i of the array. */
    return a[i >> 5] & (1u << (i & 31)); }
This shows clearly that the operations are very simple and can be expected to take very little time. Because the value of x is mostly known by the caller, it is more efficient to replace set_value() by set_true() and set_false(). If efficiency is really a concern, then the procedure call should be avoided, inlining the above instructions.
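The split into set_true() and set_false() might look as follows. This is a sketch: it assumes 32-bit unsigned ints and a compiler supporting the C99 keyword inline, and this get_value differs from the one above in that it returns exactly 0 or 1.

```c
typedef unsigned int boolean;

/* Set boolean i of the array to true. */
static inline void set_true(boolean* a, int i) {
  a[i >> 5] |= 1u << (i & 31);
}

/* Set boolean i of the array to false. */
static inline void set_false(boolean* a, int i) {
  a[i >> 5] &= ~(1u << (i & 31));
}

/* Return boolean i of the array as exactly 0 or 1. */
static inline boolean get_value(boolean* a, int i) {
  return (a[i >> 5] >> (i & 31)) & 1u;
}
```

Declaring the procedures static inline invites the compiler to substitute the instructions at the place of the call, which gives the effect of inlining by hand without making the program harder to read.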

Because of cache effects, saving memory in many cases implies that all operations go faster. Therefore, the four or five extra elementary operations that must be performed to access the fields of the array are typically outweighed by faster memory access, and therefore packing booleans will mostly even lead to faster execution: there is no memory-time trade-off. These subroutines integrated in a running C program can be downloaded here.

In the above implementation, instead of chars we have chosen to use unsigned ints for the array of booleans. An unsigned int does not have a reserved sign bit, and is therefore ideally suited for our purposes. Using unsigned ints may imply the waste of at most three bytes at the end of the array, but it has the advantage that many operations which involve all booleans of the array can be performed with 32-fold parallelism. This is particularly important when using such an array of booleans to implement a set: the union and intersection of two sets with n elements each can be computed with round_up(n / 32) operations. Exploiting the w-fold parallelism provided by the bitwise operations of a computer with w-bit word length is called bit parallelism. For example, allocating an array of booleans and initializing all of them to false can be performed as follows:

  boolean* make_array(int n) {
    int i;
    n = (n + 31) / 32;
    boolean* a = (boolean*) malloc(n * sizeof(boolean));
    for (i = 0; i < n; i++)
      a[i] = 0; 
    return a; }
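The union and intersection mentioned above can be sketched in the same style. The names set_union and set_intersection are our own; all three arrays are assumed to have been allocated for n booleans, for example with make_array.

```c
typedef unsigned int boolean;

/* c becomes the union of a and b, for sets over the indices 0, ..., n - 1. */
void set_union(boolean* c, const boolean* a, const boolean* b, int n) {
  int i, words = (n + 31) / 32;
  for (i = 0; i < words; i++)
    c[i] = a[i] | b[i];   /* handles 32 elements per operation */
}

/* c becomes the intersection of a and b. */
void set_intersection(boolean* c, const boolean* a, const boolean* b, int n) {
  int i, words = (n + 31) / 32;
  for (i = 0; i < words; i++)
    c[i] = a[i] & b[i];   /* handles 32 elements per operation */
}
```

Each loop performs round_up(n / 32) bitwise operations, independently of how many elements the sets actually contain.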

Exercises

  1. Consider the problem of inverting an integer array a[] of length n. For example, an array with values [12, 4, 6, 33, 5] should be turned into an array with values [5, 33, 6, 4, 12]. The task is to write a void procedure invert which takes as arguments an array and its length. The procedure should be correct for all n >= 0. Write two variants: A procedure like the second, using hardly any additional memory is said to work in-situ. Specify the number of assignments for each of the two variants as a function of n, paying attention to the leading constant but ignoring the constants in lower-order terms. Argue which of the two variants will be faster.

  2. Consider the problem of finding the most frequent value in an array a[] of length n. Assume 0 <= a[i] < m for all i, 0 <= i < n. Write a correct C procedure solving this task. The procedure should return an int, giving the most frequent value. Hint: use an additional array which must be declared in a correct way. Express the complexity of your algorithm in terms of n and m.

  3. Consider the problem of testing whether a specified value x occurs in an n x n integer matrix a[][].
    1. Write a correct C procedure returning 1 if there are i and j so that a[i][j] = x and 0 otherwise. What is the time complexity of your algorithm?
    2. Now assume it is given that the matrix is weakly sorted: in every row the values are increasing and in every column as well. That is, a[i][j] <= a[i + 1][j] and a[i][j] <= a[i][j + 1], for all applicable values of i and j. Suggest an alternative algorithm running in O(n) time. Prove that this is optimal, that is, show that any algorithm for this problem has Omega(n) running time.
    3. Work your algorithm out to a correct C procedure.
    4. Surprisingly, for non-square matrices the problem of testing whether a specified value x occurs or not can be solved faster. Present an algorithm which performs better for n x (c * n) matrices, for c >= 1. Express the complexity of your algorithm in terms of n and c. How much faster is this algorithm than the above presented one?
    5. Work your algorithm out to a correct C procedure.

  4. Consider an array a[] of type unsigned int. The length of the array is given by a parameter n, which is read at the beginning of the program. Initialize the array according to the following rule: a[0] = 0; a[i] = (a[i - 1] + x) % n, for all i > 0. Algebra tells us that whenever n and x are relatively prime, that is gcd(x, n) == 1, the whole pattern of a[] values is a permutation of the numbers 0, ..., n - 1. That is, all numbers occur exactly once. Taking x a prime number, this is guaranteed to hold for all n which are not a multiple of x.

    The task is to verify this property for various n and x and to measure the time consumption. This is done by using a second unsigned integer array b[], which is used for counting the frequencies of the numbers in a[]. It is initialized at zero and in a final pass the maximum of all values in b[] is determined and printed.

    Times can be measured with the following procedure:

          long dclock() {
            /* Returns the time in milliseconds */
            struct timeval  tp;
            struct timezone tzp;
            gettimeofday(&tp, &tzp);
            return 1000 * (tp.tv_sec % 1000000) + tp.tv_usec / 1000; } 
        
    This is not the most scientific way of measuring times, but it is simple and works quite well. In order to be able to use this routine it is necessary to include the system library "sys/time.h", which is done in the same way as the inclusion of "stdio.h".

    For n you must test n = 2^k, for all k >= 12 as far as the computer allows you to solve the problem in less than 1 minute. For x you must test x = 1, 2, 4, 11, 19, 1007, 99991. The time measurement should only reflect the time for counting the frequencies of the numbers in a[], not the initialization or finding the maximum. To get stable measurements, the experiments should be repeated until the sum of the measured times exceeds 1000 ms (and then of course you must divide by the number of experiments to get the average time per experiment). Plot the resulting average time consumptions as a function of n using a doubly logarithmic scale (that is, both along the x-axis and along the y-axis the scaling is so that each factor two is one unit distance) connecting the points belonging to the same x value. Consider the developments and the differences and explain them.

  5. Write a program for converting numbers from one number system to another. The program repeatedly asks for the number, the initial radix and the final radix. Both radices can be any number up to 10. Negative numbers and zero should also be treated correctly. The resulting converted number is printed.

  6. In Chapter 1 it was specified how a 32-bit floating-point number is composed: 1 sign bit, 8 exponent bits in excess-127 representation, 23 mantissa bits giving an unsigned int. Write a program which asks for a floating-point number, for example -456.123E16 and composes all bits into the 32 bits of a single unsigned int x. In the program you also declare a float* variable y, which is set to point to the same address as x (setting y = (float*) &x;). Then you print the value *y to check the correctness of your conversion.

  7. Write a program for efficiently performing set operations using a boolean for every element of the set, packing 32 booleans (which indicate whether an element is present in the set or not) in an unsigned int. The elements in the sets have indices from 0 to n - 1, for some value n which is read at the beginning of the program. The supported operations should be:

    Generate three random sets of size 100.000.000 each: S_1 are the lotto prizes for the first draw, the probability that a number gives a prize is 0.05. S_2 are the lotto prizes for the second draw, again a fraction 0.05 of them is 1. S_3 gives the lotto bets, the probability that a number is selected is 0.2. Now compute the number of bets resulting in a prize (each bet gets at most one prize). That is, you should first compute the union of S_1 and S_2, then intersect with S_3 and finally compute the size of the resulting set. Print this resulting number (if it does not lie between 1.940.000 and 1.960.000, then probably there is something wrong with your program).

    Random numbers can be generated with help of the function random. See the online manual for the details (type "man random" inside a Unix or Linux environment).

  8. In the above exercise one of the tasks is to compute the number of elements in a set stored in an array of unsigned ints. Of course the size of the set can be maintained with a counter, but this has a negative impact on the efficiency of the operations insert and delete because then we must first determine the value of the bit before the operation, followed by a conditional update of the counter. If the size of the set is needed only occasionally, it is better to compute it by adding up the number of ones in each of the unsigned ints. The efficiency of this operation depends on the time for determining the number of ones in an n bit number.
    1. Write a subroutine number_of_ones() performing this task in O(n) time.
    2. If an operation has to be performed frequently, performance gains may be achieved by precomputing the resulting value for all possible inputs, storing these in an appropriate way. Apply this idea to the number-of-ones problem: write a procedure precompute() and give a modification of number_of_ones().
    3. How much time does the precomputation take? How much storage is needed? How fast can number_of_ones() now be executed?
    4. Probably your solution uses too much storage for n = 32. Describe how the storage, and the time for the precomputation, can be strongly reduced, while keeping the same good asymptotical running time of number_of_ones().

  9. Write a program for determining all primes up to a certain maximum value n which is read at the beginning of the program. Use the Eratosthenes sieve method. The idea of this method is the following:

    This algorithm can be made more efficient by explicitly dealing with some special cases. For example, it is not necessary to ever test even numbers: these can also be thrown out by a modified initialization. Larger improvements can be achieved by not testing multiples of 2, 3, 5, 7, ... , either. And, when one is not testing them, why should one have storage for these numbers? You need not implement any of these improvements; it is just pointed out here that this fast algorithm for finding prime numbers can be improved further.

    Program two variants with the following features:

    For each of the two versions determine the largest power of 2 for which the program runs in less than 1 minute. Which variant is best?

    The program should also produce some output. After performing the sieving, you should determine for each number 2 <= k == 2^i <= n / 2 the number of primes between k and 2 * k and the resulting average distance between two primes in these intervals. For each of these intervals the program should also print the maximum distance between any two consecutive primes.

    We want to know how efficient the Eratosthenes algorithm for computing primes is. Not in a concrete sense by measuring seconds, but in an abstract sense by counting some specific operations which give a good measure for the amount of work performed. In our case, such a measure is given by the number of visited multiples of the prime numbers. This does not account for the initialization and the testing, but this amount of work is easy to estimate: it is proportional to n.

    Determine this number for several values of n and speculate how it develops as a function of n. You can choose from simple functions of the following types: c * n^2, c * n^{3/2}, c * n * log n, c * n * loglog n, and variants. Of course you do not need to speculate: using your measurement of the development of the average distance between primes, it is not hard to derive this development.

  10. Write a program for converting a text to caps_format, all letters must be replaced by capitals, while the other characters and the layout remain unchanged. The original text is found on the file input, the converted text is written to the file output. Both files stand in the same directory as the program.

  11. Write a program for reformatting a text so that all lines are left- and right-aligned. The program first asks for the line width w. Then it determines which words fit on a line, loading the characters into a buffer. Then between all words that fit on the line a certain number of blanks is added (so that the spacing becomes as even as possible). In this context, a "word" means any sequence of characters not containing blanks, so non-letters standing connected to a word (commas, dots, etc.) are treated as being part of the word and do not get separated from it. Additional blanks and empty lines in the original text are ignored, so the output is a single block of text of width w. Only the last line should be treated in a special way: here no additional spaces are added. The original text is found on the file input, the converted text is written to the file output. Both files stand in the same directory as the program.

  12. Write a program for computing matrix products in three different ways. For computing the product C of n x n matrices A and B, the following methods should be tried:
    1. The trivial method computing C_{ij} = sum_{k = 0}^{n - 1} A_{ik} * B_{kj}.
    2. Transposing A to A' and then computing C_{ij} = sum_{k = 0}^{n - 1} A'_{ki} * B_{kj}. Choose the order in which the C_{ij} are computed so that the time is minimized.
    3. Transposing B to B' and then computing C_{ij} = sum_{k = 0}^{n - 1} A_{ik} * B_{jk}. Choose the order in which the C_{ij} are computed so that the time is minimized.
    Here the transpose of a matrix A is the matrix A' with A'_{ij} = A_{ji}.

    The matrices should be initialized as follows:

    A_{ij} = 1, for all i, j with i + j even
    A_{ij} = -1, for all i, j with i + j odd
    B_{ij} = i, for all i, j
    For C = A * B this gives a simple regular pattern, which can be used to check that the three procedures all compute the same product.

    Measure the time for each of these methods for n = 2^k, for k = 4, 5, ..., 10 or 11. The time for possibly transposing the matrix must also be taken into account, but not the time for allocating and initializing the matrices. The first time you are using C after allocating it, all its fields must be accessed once to make sure that C is actually loaded into the cache/memory. For the small matrices the experiments must be repeated many times to get stable time measurements.

    Plot the results in a suitable way: along the x-axis you should give the k values, along the y-axis you should give log_2 T(2^k), where T(2^k) gives the time for an experiment with n = 2^k. The graphs should be approximately straight lines. Explain the irregularities in the development and the differences between the methods. Which method is best?





Object Oriented Programming: Java

In this chapter a high-level view of Java is presented. It is not intended to provide a complete description of the language. Particularly, it is assumed that the reader already knows how to program in C or a similar language. Here we point out the main features of object-oriented programming and illustrate the introduced concepts with examples taken from Java. There are many good textbooks and reference books. For specific information on classes an overview of all standard classes is provided online.

Introduction

Like in C, the execution of a Java program starts in the procedure called "main". In Java it is common to say method for procedure. This procedure must be found inside some class. In Java classes are what before we called types. In object-oriented languages, the classes are put in the foreground, and therefore, any method, including "main", must belong to some class. The name of the text file in which a program is stored is determined by the name of the class in which main is given: it must be "MainClassName.java".

Java was originally designed as an interpreted language. However, it is not the source code that is interpreted, but something called byte-code. This byte-code is generated by a program (one could say a compiler) called "javac". So, once the program is written, one types "javac MainClassName.java". If there are no syntactical errors, then a new file with name "MainClassName.class" is generated. The program can now be executed by typing "java MainClassName".

The following gives a very simple program computing the average value of the fields of an array:

  class ArrayAverage
  {
    static final int SIZE = 100;

    public static void main(String ps[])
    {
      int i, sum;
      int[] a = new int[SIZE];
      
      for (i = 0; i < SIZE; i++)
        a[i] = i;
      for (i = 0, sum = 0; i < SIZE; i++)
        sum += a[i];
      System.out.println("\nThe average value is " + 
        (float) sum / SIZE + "\n");
    }
  }

Comparing with the C program doing the same, we see that details have changed, but that the program as a whole is more or less the same. The differences are

For trivial programs as in the example, Java is a burden: C is simpler.

In Java there are quite generally applied name-giving conventions:

At the most basic level, the difference between Java and C is small.

Classes, Objects and Methods

Definitions

The central notion in object-oriented languages is the class (though in other languages other names may be used). A class is an extension of a data type. Namely, it is a data type together with the operations that can be performed on it. In Java, these operations, which in non-object-oriented languages would be called procedures, are called methods. A variable of a class type is called an object.
Classes, structured data types defined along with the operations that can be performed on them, their functionality, are the central notion of object-oriented programming languages.

Class Examples

We give some simple examples of classes. The first class may be useful when writing a program for the administration of a company: the class "Employee". The second class can be useful in a linear algebra program: the class "IntegerMatrix". The third class is useful in creating data structures: the class "Chain". The classes Employee and Chain appear again, slightly modified, in complete programs in the section "Program Examples" hereafter. Extending class IntegerMatrix to a program is one of the exercises.

Class Employee

  class Employee
  {
    String name;
    int number;
    double salary;

    public Employee(String theName, int theNumber, double theSalary)
    {
      name   = theName;
      number = theNumber;
      salary = theSalary;
    }

    public void increaseSalary(double salaryIncrease)
    {
      salary += salaryIncrease;
    }

    public String toString()
    {
      return "(" + name + ", " + number + ", " + salary + ")";
    }

    public void setName(String newName)
    {
      name = newName;
    }
  }

Here we see many important aspects of classes. First the header. A class header always consists of the word "class" followed by the name of the class, in our case "Employee". In the basic case we are considering here, we then get a "{", which is matched by a "}" at the end of the definition of a class. It should be noticed that here we only describe a class, we do not create an object of this class.

Then we see a list of variables: a String, an int and a double. These variables will be called instance variables. Other names are in use as well. Any Employee object (= instance) has the instance variables name, number and salary. So far a class is just like a struct in C.

The difference with a struct is that the definition of a class contains also the definition of the methods that are working on objects (= variables) of this class. Further down we will see how this goes, but here we can notice already that there are four such methods: Employee, increaseSalary, toString and setName.

The latter three resemble procedures in C: they have a name, parameters and a return type. In addition we find the word "public", which is an example of an access modifier. "public" means that these methods are accessible from outside. If we had written "private" instead, these methods could only have been called from inside the class itself. In total there are four access levels (the keywords "public", "protected" and "private", and the default level that applies when no modifier is written); they will be discussed further down.

The method Employee is more special. It is a so-called constructor. When this method is called in combination with the keyword new, memory is allocated and the instructions in the constructor are executed. Typically a constructor contains instructions to initialize the instance variables, but it may also do more or less. In any case: each class has at least one constructor, otherwise no objects could be generated. If none is written, the Java compiler supplies a default constructor without parameters, which leaves the instance variables at their default values. The name of a constructor is always the same as the name of the class, and one does not indicate a return type: by default it returns an object of the class.

Inside a class a list of instance variables together with their type is followed by all methods. Each class has at least one constructor, which is called when creating new objects; if none is written, the compiler supplies one without parameters.
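
How an object is actually created from such a class description can be sketched as follows. The class EmployeeDemo and the sample data are assumptions made for illustration only; Employee is repeated in a shortened variant so that the fragment is self-contained.

```java
// Hypothetical demo class, not part of the original text.
class EmployeeDemo
{
  public static void main(String[] args)
  {
    // "new" allocates the memory and then runs the constructor
    Employee e = new Employee("Hecht, Edgar", 878722, 6500.00);
    e.increaseSalary(250.00);   // a method called on the object e
    System.out.println(e);      // println uses e.toString()
  }
}

// Shortened variant of the class Employee
class Employee
{
  String name;
  int number;
  double salary;

  public Employee(String theName, int theNumber, double theSalary)
  {
    name   = theName;
    number = theNumber;
    salary = theSalary;
  }

  public void increaseSalary(double salaryIncrease)
  {
    salary += salaryIncrease;
  }

  public String toString()
  {
    return "(" + name + ", " + number + ", " + salary + ")";
  }
}
```

Running EmployeeDemo prints (Hecht, Edgar, 878722, 6750.0). Before the call with new there is only the description of the class; afterwards there is one object with its own name, number and salary.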

Class IntegerMatrix

  class IntegerMatrix
  {
    int n;
    int[][] a;

    public IntegerMatrix(int size)
    // Initializes all positions with 0
    {
      n = size;
      a = new int[n][n];
      for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
          a[i][j] = 0;
    }

    public IntegerMatrix(IntegerMatrix matrix)
    // Creates a copy of matrix
    {
      n = matrix.n;
      a = new int[n][n];
      for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
          a[i][j] = matrix.a[i][j];
    }

    public int trace()
    // Computes the trace (= sum of diagonal elements) of a matrix
    {
      int s = 0;
      for (int i = 0; i < n; i++)
        s += a[i][i];
      return s;
    }

    public boolean findValue(int x)
    // Checks whether the value x occurs in the matrix
    {
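      int i = 0;
      // Scan row by row and stop at the first position holding x
      for (int j = 0; i < n && a[i][j] != x; )
        if (++j == n) // End of row i reached: continue with the next row
        {
          j = 0;
          i++;
        }
      return i < n; // i == n means that all rows were scanned without success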
    }
  }

Here we see all the things we saw above plus some new features. Most noticeable is that there are two constructors. They have the same name, inside the same scope (both names are visible inside and outside the class). Nevertheless this is correct: the names are the same, but their signatures are not. In Java the signature of a method consists of its name and its parameter list; the return type is not part of it, so two methods may not differ in their return type only. For two parameter lists to be the same, parameters of the same types must appear in the same order. Here the first constructor has an int parameter, the second has an IntegerMatrix parameter. When calling these methods from outside, the compiler/interpreter has no problem in figuring out which of the two is meant: it just has to check the type of the parameters and to match it with one of the specified methods. This is a first example of polymorphism, about which we will hear more further down.
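
The same mechanism works for ordinary methods. The following sketch shows three methods with the same name that are told apart by their parameter lists; the class OverloadDemo and the method describe are made-up names used only for this illustration.

```java
// Hypothetical class for illustration; the names are not from the text.
class OverloadDemo
{
  // Same name, different parameter lists: the compiler selects the
  // method by looking at the types of the arguments in the call
  static String describe(int x)
  {
    return "int " + x;
  }

  static String describe(String s)
  {
    return "String " + s;
  }

  static String describe(int x, String s)
  {
    return "int " + x + " and String " + s;
  }

  // Rejected by the compiler if uncommented: it would differ from
  // describe(int) only in the return type
  // static int describe(int x) { return x; }

  public static void main(String[] args)
  {
    System.out.println(describe(7));
    System.out.println(describe("seven"));
    System.out.println(describe(7, "seven"));
  }
}
```

The three calls in main select three different methods, although the name is the same in each case.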

Now it is also time to notice that inside the class the methods can work with the instance variables. In general, in a method there are three kinds of variables: instance variables, parameters and local variables.

In findValue we see examples of each category: a[][] and n are instance variables, x is a parameter and i and j are local variables.

Local variables need not be declared at the beginning of a method. On the contrary: in Java it is considered good style to declare a variable as locally as possible. It is also considered good to initialize a variable upon declaration. One must be careful with the scope of a local variable. By scope we mean the "visibility range" of a variable. The scope of a variable stretches from its declaration to the end of the level at which it was declared. So, a variable declared at the beginning of a method is visible anywhere in the method, but not outside the method. That is why we call it a local variable. The variable i in findValue is of this type. A variable which is declared in the header of a for loop is visible inside this loop, but not outside of it. The variable j in findValue is of this type. The reason that i was not declared in the header of a loop is that we wanted to use it in the final comparison. A variable declared inside a compound statement is visible only within this compound statement.

It is correct and perfectly accepted to use the same variable name in many different methods. This possibility assures that program fragments can be combined without extensive effort to trace all common variable names. If from a method in which a variable x is used another method is called in which also a variable x is used, then of course this latter x is the valid one, because the scope of the x in the calling method is limited to its own method.

Slightly less clear is the status of the following fragment:

  int i = 10;
  for (int i = 0; i < 1000; i++)
    a[i] = 2 * i;
  System.out.println("i = " + i);
The corresponding fragment in C or C++ is accepted and prints 10: outside the for loop the locally defined variable i does not exist, because its scope is limited to the loop, and inside the loop the original variable i is not visible, because it is shielded by the more local variable. This works because the compiler creates its own list of variables and has no problem keeping them apart. Java is stricter: a local variable may not be declared again inside the scope of another local variable or parameter with the same name, so the Java compiler rejects this fragment. In Java shielding occurs only when a local variable or parameter has the same name as an instance variable, as in the constructors of Node further down. Even where it is allowed, there is rarely a good reason to program this way, and therefore this confusing style of programming should be avoided.
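
In Java the shielding of one variable by a more local one can be observed when a local variable has the same name as an instance variable. The following sketch illustrates this; the class ScopeDemo and its method sumUpTo are made up for this purpose.

```java
// Hypothetical class for illustration only.
class ScopeDemo
{
  int i = 10;                     // instance variable

  int sumUpTo(int n)
  {
    int sum = 0;
    for (int i = 0; i < n; i++)   // this local i shields the instance variable i
      sum += i;
    return sum + i;               // the loop variable is gone: i is the instance variable again
  }

  public static void main(String[] args)
  {
    ScopeDemo d = new ScopeDemo();
    System.out.println(d.sumUpTo(4));   // (0 + 1 + 2 + 3) + 10
    System.out.println("i = " + d.i);   // the instance variable was never changed
  }
}
```

The program prints 16 and then i = 10: the loop only ever modified its own local i.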

In the second constructor, we see that there may be a parameter of the same type as the class. One might fear that this leads to confusion. However, the instance variables of such a parameter are accessed like the instance variables of any other object with help of the dot-operator, ".", just like in C. So, n is the instance variable of the instance under consideration, while matrix.n denotes the corresponding instance variable of the parameter matrix.

In both constructors we see how an array is allocated. In C there is no distinction between declaring an array variable and the allocation of its memory. In Java, writing int[][] a (or, equivalently, int a[][]) creates an array variable without allocating memory (which would be hard, because the compiler still does not know how big it should be). An array variable is actually a reserved memory location in which pointers to arrays of the appropriate type can be stored. The call "new int[n][n]" allocates space for n * n integers and returns a pointer to this space. This pointer is assigned to the array variable a. All this is very clean. C offers a similar construction when we use int** a and malloc to allocate memory, but this is quite ugly.
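
The difference between the array variable and the array itself can be made visible with a small experiment. This is only a sketch: the class ArrayDemo is a made-up name, and the variable is declared static merely so that it can be inspected before anything has been assigned to it.

```java
// Hypothetical class for illustration only.
class ArrayDemo
{
  static int[][] a;   // only an array variable: no array has been allocated yet

  public static void main(String[] args)
  {
    System.out.println(a == null);   // true: a does not point to an array yet

    int n = 3;
    a = new int[n][n];               // allocation; a now points to the new array
    System.out.println(a.length);    // 3: the number of rows
    System.out.println(a[2][1]);     // 0: Java fills a new int array with zeros
  }
}
```

Since Java fills newly allocated int arrays with zeros, the explicit loop in the first constructor of IntegerMatrix is not strictly needed; it does make the intention visible.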

Now that we speak about memory allocation: in Java one does not have to bother about cleaning up (though it is possible to do so): the system runs a garbage collector in the background. A garbage collector is a program which checks for objects to which no pointers are pointing anymore and then deallocates their memory.

A variable of a class type is actually only a pointer to an object of this type. This object can be generated by calling a constructor, and then it can be assigned to the variable.
Not all data types are classes: the primitive data types, several numerical types, characters and booleans, are not classes. All derived types are classes. Only the variables of class types are called objects. This distinction is important: when calling a method, objects are passed "by reference", while non-objects (= normal variables) are passed "by value". Actually this is not an entirely correct view: a variable of a class type is actually a pointer. So, if we pass a class variable as parameter in a method call, then the value of this pointer variable, an address, is copied into the corresponding parameter. This is the same as in C, the only difference is that for variables which are not objects, variables of the primitive types, there is no way to specify that we want to pass their address.
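
The following sketch makes the copying of the pointer visible. The classes PassingDemo and Box are assumptions introduced only for this illustration.

```java
// Hypothetical classes for illustration only.
class PassingDemo
{
  static void changeField(Box b)
  {
    b.value = 99;        // follows the copied pointer: the caller's object changes
  }

  static void replace(Box b)
  {
    b = new Box(99);     // only the local copy of the pointer is redirected
  }

  static void changePrimitive(int v)
  {
    v = 99;              // only the local copy of the value changes
  }

  public static void main(String[] args)
  {
    Box box = new Box(1);
    int k = 1;

    changeField(box);
    System.out.println(box.value);   // 99

    box.value = 1;
    replace(box);
    System.out.println(box.value);   // 1: the caller still sees the old object

    changePrimitive(k);
    System.out.println(k);           // 1
  }
}

class Box
{
  int value;

  Box(int v)
  {
    value = v;
  }
}
```

The method replace shows why "by reference" is not entirely accurate: the object can be modified through the parameter, but the caller's variable itself cannot be made to point elsewhere.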

At this point Java is rigid, and sometimes this makes it hard to do easy things. In C it is trivial to write a procedure "swap" for exchanging the value of two variables which are passed as parameters: one passes their address instead of their value, which is done with help of the address operator "&". In the procedure one can access the values of these parameters with help of the value-of operator "*". In Java this simple and common task can only be realized in a quite elaborate way, using a so-called wrapper class: a class with a single instance variable of a primitive type, which thus obtains object status. The following example gives a possible work-out of this idea. Java also provides predefined wrapper classes: Integer, Float, Boolean, ... . Objects of these predefined classes cannot be modified after their creation, however, so for a swap a self-defined class is needed.

  class Int // A self-defined wrapper class
  {
    public int v; // The wrapped value
  
    public Int(int x)
    {
      v = x;
    }
  
    static public void swap(Int a, Int b)
    {
      int c;
      c   = a.v;
      a.v = b.v;
      b.v = c;
    }
  }
  
  public class Swap
  {
    static public void swap(int a, int b)
    {
      int c;
      c = a;
      a = b;
      b = c;
    }
  
    public static void main(String[] args)
    {
      int a = 4;
      int b = 7;
      System.out.print("a = " + a + ", b = " + b + "\n");
  
      swap(a, b); // Swapping without effect
      System.out.print("a = " + a + ", b = " + b + "\n");
  
      Int aWrap = new Int(a); Int bWrap = new Int(b); // Wrapping
      Int.swap(aWrap, bWrap);                         // Swapping
      a = aWrap.v; b = bWrap.v;                       // Unwrapping
      System.out.print("a = " + a + ", b = " + b + "\n");
    }
  }

When calling methods, variables of class types (objects) are passed by reference, while variables of primitive types are passed by value. Wrapper classes grant object status to primitive types, which makes it possible to pass values of primitive types by reference.
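
Arrays are objects as well, so an array can play the role of the wrapper. The following is a sketch of this alternative, not the approach taken above; the class ArraySwap is a made-up name.

```java
// Hypothetical class sketching an alternative to a wrapper class.
class ArraySwap
{
  // Exchanges the entries at positions i and j of the array a
  static void swap(int[] a, int i, int j)
  {
    int c = a[i];
    a[i]  = a[j];
    a[j]  = c;
  }

  public static void main(String[] args)
  {
    int[] pair = {4, 7};   // the two values live inside one array object
    swap(pair, 0, 1);      // the pointer to the array is copied, not the array
    System.out.println("a = " + pair[0] + ", b = " + pair[1]);
  }
}
```

This prints a = 7, b = 4. The price is the same as with the wrapper class: the values must be stored in an object before they can be exchanged by a method.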

Class Chain

In this section we consider classes which can be used to construct a linked list of nodes. The class Node, corresponding to the nodes of the list, contains no methods except for constructors. The class Chain has a single Node as instance variable. This is the access point to the chain. The methods provide the required functionality, allowing for searching a specified key, printing, insertions and deletions.
  class Node
  {
    int  key;
    Node next;

    Node(int key, Node next)
    {
      this.key  = key;
      this.next = next;
    }

    Node(int key)
    {
      this(key, null);
    }
  }

  class Chain
  {
    Node first;

    public Chain()
    {
      first = null;
    }

    private Node getLast()
    // Return the last node of a chain
    {
      if (first == null)
        return null;
      Node node = first;
      while (node.next != null)
        node = node.next;
      return node;
    }

    public void addFirst(int key)
    // Add a new node at the beginning of the chain
    {
      first = new Node(key, first);
    }

    public void addLast(int key)
    // Add a new node at the end of the chain
    {
      if (first == null)
        first = new Node(key);
      else
        getLast().next = new Node(key);
    }

    public void addChain(Chain chain)
    // Attach the Chain chain at the end of the considered chain
    {
      if (first == null)
        first = chain.first;
      else
        getLast().next = chain.first;
    }

    public boolean findValue(int x)
    // Test whether there is a node with key value x
    {
      Node node = first;
      while (node != null && node.key != x)
        node = node.next;
      return node != null;
    }

    public void print()
    // Print all the keys together with their position in the list
    {
      int counter = 0;
      Node node = first;
      while (node != null)
      {
        System.out.println("Node " + counter + " has key " + node.key);
        counter++;
        node = node.next;
      }
    }
  }

In Node there are two instance variables: key and next. key is a simple integer instance variable as we have seen before. The exciting thing is that next is of type Node. Is this possible? What does it mean? Here it is crucial that an object, and any variable of type Node is an object because Node is a class, is only a pointer and not the thing itself (otherwise we would get an explosion). So, upon calling one of the constructors with "new Node( ... )", space is allocated for one integer (4 bytes) and for one pointer to a Node object (4 or 8 bytes, depending on the system) and a pointer to this space is returned.
Linked structures can be defined by defining a class with an instance variable with the data type of the class itself. Because memory for an object is only allocated when explicitly calling a constructor, this does not lead to a recursive explosion.

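The constructors of Node contain the special word this. this has several related meanings. It means either "this class" or "the current object". In Node we find examples of both applications: in the first constructor "this.key" and "this.next" denote the instance variables of the current object, as opposed to the parameters with the same names, and in the second constructor "this(key, null)" calls the other constructor of this class.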

When calling in a class a method of the same class, then by default it is assumed that this call is to be performed with the current object. Therefore, even though this is not wrong, it is superfluous to write, for example, "this.getLast()" in the method addLast() of Chain.

The first constructor of Node is of a conventional type: two parameters are passed and assigned to the instance variables. Slightly problematic might be the assignment "this.next = next". Here a Node object is assigned to another Node object. What does it mean? If one realizes that an object is a pointer, the answer is clear: afterwards the instance variable this.next points to the same object as the parameter next. This is general: an assignment "x = y" can always be performed when x and y are variables (y may also be a constant) of the same type (or more generally when the type of y may be converted to the type of x). In case x and y are of a primitive type, then afterwards x has the same value as y. In case x and y are objects, then afterwards x points to the same object as y (even in this case one can say that x has the same value as y, namely the same address).
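
The two cases of the assignment "x = y" can be compared directly in a small sketch. The classes AssignmentDemo and Cell are made-up names used only here.

```java
// Hypothetical classes for illustration only.
class AssignmentDemo
{
  public static void main(String[] args)
  {
    int p = 5;
    int q = p;                    // q receives a copy of the value 5
    q = 6;
    System.out.println(p);        // 5: p is unaffected

    Cell x = new Cell(5);
    Cell y = x;                   // y receives a copy of the pointer
    y.key = 6;
    System.out.println(x.key);    // 6: x and y point to the same object
    System.out.println(x == y);   // true: == compares the pointers
  }
}

class Cell
{
  int key;

  Cell(int key)
  {
    this.key = key;
  }
}
```

After "y = x" there is still only one Cell object; it simply has two names.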

The class Chain has only one instance variable: the Node first. The single constructor is trivial: no parameters, first is set to null. null is a constant value which can be assigned to any pointer variable (that is, object). It means something like "pointing to nothing". The important thing is that it can be used in tests. If first (or any other object) has value null, then it would be fatal (that is, leading to a runtime error, in Java a NullPointerException) to use first.key: first == null means that the pointer of first has no specific value, in particular it is not pointing to the storage space of a Node. Thus, first.key, which means as much as the value of the int stored in the storage space first is pointing to, is not defined. Errors of this kind are very common. In the above example we were careful not to run into them.

The other methods of Chain are for adding nodes either at the beginning or at the end, for checking whether an element exists or for printing all keys in the order they appear in the list. Further methods can be added to make the class more useful; here we only give an example. The method getLast is declared private. This is because we decided that it should be for internal usage only: we may not want to guarantee that it will always be there, or that it will keep exactly this form. Declaring it private prevents users from using features which they are not supposed to use. This is a first example of encapsulation, about which we will hear more further down.

Now that we are presenting the Chain, we should also try to understand how exactly it works under addition of nodes. Initially we have an empty structure: first == null. Then, the first addition (it does not matter which of the additions is used) creates a new initialized Node by calling "new Node(key)" and assigns the returned value, a pointer to a Node, to first. The later additions are of two kinds.

addFirst performs

  first = new Node(key, first);
Here many things are happening! First the value of first, a pointer to a Node or null, is looked up and together with the new key it is passed to the Node constructor. This creates a new Node object with the same next value as first had so far. Then the resulting pointer is assigned to first.

addLast performs

  getLast().next = new Node(key);
Here a new Node object with the new key value is created. Its next value is set to null. Then the resulting pointer is assigned to the next field of the Node which is found by calling getLast. Here getLast walks along the chain until coming to the last node and returns this object (of course it would be handy to have a second instance variable "last" in order to access this position faster, but this would be less instructive).
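
The variant with a second instance variable "last" mentioned in parentheses can be sketched as follows. The class name FastChain and the invariant stated in the comments are assumptions made for this illustration; Node is copied from above so that the fragment is self-contained.

```java
// Sketch of a chain with an extra access point; FastChain is a made-up name.
class FastChain
{
  Node first;
  Node last;   // assumption: points to the last node, or is null if the chain is empty

  public FastChain()
  {
    first = last = null;
  }

  public void addFirst(int key)
  {
    first = new Node(key, first);
    if (last == null)        // the chain was empty: the new node is also the last one
      last = first;
  }

  public void addLast(int key)
  {
    Node node = new Node(key);
    if (first == null)
      first = node;
    else
      last.next = node;      // no walk along the chain is needed
    last = node;
  }

  public static void main(String[] args)
  {
    FastChain chain = new FastChain();
    chain.addLast(2);
    chain.addFirst(1);
    chain.addLast(3);
    for (Node node = chain.first; node != null; node = node.next)
      System.out.println(node.key);   // prints 1, 2 and 3 on separate lines
  }
}

// Copy of the class Node from the text
class Node
{
  int  key;
  Node next;

  Node(int key, Node next)
  {
    this.key  = key;
    this.next = next;
  }

  Node(int key)
  {
    this(key, null);
  }
}
```

With this organization addLast takes constant time, at the price that every method which changes the chain must keep last up to date.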

Operations on a Chain

It was pointed out that one should be very careful not to access the instance variables of a null-object. Is this not exactly what we are doing in the following loop in findValue?

  while (node != null && node.key != x)
    node = node.next;
No! The reason is that in an expression involving && the left-hand side is evaluated first. If node == null, it is certain that the whole expression evaluates to false, and therefore the evaluation stops before node.key is accessed. On the other hand, it would have been fatal to write
  while (node.key != x && node != null)
    node = node.next;
Even though the first version works in Java and in C, depending on the programming language there may be no guarantee that it does. Therefore this is an example of a possibly risky programming style which might better be avoided. In this case this can be done at little extra cost by rewriting findValue() as follows:
  public boolean findValue(int x)
  {
    if (first == null)
      return false;
    Node node = first;
    while (node.next != null && node.key != x)
      node = node.next;
    return node.key == x;
  }
Using linked structures in an object-oriented way requires that one or more objects of some node-type occur as instance variables in the definition of another class. These instance variables give the access points to the structure.
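
The remark on the evaluation order of && can be checked with a small experiment. The classes ShortCircuitDemo and Item are made up for this sketch; the runtime error is caught only so that the program can report it.

```java
// Hypothetical classes for illustration only.
class ShortCircuitDemo
{
  // Safe order: node.key is inspected only if node != null
  static boolean safe(Item node, int x)
  {
    return node != null && node.key == x;
  }

  // Fatal order: node.key is inspected before the test for null
  static boolean risky(Item node, int x)
  {
    return node.key == x && node != null;
  }

  public static void main(String[] args)
  {
    System.out.println(safe(null, 5));   // false, without any error
    try
    {
      System.out.println(risky(null, 5));
    }
    catch (NullPointerException e)
    {
      System.out.println("Runtime error: instance variable of null accessed");
    }
  }
}

class Item
{
  int key;

  Item(int key)
  {
    this.key = key;
  }
}
```

The call of safe returns false quietly, while the call of risky ends in the runtime error described above.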

Program Examples

It is now considered how the above classes can be integrated in a program which can be tested and adapted. In Java the various classes may be stored in several files. For simple small programs there is no need to do so, but for larger programs this is actually recommended. Each class may be located in its own file with the appropriate name: NameOfClass.java. If, as in the example of Chain, a class uses objects from another class, then javac is clever enough to first trace all needed classes and to translate them as well. So, storing Node and Chain in files Node.java and Chain.java, it is sufficient to write "javac Chain.java": this generates both Node.class and Chain.class, just as if they had been stored in the same file.

Program Employee

The class Employee is now extended to a complete program based on it. We introduce one extra class Company and a trivial class containing main.
  class Employee
  {
    protected String name;
    protected int number;
    protected double salary;

    public Employee(String theName, int theNumber, double theSalary) 
    {
      name   = theName;
      number = theNumber;
      salary = theSalary;
    }

    public double getSalary()
    {
      return salary;
    }

    public int getNumber()
    {
      return number;
    }

    public void increaseSalary(double salaryIncrease)
    {
      salary += salaryIncrease;
    }

    public String toString()
    {
      return "(" + name + ", " + number + ", " + salary + ")";
    }

    public void setName(String newName)
    {
      name = newName;
    }
  }
  class Company
  {
    protected int size;
    protected int maxSize;
    protected Employee staff[];

    public Company(int theMaxSize)
    {
      size    = 0;
      maxSize = theMaxSize;
      staff   = new Employee[maxSize];
    }

    public int getSize()
    {
      return size;
    }

    public int getMaxSize()
    {
      return maxSize;
    }

    public void setName(int number, String name)
    {
      int i = 0;
      while (i < size && staff[i].getNumber() != number)
        i++;
      if (i == size)
        System.out.print("Number not found, ignoring instruction!\n");
      else
        staff[i].setName(name);
    }

    public void addEmployee(String name, int number, double salary)
    {
      if (size == maxSize)
        System.out.print("No space left, ignoring instruction!\n");
      else
      {
        staff[size] = new Employee(name, number, salary);
        size++;
      }
    }

    public void increaseSalary(double factor, double leastIncrease)
    {
      for (int i = 0; i < size; i++)
      {
        double increase = factor * staff[i].getSalary();
        if (increase < leastIncrease)
          staff[i].increaseSalary(leastIncrease);
        else
          staff[i].increaseSalary(increase);
      }
    }

    public void print()
    {
      System.out.print("\nOverview of employees:\n");
      for (int i = 0; i < size; i++)
        System.out.print("Employee[" + i + "] = " + staff[i] + "\n");
    }
  }
  class CompanyTest
  {
    public static void main(String ps[])
    {
      Company myCompany = new Company(100);

      myCompany.addEmployee("Becker, Boris",    235521, 4500.00);
      myCompany.addEmployee("Hecht, Edgar",     878722, 6500.00);
      myCompany.addEmployee("Albers, Marianne", 456212, 1554.00);
      myCompany.addEmployee("Krauser, Angela",  426578, 1954.00);
      myCompany.addEmployee("Noack, Christina", 663738, 5646.00);
      myCompany.print();

      myCompany.increaseSalary(0.04, 50.0);
      myCompany.addEmployee("Brauer, Harald", 568900, 2200.00);
      myCompany.setName(456212, "Becker Marianne");
      myCompany.print();

      System.out.print("\n");
    }
  }
Here we have also slightly changed Employee. We have made the instance variables "protected" in order to signal that they should not be accessed from outside the class. (Strictly speaking, "protected" in Java still permits access from other classes in the same package; only "private" limits access to the class itself.) Instead, special access methods are supplied. These methods are typically given names like "getNumber" and "setName". Of course this means that many extra calls to methods are made, but errors in future extensions are worse! Never forget that in Java the prime consideration is correctness, not speed. If speed is really critical (as it is in programs solving very large problems and in games), then in some small well-documented sections in which most of the computation is performed you may do ugly things. However, if speed really matters, then one had better write a hack in C.

One should also notice how amazingly simple it is, once we have defined Employee, to build Company on top of it: we just declare an array of Employee and add a few methods for performing operations on the Company as a whole. Then the main program is more or less an empty shell. The good thing is that even without knowing about the underlying organization, any reader who understands the format immediately grasps what is going on. This is partially because of the names that were chosen, but even more because of the usage of powerful subroutines and the object-oriented programming style.

Here we touch on the most important new point. What does writing "myCompany.addEmployee( ... )" or "staff[i].increaseSalary( ... )" mean? Here we see the second usage of the dot-operator. Before we have seen that it can be used for accessing the instance variables of an object. Here we use it to connect an object with a method from its class. The semantics of this is that the system first determines the class of the object, then searches for a method with matching signature in the class and then executes the method working on the instance variables of the object. In object-oriented languages, this is the major way of calling methods.

Static methods form an exception. A static method is any method which in its definition is preceded by the keyword "static". A static method can be called without an object of the class to work on. This implies that inside a static method there are no instance variables to use. A static method corresponds to a procedure in C and other non-object-oriented languages. The non-static methods are something new, the static ones we already know! Even in Java we already know one important example: main. Of course main should be callable without an object, because by the time it is called there is not yet any object!

Static methods are encapsulated inside their classes. That is, if they are called from outside the class, it is not obvious where to find such a method (there might be static methods with the same name in several classes). Therefore, when calling a static method it is necessary to indicate where they can be found. This is done by prefixing the name of a static method with the name of the class connected by ".": another usage of the dot operator.

In principle it is possible to program in Java as in C: make one big class without instance variables and declare all methods to be static. This is against the whole concept of object-oriented programming, and therefore considered to be extremely bad style. Sometimes it is very handy though to have static methods, sometimes it more clearly expresses what is going on (a call with an object puts one object in the foreground, but maybe the operation uses several objects as arguments in a symmetric way), and sometimes there is no alternative: as we mentioned before, variables of the primitive data types are not objects. So, how should one compute e^x for a double x? The exponential function, and many other mathematical functions alike, are therefore static. This allows calling them the conventional way, without first converting a double into a Double (Double is the class with a double as an instance variable). Therefore, inside the class Math the method exp is defined as

  public static double exp(double a)
It can be called by writing Math.exp(x).

A somewhat strange case are the constructors. These are called by only giving the name of the method, but because this name is identical with the name of the class, it is clear where to find them. No object is passed, in this sense it resembles a static method, but the constructor allocates the object, and therefore the instance variables are available like in non-static methods.

In object-oriented programming, the default way of calling methods is by connecting an object of the appropriate class to the method with help of the dot operator. The method is working on this object. Static methods are called by specifying the class without passing an object.
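
The two ways of calling can be put side by side in a small sketch. The classes CallDemo and Account, and the methods deposit and richer, are assumptions made for this illustration; only Math.exp belongs to the standard library.

```java
// Hypothetical classes for illustration only.
class CallDemo
{
  public static void main(String[] args)
  {
    Account a = new Account(100.0);
    Account b = new Account(120.0);

    a.deposit(50.0);                     // object.method(...): works on a
    Account r = Account.richer(a, b);    // ClassName.method(...): no current object
    System.out.println(r == a);          // true: 150.0 >= 120.0
    System.out.println(Math.exp(0.0));   // 1.0: static method of the class Math
  }
}

class Account
{
  double balance;   // instance variable

  Account(double balance)
  {
    this.balance = balance;
  }

  // Non-static: works on the object with which it is called
  void deposit(double amount)
  {
    balance += amount;
  }

  // Static: both objects arrive as parameters and are treated symmetrically
  static Account richer(Account a, Account b)
  {
    return a.balance >= b.balance ? a : b;
  }
}
```

The method richer is an example of an operation that uses two objects in a symmetric way; writing it as a.richer(b) would be possible, but would put a in the foreground without reason.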

Program FibonacciTest

Above it was considered how to write a Java method that swaps the values of two integers. The problem that primitive data types cannot be passed by reference also arises when trying to efficiently compute Fibonacci numbers in a recursive way. Here we consider how to handle this problem in an elegant and object-oriented way.

Computing fib(n), the n-th Fibonacci number, directly using fib(n) = fib(n - 1) + fib(n - 2) gives an algorithm whose time consumption increases exponentially with n. Of course Fibonacci numbers can easily be computed in an iterative way, but that is not the point here: this problem stands for a whole class of problems. An efficient recursive algorithm can be obtained by not only computing fib(n), but also fib(n - 1). From these two values fib(n + 1) and fib(n) can be computed in constant time, and thus the time for computing fib(n) increases linearly with n, as it should.

In C the two computed values may be handed over using variables of type int*. In Java each variable can be individually wrapped, but doing that means ignoring the structure of the problem: if the method should return a pair of values, then we should use objects of some class which can hold a pair of integers. This idea can easily be turned into a correct and efficient program in which a static method returns such a pair, but this leads to a functional rather than to an object-oriented approach. In an object-oriented context, it is cleaner to let the method work on objects of some class than to let a static method return objects of this class. Taking all these considerations into account, we get the following program:

import java.io.*;

class IO
{

  public static int readInt() 
  // Reads an int from standard input.
  {
    String input = "";
    try 
    {
      BufferedReader bufRead = new BufferedReader
        (new InputStreamReader (System.in));
      input = bufRead.readLine();
    } 
    catch (java.io.IOException e) 
    {
      System.out.print("Error while reading input line!\n");
    }
    return Integer.valueOf(input).intValue();
  }
} 

class Fibonacci
{
  int x, y;

  Fibonacci()
  {
    x = y = 0;
  }

  void recFib(int n)
  // Sets x = fib(n - 1) and y = fib(n); requires n >= 1
  {
    if (n == 1)
    {
      x = 0;
      y = 1;
    }
    else
    {
      recFib(n - 1);
      y += x;
      x =  y - x;
    }
  }

  static int fib(int n)
  {
    if (n <= 0)   // n <= 0 is answered with 0; this also avoids endless recursion for negative n
      return 0;
    Fibonacci p = new Fibonacci();
    p.recFib(n);
    return p.y;
  }
}

class FibonacciTest
{
  public static void main(String[] args)
  {
    System.out.print("\nGive n       >>>   ");
    int n = IO.readInt();
    System.out.println("Computed value = " + Fibonacci.fib(n) + "\n");
  }
}

Here we will not try to understand the method readInt(). In Java even IO is handled in a clean object-oriented way, but one would prefer C's basic but convenient routines. Unformatted writing is easy, but reading and formatted writing require quite elaborate methods. More interesting is the class Fibonacci. It contains a static method fib(), which is called from main(). We see that readInt() and fib() are called by prefixing them with the names of their respective classes.

In fib() an object p of the class Fibonacci is created. recFib() is called with this object. In recFib() recursive calls are made. Recall that the statement "recFib(n - 1)" is equivalent to "this.recFib(n - 1)", and in this way p, or more correctly a pointer to p, is handed all the way down until reaching the bottom of the recursion. There the variables x and y are given their initial values. Recall that whenever working inside a class with the instance variables, these are the instance variables of the current object. In our case this is the object p. Then the recursion returns step-by-step, eventually computing fib(n - 1) and fib(n). The second of these values is returned by fib().

Program Chain

In this section Node and Chain, with minimal modifications, are combined into the following program:
  class Node
  {
    static int totalSize = 0;

    int  key;
    Node next;

    Node(int key, Node next)
    {
      this.key  = key;
      this.next = next;
      totalSize++;
    }

    Node(int key)
    {
      this(key, null);
    }

    protected void finalize()
    {
      totalSize--;
    }
  }

  class Chain
  {
    Node first;

    public Chain()
    {
      first = null;
    }

    private Node getLast()
    // Return the last node of a chain
    {
      if (first == null)
        return null;
      Node node = first;
      while (node.next != null)
        node = node.next;
      return node;
    }

    public void addFirst(int key)
    // Add a new node at the beginning of the chain
    {
      first = new Node(key, first);
    }

    public void addLast(int key)
    // Add a new node at the end of the chain
    {
      if (first == null)
        first = new Node(key);
      else
        getLast().next = new Node(key);
    }

    public void addChain(Chain chain)
    // Attach the Chain chain at the end of the considered chain
    {
      if (first == null)
        first = chain.first;
      else
        getLast().next = chain.first;
      chain.first = null;
    }

    public boolean findValue(int x)
    // Test whether there is a node with key value x
    {
      Node node = first;
      while (node != null && node.key != x)
        node = node.next;
      return node != null;
    }

    public void print()
    // Print all the keys together with their position in the list
    {
      int counter = 0;
      Node node = first;
      while (node != null)
      {
        System.out.println("Node " + counter + " has key " + node.key);
        counter++;
        node = node.next;
      }
    }
  }

  class ChainTest
  {
    public static void main(String ps[])
    {
      Chain c1 = new Chain();
      Chain c2 = new Chain();

      System.out.println("\nCreating chain 1\n");
      c1.addFirst(12);
      c1.addFirst(22);
      c1.addFirst(16);
      c1.addFirst(14);
      c1.addFirst(20);
      c1.addFirst(18);
      for (int i = 0; i < 100; i++)
        if (c1.findValue(i))
          System.out.print(i + " is among the stored values\n");
      c1.print();
      System.out.println("Total number of nodes = " + Node.totalSize);

      System.out.println("\nCreating chain 2\n");
      c2.addLast(11);
      c2.addLast(23);
      c2.addLast(19);
      c2.addLast(37);
      c2.addLast(21);
      for (int i = 0; i < 100; i++)
        if (c2.findValue(i))
          System.out.print(i + " is among the stored values\n");
      c2.print();
      System.out.println("Total number of nodes = " + Node.totalSize);

      System.out.println("\nConcatenating chains\n");
      c1.addChain(c2);
      for (int i = 0; i < 100; i++)
        if (c1.findValue(i))
          System.out.print(i + " is among the stored values\n");
      System.out.println("\nChain 1:\n");
      c1.print();
      System.out.println("\nChain 2:\n");
      c2.print();
      System.out.println("Total number of nodes = " + Node.totalSize);

      System.out.println("\nRemoving chain 1\n");
      c1 = null;
      System.gc();
      System.out.println("Total number of nodes = " + Node.totalSize);
    }
  }

Class Chain is the same as before. Node is augmented by a static variable. Static variables are the fourth kind of variable, next to instance variables, parameters and local variables. They might best be called class variables, because they belong to the class and not to an instance: for all the objects of a class there is only one copy of a static variable. This is the ideal way to maintain information pertaining to the class as a whole. The prime example of this is a counter which keeps track of the number of objects in existence. For example, it may be counted how many external ports are in use, so that some special action can be taken when a new port is requested while the maximum number is already in use. Inside the class these variables can be accessed just like the instance variables. Outside the class they are accessed analogously to the way a static method is accessed: the name of the static variable is prefixed with the name of the class, connected by ".". An example is found in the instruction

  System.out.println("Total number of nodes = " + Node.totalSize);
Of course this access is possible only if the variable is not private.
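The port example mentioned above can be sketched as follows. This is only an illustration under assumptions: the class Port, the limit MAX_PORTS and the methods request and release are hypothetical names, not part of the course programs.

```java
class Port
{
  static final int MAX_PORTS = 2;   // assumed limit, chosen small for the demo
  static int inUse = 0;             // class variable: one copy for all Port objects

  int id;                           // instance variable: one copy per object

  private Port(int id)
  {
    this.id = id;
  }

  static Port request()
  // Returns a new Port, or null if the maximum number is already in use
  {
    if (inUse == MAX_PORTS)
      return null;
    inUse++;
    return new Port(inUse);
  }

  void release()
  // Gives the port back; to be called once per port
  {
    inUse--;
  }
}

class PortTest
{
  public static void main(String[] args)
  {
    Port a = Port.request();
    Port b = Port.request();
    Port c = Port.request();   // the special action here is returning null
    System.out.println("Third request granted: " + (c != null));
    System.out.println("Ports in use = " + Port.inUse);
    a.release();
    b.release();
    System.out.println("Ports in use = " + Port.inUse);
  }
}
```

Outside the class the counter is read as Port.inUse, in the same way as Node.totalSize above.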

In Node we now also find a new method called finalize. The method finalize is called automatically by the system when an object is removed by the garbage collector, once for each removed object. It is by default part of any class definition (in an invisible way) doing nothing, but one can choose to give it a certain functionality. Especially when one uses static variables to count occupied resources, it is important to also decrease their value when these resources are freed again.

Now one might think that in our example the value printed in the last line is always 0: because there are no pointers anymore to the chain, all nodes in it have become garbage, inaccessible allocated parts of the memory. So, they could be removed. However, garbage collection is done in a lazy way: typically it is only performed when the need arises or when the processor is waiting anyway. Therefore, without the call to gc just before the last print statement, the printed value will most likely be 11. If one wants to ask the garbage collector to run (this is a request to the system, not a guarantee), then one should add a call to the static method gc from System, as is done in the program above:

  System.gc();
Notice that in addChain the final instruction deletes the link from the attached Chain object to its nodes. Without this instruction, the second half of the chain would still have been reachable through c2, and the garbage collector would only throw away 6 of the nodes. It is strongly suggested that the readers actually try these variants of the program and understand what is happening.
Static variables are class variables: one copy exists for all objects of a given class. This is particularly useful for counters. In order to keep the counting up-to-date in the context of automatic garbage collection one should override the method finalize().

Inheritance, Polymorphism and Encapsulation

General Idea of Inheritance

In software development it is a very common situation that an existing software package is extended. For example, we have one of the above classes and decide that we actually need an extra instance variable or an extra method. Of course one could edit the old package and add the new instance variable or method. This requires that one finds one's way through declarations which might have been made long ago or by someone else.

It is harder if we do not want to add an instance variable or a method but to change it. For example we may want to change the type of the node first in Chain from Node to BetterNode, or we may want to replace a method which is good in a general case by a method which is better in a special case. Of course we can give it a new name and add it nevertheless. This is however quite ugly and confusing. It would at least require very good documentation to make sure that later updates indeed choose the right methods. In any case it increases the number of variables and methods unnecessarily.

Now assume that we want to maintain objects with slightly different features in a common structure, for example an array. One can think of a shop having all kinds of things to sell. For food articles there is a sell-by date, for non-food articles there may be seasons to respect. But all of them have a price. So, it makes sense to maintain all objects in one array and to call a method that increases the price for each of them. In C this is really hard to realize.

All mentioned aspects are dealt with in a simple way by the idea of inheritance. Inheritance means that one defines a new class as an extension of an existing class. Such a new class is called a derived class, the class which it extends will be called mother class or base class.

A derived class inherits all the instance variables and methods of its mother class. In addition new instance variables and methods may be added. Instance variables from the mother class may even be defined again, shielding the variable from the mother class. Methods can be overridden, that is, defined again in the derived class. Frequently a method in a derived class is merely a small modification of a method in the mother class. In that case it is natural and possible to reuse the code from the mother class by a special calling mechanism.

Inheritance is the key concept of object-oriented programming. It allows one to add, extend and adapt the functionality of methods and to add instance variables to a class in a hierarchical way.
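The shop example from above might be sketched as follows. The classes Article, FoodArticle and ShopTest and the rule that food prices rise only half as fast are assumptions made for illustration; they do not occur elsewhere in this text.

```java
class Article
{
  protected String name;
  protected double price;

  Article(String name, double price)
  {
    this.name  = name;
    this.price = price;
  }

  void increasePrice(double fraction)
  {
    price += price * fraction;
  }

  public String toString()
  {
    return name + ": " + price;
  }
}

class FoodArticle extends Article
{
  int sellByDay;   // additional instance variable, only for food articles

  FoodArticle(String name, double price, int sellByDay)
  {
    super(name, price);
    this.sellByDay = sellByDay;
  }

  void increasePrice(double fraction)
  // Modified variant, reusing the code from the mother class
  {
    super.increasePrice(fraction / 2);
  }
}

class ShopTest
{
  public static void main(String[] args)
  {
    Article[] shop = { new Article("Hammer", 10.0),
                       new FoodArticle("Cheese", 4.0, 14) };
    for (int i = 0; i < shop.length; i++)
    {
      shop[i].increasePrice(0.5);
      System.out.println(shop[i]);
    }
  }
}
```

Both objects are kept in one array of type Article, and the same call has a different effect depending on the class of the object: the hammer ends at 15.0, the cheese at 5.0.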

Class BetterCompany

As an example we consider again the earlier defined classes Employee and Company. In this company there is only one kind of employee: all employees have the same stored features and are treated in the same way by the methods. However, in most companies there are many kinds of employees: they can be divided both according to their domain of activity and according to their hierarchical level. We will consider the latter, and distinguish director, staff and worker. Different rules apply to them regarding salary increases, vacation days, absence due to illness, etc. They might also have different relevant features to store: for the director it might not be counted how much vacation he/she takes, but for all others this variable must be there; only the director has a budget to take care of.

All that has been mentioned so far holds true for any object-oriented language, possibly with some differences in terminology. The concrete example brings us back to Java. The class definitions of Employee and Company are not repeated; these classes are considered as being fixed. All of the following classes are built on top of these two. It turns out that while designing these original classes, we might have been slightly more extension oriented: one method is formulated in an unsuitable way, another is not defined at all, even though it will arise in all derived classes. Therefore the following construction is slightly more complex than necessary. For that reason it might be considered a realistic example.

Overview of Classes in Company Example

  class FixedEmployee extends Employee
  {
    public FixedEmployee(String name, int number, double salary)
    {
      super(name, number, salary);
    }

    public void endOfYear()
    {
    }
  }

  class Director extends FixedEmployee
  {
    private double yearlyBudget;
    private double budget;
   
    public Director(String name, int number, double salary,
             double theBudget)
    {
      super(name, number, salary);
      yearlyBudget = budget = theBudget;
    }

    public void endOfYear()
    {
      budget = budget / 2 + yearlyBudget;
    }

    public void expense(double amount)
    {
      budget -= amount;
    }

    public void increaseSalary(double salaryIncrease)
    {
      if (budget >= 0)
        super.increaseSalary(2.0 * salaryIncrease);
    }

    public String toString()
    {
      return "(" + name + ", " + number + ", " + salary +
             ", director, " + yearlyBudget + ", " + budget + ")";
    }
  }
FixedEmployee is only used to add the method endOfYear, which is defined in all derived classes.

Director is defined as an extension of FixedEmployee. Director has two additional instance variables: "yearlyBudget" and "budget". The new constructor has one more parameter. It first performs a call super( ... ). In this case this means a call to the constructor of the mother class. However, the usage of super is not limited to this case: it generally denotes methods or instance variables in the mother class. Its counterpart is this, which we encountered already in class Node. It generally denotes the current object or a method, particularly a constructor, from the current class.

"endOfYear" and "expense" are new methods. More interesting are the methods which existed already before: "increaseSalary" and "toString". These override the methods with the same name in the mother class. increaseSalary calls the method in the mother class by prefixing the call with super.

  class LowerEmployee extends FixedEmployee
  {
    protected int vacationDays;
    protected int yearlyVacationDays;

    public LowerEmployee(String name, int number, double salary, 
      int theYearlyVacationDays)
    {
      super(name, number, salary);
      vacationDays = 0;
      yearlyVacationDays = theYearlyVacationDays;
    }

    public int applyVacation(int numberOfDays)
    {
      if (numberOfDays > vacationDays)
        numberOfDays = vacationDays;
      vacationDays -= numberOfDays;
      return numberOfDays;
    }

    public void endOfYear()
    {
      vacationDays = vacationDays / 2 
                   + yearlyVacationDays;
    }
  }
The class LowerEmployee has the same features as Director: a few new instance variables and methods. In the constructor the constructor of the mother class is again called. It is a requirement that this call is the first statement of any constructor in a derived class.

  class Staff extends LowerEmployee
  {
    private int overTime;

    public Staff(String name, int number, double salary,
      int yearlyVacationDays)
    {
      super(name, number, salary, yearlyVacationDays);
      overTime = 0;
    }

    public void addOvertime(int hours)
    {
      overTime += hours;
    }

    public void endOfYear()
    {
      super.endOfYear();
      vacationDays += overTime / 10;
      overTime = 0;
    }

    public String toString()
    {
      return "(" + name + ", " + number + ", " + salary +
             ", staff, " + yearlyVacationDays + ", " + vacationDays + ")";
    }
  }
  class Worker extends LowerEmployee
  {
    private static int shiftVacationDays  = 5;
    private boolean shiftDuty;

    public Worker(String name, int number, double salary,
      int yearlyVacationDays, boolean theShiftDuty)
    {
      super(name, number, salary, yearlyVacationDays);
      shiftDuty = theShiftDuty;
    }

    public void increaseSalary(double salaryIncrease)
    {
      if (shiftDuty)
        super.increaseSalary(1.1 * salaryIncrease);
    }

    public void endOfYear()
    {
      super.endOfYear();
      if (shiftDuty)
        vacationDays += shiftVacationDays;
    }

    public String toString()
    {
      return "(" + name + ", " + number + ", " + salary +
             ", worker, " + yearlyVacationDays + ", " + vacationDays +
             ", " + shiftDuty + ")";
    }
  }
The variable shiftVacationDays is static. This means that this is not an individual quantity, but common to all members of the class.

  class BetterCompany extends Company
  {
    public BetterCompany(int maxSize)
    {
      super(maxSize);
    }

    public void addEmployee(FixedEmployee newEmployee)
    {
      if (size == maxSize)
        System.out.print("No space left, ignoring instruction!\n");
      else
      {
        staff[size] = newEmployee;
        size++;
      }
    }

    public void endOfYear()
    {
      for (int i = 0; i < size; i++)
        if (staff[i] instanceof FixedEmployee)
          ((FixedEmployee) staff[i]).endOfYear();
    }

    public void expense(int number, double amount)
    {
      int i = 0;
      while (i < size && staff[i].getNumber() != number)
        i++;
      if (i == size)
        System.out.print("Number not found, ignoring instruction!\n");
      else
        if (staff[i] instanceof Director)
          ((Director) staff[i]).expense(amount);
        else
          System.out.print("Employee with number " + number + 
            " is not a director, ignoring instruction!\n");
    }
  }
The class BetterCompany corrects an omission in Company: a method addEmployee which takes an employee object as parameter. Notice that in this case we do not say that addEmployee is overriding the method with the same name in the mother class: the signatures of these methods are not the same. Here we rather encounter polymorphic variants (overloaded methods): methods with the same name but different parameter lists, which exist next to each other.
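The difference between adding a variant with another signature and replacing a method with the same signature can be seen in a small sketch. Registry and BetterRegistry are hypothetical classes, invented only for this illustration.

```java
class Registry
{
  String add(String name)
  {
    return "Registry.add(String)";
  }

  String describe()
  {
    return "Registry";
  }
}

class BetterRegistry extends Registry
{
  String add(int number)
  // Other parameter list: a new variant next to the inherited add(String)
  {
    return "BetterRegistry.add(int)";
  }

  String describe()
  // Same signature: replaces the inherited method
  {
    return "BetterRegistry";
  }
}

class RegistryTest
{
  public static void main(String[] args)
  {
    BetterRegistry r = new BetterRegistry();
    System.out.println(r.add("Meier"));   // inherited variant is still there
    System.out.println(r.add(17));        // new variant
    System.out.println(r.describe());     // replaced method
  }
}
```

Which variant of add is called is decided by the types of the arguments; for describe only the version from BetterRegistry remains reachable through r.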

The new method endOfYear makes it possible to perform endOfYear in the same way as increaseSalary in the original version. The new method expense makes it possible to call the method expense in Director in the same way as before we could call changeName.

In endOfYear we see the operator "instanceof". The reason for this is that staff[] is an array of Employee objects. Even though we might believe that these are actually all of type FixedEmployee, for which the method endOfYear is defined, there might also be a derived class TemporaryAid for which endOfYear is not defined. At this point it is important to introduce the difference between the declared type and the actual type of a variable. The declared type of staff[i] is Employee, the actual type may be any of the derived classes. instanceof inspects at runtime the actual type of the object a variable refers to and returns true if this is the specified type or a class derived from it.

Even though we now are sure that the application of endOfYear is correct, it still does not work to simply write

          staff[i].endOfYear();
The problem is that endOfYear is not mentioned in class Employee. Thus, at compile time, this looks wrong. Therefore it is required to add a so-called cast. A cast is a forced type conversion. So, we convert staff[i] into a FixedEmployee, at our own responsibility. Notwithstanding the cast, at runtime the actual type determines which method to select.

Now we have obtained all we need to get a main program with considerably larger functionality. The changes to make are small. If Company had been designed better, with a method addEmployee with an Employee parameter, the changes would have been even fewer.
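The interplay of declared type, actual type, instanceof and cast can be tried out in isolation. The following sketch uses simplified stand-ins (Emp, FixedEmp, Boss, TemporaryAid), which are assumptions for illustration and not the classes of the company program.

```java
class Emp
{
  protected String name;

  Emp(String name)
  {
    this.name = name;
  }
}

class FixedEmp extends Emp
{
  int years = 0;

  FixedEmp(String name)
  {
    super(name);
  }

  void endOfYear()
  {
    years++;
  }
}

class Boss extends FixedEmp
{
  Boss(String name)
  {
    super(name);
  }
}

class TemporaryAid extends Emp
{
  TemporaryAid(String name)
  {
    super(name);
  }
}

class CastTest
{
  static int endOfYearAll(Emp[] staff)
  // Calls endOfYear where it is defined; returns the number of such objects
  {
    int count = 0;
    for (int i = 0; i < staff.length; i++)
      if (staff[i] instanceof FixedEmp)   // true for FixedEmp and for Boss
      {
        // staff[i].endOfYear() would be rejected by the compiler:
        // the declared type Emp does not have this method
        ((FixedEmp) staff[i]).endOfYear();
        count++;
      }
    return count;
  }

  public static void main(String[] args)
  {
    Emp[] staff = { new FixedEmp("A"), new TemporaryAid("B"), new Boss("C") };
    System.out.println(endOfYearAll(staff) + " objects had endOfYear called");
  }
}
```

Without the test with instanceof, the cast of the TemporaryAid object would fail at runtime with a ClassCastException.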

  class CompanyTest
  {
    public static void main(String ps[])
    {
      BetterCompany myCompany = new BetterCompany(100);
      myCompany.addEmployee(
        new Staff("Becker, Boris",    235521, 4500.00, 28));
      myCompany.addEmployee(
        new Director("Hecht, Edgar",     878722, 6500.00, 10000000));
      myCompany.addEmployee(
        new Worker("Albers, Marianne", 456212, 1554.00, 23, false));
      myCompany.print();
      myCompany.endOfYear();
      myCompany.print();
      myCompany.addEmployee(
        new Worker("Krauser, Angela",  426578, 1954.00, 25, true));
      myCompany.addEmployee(
        new Staff("Noack, Christina", 663738, 5646.00, 32));
      myCompany.print();
      myCompany.increaseSalary(0.04, 50.0);
      myCompany.addEmployee(
        new Worker("Brauer, Harald", 568900, 2200.00, 25, true));
      myCompany.setName(456212, "Becker, Marianne");
      myCompany.expense(878722, 73000);
      myCompany.print();
      System.out.print("\n");
    }
  }
Running the program gives the following output, clearly showing the result of the more individual treatment.
Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 0)
Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.0E7)
Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 0, false)

Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 28)
Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.5E7)
Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 23, false)

Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 28)
Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.5E7)
Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 23, false)
Employee[3] = (Krauser, Angela, 426578, 1954.0, worker, 25, 0, true)
Employee[4] = (Noack, Christina, 663738, 5646.0, staff, 32, 0)

Overview of employees:
Employee[0] = (Becker, Boris, 235521, 4680.0, staff, 28, 28)
Employee[1] = (Hecht, Edgar, 878722, 7020.0, director, 1.0E7, 1.4927E7)
Employee[2] = (Becker, Marianne, 456212, 1554.0, worker, 23, 23, false)
Employee[3] = (Krauser, Angela, 426578, 2039.976, worker, 25, 0, true)
Employee[4] = (Noack, Christina, 663738, 5871.84, staff, 32, 0)
Employee[5] = (Brauer, Harald, 568900, 2200.0, worker, 25, 0, true)

Polymorphism

This program may look rather unspectacular, but it illustrates the killer application of object-oriented programming. Notice what we are doing: we are handling objects of different classes within one common structure. In methods like print and increaseSalary, we are calling an old inherited method from Company and nevertheless we get for each object the increased functionality of the class these objects actually belong to: a Director gets a larger salary increase than the others, the shift workers get five extra days of vacation.

The above gives an example of polymorphism in the stricter sense: polymorphism means that variables can actually stand for different kinds of objects. This implies that parts of the program which are formulated in general terms can be applied to different kinds of objects. This notion is closely linked to the notion of dynamic binding: the phenomenon described above, that at runtime the actual type is used to determine which of the methods with identical signature is going to be used.

At compile time, it is checked that any method is connected by the dot operator to an object of a class in which this method is defined. This is done by checking the declared type of the object. At run time, the method to execute is chosen by looking at the actual type of the object connected to the method.

Now it is time to mention that any class is implicitly defined as an extension of class Object. Object is at the top of the class hierarchy. Without knowing this, we have already been using this fact implicitly. Consider a print statement of the following kind:

  System.out.print("Employee[" + i + "] = " + staff[i] + "\n");
How does this work? First the expression between the round brackets is evaluated. Here we use that the operator "+" is polymorphic, although for operators we rather say that they are overloaded. So, depending on the types of the arguments, "+" has a different effect.

This is nothing new, we already know that 3 / 4 < 0.5, while 3.0 / 4 > 0.5. The reason is here that in the first case "/" is evaluated as an integer operation, while in the second case it is evaluated as an operation between doubles. The rule for "/" is that it is evaluated as an integer operator if both its arguments are integers. If one of the arguments is a float or a double, then the other argument is converted to this type as well before the division is performed between floats or doubles. Notice that the resulting type has no impact: if x is a double, then "x = 3 / 4" is equivalent to writing "x = 0". Slightly more tricky is that "x = 3 / 4 * 10.0" has the same effect. The reason is that among operators with the same priority, the evaluation order goes from left to right (in this case).
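The division rules can be checked with a few lines. This is a small sketch; the class DivisionTest and its two helper methods are names assumed for illustration.

```java
class DivisionTest
{
  static double intQuotient(int p, int q)
  // Integer division; only the result is converted to double
  {
    return p / q;
  }

  static double realQuotient(int p, int q)
  // The cast turns p into a double, so q is converted as well
  {
    return (double) p / q;
  }

  public static void main(String[] args)
  {
    double a = 3 / 4;          // integer division: 0.0
    double b = 3.0 / 4;        // division between doubles: 0.75
    double c = 3 / 4 * 10.0;   // evaluated as (3 / 4) * 10.0: 0.0
    double d = 10.0 * 3 / 4;   // evaluated as (10.0 * 3) / 4: 7.5
    System.out.println(a + " " + b + " " + c + " " + d);
    System.out.println(intQuotient(3, 4) + " " + realQuotient(3, 4));
  }
}
```

The last two assignments differ only in the order of the operands, which is enough to change the result.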

The rules for "+" are different but similar. "+" between two String objects performs a concatenation of these. If one argument is a String and the other is an object of another class, then first the method toString is called for that object. Because toString is defined in Object, this always works. Not overriding toString results in a standard layout. Overriding toString, as is done in Employee, allows one to tune the output. Only when both arguments of "+" are of a numerical type, it is assumed that an addition is to be performed. Therefore we have

  "Value = " + i + i    !=     "Value = " + (i + i)
  i + i + "= Value"     !=     i + (i + "= Value")
Casts are sometimes needed to obtain a forced type conversion.
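The effect of the evaluation order on "+" can be reproduced with the following sketch. The classes Point and ConcatTest and the method names are assumptions chosen for this illustration.

```java
class Point
{
  int x, y;

  Point(int x, int y)
  {
    this.x = x;
    this.y = y;
  }

  public String toString()
  {
    return "(" + x + ", " + y + ")";
  }
}

class ConcatTest
{
  static String stringFirst(int i)
  {
    return "Value = " + i + i;      // two concatenations
  }

  static String sumFirst(int i)
  {
    return "Value = " + (i + i);    // addition, then concatenation
  }

  static String numbersFirst(int i)
  {
    return i + i + "= Value";       // addition, then concatenation
  }

  static String stringLast(int i)
  {
    return i + (i + "= Value");     // two concatenations
  }

  public static void main(String[] args)
  {
    System.out.println(stringFirst(3));    // Value = 33
    System.out.println(sumFirst(3));       // Value = 6
    System.out.println(numbersFirst(3));   // 6= Value
    System.out.println(stringLast(3));     // 33= Value
    System.out.println("P = " + new Point(1, 2));   // toString is called
  }
}
```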

Encapsulation

We have seen several of the access modifiers. These are part of a hierarchy which allows the programmer to specify in which classes and packages the methods and instance variables of a class can be accessed.

Unfortunately there is no modifier for "the own class + all derived classes". The only way to obtain this is to define a method or instance variable as "protected" and not to put any non-derived classes in the same package.

A careful choice of the applied modifiers is of great importance: making everything public is convenient, but implies that external applications may essentially use features of the internal realization of a class. If later one wants to change this internal realization, then it may happen that these applications do not run correctly anymore.

It is good practice to fix a well-defined interface between the class and the outside world: that is, to fix which instance variables of an object should be visible and which methods should be callable. Less visibility gives more flexibility! Classes should be defined according to their functionality, not according to how it is realized. For example: a Chain has the functionality of a special kind of (multi) set, with two insert operations and the possibility to unite two Chain objects. The general idea of limiting the access is called encapsulation; it is one of the cornerstones of object-oriented programming.

The above argument should have made clear that it is wrong to only use "public". But only using "private" or "public" is not good either. Sometimes classes are designed with the explicit purpose that other classes are going to be derived from them. One can consider Employee to be of this type. One may consider that the structure of Employee is so reasonable that there will never arise a need to modify it. At the same time derivations are considerably facilitated if the instance variables and methods are accessible from the derived classes. Therefore, we have chosen to use "protected" for the instance variables in Employee.

The access modifiers allow the programmer to fix the degree of encapsulation of classes, objects and methods. Mostly instance variables are private or protected and can be accessed only by special access methods.
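As an illustration of defining a class by its functionality, consider the following sketch of a multiset of integers. The class IntBag and its methods are hypothetical; the private array could later be replaced by a chain without any change for the users of the class.

```java
class IntBag
{
  private int[] keys = new int[4];   // internal realization, invisible outside
  private int   size = 0;

  public void add(int key)
  // Add a key, enlarging the internal array if it is full
  {
    if (size == keys.length)
    {
      int[] larger = new int[2 * keys.length];
      System.arraycopy(keys, 0, larger, 0, size);
      keys = larger;
    }
    keys[size] = key;
    size++;
  }

  public boolean contains(int key)
  // Test whether key is among the stored values
  {
    for (int i = 0; i < size; i++)
      if (keys[i] == key)
        return true;
    return false;
  }

  public int getSize()
  // Access method for the private variable size
  {
    return size;
  }
}

class IntBagTest
{
  public static void main(String[] args)
  {
    IntBag bag = new IntBag();
    for (int i = 0; i < 5; i++)
      bag.add(10 * i);
    System.out.println("Size = " + bag.getSize());
    System.out.println("Contains 30: " + bag.contains(30));
    System.out.println("Contains 35: " + bag.contains(35));
  }
}
```

Because keys and size are private, no other class can depend on the fact that an array is used.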

Further Important Aspects

By now Java has expanded enormously and few people will have an overview of all classes and methods defined. Here we do not even attempt to provide a complete overview. However, there are still many very fundamental aspects which have not been mentioned above. Here some of them are briefly discussed.

Final and Abstract

We have encountered the keyword final in front of a variable. However, final is a general modifier. When final stands in front of a variable, then this means that this variable cannot be changed after it has been given its value. In other words, the variable is actually a constant. This also means that the variable must be initialized, typically upon declaration. Final for a method means that this method cannot be overridden in a derived class. Final for a class means that no classes can be derived from it. Possible advantages of the use of "final" are speed and security. If a method is final, then the compiler may "inline" it. Declaring classes final apparently also makes it harder to maliciously operate on a piece of code.

Abstract is more or less the opposite of final: an "abstract" class must be derived before it can be used. One cannot create any objects of an abstract class. Likewise, an abstract method must be overridden. To make this consistent, the designers of Java have decided that abstract methods can only appear in abstract classes (but an abstract class can have methods that are not abstract).
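Both modifiers can be seen together in the following sketch. Shape and Square are hypothetical classes used only for illustration.

```java
abstract class Shape
{
  abstract double area();
  // No body: must be defined in a derived class

  final String describe()
  // Cannot be overridden, but uses the method area of the actual class
  {
    return "Shape with area " + area();
  }
}

final class Square extends Shape
// No further classes can be derived from Square
{
  private final double side;   // receives its value once, in the constructor

  Square(double side)
  {
    this.side = side;
  }

  double area()
  {
    return side * side;
  }
}

class ShapeTest
{
  public static void main(String[] args)
  {
    // Shape s = new Shape(); would be rejected: Shape is abstract
    Shape s = new Square(2.0);
    System.out.println(s.describe());
  }
}
```

The variable s has declared type Shape, which is allowed even though no objects of class Shape itself can exist.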

Polyinheritance and Interfaces

In many object-oriented languages, there are almost no limitations on ways to inherit. In Java there is a strict limitation: any class inherits from at most one other class. This implies that the complete structure is like a tree: there are no cycles. An example of this we have seen for the classes derived from Employee. Because any class is an extension of Object, there is in fact only one tree, with Object at the top connecting to all classes which are not derived from other classes themselves.

There are good reasons to allow polyinheritance (usually called multiple inheritance): many objects incorporate aspects of several more general classes. A person can both be an Employee and a ClubMember, an article in a shop may both be a FoodArticle and a LuxuryArticle. However, polyinheritance may also lead to consistency problems: if BClass and CClass each extend AClass and DClass extends both BClass and CClass, then methods from AClass are inherited in two possible ways. If a method from AClass has been overridden in BClass and/or CClass, then at runtime it would not be clear which one to take.

To exclude this kind of problem, polyinheritance is generally forbidden in Java. In other languages this problem is addressed differently: one might generally allow polyinheritance, but forbid inheritances which result in having equally valid variants of methods. One might allow any kind of inheritance, and in case a method is inherited several times, one might for example always select the variant from the first listed class in which it is defined. Java has chosen the most restrictive approach, assuring correctness and facilitating the task of the compiler, at the expense of programming possibilities.

An interface is something like a class, but different. It is close to being a fully abstract class: it has only abstract methods (because this is the default, one does not have to declare them as such) and it has no instance variables. The only other thing it may have is interface constants: static final variables. For the rest an interface is like a normal class: one can define variables, parameters and return values of an interface type.

Interfaces are typically used to express properties. Therefore, it is customary to give an interface a name ending in "able". In the earlier example "BetterCompany", the class FixedEmployee was only defined to obtain a common platform for all the derived classes implementing the method endOfYear. This could better have been done in the following way using an interface:

  interface EndOfYearAble
  {
    public void endOfYear();
  }

  class Director extends Employee implements EndOfYearAble
  {
    ...

    public void endOfYear()
    {
      budget = budget / 2 + yearlyBudget;
    }

    ...
  }

  class LowerEmployee extends Employee implements EndOfYearAble
  {
    ...

    public void endOfYear()
    {
      vacationDays = vacationDays / 2
                   + yearlyVacationDays;
    }

    ...
  }

  class Staff extends LowerEmployee
  {
    ...

    public void endOfYear()
    {
      super.endOfYear();
      vacationDays += overTime / 10;
      overTime = 0;
    }

    ...
  }

  class Worker extends LowerEmployee
  {
    ...

    public void endOfYear()
    {
      super.endOfYear();
      if (shiftDuty)
        vacationDays += shiftVacationDays;
    }

    ...
  }

  class BetterCompany extends Company
  {
    ...

    public void endOfYear()
    {
      for (int i = 0; i < size; i++)
        if (staff[i] instanceof EndOfYearAble)
          ((EndOfYearAble) staff[i]).endOfYear();
    }

    ...
  }

Because interfaces have neither instance variables nor worked-out methods, there are no problems related to having implementations of several interfaces and therefore a class may implement any number of interfaces.
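As a minimal sketch of a class implementing several interfaces, consider the following. The interfaces Printable and Resettable and the class Counter are hypothetical names chosen only for this illustration; they do not occur in the company example.

```java
interface Printable
{
  String describe();
}

interface Resettable
{
  void reset();
}

// At most one superclass, but any number of interfaces.
class Counter implements Printable, Resettable
{
  private int value = 0;

  void increment()
  {
    value++;
  }

  public String describe()
  {
    return "Counter(" + value + ")";
  }

  public void reset()
  {
    value = 0;
  }
}

class InterfaceDemo
{
  public static void main(String[] args)
  {
    Counter c = new Counter();
    c.increment();
    c.increment();

    // The same object can be viewed through either interface type.
    Printable  p = c;
    Resettable r = c;
    System.out.println(p.describe());
    r.reset();
    System.out.println(p.describe());
  }
}
```

Note that the implemented methods must be declared public, because interface methods are public by default.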

Polyinheritance of Classes and Interfaces

Each class extends at most one other class, assuring that the inheritance hierarchy has a tree structure. But, classes may implement many interfaces, telling which methods certainly exist.

Exceptions

The final thing to know about Java is the notion and handling of exceptions. As remarked before, Java pays utmost attention to preventing programming errors as far as possible. Therefore it provides a system to test for unexpected situations. One can enclose a fragment of program text in a try clause, which is followed by a catch clause.

How does this work? If an error occurs, then one of the following things happens:

  1. The faulty line is surrounded by a try-catch of the corresponding type. In that case the instructions specified in the catch part are executed and the computation goes on in the way specified there.
  2. There is no such try-catch. In that case the exception is passed upwards to the method from which this fragment of code was called, and there it is tested again for an enclosing try-catch. And so on, until a matching try-catch is found or the program exits with an error.

When we think of Java also as the language for internet applications (mainly in the form of applets), then an error condition does not necessarily mean that the program is wrong: it may have asked for a connection to be opened which was impossible because the other side was not replying. Instead of crashing, one might want to try something else, or just go on and ignore the thing.
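The two cases above can be sketched with a small program. The class PropagationDemo and its methods lowest, middle and top are hypothetical names; the error used here is an integer division by zero, which in Java raises an ArithmeticException.

```java
class PropagationDemo
{
  // The error occurs here when d == 0; there is no try-catch.
  static int lowest(int d)
  {
    return 10 / d;
  }

  // No try-catch here either, so the exception is passed upwards.
  static int middle(int d)
  {
    return lowest(d) + 1;
  }

  // The first enclosing try-catch of a matching type handles it.
  static int top(int d)
  {
    try
    {
      return middle(d);
    }
    catch (ArithmeticException e)
    {
      System.out.println("Caught in top: " + e.getMessage());
      return -1;
    }
  }

  public static void main(String[] args)
  {
    System.out.println(top(5));
    System.out.println(top(0));
  }
}
```

Without the try-catch in top, the exception would travel further up to main and the program would exit with an error.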

There are two types of exceptions: runtime and general exceptions. General exceptions must be dealt with, runtime exceptions might be dealt with. An example of a general exception arises when reading: Java obliges you to take into account that a read operation may fail with an IOException. Thus, every read must be surrounded by a try-catch (or the exception must be passed on with throws, as described further below). The following piece of code gives a class which contains a static method for reading an integer. In case something goes wrong while reading, the user is informed, and 0 is returned from the method.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  
  class IntReader
  {
    static int readInt() 
    // Reads an integer from input
    {
      try 
      {
         return Integer.valueOf(
           (new BufferedReader(
              new InputStreamReader(System.in)).readLine())).intValue();
      } 
      catch (java.io.IOException e) 
      {
        System.out.print("IO Exception occurred, returning 0");
        return 0;
      }
    }
  }

  class ExceptionTest
  {
    public static void main(String ps[])
    {
      int i;
      System.out.print("Give i   >>>   ");
      i = IntReader.readInt();
      System.out.print("i = " + i + "\n");
    }
  }

An example of a runtime exception is division-by-zero: you are free to test for this and to choose an appropriate reaction, but you are equally free not to do so: things become very slow if you test everything. Important in this context is the keyword throws: it allows one to handle general exceptions at a higher level. Using throws indicates that one is aware of the possibility that something might go wrong, but that one does not want to deal with it at this level. Using throws may help to save many try-catch pairs.

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  
  class IntReader
  {
    static int readInt() throws java.io.IOException
    // Reads an integer from input
    {
       return Integer.valueOf(
         (new BufferedReader(
            new InputStreamReader(System.in)).readLine())).intValue();
    }
  }

  class ExceptionTest
  {
    public static void main(String ps[])
    {
      int i;
      System.out.print("Give i   >>>   ");
      try 
      {
        i = IntReader.readInt();
      } 
      catch (java.io.IOException e) 
      {
        System.out.print("IO Exception occurred, continuing with i == 0");
        i = 0;
      }
      System.out.print("i = " + i + "\n");
    }
  }

Everything is classes, and so are exceptions. By deriving from the class Exception, you can define your own exception classes, which might be useful to test for non-standard exceptions. Any Java book gives you examples in case you might need something of this kind.
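A self-defined exception might look as in the following sketch. The names NegativeInputException, CustomExceptionDemo, integerSqrt and tryRoot are hypothetical and serve only as an illustration; because the new class derives from Exception, it is a general exception and must be either caught or announced with throws.

```java
// A self-defined general exception.
class NegativeInputException extends Exception
{
  NegativeInputException(String message)
  {
    super(message);
  }
}

class CustomExceptionDemo
{
  // Returns the largest r with r * r <= x.
  // throws announces that callers must deal with the exception.
  static int integerSqrt(int x) throws NegativeInputException
  {
    if (x < 0)
      throw new NegativeInputException("negative input: " + x);
    int r = 0;
    // long arithmetic avoids overflow for large x
    while ((long) (r + 1) * (r + 1) <= x)
      r++;
    return r;
  }

  static String tryRoot(int x)
  {
    try
    {
      return "root = " + integerSqrt(x);
    }
    catch (NegativeInputException e)
    {
      return "error: " + e.getMessage();
    }
  }

  public static void main(String[] args)
  {
    System.out.println(tryRoot(17));
    System.out.println(tryRoot(-1));
  }
}
```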

Exceptions are there to assure that in case something goes wrong a decent output is produced and resources are freed before crashing or going on in an alternative way.

Turning Applications into Applets

It requires only small modifications to change a program (application) into an applet. Look in any book on Java to see how this is exactly done. This topic is not treated in this lecture.

Summary

Java is an object-oriented language. The guiding idea in the design of Java has been to assure correctness, even if this comes at the expense of speed or flexibility.

At a superficial level, the object-orientedness of Java is expressed by the way methods are called: an object is connected to a method of its class with the dot operator, putting the object in the foreground. Much more important are the following general concepts of object-oriented programming:

Inheritance:
A class is defined to be an extension of an existing class, inheriting all its instance variables and methods, with the possibility to add instance variables and to add, extend or overwrite methods.
Encapsulation:
Details of the implementation are made invisible externally: a class is defined by its external functionality and not by its internal realization. This allows a high level of abstraction and the flexibility to later modify the details as long as the external functionality remains unchanged.
Polymorphism:
Methods can have the same name as long as they have a different signature. More importantly, an array (or other objects containing other objects), defined to hold objects of a certain type, may also hold objects of any derived type. This makes it possible to store objects with different features in a common data structure.
Dynamic Binding:
When an object is connected to a method, then at compile time it is tested whether this method exists in the class of the declared type of the object. However, at runtime it is the actual type of the object, which because of the described polymorphism need not be the same, that determines the method to be called. Especially in the context of an array (or other objects containing other objects) with polymorphic objects, this allows objects with different features stored in a common data structure to be given a specific treatment.
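The interplay of polymorphism and dynamic binding can be sketched as follows. The classes Shape, Circle, Square and BindingDemo are hypothetical and not part of the programs discussed earlier.

```java
class Shape
{
  String name()
  {
    return "shape";
  }
}

class Circle extends Shape
{
  String name()          // replaces the method of Shape
  {
    return "circle";
  }
}

class Square extends Shape
{
  String name()
  {
    return "square";
  }
}

class BindingDemo
{
  // The declared element type is Shape, so the compiler only checks
  // that Shape has a method name(). Which variant is executed is
  // decided at runtime from the actual type of each object.
  static String names(Shape[] shapes)
  {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < shapes.length; i++)
    {
      if (i > 0)
        sb.append(' ');
      sb.append(shapes[i].name());
    }
    return sb.toString();
  }

  public static void main(String[] args)
  {
    // Polymorphism: one array holds objects of several derived types.
    Shape[] shapes = { new Shape(), new Circle(), new Square() };
    System.out.println(names(shapes));
  }
}
```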

Further Code Examples

The following code examples illustrate many aspects of Java, even some more than discussed above, inside working programs. By modification these programs can be used for most common non-graphical programming tasks in Java.

Exercises

Some of the exercises are (almost) identical to the exercises from the chapter on C. This is not a mistake. It is instructive to make the similarities and differences explicit.
  1. Define a class IntArray. The class has two instance variables: an int "length" and an int[] "a". IntArray has one constructor; a method for printing all values in a; and two inversion methods. Inversion is the operation so that afterwards a[i] has the value which initially was found in a[n - i - 1]. The first inversion method works with a dummy array b[]. The second method performs the inversion in-situ, that is, without using much extra memory. Embed this class in a program. In main an IntArray object is created of length 20. The fields of the array are initialized with a[i] = 2 * i, for all i. Then the array is inverted with each of the inversion methods. After each big change the array is printed.

  2. The example class IntegerMatrix from above can be downloaded here. Augment this class with a method for adding matrices (a_{i, j} = b_{i, j} + c_{i, j}, for all i, j) where the two matrices to add are passed as parameters, while the matrix in which the sum is computed is passed as object. Add a similar method for multiplying matrices (a_{i, k} = sum_j b_{i, j} * c_{j, k}, for all i, k). In all operations you may assume that the matrices fit: they are all n x n matrices for some fixed n. On the other hand, you must take care that the product method even computes correctly when the involved matrices are not all different, for example when computing A = A * A. Should you also take extra care with the method for computing the sums? Further you should add methods for setting the value of a specified position of the matrix in an IntegerMatrix and for printing all values of the matrix.

    IntegerMatrix should also have a static variable totalSize keeping track of the sum of all sizes of all matrices, and the constructors should refuse to allocate new memory when totalSize would exceed MAX_TOTAL_SIZE, for some constant. In that case some output is produced, ideally this is handled by a self-defined exception, but this is not required. The method finalize() should be overwritten to assure that totalSize remains accurate even when IntegerMatrix objects are removed by the garbage collector.

    Integrate class IntegerMatrix into a program which creates several matrices, makes some assignments and performs some operations. More concretely, we want you to create matrices A, B and C, as specified below, and to compute A = A . (B + C).

               ( 1  7  2)      ( 3 -7 -3)      ( 0  4 -2)
           A = (-1  2  7)  B = (-4  2  3)  C = ( 0 -1 -5)
               ( 1  4 -5)      (-6 -1  3)      (10  5 -2)
        
    The initial, intermediate and final matrices should be printed. Check that the computed results make sense:
                   ( 3 -3 -5)                  (-17  12 -17)
           B + C = (-4  1 -2)    A . (B + C) = ( 17  33   8)
                   ( 4  4  1)                  (-33 -19 -18)
        

  3. Consider the problem of sorting pairs (x, y) on the value of x. The first position of such a pair is called its key, the second position its name. The key is an integer in a finite range: 0 <= x < m for some reasonably small value m. For the sake of simplicity, even the names are assumed to be integers, but these could be arbitrary. The class of these objects is called Pair. Pair has methods getKey(), setKey(), getName() and setName(). The method toString() should be overwritten so as to produce a pretty output: a Pair with key 12 and name 245 should be converted to the string (12, 245). In main() your program should ask for the number of pairs n and m, and then create an array of Pairs with random values (bounding the key values to m).

    This array should be sorted. To this end you should define a class Sort which has a static method sort() which has an array of Pairs as parameter. sort() has another parameter which is used to pass the value of m. Here we are not so much interested in efficiency but in handling classes. Define a further class, called Node. A Node has two instance variables: a Node and a Pair. The class NodeArray mainly consists of an array of Nodes. In our application this array has length m. Because the Nodes will be linked to each other so that they form lists, an object of NodeArray can be viewed as a set of m linked lists. In NodeArray there is a method which allows one to insert a Node at the beginning of a list at a specified position of the array. NodeArray also has methods which allow one to enumerate all Nodes in all lists in a systematic way, starting with the list at array position 0. The sorting can now be performed by sort() as follows:

    1. A new empty NodeArray is created.
    2. The array of Pairs is traversed and each Pair (x, y) is enveloped in a Node which is added at the beginning of the list starting at position x of the array.
    3. Repeatedly call for the next Node from which the enveloped Pair is extracted (by calling a method in Node). Insert the Pairs in this new order in the array of Pairs.

    Fill in the details yourself and work this out to a running program. Test it for m = 10 and n = 20.

    In the current version, if there are several Pairs with the same key, then the order of these Pairs will get reversed. This is undesirable: in many applications it is required that a sorting subroutine is stable. With a minimal change the above sorting method can be made stable. How?

    What is the running time of your algorithm expressed in terms of n and m? What do you get for m = O(n)?

  4. Write a program for efficiently performing set operations using a boolean for every element of the set, packing 31 booleans (which indicate whether an element is present in the set or not) in an int. The elements in the sets have indices from 0 to n - 1, for some value n which is read at the beginning of the program. Use the class IntReader for this. It can be downloaded here. The supported operations must be:

    Of course you should define a class Set for this. The implementation of all operations should be completely hidden and the instance variables should not be visible outside the class. All calls to the methods of Set must be performed in an object-oriented way; none of the mentioned methods may be static.

    Random numbers can be generated with help of the methods in the class Random in java.util. Use this to generate three random sets of size 100.000.000 each:

    The task is to compute the number of bets resulting in a prize (each bet gets at most one prize). That is, you should first compute the union of S_1 and S_2, then intersect with S_3 and finally compute the size of the resulting set. Print this resulting number (if it does not lie between 1.940.000 and 1.960.000, then probably something is wrong with your program).

  5. The class Chain allows one to insert elements at the beginning and at the end, but in the latter case the whole chain has to be traversed. Define a derived class LChain of Chain, which has one additional instance variable Node last. Of course LChain has its own constructor, which should also call the constructor of the mother class. The method addLast should be overwritten. Even the methods addFirst and addChain may have to be adapted: the method from the mother class should be called, only the new operations should be specified.

    Integrate LChain into a program: take the program ChainTest from the text above and change the type of c1 and c2 from Chain to LChain. The text of the program can be downloaded here.

  6. In program Chain a Node has instance variables int key and Node next. One can also define Node with instance variables int key, Node left and Node right. This gives nodes that can be used to construct a tree. Like the chain a tree is a linked structure but in a tree the nodes may have degree larger than one, in our case they have degree 0, 1 or 2. The Node designated by the instance variable left is called the left child of a node, the right child is defined analogously.

    In a search tree, the keys are not arranged in just any possible way, but so that for any node the key of its left child (if existing) is smaller than its own key, and that the key in its right child is larger. This arrangement makes it easy to perform the operation find: determining whether an element with a specified key exists or not. This is done in the following way: If the value x is smaller than the key y of the current node, then, if x occurs at all in the tree, it must occur in the left child or the nodes which can be reached from there. If the current node has no left child, then x does not occur. In case x > y, we must go right. If x is equal to the key, then we have found the value.

    A search tree with 7 nodes

    Create a class SearchTree implementing the above ideas. The class has an instance variable Node root. "root" corresponds to "first" in Chain: this is the node from which the structure is entered. There must be a trivial constructor, a method find along the above guidelines and a method print. The return type of find should be Node: it returns null when the value x we were looking for does not occur, otherwise it returns the Node with key equal to x. "print" should print all nodes in some systematic way. A very good idea is to do it recursively. A method is called recursive when it works by calling itself again (with a certain stopping condition). This recursive printing should however be handed over to a method within the class Node or an extension thereof (after testing that root != null). It has a structure of the following kind:

           void print() 
           {
             if (left != null)
             {
               System.out.print("Going left\n");
               left.print();
             }
             System.out.print("Key value = " + key + "\n");
             if (right != null)
             {
               System.out.print("Going right\n");
               right.print();
             }
           }
        
    It is a good idea, but not required, to also hand over find to a method in the class Node.

    Create the search tree from the picture "by hand", that is, by creating nodes with appropriate keys one by one and hooking them in the correct way. Then call print for the tree.

    Inserting a node with key x in a search tree is also easy: Search for x. If x already occurs, we do not insert it again. Otherwise if the search ends in a node with key y != x, then if y > x, a new Node with key x is added as left child, otherwise as right child. Delete can be performed by marking the deleted nodes in a special way, if this value is inserted again later on, the marking must be undone.

    Create a derived class MarkNode of Node which has one additional instance variable: boolean deleted. Of course this class also needs a constructor. The class Dictionary is a derived class of SearchTree. It has additional instance variables int size and int realSize. "size" indicates the number of non-deleted nodes, while "realSize" indicates how many nodes are physically there. Methods insert and delete are added. The actual work should best be done at the level of MarkNode.

    Now create the same tree again by inserting the elements in appropriate order. For two trees to be the same the structure and the keys in corresponding nodes must be the same.

    Create an empty Dictionary. Generate 100,000 random values in the range 0, ... , 199,999 and insert these in the order they are generated. Print the size of the tree. It should lie between 77000 and 80000. Generate 100,000 random numbers in the same range and count how many of them occur in the tree. It should be about 39300. Generate 200,000 random numbers in the same range and perform a delete for all of them. Print again the number of remaining nodes, now print both size and realSize. size should now lie between 46000 and 49000. Again generate 100,000 random values in the range 0, ... , 199,999 and insert these in the order they are generated. Print size and realSize. size is about 107600, realSize should be about 126400.

    Create an empty Dictionary. Insert the numbers 0, 1, ..., 99,999 in this given order. What do you notice? What is the reason? Why did this not happen before? What is your conclusion about the suggested data structure Dictionary?

  7. Write a small Java program consisting of main and a method swap. swap exchanges the values of two integers passed as parameters. Think of the simplest way to realize this. Hint: embed the variables into an object of the class Integer, or an own class of this kind.

  8. Write a program for converting a text to caps_format, all letters must be replaced by capitals, while the other characters and the layout remain unchanged. The original text is found on the file input, the converted text is written to the file output. Both files stand in the same directory as the program.

  9. Write an applet performing the above exercise on converting a text to caps format. There should be two text boxes, and a submit button. In the first text box the input text is entered, to the second text box the program writes the modified text after pressing the button. The arrangement should be: input text at the top, submit button in the middle and output text at the bottom. The labels of these should be "INPUT TEXT", "CONVERT", "OUTPUT TEXT", respectively.

  10. Write an applet simulating the operations in a post office. The details of the task are specified here.

  11. Define a class of complex numbers. Complex numbers are pairs of two doubles with certain arithmetic rules, which means that they can be added, subtracted and multiplied (and more). The rules are
    (a, b) + (c, d) = (a + c, b + d),
    (a, b) - (c, d) = (a - c, b - d),
    (a, b) * (c, d) = (a * c - b * d, b * c + a * d).
    Here the symbols +, - and * inside the brackets denote the operations on doubles. Define a class ComplexRing implementing these operations as methods. The methods should be called add, subtract and multiply. They should be non-static: the result is stored in the object the method is called with, so computing x = y + z is performed by calling x.add(y, z). The method isZero is non-static. It returns true when the complex number it is called with is equal to zero. A complex number (a, b) is zero when a == 0 and b == 0.

    Add a constructor which can be called with two double arguments. Also add a method (called with an object connected to the method name with the dot operator) readComplex. readComplex asks for two doubles which are read with help of the class DoubleReader. It can be downloaded here. Overwrite the method toString from Object to enable printing complexNumbers in a decent way. The instance variables should be "protected" and the methods "public".

    We build on the class ComplexRing. Define a derived class ComplexField. This class has one extra instance variable and some extra methods. The private double instance variable "norm" at all times gives the norm of the complex number, which for a number (a, b) is defined as a^2 + b^2.

    The method isZero is overwritten: the test on zero is simplified to norm == 0. reciprocal is a private static method returning the reciprocal of the complex number passed as an argument. This is defined as follows: for a number (a, b) the reciprocal is given by the number

    (a, b)^{-1} = (a / norm, -b / norm).
    Use this method to define a method divide which returns x / y = x * y^{-1} in the object z with which it is called for two complex numbers x and y passed as arguments. So, the method can be called as z.divide(x, y).

    In all methods, both those of ComplexRing and those of ComplexField, one should take care that the correct result is computed even if the arguments overlap with the calling object. Either one might test for this and write a special strategy, or one should work with some dummy variable.

    Define an exception divisionByZero. This exception is thrown by the method divide when the second argument equals zero. Read the above text on exceptions to see how this is done and consider the example in class Seven.

    Create a program with main embedded in a class called ComplexTest. In main five complex numbers are created: u, v, x, y, z. u, x and y are read in. v is initialized at zero. Then compute z = x + y - z, and subsequently z = z * v. Print the results. Then compute x / x, x / y and z / z and print the results.





Simple Data Types

In this chapter we present three data types, lists, stacks and queues, which are slightly arbitrarily grouped together. Their common feature is that they are mostly implemented either with arrays or with linked lists. We start by giving the formal definition of an abstract data type, a central notion in the study of data structures and a fundamental programming concept as well.

Abstract Data Types

An abstract data type (ADT) is a combination of a set (though one might even consider more general concepts) with a set of operations defined on it. The word abstract is essential here: we do not (yet) speak of realizations in the form of code in a computer language. Examples of ADTs are the following: The list of properties may be longer or shorter. Each abstract data type has, however, some properties which are so important, that without these properties we would not call it that way. These are the defining properties. In the above list, the defining properties are printed bold.

Depending on the precise operations that can be performed, each ADT has variants that may be best realized with different implementations. Sometimes the frequency with which the operations are performed may also have an impact on the choice of the implementation. For example, priority queues may or may not support decreaseKey. If decreaseKey is supported, this operation may be far more frequent than insert and deleteMin. Such aspects have resulted in a multitude of priority-queue implementations.

In the following we look at implementations, but it is essential to keep in mind the separation between an ADT and its implementation. If we speak about the list ADT, then this does not refer to a specific actual data structure. And although priority queues are mostly implemented with heaps (which themselves can be implemented in various ways), we do not even want to know this when we talk about priority queues. In this context programming languages like Java that allow encapsulation are at their best.

Lists

We consider two implementations (realizations is maybe a more correct terminology for going from abstract to concrete) for the list ADT. The first is a simple array, the second is with a linked list.

We want to be able to perform a sub- or superset of the following operations:

Array Implementation

With an array all this is easy to realize: we just have to remember the number of elements (in Java nicely hidden from the user by making it private) and then we can implement all the methods in a straightforward fashion. Inserting at position x implies that all subsequent elements are shifted one position further. This is inefficient: it takes O(n) time for a list of length n. In total we get the following times for the operations:

Linear List Implementation

For some of the desired operations things become more efficient with an alternative implementation. The canonical implementation for a list is the linear list. In C this structure is realized by pointers.

So, we have a "pointer" to the first element, and then there follows a number of elements that are linked together, until a final element that points to "null", the special element that means "no object". In Java there is no explicit concept of pointers. Instead in Java one can define a class with a field of the class itself, which is typically called next. Writing x = y.next brings us to the next element in the list, leaving all address and pointer management implicit.

We consider how we can perform the above operations with a linked list data structure. The main difference with before, is that once we are at a position (getting there is expensive because we cannot jump to an arbitrary position but must plow forward through the list) insertions and deletions are trivial: by simply relinking an element can be in/excluded in constant time. So, now we have

So, some operations are now considerably cheaper, while others are marginally more expensive. It depends on the ratio of the operations to be performed whether linear lists are to be preferred over arrays.
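The difference between the two realizations can be sketched for the insert operation. The names Cell, InsertDemo, insertInArray, insertAfter, deleteAfter and render are hypothetical; keys are taken to be ints to keep the fragment short.

```java
class Cell
{
  int  key;
  Cell next;

  Cell(int key, Cell next)
  {
    this.key  = key;
    this.next = next;
  }
}

class InsertDemo
{
  // Array version: a[pos .. n - 1] is shifted one place to the right,
  // which takes O(n) time. Assumes n < a.length and 0 <= pos <= n.
  // Returns the new number of elements.
  static int insertInArray(int[] a, int n, int pos, int x)
  {
    for (int i = n; i > pos; i--)
      a[i] = a[i - 1];
    a[pos] = x;
    return n + 1;
  }

  // Linked version: once we are at cell c, relinking takes O(1) time.
  static void insertAfter(Cell c, int x)
  {
    c.next = new Cell(x, c.next);
  }

  // Removes the successor of c, if there is one, also in O(1) time.
  static void deleteAfter(Cell c)
  {
    if (c.next != null)
      c.next = c.next.next;
  }

  // Lists the keys from first onwards, separated by spaces.
  static String render(Cell first)
  {
    StringBuilder sb = new StringBuilder();
    for (Cell c = first; c != null; c = c.next)
    {
      if (c != first)
        sb.append(' ');
      sb.append(c.key);
    }
    return sb.toString();
  }

  public static void main(String[] args)
  {
    int[] a = { 1, 2, 4, 0 };
    int n = insertInArray(a, 3, 2, 3);
    System.out.println(n + " elements in the array");

    Cell first = new Cell(1, new Cell(3, null));
    insertAfter(first, 2);
    System.out.println(render(first));
  }
}
```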

Implementational Details

In languages like Java, instead of pointers, we have "iterators". These are objects that for the given data structure, in this case the linked list, simulate the operations of a pointer (of course you are free to give them less or more functionality). So, what do we want to do with a linked-list iterator? This is exactly what is realized by the following:
  class Node
  {
    Object key;
    Node   next;

    Node(Object key, Node next)
    {
      this.key  = key;
      this.next = next;
    }

    Node(Object key)
    {
      this(key, null);
    }
  }

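  public class LinkedListIterator
  {
    Node current;

    // Constructors have no return type; with "void" these would be
    // ordinary methods and could not be used to create an iterator.
    public LinkedListIterator(Node current)
    {
      this.current = current;
    }

    // Creates a new iterator "pointing" to the same node as iterator.
    public LinkedListIterator(LinkedListIterator iterator)
    {
      current = iterator.current;
    }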

    public boolean isLast()
    {
      return current == null;
    }

    public void next()
    {
      if(!isLast())
        current = current.next;
    }
    
    public Object getKey()
    {
      if(isLast())
        return null;
      else
        return current.key;
    }
  }

It is intentional that LinkedListIterator and all its methods are public, thus being accessible everywhere, whereas Node and current have no access modifier, thus being accessible only from within the package. Because current is not public, the second constructor is added, which allows one to create a new iterator "pointing" to the same node as an existing one. The class LinkedList can now be built on top of the iterator. Using the class LinkedListIterator as an intermediary gives a clear separation between the concepts of a list and a position in it.
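How a list class might be built on top of the iterator is sketched below. The class SimpleList with its methods addFirst, iterator, length and contains is a hypothetical illustration and not the LinkedList class of this lecture; a condensed copy of the node and iterator classes is repeated only to make the fragment self-contained.

```java
class Node
{
  Object key;
  Node   next;

  Node(Object key, Node next)
  {
    this.key  = key;
    this.next = next;
  }
}

class LinkedListIterator
{
  Node current;

  LinkedListIterator(Node current)
  {
    this.current = current;
  }

  boolean isLast()
  {
    return current == null;
  }

  void next()
  {
    if (!isLast())
      current = current.next;
  }

  Object getKey()
  {
    return isLast() ? null : current.key;
  }
}

class SimpleList
{
  private Node first = null;

  void addFirst(Object key)
  {
    first = new Node(key, first);
  }

  // Every traversal starts from a fresh iterator at the first node.
  LinkedListIterator iterator()
  {
    return new LinkedListIterator(first);
  }

  int length()
  {
    int n = 0;
    for (LinkedListIterator it = iterator(); !it.isLast(); it.next())
      n++;
    return n;
  }

  // Assumes key != null.
  boolean contains(Object key)
  {
    for (LinkedListIterator it = iterator(); !it.isLast(); it.next())
      if (key.equals(it.getKey()))
        return true;
    return false;
  }
}
```

The list itself only stores the entry point first; all walking through the structure is delegated to the iterator.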

Alternative Structures

There are other similar and interesting structures. The most important is the doubly linked list. A doubly linked list has "pointers" between the nodes in both directions, so that one can also go backwards. This facilitates many operations, but of course also increases the cost of insertions and deletions and the amount of memory required.
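A minimal sketch of the relinking in a doubly linked list is given below. The names DNode, DoublyLinkedDemo, insertAfter, remove, forward and backward are hypothetical. Note that an insertion now changes up to four references instead of two, which is the extra cost mentioned above.

```java
class DNode
{
  int   key;
  DNode prev;
  DNode next;

  DNode(int key)
  {
    this.key = key;
  }
}

class DoublyLinkedDemo
{
  // Inserts a new node with key x directly after node p.
  static DNode insertAfter(DNode p, int x)
  {
    DNode q = new DNode(x);
    q.prev = p;
    q.next = p.next;
    if (p.next != null)
      p.next.prev = q;
    p.next = q;
    return q;
  }

  // Unlinks node q itself; thanks to prev we do not need to search
  // for its predecessor.
  static void remove(DNode q)
  {
    if (q.prev != null)
      q.prev.next = q.next;
    if (q.next != null)
      q.next.prev = q.prev;
    q.prev = null;
    q.next = null;
  }

  static String forward(DNode first)
  {
    StringBuilder sb = new StringBuilder();
    for (DNode c = first; c != null; c = c.next)
      sb.append(c == first ? "" : " ").append(c.key);
    return sb.toString();
  }

  static String backward(DNode last)
  {
    StringBuilder sb = new StringBuilder();
    for (DNode c = last; c != null; c = c.prev)
      sb.append(c == last ? "" : " ").append(c.key);
    return sb.toString();
  }
}
```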

Stacks

Stacks are another elementary abstract data type. We want to be able to perform the following operations. Push and pop together ensure that the elements that entered latest leave the structure first. This discipline is known as last-in-first-out, LIFO.

The requirements are more limited than for lists, and therefore we may expect that all operations can be performed efficiently. In this context, that means in O(1) time. Stacks are of great importance in many contexts, particularly also in the computer itself: a procedure call results in pushing the current state of the calling routine on the stack, so that execution can be resumed there at a later stage after restoring the state.

Implementation

Stacks can be implemented with linked lists, of course, and this is the most elegant approach. There is also a tremendously simple implementation with arrays. And even though it is not so appealing, it is so simple and efficient that it still may have its value in a subroutine that is crucial for the performance.
  public class Stack
  {

    // Invariant: at all times after completion of the routines,
    // numberOfElements always gives the current number of elements.
  
    private int[] a;
    private int   size;
    private int   numberOfElements;
  
    public Stack(int size)
    {
      this.size = size;
      a = new int[size];
      numberOfElements = 0;
    }
  
    public boolean isEmpty()
    {
      return numberOfElements == 0;
    }
  
    public boolean isFull()
    {
      return numberOfElements == size;
    }
  
    public void push(int x)
    {
      if (! isFull())
      {
        a[numberOfElements] = x;
        numberOfElements++;
      }
      else
        System.out.println("Stack full, ignoring addition!");
    }
  
    public int pop()
    {
      if (! isEmpty())
      {
        numberOfElements--;
        return a[numberOfElements];
      }
      else
      {
        System.out.println("Stack empty, returning 0!");
        return 0; 
      }
    }
  }

The Basic Stack Operations

Dynamizing the Size of the Array

Because initially we often do not know how big a stack might be, we somehow want to make this approach dynamic. Because we do not want to waste too much memory, we cannot just declare int a[size_of_universe]. Therefore, we apply a common trick: if the number of elements on the stack exceeds the size of the array n, then n is multiplied by a factor x, mostly x == 2. If the number of elements on the stack becomes smaller than n / x^2, then n is reduced by a factor x. This technique assures that on the average (amortized is the official word) the operations can be performed in O(1) time, and at the same time that we never waste more than a constant factor of the memory.

We work out the details for x = 2. If we have just started a fresh structure, then we may assume the size of the stack, n, to be half of the size of the array, 2 * n. That is, if we are growing beyond the size of the array, then we have performed at least n pushes to the stack. Therefore, creating a new array of double size and copying all 2 * n elements of the stack into it costs at most 2 copy operations per performed push operation. If the stack is shrinking, then we create a new array of size n and have to copy the remaining n / 2 elements of the stack into it. It takes at least n / 2 pops before we have to do this. Thus, amortizing these costs adds the cost of one copy for every performed pop.
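
The doubling and halving rule can be sketched as follows for x = 2. This is only a minimal illustration, not the program referred to in the text: the class name DynamicStack, the counter copiedElements and the choice never to shrink below size 2 are assumptions made for this example.

```java
// Sketch of an array-based stack with doubling and halving (x = 2).
// The class name and the copy counter are illustrative assumptions.
class DynamicStack {
    private int[] a = new int[2];      // current array, initial size 2
    private int numberOfElements = 0;  // current number of elements
    private long copiedElements = 0;   // total number of element copies so far

    boolean isEmpty() { return numberOfElements == 0; }

    void push(int x) {
        if (numberOfElements == a.length)   // array full: double it
            resize(2 * a.length);
        a[numberOfElements++] = x;
    }

    int pop() {
        if (isEmpty())
            throw new IllegalStateException("Stack empty");
        int x = a[--numberOfElements];
        // Less than a quarter in use: halve, but never below size 2.
        if (a.length > 2 && numberOfElements < a.length / 4)
            resize(a.length / 2);
        return x;
    }

    // Move the stack to a new array of the given size.
    private void resize(int newSize) {
        int[] b = new int[newSize];
        for (int i = 0; i < numberOfElements; i++)
            b[i] = a[i];
        copiedElements += numberOfElements;
        a = b;
    }

    int capacity() { return a.length; }
    long copiedElements() { return copiedElements; }
}
```

Comparing copiedElements() with the number of performed operations gives a direct check of the amortized bound derived above.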

These ideas require only minor modifications to push and pop. For x = 2, this has been implemented in a program, which can be downloaded here. In this program the effectiveness of this strategy is tested by performing many pushes and pops on an initially empty stack of size 2. More precisely, n steps are performed. At most 2 pushes are performed, each of these pushes is performed with 50% probability. Then a single pop is performed. So, the expected number of pushes equals the number of pops, but because pops are occasionally performed on an empty stack, the expected average stack size nevertheless grows with n. Running a test for n = 10^9 gives:

  Number of pushes          = 1000029574
  Number of pops            = 1000000000
  Number of pops on empty   = 2647
  Number of copied elements = 201196
  Maximum number on stack   = 37315
  Average number on stack   = 15456
  Final   number on stack   = 32221
In this case the maximum size is not so much larger than the average size. A more important observation is that the number of copied elements is negligible in comparison with the number of operations. In other words, the close-to-optimal size of the array is obtained at negligible additional cost. This program may be understood as an abstraction of some clerk doing work at a constant pace. In principle he is fast enough for the work supplied, but nevertheless, in the course of the years, the pile of work on his desk tends to grow.
An array-based implementation of a stack combines simplicity and efficiency. Dynamically adapting the size of the array to the number of elements on the stack, the memory usage can be kept to within a constant factor from the optimum, while assuring O(1) amortized time per operation both for pushes and for pops.

Achieving O(1) Worst-Case Time

The above idea gives us optimal O(1) amortized time per operation. Nevertheless, we are not entirely satisfied: the individual operations may take very long, O(n). If we are going to rebuild the whole stack when we want to perform a push, then our customer may be gone by the time we have entered his/her request! Even for this problem there is a solution (at the expense of slightly higher costs per operation). The idea is to work with zones: the green-zone, that is where we start after a reconstruction; two yellow-zones, indicating that we are getting nervous about the development of the number of elements on the stack; and the red-zone, which is forbidden territory. Once the number of elements on the stack enters a yellow zone, we start to build a new stack embedded in a smaller or larger array. This is done in the background, delaying our operations by a small factor. The speed of this rebuilding is chosen such that, if we progress towards a red-zone, the new structure is just ready by the time we arrive there.

We work out the idea outlined above. In practice, depending on the distribution of pops and pushes, it may be best to work with green and yellow zones, but the easiest is to start building new structures immediately. As soon as we start working with the new array of size 2 * n, we create two new arrays: A_s of size n and A_l of size 4 * n. Then, whenever the size of the stack reaches a new maximum value n + x, we copy two more elements from the stack to A_l (the elements with indices 2 * x - 2 and 2 * x - 1). Whenever we reach a new minimum value n - x, we copy one element to A_s (the element with index x - 1). In this way, the copying has just been completed when the stack has 2 * n or n / 2 elements.
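
The growing direction of this scheme might be sketched as follows. The class IncrementalStack and its field names are hypothetical, the shrinking direction with A_s is left out, and two points are assumptions of this sketch rather than statements from the text: a push that overwrites a position already copied is also written to the larger array, and the time Java needs to allocate (and zero) a new array is ignored.

```java
// Sketch: growing with at most two copy steps per push (growth only).
// Class and field names are illustrative; shrinking (A_s) is omitted.
class IncrementalStack {
    private int[] cur;       // array currently in use, size 2 * n
    private int[] large;     // A_l of size 4 * n, filled in the background
    private int count = 0;   // number of elements on the stack
    private int copied = 0;  // cur[0 .. copied-1] is already present in large

    IncrementalStack(int n) {   // assumes n >= 1
        cur = new int[2 * n];
        large = new int[4 * n];
    }

    void push(int x) {
        if (count == cur.length) {   // copying is complete: switch arrays
            cur = large;
            large = new int[2 * cur.length];
            copied = 0;
        }
        cur[count] = x;
        if (count < copied)          // assumption: keep the copied prefix up to date
            large[count] = x;
        count++;
        // With count = n + x, the first 2 * x elements must be in large.
        int target = 2 * (count - cur.length / 2);
        while (copied < target) {    // at most two iterations per push
            large[copied] = cur[copied];
            copied++;
        }
    }

    int pop() {
        if (count == 0)
            throw new IllegalStateException("Stack empty");
        return cur[--count];
    }

    int size() { return count; }
}
```

When the current array is full, the larger array already holds all elements, so the switch itself requires no copying.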

Because of the very special (simple) properties of the insertions and deletions to a stack, all this is rather easy. For other data structures, one can often apply the same idea, but not so easily. For a list, to which insertions can be performed anywhere, one must be more careful, and updates may also have to be made to the half-ready structure. Actually, for stacks the cost for creating new smaller structures can be saved altogether, when we do not throw away the previously used smaller arrays: the contents of these are still valid.

Starting the construction of the new larger array before it is actually needed, the memory usage can be kept to within a constant factor from the optimum, while assuring O(1) worst-case time per operation both for pushes and for pops.

Application

A simple and very useful application is testing whether all (different kinds of) brackets are matching. For example, the expression (<{[][{}]()}>{()}<(<>[({{}[]})]{<>})>) has a correct bracketing structure. The test is performed by traversing the file that must be tested. If an opening bracket is encountered, then it (or a corresponding number) is pushed. If a closing bracket is encountered, then the top element is popped from the stack and compared with the current closing bracket. Error conditions are: a closing bracket is encountered while the stack is empty; the popped bracket does not correspond to the current closing bracket; the stack is not empty when the end of the file is reached.
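
The test described above can be sketched as follows. The class name BracketChecker and the method isBalanced are chosen for this example only. As an assumption, the input is a string rather than a file, all characters that are not brackets are ignored, and java.util.ArrayDeque serves as the stack.

```java
import java.util.ArrayDeque;

// Sketch of a bracket test with a stack; names are illustrative.
class BracketChecker {
    // Returns true if all brackets in s are correctly matched.
    static boolean isBalanced(String s) {
        ArrayDeque<Character> stack = new ArrayDeque<>();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '(': case '[': case '{': case '<':
                    stack.push(c);
                    break;
                case ')': case ']': case '}': case '>':
                    if (stack.isEmpty())            // nothing left to close
                        return false;
                    if (stack.pop() != opening(c))  // wrong kind of bracket
                        return false;
                    break;
                default:                            // other characters are ignored
                    break;
            }
        }
        return stack.isEmpty();                     // false if brackets remain open
    }

    // The opening bracket that belongs to a closing bracket.
    private static char opening(char closing) {
        switch (closing) {
            case ')': return '(';
            case ']': return '[';
            case '}': return '{';
            default:  return '<';
        }
    }
}
```

A call such as BracketChecker.isBalanced("([)]") returns false, because the bracket popped for ")" is "[".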

Queues

The third linear ADT we consider is the queue. We must come up with implementations that support the following operations: Together, enqueue and dequeue ensure that the elements that entered first leave the structure first: first-in-first-out, FIFO.

Linear List Implementation

The most appealing and natural implementation of queues, more so than for stacks, is the linear list: keeping track of the first and the last list node, both enqueues and dequeues can be implemented in O(1).

Implementation of Queues with Linked Lists
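
A minimal sketch of such a list-based queue is given here. The names ListQueue, first and last are chosen for this example. As an assumption, a dequeue on an empty queue throws an exception instead of printing a message.

```java
// Sketch of a queue on top of a singly linked list; names are illustrative.
class ListQueue {
    private static class Node {
        int value;
        Node next;
        Node(int value) { this.value = value; }
    }

    private Node first = null;  // node to be dequeued next
    private Node last = null;   // most recently enqueued node

    boolean isEmpty() { return first == null; }

    // Append a new node behind the last one: O(1).
    void enqueue(int x) {
        Node node = new Node(x);
        if (isEmpty())
            first = node;
        else
            last.next = node;
        last = node;
    }

    // Remove the first node: O(1).
    int dequeue() {
        if (isEmpty())
            throw new IllegalStateException("Queue empty");
        int x = first.value;
        first = first.next;
        if (first == null)      // the queue became empty
            last = null;
        return x;
    }
}
```

Both operations touch only the first or the last node, which is why no traversal of the list is needed.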

Array Implementation

Alternatively, there is a trivial and stupid implementation of queues (which is nevertheless acceptable if the size of the queue is small, say 20): we have an array and maintain as invariant property that at all times the element to be dequeued, the head of the queue, stands in position 0. This is achieved as follows: when dequeuing, all entries are shifted one position; when enqueuing, the new entry is written at the first free position. Doing this for a queue with n elements, enqueues take O(1) time, while dequeues take O(n) time.

It is not hard to achieve O(1) for both operations. There are variables head and tail. Head indicates the position of the array where the next element can be dequeued. Tail indicates where the next element can be enqueued. Initially we set head = 0 and tail = 0. The following operations then correctly perform the queue operations as long as the array is long enough:

  class Queue
  {

    // Invariant: at all times after completion of the routines,
    // head indicates position to dequeue, tail position to enqueue
  
    protected int a[];
    protected int size;
    protected int head;
    protected int tail;
  
    public Queue(int s)
    {
      size = s;
      a = new int[size];
      head = 0;
      tail = 0;
    }

    public boolean isEmpty()
    {
      return tail == head;
    }

    public boolean isFull()
    {
      return tail == size;
    }

    public void enqueue(int x)
    {
      if (!isFull())
      {
        a[tail] = x;
        tail++;
      }
      else
        System.out.println("No space left, ignoring!");
    }

    public int dequeue()
    {
      if (!isEmpty()) 
      {
        int x = a[head];
        head++;
        return x;
      }
      else
      {
        System.out.println("Attempt to dequeue from empty queue!");
        return 0; 
      }
    }
  }

The problem with this approach is that if we are performing many operations, then we will run out of the array, even when the maximum number of elements enqueued at the same time is small. There is a much better construction with arrays, which is hardly more complicated and still guarantees that all operations (as long as the size of the array is sufficiently large to store all elements) can be performed in O(1): head and tail are computed modulo the length n of the array a[]. There are three situations to distinguish:

So, as a result of enqueues and dequeues, the queue wanders through the array in a circular way. In Java this better implementation can easily be realized by creating a subclass and overriding some of the old methods, giving them improved functionality:
  class BetterQueue extends Queue
  {
  
    protected boolean isFull;
  
    public BetterQueue(int s)
    {
      super(s);
      isFull = false;
    }
  
    public boolean isEmpty()
    {
      return (head == tail) && !isFull;
    }
  
    public boolean isFull()
    {
      return isFull;
    }
  
    public void enqueue(int x)
    {
      if (!isFull()) 
      {
        a[tail] = x;
        tail++;
        if (tail == size)
          tail = 0;
        isFull = tail == head;
      }
      else
        System.out.println("No space left, ignoring!");
    }
  
    public int dequeue()
    {
      if (!isEmpty()) 
      {
        int x = a[head];
        head++;
        if (head == size)
          head = 0;
        isFull = false;
        return x;
      }
      else
      {
        System.out.println("Attempt to dequeue from empty queue!");
        return 0; 
      }
    }
  }

The Basic Queue Operations

Exercises

  1. Consider an ADT supporting the following operations: Describe an implementation based on linked lists and one based on arrays supporting all these operations in O(1) time.

  2. Consider an ADT supporting the following operations: The structure should be so that it is somehow circular: previous and next are executable for any non-empty structure. Give a detailed description of an implementation based on linked lists in which all these operations can be performed in O(1) time.

  3. Suppose we want to test the correctness of bracket structures involving "()", "{}", "<>" and "[]". It was indicated above how one can use a stack for this. For the stack we want to use an array of fixed size. How long should this array be for the string (<{[][{}]()}>{()}<(<>[({{}[]})]{<>})>)? Write a piece of C-like pseudocode for determining the minimum necessary size of the array for testing a string given as a character array s[] of length n.

  4. Give a sequence of n pushes and n pops for which the array-based stack implementation with dynamic array size has to perform particularly many copying operations. Estimate this number of copying operations as a function of n.

  5. Consider again the program which implements a stack in a dynamically expanding and shrinking array. If the expected number of pushes exceeds the expected number of pops, then the size of the stack will grow without limit. Even if both numbers are equal, the average size gradually increases. If the expected number of pops exceeds the expected number of pushes, the average stack size remains constant. We want to study this in more detail.

    Modify the program so that in every round exactly one pop is performed, while two pushes are performed with probability p each, 0 <= p <= 1/2. For p = 0.45, 0.40, 0.30, determine the average stack size when performing 10^7, 10^8 and 10^9 rounds. Are the results in line with the claimed constant average size? Give the average size as a function of p. Depending on the context, the maximum stack size may be more important than the average stack size. Also measure these. If necessary you should repeat the experiments to get more stable results. Is even the maximum size independent of the number of rounds?

    These experiments are important because of the following. For the analogous dynamized implementation of a queue the average and maximum sizes are the same. The queue process can be interpreted as customers arriving at the checkout of a supermarket being serviced one at a time in order of arrival. The attempts to dequeue from an empty queue can be viewed as "idle time". From the point of view of the shopkeeper, this is wasteful. On the other hand, if waiting times become too long, customers may prefer another shop next time. Customers are probably most sensitive to long waiting times. How much over-capacity must a shop offer in order to assure that less than one percent of the customers enter the queue when five or more other customers are waiting already?





Dictionaries

Trees

Together with arrays and linked lists, trees are the most important data structure. With good organization they can be used to implement the dictionary ADT, supporting the operations find, insert and delete so that all operations can be performed in logarithmic time. As these operations are the foundation of any dynamic data structure, this is of utmost importance.

Definitions

It is common to use terminology from family relations. In addition some tree-like terminology is used. A tree T is a directed acyclic connected graph. By induction it can be shown that a tree with n nodes has exactly n - 1 edges. A set of trees is called a forest. A forest consisting of k trees with n nodes in total has n - k edges. The tree concept is a general notion from graph theory. The trees considered in this chapter are rooted trees. That is, there is a specified node called root and all edges are normally directed away from this root node r. For rooted trees the following terminology is commonly used: When drawing trees, the root is typically drawn at the top and the leafs at the bottom! The notions of height and depth are given with respect to this convention. In general there are no restrictions on the degree of the nodes. A tree with n nodes may consist of a root directly connected to n - 1 leafs. In this case the root has degree n - 1. The depth of this tree is 1. The other extreme is that all nodes are linked together as a chain. In this case the nodes have degree at most 1. The depth of this tree is n - 1. A tree of degree at most 2 is called a binary tree, a tree of degree at most 3 is called a ternary tree.

Tree Definitions

Binary trees can be generated starting from trees with zero or one nodes in a recursive way. Generally, a tree is given by one of the following structures:

This characterization is very important because it leads to simple recursive algorithms for most common problems on trees.

In addition to the edges directed away from the root, other edges may be added in order to facilitate certain operations. It may be handy to add links to the parent nodes. Because every node has at most one parent, this does not cost much. If the leafs of T are linked together, it becomes easy to walk along the bottom of the tree. If, as in most binary-tree implementations the nodes have storage for links to two children, this can be implemented without using additional memory.

The following gives a basic Java class for the nodes of a binary tree:

  class TreeNode
  {
    int key;
    TreeNode leftChild;
    TreeNode rightChild;
  
    TreeNode(int key)
    {
      this.key   = key;
      leftChild  = null;
      rightChild = null;
    }
  }
A ternary tree may be realized by adding an instance variable TreeNode middleChild. This may also require a second key value. If the degree of the tree is bounded by some fixed value k, but not necessarily a small constant, then it is convenient to work with an array of length k containing the possible children. Because nodes like this require O(k) storage this should only be applied if all internal nodes have degree k or some value close to it. For the leafs another type of nodes should be used. If the degree is unbounded or if the tree is very irregular, then the children should be maintained in some list data structure.

Notice that any tree in which all nodes that are not leafs have degree 2 has at least n / 2 leafs. More precisely, such a tree has (n + 1) / 2 leafs. The proof goes by induction. The claim is true for a tree with one leaf. Now consider a tree T with at least one internal node. Let u be one of the deepest lying internal nodes. Because of this choice, both children v and w of u are leafs. Let T' be the tree obtained from T by replacing u and its two children by a single leaf node u'. T' is again a tree satisfying the condition that all non-leafs have degree 2, having n - 2 nodes. So, it has ((n - 2) + 1) / 2 = (n + 1) / 2 - 1 leafs because of the induction hypothesis. Thus, T has (n + 1) / 2 leafs, because it has v and w as leafs instead of u'.

Trees will be used to store information in the nodes, that can be found with help of keys that are used for guiding us on a path down from the root to the information we are looking for. The fact that more than half of the nodes are leafs implies, that if for some reason we prefer to only store information in the leafs, and not in the internal nodes (this may make updates easier), then we can do so at the expense of at most doubling the size of the tree.

In applications, a node may stand for a personal record which contains much more information than a single integer. A person has a first and a second name, a gender, an age, an address, a personal number, etc. This can be handled in two rather different ways:

The first approach may be most natural, but does not give a separation between the tree structure and the managed data. Therefore the second solution is more elegant. The choice also has a strong impact on the memory management. With the first solution, the tree nodes become much larger, implying that the tree takes much more space. This may have a negative impact on the time for traversing the tree. Another disadvantage is that in some operations not the nodes of a tree are rearranged, but rather the information contained in them. With the second solution the cost of such an operation is independent of the amount of additional data. On the other hand, the first solution has the advantage that all information is readily at hand. The second solution implies another indirection for accessing the data. Therefore it will mostly be better to maintain the key information in TreeNode, because the keys are accessed much more frequently than the other information.

Binary Trees for Searching

How can a binary tree be used for building a simple database? Assume that we have somehow built it so that if we are looking for a key x, we can find it by going down the tree as follows: if key > x, then go left, if key == x, then found, if key < x, then go right. If the required branch does not exist, then we conclude that the object we are looking for is not there. For such a search to be correct, the tree should be a search tree in the sense of the following definition. A binary search tree is a binary tree satisfying the search-tree property. A binary tree has the search-tree property if every node has the search-tree property. A node with key y has the search-tree property if all (if any) key values in its left subtree are smaller than y and all (if any) key values in its right subtree are larger than y.

So, on a binary search tree (hereafter the word "binary" will be omitted, as we will mainly consider binary trees), testing whether a certain key x occurs in the set of stored keys can be performed as follows:

  boolean find(int x)
  // Test whether there is a node with key x.
  {
    if      (x < key)
      if (leftChild  != null) // go left
        return leftChild.find(x);
      else
        return false;
    else if (x > key)
      if (rightChild != null) // go right
        return rightChild.find(x);
      else
        return false;
    else // key == x
      return true;
  }
This method is called from outside with root.find(x) (assuming root != null). It takes time proportional to the number of traversed links. If there is a node with key x, this equals the depth of x. In general, the running time of find is bounded by O(depth), where depth gives the depth of the tree.

The above method find() is correct and easy to understand, but it can be formulated much more concisely and slightly faster as follows:

  boolean find(int x)
  // Test whether there is a node with key x.
  {
    if (x < key)
      return leftChild  != null && leftChild.find(x);
    if (x > key)
      return rightChild != null && rightChild.find(x);
    return true;
  }
Here we are explicitly using that boolean expressions are evaluated lazily: once it is known that leftChild == null, the second part of the expression is not evaluated.

Finding in a Binary Search Tree

Depth of Binary Trees

From the above we see that the crucial cost parameter is the depth of the tree, that is, the number of links for reaching the deepest leaf when starting from the root. In this section we study some properties of the depth, and consider how to compute it.

In a binary tree each node has at most two children. This immediately gives a simple recurrence relation for the maximum number n_max(k) of nodes in a binary tree of depth k:

n_max(0) = 1,
n_max(k) = 1 + 2 * n_max(k - 1), for all k > 0.
Trying a few values gives n_max(0) = 1, n_max(1) = 3, n_max(2) = 7, n_max(3) = 15. This should suffice to formulate the hypothesis that generally n_max(k) = 2^{k + 1} - 1. This can easily be checked using induction. More generally, for g-ary trees, we have
n_max(g, 0) = 1,
n_max(g, k) = 1 + g * n_max(g, k - 1), for all k > 0.
Working this out gives n_max(g, 0) = 1, n_max(g, 1) = 1 + g, n_max(g, 2) = 1 + g + g^2 and, for g >= 2, n_max(g, k) = sum_{i = 0}^k g^i = (g^{k + 1} - 1) / (g - 1). Notice that n_max(g, k) >= g^k for all g and k, which is easy to remember. So, the number of nodes may be exponential in the depth of the tree.
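
As a small check, the recurrence and the closed form can be compared numerically. The class MaxNodes and its method names are made up for this example, and the closed form is only evaluated for g >= 2.

```java
// Sketch comparing the recurrence for n_max(g, k) with its closed form.
class MaxNodes {
    // n_max(g, k) computed directly from the recurrence.
    static long byRecurrence(int g, int k) {
        return k == 0 ? 1 : 1 + g * byRecurrence(g, k - 1);
    }

    // n_max(g, k) = (g^{k + 1} - 1) / (g - 1), valid for g >= 2.
    static long byClosedForm(int g, int k) {
        long power = 1;
        for (int i = 0; i <= k; i++)
            power *= g;                 // after the loop: power = g^{k + 1}
        return (power - 1) / (g - 1);
    }
}
```

For g = 2 both methods reproduce the values 1, 3, 7, 15 listed above for k = 0, 1, 2, 3.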

A level in a tree is constituted by all nodes at the same distance from the root. Level k of a binary tree T is said to be full if there are 2^k nodes at level k of T. A binary tree is called perfect if all levels are full except possibly the deepest one. A kind of reversal of the above analysis of n_max(k) is given by:

Lemma: A perfect binary tree with n nodes has depth equal to round_down(log_2 n).

Proof: A perfect binary tree with 2^k - 1 nodes has depth k - 1: it consists of k full levels. Thus, the perfect binary tree with 2^k nodes must have depth k. At this new level, 2^k nodes can be accommodated. So, any perfect binary tree with n nodes, for 2^k <= n < 2^{k + 1} has depth k. For all these n, the depth is given by k = round_down(log_2 n). End.

Small Perfect Trees

The depth of trees ranges from logarithmic to linear. Therefore, bounding the time of operations in the depth only makes sense if at the same time it is assured that the depth remains bounded.

The depth of all nodes can be computed by a simple recursive procedure, exploiting the structural characterization of a tree (empty, a node or a node which is the root of two trees):

  void computeDepth(int depth) 
  {
    this.depth = depth;
    if (leftChild  != null)
      leftChild.computeDepth(depth + 1);
    if (rightChild != null)
      rightChild.computeDepth(depth + 1);
  }
This method assumes that the class TreeNode has an additional instance variable int depth. It is called from outside with root.computeDepth(0) (assuming root != null). Because a single execution of the body of this method takes O(1) and because it is called once for every node, computeDepth() takes time linear in the number of nodes. This can also be proven in a more formal way:

Lemma: computeDepth() takes O(n) for a tree with n nodes.

Proof: The time consumption T(n) for applying computeDepth() to a tree with n nodes is given by

T(0) = 0, T(1) <= c, T(n) <= c + T(n_left) + T(n_right).
Here n_left and n_right denote the sizes of the left and right subtree of the root, respectively, so n_left + n_right = n - 1. The solution is T(n) <= c * n. This can be proven by induction. For n = 0 and n = 1, this is correct. Now assume the claim has been proven for all n' < n, then we get T(n) <= c + T(n_left) + T(n_right) <= c + c * n_left + c * n_right = c * (n_left + n_right + 1) = c * n. End.

Enumerating all Keys

Again exploiting the structural characterization of a tree, it is easy to enumerate all stored keys. For example as follows:
  void preorder()
  // Print all nodes in preorder.
  {
    System.out.println("Key = " + key);
    if (leftChild  != null)
      leftChild.preorder();
    if (rightChild != null)
      rightChild.preorder();
  }
This method is called from outside with root.preorder() (assuming root != null). Just as computeDepth() it takes O(n) time for a tree with n nodes. Instead of the print statement any other instruction might be executed with the data of this node. The method performs a so-called preorder traversal: the operation is performed before the recursive calls. If the operation is performed between the two recursive calls, it is called an inorder traversal; if the operation is performed after the second recursive call, it is called a postorder traversal.

Of these three traversals the inorder traversal is the most interesting if the key values are printed. Because of the definition of a search tree, this results in all key values being printed in sorted order. The correctness of this claim can be proven by structural induction. For any tree, we may assume the keys of the nodes in the left subtree of the root are printed in order. All these values are smaller than the key of the root, which is printed after them. This root key is printed before any of the keys of the nodes in the right subtree, which are printed after it, and which may be assumed to be printed in sorted order.
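
The claim about the inorder traversal can be tried out on a small example. The class InorderDemo, its nested class Node and the example tree are made up for this sketch. As an assumption, the keys are collected in a list rather than printed, and null subtrees are handled inside the method.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: an inorder traversal of a search tree yields the keys in sorted order.
class InorderDemo {
    static class Node {
        int key;
        Node leftChild, rightChild;
        Node(int key, Node leftChild, Node rightChild) {
            this.key = key;
            this.leftChild = leftChild;
            this.rightChild = rightChild;
        }
    }

    // Append the keys of the subtree rooted at node to out, in inorder.
    static void inorder(Node node, List<Integer> out) {
        if (node == null)
            return;
        inorder(node.leftChild, out);   // all smaller keys first
        out.add(node.key);              // then the key of this node
        inorder(node.rightChild, out);  // then all larger keys
    }

    // A small search tree with root key 8.
    static Node example() {
        return new Node(8,
            new Node(3, new Node(1, null, null), new Node(6, null, null)),
            new Node(10, null, new Node(14, null, null)));
    }

    public static void main(String[] args) {
        List<Integer> keys = new ArrayList<>();
        inorder(example(), keys);
        System.out.println(keys);
    }
}
```

Moving the line with out.add before or after both recursive calls turns this into a preorder or postorder traversal, respectively.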

Tree Traversals

The traversal can also be performed in an iterative way using a stack: before continuing with the left child, the current node is pushed; when getting stuck the top element is popped from the stack and we continue at its right child. This idea can be worked out as follows:

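  static void preorder(TreeNode current) 
  // Print all nodes in preorder.
  {
    // A stack of TreeNode; java.util.ArrayDeque is used here as a stand-in.
    java.util.ArrayDeque<TreeNode> stack = new java.util.ArrayDeque<TreeNode>();
    while (current != null)
    {
      System.out.println("Key = " + current.key);
      stack.push(current);
      current = current.leftChild;
      // Stuck: resume at the right child of the most recently pushed node.
      while (current == null && !stack.isEmpty())
        current = stack.pop().rightChild;
    }
  }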
This is less elegant but may save time and storage. In order to save time the stack management should be integrated into the method. Memory is saved because for each node on the path only a single pointer is pushed on the stack. Very deep recursions, such as they may arise when applying the recursive algorithm to a very deep tree, are problematic because they may lead to an overflow of the internally managed recursion stack. The iterative algorithm does not share this problem because a user-defined stack can be as large as the available memory.

Constructing a Binary Search Tree

So far we assumed the tree to work on was given. Later we will see how to insert nodes into a tree, and this allows us to build a tree for a specified set of keys by inserting nodes with these keys one-by-one. This is a general idea: if a data structure supports insertions, then such a data structure can be built by repeated insertion. However, this need not be the most efficient way of constructing such a data structure, and it does not necessarily give the most balanced construction.

A binary search tree can be built in a direct way by first sorting the key values then selecting the middle key for the root r and building recursively trees for the sets of smaller and larger values, which are attached to r as left and right child, respectively. For a set of keys stored in an array a[] of length n, this can be worked out as follows:

  static TreeNode build(TreeNode[] a, int l, int h)
  // Build a tree from the nodes in the subarray a[l] ... a[h].
  {
    if (h <  l)
      return null;
    int m = (l + h + 1) / 2;
    a[m].leftChild  = build(a, l, m - 1);
    a[m].rightChild = build(a, m + 1, h);
    return a[m];
  }

  static TreeNode build(TreeNode[] a, int n) 
  {
    Sort.sort(a, n);
    return build(a, 0, n - 1);
  }
How long does this take? First we are sorting. For general instances, this takes O(n * log n), but we should not forget that it sometimes can be performed faster. Then we have a recursive procedure for which the time consumption can be written as follows:
T(0) <= c
T(n) <= c + T(m) + T(n - m - 1), for all n > 0.
The easiest way to prove that the time consumption is linear is by considering the development of the times bottom-up: T(0) <= c, T(1) <= 3 * c, T(3) <= 7 * c, T(7) <= 15 * c. Generally: T(n) <= (2 * n + 1) * c, for all n = 2^k - 1, for some k >= 0. This can be proven by induction: T(0) <= c = (2 * 0 + 1) * c. Assume the relation holds for all k' < k, then T(n) = T(2^k - 1) <= c + 2 * T(2^{k - 1} - 1) <= c + 2 * (2^k - 1) * c = (2^{k + 1} - 1) * c = (2 * n + 1) * c, as it should be. So, we need T_sort + O(n) in total. The constructed tree is a perfect tree, but at the lowest level the nodes are not "right-aligned".

Building a Perfect Binary Tree
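
The following sketch applies the same construction to a sorted array of integer keys and measures the depth of the result. The class PerfectTreeDemo, its nested class Node and the method depth are assumptions for this example. Unlike the method above, it creates new nodes instead of linking existing ones, and it expects the keys to be sorted already.

```java
// Sketch: build a search tree from sorted keys and compute its depth.
class PerfectTreeDemo {
    static class Node {
        int key;
        Node leftChild, rightChild;
        Node(int key) { this.key = key; }
    }

    // Build a search tree from the sorted keys a[l] ... a[h].
    static Node build(int[] a, int l, int h) {
        if (h < l)
            return null;
        int m = (l + h + 1) / 2;              // middle key becomes the root
        Node root = new Node(a[m]);
        root.leftChild = build(a, l, m - 1);  // smaller keys
        root.rightChild = build(a, m + 1, h); // larger keys
        return root;
    }

    // Depth counted in links; by convention here the empty tree has depth -1.
    static int depth(Node node) {
        if (node == null)
            return -1;
        return 1 + Math.max(depth(node.leftChild), depth(node.rightChild));
    }
}
```

For a sorted array a of length n, depth(build(a, 0, n - 1)) can be compared with round_down(log_2 n) from the lemma on perfect trees.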

We claim that build has the optimal time consumption. For instances for which T_sort(n) = O(n), this is clear: in that case T_build(n) = O(n) and building a tree with n nodes takes at least linear time. So, assume T_sort(n) = omega(n). Above we have seen that, given a binary search tree with n keys, an inorder traversal can be used to output all keys in sorted order in O(n) time. Thus, denoting by T_build(a, n) the minimal time for constructing a binary search tree for the n keys in a[], we have

T_build(a, n) + c * n >= T_sort(a, n).
Thus, T_build(a, n) >= T_sort(a, n) - c * n. If T_sort(a, n) = omega(n), then T_sort(a, n) - c * n >= T_sort(a, n) / 2 for sufficiently large n, showing that T_build(a, n) = Omega(T_sort(a, n)). So, the optimality is of the strongest kind: for every input the time lies within a constant factor from the optimum for that input.

Dictionary ADT

Definition

For sets of objects with integer keys attached to them the dictionary ADT is the ADT supporting the following operations: More generally the keys may be comparable in the sense that any two values x and y can be ordered: either x < y, or x == y or x > y. The type of the keys has no impact on the algorithm, and therefore we will continue to assume the keys are integers.

Using arrays it is no problem to either perform insert in O(1) and find in O(n), or, when keeping the array sorted at all times, to do the find in O(log n) and the insert in O(n). Trees are the key to greater efficiency. Let us first consider how these operations can be implemented, without paying too much attention to the efficiency.

Realization with Static Tree

When can we construct a perfect binary tree for searching purposes? Whenever we have a fixed or almost fixed database on which we are going to perform many finds. Of course, in this case the tree does not add anything to the possibility to perform a binary search in a sorted array. Though, a search through the tree as given above is somewhat simpler: there are no indices to compute. On the other hand, it requires extra storage for the links. Actually, the construction of the tree is kind of a generic binary search: we follow all paths that ever may be followed.

If there are rare updates, then it is a good idea to keep in addition to the tree (or simply the sorted array of elements), a list with "new issues". If an element is added, then it is put in this list. If an element is deleted, then it is marked as deleted in the existing tree. Doing this, a deletion is trivial: the same time as a find. An insertion takes O(1) after testing whether the element is already there. A find takes O(log n + length of additional list). So, this is fine as long as the additional list contains at most O(log n) elements. Because of the extreme simplicity, this may indeed be a good idea in some practical cases.

A disadvantage is that every now and then the tree has to be rebuilt entirely. However, there is no problem to amortize these costs. The idea is very similar to what we have been doing for queues and stacks implemented with arrays: start rebuilding the tree already before we need the new one.

Even amortizing the cost will not bring us so much: already after O(log n) additions, we may need a new tree. This new construction costs us O(n) (most keys are already sorted), so n / log n per performed insertion. Thus, only if we perform many more finds and deletes than insertions (the factor is n / log^2 n), this works well.

Actually, this whole idea has been applied for centuries in the context of printed media. Think of an encyclopedia: the original work appears, soon followed by a list of errata. Then, every year, updates are added. The speed at which information can be found goes down with every update. Finally a completely new issue appears and the process begins again. One of the biggest advantages of the use of electronic storage devices is that information can be changed or deleted without leaving traces.

Realization with Changing Tree

The above shows that it is only rarely a good idea to construct a perfect search tree once and to perform updates to a separate structure. Generally it is better to perform the updates to the tree itself. Of course this implies that the structure of the tree may deteriorate: as a result of insertions and deletions, it may happen that the tree becomes deeper than strictly necessary for the number of nodes in it. Within certain limits such deviations will be tolerated. However, at all times the search-tree property must be preserved, because otherwise we might have to traverse the whole tree in order to perform a find.

Insertion

Inserting a node with key x into a binary search tree is easy: We start by performing a find for x. If there is already a node with key x, we may do some counting or just ignore the insertion, but the structure of the tree remains unchanged. Otherwise, the find procedure comes to the point where it would return false. Let u be the last node on this search path. Apparently u has no child on the side to which the search for x should continue (left or right depending on the values of u.key and x). Then a new node with key x is generated and added as a child to u on this side. This new node is a leaf of the tree, so the tree grows by adding leafs. This can be worked out as follows:
  void insert(int x)
  // Insert a node into the tree.
  {
    if      (x < key)
      if (leftChild  != null) // go left
        leftChild.insert(x);
      else
        leftChild  = new TreeNode(x);
    else if (x > key)
      if (rightChild != null) // go right
        rightChild.insert(x);
      else
        rightChild = new TreeNode(x);
    else // key == x
      System.out.println(
        "\nKey " + x + " exists, insertion ignored!");
  }

Lazy Deletion

Deleting a node with key x is more interesting. We start by performing a find for x. If there is no node with key x, we either ignore the deletion or we print an error message. If x is found, then we cannot simply remove the node u with key x from the tree because this might cut the tree. Only when u is a leaf this causes no problems.

The simplest general solution is to perform lazy deletions: a node to delete is marked by setting an additional boolean instance variable of TreeNode to false. Otherwise the node is left unchanged. This idea guarantees to not impair the search-tree property because the key of a deleted node is still available for guiding the search through the tree. So, the only change to the procedure find is to test whether a node with key == x is deleted or not before returning true. When inserting a node with a key x, while a deleted node with key x is still present in the tree, then the deleted node should be replaced by the new one. Any newly inserted node gets its mark set to true.

This is a very good solution if we have a finite set of elements which are repeatedly inserted and deleted. However, in that case we can better use a sorted array, which is even simpler. In general, this idea has the problem that over time the structure may become more and more polluted. The obvious solution is to rebuild the tree without the deleted nodes as soon as their number exceeds 50% (or some other constant fraction) of the total number of nodes. This implies a certain waste of memory space, but typically it has very little impact on the time for searching: if we can guarantee that the depth of a tree with n nodes is bounded by c * log n, for some constant c, then the time to search in a tree with n real nodes polluted with at most n deleted nodes is c * log (2 * n) = c * log n + c instead of c * log n for a clean tree.

The rebuilding is not too expensive: performing an inorder tree traversal, all non-deleted nodes can be extracted from the tree in sorted order in linear time. Building a new perfect tree out of these takes linear time as well. If the resulting structure has n nodes, then at least n / 2 delete operations can be performed before the next rebuilding. Thus, the rebuilding has an amortized cost of O(1) per deletion, which is negligible in comparison to the time needed for finding the nodes.
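Lazy deletion with rebuilding can be sketched as follows. The names LazyNode, LazyTree, present, total and deleted are assumptions made for this illustration. For brevity the sketch represents an empty tree by root == null, and it revives a marked node on re-insertion, which for bare keys has the same effect as replacing it by a new node.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: LazyNode and LazyTree are hypothetical names.
class LazyNode
{
  int key;
  boolean present = true; // false marks a lazily deleted node
  LazyNode leftChild, rightChild;
  LazyNode(int key) { this.key = key; }
}

class LazyTree
{
  LazyNode root;
  int total = 0, deleted = 0; // all nodes, marked nodes

  // Walk down from the root; returns the node with key x or null.
  private LazyNode locate(int x)
  {
    LazyNode v = root;
    while (v != null && v.key != x)
      v = x < v.key ? v.leftChild : v.rightChild;
    return v;
  }

  boolean find(int x)
  {
    LazyNode v = locate(x);
    return v != null && v.present;
  }

  void insert(int x)
  {
    if (root == null) { root = new LazyNode(x); total++; return; }
    LazyNode v = root;
    while (true)
    {
      if (x == v.key)
      { // revive a deleted node, or ignore a duplicate
        if (!v.present) { v.present = true; deleted--; }
        return;
      }
      LazyNode next = x < v.key ? v.leftChild : v.rightChild;
      if (next == null)
      {
        if (x < v.key) v.leftChild  = new LazyNode(x);
        else           v.rightChild = new LazyNode(x);
        total++;
        return;
      }
      v = next;
    }
  }

  void delete(int x)
  {
    LazyNode v = locate(x);
    if (v == null || !v.present)
      return; // nothing to delete
    v.present = false;
    deleted++;
    if (2 * deleted > total)
      rebuild(); // more than 50% of the nodes are marked
  }

  // Inorder traversal collecting the non-deleted nodes in sorted order.
  private void collect(LazyNode v, List<LazyNode> out)
  {
    if (v == null) return;
    collect(v.leftChild, out);
    if (v.present) out.add(v);
    collect(v.rightChild, out);
  }

  private LazyNode build(List<LazyNode> a, int l, int h)
  {
    if (h < l) return null;
    int m = (l + h + 1) / 2;
    LazyNode v = a.get(m);
    v.leftChild  = build(a, l, m - 1);
    v.rightChild = build(a, m + 1, h);
    return v;
  }

  void rebuild()
  {
    List<LazyNode> a = new ArrayList<>();
    collect(root, a);
    root = build(a, 0, a.size() - 1);
    total = a.size();
    deleted = 0;
  }
}
```

Both collect and build touch every node once, so a rebuild takes linear time, in line with the amortization argument above.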

Real Deletion

In practice lazy deletions are quite satisfactory, but of course it is even better to keep the tree as small as possible, provided this can be realized at modest costs. As noticed above, a node v can be directly removed from the tree only when v is a leaf. In all other cases this would cut the tree. If v has degree 1, then its single child w may be attached directly to its parent u. This does not violate the search-tree property, because if v is the right child of u, all keys in the subtree of v are larger than the key of u, so in particular all keys in the subtree of w are larger. The case that v is the left child of u is analogous. However, if v has degree 2, then we cannot simply attach both its children to u as this would increase its degree.

The general solution is to find a replacement for the node to delete, which we call u from here on. The key of the replacement should be such that all nodes in the left subtree of u have smaller values and all nodes in the right subtree of u larger values. The largest key value m in the left subtree has the desired property: by definition m is larger than all other keys in the left subtree, and because we may assume (induction) that so far the tree had the search-tree property, m is smaller than all keys in the right subtree. Also the smallest key value from the right subtree has the desired property. The problem is that the node v with key m is not necessarily a leaf itself. However, the node with the largest key in the left subtree of u has no right child and thus has degree at most 1. The same is true for the node with the smallest key in the right subtree of u. So, after copying m into u, the node v can be removed as described above and the deletion ends here. This can be worked out as follows:

  void delete(TreeNode parent, int x)
  // Delete the node with key x from the tree.
  {
    if      (x < key)
      if (leftChild  != null) // go left
        leftChild.delete(this, x);
      else
        System.out.println(
          "\nKey " + x + " does not exist, deletion ignored!");
    else if (x > key)
      if (rightChild != null) // go right
        rightChild.delete(this, x);
      else
        System.out.println(
          "\nKey " + x + " does not exist, deletion ignored!");
    else // x == key
    {
      if      (leftChild  == null)
        if (this == parent.leftChild)
          parent.leftChild  = rightChild;
        else
          parent.rightChild = rightChild;
      else if (rightChild == null)
        if (this == parent.leftChild)
          parent.leftChild  = leftChild;
        else
          parent.rightChild = leftChild;
      else // this has degree 2
        key = leftChild.deleteMax(this);
    }
  }

  int deleteMax(TreeNode parent)
  {
    if (rightChild != null)
      return rightChild.deleteMax(this);
    else
    {
      if (this == parent.leftChild)
        parent.leftChild  = leftChild;
      else
        parent.rightChild = leftChild;
      return key;
    }
  }
Here parent is always taken along in the searches in order to easily get access to the parent of the current node (this).

The above shows that the time consumption for all three operations is bounded by O(depth of tree). As long as the tree is as it should be, that is, balanced in the sense that for n nodes the maximum depth is bounded by O(log n), we are happy. However, very natural operations may lead to trees that are degenerate: having depth omega(log n). This already happens when the elements are inserted in sorted order: all elements are inserted as right children, leading to a tree with the structure of a linear list.
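The effect of the insertion order can be checked with a small experiment. The following is a minimal sketch with a hypothetical class ChainDemo; it uses an iterative variant of the insertion described above and represents an empty tree by null.

```java
// Sketch with a hypothetical minimal node class.
class ChainDemo
{
  static class Node
  {
    int key;
    Node leftChild, rightChild;
    Node(int key) { this.key = key; }
  }

  // Iterative version of the insertion; returns the root.
  static Node insert(Node root, int x)
  {
    if (root == null) return new Node(x);
    Node v = root;
    while (true)
    {
      if (x < v.key)
      {
        if (v.leftChild == null) { v.leftChild = new Node(x); break; }
        v = v.leftChild;
      }
      else if (x > v.key)
      {
        if (v.rightChild == null) { v.rightChild = new Node(x); break; }
        v = v.rightChild;
      }
      else
        break; // key exists, insertion ignored
    }
    return root;
  }

  // Depth = number of links on the longest path from the root to a leaf.
  static int depth(Node v)
  {
    if (v == null) return -1;
    return 1 + Math.max(depth(v.leftChild), depth(v.rightChild));
  }

  public static void main(String[] args)
  {
    Node sorted = null;
    for (int x = 1; x <= 7; x++)
      sorted = insert(sorted, x);
    Node mixed = null;
    for (int x : new int[] {4, 2, 6, 1, 3, 5, 7})
      mixed = insert(mixed, x);
    System.out.println(depth(sorted) + " versus " + depth(mixed));
  }
}
```

For the seven keys 1, ..., 7 the sorted insertion order gives depth 6, whereas the order 4, 2, 6, 1, 3, 5, 7 gives the perfect tree of depth 2.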

If it can somehow be guaranteed that the tree remains balanced in the dynamic context of insertions and deletions, then all operations can be performed in O(log n) time.

Inserting and Deleting on a Binary Search Tree

Empty Trees

The above methods work fine, except for one case. What happens if the tree is empty? In that case we will probably have root == null. But then, writing root.key or root.find(x) leads to a NullPointerException (the Java counterpart of a segmentation fault). The most convenient way to overcome this problem is to never allow trees to be empty. For example, by fixing that an empty tree is actually a tree with a single node with some special key value. This extra node is called a sentinel. Using sentinels is a very general programming technique which in many cases helps to save tests or to eliminate special cases. In the case of trees it is handy to add a sentinel node with infinite key value. The actual tree will be the left child of this node. Keeping the sentinel as root assures that we do not have to deal with the special case of deleting the root. Only when rebuilding must we take care that the sentinel does not participate. This can be worked out as follows:
class Tree
{
  TreeNode root;

  public Tree()
  {
    root = new TreeNode(Integer.MAX_VALUE); // sentinel with "infinite" key
  }

  static TreeNode build(TreeNode[] a, int l, int h)
  // Build a tree from the nodes in the subarray a[l] ... a[h].
  {
    if (h <  l)
      return null;
    int m = (l + h + 1) / 2;
    a[m].leftChild  = build(a, l, m - 1);
    a[m].rightChild = build(a, m + 1, h);
    return a[m];
  }

  public void rebuild()
  // Rebuild the tree in a perfect way.
  {
    if (root.leftChild != null)
    {
      TreeNode[] a = root.leftChild.treeToArray();
      root.leftChild = build(a, 0, a.length - 1);
    }
  }

  public boolean find(int x)
  // Test whether there is a node with key x.
  {
    return root.find(x);
  }

  public void insert(int x)
  // Insert a node into the tree.
  {
    root.insert(x);
  }

  public void delete(int x)
  // Delete a node with key x from the tree.
  {
    root.delete(null, x);
  }

  public int size()
  // Compute the size of the tree.
  {
    return root.size();
  }

  public void preorder()
  // Enumerate the nodes of the tree in preorder.
  {
    System.out.println("\nPrinting nodes preordered:");
    root.preorder(0);
    System.out.println(
      "Tree has " + IO.intToString(size(), 4) + " nodes in total");
  }

  public void inorder()
  // Enumerate the nodes of the tree in inorder.
  {
    System.out.println("\nPrinting nodes inordered:");
    root.inorder(0);
    System.out.println(
      "Tree has " + IO.intToString(size(), 4) + " nodes in total");
  }
}

Most of the presented code fragments have been integrated into a running program. The final tree is the perfect tree with keys 1, 13, 27, 44, 54, 64, 71, 89, 92, which is shown in a picture given above.

AVL Trees

Motivation and Definition

The goal is to come up with a tree structure which guarantees balance even in the context of insertions and deletions. Clearly we cannot impose perfect balance at all times: a single insertion may require considerable rebuilding. In an exercise you are asked to show that this time for rebuilding cannot be amortized. So, we must be willing to make some concessions. One option is to allow trees to have a varying degree. This is the topic of the next section, where we consider so-called 2-3 trees. Another possibility is to allow trees whose leafs do not all lie at the same depth, but within a certain range. This range is chosen so small that the depth of the tree is still bounded by O(log n), but so large that there is sufficient flexibility to perform inserts and deletes easily. A well-known example of such trees are the so-called AVL trees, which are considered in this section. AVL trees and 2-3 trees are far from the only possibilities to obtain O(log n) operations. These two structures are chosen because they are simple, rather practical and quite different, nicely illustrating two different approaches.

The balance of a node is the difference between the depth of its left subtree and the depth of its right subtree. The balance can be computed easily by a modification of the method computeDepth(). A node is said to have the AVL property if its balance is -1, 0 or 1. A tree is said to be an AVL tree if all its nodes have the AVL property. This definition leaves considerable flexibility. One of the positive aspects is that the AVL property is local: this implies that an insertion or deletion only has impact on the nodes along the search path. Any perfect tree is an AVL tree, but an AVL tree does not need to look like a perfect tree at all: the leafs may lie at many different levels. A priori it is neither clear that the depth of such trees is bounded by O(log n) nor that we can perform inserts and deletes on AVL trees without needing reconstructions that cost more than O(log n) time.

An AVL Tree with Balances Indicated in Nodes
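The definition of the AVL property translates directly into code. The following is a naive sketch under the assumption of a plain node class; AvlCheck and its methods are hypothetical names and not the lecture's computeDepth(). The depths are recomputed at every node, so this is meant for checking small examples, not as an efficient implementation.

```java
// Sketch; AvlCheck and its Node class are hypothetical names.
class AvlCheck
{
  static class Node
  {
    int key;
    Node leftChild, rightChild;
    Node(int key, Node l, Node r)
    {
      this.key = key;
      leftChild = l;
      rightChild = r;
    }
  }

  // Depth of a subtree; an empty subtree has depth -1.
  static int depth(Node v)
  {
    if (v == null) return -1;
    return 1 + Math.max(depth(v.leftChild), depth(v.rightChild));
  }

  // Balance = depth of left subtree minus depth of right subtree.
  static int balance(Node v)
  {
    return depth(v.leftChild) - depth(v.rightChild);
  }

  // A tree is an AVL tree if every node has balance -1, 0 or 1.
  static boolean isAvl(Node v)
  {
    if (v == null) return true;
    int b = balance(v);
    return -1 <= b && b <= 1 && isAvl(v.leftChild) && isAvl(v.rightChild);
  }
}
```

A chain of three nodes, for example, has balance 2 or -2 at its root and is therefore rejected by isAvl.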

The Java program given above has been extended to incorporate the notion of balance. This has been realized by defining subclasses of the classes TreeNode and Tree. A HeightedTreeNode has one additional instance variable, height. This is used for storing the height of a node in the tree. In this implementation the height is not updated during insertion and deletion. Instead it is recomputed for all nodes before producing output. The balance of a node can be computed in constant time from the heights of its children. So, in an AVL-tree implementation, there is no need for a node to additionally store and update the heights of both its children. In the extended Java program, first the AVL tree shown above is constructed, then it is turned into a perfect tree. The perfect tree has smaller depth and less imbalance, illustrating that in general an AVL tree is far from perfect.

Bounding the Depth of AVL Trees

For proving an upper bound on the depth of an AVL tree with n nodes, it is useful to first do the opposite: prove a lower-bound on the number of nodes N(d) in an AVL tree of depth d. The smallest AVL tree of depth d, T_d, is obtained by taking T_{d - 1} and T_{d - 2} and joining them by an added root. This gives the following recurrence:
N(0) = 1,
N(1) = 2,
N(d) = N(d - 1) + N(d - 2) + 1, for all d > 1.
Thus, the N(d) are very similar to the Fibonacci numbers, and even slightly larger. How do these numbers develop exactly? We have N(0) = 1, N(1) = 2, N(2) = 4, N(3) = 7, N(4) = 12, N(5) = 20, N(6) = 33, N(7) = 54, N(8) = 88, N(9) = 143, N(10) = 232, ... . We see that they grow fast! Actually, looking at them we get the strong feeling that they grow exponentially, even though they do not double every step.
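The listed values can be reproduced with a few lines of code. This is a small sketch; the class name MinimalAvl and the method minNodes are chosen here for illustration only.

```java
// Sketch; MinimalAvl is a hypothetical name.
class MinimalAvl
{
  // N(d): minimum number of nodes of an AVL tree of depth d.
  static long minNodes(int d)
  {
    long prev = 1, cur = 2; // N(0) and N(1)
    if (d == 0) return prev;
    for (int i = 2; i <= d; i++)
    {
      long next = cur + prev + 1; // N(i) = N(i - 1) + N(i - 2) + 1
      prev = cur;
      cur = next;
    }
    return cur;
  }

  public static void main(String[] args)
  {
    for (int d = 0; d <= 10; d++)
      System.out.println("N(" + d + ") = " + minNodes(d));
  }
}
```

Comparing with the Fibonacci numbers 1, 1, 2, 3, 5, 8, 13, ... shows that each N(d) is exactly one less than a Fibonacci number, for example N(10) = 232 = 233 - 1.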

How can one prove such a fact? In (almost) all such cases, one should use induction. For induction we need an induction hypothesis. The problem here is that we do not know exactly what we want to prove. If we just want to prove that the development is exponential, we can use an underestimate. It is easy to check that all N(d) are positive. From this it follows that N(d) > N(d - 1) for all d > 1. Thus, N(d) > 2 * N(d - 2). So, as a hypothesis we could use N(d) >= sqrt(2)^d. We must check the basis of the induction. This is ok, because sqrt(2)^0 = 1 <= N(0), and sqrt(2)^1 = sqrt(2) <= N(1). After that we must only check that indeed N(d + 2) > 2 * N(d) >= 2 * sqrt(2)^d = sqrt(2)^{d + 2}.

In principle this is good enough. If, however, we want a more accurate estimate, then we can do the following, which is a special case of an approach that works in general for estimating the exponential development of a recurrence relation. We assume that the development is as a^d; for the time being we forget about constant factors and additional terms. How big is a? Because the exponential behavior dominates all other things, we should basically have a^d = a^{d - 1} + a^{d - 2}. Dividing left and right side by a^{d - 2} gives a quadratic equation: a^2 - a - 1 = 0. Solving with the ABC-rule gives a = (1 + sqrt(5)) / 2 ~= 1.618, the famous golden ratio. For this a the above induction proof can be repeated. We are lucky that we have "+ 1" and not "- 1". Otherwise the development would still have been of the same order, but the proof would have been harder. So, we know now that the depth of an AVL tree with a given number of nodes N is at most log_a N = log_2 N / log_2 a ~= 1.440 * log_2 N. In most cases it will be less, but the main point is that this is logarithmic.

Minimal AVL Trees
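For completeness, the induction for the golden ratio can be written out as follows. This is a worked version of the argument sketched above, written in LaTeX notation; it only uses the recurrence for N(d) and the fact that a^2 = a + 1.

```latex
% Claim: N(d) \ge a^d for all d \ge 0, where a = (1 + \sqrt{5}) / 2, so a^2 = a + 1.
\begin{align*}
N(0) &= 1 = a^0, \qquad N(1) = 2 \ge a^1 \approx 1.618, \\
N(d) &= N(d - 1) + N(d - 2) + 1 \\
     &\ge a^{d - 1} + a^{d - 2} + 1 \\
     &= a^{d - 2} (a + 1) + 1 \\
     &= a^{d - 2} \cdot a^2 + 1 = a^d + 1 > a^d .
\end{align*}
% Consequently, a tree with n nodes and depth d satisfies n \ge N(d) \ge a^d,
% which gives d \le \log_a n \approx 1.440 \log_2 n.
```

The last step shows where the "+ 1" helps: it can simply be dropped, whereas a "- 1" would have to be absorbed by a stronger induction hypothesis.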

Find

As on any binary search tree, we want to perform the following three operations on AVL trees: find, insert and delete. The first of these is easy to realize: the logarithmic depth of an AVL tree implies that a find can be performed in O(log n) time by simply walking down the tree using the keys to guide us.

Insert

Insert is in principle easy. As for any binary search tree, when inserting a new node u with u.key = x, we search for x. If there is not yet a node with key x in the tree, then u is attached as a child to the last node v on the search path. If x < v.key, u is attached as left child, else as right child. Because of the logarithmic depth, this takes O(log n) time. The problem is that somehow we must guarantee that the tree is again an AVL-tree afterwards. Otherwise we could soon not guarantee anymore that the depth of the tree is logarithmic, and all our time bounds are based on that assumption.

Basic Observations

A trivial, but important, observation is that the balance may have changed only for nodes on the search path, because only for those nodes something may have changed in a subtree. So, after insertion, we can update the depth information in O(log n) time: for every node v on the path we check and possibly update the depth to be max{depth(v_l), depth(v_r)} + 1, where v_l is the left child and v_r is the right child. Doing this, we might discover nodes that are out of balance.

Another observation is that a single insertion can change the depth of any subtree by at most one. So, if after the insertion there are nodes in unbalance, then this is because one of their subtrees has depth exactly 2 larger than the other subtree.

Let w be the deepest node with unbalance. Without loss of generality we may assume that the left subtree of w, rooted at w_l is too deep in comparison to the right subtree rooted at w_r. More precisely, we assume that the balance of w is +2. The subtrees of w_l are denoted by w_ll and w_lr. Possibly these subtrees are empty, but at the level of this theoretical discussion it does not matter. It helps to consider the performed operations in an abstract way.

A crucial observation is that the tree was balanced before the insertion. Because now the depth of the subtree of w_l is two larger than that of w_r, it must have been exactly one larger before. The insertion must have changed the depth of w_l, and consequently of either w_ll or w_lr, otherwise the balance would still be ok. In order to increase the depth of a tree, one must perform the insertion in the subtree whose depth is larger than or equal to that of the other. If this depth had been larger already before the insertion, then it would now be larger by two. But, if the depth of either w_ll or w_lr exceeds the other by two, then w_l is unbalanced after the insertion, in contradiction with our assumption that w is the deepest unbalanced node. So, now we know very precisely how the tree looked before and after the insertion:

The subtrees rooted at w_ll, w_lr and w_r all had the same depth. As a result of the insertion, the depth of w_ll or w_lr has increased by one.

Single Rotation

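The action that must be undertaken to restore the balance of w depends on which of the following two events has occurred: either the insertion has been performed in the left subtree of w_l, so that the depth of w_ll has increased and the balance of w_l has become +1, or the insertion has been performed in the right subtree of w_l, so that the depth of w_lr has increased and the balance of w_l has become -1.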

The first case is the simplest. In that case, we make w_l the new root of this subtree, with w as its right child and w_ll as its left child. w gets w_r as right child and w_lr as left child. The ordering is preserved by these operations, and afterwards the balance has become perfect, because the tree of w_ll has moved one level up, and the tree of w_r has moved one level down. This operation is called single rotation.

This operation works because the subtree rooted at w_ll, whose increased depth was the cause of all trouble, is on the outside. So, it remains attached to w_l, which is lifted, so it is lifted along. Thus, this effectively reduces the depth of the left subtree by one. The subtree rooted at w_r, whose depth was two smaller, moves down one level, so it now reaches equally deep. The subtree rooted at w_lr is handed over to w. Because w is now at the level where w_l was before, this subtree stays at the same level. We conclude that afterwards all depths are equal.

Single Rotation Picture
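In code a single rotation changes only two child links inside the subtree. The following is a minimal sketch assuming a plain node class; SingleRotation and rotateRight are hypothetical names. The method returns the new root of the subtree, which the caller must hook into the former parent of w.

```java
// Sketch; SingleRotation and its Node class are hypothetical names.
class SingleRotation
{
  static class Node
  {
    int key;
    Node leftChild, rightChild;
    Node(int key) { this.key = key; }
  }

  // Single rotation for the case that the left subtree of w is too deep
  // on the outside (in the subtree of w_ll).
  static Node rotateRight(Node w)
  {
    Node wl = w.leftChild;
    w.leftChild   = wl.rightChild; // w_lr moves over to w
    wl.rightChild = w;             // w becomes the right child of w_l
    return wl;                     // w_l is the new root of this subtree
  }
}
```

The mirrored case, in which the right subtree is too deep on the outside, is obtained by exchanging leftChild and rightChild.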

Double Rotation

Now assume that the insertion has been performed in the right subtree of w_l so that the depth of w_lr has increased and the balance of w_l has become -1. In this case it is not sufficient to exchange the positions of just two nodes. The reason for this is that the subtree with excessive depth is now on the inside. This implies that raising w_l will not help, because then w_lr will be attached to w, which is moving down to the previous level of w_l. So, its depth will be the same as before.

So, we must apply a more elaborate technique called double rotation. The idea is to make w_lr the root of a newly constructed subtree with w_l as its left child and w as its right child (among other things you should notice that indeed the key of w_lr is larger than that of w_l and smaller than that of w). w_l gets as children w_ll and w_lrl. w gets as children w_lrr and w_r.

The subtree that was deepest, the one that was rooted at w_lrl or w_lrr, is indeed raised one level by this operation (because these nodes now come at distance two from the root, whereas before this was three). The subtree of w_ll remains at the same level, and the subtree of w_r is going down one level, but this is fine. So, afterwards, three of the four subtrees reach equally deep, and one of them one level less.

Double Rotation Picture
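The double rotation can be sketched in the same style. Again the names DoubleRotation and rotateLeftRight are assumptions made for this illustration, and the caller has to attach the returned node to the former parent of w.

```java
// Sketch; DoubleRotation and its Node class are hypothetical names.
class DoubleRotation
{
  static class Node
  {
    int key;
    Node leftChild, rightChild;
    Node(int key) { this.key = key; }
  }

  // Double rotation for the case that the left subtree of w is too deep
  // on the inside (in the subtree of w_lr).
  static Node rotateLeftRight(Node w)
  {
    Node wl  = w.leftChild;
    Node wlr = wl.rightChild;
    wl.rightChild  = wlr.leftChild;  // w_lrl becomes the right child of w_l
    w.leftChild    = wlr.rightChild; // w_lrr becomes the left child of w
    wlr.leftChild  = wl;
    wlr.rightChild = w;
    return wlr;                      // w_lr is the new root of this subtree
  }
}
```

Four child links inside the subtree are changed; w_ll and w_r keep their parents.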

Delete

Lazy Deletion

As for any search tree, deletes might be performed lazily. The tree is rebuilt as soon as 50% of the nodes in the tree are deleted nodes. The amortized cost of the rebuilding is O(1). Also the additional cost of the finds due to the extra nodes in the tree is bounded by O(1).

For stacks and queues implemented with arrays, it was quite easy to turn an amortized time bound into a worst-case time bound, by starting the rebuilding before it was actually needed. For search trees this is much harder because the structure does not only change on one side, but everywhere. Even the structure may change. So, it is not easy to make a "snapshot" of a tree while operations are performed on it. Therefore, for applications in which a time-out for rebuilding is unacceptable, it is desirable to have a real deletion running in O(log n) time.

Real Deletion

How can we really delete elements from AVL trees? This is not particularly hard, and comparable to inserts. Inserts are always performed at a leaf. When we are deleting, however, we are not free to choose the element we are going to delete. So, this may be an internal node. In that case, the node to delete v is replaced by either the smallest node in its right subtree or the largest node in its left subtree. Such replacements are repeated until reaching a leaf. This idea allows us to focus on the deletion of a leaf in the following.

If the balance remains within the acceptable bounds, then we are ready. Otherwise, we look for the deepest node w whose balance is no longer ok. From the fact that w was ok before the deletion, we conclude that one of the subtrees of w, say the right one, has depth exactly two smaller than the other. Denote by w_l and w_r the left and right child of w, and by w_ll, w_lr, w_rl and w_rr the grandchildren. The depth of w_r must have been reduced. Because it was balanced before, the deeper of its subtrees must have had depth exactly one larger than the other, and thus now the trees rooted at w_rl and w_rr have the same depth. For the depths of the trees rooted at w_ll and w_lr there are three possibilities: (+2, +1), (+1, +2) and (+2, +2). Here a number +x denotes how much deeper this subtree is than the trees rooted at w_rl and w_rr. There must be a +2 because w is unbalanced now and there cannot be anything else because w was balanced before.

The treatment of these cases is completely analogous to what we have done before. The simplest case is again (+2, +1). Then it is sufficient to perform a single rotation: w_l becomes the new root. w becomes its right child. w gets w_lr as new left child.

The case (+1, +2) can be treated with a simple double rotation: w_lr becomes the new root, with w as right child and w_l as left child. w has w_r as right child and w_lrr as left child. w_l has w_lrl as right child and w_ll as left child.

The only really new case is (+2, +2). Fortunately, performing a single rotation helps in this case: the single rotation moves the subtree rooted at w_r one level down, the subtree rooted at w_ll one level up, and the subtree rooted at w_lr remains at the same level.

Deletion with New Balancing Case

The rotations can be remembered easily: in each of them the rebuilding involves w and its child and grandchild on the insertion path. From these three nodes a new top structure of depth 1 is built. Considering the values of the keys, this can be done in only one way. All the other subtrees are connected to this top structure in the unique way dictated by the key values of their roots. More precisely, starting at w, the search path is followed down for two steps towards the child with maximum height. This gives a set of in total three nodes: w, w_x and w_xy. Here x and y stand for 'l' or 'r'. In each of these four cases, these three nodes are used to construct the top of a tree of depth 1. The node with the middle key becomes the root, the node with smallest key appears as left child, the node with largest key appears as right child. Finally, the four involved subtrees (one from w, one from w_x and two from w_xy) are attached to the two leafs of this small tree in the same order as before.
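Putting the pieces together, an insertion with rebalancing might look as follows. This is a sketch under several assumptions: AvlSketch is a hypothetical class, not the HeightedTreeNode of the lecture's program; every node stores its own height, which is updated on the way back along the search path; and a double rotation is expressed as two successive single rotations, which yields the same result as the direct construction described above.

```java
// Sketch; AvlSketch is a hypothetical class.
class AvlSketch
{
  static class Node
  {
    int key, height; // the height of a leaf is 0
    Node leftChild, rightChild;
    Node(int key) { this.key = key; }
  }

  static int height(Node v) { return v == null ? -1 : v.height; }

  static void update(Node v)
  {
    v.height = 1 + Math.max(height(v.leftChild), height(v.rightChild));
  }

  static int balance(Node v)
  {
    return height(v.leftChild) - height(v.rightChild);
  }

  static Node rotateRight(Node w)
  {
    Node wl = w.leftChild;
    w.leftChild = wl.rightChild;
    wl.rightChild = w;
    update(w);
    update(wl);
    return wl;
  }

  static Node rotateLeft(Node w)
  {
    Node wr = w.rightChild;
    w.rightChild = wr.leftChild;
    wr.leftChild = w;
    update(w);
    update(wr);
    return wr;
  }

  // Restore the AVL property at w, assuming both subtrees are AVL trees.
  static Node rebalance(Node w)
  {
    update(w);
    if (balance(w) == 2)
    {
      if (balance(w.leftChild) < 0) // too deep on the inside
        w.leftChild = rotateLeft(w.leftChild);
      return rotateRight(w);
    }
    if (balance(w) == -2)
    {
      if (balance(w.rightChild) > 0) // too deep on the inside
        w.rightChild = rotateRight(w.rightChild);
      return rotateLeft(w);
    }
    return w;
  }

  // Insert x below v; returns the possibly changed root of this subtree.
  static Node insert(Node v, int x)
  {
    if (v == null) return new Node(x);
    if      (x < v.key) v.leftChild  = insert(v.leftChild, x);
    else if (x > v.key) v.rightChild = insert(v.rightChild, x);
    return rebalance(v);
  }
}
```

Because rebalance only inspects the balances of w and of its deeper child, the same method also covers the deletion cases, including (+2, +2), where the child has balance 0 and a single rotation is performed.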

Practicality

The mentioned operations are quite technical, but they are not much work. Single rotation involves a subtree with in total 4 links, of which just 2 are changed. Double rotation involves a subtree with in total 6 links, of which 4 are changed. Furthermore, we establish much more than required. This implies that we cannot give a sequence of operations that requires a rotation after every insertion.

2-3 Trees

Definition

A much simpler idea than AVL trees is used in 2-3 trees. These are trees in which all internal nodes have degree either two or three (other variants are trivial generalizations of the ideas in this section). All leafs can be found at the same depth. So, because all internal nodes have degree at least two, the depth is clearly logarithmic.

In this case, the internal nodes typically do not hold information, but one or two keys that guide the searching process. If there is one key k, then k could be the largest value of the left subtree. If there are two values k_1 and k_2, then k_1 is the largest value of the left subtree and k_2 the largest value of the middle subtree.

2-3 Tree with 29 Nodes

Find

Find is just as easy as before (but slightly more time consuming): in a node of degree three we must make a threefold decision. Suppose we are looking for a value x. Then, in a binary node with a single key k, we test x <= k, and go left or right based on the outcome. In a ternary node, with keys k_1 and k_2, we do
  if      (x <= k_1)
    goleft();
  else if (x <= k_2)
    gomiddle();
  else //  x >  k_2
    goright();
In this way we continue until we reach a leaf. There we test whether the key is equal or not and perform a suitable operation in reaction.

The construction and these more elaborate comparisons imply a certain overhead, but all the rest becomes much simpler because of this. If you implement a search tree ADT based on 2-3 trees, then it is a good idea to have different classes for internal nodes and leafs, as they allow very different operations and have different fields as well.
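A find on such a structure can be sketched as follows. The classes Node23, Leaf23, Internal23 and Find23 are hypothetical names; as an assumption of this sketch the guiding keys and children are kept in small arrays, so that binary and ternary nodes are handled by the same loop.

```java
// Sketch; all class names are hypothetical.
abstract class Node23 { }

class Leaf23 extends Node23
{
  final int key;
  Leaf23(int key) { this.key = key; }
}

class Internal23 extends Node23
{
  final int[] keys;        // keys[i] = largest key below children[i]
  final Node23[] children; // two or three children, one more than keys

  Internal23(int[] keys, Node23... children)
  {
    this.keys = keys;
    this.children = children;
  }
}

class Find23
{
  // Test whether there is a leaf with key x below v.
  static boolean find(Node23 v, int x)
  {
    while (v instanceof Internal23)
    {
      Internal23 u = (Internal23) v;
      int i = 0;
      while (i < u.keys.length && x > u.keys[i])
        i++; // at most two comparisons per node
      v = u.children[i];
    }
    return v != null && ((Leaf23) v).key == x;
  }

  public static void main(String[] args)
  {
    Node23 a = new Internal23(new int[] {1, 3},
      new Leaf23(1), new Leaf23(3), new Leaf23(5));
    Node23 b = new Internal23(new int[] {7},
      new Leaf23(7), new Leaf23(9));
    Node23 root = new Internal23(new int[] {5}, a, b);
    System.out.println(find(root, 5) + " " + find(root, 4));
  }
}
```

Only the leafs carry the stored keys; the internal nodes merely guide the search, so a key may occur both in a leaf and as a guiding key higher up.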

Insert

For an insertion, we search where the node should come. If it is not there, we create a new leaf with the appropriate key. If the internal node to which it should be attached has degree two so far, then everything is fine: the new leaf is attached, and we are done. Otherwise, this parent node (which was on the point of getting an illegal degree four) is split in two internal nodes of degree two each. The new internal node should be added to the internal node one level up. There we must check the degree and possibly split the node again. In this way we either find a node with degree two and exit, or ultimately split the root in two and add a new root.

It is essential that 3 + 1 = 2 + 2. This implies that, once the maximum degree of a node is exceeded, it can be split in two nodes with degree within the legal range. This implies that we could make a 3-5 tree in the same way, because 5 + 1 = 3 + 3, but that a 3-4 tree would be clumsy, because 4 + 1 = 5 < 3 + 3. These generalizations are called a-b trees. The degrees of the nodes lie between a and b and we must have b + 1 >= 2 * a. 2-3 trees have the problem that after splitting a node (as the result of an insertion) the new nodes have degree 2, which is the minimum. Therefore, it may happen that a subsequent deletion immediately leads to a fusion again. These structural changes may go all the way up to the root every time. Even though all this takes only logarithmic time, it still will mean a considerable increase of the cost. If we used 2-5 trees instead, then after a splitting operation we obtain nodes of degree 3, which are not critical, and which cannot be split immediately afterwards. For 2-5 trees one can even show that the amortized number of restructuring operations is constant.

Delete

Again we can perform deletions by just marking deleted nodes. However, in this case, there is a rather simple inverse of the insertion operation. First we look for the element to delete. If we have found it, then there are three cases to distinguish for deleting a child w from node v:

Inserts and Deletes on 2-3 Trees

Practicality

When comparing with AVL trees, 2-3 trees have several practical disadvantages, but none of them is really serious. Storing information only in the leafs means that the storage for the internal nodes is wasted. However, a tree with n leafs has at most n - 1 internal nodes, so this means that the storage is at most doubled. This factor is smaller if the data to store consists of more than a single value. Storing data only in the leafs also leads to a small increase of the average length of the search path. Another point is that the internal nodes may have degree 2 or 3. Case distinction can be avoided by giving the second key value infinity in nodes of degree 2. In this way a search will never enter the non-existing right branch. Furthermore, the search is slowed down because now two comparisons have to be made for each node on the search path, but this path is shorter than on an AVL tree.

That all leafs lie at the same depth is an advantage: this means that it is trivial to determine that a search has reached the level above the leafs. The find algorithm can now also be written using a for loop, because the number of links to follow is known from the start. The distinction between leafs and internal nodes is very natural. Because leafs will never become internal nodes, there is no need to reserve storage for pointers to other nodes in the leafs. This gives a considerable saving.

The most serious disadvantage of 2-3 trees is the possibility that alternating inserts and deletes result in restructuring the whole path up to the root again and again. Though this will rarely happen in practice, it is rather unpleasant, because restructurings are more involved than the one or two comparisons which are performed when passing through a node on the path from the root towards the leafs.

The notion of 2-3 trees can be generalized in a natural way to a-b trees. An a-b tree is just like a 2-3 tree, but now the degree of an internal node must be at least a and at most b. If after an insertion a node has b + 1 children, it is split into two nodes with round_down((b + 1) / 2) and round_up((b + 1) / 2) children. If after deletion a node has degree a - 1, it either takes over a child from a sibling or fuses with a sibling. In order to work, splitting imposes the condition b + 1 >= 2 * a, and fusing a node of degree a - 1 with a sibling of degree a imposes 2 * a - 1 <= b. Both conditions are the same. 2-3 trees are a-b trees with a = 2 and b = 3; these are the smallest possible choices of a and b. There are many good choices, for example, 2-4, 2-5, 3-5 and 3-6. On the other hand, 3-4 trees are not practical, because after splitting a node of degree 5, we obtain a node of degree 2 which is not allowed.
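
The degree conditions are easy to check mechanically. The following Java sketch tests whether a pair (a, b) permits splitting and fusing, and computes the degrees of the two nodes produced by splitting a node with b + 1 children. The class ABParams and its method names are hypothetical, introduced only for this illustration.

```java
// Hypothetical helper for illustration; not part of the programs of the lecture.
class ABParams {
  // A node with b + 1 children must split into two nodes of degree >= a, and
  // two nodes of degree a - 1 and a must fuse into a node of degree <= b.
  // Both requirements reduce to b + 1 >= 2 * a.
  static boolean isValid(int a, int b) {
    return a >= 2 && b + 1 >= 2 * a;
  }

  // Degrees of the two nodes obtained by splitting a node with b + 1 children.
  static int[] splitDegrees(int b) {
    int total = b + 1;
    return new int[] { total / 2, (total + 1) / 2 }; // round down, round up
  }

  public static void main(String[] args) {
    int[][] pairs = { {2, 3}, {2, 5}, {3, 4}, {3, 5}, {3, 6} };
    for (int[] p : pairs) {
      int[] s = splitDegrees(p[1]);
      System.out.println(p[0] + "-" + p[1] + " tree: valid = " + isValid(p[0], p[1])
          + ", a split gives degrees " + s[0] + " and " + s[1]);
    }
  }
}
```

For the pair (3, 4) the check fails, in line with the remark that splitting a node of degree 5 yields a node of degree 2.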

For 2-5 trees it can be shown that any sequence of n insertions and deletions causes at most O(n) splitting and fusing operations in total. This means that on 2-5 trees insertions and deletions can be performed with O(1) amortized time per operation in addition to the time for the finds. The reason why this works for 2-5 trees and not for 2-3 trees, is that after splitting or fusing the new nodes have degree 3. 3 is not a critical value: reducing or increasing it by one does not immediately lead to a node with too low or too high degree again. So, in between every two restructurings of a node, it is accessed at least once without being restructured.

Nodes of higher degree lead to a shallower tree. A shallow tree is fine because this may reduce the number of cache misses: the information concerning a node may be assumed to stand contiguously in the memory, but going from one node to another typically means making an expensive jump through the memory space. How about the costs of the operations on an a-b tree? To answer this question it must be established how exactly the keys in the nodes are organized. The easiest idea is to maintain these keys in a sorted array. Doing this, we can use binary search to find the link that must be followed. This takes O(log b) time (in practice, for small b, a linear search might be better). Because the depth of an a-b tree is at most log_a n = log n / log a, the time for find becomes O(log b / log a * log n). Thus, if a and b differ by a constant factor, and there is no reason to take b larger than 4 * a, finds can be performed in O(log n) time, just as on a 2-3 tree, while visiting only log_a n nodes. Unfortunately, such a good result is not automatically obtained for inserts and deletes. If a node is split or if two nodes get fused, this means splitting or fusing also the set of keys of these nodes. When using arrays, this costs time proportional to the size of these arrays. There are two solutions:

Theoretically the first idea is the best, but the tree requires extra memory and causes handling overhead as well. The second idea promises to be more practical. It is not necessary to take the same a and b at all levels of the tree.
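
The binary search for the link to follow, mentioned above for keys kept in a sorted array, can be sketched as follows. The convention that child i leads to the keys smaller than or equal to keys[i], and the last child to all larger keys, is an assumption made for this example. The names NodeSearch and childIndex are hypothetical.

```java
// Hypothetical sketch: selecting the child to follow inside one node of an a-b tree.
class NodeSearch {
  // keys[0 .. count - 1] are sorted; the node has count + 1 children.
  // Returns the smallest i with x <= keys[i], or count if x exceeds all keys.
  static int childIndex(int[] keys, int count, int x) {
    int lo = 0, hi = count;          // the answer lies in the range lo .. hi
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (x <= keys[mid])
        hi = mid;                    // child mid, or one further to the left
      else
        lo = mid + 1;                // a child strictly to the right of mid
    }
    return lo;
  }
}
```

The loop halves the range in every step, so for a node of degree at most b it performs O(log b) comparisons.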

Other Operations

The above search-tree ADT was designed to support three basic operations: find, insert and delete. However, the described implementations can be used for a number of other operations as well. Two of the most important extensions are discussed in this section.

Range Queries

An important operation is the range query. In a range query, two values low and high with low <= high are specified and something must be done with all nodes whose key x satisfies low <= x <= high. For example, all these keys should be output, or it should be counted how many of these nodes occur in the tree.

This can be realized in several ways, but two very simple ways are based on search trees. First we assume that the information is only stored in the leafs of the tree and that the leafs are connected in a doubly linked way to each other (if for the leafs we use the same objects as for the internal nodes, then we have some spare links, which can be used for this!). In this case we search for low and then walk along the leafs until hitting high or a larger value. This whole procedure takes O(depth of tree + number of nodes in the range). This is clearly optimal, because any search takes at least O(depth of tree) time and handling all elements takes at least O(number of nodes in the range). With minor modifications to the insertion and deletion algorithms, the linking of the leafs can be kept up-to-date at all times.

Tree with leafs linked together
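
The walk along the linked leafs can be sketched as follows. The sketch covers only the second phase of the range query: it assumes that the search for low has already delivered the leaf with the smallest key that is at least low. The class names Leaf and LeafRangeQuery and the field next are hypothetical, and only the forward link is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical leaf type: only the key and the link to the next leaf are modeled.
class Leaf {
  int key;
  Leaf next;   // leaf with the next larger key, or null for the last leaf
  Leaf(int key) { this.key = key; }
}

class LeafRangeQuery {
  // 'start' is assumed to be the leaf reached by the search for low, that is,
  // the leaf with the smallest key >= low (null if there is no such leaf).
  static List<Integer> collect(Leaf start, int high) {
    List<Integer> result = new ArrayList<>();
    for (Leaf leaf = start; leaf != null && leaf.key <= high; leaf = leaf.next)
      result.add(leaf.key);
    return result;
  }
}
```

The loop touches each reported leaf once and stops at the first key larger than high, which gives the term "number of nodes in the range" in the running time.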

Range queries can also be performed on a conventional binary tree with the information stored in all nodes. The idea is to walk down the tree until low and high no longer follow the same path. Then, whenever the search for low moves left in a node u, u and all keys in its right subtree should be enumerated. Analogously, whenever the search for high moves right in a node v, v and all keys in its left subtree should be enumerated. There is a very simple recursive work-out of this algorithm, a minor modification of the inorder traversal.

Range Query Example

Priority Queues

A priority queue is a data structure supporting three operations:

One of the subsequent chapters is entirely devoted to priority queues. Here we consider a basic realization with search trees in which the priorities are used as keys. On this structure insert is trivial: use the insert operation of the search tree.

An operation that can be implemented very easily on search trees is "find the node with smallest key": just walk left all the time until hitting a node without left child. Thus, the following can be used to implement a deletemin:

  int getSmallest()
  {
    if (leftChild != null)
      return leftChild.getSmallest();
    else
      return key;
  }

  int deleteMin()
  {
    int x = getSmallest();
    delete(x);
    return x;
  }
Of course, in an implementation these two subroutines can be integrated so that the path from the root to the node holding x needs to be traversed only once.
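
A minimal sketch of such an integrated deletemin, assuming a plain binary search tree without rebalancing; on a balanced tree the balance would still have to be restored afterwards. The class MinTree and the simple insert are hypothetical and serve only to make the example self-contained.

```java
// Hypothetical unbalanced search tree, used only to illustrate deleteMin.
class MinTree {
  static class Node {
    int key;
    Node leftChild, rightChild;
    Node(int key) { this.key = key; }
  }

  Node root;

  // Plain insertion without rebalancing, needed to build an example tree.
  void insert(int k) {
    if (root == null) { root = new Node(k); return; }
    Node u = root;
    while (true) {
      if (k < u.key) {
        if (u.leftChild == null) { u.leftChild = new Node(k); return; }
        u = u.leftChild;
      } else {
        if (u.rightChild == null) { u.rightChild = new Node(k); return; }
        u = u.rightChild;
      }
    }
  }

  // Walks left once, remembering the parent, and unlinks the leftmost node.
  // This node has no left child, so its right subtree takes its place.
  int deleteMin() {
    if (root == null)
      throw new IllegalStateException("empty tree");
    Node parent = null, u = root;
    while (u.leftChild != null) {
      parent = u;
      u = u.leftChild;
    }
    if (parent == null)
      root = u.rightChild;
    else
      parent.leftChild = u.rightChild;
    return u.key;
  }
}
```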

Decreasekey is also simple: go to the specified node, delete it from the search tree and reinsert it with modified priority. The only problem is that typically the node of which the priority should be modified is not specified by its priority but by another tag. For example, if we are managing the waiting list of a hospital, a typical instruction will not be decreaseKey(89, 74), but rather decreaseKey("Baker", 74), indicating that the priority of "Baker" should be set to 74. The solution is to use a second search tree in which the names are used as key.

The simplest realization of this idea is to have two trees in which all nodes have two keys. One tree is ordered according to the priorities, the other according to the names. Inserts are performed in each tree. For a deletemin, one searches for the node with minimum priority in the tree which is ordered according to the priorities. There one finds the corresponding name, which can subsequently be searched for in the other tree. When performing a decreasekey, one first searches for the name in the tree which is ordered according to the names. There one finds the corresponding priority, which can subsequently be searched for in the other tree. In this way both trees hold the same information after completion of each operation.

Using Two AVL-Trees for Efficient Decrease-Key
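
The two-tree realization can be tried out quickly with the Java standard library. In the following sketch java.util.TreeMap, a balanced search tree (a red-black tree, not an AVL tree), stands in for both trees. The class name TwoTreePQ is hypothetical, and the sketch assumes that all names and all priorities are distinct.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; assumes distinct names and distinct priorities.
class TwoTreePQ {
  private final TreeMap<Integer, String> byPriority = new TreeMap<>();
  private final TreeMap<String, Integer> byName = new TreeMap<>();

  // Inserts are performed in each tree.
  void insert(String name, int priority) {
    byPriority.put(priority, name);
    byName.put(name, priority);
  }

  // Removes the entry with the smallest priority from both trees.
  // Returns its name, or null if the queue is empty.
  String deleteMin() {
    Map.Entry<Integer, String> min = byPriority.pollFirstEntry();
    if (min == null)
      return null;
    byName.remove(min.getValue());
    return min.getValue();
  }

  // Finds the old priority via the name tree, then deletes and reinserts.
  void decreaseKey(String name, int newPriority) {
    Integer old = byName.get(name);
    if (old == null)
      return;                      // unknown name: nothing to do
    byPriority.remove(old);
    insert(name, newPriority);
  }
}
```

For example, after inserting "Adams" with priority 50, "Baker" with 89 and "Clark" with 80, the call decreaseKey("Baker", 74) moves "Baker" ahead of "Clark" in the order in which deleteMin delivers the names.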

The same idea can be realized more efficiently if there are links from one tree to the other in both directions. In that case decreaseKey("Baker", 74) is performed by searching for "Baker" in the name tree and then following the link to the node with its priority in the other tree. When performing deletemin one searches in the tree with priorities for the smallest key. From there one follows the link to the other tree. The structure is more complex, but the additional links reduce the time for finding the element in the second tree.

Two Linked Trees for Efficient Decrease-Key

Each of these realizations ensures that, as long as we use one of the balanced search trees, all three priority-queue operations can be performed in O(log n) time. This is not the best achievable, but quite acceptable. As we will see, either insert or deletemin must take Omega(log n) time.

Exercises

  1. For inorder traversal we have seen a recursive and an iterative algorithm. Give modifications of the iterative algorithm which print the keys of the nodes in the tree in inorder and postorder, respectively.

  2. Suppose we have two binary search trees T_1 and T_2 with n_1 and n_2 nodes respectively. Give an efficient algorithm for merging the trees, that is, for constructing a new binary search tree T containing all n_1 + n_2 nodes of T_1 and T_2. What is the time consumption of your algorithm?

  3. We consider binary search trees with one key per node. Such a tree is called perfect when all levels are full except possibly the deepest one. A tree with n nodes is called balanced when it has depth O(log n). A tree with n nodes is called entirely degenerate when it has depth n - 1.
    1. Draw the tree which is obtained by inserting objects with keys 9, 3, 6, 7, 5, 4, 2, 1, 0, 8 into an initially empty tree in the given order without performing any rearrangement of the tree.
    2. Draw a perfect tree containing the same keys.
    3. Draw another perfect tree with the same keys. How many different perfect trees can be drawn with these keys?
    4. Draw an entirely degenerate tree with the above keys. Give the input order of the keys which leads to the tree you have drawn.
    5. Draw 2 other entirely degenerate trees with the same keys. How many different entirely degenerate trees are there with these keys?
    6. Now we are going to perform searches for keys. The cost measure is the sum of the numbers of nodes on the search paths. So, searching for the key in the root costs 1. How much does it cost to search for the following sequence of keys: 7, 9, 2, 9, 2, 3, 9, 1, 2, 5, 9, 2 for the non-perfect tree from subquestion 1 and for the perfect tree from subquestion 2?
    7. The above example shows that the quality of a tree depends on the queries performed on it. Consider the tree obtained by inserting 0, 1, 2, 3, in this order, and construct a query sequence for which this tree is optimal.
    8. For any degenerate tree T with n nodes whose keys have values v_0, ..., v_{n - 1}, describe a query sequence for which T is optimal in the sense of the above cost measure.
    9. Why is it nevertheless useful to try to keep trees balanced?

  4. We consider the structure of binary trees. d_i denotes the depth of node i in the tree. The depth of a node is the distance from the root node, distance being the number of links on the path. For a tree T, let D(T) be the sum of all distances. Thus, for a tree with n nodes, D(T) = sum_{i = 0}^{n - 1} d_i. Prove that within the class of all binary trees with n nodes the perfect trees are those that minimize D(T). Hint: Induction may work, but you do not need it in this case!

  5. We consider the structure of binary trees. Trees are different when they have a different branching structure. This can be defined more precisely with a recursive definition: Two trees T_1 and T_2 have the same structure when one of the two following cases applies:

    1. Consider the tree in the following picture. Show how the keys 0, ..., 9 can be assigned to the nodes in such a way that the search-tree property is respected.

      A Tree with 10 Nodes

    2. Suppose the keys of the objects to be stored in a tree T with n nodes are 0, ..., n - 1. Show that given T there is exactly one assignment of the keys to the nodes respecting the search-tree property.
    3. Denote by N(n) the number of different binary trees with n nodes. Give the values of N(0), N(1), N(2), N(3), N(4).
    4. Give a recurrence relation expressing N(n) in terms of N(k) for k < n. Hint: check that your relation gives the correct value for N(4).
    5. Use this recurrence to compute N(5) and N(6). Without a proof: give a rough estimate of how N(n) develops as a function of n.

  6. The procedure for building perfect binary trees given in the above text tends to spread the nodes as evenly as possible. Modify the procedure so that the lowest level is filled from left to right without missing nodes, as far as nodes are available.

  7. Assume that TreeNode objects also have a field int size. For any node u, size gives the number of nodes of the subtree rooted at u. Write a procedure for computing the correct size values of all nodes in a binary tree rooted at a node called root. The algorithm should have running time linear in the number of nodes n. Hint: write down a recursive definition of the size of a node, based on the recursive definition of a tree.

  8. Assume that TreeNode objects also have a field int height. Write a procedure for computing the correct height values of all nodes in a binary tree rooted at a node called root. The algorithm should have running time linear in the number of nodes n. Hint: write down a recursive definition of the height of a node, based on the recursive definition of a tree.

  9. Suppose we want to keep the search tree we are working on perfectly balanced. Clearly a single insertion may require a large rebuilding. But, one might hope that this happens only occasionally, so that nevertheless a good amortized time can be assured. Show that this hope is idle: give a sequence of n insertions to an initially empty tree that requires Omega(n^2) time when after each insertion the tree is made perfect again. The estimate of the time consumption must also be given.

  10. Write a method in Java-like code for testing whether a non-empty tree constructed from TreeNode objects is an AVL tree.

  11. Consider an AVL tree T of depth k. So, in T there is at least one leaf at depth k. What is the minimum depth at which we may find the shallowest leaf? Prove your claim.

  12. Consider an AVL tree of depth k with root node u. Give bounds on the ratio of the size of the left and the right subtree of u.

  13. Give a sensible generalization of the AVL-tree ideas to ternary trees. What unbalance can be tolerated? Describe in sufficient detail how insertions are performed, focusing on the rebalancing operations.

  14. AVL trees can be generalized in many ways. We can also allow the differences in the balances to be slightly larger. For example, a node is considered to be balanced as long as the difference of the depth of its left and right subtree is at most two.
    1. Give a lower bound on the number of nodes of such a relaxed AVL tree of depth k. An estimate is good enough, constant factors may be ignored.
    2. Give an upper bound on the depth of such a relaxed AVL tree with n nodes.
    3. Describe the rotations that are needed to restore the balance after insertions. Prove that these rotations indeed achieve their goal.

  15. Add overriding variants of the methods insert and delete as they can be found in the basic program to the classes HeightedTreeNode and HeightedTree given in the extended program so that the heights are kept up-to-date under insertions and deletions. Define further subclasses AVLTreeNode and AVLTree, which perform rotations so as to keep the tree AVL-balanced at all times.

  16. Give pseudo-code for enumerating all keys with key values between low and high for a given binary search tree in which the information is also stored in the internal nodes. For a tree of depth d the procedure should run in O(d + |S|), where |S| denotes the number of elements to enumerate. Hint: use a suitable modification of the inorder-traversal procedure.

  17. Starting with an initially empty 2-3 tree, draw the sequence of trees arising when consecutively inserting nodes with keys 5, 3, 2, 6, 7, 8, 1, 9, 4. Now delete the nodes with keys 7, 2 and 6.

  18. Describe an infinite class of 2-3 trees and for each of them an alternation of insertions and deletions so that each insertion leads to splitting the nodes all the way up to the root and each deletion leads to fusing the nodes all the way up to the root.

  19. Consider a perfect binary tree of depth k with n = 2^{k + 1} - 1 nodes. Compute the average depth of the nodes. If we assume that when performing find(x) for a key x which actually occurs in the tree any of the key values is selected with probability 1 / n, the average depth plus one gives the expected number of nodes to visit. Alternatively, all data could be stored in the leafs of a tree of depth k + 1. Doing this, most searches visit k + 2 nodes. Compare the number of visited nodes and draw a conclusion.

  20. Assume that on a 2-3 tree there are equally many nodes of degree 2 and 3 more or less evenly distributed over the tree. As a function of n, how deep is the tree? The constant factor of the leading term is important, but o(log n) contributions may be ignored.

  21. In many applications of search trees the main quality features are how deep a tree with n nodes is, and how many comparisons must be performed while searching. For an AVL tree, the maximum number of comparisons to find a node in a tree with n nodes is approximately 2 * log_1.618 n = 2.880 * log_2 n, for a 2-3 tree this number is 3 * log_2 n, which is marginally larger. 3-4 trees are better in this respect, a search takes at most 4 * log_3 n ~= 2.524 * log_2 n comparisons. This means that 3-4 trees potentially outperform both 2-3 trees and AVL trees. Of course a search tree should also support insert and delete operations. In that sense 3-4 trees are problematic because 3 + 2 = 5 > 4, and 4 + 1 = 5 < 2 * 3. So, the degree constraints cannot be maintained as easily as with 2-3 trees. Describe how insertions and deletions can be performed on a 3-4 tree so that all internal nodes, that is, excluding the root, have degree 3 or 4 at all times. The root may have degree 2, 3, 4 or 5.

  22. Dictionaries are supposed to support the operations find, insert and delete. However, in the context of a-b trees we have encountered two more operations: split and fuse. For a data structure T containing n keys, split is the operation of constructing two such data structures T_1 and T_2 containing n_1 and n_2 = n - n_1 keys of T, respectively. All keys in T_1 should be smaller than those in T_2. Fuse is the operation in which out of two data structures T_1 and T_2 with n_1 and n_2 keys, respectively, a single data structure T with n = n_1 + n_2 keys has to be constructed.
    1. For 2-3 trees describe how to perform the fuse operation.
    2. For 3-4 trees describe how to perform the fuse operation. Hint: notice that in a-b trees with a > 2 the root node can even have degree smaller than a.
    3. For 2-3 trees describe how to perform the split operation.
    4. For 3-4 trees describe how to perform the split operation.
    All operations should be performed in time proportional to the depth of the involved trees.

  23. For a search tree the depth and the degree of the nodes are not the only important features. Consider, for example, an implicit implementation of a tree. A d-ary tree is said to be implicit, when it is realized without any pointers using an array. Call this array a[]. The key of the root of the tree is stored in a[0], the keys of its children in a[1], ..., a[d]. More generally, the children of a node whose key is stored in a[i] are stored in a[d * i + 1], ..., a[d * i + d]. Vice versa, the key of the parent of a node in a[i] is stored in a[(i - 1) / d]. Such an implicit tree can be traversed very fast, because all children of a node are guaranteed to be stored contiguously. If d is a power of two, the multiplications and divisions can be performed in a single clock cycle by shifts. Maintaining a tree implicitly also means a considerable reduction of the memory consumption, because all pointers remain implicit.

    Implicit trees are most efficient for complete perfect d-ary trees because the array which is used for storing the tree must have the maximum size anyway. That is, for a d-ary tree of depth k the array must have length N(d, k) = sum_{i = 0}^k d^i = (d^{k + 1} - 1) / (d - 1) even if the tree has n << N nodes.

    1. Give an expression for the ratio of N(2, k) and the minimal number n(k) of key values that can be stored in an AVL tree of depth k before a further insertion may require a tree of depth k + 1. You should both indicate the approximate development for large k and exact expressions for 0 <= k <= 5.
    2. Answer the same questions for 2-3 trees. That is, for an implicit 2-3 tree of depth k, you should give the ratio between the size of the used array and the minimal number of keys that can be stored before a further insertion may require a tree of depth k + 1.
    3. In the light of the above discussion and answers, discuss for each of the tree types their suitability for an implicit implementation. In the discussion you should compare for a given number n of nodes to store the sizes of the smallest implicit trees which can certainly accommodate n nodes.

    Implicit trees might be suited for storing the sets of key values in the internal nodes of search trees of high degree. Maintaining these sets in sorted arrays leads to linear-time updates. At the expense of more complex operations and a waste of memory, on an implicit tree inserts and deletes can be performed in logarithmic time. On implicit trees it takes linear time to perform the split and fuse operations needed in a-b trees because this implies creating and initializing arrays of linear size. This is not that serious, because taking b sufficiently larger than a, the cost of each split or fuse can be amortized over a large number of inserts and deletes.

  24. The basic program implementing a search tree must be used to construct a priority queue which only supports the operations insert and deletemin (that is, there is no need to realize decreasekey). Then the priority queue is used for sorting purposes. More precisely the following tasks have to be executed:
    1. Implement classes PriorityQueue and PriorityQueueNode which are subclasses of Tree and TreeNode, respectively. These classes should have a method deletemin. The original classes should remain unchanged.
    2. Use the priority queue operations to construct a method for sorting an array a[] with n integer values. This method should work as follows: first all elements of the array are inserted one-by-one in a priority queue, then calling deletemin n times, they are extracted from it and reinserted into the array.
    3. What is the running time of this algorithm, with the current implementation of the search tree, for sorting an array in which the values are already sorted? Give an analysis.
    4. Run experiments for n = 2^k, for k = 16, 17, ..., with a[i] = i for all i, 0 <= i < n. Measure the times and give them in a table. Are your measurements in line with the theoretical analysis?
    5. Assuming that the tree is perfectly balanced at all times, how long would it take to sort n numbers?
    6. Run experiments for n = 2^k, for k = 16, 17, ..., where the values a[i], 0 <= i < n, are chosen randomly. Measure the times and give them in a table. How do the running times appear to develop?

  25. Assume a 2-3 tree is used as a priority queue. If we start with a perfect binary tree of depth k with n = 2^k leafs, how many fusion operations result in total, when performing n deletemin operations?





Hashing

Context

The dictionary ADT, also called search-tree ADT, is the abstract data type that at least supports the following three operations on a set:

The canonical way of implementing this ADT is by using a search tree, but this is not the only possible way. If we really only want to implement the above three operations (and do not want to perform range queries or findmins as well), there are alternative ways, that are

The idea presented in this section, hashing, is an idea of great practical importance.

Direct Addressing

The simplest case, which is nevertheless very important, is that the range of key values is small. Suppose all objects for which we want to build a dynamic search structure have unique keys in the range {0, ..., M - 1}, then we can use a boolean array a[] of size M. This array should be initialized with a[i] = false, for 0 <= i < M, but alternatively one can apply virtual initialization, a technique which is presented further down. If the keys are unique, then all operations can be performed in constant time in a trivial way:
  void insert(int k) 
  {
    a[k] = true; 
  }

  void delete(int k) 
  {
    a[k] = false; 
  }

  boolean find(int k) 
  {
    return a[k]; 
  }
Of course, in most real-life applications some additional information will be stored along with the entry in a[], but this holds for any of the discussed data structures: here we focus on how to perform the basic operations. This special case is not called hashing, but direct addressing. The name is inspired by the fact that the key values are used directly, without any modification or computation, for determining where data are stored. Its efficiency essentially depends on the availability of RAM.

Direct addressing is based on the same underlying idea as bucket sort. Like bucket sort, direct addressing is applicable only for moderate M. There is a difference with bucket sort though: bucket sort even requires O(M) time, whereas direct addressing, applying virtual initialization, only requires sufficient memory.

Direct Addressing Example

Virtual Initialization

Assume we have sufficient memory and want to quickly implement a search-tree data structure for elements with keys between 0 and 10^8. This requires an array of size 10^8. This is no problem nowadays. However, it takes some time to initialize this array. It is slightly surprising that we can save this time: there is no need to first set all values in the array to some special value to indicate "no entry here".

More generally, we have at most N numbers from 0 to M - 1. Then we have one integer array a[] of size M and one integer array b[] of size N plus one counter c, giving the number of inserted elements, initially with value 0. To insert an element k, we perform

  void insert(int k) 
  {
    a[k] = c;
    b[c] = k;
    c++; 
  }

To check whether k is there we perform

  boolean find(int k) 
  {
    // a[k] may hold an arbitrary value: check that it is a valid index first
    return 0 <= a[k] && a[k] < c && b[a[k]] == k; 
  }

Deletions are simple to perform:

  void delete(int k) 
  {
    a[k] = c; 
  }
One might also use any other value which is guaranteed not to be equal to the original value of a[k]. A disadvantage is that the entry in b[] is not removed. So, if insertions and deletions are alternated then c will increase to a value larger than N, the size of b[].

This is unnecessary, the unused space can be made available again by a slightly more complicated delete procedure. In this procedure, the last inserted element is deleted and reinserted at the position of k:

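  void delete(int k) 
  {
    if (!find(k))
      return;          // k is not present: nothing to do
    c--;               // b[c] is now the last inserted key
    a[b[c]] = a[k];    // the last key takes over the position of k
    b[a[k]] = b[c];
  }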

If we compare with the simple approach in which the values are first initialized, then we see that now all operations are more expensive by a small constant factor. A larger disadvantage is the additional memory usage: Before we had one array of M bits. Now we have two arrays with M and N integers, respectively. Even when we may assume that N is much smaller than M (otherwise the whole idea of virtual initialization does not make sense), this is considerably more. Therefore, virtual initialization appears to be a nice idea with limited practical interest.

Virtual Initialization Example
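
The pieces above can be combined into one small class for experimenting. Java always zeroes newly created arrays, so uninitialized memory cannot be observed directly; as a stand-in, this sketch fills both arrays with arbitrary values before use (which of course costs the very time virtual initialization is meant to save). The class name VirtualInitSet, the seed parameter and the method size are hypothetical. The checks in insert and delete are additions that keep the keys unique.

```java
import java.util.Random;

// Hypothetical sketch of a set with virtual initialization.
class VirtualInitSet {
  private final int[] a;   // a[k]: claimed position of key k in b[]
  private final int[] b;   // b[0 .. c - 1]: the keys currently present
  private int c = 0;       // number of keys currently present

  VirtualInitSet(int m, int n, long seed) {
    a = new int[m];
    b = new int[n];
    // Simulate arbitrary initial memory contents, since Java zeroes new arrays.
    Random random = new Random(seed);
    for (int i = 0; i < m; i++) a[i] = random.nextInt();
    for (int i = 0; i < n; i++) b[i] = random.nextInt();
  }

  boolean find(int k) {
    return 0 <= a[k] && a[k] < c && b[a[k]] == k;
  }

  void insert(int k) {
    if (find(k)) return;   // keep the keys unique
    a[k] = c;
    b[c] = k;
    c++;
  }

  void delete(int k) {
    if (!find(k)) return;  // nothing to delete
    c--;
    a[b[c]] = a[k];        // the last inserted key moves to the position of k
    b[a[k]] = b[c];
  }

  int size() { return c; }
}
```

Whatever the initial contents of the arrays are, find reports no key as present while c equals 0, because no index can satisfy 0 <= a[k] < 0.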

Hashing Idea

The idea underlying hashing is an extension of the above. It is simple and clever at the same time. Suppose all objects for which we want to build a dynamic search structure have unique keys in the range {0, ..., M - 1}, and suppose that we want to support at most n elements, then we choose a number N, create an array a[] of length N, and use a function f mapping {0, ..., M - 1} to {0, ..., N - 1} to determine the position at which a key is stored. The function f is called the hash function. Of course it is not obvious how to choose N (it should not be too large), and how to construct f (it should be injective on the set of keys as far as possible).

Suppose for a second that f maps the keys injectively to {0, ..., N - 1}, that is, for any two different keys k_1 and k_2, f(k_1) != f(k_2). Then we have a great search structure: the three operations can be performed in constant time almost just as with direct addressing. The main difference is that instead of accessing position k, we must access position f(k):

  void insert(int k) 
  {
    a[f(k)] = k; 
  }

  void delete(int k) 
  {
    a[f(k)] = -1; 
  }

  boolean find(int k) 
  {
    return a[f(k)] == k; 
  }
There is one more difference with direct addressing: because many values may be mapped to the same position, it does not suffice to set a boolean. It is essential to mark a position by the key itself, so that when performing a find it can be tested whether it was really k that was mapped here or some other value k' with f(k') = f(k). Thus, a[] must be an integer array. Just as with direct addressing, the efficiency of hashing depends on the availability of RAM: once we know the index we can go to the desired position in the array without noticeable delay.
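
Why the key itself must be stored can be seen in a small experiment. The following sketch assumes the hash function f(k) = k mod N and non-negative keys, with -1 marking an empty position; these choices, the class name SimpleHashTable and the extra test in delete are assumptions made for this example. Collisions are not handled: a second key mapped to an occupied position simply overwrites the first.

```java
import java.util.Arrays;

// Hypothetical sketch: hashing without collision handling.
class SimpleHashTable {
  private final int[] a;

  SimpleHashTable(int n) {
    a = new int[n];
    Arrays.fill(a, -1);            // -1 marks an empty position; keys are >= 0
  }

  // Assumed hash function, chosen only for illustration.
  int f(int k) { return k % a.length; }

  void insert(int k) { a[f(k)] = k; }

  // Only clear the position if it really holds k.
  void delete(int k) { if (a[f(k)] == k) a[f(k)] = -1; }

  boolean find(int k) { return a[f(k)] == k; }
}
```

With N = 10, the keys 13 and 23 are both mapped to position 3. After insert(23), find(23) returns true, while find(13) returns false because the stored key differs from 13; a boolean array could not tell these two keys apart.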

Collisions

The very reason that we use hashing and not direct addressing is that in general M is a very large number, and we cannot create an array of length M. So, the function f maps many values from {0, ..., M - 1} onto values of {0, ..., N - 1} (at least one value is the image of at least M / N rounded-up values from the domain). Thus, collisions, the event that more than one key is mapped to the same value in {0, ..., N - 1}, are to be expected. Therefore, when using hashing for implementing the search tree ADT the following two questions must be answered: These questions are the topic of this and the following sections.

Because the keys are not known beforehand, we cannot hope to choose the hash function f in an optimal way, avoiding all collisions. Such a function might also be expensive to compute. The best we can reasonably hope for is that f is as good as randomly scattering the elements over the array. We cite some basic facts from probability theory:

These facts make clear that collisions are frequent and already occur when the array a[] is still almost empty. We should even expect that there are positions to which a considerable number of elements gets mapped. Somehow we must deal with this.
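
To get a feeling for how early collisions show up, one can compute the probability that n keys, scattered independently and uniformly over N positions, all end up in different positions. This probability is prod_{0 <= i < n} (1 - i / N). The following is only a small sketch evaluating this product; the class and method names are chosen for illustration.

```java
class CollisionProbability {
    // Probability that n keys, thrown independently and uniformly at random
    // into N positions, all end up in different positions.
    static double noCollision(int n, int N) {
        double p = 1.0;
        for (int i = 0; i < n; i++)
            p *= 1.0 - (double) i / N;
        return p;
    }

    public static void main(String[] args) {
        int N = 1000000;
        for (int n = 500; n <= 4000; n *= 2)
            System.out.println("n = " + n + ": probability of at least one collision = "
                               + (1.0 - noCollision(n, N)));
    }
}
```

For N = 1,000,000 the probability of at least one collision exceeds 1/2 once about 1200 keys have been inserted, that is, at a load of roughly 0.1%.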

Chaining

The simplest collision-handling strategy is called chaining. It works as follows: every entry of the array a[] is the start of an initially empty linked list. Inserting an element at a given position is done by inserting it in the linked list starting there. The simplest is to insert the elements at the beginning of the list. Doing this, inserts can be performed in O(1) time. Finding and deleting an element with key k requires traversing the list starting at position f(k) of a[].

In Java, with objects of a type ListNode with fields key and next, the chaining idea can be implemented as follows:

  void insert(int k) 
  {
    int i = f(k);
    a[i] = new ListNode(k, a[i]); 
  }

  void delete(int k) 
  {
    int i = f(k);
    if (a[i] != null) // not an empty list
    { 
      if (a[i].key == k)
      {
        a[i] = a[i].next;
        System.out.println("Key " + k + " deleted");
      }
      else 
      {
        ListNode prev = a[i];
        ListNode node = a[i].next;
        while (node != null && node.key != k) 
        {
          prev = node;
          node = node.next; 
        }
        if (node != null)
        {
          prev.next = node.next; 
          System.out.println("Key " + k + " deleted");
        }
      } 
    } 
  }

  boolean find(int k) 
  {
    ListNode node = a[f(k)];
    while (node != null && node.key != k)
      node = node.next;
    return node != null; 
  } 
This implementation is complicated by the fact that when deleting we must take care of the special case of empty lists. At the expense of O(N) extra storage, the operations can be made easier by adding sentinels to the lists. Click here to see the above piece of code integrated in a working Java program.

For a successful find we only have to traverse half the list on average. The most expensive operation is an unsuccessful find or an attempt to delete a non-existing element: in these cases the whole list must be traversed. If before inserting an element k it should first be verified that k does not yet occur, then even each insertion is preceded by an unsuccessful find. The average cost of unsuccessful finds can be reduced by a factor of two by keeping the lists sorted. Doing this, a search for k can be terminated as soon as we reach a node with key k' > k. If we now add sentinels with key value infinity, we can even save a lot of tests.

Choice of Hash Function

If we do not know anything about our keys, then in principle it is acceptable to try the simplest of all hash functions: f(k) = k mod N. This function maps M / N potential key values (rounded up or down, if N does not divide M) to each position of the array. If the keys are uniformly distributed over {0, ..., M - 1}, it achieves a fair distribution of the keys. The good thing is that this hash function is simple and cheap to compute. If N is a power of two, it can even be computed in one clock cycle: f(k) = k & N', where N' = N - 1. The bad thing is that its quality depends on the input. If for some reason all our numbers have a special structure, they may all be mapped to a small set of different values. So, for this hash function not even the weakest quality claim can be made.

We give an example. Suppose we would like to build a database for the roughly one million inhabitants of northern Sweden. More precisely, for the inhabitants of the five provinces (län) which constitute Norrland: Gävleborgslän, Västernorrland, Jämtland, Västerbotten and Norrbotten. As key we use the personal number and the array has length N = 1,000,000. The first six positions of the Swedish personal number consist of the date of birth (in the format yymmdd). Positions 7 and 8 indicated until recently the län in which the person was born and only the last two digits are slightly arbitrary (though the gender is encoded in these). If we now simply use f(k) = k mod N, then f(k) consists of the last six digits of k and we will get a very poor distribution: most people living in Norrland were born in this area, so most people share the same 5 possibilities for digits 7 and 8. Furthermore, there are only 31 days, so digits 5 and 6 have only 31 values. Thus, the big majority of the one million people is going to be mapped to at most 31 * 5 * 100 = 15,500 positions, on average more than 60 for each of these slots. In this case the problem is quite obvious and we know an easy way out. Out of the personal number, we should first construct a condensed number, taking into account that there are only 12 months with at most 31 days, and 6 län (the mentioned five and outsiders), and that not all values of the final two digits occur.

We work out the above idea in a more general setting. Suppose we have keys composed of n >= 1 "digits" (which may consist of more than one actual digit). The keys in the above example may be decomposed into 5 of these digits, each being a group of two actual digits. Assume that digit i may assume m_i different values, numbered 0 to m_i - 1. In the above example, the digit giving the month assumes 12 values, while the digit giving the day assumes 31 values. Thus, the total number of different values is M = prod_{0 <= i < n} m_i. For a key k, let k_i, 0 <= i < n, denote digit i. The reduced key k' is given by k' = k_0 + m_0 * k_1 + m_0 * m_1 * k_2 + ... + m_0 * m_1 * ... * m_{n - 2} * k_{n - 1}. This reduced key k' is much better suited for hashing than k itself. For example, one could use f(k) = k' mod N. From the above expression it may appear that computing k' from k takes quadratic time. However, a variant of Horner's rule computes k' in linear time: k' = k_0 + m_0 * (k_1 + m_1 * (k_2 + m_2 * ( ... + (k_{n - 2} + m_{n - 2} * k_{n - 1}) ) ) ).
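
The Horner-style evaluation of the reduced key can be sketched as follows. This is only an illustration: the class and method names are made up, and it is assumed that the digits have already been extracted from the key and renumbered so that digit i lies in {0, ..., m_i - 1} (for example, month 1 to 12 becomes 0 to 11).

```java
class ReducedKey {
    // digits[i] is digit k_i of the key, with 0 <= digits[i] < m[i].
    // Returns k' = k_0 + m_0 * (k_1 + m_1 * (k_2 + ...)),
    // evaluated from the innermost bracket outwards in linear time.
    static long reduce(int[] digits, int[] m) {
        long r = 0;
        for (int i = digits.length - 1; i >= 0; i--)
            r = r * m[i] + digits[i];
        return r;
    }

    public static void main(String[] args) {
        // Hypothetical decomposition of a personal number:
        // final two digits, län (renumbered 0..5), day - 1, month - 1, year.
        int[] m      = {100, 6, 31, 12, 100};
        int[] digits = {  7, 2, 14,  4,  63};
        int N = 1000000;
        long reduced = reduce(digits, m);
        System.out.println("reduced key = " + reduced + ", f(k) = " + (reduced % N));
    }
}
```

With these (assumed) ranges the reduced keys fill the range {0, ..., 22,319,999} without gaps, so that taking them modulo N spreads them evenly as long as the digits themselves are uniformly distributed.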

The above describes an idealized situation: we assumed that there are digits for which it is known how many different values are assumed, and that these values are assumed uniformly. If we would like to apply hashing to create a dictionary of words, then it is clear that not all letters are equally frequent ('e' and 'a' are much more frequent than 'q' and 'y'), but it is not obvious what to do with such observations.

Not knowing the structure of the keys, we can never guarantee that one fixed hash function is good. However, the quality can be made independent of the input. The idea is to choose the hash function uniformly at random from a large class of hash functions. Even though for any particular hash function there are sets of keys which are mapped to few different positions, it can be achieved in this way that the a priori probability that any two different keys k_1 and k_2 are mapped to the same position is at most 1 / N, that is, no larger than if they were randomly scattered over the array. This idea is called universal hashing.
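
The text does not fix a particular class of hash functions. One well-known choice, used here purely as an illustration, is the class f_{a, b}(k) = ((a * k + b) mod p) mod N, where p is a prime larger than all keys and a and b are chosen at random with 1 <= a < p and 0 <= b < p. The sketch below assumes that all keys are smaller than p = 2^31 - 1; the class and method names are made up.

```java
import java.util.Random;

class UniversalHash {
    static final long P = 2147483647L; // the prime 2^31 - 1; keys must lie in {0, ..., P - 1}
    final long a, b;                   // random coefficients, 1 <= a < P and 0 <= b < P
    final int N;                       // length of the hash table

    UniversalHash(int N, Random rnd) {
        this.N = N;
        // floorMod gives a non-negative remainder; the bias is negligible
        this.a = 1 + Math.floorMod(rnd.nextLong(), P - 1);
        this.b = Math.floorMod(rnd.nextLong(), P);
    }

    int f(long k) {
        // a * k + b < 2^62 + 2^31, so there is no overflow in a long
        return (int) (((a * k + b) % P) % N);
    }

    // Fraction of randomly chosen functions under which k1 and k2 collide.
    static double collisionRate(long k1, long k2, int N, int trials, Random rnd) {
        int collisions = 0;
        for (int i = 0; i < trials; i++) {
            UniversalHash h = new UniversalHash(N, rnd);
            if (h.f(k1) == h.f(k2))
                collisions++;
        }
        return (double) collisions / trials;
    }

    public static void main(String[] args) {
        System.out.println(collisionRate(12345, 67890, 10, 20000, new Random()));
    }
}
```

The point of the construction is that a and b are drawn once, when the table is created, and are then kept fixed. An adversary who does not know them cannot prepare a set of keys that all collide.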

Problems may occur but mostly we will be lucky. In that case the elements are distributed more or less evenly. Consider the case that the number of distributed elements n equals the size of the array N. Some longer lists will occur, but most elements are close to the head of their list. If the element to search for is selected uniformly from among the n elements, then it can easily be shown that the expected time for a find operation is constant. Then this also holds for deletes and the time for inserts is constant independently of the length of the lists. The statement may be strengthened further: every individual find may take O(log n) time, but with high probability the time for a sequence of log n finds for independently selected keys is bounded by O(log n) as well. Thus, amortizing over O(log n) operations, the time per operation is O(1) with high probability.

Open Addressing

Chaining, the technique to have a linked list starting at each position in which the keys are listed that were mapped to this position of the array, is (at least conceptually) simple but causes some additional overhead in the form of links. This makes it slower to traverse and requires additional storage.

An alternative is to store elements in the array itself. For a while this works fine, but what do we do when an element gets mapped to a position that is already taken?

Linear Probing

There is a really simple solution: if k gets mapped to a full position f(k), then the positions f(k) + 1, f(k) + 2, ..., are tried (continuing at position 0 after position N - 1) until finding an empty position. This first empty position is used to store k. This technique is called linear probing. The sequence of positions p(k, t) = (f(k) + t) mod N, for t >= 0, is called the probe sequence of k.

How could we do a find in this case? We go to position f(k). If it is empty, we are done: k is not present. If we find k there, we are done as well. But, if position f(k) is occupied by another element, then we do not know anything. Subsequent positions must be tested until finding k or an empty position. When performing deletions we have to be careful: simply removing an element k does not work, because elements which lie after k on their probe sequence would become unfindable. Therefore, we should perform lazy deletion by somehow marking k as deleted. When performing a find these deleted positions are handled as filled positions; when performing subsequent inserts, they are handled as empty positions, reusing the memory. This technique works fine because the memory space is available anyway.

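If all key values are non-negative, the above ideas may be implemented as follows:

  // -1 means empty
  // -2 means deleted
  // it is assumed that a[] always contains at least one empty position

  void initialize()
  {
    for (int i = 0; i < N; i++)
      a[i] = -1;
  }

  void insert(int k)
  {
    int i;
    // skip occupied positions; empty and deleted positions may be used
    for (i = f(k); a[i] >= 0; i = (i + 1) % N);
    a[i] = k;
  }

  void delete(int k)
  {
    int i;
    // deleted positions are passed over just as occupied ones
    for (i = f(k); a[i] != k && a[i] != -1; i = (i + 1) % N);
    if (a[i] == k)
      a[i] = -2; // lazy deletion
  }

  boolean find(int k)
  {
    int i;
    for (i = f(k); a[i] != k && a[i] != -1; i = (i + 1) % N);
    return a[i] == k;
  }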
Click here to see slightly modified code integrated in a working Java program. This program is conceived so that it can be modified very easily to work for the better conflict-resolution strategies presented in the following.
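
To see why lazy deletion is needed, the following self-contained sketch wraps the same idea in a small class and runs it on three colliding keys. It is not the program referred to above: the class name, the helper locate() and the bound of N probes per operation (which makes the loops stop even when no empty position is left) are additions for illustration. Like the code above, it does not check for duplicates.

```java
import java.util.Arrays;

class LinearProbingTable {
    static final int EMPTY = -1, DELETED = -2;
    final int N;
    final int[] a;

    LinearProbingTable(int N) {
        this.N = N;
        a = new int[N];
        Arrays.fill(a, EMPTY);
    }

    int f(int k) { return k % N; } // simplest hash function; keys must be >= 0

    // Stores k at the first empty or deleted position; false if there is none.
    boolean insert(int k) {
        int i = f(k);
        for (int t = 0; t < N; t++, i = (i + 1) % N)
            if (a[i] < 0) { a[i] = k; return true; }
        return false;
    }

    // Index at which k is stored, or -1 if k is not present.
    int locate(int k) {
        int i = f(k);
        for (int t = 0; t < N && a[i] != EMPTY; t++, i = (i + 1) % N)
            if (a[i] == k) return i;
        return -1;
    }

    boolean find(int k) { return locate(k) >= 0; }

    void delete(int k) {
        int i = locate(k);
        if (i >= 0) a[i] = DELETED; // lazy deletion: mark, do not empty
    }

    public static void main(String[] args) {
        LinearProbingTable t = new LinearProbingTable(10);
        t.insert(13); t.insert(23); t.insert(33); // all hashed to 3, stored at 3, 4, 5
        t.delete(23);                             // position 4 is marked as deleted
        System.out.println(t.find(33));           // 33 is still reachable via position 4
        t.a[4] = EMPTY;                           // what a naive deletion would have done
        System.out.println(t.find(33));           // now the search stops at position 4
    }
}
```

In the example, 33 lies behind 23 on its probe sequence. As long as position 4 is only marked as deleted the search passes over it; once it is really emptied, 33 can no longer be found although it is still stored in the array.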

Not all simple ideas are good ideas. Linear probing has lousy performance once the array becomes slightly fuller. In this context we need the term load factor, which is the ratio n / N, mostly expressed in %. So, if we say that the hash table has a load factor of 80%, we mean that we are storing n = 0.8 * N keys in an array of length N. The problem with linear probing is that once we have a small sequence of consecutive positions that are full, it is quite likely that the next insertion hits this chain. So, chains tend to grow very fast. This problem is called primary clustering. More precisely, it means that all elements that get mapped to any of the positions of a cluster will make the cluster larger.

Without going into detail we look at some numbers. Suppose we have an array of length 100. Suppose we have inserted 20 elements. Suppose all elements are still isolated (this is quite unlikely). For isolated elements, the probability that they are hit and that the length of their chain becomes 2 is 0.01. If this happens (20% probability, so within a few insertions it will indeed happen), then hereafter, the probability that the chain gets hit is no longer 0.01, but 0.02. When it is hit, the probability becomes larger again. In addition, if the array gets fuller, one large chain may swallow the following chain, further increasing the rate at which chains are growing. A simulation, generating uniformly distributed random numbers in the range 0, ..., N - 1, shows that the load factor should be at most 60 or 70%. Otherwise the performance breaks down. This means that we must always waste at least 30 to 40% of the memory.

Hashing with Linear Probing

Quadratic Probing

Once the algorithm designers started thinking about hashing they invented chaining on day two and linear probing on day three. Then they discovered the clustering problem, and came up with several more or less clever ways to prevent such behavior. In general, inserting a key k is performed as follows:
  void insert(int k) {
    int t = 0;
    while (a[p(k, t)] >= 0) // position is occupied
      t++;
    a[p(k, t)] = k; }
Here p(k, t) gives the probe sequence of k. For linear probing, p(k, t) = (f(k) + t) mod N. The next simplest idea is to try p(k, t) = (f(k) + t^2) mod N. This, or similar variants, is called quadratic probing.

Linear probing had the problem of primary clustering, the problem that any key mapped to any position in a chain follows the same trajectory through the array. This leads to a self-reinforcing clustering in which the clusters tend to grow very fast. With quadratic probing, we have a less serious form of clustering: all keys that get mapped to the same position f(k) follow the same trajectory. This is called secondary clustering. Primary clustering is worse because it arises even if the hash function distributes the keys over the array completely uniformly. Secondary clustering is a problem only when there are many key values k with the same f(k).

Double Hashing

One might believe that secondary clustering is inevitable, but it is not. An even better distribution is obtained when using the following probe sequence: p(k, t) = (f_1(k) + t * f_2(k)) mod N. So, we use a second hash function. Therefore, this collision-handling strategy is called double hashing. This second function f_2 should be so that

The first point implies that we should not take a simple modulo function. The second point implies that for all k, f_2(k) should be relatively prime to the size N of the hash table. This is most easily assured if N itself is a prime number, then f_2 can assume any value different from multiples of N. If f_1(k) = k mod N, then f_2 can be based on k / N. For example, f_2(k) = 1 + (k / N) mod (N - 1).
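
The probe sequence of double hashing with f_1(k) = k mod N and f_2(k) = 1 + (k / N) mod (N - 1) can be sketched as follows. The class and the helper distinctPositions(), which counts how many different positions are reached by the first N probes, are made up for illustration; keys are assumed to be non-negative.

```java
class DoubleHashing {
    static int f1(int k, int N) { return k % N; }

    // Always lies in {1, ..., N - 1}, so for prime N it is relatively prime to N.
    static int f2(int k, int N) { return 1 + (k / N) % (N - 1); }

    // Position tried in step t for key k.
    static int p(int k, int t, int N) {
        return (int) ((f1(k, N) + (long) t * f2(k, N)) % N);
    }

    // Number of different positions among p(k, 0), ..., p(k, N - 1).
    static int distinctPositions(int k, int N) {
        boolean[] seen = new boolean[N];
        int count = 0;
        for (int t = 0; t < N; t++) {
            int i = p(k, t, N);
            if (!seen[i]) { seen[i] = true; count++; }
        }
        return count;
    }

    public static void main(String[] args) {
        // 3 and 14 collide under f1 for N = 11, but use different step sizes.
        System.out.println(f2(3, 11) + " " + f2(14, 11));
        System.out.println(distinctPositions(14, 11)); // N prime
        System.out.println(distinctPositions(13, 12)); // N not prime
    }
}
```

For the prime N = 11 every key reaches all 11 positions. For N = 12 the key 13 gets step size f_2(13) = 2, which shares the factor 2 with N, so that only the 6 odd positions 1, 3, ..., 11 are ever tried.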

Rehashing

As we cannot always know beforehand for how many elements the data structure is intended, we must occasionally rebuild the whole structure into a larger one. This is simple: choose a new size N' and a new hash function. Then traverse the array and insert all elements encountered in the larger array. If N' is sufficiently large and we have no bad luck, this will cost O(N).

Of course we can continue inserting until the array is entirely full, but this is not clever: the last insertions will cost linear time. The simplest criterion for rebuilding is the load factor. For example, one can rebuild as soon as the load factor exceeds 80%. This is quite static though and does not take into account that sometimes we are running into bad hashing behavior because of an unlucky choice of the hash function. Therefore, it is better to rebuild as soon as performance becomes too poor. For example, one can rebuild as soon as the latest 1000 insertions took more than c * 1000 steps, for some small constant c.
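
The simplest of these criteria, rebuilding as soon as the load factor would exceed 80%, can be sketched as follows. The sketch uses linear probing without deletions; the class name, the growth rule N' = 2 * N + 1 and the choice of k mod N' as the new hash function are assumptions made only for this illustration.

```java
import java.util.Arrays;

class GrowingTable {
    int[] a;   // -1 means empty; keys must be >= 0
    int n = 0; // number of stored keys

    GrowingTable(int N) { a = emptyArray(N); }

    static int[] emptyArray(int N) {
        int[] x = new int[N];
        Arrays.fill(x, -1);
        return x;
    }

    int f(int k) { return k % a.length; } // depends on the current length

    void insert(int k) {
        if (10 * (n + 1) > 8 * a.length) // load factor would exceed 80%
            rebuild(2 * a.length + 1);
        place(k);
        n++;
    }

    // Linear probing; terminates because the array is never full.
    private void place(int k) {
        int i = f(k);
        while (a[i] >= 0)
            i = (i + 1) % a.length;
        a[i] = k;
    }

    // Traverse the old array and insert all keys into the larger one.
    void rebuild(int newN) {
        int[] old = a;
        a = emptyArray(newN);
        for (int k : old)
            if (k >= 0)
                place(k);
    }

    boolean find(int k) {
        int i = f(k);
        while (a[i] != k && a[i] != -1)
            i = (i + 1) % a.length;
        return a[i] == k;
    }

    public static void main(String[] args) {
        GrowingTable t = new GrowingTable(5);
        for (int i = 0; i < 100; i++)
            t.insert(7 * i);
        System.out.println(t.n + " keys in an array of length " + t.a.length);
    }
}
```

Because the length at least doubles with every rebuild, the total work of all rebuilds is proportional to the final length, as long as the hash function behaves well.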

If it is important to keep the maximum time per operation bounded, then the rebuilding can be started somewhat earlier and run in the background until the new structure is ready. In this way the rebuilding contributes only O(1) extra time to each operation.

Exercises

  1. Modify the given program so that the lists are kept sorted according to their keys. Add sentinels with key value infinity to all lists. Modify the methods find and delete as well, minimizing the number of visited nodes and the number of comparisons. Notice that now with O(1) extra time per insertion it can be assured that an existing key is not inserted a second time.

    The developed data structure can be used for sorting. Suppose it is given that the numbers in an array b[] of length n originate from a bounded, but arbitrarily large, range. That is, 0 <= b[i] < M, for all i, 0 <= i < n. Create a hash table with N = n. Use as hash function f(x) = x / c, where c = M / N. In general this is not a good choice, because if the numbers in the input are not uniformly distributed, this will lead to some long lists. But for sorting it is good: using this function, all numbers in list a[i] are smaller than all numbers in list a[i + 1]. So, first inserting all elements of b[] into the hash table and then enumerating the elements of the lists gives the numbers in sorted order.

    1. Change f() and add a method which outputs all elements in the hash table to an array which is passed as a parameter. Add a static method to main which assigns random values to all entries of an array.
    2. Use the program to sort arrays with random entries of length n = 2^k, for k = 16, 17, ..., measure the time and give the results in a table.
    3. How does the time develop as a function of n? Explain the observed behavior. Hint: give an expression for the probability that a list has length l, for any l >= 0, and compute these probabilities for l <= 5.

  2. The topic of this exercise is the memory efficiency of chaining versus open addressing. There are n items to store. We either use chaining with an array of length n or open addressing with an array of length 2 * n. For chaining we use an array of ListNode, each ListNode consisting of a key and a next field, the structure depicted above. For each of the solutions, how much memory does this take? Express your answer in words, assuming that every item of information requires one word. Which of the two solutions is more memory efficient?

  3. The topic of this exercise is the time efficiency of chaining versus open addressing with linear probing. One time unit is counted for every accessed memory position. Let n be the number of elements stored. We either use chaining with an array of length N = n or open addressing with an array of length 2 * n. When applying linear probing for an instance with these parameters an unsuccessful find accesses on average about four positions of the array. For the analysis of the chaining approach, we assume that the values f(k) are uniformly distributed, that is, the probability that f(k) = i equals 1 / N for all 0 <= i < N.
    1. For a chain of length l >= 0, how many time units does it take to perform an unsuccessful find?
    2. Compute the probability that for n = N a chain has length 0, 1, 2, 3, 4. You may assume that N is large and lower-order terms may be neglected. Hint: use that for large x, (1 - 1 / x)^x ~= 1 / e.
    3. Use the computed values to estimate the expected time for an unsuccessful search when applying chaining. The expected value of a random variable X is given by E(X) = sum_x x * Probability(X == x). The probabilities for lengths larger than 4 are so small that they may be neglected.
    4. In the light of the above, which technique appears more efficient? Actually we should take into consideration that not all memory accesses cost the same, but that loading a new cache line costs much more than accessing an already cached memory position. Would this change your conclusion?

  4. Write a program which allocates a boolean array of length N. All entries are initialized with false, which is indicating that the positions are still free. Then generate N random numbers with value in {0, ..., N - 1}, which are inserted using linear probing: the array is traversed cyclically until finding the first free position, which is then set to true. During this insertion the average time for the insertions is measured as follows: for every N / 100 insertions, the number of hops is counted and printed after dividing by N / 100. Sketch the curve based on these measurements for N = 100,000. For which loading degree do we need more than 5 hops per insertion? Let us call this the maximum saturation. Try larger values of N. Does the maximum saturation depend on N?

  5. The topic of this exercise is hashing with linear probing. Deletions are normally performed lazily, marking deleted elements, but not removing them. This is done in order to keep the search structure intact. Even though this space may later be reused, it implies that elements may stand unnecessarily far from the position to which they were hashed. Present an alternative deletion algorithm in which an element is really removed, while guaranteeing that finds can be performed correctly. Hint: Consider again how real deletions are performed on binary trees. Why is this idea much harder to implement for quadratic probing or double hashing?

  6. When applying hashing, the most important cost parameter is the average time per operation. However, the maximum time may be relevant too. For hashing with open addressing and linear probing, describe a simple modification of the insertion so that the distance between the position to which an element is hashed and where it is stored is minimized. By distance we mean the number of hops that must be made, not the difference of the indices. Mention two reasons why this idea does not work well for quadratic probing or double hashing.

  7. The topic of this exercise is quadratic probing. N denotes the length of the used array, k is the value of a key.
    1. Consider the example given above with N = 20 and f(k) = k mod N. How many different positions lie on the probe sequence p(k, t) = (f(k) + t^2) mod N, for t = 0, 1, ...? Verify that 52 cannot be inserted after inserting 34, 55, 12, 8, 45, 37, 32, 88, 98, 54, 21, 42, 56 and 74.
    2. With quadratic probing, the number of different positions on the probe sequence of k is independent of k: it only depends on N and the way the sequence is constructed. Compute this number for all N <= 12. Also determine how many hops must be made before all these positions are reached.
    3. Give, as a function of N, an estimate of how many positions are visited in any case before returning to any earlier visited position. For which values of N is this estimate sharp?
    4. In the literature quadratic probing is mostly defined slightly differently: p'(k, t) = f(k) for t == 0, p'(k, t) = p'(k, t - 1) + t for t > 0. Compute the number of different positions on this probe sequence for all N <= 12. Also determine how many hops must be made before all these positions are reached. Is this kind of quadratic probing better in the sense that it reaches more positions, or reaches them faster?
    5. Formulate a hypothesis: for which N are all positions visited? Test this hypothesis for one further instance.

  8. The topic of this exercise is double hashing. N denotes the length of the used array, k is the value of a key. If N is not a prime number, f_2 must be defined with special care. At the same time there may be practical reasons not to take N a prime number. For example, when taking N = 2^l, for some l > 0, computing the modulo function can be performed in a single clock cycle. This also facilitates doubling the size when the current size is no longer sufficient.
    1. In general, working on an array of length N, which property should a value f_2(k) satisfy in order that the sequence p(k, t) = f_1(k) + t * f_2(k) visits all positions of the array?
    2. Let c(N) denote the number of values x so that |{t * x mod N| 0 <= t < N}| = N. So, c(N) gives the number of good choices for f_2(k). Compute c(60), c(2^l), for any l > 0, and c(30,000,000).
    3. Give an f_2, which assures that for N = 2^l, for any l > 0, and all k the probe sequence visits all positions of the array and which assumes c(2^l) different values itself.
    4. Answer the same question for N = 30,000,000. In this case f_2 is not a function in a closed mathematical form. Rather you should give a procedure which runs in O(1) time and returns an integer. For efficiently computing f_2, it may be necessary to have access to a precomputed table. The size of this table and the time for constructing it should be negligible.
    5. Why is it important that f_2 assumes many different values?





Union-Find

Definition

The defining properties of the subset ADT are union and find. It is used to maintain sets of subsets. Initially there are n subsets each consisting of 1 element. These elements may be assumed to be the numbers 0 to n - 1. Gradually subsets get fused (union) together and become larger. At the same time there are queries (find operations) asking for the unique identifier, the name of the subset to which an element belongs. Initially the name might be taken equal to the single number in the subset. Later it might be one of the numbers in the subset, possibly, but not necessarily, the smallest number. The only important thing is that find(x) and find(y) return the same values when x and y belong to the same subset, and different values when they belong to different subsets.

The above leaves much freedom in how to perform the unions and how to choose the names. A possibility is to apply a rule like "add first to second", meaning that an operation union(x, y) is performed by adding all elements of the subset in which x lies to the subset in which y lies. The new name for this set then becomes the name of the subset in which y lies.

Union-Find: Add First to Second

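The subset ADT may sound unimportant, but it shows up everywhere. Subsets are the canonical example of the mathematical concept of equivalence relations. An equivalence relation is a binary relation "~" on the elements of a set with the following three properties: a ~ a, for all a (reflexivity); if a ~ b, then b ~ a (symmetry); if a ~ b and b ~ c, then a ~ c (transitivity).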

If we read "~" like "in the same subset as", then all three properties are satisfied. The most important example of an equivalence relation is "is reachable": in a road system with two-way roads, all three properties are satisfied. Thus, the subset ADT can be used to compute something called the "connected components" of a graph.
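
As an illustration of this application, the following sketch computes the connected components of a small road system. It uses the most naive representation conceivable, in which every node simply stores the name of its subset and a union renames by scanning all nodes. The class and method names are made up for the example.

```java
class Components {
    // edges[j] = {u, v} is a two-way road between nodes u and v, 0 <= u, v < n.
    // Returns name[], where name[u] == name[v] exactly if v is reachable from u.
    static int[] components(int n, int[][] edges) {
        int[] name = new int[n];
        for (int i = 0; i < n; i++)
            name[i] = i;                 // initially every node is a subset of its own
        for (int[] e : edges) {
            int k1 = name[e[0]];         // find
            int k2 = name[e[1]];         // find
            if (k1 != k2)                // union: add first to second
                for (int i = 0; i < n; i++)
                    if (name[i] == k1)
                        name[i] = k2;
        }
        return name;
    }

    public static void main(String[] args) {
        int[][] roads = {{0, 1}, {1, 2}, {3, 4}};
        int[] name = components(6, roads);
        for (int i = 0; i < 6; i++)
            System.out.println("node " + i + " lies in component " + name[i]);
    }
}
```

In the example the nodes 0, 1 and 2 end up with the same name, as do 3 and 4, while node 5 remains alone.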

The subset ADT allows equivalence relations to be maintained dynamically: relations may be added by applying the union operation. The only limitation is that there is no de-union: once unified, the sets cannot be split anymore. This latter feature would be much more expensive to implement; in particular, it would require that all previous operations are recorded.

There are several implementations, ranging from extremely simple giving modest performance (one operation O(1), the other O(n)), to slightly less simple giving excellent performance (almost constant time for both operations).

Array-Based Implementation

Simple Approach

The simplest way to implement the subset ADT is by maintaining an integer array a[], in which for every node we have stored the current index of the subset to which it belongs. Initially a[i] = i. find(i) then simply returns a[i].

A union is more complicated: if all the elements of a subset S have to be renamed, then, in the simplest implementation, we have to scan the whole array to find those which belong to S. This takes O(n) time. Thus, for all n - 1 unions (after n - 1 non-trivial unions all elements have been unified into one set), we need O(n^2) time.

  void initialize() {
    for (int i = 0; i < n; i++)
      a[i] = i; }

  int find(int k) {
    return a[k]; }

  void union(int k1, int k2) {
    k1 = find(k1);
    k2 = find(k2);
    if (k1 != k2) // rename all elements in subset of k1
      for (int i = 0; i < n; i++)
        if (a[i] == k1)
          a[i] = k2; }

Array-Based Union-Find

If we are slightly more clever, we might maintain the elements that belong to a set in a linked list. In that case, we do not have to scan through all the elements when performing a union operation. A union is now performed by traversing one of the lists, accessing a[i] for each listed element i and updating it with the index of the other set. Then this list is hooked to the other list.

We consider an implementation of this. The lists are also implemented in an integer array b[]. In general the value b[i] gives the successor of i in its list. It is convenient to maintain a set of circular lists. The initial situation for this can be established by setting b[i] = i, for all 0 <= i < n. These ideas are worked out in the following piece of code:

  void initialize() {
    for (int i = 0; i < n; i++)
      a[i] = b[i] = i; }

  int find(int k) {
    return a[k]; }

  void union(int k1, int k2) {
    int i, j;
    k1 = find(k1);
    k2 = find(k2);
    if (k1 != k2) { // rename all elements in subset of k1
      i = k1;
      do {
        a[i] = k2;
        j = i;
        i = b[i]; }
      while (i != k1);
      // glue list of k1 into the list of k2 
      b[j] = b[k2];
      b[k2] = k1; } }

At first glance this appears to be a very good idea: the implementation is simple and does not cause too much overhead. The work of a union is now proportional to the number of renamings, that is, it is proportional to the size of the subset of k1 and no longer Theta(n).

However, consider the following sequence of unions

  for (int i = 1; i < n; i++)
    union(i - 1, i);
If always the first set is joined to the second, then in total we have to rename sum_{i = 0}^{n - 1} i = (n - 1) * n / 2 = Omega(n^2) elements. This is half as much as with the trivial implementation, and in view of the extra work for each renaming, it is actually no improvement at all.

Union by Size

The problem with the previous construction is that a large set is repeatedly joined to a small set. Performance improves tremendously if we maintain the size of the subsets and always join the smaller to the larger. In that case the total number of renamings is bounded by O(n * log n). We prove this.

Union-Find: Union by Size

How do we prove such a thing? The usual way of bounding the time of a sequence of operations is to put a bound on the time per operation. Here this approach does not work, because union operations may take linear time. In this case one should not perform an operation-based analysis, but an element-based analysis: bounding the cost per element instead of bounding the cost per operation. The idea is to consider the maximum number of times that any node may be given a new name. We show that this happens at most log n times. Then it follows that in total there are at most n * log n renamings.

Lemma: A node that has been renamed k times belongs to a set of size at least 2^k.

Proof: To be formal, we use induction. We must check a base case and a step. The base case is easy: a node that has been renamed 0 times belongs to a set of size at least 1 right from the start. Now suppose the lemma holds for all k' < k, for some k > 0. Consider a node x that has been renamed k - 1 times. The induction assumption implies that the size n_x of the subset to which x belongs satisfies n_x >= 2^{k - 1}. Assume x gets renamed again when performing union(x, y). Because unions are performed by-size, this implies that n_y >= n_x, where n_y gives the size of the subset to which y belongs. For the size n_{xy} of the subset created by the union, this gives n_{xy} = n_x + n_y >= 2 * n_x >= 2 * 2^{k - 1} = 2^k. End.

Corollary: A node can be renamed at most log n times.

Proof: Assume some node x is renamed k > log n times. Then according to the lemma it belongs to a subset of size n_x >= 2^k > 2^{log n} = n, which is impossible. End.

Theorem: When performing union-by-size, the time consumption of any sequence of n - 1 unions is bounded by O(n * log n) time.

This result is sharp: there is a sequence of unions that actually requires Omega(n * log n) renamings:
  for (int i = 1; i <= log n; i++)
    for (int j = 0; j < n / 2^i; j++)
      union(j * 2^i, j * 2^i + 2^{i - 1});
The number of renamings is log n * n / 2: in every round n/2 of the nodes get a new name. The factor two difference with the upper bound comes from the way the upper bound was proven: though every individual element may have to be renamed log n times, it is not possible that all elements are renamed that often. Mostly we do not care about such small factors.
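
The counts derived in this section can be checked experimentally. The following sketch combines the list-based union with a counter for the number of renamings. The flag bySize switches between "add first to second" and union by size. The class name, the counter and the two driver methods are additions for this illustration; the sizes are stored negated in a[], with a[i] < 0 meaning that i is itself the name of its subset.

```java
class RenamingCount {
    final int[] a;  // a[i] < 0: i is the name of its subset, of size -a[i]; else the name
    final int[] b;  // b[i]: successor of i in the circular list of its subset
    final boolean bySize;
    long renamings = 0;

    RenamingCount(int n, boolean bySize) {
        this.bySize = bySize;
        a = new int[n];
        b = new int[n];
        for (int i = 0; i < n; i++) { a[i] = -1; b[i] = i; }
    }

    int find(int k) { return a[k] < 0 ? k : a[k]; }

    void union(int k1, int k2) {
        int i, j;
        k1 = find(k1);
        k2 = find(k2);
        if (k1 == k2) return;
        if (bySize && a[k1] < a[k2]) { i = k1; k1 = k2; k2 = i; } // k1 becomes the smaller set
        a[k2] += a[k1];                  // new size, negated
        i = k1;
        do {                             // rename all elements in the subset of k1
            a[i] = k2; renamings++;
            j = i; i = b[i];
        } while (i != k1);
        b[j] = b[k2];                    // glue the list of k1 into the list of k2
        b[k2] = k1;
    }

    // The sequence union(0, 1), union(1, 2), ..., union(n - 2, n - 1).
    static long chain(int n, boolean bySize) {
        RenamingCount u = new RenamingCount(n, bySize);
        for (int i = 1; i < n; i++) u.union(i - 1, i);
        return u.renamings;
    }

    // The sequence joining equally large sets in log n rounds; n must be a power of two.
    static long balanced(int n) {
        RenamingCount u = new RenamingCount(n, true);
        for (int step = 2; step <= n; step *= 2)
            for (int j = 0; j < n / step; j++)
                u.union(j * step, j * step + step / 2);
        return u.renamings;
    }

    public static void main(String[] args) {
        System.out.println(chain(16, false) + " " + chain(16, true) + " " + balanced(16));
    }
}
```

For n = 16 the chain of unions costs (n - 1) * n / 2 = 120 renamings with "add first to second" and only n - 1 = 15 with union by size, while the balanced sequence forces log n * n / 2 = 32 renamings even with union by size.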

Maintaining the size of the sets is trivial: if two sets are joined, the new size is the sum of the two old sizes. It is trivial to implement this using an additional array s[] for storing the sizes, initialized at 1. If we are very tricky (this is quite ugly hacking) then we can also do without this extra array: Normally, we find at position i of a[] the index of the subset to which node i belongs. If a[i] = i, then we can also flag this by just putting a[i] = -1 (or any other value that does not lie in the range [0, n - 1]). But then we can also store there the size of the subset of i. Here we use that we have one spare bit. If n is extremely large, needing all bits, then this idea does not work. In code this may be implemented as follows:

  void initialize() {
    for (int i = 0; i < n; i++) {
      a[i] = -1; 
      b[i] = i; } }

  int find(int k) {
    if (a[k] < 0)
      return k;
    return a[k]; }

  void union(int k1, int k2) {
    int i, j;
    k1 = find(k1);
    k2 = find(k2);
    if (k1 != k2) {
      if (a[k1] < a[k2]) { // the set of k1 is larger
        i = k1; 
        k1 = k2;
        k2 = i; }
      // rename all elements in subset of k1
      a[k2] += a[k1];
      i = k1; 
      do {    
        a[i] = k2;
        j = i;  
        i = b[i]; } 
      while (i != k1); 
      // glue list of k1 into the list of k2 
      b[j] = b[k2];
      b[k2] = k1; } } 
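
The n / 2 * log n count for the sequence of unions given earlier can be checked experimentally. The following is a minimal sketch that wraps the code above in a class and counts renamings; the class name RenameCounter, the field renamings and the method worstCase are illustrative assumptions, not part of the original program.

```java
// Sketch: array-based union-find with union-by-size that counts renamings.
// RenameCounter, renamings and worstCase are hypothetical names for illustration.
class RenameCounter {
  final int[] a; // a[i] < 0: i names its set and -a[i] is its size; otherwise a[i] is the name
  final int[] b; // b[i]: successor of i in the circular list of its set
  long renamings = 0;

  RenameCounter(int n) {
    a = new int[n];
    b = new int[n];
    for (int i = 0; i < n; i++) {
      a[i] = -1;
      b[i] = i;
    }
  }

  int find(int k) {
    return a[k] < 0 ? k : a[k];
  }

  void union(int k1, int k2) {
    k1 = find(k1);
    k2 = find(k2);
    if (k1 == k2)
      return;
    if (a[k1] < a[k2]) { // the set of k1 is larger: swap the roles
      int t = k1;
      k1 = k2;
      k2 = t;
    }
    a[k2] += a[k1]; // sizes are stored negated
    int i = k1;
    int j = k1;
    do { // rename every element of the smaller set
      a[i] = k2;
      renamings++;
      j = i;
      i = b[i];
    } while (i != k1);
    b[j] = b[k2]; // glue the list of k1 into the list of k2
    b[k2] = k1;
  }

  // Runs the sequence of unions from the text for n = 2^logN; returns the number of renamings.
  static long worstCase(int logN) {
    int n = 1 << logN;
    RenameCounter uf = new RenameCounter(n);
    for (int i = 1; i <= logN; i++)
      for (int j = 0; j < (n >> i); j++)
        uf.union(j << i, (j << i) + (1 << (i - 1)));
    return uf.renamings;
  }

  public static void main(String[] args) {
    System.out.println(worstCase(4)); // n = 16: 4 rounds of 8 renamings each
  }
}
```

In every round both sets of a union have equal size, so half of all elements are renamed.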


Implementation Alternatives

We now take a closer look at the above implementation. The set of linked lists is realized in an array. In the following section we will see how a set of trees (with links directed towards the roots) is realized in an array. This is highly efficient: it saves memory and time. In this case it is also reasonable to do so.

What is special about the application of union-find, that we are using arrays here to realize linked structures, whereas before we were using a structure built of list nodes linked together? The answer is that in the current case, there is a fixed number of nodes which have keys from 0 to n - 1. This allows us to apply direct addressing: the information for node k, including the information of its next field, is stored at position k of one or more arrays.

In the example above we are working with two arrays. Alternatively, we might also work with one array of objects of the following type:

  class ArrayNode
  {
    int a;
    int b;

    ArrayNode(int i)
    {
      a = -1;
      b =  i;
    }
  }
It becomes even more elegant if a boolean instance variable is added indicating whether a gives the size or the name of the list. An organization with ArrayNode objects is more object oriented than an organization with several arrays.

Which of these two organizations is more efficient depends on the memory access pattern. If there are several arrays, each array may be assumed to stand consecutively in the memory. This allows for speedy access to all information of one kind. This holds even more if this information is accessed in a consecutive way than if it is accessed by single accesses such as in the find operation. In an organization with one array of ArrayNodes, the information belonging to each node stands together. This makes it cheaper to access several fields of a node as is done in the union operation. Of course using ArrayNodes causes some overhead because there is an extra indirection: accessing nodes[i].b is more involved than accessing b[i].

Memory Organization

Tree-Based Implementation

A key feature of find is that it does not need to return any specific name, it just should be the same for all elements belonging to the subset and different for elements of other subsets. Another point is that it is acceptable that a find operation takes more than constant time if this helps to reduce the time of execution for the whole set of unions and finds to perform. This allows for a lot of flexibility, which will be exploited.

Simple Approach

A suitable implementation of the disjoint-subset ADT is by using a set of trees. Initially each node has its own tree of size one. find(k) returns the index of the root of the tree of k. union(k1, k2) hooks the root of one of the involved trees to the root of the other tree, unless the two roots are already the same.

Tree-Based Union Find

This idea can be realized very simply using an array to represent the set of links:

  void initialize() {
    for (int i = 0; i < n; i++)
      a[i] = i; }

  int find(int k) {
    while (a[k] != k)
      k = a[k]; 
    return k; }

  void union(int k1, int k2) {
    k1 = find(k1);
    k2 = find(k2);
    if (k1 != k2) // hook k1 to k2
      a[k1] = k2; }

Here a root k of a tree is characterized by the fact that it has a[k] == k. The good thing is that with the tree-based implementation, there is no need to access all elements of a set. Thus, there is no need to have the additional list structure requiring a second array. Conceptually using trees is a step, but practically it is even easier than the array-based implementation.

How about the efficiency? A find requires that one runs up the tree to find the index of the root. A union, once the two finds have been performed, is trivial, it just requires that one new link is created. So, here we have reduced the cost of the union at the expense of more expensive finds. The finds can actually be arbitrarily expensive: if the tree degenerates, it can have depth close to n. In that case finds may take linear time.
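
The degenerate case is easy to provoke. The following sketch builds a chain with n - 1 cheap unions and then counts the links followed by a single find; the class name ChainDemo and the counter linksFollowed are assumptions made for this illustration.

```java
// Sketch: simple tree-based union-find (first-to-second hooking) with a link counter.
// ChainDemo, linksFollowed and costOfDeepestFind are hypothetical names for illustration.
class ChainDemo {
  final int[] a;
  long linksFollowed = 0;

  ChainDemo(int n) {
    a = new int[n];
    for (int i = 0; i < n; i++)
      a[i] = i; // every node starts as the root of its own tree
  }

  int find(int k) {
    while (a[k] != k) {
      k = a[k];
      linksFollowed++;
    }
    return k;
  }

  void union(int k1, int k2) {
    k1 = find(k1);
    k2 = find(k2);
    if (k1 != k2)
      a[k1] = k2; // hook k1 to k2
  }

  // Builds the chain 0 -> 1 -> ... -> n - 1 and returns the number of links followed by find(0).
  static long costOfDeepestFind(int n) {
    ChainDemo uf = new ChainDemo(n);
    for (int i = 0; i + 1 < n; i++)
      uf.union(i, i + 1); // i is still a root here, so both finds are trivial
    uf.linksFollowed = 0;
    uf.find(0);
    return uf.linksFollowed;
  }

  public static void main(String[] args) {
    System.out.println(costOfDeepestFind(1000));
  }
}
```

Each union in this sequence follows no links at all, yet the final find(0) has to follow n - 1 of them.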

Tree-Based Union-Find

Union by Size

As for the previous approach, it is a good idea to maintain for every subset its size and to join the smaller subset to the larger one. In code this requires only small modifications:
  void initialize() {
    for (int i = 0; i < n; i++)
      a[i] = -1;  }

  int find(int k) {
    while (a[k] >= 0)
      k = a[k]; 
    return k; }

  void union(int k1, int k2) {
    k1 = find(k1);
    k2 = find(k2);
    if (k1 != k2) {
      if (a[k1] < a[k2]) { // the set of k1 is larger
        int i = k1; 
        k1 = k2;
        k2 = i; }
      a[k2] += a[k1];
      a[k1] = k2; } }

Here a root k of a tree is characterized by the fact that it has a[k] < 0. In that case -a[k] gives the size of the tree.

Lemma: A tree of depth k has at least 2^k nodes.

Proof: The proof goes by induction. To settle the base case, we fix that a tree with one node has depth 0. Now assume the claim is correct for given k. How do the depths develop? If T_1 is joined to T_2, and T_1 has depth smaller than T_2, then nothing changes. If T_1 has depth >= the depth of T_2, then the new depth equals the depth of T_1 + 1. While performing unions, new trees of depth k + 1 can thus only arise when a tree T_1 of depth k is joined to another tree T_2, which, because of our clever joining technique, must have at least as many nodes as T_1. Because of our induction assumption, T_1 has at least 2^k nodes, and thus T_1 + T_2 has at least 2^{k + 1} nodes. End.

Corollary: Using trees and performing union-by-size, union takes O(1), while the time for a find is bounded by O(log n).

Proof: The time to perform find(x) is proportional to the depth of node x in its tree. So, assume some node x has depth k > log n. Because the depth of a tree equals the maximum of the depths of all its nodes, this implies that the depth of the tree of x is at least k. According to the lemma this implies that the size n_x of the tree satisfies n_x >= 2^k > 2^{log n} = n, which is impossible. End.

The given bound is sharp: trees of logarithmic depth may really arise when repeatedly performing union for trees of equal size. If we are mainly interested in limiting the depth of our tree, then we can just as well perform union by depth: the shallower of the two trees (if any) is hooked to the other. Doing this, it is even easier to prove that the number of nodes in a tree with depth k is at least 2^k and that consequently the depth is bounded by log n.
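
Both statements can be observed in a small experiment. The sketch below reuses the union-by-size code from above and measures the maximum depth, once for repeated unions of equally large trees and once for random unions. The class name SizeUnionDepth, its helper methods and the choice of 4 * n random unions are assumptions made for this illustration.

```java
import java.util.Random;

// Sketch: tree-based union-find with union-by-size, plus a depth measurement.
// SizeUnionDepth, maxDepth, balanced and random are hypothetical names for illustration.
class SizeUnionDepth {
  final int[] a; // a[k] < 0: k is a root and -a[k] is the size of its tree

  SizeUnionDepth(int n) {
    a = new int[n];
    for (int i = 0; i < n; i++)
      a[i] = -1;
  }

  int find(int k) {
    while (a[k] >= 0)
      k = a[k];
    return k;
  }

  void union(int k1, int k2) {
    k1 = find(k1);
    k2 = find(k2);
    if (k1 != k2) {
      if (a[k1] < a[k2]) { // the set of k1 is larger
        int i = k1;
        k1 = k2;
        k2 = i;
      }
      a[k2] += a[k1];
      a[k1] = k2;
    }
  }

  // Maximum number of links between any node and the root of its tree.
  int maxDepth() {
    int max = 0;
    for (int k = 0; k < a.length; k++) {
      int d = 0;
      for (int l = k; a[l] >= 0; l = a[l])
        d++;
      max = Math.max(max, d);
    }
    return max;
  }

  // Repeatedly joins trees of equal size, for n = 2^logN.
  static int balanced(int logN) {
    int n = 1 << logN;
    SizeUnionDepth uf = new SizeUnionDepth(n);
    for (int i = 1; i <= logN; i++)
      for (int j = 0; j < (n >> i); j++)
        uf.union(j << i, (j << i) + (1 << (i - 1)));
    return uf.maxDepth();
  }

  // Performs 4 * n unions of randomly chosen elements, for n = 2^logN.
  static int random(int logN, long seed) {
    int n = 1 << logN;
    SizeUnionDepth uf = new SizeUnionDepth(n);
    Random rnd = new Random(seed);
    for (int u = 0; u < 4 * n; u++)
      uf.union(rnd.nextInt(n), rnd.nextInt(n));
    return uf.maxDepth();
  }

  public static void main(String[] args) {
    System.out.println(balanced(10) + " " + random(10, 1L));
  }
}
```

For the balanced sequence the depth reaches exactly log n; by the lemma, random unions can never exceed that value.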

Path Contraction

If we compare the tree-based approach with the simpler approach (both with union-by-size or by depth), then we see that we have one constant time and one logarithmic time operation in each case. This appears equally good, but one should realize that there may be arbitrarily many finds, whereas at most n - 1 unions can actually join two subsets. So, the tree-based idea as it is should be considered to be inferior. However, it can be made much better.

The only further algorithmic idea in this domain is path contraction. That means, that when we are performing find(k), after we have found that find(k) = r, we start once more at k and link all nodes on the path directly to r. This makes the individual finds twice as expensive, but has a very positive impact on the structure of the tree. The idea that expensive operations lead to an improvement of a search structure is not limited to union-find. Similar ideas are also applied for search trees and priority queues. Notice that even a union operation involves two finds, so even a union may lead to changes in the trees more than just hooking one to the other.

Using the same initialize as before, the code for find now looks as follows:

  int find(int k) {
    int l = k;
    while (a[l] >= 0)
      l = a[l];
    // Now l == find(k)
    while (a[k] >= 0) {
      int m = a[k];
      a[k] = l;
      k = m; }
    // Now all nodes on the path point to l
    return l; }


The idea of path contraction is that we invest something extra right now, in order to exclude that in the future we have to walk this long way again.
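
The effect of this investment can be made visible on a worst-case tree. The following sketch starts from a chain of n nodes (which can arise with first-to-second hooking, not with union-by-size) and counts the links followed by two consecutive calls of find(0). The class name ContractionDemo and the counter linksFollowed are assumptions made for this illustration.

```java
// Sketch: find with path contraction on a degenerate tree, counting followed links.
// ContractionDemo, linksFollowed and costOfTwoFinds are hypothetical names for illustration.
class ContractionDemo {
  final int[] a; // a[k] >= 0: parent of k; a[k] < 0: k is a root of a tree of size -a[k]
  long linksFollowed = 0;

  // Sets up the chain 0 -> 1 -> ... -> n - 1 directly, with root n - 1.
  ContractionDemo(int n) {
    a = new int[n];
    for (int i = 0; i + 1 < n; i++)
      a[i] = i + 1;
    a[n - 1] = -n;
  }

  int find(int k) {
    int l = k;
    while (a[l] >= 0) { // first run: locate the root
      l = a[l];
      linksFollowed++;
    }
    while (a[k] >= 0) { // second run: hook all nodes on the path to the root
      int m = a[k];
      a[k] = l;
      k = m;
    }
    return l;
  }

  // Returns the number of links followed by the first and by the second find(0).
  static long[] costOfTwoFinds(int n) {
    ContractionDemo uf = new ContractionDemo(n);
    uf.find(0);
    long first = uf.linksFollowed;
    uf.linksFollowed = 0;
    uf.find(0);
    long second = uf.linksFollowed;
    return new long[] {first, second};
  }

  public static void main(String[] args) {
    long[] cost = costOfTwoFinds(1000);
    System.out.println(cost[0] + " " + cost[1]);
  }
}
```

The first find pays for the whole path; afterwards every node on that path is a child of the root.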

Union-by-size/depth and path contraction

There are alternative implementations of this idea. We can also save the second run, by keeping a trailer and just reducing the depth of the search path by a factor two. This reduces the number of elements to address and may therefore be slightly faster in practice:

  int find(int k) {
    int l = k;
    int m = k;
    while (a[l] >= 0) {
      a[m] = a[l];
      m    = l;
      l    = a[l]; }
    return l; }

The combination of union-by-size (or by height, although the height information may become inaccurate due to the finds) and path contraction leaves very little to be desired. A partial analysis is given separately in the next section. Even understanding exactly how good the algorithm is, is not trivial.

Theorem: The time for an arbitrary sequence of m unions and finds is bounded by O(m * log^* n) provided m >= n.

Here log^*, pronounced log-star, is the function informally defined as "the number of times the log function must be applied to reach 1 or less". More formally, log^* n = min{i >= 0| log^(i) n <= 1}. Here log^(i) denotes the function obtained by applying the log function i times. More generally, for any function f, f^(i) is defined by
f^(0)(n) = n
f^(i)(n) = f(f^(i - 1)(n)), for i >= 1
For example, log^* 1 == 0, log^* 2 == 1, log^* 4 == 2, log^* 16 == 3, log^* 65536 == 4, log^* 2^65536 == 5. In practice log^* cannot be distinguished from a constant. The actual result is even much stronger: the time for any sequence of m >= n unions and finds is bounded by O(m * alpha(m, n)), where alpha(,) is called the inverse Ackermann function. For any m slightly larger than linear, alpha(m, n) is constant.
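
The listed values can be recomputed with a few lines of code. The sketch below avoids floating-point logarithms by using the equivalent formulation that log^* n is the smallest i for which a tower of i twos is at least n; the class name LogStar and this reformulation are assumptions of the sketch, not part of the text.

```java
// Sketch: log-star for arguments that fit in a long.
// Uses: log^(i) n <= 1 exactly when n <= 2^2^...^2 with i twos (the tower of height i).
// LogStar and logStar are hypothetical names for illustration.
class LogStar {
  static int logStar(long n) {
    long tower = 1; // tower of height 0
    int i = 0;
    while (tower < n) {
      // the next tower is 2^tower; beyond 2^62 it exceeds every long, so saturate
      tower = (tower >= 63) ? Long.MAX_VALUE : 1L << tower;
      i++;
    }
    return i;
  }

  public static void main(String[] args) {
    long[] samples = {1, 2, 4, 16, 65536, Long.MAX_VALUE};
    for (long n : samples)
      System.out.println("log* " + n + " == " + logStar(n));
  }
}
```

For every n that fits in a 64-bit integer the result is at most 5.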

It costs very little extra to perform the union by size, but still one may wonder whether it is necessary. Possibly just performing the finds with path contraction might be enough. This makes the union procedure even simpler and faster. Trying examples suggests that this has almost no negative impact on the time of the finds. However, doing this, there is a sequence of n unions and n finds requiring Omega(n * log n) time. This shows that, at least in theory, one really needs the combination of union-by-size and path contraction to obtain the best achievable performance.

Analysis

In this section we partially analyze the extremely good performance of tree-based union-find using union-by-size and path contraction, proving the above theorem. However, as already announced, the actual result is even much better, being formulated in terms of the inverse Ackermann function, which will be considered first.

Ackermann Function

Ackermann's function is defined as follows:
A(1, j) == 2^j, for j > 0
A(i, 1) == A(i - 1, 2), for i > 1
A(i, j) == A(i - 1, A(i, j - 1)), for i, j > 1.
Ackermann's function grows terribly fast. It is instructive to fill in a small square of values. Now one can define alpha(m, n) == min{i >= 1| A(i, m / n) > log n}. Because of the growth rate of A(i, j), alpha(m, n) is in practice bounded by 4, even for m / n == 1.
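
A small square of values can be produced with the following sketch. Values that do not fit in a long are replaced by a marker INF; this saturation, the rounding down of m / n and of log n in alpha, and all names are assumptions of the sketch rather than part of the definition.

```java
// Sketch: the variant of Ackermann's function defined above, saturating at INF.
// Ackermann, INF, A and alpha are hypothetical names for illustration.
class Ackermann {
  static final long INF = Long.MAX_VALUE; // stands for "too large for a long"

  // Assumes i >= 1 and j >= 1.
  static long A(int i, long j) {
    if (j >= 63)
      return INF; // A(i, j) >= 2^j, which no longer fits in a long
    if (i == 1)
      return 1L << j; // A(1, j) == 2^j
    if (j == 1)
      return A(i - 1, 2);
    return A(i - 1, A(i, j - 1));
  }

  // alpha(m, n) == min{i >= 1 | A(i, m / n) > log n}, for m >= n >= 1.
  static int alpha(long m, long n) {
    long ratio = m / n; // assumption: rounded down
    int logN = 63 - Long.numberOfLeadingZeros(n); // assumption: log n rounded down
    int i = 1;
    while (A(i, ratio) <= logN)
      i++;
    return i;
  }

  public static void main(String[] args) {
    for (int i = 1; i <= 4; i++) {
      StringBuilder row = new StringBuilder();
      for (int j = 1; j <= 4; j++) {
        long v = A(i, j);
        row.append(v == INF ? "INF" : Long.toString(v)).append(' ');
      }
      System.out.println(row.toString().trim());
    }
  }
}
```

Already A(2, 4) == 2^65536 and A(3, 2) are far beyond the range of a long, which is why alpha never exceeds 4 for any input that can be stored.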

Values of the Ackermann Function

Theorem: The time for an arbitrary sequence of m unions and finds is bounded by O(m * alpha(m, n)) provided m >= n. This bound is sharp.

The proof of this is technical and omitted. Instead we prove the weaker theorem above, which is still incredibly strong in itself. Comparing the two theorems, we see that the "weakness" of the first is most notable for slightly larger values of m: Because A(2, j) == 2^2^ ... ^2, with in total j exponentiations, we have that A(2, log^* n) == n. Thus, if m / n > log^* n, then alpha(m, n) <= 2, and thus we find that already for m which are only a little bit larger than n, m operations can be performed in O(m) time and not something that is super-linear in m.

Using Ranks

We first slightly modify the algorithm. Instead of hooking by size, we are going to hook by rank. The rank of a tree, maintained at its root, is the depth of the tree without considering the effect of the path compressions. That is, it gives the depth of the tree when only performing a sequence of unions. It is easy to maintain the ranks: hooking a tree with rank r_1 to a tree with rank r_2, gives resulting rank r_2 if r_1 < r_2 and r_2 + 1 if r_1 == r_2. So, just as union-by-size, union by rank can be performed in constant time once the roots of the trees are found.
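
A minimal sketch of union by rank combined with path contraction is given below. Unlike the earlier examples it uses a separate array for the ranks and marks roots by parent[k] == k; this layout and the names RankUnionFind, parent and rank are assumptions made for readability.

```java
// Sketch: tree-based union-find with union by rank and path contraction.
// RankUnionFind, parent and rank are hypothetical names for illustration.
class RankUnionFind {
  final int[] parent; // parent[k] == k marks a root
  final int[] rank;   // rank[k] is only meaningful while k is a root, but is never decreased

  RankUnionFind(int n) {
    parent = new int[n];
    rank = new int[n]; // all ranks start at 0
    for (int i = 0; i < n; i++)
      parent[i] = i;
  }

  int find(int k) {
    int root = k;
    while (parent[root] != root)
      root = parent[root];
    while (parent[k] != root) { // hook all nodes on the path to the root
      int next = parent[k];
      parent[k] = root;
      k = next;
    }
    return root; // ranks are deliberately left unchanged by the contraction
  }

  void union(int k1, int k2) {
    k1 = find(k1);
    k2 = find(k2);
    if (k1 == k2)
      return;
    if (rank[k1] > rank[k2]) { // make k1 the root with the smaller (or equal) rank
      int t = k1;
      k1 = k2;
      k2 = t;
    }
    parent[k1] = k2;
    if (rank[k1] == rank[k2])
      rank[k2]++; // only hooking two trees of equal rank increases the rank
  }

  public static void main(String[] args) {
    RankUnionFind uf = new RankUnionFind(8);
    uf.union(0, 1);
    uf.union(2, 3);
    uf.union(0, 2);
    System.out.println(uf.rank[uf.find(0)]);
  }
}
```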

Let us summarize the algorithm:

Ranks may only change with unions, so for proving results on the numbers of nodes with given ranks, we can forget the path compressions.

When only performing a set of unions, a node of rank r has at least 2^r descendants (counting the node itself as well).
The proof goes by induction over time t. In other words, we reformulate the claim as an invariant property: at all times any tree with rank r has at least 2^r nodes. First we check that the claim is true for t = 0: initially all trees have size 1 and rank 0, so this is ok. At any given time two trees are hooked together. If the rank of the new tree is unchanged, then certainly no tree violating the condition results. The rank only increases when two trees with the same rank r are hooked. Each of them has at least 2^r nodes because of the invariant property. The resulting tree thus has at least 2^{r + 1} nodes, preserving the invariant.
The ranks decrease strictly monotonically on a path away from the root.
Because of the union-by-rank rule, this is obvious if we do not perform path contraction. However, if a node v is a descendant of w after path contraction, then it was already a descendant of w before path contraction, and thus v must have smaller rank than w.
There are at most n / 2^r nodes of rank r.
Consider a node of rank r. Without path compression it would be the root of a subtree of size at least 2^r. All these subtrees are disjoint. The path compression has no consequences for the ranks, and the unions are also performed independently of it, as they only consider ranks and not the actual depths. So, there can be at most n / 2^r nodes of rank r.

Counting Trick

Even knowing the above, it is not easy to prove the main result directly. Mostly a proof of a bound of f(n, m) on the running time of an algorithm performing m operations on a structure with n elements is given in one of two ways: by bounding the cost per operation (an operation-based analysis), or by bounding the cost per element (an element-based analysis). In our case there is a rare and interesting mixture of both proof techniques: costs will be accounted both to the operations and to the elements. As we are interested in minimizing the sum of the two cost factors, we will at the end choose things so that both contributions are approximately equal: m * log^* n.

The actual proof is slightly technical but the idea is really simple and beautiful. The ranks are somehow divided in rank groups consisting of consecutive ranks. F(g) gives the largest rank in group g. So, group g comprises all of the F(g) - F(g - 1) ranks F(g - 1) + 1, ..., F(g). Let G(n) be the total number of rank groups.

Consider a find starting in a node v and leading to the root of its tree r. Whenever we follow a link (w, w') leading to another rank group, or when w' = r, or when w = r (the final step), we account one cost unit to the find operation. Because there are only G(n) rank groups and 2 final steps, this allocates at most G(n) + 2 cost units to any find. Notice that this result holds independently of the path-contraction we are performing.

For following all other links (that is all non-final links within a rank group), the cost unit is accounted to the node w and not to the find operation. In order to bound these costs over the course of the operations, we need that we apply path-contraction. Because of this, we know that w will get a new link, leading to a node w'' higher in the tree. By our earlier proof, we know that the rank of w'' is strictly larger than the rank of w'. Thus, if w belongs to rank group g, we are allocating at most F(g) - F(g - 1) cost units to w under this rule. Notice that the above argument applies just as well to the alternative path-contraction technique in which the path length is only halved.

The total cost is now bounded by

m * (G(n) + 2) + sum_{g = 0}^{G(n) - 1} #{nodes in rank group g} * (F(g) - F(g - 1))
The number of nodes in rank group g, starting with rank F(g - 1) + 1, can easily be estimated because we know that there are at most n / 2^r nodes with rank r. This gives that there are at most sum_{r = F(g - 1) + 1}^{F(g)} n / 2^r <= n / 2^{F(g - 1)} nodes in rank group g. So, our formula becomes
m * (G(n) + 2) + sum_{g = 0}^{G(n) - 1} n * (F(g) - F(g - 1)) / 2^{F(g - 1)}
Dropping the subtracted F(g - 1) and absorbing the term 2 * m into the O-notation, we simplify this to
m * G(n) + n * sum_{g = 0}^{G(n) - 1} F(g) / 2^{F(g - 1)}
A clever choice is F(g) = 2^{F(g - 1)}. Then every term of the sum equals 1, so the left term gives m * log^* n, and the right term only n * G(n) = n * log^* n, which is at most m * log^* n because m >= n.

Exercises

  1. We consider array-based union-find for a set of 8 elements, using arrays a[] and b[] as in the examples of the text. As union strategy we either use first-to-second or by-size. The following union operations are performed: (3, 5), (5, 1), (1, 2), (0, 7), (3, 1), (2, 5), (7, 6). For each of the union strategies, give the complete sequence of resulting a[] and b[] values and indicate for each operation the number of renamings.

  2. We consider tree-based union-find for a set of 8 elements, using an array a[] as in the examples of the text. As union strategy we either use first-to-second or by-size. The following union operations are performed: (3, 5), (5, 1), (1, 2), (0, 7), (3, 1), (2, 5), (7, 6). For each of the union strategies, give the complete sequence of resulting a[] values and also draw the corresponding sets of trees.

  3. For array-based union-find applying union-by-size it was shown that each element could be renamed at most log n times and that therefore the total number of renamings is bounded by n * log n. In the given example the number of renamings is bounded by n / 2 * log n. Prove that this latter value is sharp. That is, show that the maximum number of renamings is bounded by n / 2 * log n. You may assume that n = 2^l for some positive l. Hint: use an argument involving a potential function. Denoting the size of the subset of element i by n_i, the potential of a set of subsets is given by sum_{0 <= i < n} log (n / n_i) / 2. Compute the potential for the initial and the final situation and show that the number of renamings during any union operation is bounded by the decrease of the potential.

  4. Rewrite the tree-based union-find without path contraction so that the union is performed by depth and not by size. Prove that also in this case the depth is bounded by O(log n).

  5. We consider tree-based union-find for a set of 10 elements. As union strategy we use first-to-second. For the finds we use path contraction. The following union operations are performed: (3, 5), (9, 4), (5, 2), (8, 4), (0, 7), (7, 4), (4, 1), (1, 2), (2, 6). Draw the resulting tree. Now the following find operations are performed: 4, 5 and 0. Draw the tree after each operation.

  6. We consider tree-based union-find with first-to-second union strategy and finds with path contraction. The path contraction hooks all nodes on the search path directly to the root of the tree. One time unit is counted for each traversed tree link. For a set of n elements, give a sharp upper bound for the time of any choice of m consecutive find operations. So, these finds are not interrupted by unions. Prove the correctness of the given bound. For which m is the amortized time per find operation constant?

  7. We consider tree-based union-find with first-to-second union strategy and finds with path contraction. For the path contraction the alternative strategy is applied, hooking each node on the search path to its grandparent. One time unit is counted for each traversed tree link. For a set of n elements, give a sharp upper bound for the time of any choice of m >= n consecutive find operations. So, these finds are not interrupted by unions. Prove the correctness of the given bound. For which m is the amortized time per find operation constant?

  8. We consider tree-based union-find with first-to-second union strategy and finds with path contraction. The path contraction hooks all nodes on the search path directly to the root of the tree. Construct a sequence of O(n) unions and finds taking Omega(n * log n) time. Hint: first look for trees of depth k = 3, 4, ..., which more or less are found back after one union and one find.

  9. We consider tree-based union-find with first-to-second union strategy and finds with path contraction. For a set of n elements, the following unions are performed: (i, i + 1), for all 0 <= i <= n - 2. Show the resulting tree for n = 16. The path contraction is performed with the alternative method, traversing the path to the root only once. Draw the resulting trees after performing find(4) and find(0). In general, when performing find(k) for an element k lying at distance d from the root of its tree, at what distance does it exactly lie after the find operation?

  10. Describe a schedule of unions and finds, performed by-rank and with path compression, leading to a somewhat larger time consumption. Hint: Start with a very basic goal, like "how can we assure that during n finds at least 2 * n links are traversed?".

  11. The given program can be used for testing the performance of the four combinations of union-by-size and path contraction:

    First build tree structures, performing n - 1 union operations by picking at any time two of the remaining roots at random. Then perform k * n find instructions. Count the number of links traversed until reaching the roots. The cost measure is the average number per operation over the last n find instructions.

    Perform the above tests for one given large n, for example n = 2^25 and consider how the numbers develop for k = 1, 2, 3, ... . Perform the above tests for k = 1 and n = 2^x, for x = 10, 11, 12, ... . The experiments for small x must be repeated sufficiently often. Plot the results as a function of x. Which strategy appears to be the best choice in practice considering both performance and overhead?

  12. When performing tree-based union-find, it is no longer possible to efficiently print an overview of the elements in all subsets. In the given program, the method print has high complexity. Specify this complexity as a function of n, the number of elements. Now write a procedure, either in Java or in pseudo-code, which computes out of the information available in the array a[] an array b[]. b[] is defined as follows: b[i] contains the successor of i in a circular list containing all elements of the subset to which i belongs. The given procedure should have linear complexity.





Sorting

Introduction

Sorting is one of the richest topics in the theory of algorithms. For many different applications and cost models, hundreds of algorithms and implementations have been designed. Sorting is needed for all kinds of operations, in particular to get objects with the same key in consecutive positions, for example when bundling mail to the same address. Such applications do not really need sorting (which implies a total order), but even for this "simpler" problem we do not know more efficient solutions (though hashing with chaining works well in practice).

Sorting is one of the few problems for which a non-trivial lower bound can be proven relatively easily. We will do this, showing that under reasonable assumptions sorting requires at least Omega(n * log n) time. The remarkable thing is that at the same time we know algorithms running in O(n * log n). So, the sorting problem has been solved in the sense that there are optimal algorithms, algorithms whose running time matches the lower bound.

Orderings

The only requirement for being able to sort a set of objects is that the set S from which their keys are coming is strongly ordered. That is, on S there should be defined a transitive relation "<", so that for any two elements x, y in S, with x != y, either x < y or x > y. The simplest case is that "<" stands for the operator with this symbol over one of the numerical types. But an ordering can also be defined on characters and by extension on strings. Strings can be ordered lexicographically in a recursive way:
  boolean isSmaller(String s1, String s2) {
    if (s1.isEmpty() && s2.isEmpty()) // equal strings
      return false;
    if (s1.isEmpty()) // shorter string with common prefix
      return true;
    if (s2.isEmpty()) // longer string with common prefix
      return false;
    char c1 = s1.charAt(0); // first character of s1
    char c2 = s2.charAt(0); // first character of s2
    if (c1 == c2) // common first character
      return isSmaller(s1.substring(1), s2.substring(1)); // compare remainders
    return c1 < c2; } // compare first character

The lexicographical ordering is not limited to strings. The only special thing about strings is that they may have arbitrary length. More generally an ordering can be defined on a product set S_1 x S_2, whenever there is an ordering both on S_1 and on S_2. These do not need to be the same. The ordering relations are denoted "<_1" and "<_2", respectively. The ordering "<" is then defined by: (x_1, x_2) < (y_1, y_2) if and only if (x_1 <_1 y_1) or (x_1 == y_1 and x_2 <_2 y_2). If a product of more than two sets S_1 x S_2 x ... x S_n is interpreted as S_1 x (S_2 x ... x S_n), the definition of the lexicographical ordering extends in a natural way.
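
As an illustration, take S_1 to be the integers with their usual order and S_2 the strings with the lexicographical order. The following sketch compares two such pairs; the class name PairOrder and the choice of component types are assumptions made for this example.

```java
// Sketch: lexicographical ordering on pairs (int, String).
// PairOrder is a hypothetical name; the component types are chosen for illustration only.
class PairOrder {
  // (x1, x2) < (y1, y2) if and only if x1 <_1 y1, or x1 == y1 and x2 <_2 y2.
  static boolean isSmaller(int x1, String x2, int y1, String y2) {
    return x1 < y1 || (x1 == y1 && x2.compareTo(y2) < 0);
  }

  public static void main(String[] args) {
    System.out.println(isSmaller(1, "b", 2, "a")); // first components decide
    System.out.println(isSmaller(2, "a", 2, "b")); // tie on the first, second components decide
    System.out.println(isSmaller(2, "b", 2, "b")); // equal pairs are not smaller
  }
}
```

Here String.compareTo plays the role of "<_2"; it compares strings character by character in the same way as the recursive isSmaller for strings.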

Definition

The most common case is that one has an array a[] of objects of some non-elementary type. These objects have keys of a strongly ordered type. The sorting problem is to rearrange the objects in the array so that afterwards a[i].key <= a[j].key for all i < j. The task is to do this as efficiently as possible, minimizing the time and memory consumption.

The objects in the array may be large (for example, records of personal data). Fortunately, this does not need to bother us: an array of objects actually consists of an array of pointers to the objects. Thus, independently of their size two objects can be swapped in constant time:

  if (a[i].key > a[j].key)
  {
    myClass x = a[i];
    a[i]      = a[j];
    a[j]      = x;
  }
Alternatively one might copy all instance variables one-by-one, but for objects with many instance variables this would be much more expensive.

Scope and Goal

Because of the above observation there is no fundamental difference between sorting arrays of integers and sorting arrays of any other type. Therefore, without loss of generality, we will focus on sorting arrays of integers in the remainder of the chapter.

The most common cost measure is the number of comparisons made. One might also count the number of assignments, but because essentially these cost the same as comparisons, it does not make sense to reduce the number of assignments at the expense of substantially more comparisons.

So, the main goal in this chapter is to find sorting algorithms that minimize the number of comparisons. Only at the end we will encounter a special case which can be solved by counting rather than comparing.

Bubble Sort and Variants

Bubble Sort

For sorting an array of given length n, we can apply the following very simple piece of code:
  void sort(int[] a, int n)
  {
    for (int r = n - 1; r > 0; r--)
      for (int i = 0; i < r; i++)
        if (a[i] > a[i + 1])
          {
            int x    = a[i];
            a[i]     = a[i + 1];
            a[i + 1] = x;
          }
  }


The underlying algorithm is called bubble sort. The name comes from the bubble-like way the elements move through the array. The algorithm is correct because the following invariant property holds:

Lemma: At the end of round r, n >= r >= 0, the largest n - r elements have reached their destination positions.

Proof: The proof goes by induction. For r == n, it is clearly true. So, assume the statement holds for a given r. Because the largest n - r elements are already positioned correctly and are not addressed any more, they are of no importance: we are just working on the subarray of r elements. Consider the maximum m of this subarray. Assume at the beginning of round r - 1 it is located in a[j]. When i == j, m is exchanged and put at position i + 1. Then i is increased and m is exchanged again. In this way it bubbles until it reaches position r - 1 at the end of round r - 1, proving the induction step. End.

Thus, at the end of round 1, the largest n - 1 elements have reached their destination positions. For the smallest element, this leaves no place to stay but position 0. Thus, by then all elements have reached their destination positions.

Sorting 10 Elements with Bubble Sort

Bubble sort is simple and correct, but relatively inefficient: The number of comparisons equals (n - 1) + (n - 2) + ... + 1 = n * (n - 1) / 2 = Theta(n^2). For an array sorted in reversed order (that is: in decreasing order) each comparison results in a rearrangement, requiring three assignments. By testing whether still something is happening, one might save a few rounds, but for any arrangement with the smallest element in position n - 1 the n - 1 rounds are really required, because elements move at most one position to the left in every round.
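
The test whether something is still happening can be sketched as follows. The method returns the number of rounds performed, so the effect of the input arrangement can be observed directly; the class name BubbleRounds and the flag exchanged are assumptions made for this illustration.

```java
// Sketch: bubble sort that stops after the first round without an exchange.
// BubbleRounds and exchanged are hypothetical names for illustration.
class BubbleRounds {
  // Sorts a[] and returns the number of rounds that were performed.
  static int sort(int[] a) {
    int rounds = 0;
    for (int r = a.length - 1; r > 0; r--) {
      boolean exchanged = false;
      for (int i = 0; i < r; i++)
        if (a[i] > a[i + 1]) {
          int x = a[i];
          a[i] = a[i + 1];
          a[i + 1] = x;
          exchanged = true;
        }
      rounds++;
      if (!exchanged)
        break; // nothing moved, so the array is sorted
    }
    return rounds;
  }

  public static void main(String[] args) {
    System.out.println(sort(new int[] {1, 2, 3, 4, 5})); // already sorted
    System.out.println(sort(new int[] {5, 1, 2, 3, 4})); // largest element in front
    System.out.println(sort(new int[] {2, 3, 4, 5, 1})); // smallest element at the end
  }
}
```

A large element in front is repaired in a single round, whereas a small element at the end moves only one position to the left per round and forces all n - 1 rounds.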

One can easily invent variants of bubble sort. In the following we present some of these. All of them have quadratic running time, but they may be faster by a constant factor.

Odd-Even Transposition Sort

In bubble sort the comparisons run in waves from left to right, waves that become shorter towards the end of the algorithm. It is also possible to perform n rounds with about n / 2 comparisons in each of them. In the "even" rounds the algorithm compares all pairs a[2 * i] and a[2 * i + 1], for all i, 0 <= i < n / 2. In the "odd" rounds the algorithm compares all pairs a[2 * i + 1] and a[2 * i + 2], for all i, 0 <= i < n / 2 - 1. This algorithm is called odd-even transposition sort.
  void sort(int[] a, int n)
  {
    for (int r = 0; r < n; r++)
      for (int i = r & 1; i + 1 < n; i += 2)
        if (a[i] > a[i + 1])
          {
            int x    = a[i];
            a[i]     = a[i + 1];
            a[i + 1] = x;
          }
  }

Click here to see the above piece of code integrated in a working Java program.

The number of operations is about the same as with bubble sort. The correctness is not obvious. It can be proven most easily by arguing that any sequence consisting of only zeroes and ones gets sorted correctly. Then applying the so-called "zero-one sorting lemma" gives the result: an algorithm of this kind, in which the positions that are compared do not depend on the data, sorts all inputs if it sorts all inputs consisting of zeroes and ones. This proof can be found in most textbooks on parallel algorithms.
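
For a fixed small n, the claim about sequences of zeroes and ones can at least be checked by brute force. The following is only a sketch of such a check: the class name ZeroOneCheck and the method sortsAllZeroOne() are assumptions made for illustration, and a check for some values of n is of course no proof for all n.

```java
class ZeroOneCheck
{
  // Odd-even transposition sort as given above
  static void sort(int[] a, int n)
  {
    for (int r = 0; r < n; r++)
      for (int i = r & 1; i + 1 < n; i += 2)
        if (a[i] > a[i + 1])
        {
          int x    = a[i];
          a[i]     = a[i + 1];
          a[i + 1] = x;
        }
  }

  // Tries all 2^n inputs of length n consisting of zeroes and ones
  static boolean sortsAllZeroOne(int n)
  {
    for (int bits = 0; bits < (1 << n); bits++)
    {
      int[] a = new int[n];
      for (int i = 0; i < n; i++)
        a[i] = (bits >> i) & 1; // bit i of bits becomes a[i]
      sort(a, n);
      for (int i = 0; i + 1 < n; i++)
        if (a[i] > a[i + 1])
          return false; // found an input that is not sorted
    }
    return true;
  }

  public static void main(String[] args)
  {
    for (int n = 1; n <= 12; n++)
      System.out.println("n = " + n + ": " + sortsAllZeroOne(n));
  }
}
```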

In odd-even transposition sort the order in which the operations within a round are performed does not matter, because they act on disjoint pairs of positions. This means that on a computer with more than one processor these operations can be executed in parallel. It also means that it is very easy to restructure the inner loop so that it can be executed faster on a modern processor with deep pipelines and some kind of parallelism in the processor itself.

Sorting 10 Elements with Transposition Sort

Selection Sort

The main effect of a bubble sort round is to transport the largest remaining element to the rightmost position. Some other elements are also rearranged, but this is not exploited. However, there is no need to move this largest element step by step: we can just as well first determine the maximum, and then with a single exchange move it to its destination position. This sorting algorithm is called selection sort.
  void sort(int[] a, int n)
  {
    for (int r = n - 1; r > 0; r--)
    {
      int x = 0;
      for (int i = 1; i <= r; i++)
        if (a[i] > a[x])
          x = i;
      int y = a[x]; 
      a[x] = a[r]; 
      a[r] = y;
    }
  }

Click here to see the above piece of code integrated in a working Java program.

The number of comparisons is the same as before, but the maximum number of assignments is now reduced to n^2 / 2 + O(n). For randomly permuted elements, the number of assignments is even much smaller (it is bounded by O(n * log n)). This means that in practice the time consumption is determined by the comparisons. This is better than with the basic algorithm, because there a large fraction of the comparisons (about half of them for random inputs) results in an exchange.

Sorting 10 Elements with Selection Sort

Insertion Sort

Another variant works also in rounds, but differently: after round r, all numbers in the positions r, ..., n - 1 are sorted, while the other numbers have not yet been touched at all. It is called insertion sort because repeatedly a new number is inserted in an already sorted sequence.

In an implementation it is handy to first determine the maximum element and place it in the last position. This element can then be used as a sentinel: having the largest element in place implies that there is no possibility to run off the end of the array, so it is not necessary to test whether the indices are still valid. After the maximum element has been swapped to position n - 1 of a[], we can be sure that a[n - 2] <= a[n - 1], and therefore we can skip round n - 2.

  void sort(int[] a, int n)
  {
    int x = 0;
    for (int i = 1; i < n; i++)
      if (a[i] > a[x])
        x = i;
    int y = a[x]; 
    a[x] = a[n - 1]; 
    a[n - 1] = y;
    for (int r = n - 3; r >= 0; r--)
    {
      x = a[r];
      int i = r;
      while (x > a[i + 1])
      {
        a[i] = a[i + 1];
        i++; 
      }
      a[i] = x;
    }
  }

Click here to see the above piece of code integrated in a working Java program.

If the array is sorted in the wrong order, the new number must be compared with all existing numbers. In that case the number of comparisons and assignments is about n^2 / 2. For a random input, the new number only goes half-way on average. So, for random inputs the expected number of comparisons and assignments is about n^2 / 4, and the number of passes through a loop tends to be only half as large as for selection sort. As a result, in practice the algorithm runs almost twice as fast. Of course the number of comparisons of elements can easily be reduced to O(n * log n) by determining the position of the new number with binary search. However, this complicates the algorithm and some loop condition must be tested anyway, so this will be profitable only when comparing keys is more expensive than comparing integers.

Sorting 10 Elements with Insertion Sort
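
The variant with binary search might look as follows. This is a sketch under assumptions: the class name BinaryInsertionSort is hypothetical, and the sentinel trick from above is dropped, because the binary search keeps its own bounds. The number of comparisons of elements is O(n * log n), but the number of assignments is still quadratic in the worst case.

```java
class BinaryInsertionSort
{
  static void sort(int[] a, int n)
  {
    // Before round r, the positions r + 1, ..., n - 1 are sorted
    for (int r = n - 2; r >= 0; r--)
    {
      int x = a[r];
      // Binary search: afterwards lo is the first position in
      // r + 1, ..., n - 1 with a[lo] >= x, or n if there is none
      int lo = r + 1, hi = n;
      while (lo < hi)
      {
        int mid = (lo + hi) >>> 1;
        if (a[mid] < x)
          lo = mid + 1;
        else
          hi = mid;
      }
      // Shift the smaller elements one position to the left
      for (int i = r; i < lo - 1; i++)
        a[i] = a[i + 1];
      a[lo - 1] = x;
    }
  }

  public static void main(String[] args)
  {
    int[] a = {40, 7, 85, 12, 7, 1};
    sort(a, a.length);
    System.out.println(java.util.Arrays.toString(a));
  }
}
```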

Importance

The alternative algorithms might be somewhat better than the basic one, but they are still not good. Even on a fast computer they will take on the order of an hour to sort 1,000,000 integers. Applying quick sort this takes on the order of a second. From this one might conclude that bubble sort and its variants are meaningless. This is not true for several reasons:

Merge Sort

A much better, and still simple, procedure is merge sort. It is based on the notion of merging, which is important in its own right. Merging is the procedure of turning two sorted arrays of length n_1 and n_2, respectively, into one array of length n_1 + n_2 with all elements in sorted order. Thus, out of (7, 12, 18, 40, 41, 47, 85) and (1, 3, 5, 43, 45, 49), with n_1 = 7 and n_2 = 6, we should make (1, 3, 5, 7, 12, 18, 40, 41, 43, 45, 47, 49, 85). Of course this could be achieved by any sorting procedure, but that would not be clever, as merging is much easier: the fact that the two arrays are already sorted should be exploited.

Merging

Merging is simple. The following gives the basic idea:
  static int[] merge(int[] a, int[] b)
  {
    int i, j, k, n = a.length + b.length;
    int[] c = new int[n];
    for (i = j = k = 0; k < n; k++)
    {
      if (a[i] <= b[j])
      {
        c[k] = a[i];
        i++;
      }
      else
      {
        c[k] = b[j];
        j++;
      }
    }
    return c;
  }

One has to be careful with the end of the arrays a[] and b[]. In the current routine we may run beyond it. A handy idea is to add a sentinel, an element with key infinity, at the end of each of them. This prevents going beyond the ends, because all real elements in the arrays have smaller key values and are thus written first. As an alternative, one can check again and again whether i < a.length and j < b.length. Once this is no longer true, one should copy the possible remainder of a[] or b[] to c[].
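
The alternative with explicit tests on i and j can be sketched as follows. The class name MergeDemo is an assumption made for illustration; the method follows the basic routine above, with the remainder of a[] or b[] copied at the end.

```java
class MergeDemo
{
  static int[] merge(int[] a, int[] b)
  {
    int i = 0, j = 0, k = 0;
    int[] c = new int[a.length + b.length];
    // As long as both arrays have elements left, write the smaller one
    while (i < a.length && j < b.length)
      if (a[i] <= b[j])
        c[k++] = a[i++];
      else
        c[k++] = b[j++];
    // At most one of the following two loops does any work
    while (i < a.length)
      c[k++] = a[i++];
    while (j < b.length)
      c[k++] = b[j++];
    return c;
  }

  public static void main(String[] args)
  {
    int[] a = {7, 12, 18, 40, 41, 47, 85};
    int[] b = {1, 3, 5, 43, 45, 49};
    System.out.println(java.util.Arrays.toString(merge(a, b)));
  }
}
```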

Merging takes linear time: the algorithm writes one element for every comparison performed. The last comparison can actually be saved, because as soon as there is only one non-sentinel left, it can be written away without comparison. The operation would be perfect if we did not need the extra space. Theoretically this is not so beautiful, and practically it means that we need memory access to twice as many data. These data must be brought into the cache, which takes some extra time. It is quite likely that this time for bringing data into the cache dominates the total costs, and thus this may take almost twice as long as a comparable routine not requiring additional space. If the merge procedure is called many times, the allocation and deallocation may be expensive too.

Sorting

Now that we know about merging, it is not hard to construct a sorting algorithm. We start with sorted arrays of length one. Then we apply the merging algorithm for k = log_2 n rounds to all pairs of adjacent subarrays. In round r, 0 <= r < k, we are merging n / 2^r subarrays each of size 2^r. So, each round consists of merging operations involving n elements in total and takes at most n comparisons. Thus, the whole algorithm runs in O(n * log n) time, which is optimal. The algorithm is easiest for n = 2^k, for some integral k > 0, but can easily be modified for other n:
  static void merge(int[] a, int b[], int l, int m, int h)
  {
    int i = l, j = m, k = l;
    while (i < m && j < h)
      if (a[i] <= a[j])
        b[k++] = a[i++];
      else
        b[k++] = a[j++];
    while (i < m)
      b[k++] = a[i++];
    while (j < h)
      b[k++] = a[j++];
    for (i = l; i < h; i++)
      a[i] = b[i];
  }

  static void sort(int[] a, int n)
  {
    int[] b = new int[n]; // scratch space
    for (int d = 1; d < n; d *= 2)
    {
      for (int l = 0; l < n; l += 2 * d)
        if (l + 2 * d <= n)
          merge(a, b, l, l + d, l + 2 * d);
        else if (l + d < n)
          merge(a, b, l, l + d, n);
    }
  }

Merge Sort

Practical Remarks

It is a good idea to create the additional array of length n once at the start of the algorithm and to use it in all passes of the loop, letting the two arrays exchange roles after every round; this saves unnecessary copying operations. In Java this is somewhat clumsy to realize, because exchanging the local references a and b does not change which array the caller holds. However, by counting the number of performed rounds, it can be determined at the end whether the sorted data stand in the array that was originally passed; if not, they are copied back once. The following runs about 20% faster than the above:
  static void merge(int[] a, int b[], int l, int m, int h)
  {
    int i = l, j = m, k = l;
    while (i < m && j < h)
      if (a[i] <= a[j])
        b[k++] = a[i++];
      else
        b[k++] = a[j++];
    while (i < m)
      b[k++] = a[i++];
    while (j < h)
      b[k++] = a[j++];
  }

  static void sort(int[] a, int n)
  {
    int[] b = new int[n], c; // scratch space and dummy
    int r = 0;
    for (int d = 1; d < n; d *= 2)
    {
      r++;
      for (int l = 0; l < n; l += 2 * d)
        if (l + 2 * d <= n)
          merge(a, b, l, l + d, l + 2 * d);
        else if (l + d < n)
          merge(a, b, l, l + d, n);
        else
          for (int i = l; i < n; i++)
            b[i] = a[i];
      c = a; a = b; b = c; // swap a and b
    }
    if ((r & 1) == 1) // odd number of rounds
      for (int i = 0; i < n; i++)
        b[i] = a[i];
  }

Click here to see the above piece of code integrated in a working Java program.

The running time in practice depends on two factors: the number of operations performed and the amount of data that must be brought into the cache. The first is closely related to the number of comparisons made, the second is linear in the number of rounds performed. Most likely, it will be profitable to reduce the number of rounds even if this requires extra comparisons (within reasonable limits).

One idea is to start by sorting runs of some well chosen length by an asymptotically slower procedure that is faster for small numbers. Maybe one can find an optimal method for sorting all sequences of 8 elements. Then one saves three passes through the loop. If n = 1024, this means 7 passes instead of 10. In general, this idea reduces the number of rounds by a constant. If the initial sorting is implemented in an optimal way, this does not increase the number of comparisons.

With one comparison we can find the maximum of two elements. With two comparisons we can find the minimum of three elements:

  int minimum(int a, int b, int c)
  {
    if (a <= b) 
    // b is not the minimum
    { 
      if (a <= c)
        return a;
      return c;
    }
    // a is not the minimum
    if (b <= c)
      return b;
    return c;
  }

This can be used to merge three arrays into one: compare the current elements of the arrays and write away the smallest. Using this, we can repeatedly multiply the length of the sorted sequences by three and thus perform the whole sorting in log_3 n rounds. As log_3 n = log_2 n * ln 2 / ln 3 ~= 0.63 * log_2 n, this reduces the number of rounds by a constant factor. The number of comparisons has increased from about n * log_2 n to 2 * n * log_3 n, which is larger by a factor 2 * ln 2 / ln 3 ~= 1.26. This is so little that it will certainly be profitable to perform this 3-way merge. One can push further and apply 4-way or even multi-way merge. In the latter case, the first elements of the arrays to be merged are inserted in a priority queue and one repeatedly calls deleteMin().
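
A multi-way merge with a priority queue can be sketched as follows. This is only a sketch under assumptions: the class name MultiwayMerge and the representation of the runs as an int[][] are chosen for illustration, and java.util.PriorityQueue is used in place of the priority queue of the lecture, with poll() playing the role of deleteMin().

```java
import java.util.PriorityQueue;

class MultiwayMerge
{
  // Merges the sorted arrays runs[0], runs[1], ... into one sorted array
  static int[] merge(int[][] runs)
  {
    // An entry {t, p} stands for element runs[t][p]; entries are ordered by key
    PriorityQueue<int[]> pq = new PriorityQueue<>(
      (x, y) -> Integer.compare(runs[x[0]][x[1]], runs[y[0]][y[1]]));
    int n = 0;
    for (int t = 0; t < runs.length; t++)
    {
      n += runs[t].length;
      if (runs[t].length > 0)
        pq.add(new int[] {t, 0}); // first element of run t
    }
    int[] c = new int[n];
    for (int k = 0; k < n; k++)
    {
      int[] e = pq.poll(); // smallest current element of all runs
      c[k] = runs[e[0]][e[1]];
      if (e[1] + 1 < runs[e[0]].length)
        pq.add(new int[] {e[0], e[1] + 1}); // successor in the same run
    }
    return c;
  }

  public static void main(String[] args)
  {
    int[][] runs = {{1, 4, 7}, {2, 5, 8}, {}, {3, 6, 9}};
    System.out.println(java.util.Arrays.toString(merge(runs)));
  }
}
```

With m runs, every written element costs O(log m) comparisons inside the queue, so the extra comparisons grow only slowly with the number of runs merged at once.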

Quick Sort

Merge sort is a typical bottom-up algorithm: starting with small problems, larger and larger problems can be solved. It can also be written in a recursive way, but even then it ends by merging two arrays of length n / 2. Sorting can also be done in a top-down manner:
  1. Select a splitter value s.
  2. Determine the sets of elements which are smaller and larger than s.
  3. Sort each of the sets recursively.
The algorithm based on this idea is called quick sort. It is one of the most efficient sorting algorithms. The idea of splitting a problem and then solving the resulting smaller problems is also called a divide-and-conquer approach.

Algorithm

The simple idea of the algorithm is sketched above. The main theoretical question is how to select the splitter s. It is not good to choose a fixed value, because then all elements in the set S which has to be sorted may be smaller or larger. It is better to select an element from S itself. A good idea is to select the splitter s uniformly at random. We start with a simple pseudo-code work-out:
  void sort(int[] a, int n)
  // sort n elements standing in positions 0, ..., n - 1 of a[]
  {
    if (n > 1)
    {
      int i;
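      int[] a_smaller = new int[n], a_equal = new int[n], a_larger = new int[n];
      int n_smaller = 0, n_equal = 0, n_larger = 0;
      // s = a[x] for a uniformly random index x, 0 <= x < n
      int s = a[java.util.concurrent.ThreadLocalRandom.current().nextInt(n)];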

      // Split the set of elements in three subsets using s
      for (i = 0; i < n; i++)
        if      (a[i] < s)
          a_smaller[n_smaller++] = a[i];
        else if (a[i] == s)
          a_equal[n_equal++] = a[i];
        else
          a_larger[n_larger++] = a[i];

      // Solve two recursive subproblems
      sort(a_smaller, n_smaller);
      sort(a_larger, n_larger);

      // Combine the results
      for (i = 0; i < n_smaller; i++)
        a[i] = a_smaller[i];
      for (i = 0; i < n_equal; i++)
        a[i + n_smaller] = a_equal[i];
      for (i = 0; i < n_larger; i++)
        a[i + n_smaller + n_equal] = a_larger[i];
    } 
  }

Quick Sort

Analysis

For a number s from a set S, the rank equals the number of elements in S smaller than s. The best case for quick sort is that the splitter s has rank n / 2: in that case two subproblems of half the original size result. An element with rank n / 2 in a set with n elements is called a median of S. If by chance all selected splitters are medians in their sets, then the size of the treated problems is reduced by a factor two every round, and we thus have a recursion tree of depth log_2 n. Because at every level of the recursion all elements are involved in only one sort operation, the time consumption is bounded by O(n * log n) in this case.

The worst case is that the rank of s lies close to 0 or n. In that case, only a few elements are split off. If this happens again and again, we get a recursion tree of depth Omega(n), and the total running time becomes Theta(n^2).

Quick Sort Time Consumption

What happens really? Clearly we cannot hope to choose medians every time. But 50% of the selected splitters s are good, in the sense that n/4 < rank(s) < 3/4 n. This probability is independent of n. Every time we hit a good splitter, the problem size is reduced to at most 3 / 4 of its previous size. So, the sequence of recursions in which an element x is involved certainly terminates after log_{4/3} n good splitters have been selected from the sets to which it belongs. Even the splitters which are not good give some reduction of the problem size, so the expected number of recursive levels in which x is involved is bounded by 2 * log_{4/3} n. Thus, the expected time spent on comparing and rearranging x is O(log n). Because of the so-called "linearity of expectation", the expected time for the whole algorithm equals the sum of the expected times spent for each element. Thus, the expected running time of the quick-sort algorithm is bounded by O(n * log n). A more careful analysis shows that the probability to need substantially more than the expected time is very small.

Practical Remarks

A great improvement is not to just select any element as a splitter, but to invest slightly more here, with the (reasonable) hope that then the splitter gives more even splitting, reducing the number of rounds by a constant factor (say from 2 * log n to 1.1 * log n). The easiest, and sufficiently effective, idea is to do the following:
  1. Randomly and uniformly select k presplitters.
  2. Somehow (for example, by sorting them) find the median s of the presplitters.
  3. Use s as splitter in the quick-sort algorithm.

In practice the most common strategy of this kind is to take "middle of 3" or "middle of 5". It is a good idea to adapt the value of k to n: the larger n, the more one can invest on selecting a good splitter.
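
For k = 3 the selection of the splitter can be sketched as follows. The class name MedianOfThree and the method names are assumptions made for illustration. In this sketch the three presplitters are drawn independently, so the same position may be drawn more than once.

```java
import java.util.Random;

class MedianOfThree
{
  // Median of three values, computed with minimum and maximum only
  static int median(int x, int y, int z)
  {
    int lo = Math.min(x, y), hi = Math.max(x, y);
    return Math.max(lo, Math.min(hi, z));
  }

  // Splitter for the positions l, ..., h of a[]:
  // the median of three randomly selected presplitters
  static int splitter(int[] a, int l, int h, Random random)
  {
    int x = a[l + random.nextInt(h - l + 1)];
    int y = a[l + random.nextInt(h - l + 1)];
    int z = a[l + random.nextInt(h - l + 1)];
    return median(x, y, z);
  }

  public static void main(String[] args)
  {
    int[] a = {47, 3, 85, 12, 41, 7, 18};
    System.out.println(splitter(a, 0, a.length - 1, new Random()));
  }
}
```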

Even more so than for merge sort, it is better not to perform the recursion until we have reached a subproblem of size 1: for very small subsets it is rather likely that the division is very uneven. It is much better to apply bubble sort for all sets with size smaller than some constant m. The optimal choice for m depends on the details of the hardware, but it will not be large. 10 might be reasonable. One can also switch to merge sort: because merge sort is efficient itself, this may be done already for quite large subproblems.

In-Situ Sorting

Why should one perform quick sort, if there is also merge sort? One of the reasons is that quick sort is even easier to program for all values of n. Another reason is that it is easier to make quick sort in-situ. A procedure is called in-situ, if for a problem of size n, it requires only n + o(n) space. An in-situ routine is more likely to run in cache and therefore may be expected to be faster. If the total available memory is finite, then an in-situ routine can be used to solve larger problems.

In the case of quick sort a simple modification makes the algorithm in-situ and even saves on the number of elements that are copied. The idea is not to use the additional arrays a_smaller[], a_equal[] and a_larger[], but to swap the elements in the array a[] itself. This can be realized by starting from both sides of the array, searching for elements which stand at the wrong side. If all elements have different values, the splitting may be implemented as shown in the following sorting routine:

  void sort(int[] a, int l, int h, Random random)
  // sort h - l + 1 elements standing in positions l, ..., h of a[]
  {
    if (h > l) // At least two elements
    {
      int low = l; 
      int hgh = h;
      int s   = a[l + random.nextInt(h - l + 1)]; // choose random element

      // Splitting
      while (low <= hgh) // <= so that a last unclassified element is examined too
      {
        while (low <= h && a[low] <= s)
          low++;
        while (hgh >= l && a[hgh] >  s)
          hgh--;
        if (low < hgh) // swap a[low] and a[hgh]
        {
          int x = a[low]; a[low] = a[hgh]; a[hgh] = x; 
          low++;
          hgh--;
        } 
      }

      // Recursion
      sort(a, l, hgh, random);
      sort(a, low, h, random);
    }
  }

The additional tests, low <= h and hgh >= l, are necessary, otherwise we might run out of the interval if s happens to be the smallest or largest value in this subarray.

After the splitting, low > hgh and the following holds:

This can be proven formally by showing that these properties hold as invariants at any time during the algorithm. Using an invariant is nothing but a special case of induction over the number of performed steps. Initially the properties hold, because for low == l and hgh == h the claims are void. In any pass of the main while-loop, low passes only elements which are smaller than or equal to s, hgh only elements which are larger than s. The increase of low and the decrease of hgh stop just in time. After swapping, a small element has moved to the left and a large element to the right, so performing low++ and hgh-- does not violate the invariants. Because finally low > hgh, all elements are classified and we can continue recursively.

In-situ Quick Sort

Same Elements

If there are elements with the same keys, then the above splitting is still correct. The problem is that elements with the same value are not taken out of the set. Thus, we will never reach a subset of size 1 and the algorithm will not terminate. There are several solutions:
  1. Make elements with the same key different by attaching the original index as a secondary sorting key. This costs extra time and memory.
  2. Treat elements with key equal to s as if they are larger in the loop in which low is increased, and as if they are smaller in the loop in which hgh is decreased. Doing this, these elements are unnecessarily swapped, but the main point is that they are not all going to end up at the same side. Click here to see this solution integrated in a working Java program.
  3. Allocate the keys with value s with probability 1/2 to either of the two subsets. This is very simple and works fine. However, if there are many elements with the same key, this idea requires many random bits which are time consuming to produce.
  4. The most elegant is to single out the subset of elements with key equal to s. That is, instead of a two-division, we achieve a three-division again. If this is done the algorithm becomes faster the more elements have the same value. If there are only k different values in total, the algorithm is guaranteed not to perform more than k rounds, thus certainly terminating in O(k * n) time.

As we have seen, the fourth solution is easy to implement if we use one array for each subset to create, but this is a waste. Even now we would like to perform the sorting in-situ. This is possible, though not as easily as before. It is doubtful whether in practice this is better than the second solution. The idea is to maintain 5 regions instead of 3. There are 4 variables, eql, low, hgh, eqh. At all times eql <= low <= hgh <= eqh. The following properties are maintained invariant:

For eql == low == 0, hgh == eqh == n - 1, the invariants trivially hold, so these are good values to start with. If we manage to increase low and decrease hgh until low = hgh + 1, all elements are classified. Then it remains to rearrange them so that the elements with value equal to s appear in the middle of the array. This is not hard to arrange.

As long as low < hgh, out of a situation in which the four invariants hold, a new such situation with larger low and/or smaller hgh can be established as follows:

  while (low <= hgh && a[low] <= s) // stay within the unclassified region
  {
    if (a[low] == s)
    {
      int x = a[low]; a[low] = a[eql]; a[eql] = x; 
      eql++;
    }
    low++;
  }
  while (hgh >= low && a[hgh] >= s) // stay within the unclassified region
  {
    if (a[hgh] == s)
    {
      int x = a[hgh]; a[hgh] = a[eqh]; a[eqh] = x; 
      eqh--;
    }
    hgh--;
  }
  if (low < hgh) 
  {
    int x = a[low]; a[low] = a[hgh]; a[hgh] = x; 
    low++;
    hgh--;
  }

In-situ Quick Sort with Equal Elements
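
Put together with the final rearrangement, a complete routine might look as follows. This is a sketch under assumptions, not the program of the lecture: the class name ThreeWayQuickSort, the helper swap() and the way the elements equal to s are moved to the middle are chosen for illustration. The two inner loops are bounded by low <= hgh and hgh >= low, so that neither index enters a region that is already classified.

```java
import java.util.Random;

class ThreeWayQuickSort
{
  static void swap(int[] a, int i, int j)
  {
    int x = a[i]; a[i] = a[j]; a[j] = x;
  }

  // Sorts the positions l, ..., h of a[]
  static void sort(int[] a, int l, int h, Random random)
  {
    if (h <= l)
      return;
    int s = a[l + random.nextInt(h - l + 1)];
    // Regions: [l, eql) == s, [eql, low) < s, [low, hgh] unclassified,
    //          (hgh, eqh] > s, (eqh, h] == s
    int eql = l, low = l, hgh = h, eqh = h;
    while (low <= hgh)
    {
      while (low <= hgh && a[low] <= s)
      {
        if (a[low] == s)
          swap(a, low, eql++);
        low++;
      }
      while (hgh >= low && a[hgh] >= s)
      {
        if (a[hgh] == s)
          swap(a, hgh, eqh--);
        hgh--;
      }
      if (low < hgh) // here a[low] > s and a[hgh] < s
        swap(a, low++, hgh--);
    }
    // Now low == hgh + 1; move the elements equal to s to the middle
    int smaller = low - eql, larger = eqh - hgh;
    int m = Math.min(eql - l, smaller);
    for (int t = 0; t < m; t++)
      swap(a, l + t, hgh - t);
    m = Math.min(h - eqh, larger);
    for (int t = 0; t < m; t++)
      swap(a, low + t, h - t);
    // Recursion on the elements smaller and larger than s only
    sort(a, l, l + smaller - 1, random);
    sort(a, h - larger + 1, h, random);
  }

  public static void main(String[] args)
  {
    int[] a = {3, 1, 2, 3, 1, 2, 2, 3, 1};
    sort(a, 0, a.length - 1, new Random());
    System.out.println(java.util.Arrays.toString(a));
  }
}
```

Because at least the splitter itself is excluded from both recursive calls, every subproblem is strictly smaller than the original one, also when all keys are equal.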

Lower Bound

Sorting for Small n

In practice sorting is a very fast operation. It is not linear time, but the constants are really good, and therefore, for all reasonable n, it will be faster than elaborate operations with smaller complexity (union-find is an exception, because it is also very simple). On a modern computer it is no problem to sort one million ints in less than a second.

Nevertheless, one might wonder whether O(n * log n) is optimal. For n = 2, one can sort with one comparison. For n = 3, one can find the smallest element with two comparisons, and one more comparison sorts the other two. For n = 4 it already becomes a nice puzzle to just use 5 comparisons. A convenient notation helps: sorting algorithms can concisely be presented by a set of arrows as in the following:

      a   b           a   b   c            a   b   c   d 
  1    <->             <->                  <->
  2                    <----->                      <->
  3                        <->              <----->
  4                                             <----->
  5                                             <->
The above shows how to sort 2, 3, 4 numbers with 1, 3, 5 comparisons. Every x <--> y denotes the two elements to compare. If x > y, then the elements are swapped. So, for any input sequence, at the bottom we should find the elements in sorted order, the smallest on the left. The notation with arrows can easily be converted into an algorithm:
  void swap(int[] a, int i, int j) {
    int z = a[i]; a[i] = a[j]; a[j] = z; }

  void sortTwo(int[] a) {
    if (a[0] > a[1]) swap(a, 0, 1); }

  void sortThree(int[] a) {
    if (a[0] > a[1]) swap(a, 0, 1);
    if (a[0] > a[2]) swap(a, 0, 2);
    if (a[1] > a[2]) swap(a, 1, 2); }

  void sortFour(int[] a) {
    if (a[0] > a[1]) swap(a, 0, 1);
    if (a[2] > a[3]) swap(a, 2, 3);
    if (a[0] > a[2]) swap(a, 0, 2);
    if (a[1] > a[3]) swap(a, 1, 3);
    if (a[1] > a[2]) swap(a, 1, 2); }

sortFour() is correct because a[0] holds the minimum after Step 3, because a[3] holds the maximum after Step 4, and the two middle elements are sorted in Step 5.
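
For such a small schedule the correctness can also be checked exhaustively. The following sketch tries all 4^4 sequences with values in {0, 1, 2, 3}, which includes all 24 permutations; the class name SortFourCheck and the method sortsAllInputs() are assumptions made for illustration.

```java
class SortFourCheck
{
  static void swap(int[] a, int i, int j) {
    int z = a[i]; a[i] = a[j]; a[j] = z; }

  // The schedule with 5 comparisons given above
  static void sortFour(int[] a) {
    if (a[0] > a[1]) swap(a, 0, 1);
    if (a[2] > a[3]) swap(a, 2, 3);
    if (a[0] > a[2]) swap(a, 0, 2);
    if (a[1] > a[3]) swap(a, 1, 3);
    if (a[1] > a[2]) swap(a, 1, 2); }

  // Tries all sequences of length 4 with values in {0, 1, 2, 3}
  static boolean sortsAllInputs()
  {
    for (int p = 0; p < 256; p++)
    {
      // two bits of p per position
      int[] a = {p & 3, (p >> 2) & 3, (p >> 4) & 3, (p >> 6) & 3};
      sortFour(a);
      for (int i = 0; i < 3; i++)
        if (a[i] > a[i + 1])
          return false;
    }
    return true;
  }

  public static void main(String[] args)
  {
    System.out.println(sortsAllInputs());
  }
}
```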

Decision Trees

For larger values of n, it is not automatic that one can construct such oblivious schedules, in which the choice of the performed comparisons does not depend on the outcome of previous comparisons. A really hard puzzle is to find a schedule with 7 comparisons for n = 5. Here it appears that in any schedule with just 7 comparisons, the choice of the later comparisons must depend on the outcome of the previous ones. But, anyhow, it can be done. So, we need 1, 3, 5, 7 comparisons for n = 2, 3, 4, 5. This looks like T(n) = 2 * n - 3. We did not succeed in finding a schedule with 9 comparisons for n = 6. Should we try harder? If one studies questions of this kind it is a good idea to start from both directions: one should try (maybe with the help of a computer) to find better schedules, but at the same time one should be aware that possibly the presumed theorem is not true, and therefore one should also try to prove a lower bound.

In this case the second approach is more successful: we prove that sorting requires Omega(n * log n) comparisons. For n numbers, there are n! arrangements. Our sorting algorithm must somehow decide which of them corresponds to the current input. We assume that the algorithm is only making comparisons, and is not performing operations on the values. By making a comparison, we can exclude some of the possible arrangements: comparing x = a[i] with y = a[j] either excludes all arrangements with x < y or all arrangements with x > y. So, every comparison gives a certain reduction of the number of possible arrangements. Only once we have reduced the number of possible arrangements to a single one, we may stop and output this single remaining arrangement as the sorted order.

Drawing the set of all possible arrangements at the top and then indicating how, by making comparisons, each of the n! possibilities is singled out gives a tree: the root represents the set of cardinality n!, the n! leaves represent the sets of cardinality 1, and the branching in each node is given by the comparisons. This tree is called a decision tree.

Decision-Tree for Sorting Four Numbers

Just like the decision tree, a sorting algorithm is a recipe on how to perform comparisons until only one possible arrangement remains. Actually there is a one-one correspondence between them. Algorithms can readily be turned into trees (though it would be no fun to actually do this for n > 5) and trees can be turned into algorithms with one if-statement for every internal node of the tree. For example, corresponding to the above decision tree we have the following algorithm:

  void sortFour(int A, int B, int C, int D) 
  {
    if (A < B)
      if (C < D)
        if (A < C)
          if (B < C)
            printf("The order is ABCD\n");
          else
            if (B < D)
              printf("The order is ACBD\n");
            else
              printf("The order is ACDB\n");
        else 
          ...
      else
        if (A < D)
          ...
        else
          ...
    else
      if (C < D)
        if (B < C)
          ...
        else
          ...
      else
        if (B < D)
          ...
        else
          ...
  }

Time Complexity

The time complexity of an algorithm is the time the algorithm takes on the hardest input. So, this is the maximum running time over all inputs. The decision tree gives a very explicit way of checking this: all inputs are considered and the maximum time, which is equated with the number of comparisons, corresponds to the depth of the tree. So, the above algorithm for sorting four numbers has a time complexity of 5 comparisons, even though for one third of the inputs the sorting is finished with 4 comparisons. The time complexity of a problem is the minimum complexity of all possible algorithms. So, this is the minimum of a set of maxima. Any algorithm provides an upper-bound on the complexity of a problem: the above algorithm shows that the complexity of sorting four numbers is at most 5 comparisons. Proving upper-bounds is therefore a rather concrete task. On the other hand, proving a lower-bound of x time units for a problem requires that one proves that there is no algorithm solving the problem in fewer than x time units. In general, this is much harder than proving upper-bounds, because the set of all algorithms is much harder to survey than the set of all inputs.

In the case of sorting we are quite fortunate: the one-one correspondence between comparison-based sorting algorithms and decision trees makes it much easier to abstract from the details of the algorithms. In the case of sorting it suffices to prove that there is no decision tree of depth smaller than x. Because we are working in a world of binary comparisons, the tree is binary (in the worst case that all keys are different). A decision tree for sorting n elements must have at least n! leaves. We know that a binary tree of depth k has at most 2^k leaves. So, for t_n, the minimum number of comparisons for sorting n numbers, we have the following condition

t_n >= round_up(log_2(n!)).

For small values of n, this formula gives the lower bounds 1, 3, 5, 7, 10 for n = 2, 3, 4, 5, 6. These bounds can be attained, so t_2 = 1, t_3 = 3, t_4 = 5, t_5 = 7, t_6 = 10. More in general we get:

Theorem: Sorting n numbers requires Omega(n * log n) comparisons.

Proof: n! = Prod_{i = 1}^n i > Prod_{i = n/2}^n i > (n/2)^{n/2}. Thus, log_2 n! > n/2 * (log_2 n - 1) = Omega(n * log n). Stirling's formula can be used to obtain a more accurate estimate: n! ~= (n/e)^n, thus log_2 n! ~= n * log_2(n / e) = n * (log_2 n - log_2 e) ~= n * (log_2 n - 1.44). End.

So, except for constant factors we now know how many comparisons are needed for sorting n numbers. At first one may believe that in principle one does not need more than round_up(log_2(n!)) comparisons, because a decision tree can always be constructed as follows: at any node choose the comparison that divides the set of remaining possibilities as evenly as possible in two. For all small values of n, this idea leads to trees with the minimal depth of round_up(log_2(n!)). However, for n = 12 this does not work: round_up(log_2(12!)) = 29, and it has been shown that 30 comparisons are necessary (and sufficient) for sorting 12 numbers.
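
The bound round_up(log_2(n!)) is easy to tabulate exactly. The following is a small sketch with a hypothetical class name SortingLowerBound; it uses that for an integer f >= 1 the smallest k with 2^k >= f equals the number of bits of f - 1.

```java
import java.math.BigInteger;

class SortingLowerBound
{
  // Returns round_up(log_2(n!)), the smallest k with 2^k >= n!
  static int lowerBound(int n)
  {
    BigInteger f = BigInteger.ONE;
    for (int i = 2; i <= n; i++)
      f = f.multiply(BigInteger.valueOf(i)); // f = i!
    return f.subtract(BigInteger.ONE).bitLength();
  }

  public static void main(String[] args)
  {
    for (int n = 2; n <= 12; n++)
      System.out.println("n = " + n + ": at least " + lowerBound(n) + " comparisons");
  }
}
```

For n = 2, ..., 6 this gives 1, 3, 5, 7, 10 and for n = 12 it gives 29, in agreement with the values in the text.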

Bucket Sort and Variants

Bucket Sort

Notwithstanding the lower bound of Omega(n * log n), it is sometimes possible to sort in O(n) time. This is efficient and simple. We are talking about the case of sorting integers with values in {0, ..., M - 1} (of course this may be shifted by a constant). This does not contradict the lower bound, because the keys are used as indices and not only compared.

First we consider the special case M == 2. So, all values in a[] are 0 or 1. This is an important special case: for example, the zeroes can stand for personal records of men, the ones for records of women. How should we sort? Quite easy: create two buckets of sufficient size, throw all zeroes in one bucket and all ones in the other. Finally copy all elements back to a[], first the zeroes, then the ones:

  void sort(int[] a, int n) 
  // All values in a[] are 0 or 1.
  {
    int[] b0 = new int[n];
    int[] b1 = new int[n];
    int c0 = 0, c1 = 0;

    // Throw ones and zeroes in different buckets
    for (int i = 0; i < n; i++)
      if (a[i] == 0)
      {
        b0[c0] = a[i];
        c0++;
      }
      else
      {
        b1[c1] = a[i];
        c1++;
      }

    // Write all elements back to a[]
    for (int i = 0; i < c0; i++)
      a[i]      = b0[i];
    for (int i = 0; i < c1; i++)
      a[i + c0] = b1[i];
  }

Click here to see this piece of code integrated in a working Java program. This is certainly a very fast sorting routine. For n = 10^6 it is 10 times faster than quick sort!

In the above example and in the following ones, there is a possibly confusing double usage of a[i]. In a comparison like "a[i] == 0", a[i] denotes the key value. In an assignment like "b0[c0] = a[i]", it stands for the object. Only because we are here sorting integers these are the same. In this case the sorting can be simplified even further: it suffices to count the number c0 of zeroes in a[], and then we can set the first c0 positions to 0 and the remaining ones to 1. But this is cheating: for the sake of simplicity we are considering here how to sort integers, but we always have applications in mind in which these are keys of larger objects.
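To make the distinction between key and object explicit, the following sketch applies the same two-bucket idea to records. The class Person with its fields key and name is hypothetical and only serves as an example of a larger object. Comparisons look at the key, assignments move the whole object.

```java
public class TwoBucketRecordSort {
    // Hypothetical record type: key is 0 or 1, name stands for the remaining data.
    static class Person {
        final int key;
        final String name;
        Person(int key, String name) { this.key = key; this.name = name; }
    }

    // Sorts a[] on the key; all keys are assumed to be 0 or 1.
    static void sort(Person[] a) {
        int n = a.length;
        Person[] b0 = new Person[n];
        Person[] b1 = new Person[n];
        int c0 = 0, c1 = 0;

        // The test uses the key, the assignment moves the object
        for (Person p : a)
            if (p.key == 0)
                b0[c0++] = p;
            else
                b1[c1++] = p;

        // Write all objects back to a[], first the zeroes, then the ones
        for (int i = 0; i < c0; i++)
            a[i] = b0[i];
        for (int i = 0; i < c1; i++)
            a[i + c0] = b1[i];
    }

    // Helper for inspection: the names in their current order.
    static String names(Person[] a) {
        StringBuilder sb = new StringBuilder();
        for (Person p : a)
            sb.append(p.name).append(' ');
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        Person[] a = {
            new Person(1, "Ann"), new Person(0, "Bob"),
            new Person(1, "Cem"), new Person(0, "Dan")
        };
        sort(a);
        System.out.println(names(a));
    }
}
```

Here the counting shortcut is no longer available: knowing that there are two zeroes does not tell us which two objects belong in front.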

Suppose now that M is a small constant. How do we extend the above idea? Still simple: create M buckets and throw the elements with value i in bucket i; finally write them back to a[] bucket by bucket. Now it becomes handy to use the key values as indices instead of using a chain of if-statements. So, this is an application of direct addressing.

  void sort(int[] a, int n, int M) 
  // All values in a[] are in {0, ..., M - 1}
  {
    int[][] b = new int[M][n];
    int[]   c = new int[M];
    for (int j = 0; j < M; j++)
      c[j] = 0;

    // Throw elements in buckets according to values
    for (int i = 0; i < n; i++)
    {
      int j = a[i]; // key used as index
      b[j][c[j]] = a[i];
      c[j]++;
    }

    // Write all elements back to a[]
    int s = 0;
    for (int j = 0; j < M; j++)
    {
      for (int i = 0; i < c[j]; i++)
        a[i + s] = b[j][i];
      s += c[j];
    }
  }

Click here to see this piece of code integrated in a working Java program. For M == 2, the performance is as good as before, but for larger M it deteriorates.

What is worse, is that the algorithm needs n + M + M * n memory. This is acceptable only for very small M. An algorithm is said to be efficient if its consumption of a resource (time or memory) exceeds that of the best solution known by at most a constant factor. In this formal sense the above algorithm is not memory-efficient for M = omega(1), that is, for non-constant M.

The problem of excessive memory usage can be overcome at the expense of some more operations. If we first count the number n_j of elements with value j, 0 <= j < M, then b[j][] can be defined to be of length n_j. Because sum_j n_j = n, this is memory-efficient even for larger M. It is even more efficient to arrange all these buckets after each other in a single array b[]. This leads to the following efficient implementation:

  void sort(int[] a, int n, int M) 
  {
    int b[] = new int[n];
    int c[] = new int[M];

    // Count the number of occurrences of each value
    for (int i = 0; i < M; i++) 
      c[i] = 0;
    for (int i = 0; i < n; i++)
      c[a[i]]++;

    // Determine the starting points of the buckets
    int sum = 0;
    for (int i = 0; i < M; i++) // start at 0, so that bucket 0 begins at position 0
    {
      int dum = c[i];
      c[i] = sum;
      sum += dum;
    }

    // Throw elements in buckets according to values
    for (int i = 0; i < n; i++)
    {
      b[c[a[i]]] = a[i]; 
      c[a[i]]++;
    }

    // Write all elements back to a[]
    for (int i = 0; i < n; i++)
      a[i] = b[i];
  }

Click here to see this piece of code integrated in a working Java program. For M == 2, the performance is even somewhat better than the above versions: saving memory often goes hand-in-hand with saving time. In C we can save the final copying from b[] to a[] by passing a[] by reference and, instead of the loop, making the assignment *a = b.

The above sorting algorithm is known as bucket sort. Clearly its time and memory consumption is bounded by O(n + M). For all M = O(n) this is memory-efficient. Not a single comparison between elements of a[] is made. Bucket sort does not contradict the proven lower bound, because we very explicitly use the integer nature of the keys: using direct addressing gives us something like the power of an n-way comparison in constant time.

A disadvantage of bucket sort is that there are quite a lot of operations. More costly is that the memory access in the loop in which the elements are allocated to their buckets is very unstructured. As long as the cache is large enough to hold one cache line for each bucket, this is no big deal, but for large M this causes n cache faults. Therefore, for large M, bucket sort is hardly faster than an optimized version of quick sort.

Bucket Sort

Radix Sort

Consider the problem of sorting n numbers in the range 0, ..., M - 1, for some M = m^2. Applying bucket sort is inefficient if M > n. However, a two-stage approach works well. All numbers are written m-ary, that is, a number z is interpreted as a pair of keys (x, y), with x = z / m and y = z % m, the unique numbers so that z = x * m + y and 0 <= x, y < m. These numbers x and y are here called the digits of z. The numbers are sorted in two phases:
  1. Perform a bucket sort on the second digit.
  2. Perform a bucket sort on the first digit.
Notice that we first sort on the least significant digit! This sorting technique is called radix sort. Click here to see the details of this idea in a working Java program. The complexity of the algorithm immediately follows from that of bucket sort: it takes 2 * O(n + sqrt(M)) = O(n + sqrt(M)) time and uses O(n + sqrt(M)) memory.
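The two phases can be sketched as follows. This is not the program referred to above but a minimal illustration under the stated assumptions: all values lie in {0, ..., m * m - 1}, and the helper bucketSortOnDigit (a name chosen here) is the counting variant of bucket sort applied to one digit.

```java
import java.util.Arrays;

public class TwoPhaseRadixSort {
    // Stable bucket sort of a[] on one digit, where digit = (a[i] / div) % m.
    static void bucketSortOnDigit(int[] a, int m, int div) {
        int n = a.length;
        int[] b = new int[n];
        int[] c = new int[m];

        // Count the number of occurrences of each digit
        for (int i = 0; i < n; i++)
            c[(a[i] / div) % m]++;

        // Determine the starting points of the buckets
        int sum = 0;
        for (int j = 0; j < m; j++) {
            int count = c[j];
            c[j] = sum;
            sum += count;
        }

        // Scanning from left to right keeps elements with equal digits in order
        for (int i = 0; i < n; i++) {
            int d = (a[i] / div) % m;
            b[c[d]++] = a[i];
        }
        System.arraycopy(b, 0, a, 0, n);
    }

    // Sorts a[]; all values are assumed to lie in {0, ..., m * m - 1}.
    static void radixSort(int[] a, int m) {
        bucketSortOnDigit(a, m, 1); // phase 1: second digit y = z % m
        bucketSortOnDigit(a, m, m); // phase 2: first digit  x = z / m
    }

    public static void main(String[] args) {
        int[] a = {57, 3, 99, 42, 40, 7, 57, 10};
        radixSort(a, 10);
        System.out.println(Arrays.toString(a));
    }
}
```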

The correctness is less obvious. It essentially requires that the sorting in phase 2 preserves the order of elements with the same first digit. More generally, a sorting algorithm which does not change the relative order of elements with the same key is called stable. The given implementation of bucket sort is stable and is therefore suitable as sorting method in phase 2 of radix sort. However, some optimized bucket sort versions are not stable and could not be used. This is not an exceptional situation. Also merge sort and quick sort can be implemented in a stable way, but there are also non-stable implementations. For example, the given in-situ implementation of quick sort is not stable.

Lemma: Radix sort is correct.

Proof: A sorting algorithm is correct if and only if for an arbitrary input and an arbitrary pair of numbers z and z' with z < z' the numbers z and z' are correctly arranged in the array after the sorting. That is, if z stands in position i and z' in position i', we must have i < i'. Let z = x * m + y and z' = x' * m + y'. z < z' implies that (x, y) < (x', y') in the lexicographical ordering. There are two cases to distinguish:
  1. x < x'.
  2. x == x' and y < y'.

In the first case, the (assumed) correctness of the sorting on the first digit in phase 2 assures that i < i'. In the second case, the (assumed) correctness of the sorting on the second digit in phase 1 assures that j < j'. Here j and j' denote the positions of z and z', respectively, at the end of phase 1. Because of the stability of the sorting in phase 2, this implies that i < i'. End.

In general, the radix-sort idea can be used to sort numbers in the range 0, ..., M - 1 for M = m^k with k applications of bucket sort, using m buckets in every round, starting with the least significant digits. This requires O(k * (n + m)) time and O(n + m) memory. An important special case is that M = n^k, for some constant k. Applying a k-level radix sort with m = n solves the sorting in O(n) time using O(n) memory. This means that sorting can be performed in linear time as long as M is polynomial in n. Theoretically this may be interesting, but practically radix sort is not better than quick sort for M > n^2.
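A k-phase version might look as follows. This is a sketch under the assumption that all values lie in {0, ..., m^k - 1}; the names radixSort and digit are chosen here for illustration. Each pass is a stable bucket sort on one m-ary digit, starting with the least significant one.

```java
import java.util.Arrays;

public class KPhaseRadixSort {
    // Sorts a[], whose values are assumed to lie in {0, ..., m^k - 1},
    // with k stable bucket-sort passes using m buckets each.
    static void radixSort(int[] a, int m, int k) {
        int n = a.length;
        int[] b = new int[n];
        long div = 1; // m^pass; a long, so that it does not overflow for int values
        for (int pass = 0; pass < k; pass++, div *= m) {
            int[] c = new int[m + 1];

            // c[d + 1] counts the elements whose current digit is d
            for (int i = 0; i < n; i++)
                c[digit(a[i], div, m) + 1]++;

            // After the prefix sums, c[d] is the starting point of bucket d
            for (int j = 0; j < m; j++)
                c[j + 1] += c[j];

            // Stable distribution over the buckets, then copy back
            for (int i = 0; i < n; i++)
                b[c[digit(a[i], div, m)]++] = a[i];
            System.arraycopy(b, 0, a, 0, n);
        }
    }

    // The m-ary digit of z selected by div = m^pass.
    static int digit(int z, long div, int m) {
        return (int) ((z / div) % m);
    }

    public static void main(String[] args) {
        // Special case M = n^k with m = n: here n = 8, k = 2, values below 64
        int[] a = {63, 0, 17, 42, 8, 55, 17, 31};
        radixSort(a, a.length, 2);
        System.out.println(Arrays.toString(a));
    }
}
```

Only one array c[] of m + 1 counters and one auxiliary array b[] are allocated per pass, in line with the O(n + m) memory bound.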

The main disadvantage of bucket sort is that for large M the loop in which the elements are allocated to their buckets causes n cache misses. For the extremely important special case that M == n, this is quite unfortunate. However, if the main memory can accommodate n integers, then on any reasonable system the cache can accommodate sqrt(n) cache lines. So, when applying a two-level radix sort for the special case that M == n, we may assume that m = sqrt(M) cache lines can be accommodated, which implies that the handing-out operations have reasonable cache behavior. In addition this idea reduces the memory consumption from 3 * n to 2 * n + sqrt(n). Comparing the performance of the given bucket sort program with that of the radix sort program, shows that for large n a two-level radix sort is several times faster than bucket sort, clearly showing that the sheer number of performed operations is no longer what determines the time consumption. This also becomes clear when comparing bucket sort and quick sort. For large n, due to its O(n * log n) time consumption, quick sort performs considerably more operations. Nevertheless, it is about equally fast as bucket sort.

Radix Sort

Lexicographical Sorting

A similar sorting technique is lexicographical sorting. Suppose we have a set of keys which are ordered according to some lexicographical ordering. Then it appears natural to first sort them on the first key, then on the second and so on. Such a sorting approach is called lexicographical sorting.

By a subkey we mean one of the keys constituting the entire lexicographical key. If the subkeys lie in a finite range, it may appear natural to use bucket sort again and again. For example, when sorting a set of words, one may start by throwing words in 26 buckets according to their first letters, then splitting the buckets by looking at the second letters, etc. However, one should be aware that bucket sort costs O(M) even if there are very few elements to sort. So, as soon as there are fewer than M elements, another sorting algorithm should be used. Otherwise, when sorting words with five levels of bucket sort, we would create 26^5 buckets, most of them not getting a single word.
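The following sketch illustrates this for words. It rests on several assumptions that are not part of the text: the words consist of the lowercase letters 'a' to 'z' only, the threshold CUTOFF is simply set to the number of buckets, and the library routine Arrays.sort stands in for "another sorting algorithm". One extra bucket collects the words that have no letter at the current position.

```java
import java.util.Arrays;

public class LexicographicSort {
    static final int CUTOFF = 26; // assumed threshold: fewer elements than buckets

    // Bucket of word w at letter position d: 0 if w has ended, else 1..26.
    static int bucket(String w, int d) {
        return d < w.length() ? w.charAt(d) - 'a' + 1 : 0;
    }

    // Sorts a[lo], ..., a[hi - 1], which all agree on their first d letters.
    static void sort(String[] a, int lo, int hi, int d) {
        if (hi - lo < CUTOFF) {
            Arrays.sort(a, lo, hi); // few elements: bucket sort would cost O(M)
            return;
        }

        // Count, then turn the counts into starting points of the buckets
        int[] start = new int[28];
        for (int i = lo; i < hi; i++)
            start[bucket(a[i], d) + 1]++;
        for (int j = 0; j < 27; j++)
            start[j + 1] += start[j];

        // Throw the words in their buckets and copy them back
        String[] b = new String[hi - lo];
        int[] next = start.clone();
        for (int i = lo; i < hi; i++)
            b[next[bucket(a[i], d)]++] = a[i];
        System.arraycopy(b, 0, a, lo, hi - lo);

        // Split every letter bucket further by looking at the next letter;
        // bucket 0 holds identical words and needs no further work
        for (int j = 1; j <= 26; j++)
            sort(a, lo + start[j], lo + start[j + 1], d + 1);
    }

    public static void main(String[] args) {
        String[] words = {"tree", "heap", "sort", "stack", "queue", "hash", "tries"};
        sort(words, 0, words.length, 0);
        System.out.println(Arrays.toString(words));
    }
}
```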

Radix sort is different: sorting n numbers up to M - 1 for M = m^2, we do not create m^2 buckets at the second level, but only m. At the second (and deeper) level all elements are distributed over one set of buckets again. The advantage is the small number of operations, the disadvantage is that the sorting does not involve ever fewer elements.

Lexicographically Sorting 10 Numbers from {0, 1, ..., 99}

Exercises

  1. Suppose we only know how to compare single-digit numbers: 0 < 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9. Using this, define a procedure isSmaller(int x, int y), which returns true if and only if x < y. x and y may be assumed to be non-negative.

  2. Rewrite the bubble-sort procedure so that after round r it has determined the r smallest values.

  3. We consider how bubble sort operates on an array of objects with integer keys which initially is sorted in reversed order. For such an array, in round r the maximum of the remaining elements is transported to position n - r. This is achieved at the expense of many assignments. As assignments we only count the assignments of objects, which are made when swapping two values. How many of these assignments are made in the course of the algorithm? Specify the leading constant.

  4. The presented bubble sort procedure has quadratic running time even if the array a[] is sorted from the beginning. This is unnecessary.
    1. Rewrite the procedure so that it performs at most one round more than necessary.
    2. If one of the k smallest elements is initially located in one of the rightmost k positions of the array, how many rounds does your algorithm require?
    3. If the keys are the numbers from 0 to n - 1 which are allocated according to one of the n! permutations sampled uniformly, how many rounds do you expect to save with the modified algorithm? Hint: use the answer from the preceding question.
    4. Of course the additional testing also costs time. How many operations do you expect to save at the expense of how many extra operations? Conclude which of the implementations will typically be faster.

  5. In the given merge-sort implementations, three tests are performed for each element to merge:
          while (i < m && j < h)
            if (a[i] <= a[j])
              b[k++] = a[i++];
            else
              b[k++] = a[j++];
        
    The comparison of a[i] and a[j] is essential, but we do not really need two tests in the condition of the while-loop.
    1. Assume that a[m - 1] and a[h - 1] are the two largest elements in {a[l], ..., a[m - 1], a[m], ..., a[h - 1]}. Rewrite the above loop so that it correctly merges {a[l], ..., a[m - 1]} with {a[m], ..., a[h - 1]} with 2 * (h - l) + O(1) tests.
    2. Assume now an index m < j < h is given so that a[j - 1] <= a[m - 1] < a[j]. Rewrite the merge subroutine so that it correctly merges {a[l], ..., a[m - 1]} with {a[m], ..., a[h - 1]} with 2 * (h - l) + O(1) tests.
    3. Rewrite the merge subroutine so that, without any assumptions, it correctly merges {a[l], ..., a[m - 1]} with {a[m], ..., a[h - 1]} with 2 * (h - l) + o(h - l) tests. Hint: take care of equal numbers and the cases that all numbers in one subarray are smaller than all numbers in the other subarray.
    4. Download the program. Measure the time consumptions for sorting n numbers for n = 10^5, 10^6 and 10^7. Replace the merge subroutine by your new one and repeat the tests. Give all measured times in a table and draw a conclusion: are the differences in line with your expectations?

  6. In the section on merge sorting it was mentioned that two-way merging can be extended to three-way merging. Perform the necessary changes to the methods merge(int[] a, int b[], int l, int m, int h) and sort(int[] a, int n).

  7. In the text it was stated that it does not need to bother us that the objects to sort are large, because we are working with pointers anyway. Assume now that we are sorting a large number of objects, so that only a small fraction of the information fits into the cache. Assume that a cache line fits 16 integers or pointers. Cache faults are considerably more expensive than instructions on data residing in cache. So, in general it may be a clever idea to perform some more instructions in order to reduce the number of cache faults.
    1. When sorting an array of n integers using merge sort, how many cache faults may be expected in the final round in which two arrays of length n / 2 are merged? The leading constant factor is important, but lower-order terms should be neglected.
    2. When sorting an array a[] of n objects, making comparisons of the form a[i].key > a[j].key, how many cache faults may be expected in the final round of a merge sort?
    3. Present an algorithm for sorting objects which produces considerably fewer cache faults. The running time of this algorithm should not depend on the size of the objects.
    4. Assuming that almost the entire time consumption is due to cache faults, how much faster might the improved algorithm be?

  8. The running time of quick sort is highly dependent on when the recursion is ended. In the example program, the recursion is stopped when h < l + 1. Going on until h == l costs 20% more. The reason is that there are many small subproblems: stopping one level earlier reduces their number by a factor two. Test the efficiency of a sorting routine with the following structure for n = 10^6:
          if (h > l)
            if (h < l + c)
              // Run insertion sort on <= c elements
            else 
              // Run quick     sort on >  c elements
        
    Plot the resulting times for c = 1, 2, ..., 20. What is the optimal choice for c? How much faster is this best variant than the original routine?

  9. Suppose we are performing quick sort, as usual, in a recursive way. Suppose we would not apply an in-situ routine, but rather work with separate arrays for holding the elements in each of the partitions. How much memory might the algorithm need in total? Your answer should be given in the form O(f(n)), for some suitable function f.

  10. Consider the situation that is found after performing an in-situ splitting with equal elements in an application of quick sort. That is, for some value s we have an array a[] of length n with the following distribution: a[i] == s for all i, 0 <= i < eql; a[i] < s for all i, eql <= i < low; a[i] > s for all i, hgh < i <= eqr; a[i] == s for all i, eqr < i <= n - 1. The parameters satisfy 0 <= eql <= low == hgh + 1 and hgh <= eqr <= n - 1. Describe in detail how to rearrange the numbers in a[] so that all elements smaller than s stand on the left, all elements equal to s in the middle and all elements larger than s on the right of the array. Your algorithm should not perform any comparisons between the elements of a[], run in linear time and use only O(1) additional memory.

  11. Consider sorting five numbers in a comparison-based way.
    1. Give a lower bound on the number of comparisons needed. Also give the computation leading to this value.
    2. Give a sorting algorithm. Try to minimize the number of comparisons, the goal is to match the lower bound. The algorithm need not be given in textual form, a detailed picture may be just as good. Analogous cases do not need to be handled again and again as long as the analogy is made clear.

  12. Consider sorting in a comparison-based way by oblivious and non-oblivious algorithms. In an oblivious algorithm the comparisons made do not depend on the outcome of previous comparisons.
    1. If the decision tree for sorting n numbers is turned into code, how big can this code be?
    2. When an oblivious algorithm for sorting n numbers with m comparisons is turned into code, how big can this code be?
    3. Give an oblivious algorithm for sorting five numbers. Try to minimize the number of comparisons. The algorithm may be given in pictorial form.
    4. Let m be the number of comparisons in an oblivious algorithm for sorting n numbers. Give an algorithm, working out in detail only the major ideas, for verifying that m is the minimum needed number of comparisons.
    5. Express the complexity of your verifying algorithm in terms of n and m. Do you think this algorithm could be used to verify that the given algorithm for sorting five numbers is optimal?

  13. Decision trees are a powerful tool for proving lower bounds even for problems other than sorting. Use a decision-tree argument to prove formally that any comparison-based algorithm for testing at which position an element x occurs in an array of length n requires at least n - 1 comparisons when the arrangement in the array is arbitrary and at least round_up(log_2 n) if the array is sorted. How many comparisons are needed in each of the two cases for testing whether an element x occurs or not? Show that the bounds given for the sorted array hold independently of the used data structure. That is, the bounds are lower bounds for the problem find in general.

  14. Decision trees are a powerful tool for proving lower bounds even for problems other than sorting. Use a decision-tree argument to prove a lower bound on the number of comparisons for merging sorted arrays with lengths n_1 and n_2, respectively.

  15. The topic of this question is in-situ bucket sorting. The numbers are stored in an array a[] of length n, their values are elements of {0, ..., M - 1}.
    1. Give an in-situ algorithm for the case M == 2.
    2. Give an in-situ algorithm for the case M == 3.
    3. Give a sorting algorithm with running time O(n + M) and memory usage of at most n + 2 * M + O(1).
    4. Under what condition on M can we call this an in-situ sorting algorithm? Hint: refer to the definition of "in-situ".
    5. Compare your algorithm with the conventional bucket-sort algorithm. Will it be slower or faster? Under which conditions would you prefer the new algorithm?

  16. A splitting round of quick sort can be interpreted as a special case of bucket sort.
    1. Explain in what sense exactly.
    2. Knowing this, derive an alternative algorithm for in-situ quick sort for the case that the elements with value equal to the splitter are directly assigned to an own subset.
    3. With this interpretation in mind it is easy to give a modified quick-sort-like sorting algorithm in which in each round the set is split in 3 subsets. Describe how.
    4. Further generalize the idea to split the sets in P subsets in each round. On a set of n elements, the splitting should run in O(n * log P).

  17. Priority queues were introduced in the chapter on search trees, where a basic implementation was given, and constitute the topic of a following chapter. Let T_b(n) denote the time for building a certain priority queue of size n. Let T_i(n) and T_d(n) denote the time for the insert and deletemin operation, respectively. Describe how a priority queue can be used for sorting. Express the complexity of your algorithm in T_b, T_i and T_d. Formulate a lower-bound expression involving T_b(n) and T_d(n). What does this imply for the special case T_b(n) = O(n); and what does it imply for the special case T_d(n) = O(1)?

  18. One of the main applications of sorting is to collect all elements of a set which have the same key. More formally, the task is to return the input array a[] of length n rearranged so that a[i] != a[i - 1] implies a[j] != a[i] for all j < i and that a[i] != a[i + 1] implies a[j] != a[i] for all j > i. This problem will be called collecting. Sorting the array solves the problem, but possibly collecting is easier. Describe how to efficiently solve the collecting problem using hashing with open addressing. Provided that a good hash function is available, how fast does your algorithm run? Hint: it may be helpful to first determine all occurring keys and count their frequencies.

  19. We consider sorting a set S of n numbers with m different key values. m is supposed to be small, but the keys can be arbitrarily large.
    1. How much time does an appropriate version of quick sort need for sorting S?
    2. Describe how to sort S in O(n) time if it is known that m == 3 (but the values of the keys are not known).
    3. Describe how to sort S in O(n * log m) time. Hint: first determine all occurring keys.
    4. Describe how, under a reasonable assumption, to sort S in O(m * log m + n) time. Mention the assumption you need.

  20. Consider sorting n numbers stored in an array a[]. Assume it is given that 0 <= a[i] < M, for all 0 <= i < n, for some M = n^k. A k-phase radix sort is a natural choice.
    1. Give the complete algorithm, including the modified bucket-sort algorithm you are using as a subroutine.
    2. Which property of the bucket-sort algorithm is crucial for the radix sort which is built on top of it?
    3. Prove the correctness of the k-phase radix-sort algorithm, assuming the correctness of the bucket-sort algorithm. Point out where the mentioned property is used.

  21. Consider sorting n numbers stored in an array a[]. Assume a[i] >= 0, for all 0 <= i < n. Consider the following algorithm: first determine the smallest value M so that a[i] < M, for all 0 <= i < n; then create n buckets and throw all numbers with value x with i * M / n <= x < (i + 1) * M / n into bucket i; then sort the numbers in each bucket using a suitable sorting method.
    1. Work out the algorithm in more detail, giving it in code-like form. In the last step you must name the sorting routine to be used, but you do not need to give its code.
    2. The given algorithm can be considered to be a variant of one of the algorithms presented in the text. Which one?
    3. Give upper bounds on the time and memory consumption, assuming a good choice of the sorting subroutine in the last step.
    4. In the remainder of this question we assume that the numbers are uniformly distributed within the interval 0, ..., M - 1. Let P_c, c >= 0, denote the probability that there are c numbers in a bucket. Give expressions for P_0, ..., P_4 and estimates for P_c in general. Hint: you may use that (n over k) <= (e * n / k)^k, for all k > 0.
    5. The expected time to sort the numbers in a bucket is given by Exp[T] = sum_{c >= 0} P_c * T_sort(c). Using the above estimates, give an estimate of Exp[T].
    6. There are n buckets. The expected time to sort the numbers in them is the same for all buckets. In such a case, the expected time to sort the numbers in all buckets equals n times the expected time to sort the numbers in a single bucket. Use this to obtain an estimate of the expected running time of the entire algorithm.


This page was created by Jop Sibeyn.
Last update Monday, 07 March 05 - 12:35.

Priority Queues

Definition

There are many real-life applications where one can imagine that customers / patients / jobs / tasks get higher priority depending on their urgency / the amount they are willing to pay for the service / the amount of service they require.

A priority queue ADT is a set on which at least the following two operations are supported:
  1. insert: add an element with a given key (its priority) to the set.
  2. deletemin: remove an element with minimum key from the set and return it.

These two are the defining properties: it is a queue (which implies enqueues and dequeues) but not a usual one because the objects in the queue have priorities. These priorities allow much more flexibility. By giving the objects appropriate keys a priority queue can both be used as a stack and as a queue. In the following presentation we will focus on these two operations, though occasionally we will consider others. It is very common to assume that in addition to insert and deletemin a priority queue also supports findmin, which returns an element with minimum key without removing it. Further operations may be considered as well. Of course in many cases, a higher priority means more urgent. In that case we should have a deletemax instead of a deletemin operation.

Basic Realizations

It is no problem to realize the operations of a priority queue with insert in O(1) and deletemin in O(n): use an array, write the added elements at the end, and search for the minimum by traversing the array. For certain applications (if it is not the time determining component of the program) this may be ok because of its extreme simplicity.
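A minimal sketch of this simplest realization, for integer keys. The class name and the doubling of the array when it is full are choices made here; with doubling, the O(1) bound for insert holds in the amortized sense.

```java
import java.util.Arrays;

public class ArrayPriorityQueue {
    private int[] a = new int[16];
    private int n = 0; // number of stored elements

    boolean isEmpty() {
        return n == 0;
    }

    // O(1) amortized: write the new element at the end of the array.
    void insert(int x) {
        if (n == a.length)
            a = Arrays.copyOf(a, 2 * n);
        a[n++] = x;
    }

    // O(n): search for the minimum by traversing the array.
    int deleteMin() {
        if (n == 0)
            throw new IllegalStateException("priority queue is empty");
        int min = 0;
        for (int i = 1; i < n; i++)
            if (a[i] < a[min])
                min = i;
        int x = a[min];
        a[min] = a[--n]; // close the gap with the last element
        return x;
    }

    public static void main(String[] args) {
        ArrayPriorityQueue q = new ArrayPriorityQueue();
        for (int x : new int[] {5, 3, 8, 1, 9, 2})
            q.insert(x);
        while (!q.isEmpty())
            System.out.print(q.deleteMin() + " ");
        System.out.println();
    }
}
```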

It is also possible to keep the array sorted. In that case the insertions are expensive, and the deletemins are O(1). The advantage of this is that findmin can be implemented in O(1) as well.

A much better idea is to use a balanced search tree. In that case, deletemin can be performed in logarithmic time, the same as required for inserts (and all other operations). Maintaining an extra pointer to the smallest element (using 2-3 trees this is always the leftmost leaf), the findmin operation can even be performed in O(1) time.
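In Java one way to try this out is the library class java.util.TreeMap, which is a balanced search tree (a red-black tree rather than a 2-3 tree). The sketch below stores for each key the number of copies, so that equal keys are allowed. The class name is chosen here for illustration. Note that firstKey() walks down the tree, so findMin takes logarithmic time in this sketch, because no extra pointer to the smallest element is kept.

```java
import java.util.Map;
import java.util.TreeMap;

public class TreePriorityQueue {
    // Maps each key to the number of copies currently in the queue
    private final TreeMap<Integer, Integer> tree = new TreeMap<>();

    boolean isEmpty() {
        return tree.isEmpty();
    }

    // O(log n): insert the key or increase its count.
    void insert(int x) {
        tree.merge(x, 1, Integer::sum);
    }

    // Returns the smallest key; throws NoSuchElementException if empty.
    int findMin() {
        return tree.firstKey();
    }

    // O(log n): remove one copy of the smallest key and return it.
    int deleteMin() {
        Map.Entry<Integer, Integer> e = tree.firstEntry();
        if (e == null)
            throw new IllegalStateException("priority queue is empty");
        int x = e.getKey();
        if (e.getValue() == 1)
            tree.remove(x);
        else
            tree.put(x, e.getValue() - 1);
        return x;
    }

    public static void main(String[] args) {
        TreePriorityQueue q = new TreePriorityQueue();
        for (int x : new int[] {5, 3, 8, 3, 9, 1})
            q.insert(x);
        while (!q.isEmpty())
            System.out.print(q.deleteMin() + " ");
        System.out.println();
    }
}
```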

Heaps

Definition and Operations

Now we are introducing one more fundamental data structure: the heap. A heap has as underlying structure a tree. So, it looks similar to a search tree. However, the defining property is different, which gives it different properties and different usages.
A heap is a tree for which any node v has a key that is smaller than or equal to the keys of all its children (if any).
The above property will be referred to as the heap property. It clearly implies that the smallest key must stand in the root of the tree, and that the second smallest element is one of its children. Thus, findmin can be done in O(1): just return the key of the root.

Heap-Ordered Tree

A deletemin is slightly harder. It is not hard to remove the root, but we should write another element instead of it, so that afterwards the tree is again a heap. But, this is not too hard either. If the root r of a heap is deleted, it may be replaced by the smallest of its children. In this way the heap property is preserved at the level of the root. Recursively deleting the roots of the heaps at the lower levels gives a correct deletion. When reaching a leaf node, the recursion stops, deleting the whole node. This process can be viewed as a free place, a hole, moving downwards until exiting the tree. Therefore, this is also called deletion by performing a percolate down. In pseudo-code the deletemin looks as follows:

  void percolateDown(Node v) 
  {
    if (v has children) // v is not a leaf
    {
      determine the child w of v with the smallest key;
      v.key = w.key; // maybe even other data to copy
      percolateDown(w); 
    } 
    else
      remove v;
  }

  int deleteMin(Node r) 
  {
    int x = r.key;
    percolateDown(r);
    return x; 
  }

An insert is similar. The new node v can in principle be attached to any node w. If the key of v is smaller than the key of w, then the heap property is restored by exchanging v and w, but possibly v may have to bubble up even further. At a certain level, possibly at the root, v will have found a place in which it is no longer violating the heap property and we are done. This operation in which the inserted element is bubbling upwards through the tree is most commonly called a percolate up.

  void percolateUp(Node w) 
  {
    if (w has a parent v) // w is not the root
      if (v.key > w.key) 
      {
        int x = v.key; v.key = w.key; w.key = x;
        percolateUp(v); 
      }
  }

  void insert(int x) 
  {
    create a new node w;
    w.key = x;
    attach w to an appropriate node v;
    percolateUp(w); 
  }

Percolate up and down are symmetric, but there are also important differences: when percolating up, the key value needs to be compared with only one other value at each level. In a percolate down it must be compared with the minimum of the children, which must be determined first. So, the cost of an insert is O(depth_of_tree), while the cost of a deletemin is O(depth_of_tree * degree_of_nodes). Furthermore, while percolating up, we move along the unique path leading towards the root. When percolating down, it is not a priori known which path will be followed. Another important difference is that a deletemin always goes the whole way until reaching a leaf, while an insert stops as soon as the new key has reached a level where it does not conflict with the key of its parent.
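The two pseudo-code routines can be made concrete for a tree of arbitrary degree. The following is only a sketch: the class TreeHeap, the fields parent and children, and the convention that the caller chooses the node to which a new node is attached are all assumptions made here. A node handed out by insert must still be part of the tree when it is used as attachment point.

```java
import java.util.ArrayList;
import java.util.List;

public class TreeHeap {
    static class Node {
        int key;
        Node parent;
        final List<Node> children = new ArrayList<>();
        Node(int key) { this.key = key; }
    }

    Node root; // null for an empty heap

    // Attaches a new node with key x below attachTo (ignored if the heap
    // is empty) and lets the key bubble up. Returns the new node.
    Node insert(int x, Node attachTo) {
        Node w = new Node(x);
        if (root == null) {
            root = w;
        } else {
            w.parent = attachTo;
            attachTo.children.add(w);
        }
        percolateUp(w);
        return w;
    }

    // One comparison per level, along the unique path to the root.
    void percolateUp(Node w) {
        while (w.parent != null && w.parent.key > w.key) {
            int x = w.parent.key; w.parent.key = w.key; w.key = x;
            w = w.parent;
        }
    }

    // Removes and returns the key in the root; the heap must not be empty.
    int deleteMin() {
        int x = root.key;
        percolateDown(root);
        return x;
    }

    // The hole in v is filled from the smallest child until a leaf is removed.
    void percolateDown(Node v) {
        while (!v.children.isEmpty()) {
            Node w = v.children.get(0);
            for (Node c : v.children) // degree_of_nodes comparisons per level
                if (c.key < w.key)
                    w = c;
            v.key = w.key;
            v = w;
        }
        if (v.parent == null)
            root = null;
        else
            v.parent.children.remove(v);
    }

    public static void main(String[] args) {
        TreeHeap h = new TreeHeap();
        Node r = h.insert(5, null);
        Node b = h.insert(3, r);
        h.insert(8, r);
        h.insert(1, b);
        h.insert(2, r);
        while (h.root != null)
            System.out.print(h.deleteMin() + " ");
        System.out.println();
    }
}
```

The recursion of the pseudo-code is written as a loop here; the sequence of comparisons and assignments is the same.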

Binary Heaps

In the literature it is sometimes assumed that the tree is binary and perfectly balanced, however, the structure of the tree has no implications for the way the operations are done. One should not think that it is part of the definition of a heap that it is realized as a balanced binary tree. The balanced-tree property is only needed for efficiency reasons: otherwise the tree might degenerate into a structure that resembles a path with depth close to n. Because the time consumption of insert and deletemin is proportional to the depth of the tree, this is highly undesirable.

From the above, we have seen that one has a lot of freedom for doing things. This will be exploited to come up with a very simple and very efficient implementation. The tree will always be kept perfectly balanced: that is, it will always be a binary tree with all levels completely filled, except for possibly the lowest.

Perfect Binary Trees with 1, ..., 10 Nodes

This means, that if we are adding a node, performing insert, we must insert it at the next free place in the lowest level. If the last level is just full, we must create a new level, inserting the element as the left child of the leftmost node in the bottom level.

A deletemin cannot be performed as before, because then we cannot control where the hole is going. Therefore, we are modifying the routine. The last position in the bottom level is freed, possibly cancelling the whole level. The key of this node is tentatively placed in the root, and then it percolates down by exchanges with the smaller child. The whole deletemin now looks like

  void percolateDown(Node v) 
  {
    if (v has children) // v is not a leaf
    {
      determine the child w of v with the smallest key;
      if (w.key < v.key)
      {
        int x = v.key; v.key = w.key; w.key = x;
        percolateDown(w); 
      }
    } 
  }

  int deleteMin(Node r) 
  {
    int x = r.key;
    let v be the rightmost node at the bottom level of the tree;
    r.key = v.key; // maybe even other data to copy
    remove v;
    percolateDown(r);
    return x; 
  }

Heap Operations

Lemma: The deletemin procedure is correct: it removes the entry with minimum key value; preserves the heap property and returns the minimum key value.

Proof: Because we may assume that the heap property held before the operation, the root is the entry with minimum key. This value is overwritten and returned. It remains to check that the heap property is preserved. A crucial observation is that it can only be disturbed along the processed path. A formal correctness proof goes by induction. The hypothesis is that at any time, the current node v is the only node of the tree in which the heap property may possibly be violated. Once this is proven, the correctness follows, because when the process terminates, the heap property is assured in v because either v.key <= w.key, w being the node with minimum key value among the children of v, or v is a leaf, for which the heap condition is void. Initially the tree is unchanged except for the root, the node at which percolateDown starts. So, assume the hypothesis holds at the beginning of some step. Then if w.key < v.key, the keys of v and w are exchanged. After this, v.key < w.key and because w was the node with minimum key among the children of v, it is even true that after swapping v.key <= w'.key for all other children w' of v. So, the heap property has been established in v. The only other node whose key has changed is w, the node which is considered in the next round. End.

Using a perfect binary tree, a heap with n entries has depth round_down(log n), so both operations can be performed in O(log n) time. In a really efficient implementation we do not perform exchanges but keep the element for which the position in the heap still has to be determined in an additional memory position and shift the elements on the path simply one level up or down. Doing this, the number of assignments is reduced from 3 * length_of_path to 1 * length_of_path + 2.

Another observation that is essential for very efficient implementations in practice is that a perfect binary tree can very well be maintained in an array, avoiding all pointers. The idea is to number the nodes level by level from left to right, starting with the root which gets number 0. In that case, for any node with index i, the left child has index 2 * i + 1 and the right child has index 2 * i + 2. This allows the children of a node to be accessed by a simple computation, which requires two clock cycles (maybe even one because often additions and multiplications can be executed in parallel), which is certainly not more than the cost for fetching the address of the left child. At the same time it gives a considerable reduction of the memory consumption, saving n pointers. If we start with index 1 for the root, then the left child of node i has index 2 * i and the right child 2 * i + 1, saving half of the additions. This indexing idea even works for d-ary heaps which are based on perfect d-ary trees. A perfect tree which is maintained in an implicit way in an array without any pointers is called an implicit tree.

Node Numbering in Implicit Trees
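
The two implementation ideas above (shifting keys instead of exchanging them, and storing the tree implicitly in an array with the root at index 0) can be combined in a few lines. The following is only a minimal sketch under stated assumptions: the class name ArrayMinHeap, the initial capacity of 16 and the doubling strategy are illustrative choices and do not come from the text.

```java
import java.util.Arrays;

// Sketch of an implicit binary min-heap; all names are illustrative.
class ArrayMinHeap {
    private int[] a = new int[16]; // a[0..n-1] holds the heap, root at index 0
    private int n = 0;

    int size() { return n; }

    void insert(int key) {
        if (n == a.length) a = Arrays.copyOf(a, 2 * n); // grow when full
        int i = n++;                       // next free place in the lowest level
        while (i > 0 && a[(i - 1) / 2] > key) {
            a[i] = a[(i - 1) / 2];         // shift the parent one level down
            i = (i - 1) / 2;
        }
        a[i] = key;                        // one final assignment for the new key
    }

    int deleteMin() {
        if (n == 0) throw new IllegalStateException("heap is empty");
        int min = a[0];
        int key = a[--n];                  // key of the last leaf, kept aside
        int i = 0;
        while (2 * i + 1 < n) {
            int c = 2 * i + 1;             // left child
            if (c + 1 < n && a[c + 1] < a[c]) c++; // take the smaller child
            if (a[c] >= key) break;        // the key fits at position i
            a[i] = a[c];                   // shift the child one level up
            i = c;
        }
        if (n > 0) a[i] = key;
        return min;
    }
}
```

Each loop iteration performs one assignment, which matches the count of roughly one assignment per level given above.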

Expected Insertion Time

For the implementation with a perfect binary tree, the time for insert and deletemin is bounded by O(log n). This is an upper bound of the time consumption. There are many problems though, for which the upper bound is unnecessarily pessimistic, the observed behavior in practice being better. How about our operations?

When performing a deletemin, the element that is put tentatively in the root will typically have a rather large key because before it was a leaf. This is not always so: it is possible that many nodes with small keys stand deep in the tree, but in general this will not be the case. Therefore, typically this element will be percolated down rather far before the process stops: even in practice the cost of a deletemin is proportional to log n.

The situation for insertions is different: better. In practice it turns out that insertions go very fast. Much faster than O(log n). The reason is simple, a precise analysis is quite hard. Of course an analysis requires an assumption: practice cannot be analyzed. So, we assume that the keys are sampled uniformly from an interval, say they are reals in [0, 1]. Let us try to estimate the expected number of calls to the routine percolateUp under this assumption.

Consider the case that we are only performing inserts and no deletemin operations. Randomly and uniformly select a key. It is essential that the previous nodes were sampled from the same probability distribution. The node moves up k levels only if it has the smallest key in the subtree of depth k rooted at the position where it ends. The lowest level of this subtree may be empty except for the node itself, but all the other levels are full. So, the node moves up k levels only if it is the smallest among 2^k or more nodes (also counting the node itself). This means that the expected distance it moves upwards can be estimated as follows: the probability that the node moves up k levels or more is at most 1 / 2^k for all k, and so is the probability that it moves up exactly k levels. Denoting the upwards movement by the random variable X, we get

Exp[X] <= sum_{k > 0} k / 2^k = 1 * 1/2 + 2 * 1/4 + 3 * 1/8 + ... = 2.
Here Exp[X] denotes the expected value of X, which is defined by
Exp[X] = sum_{all possible values i of X} i * Prob[X == i].
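
The value 2 of the series used above can be checked by exchanging the order of summation. The derivation below is a standard argument, added here only as a worked step:

```latex
\begin{align*}
\sum_{k \ge 1} \frac{k}{2^k}
  &= \sum_{k \ge 1} \sum_{j = 1}^{k} \frac{1}{2^k}
   = \sum_{j \ge 1} \sum_{k \ge j} \frac{1}{2^k} \\
  &= \sum_{j \ge 1} \frac{1}{2^{j-1}}
   = 1 + \frac{1}{2} + \frac{1}{4} + \cdots = 2 .
\end{align*}
```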

The above analysis is not entirely correct: the keys in the lowest levels of the heap are not entirely uniformly distributed. The fact that they are standing there implies that they are somewhat larger than average. However, this dependency is very weak (because there are so few elements at the higher levels of the tree), and the analysis is correct up to a small correction. The computed constant is not that important. The important point to remember is that the expected time for an insert is O(1).

Buildheap

How long will it take to build a heap consisting of n nodes? Doing this by performing n inserts, may lead to a worst-case time of O(n * log n). This happens if the elements are inserted with decreasing key values: in that case the i-th element has to percolate-up about log i positions, so the total time is
sum_{i = 1}^n log i > sum_{i = n / 2}^n log i > sum_{i = n / 2}^n log (n / 2) = n/2 * (log n - 1) = Omega(n * log n).
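
The contrast between the expected O(1) insertion cost for random keys and the logarithmic cost for decreasing keys can be observed with a small experiment. The sketch below is an illustration only: the class name InsertCost, the input size and the fixed seed are arbitrary assumptions, and the program counts levels moved upwards rather than measuring time.

```java
import java.util.Random;

class InsertCost {
    // Inserts the keys one by one into an initially empty implicit binary
    // min-heap and returns the total number of levels moved upwards.
    static long totalUpMoves(double[] keys) {
        double[] a = new double[keys.length];
        long moves = 0;
        for (int n = 0; n < keys.length; n++) {
            int i = n;                       // next free place in the array
            double key = keys[n];
            while (i > 0 && a[(i - 1) / 2] > key) {
                a[i] = a[(i - 1) / 2];       // shift the parent down
                i = (i - 1) / 2;
                moves++;
            }
            a[i] = key;
        }
        return moves;
    }

    public static void main(String[] args) {
        int n = 1 << 16;
        Random rnd = new Random(42);         // fixed seed, an arbitrary choice
        double[] random = new double[n];
        double[] decreasing = new double[n];
        for (int i = 0; i < n; i++) {
            random[i] = rnd.nextDouble();    // uniform in [0, 1)
            decreasing[i] = n - i;           // worst case for a min-heap
        }
        System.out.println("random:     " + (double) totalUpMoves(random) / n + " levels per insert");
        System.out.println("decreasing: " + (double) totalUpMoves(decreasing) / n + " levels per insert");
    }
}
```

For decreasing keys every new key climbs all the way to the root, so the count grows like n * log n, whereas for random keys the average per insert should stay bounded by a small constant.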

Can we hope to do better? Yes. The fact that the expected cost for an insertion is O(1) hints, but not more than that, that we may hope to do it in O(n) time. Notice the fundamental difference with a search-tree structure: because the elements of a search tree with n elements can be output in sorted order in O(n) time (by running an inorder traversal), the Omega(n * log n) lower bound on sorting implies that the construction of any search tree, balanced or not, takes Omega(n * log n) time. For heaps there is no such fundamental obstacle against an efficient construction: the elements are only very weakly sorted.

A first idea is to randomize the input and then perform n inserts. This overcomes the problem that the elements may stand in the wrong order: with high probability (meaning that the probability of failure is bounded by O(n^-alpha) for some constant alpha > 0) the whole sequence of insertions costs only O(n) time.

However, this bound can also be established deterministically by a rather simple algorithm. The idea is that we do not maintain a heap at all times (this is not necessary as we are not going to do deletemins during the buildheap). We simply create a perfect binary tree with n nodes, and then heapify it. That is, we are going to plow through it until everything is ok.

How to proceed? Our strategy must in any case guarantee that the smallest element appears in the root of the tree. This seems to imply that the root must (also) be considered at the end, to guarantee that it is correct. From this observation one might come up with the idea to work level by level bottom up, guaranteeing that after processing level i, all subtrees with a root at level i are heaps. Let us consider the following situation: we have two heaps of depth d - 1, and one extra node with a key x, which connects the two heaps:

Two Heaps Connected by a Root

How can we turn this efficiently into a heap of depth d? This is easy: x is percolated down. So, two heaps of depth d - 1 + one extra node can be turned into a heap of depth d in O(d) time.

Now, for the whole algorithm we start at level 1 (the leaves being at level 0) and proceed up to level log n. In pseudo-code this gives the following algorithm:

  void heapify(Heap h, int n) 
  {
    // The nodes at level 0 constitute heaps of depth 0
    for (l = 1; l <= round_down(log_2 n); l++)
      for (each node v of h at level l) do
        percolateDown(v); // Now v is the root of a heap of depth l
      // Now all nodes at level l are the root of a heap of depth l
    // Now the root is the root of a heap of depth round_down(log_2 n)
  }
The correctness immediately follows from the easily checkable claims (invariants) written as comments within the program, which hold because of the above observation about the effect of percolating down.

How about the time consumption? At level l we are processing fewer than n / 2^l nodes, and each operation takes O(l) time. Let c be the constant so that the time for processing a node at level l is bounded by c * l, then the total time consumption can be estimated as follows:

sum_{l = 1}^{log n} c * l * n / 2^l < c * n * sum_{l >= 1} l / 2^l = 2 * c * n = O(n).

The given algorithm can easily be generalized for any kind of heaps, but for heaps in which all nodes have the same degree implemented in arrays, it can be written even simpler. Assume an array a[] of length n should be turned into a binary heap, then we can do the following:

  void percolateDown(int i)
  {
    int j = i;
    int k = (i << 1) + 1;
    if (k < n && a[k] < a[j])
      j = k;
    k++;
    if (k < n && a[k] < a[j])
      j = k;
    if (j != i)
    { 
      int x = a[j]; a[j] = a[i]; a[i] = x; // exchange the keys of i and its smaller child
      percolateDown(j);
    }
  }

  void buildHeap()
  {
    for (int i = (n >> 1) - 1; i >= 0; i--)
      percolateDown(i);
  }

Heapification of a Binary Tree
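
The array version of buildHeap can be tried out in a self-contained form. The sketch below is one possible arrangement, not the original program: the class name BuildHeapDemo, the explicit array parameters, the iterative loop, the counter swaps and the helper isHeap are assumptions made for the illustration.

```java
class BuildHeapDemo {
    static long swaps; // number of exchanges, counted for illustration only

    // Percolates a[i] down within the implicit binary min-heap a[0..n-1].
    static void percolateDown(int[] a, int n, int i) {
        while (true) {
            int j = i;
            int k = 2 * i + 1;                       // left child
            if (k < n && a[k] < a[j]) j = k;
            if (k + 1 < n && a[k + 1] < a[j]) j = k + 1;
            if (j == i) return;                      // heap property holds in i
            int tmp = a[j]; a[j] = a[i]; a[i] = tmp;
            swaps++;
            i = j;
        }
    }

    // Processes all non-leaf nodes bottom-up, as in the routine above.
    static void buildHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--)
            percolateDown(a, a.length, i);
    }

    // Checks that no key is smaller than the key of its parent.
    static boolean isHeap(int[] a) {
        for (int i = 1; i < a.length; i++)
            if (a[(i - 1) / 2] > a[i]) return false;
        return true;
    }
}
```

Since a node can sink at most as many levels as there are levels below it, the counter stays below n for any input of length n, in line with the O(n) bound derived above.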

d-Heaps

Of course we do not have to limit our studies to heaps that are built out of binary trees. Taking trees of degree 3, 4 or more generally of degree d is possible as well. The heap property remains the same: the key of any node is smaller than or equal to that of all its children. Deletemin is simple: remove the root, replace it by the last leaf and perform a percolate down, now considering d children. An insert is also easy: add a new leaf and let the new key percolate up.

Practically there are reasons to choose d to be a power of two: in that case the array implementation requires only bit shifts for the computation of the location of the parent and the children (and no divisions which might be more expensive). For d-heaps, mapping the nodes to consecutive numbers in a way so that the indices of the children can be computed easily is done as before: start with the root, and number the nodes level by level. Giving number 0 to the root, the children of a node with index i are the nodes with indices d * i + 1, ..., d * i + d.

We prove that this is correct. Denote by f_d(k, i) the index in a perfect d-ary tree at position i of level k (the root being at position (0, 0)). We should prove that the children of node with index f_d(k, i) have indices d * f_d(k, i) + 1, ..., d * f_d(k, i) + d. This can be shown by first analyzing the relation between the indices of the leftmost nodes. f_d(k + 1, 0) = sum_{l = 0}^k d^l = d * sum_{l = 0}^{k - 1} d^l + 1 = d * f_d(k, 0) + 1. Now consider the other nodes. Node f_d(k, i) has index f_d(k, 0) + i and its children have indices f_d(k + 1, 0) + d * i, ..., f_d(k + 1, 0) + d * i + d - 1. Substituting f_d(k + 1, 0) = d * f_d(k, 0) + 1 the result follows.
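
The index formulas of the proof translate directly into code. In the sketch below the class name DaryIndex and the method names are hypothetical; the parent formula (i - 1) / d is not stated in the text but follows from inverting d * i + 1 + c for 0 <= c < d.

```java
class DaryIndex {
    // Index of child number c (0 <= c < d) of the node with index i.
    static int child(int d, int i, int c) { return d * i + 1 + c; }

    // Index of the parent of the node with index i > 0 (integer division).
    static int parent(int d, int i) { return (i - 1) / d; }

    // f_d(k, 0): index of the leftmost node of level k, that is
    // 1 + d + ... + d^(k - 1), which equals 0 for the root level.
    static int firstOfLevel(int d, int k) {
        int index = 0, power = 1;
        for (int l = 0; l < k; l++) { index += power; power *= d; }
        return index;
    }
}
```

For example, with d = 4 the node at position 2 of level 1 has index 3, and its children have indices 13, 14, 15 and 16, which are positions 8 to 11 of level 2.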

Choosing a tree of degree d reduces the depth of the tree from log_2 n to log_d n. Thus, a deletemin now takes O(d * log_d n). This is more than before. On the other hand, the insert has become cheaper: it only takes O(log_d n). Practically this is not such an interesting improvement as even degree 2 gives us expected time O(1), but theoretically it might be. For example, if we take d = log n, then the cost for the inserts has been reduced to O(log n / loglog n), which is asymptotically faster.

A more important reason, just as for d-ary search trees, is that every access of a node means a cache or page fault. If the tree is shallow, then the number of these accesses is reduced, which in practice will imply a reduction of the time to perform the operations. The right choice of d depends on the type of application. As long as all data fit in the main memory a good choice might be d = 4: this reduces the depth of the tree by a factor two at the expense of few extra operations.

If we consider an application in which the data do not fit into the main memory, then most accesses imply a page fault. In that case the tree should be kept as flat as possible by taking d very large. In that case a good idea is to take d = sqrt(n), assuring that the whole tree has depth 2. More generally, for any d = n^eps, for some constant eps > 0, the depth is constant, assuring that inserts can be performed in constant time.

A problem with such large d is, of course, that when percolating downwards, the minimum has to be selected out of d elements which becomes rather costly: a deletemin takes O(n^eps), which is not good. A solution is to maintain the children of a node not in an array or list, but in a priority queue. Keeping these priority queues up-to-date is not trivial, but clearly any percolate-step, both up and down, can be performed in O(log d) time when using conventional binary heaps for these priority queues of size d. The time for both inserts and deletemins is then bounded by log_d n * O(log_2 d) = O(log_d n * log_2 d) = O(log n). This is the same as before, and due to the more complicated structure this will actually be slower if the data set is small. However, if we have a problem whose size exceeds that of the main memory by a factor 100, then with this approach everything can be organized with at most 2 page faults per operation, whereas otherwise we would need 6 or 7.

Binomial Heaps

Most ADTs are based on a small number of underlying actual data structures. The most important examples are the following: We will add one more great data structure to this list: the binomial forest structure. It efficiently supports the two priority-queue operations plus the extra operation of merging. Merging two priority queues means that out of two of them we create one new one containing all the elements. With heaps realized with perfect binary trees, this is hard to achieve. Of course, when there are n elements in total, it can be done in O(n) by building a new heap, but this is not what we are looking for. Using the binomial queue structure, all three operations can be performed in O(log n) time, and, this is also a very interesting property, insertions can be performed in O(1) amortized time. So, binomial queues really offer us some features that none of the previous data structures offered.

Binomial Trees

A binomial tree has a very special recursively defined structure:

Lemma: A binomial tree of depth d has 2^d nodes.

Proof: The proof goes by induction. The lemma is ok for d = 0 because 2^0 = 1. This is the basis. So, assume the lemma is ok, for all depths up to d - 1. Then, the tree of depth d has 1 + sum_{i = 0}^{d - 1} 2^i = 2^d nodes, because sum_{i = 0}^{d - 1} 2^i = 2^d - 1 (this might again be proven by induction). End.

There is an alternative definition of binomial trees, which gives rise to the same structures:

Smallest Binomial Trees

Binomial Forests

To create a structure with n nodes for some n which is not a power of two, we simply use the binomial trees corresponding to the ones in the binary expansion of n. For example, for n = 45 = 101101, we would take BNT_5, BNT_3, BNT_2 and BNT_0. Here BNT_d denotes the binomial tree of depth d. Such a structure with at most one binomial tree of each depth is called a binomial forest.

Binomial Forest with 45 Nodes

Using the second definition of binomial trees and the binary addition, it is easy to merge a binomial forest with n_1 nodes with a binomial forest with n_2 nodes: starting with the smallest trees in each forest, if there are two or three trees of the same depth d, two of them are linked to one tree of depth d + 1. The number of these operations is bounded by the length of the binary expansions of n_1 and n_2 and is thus bounded by O(log (n_1 + n_2)). In the literature the operation of combining two data structures to a single one is mostly called melding rather than merging.

As an example we consider n_1 = 22 = 10110 and n_2 = 23 = 10111. There is a single BNT_0, which remains unchanged. The two BNT_1 are linked and give a BNT_2. Now there are three BNT_2, two of which are linked and give a BNT_3. This is the sole BNT_3 and it survives. Finally, the two BNT_4 are linked to one BNT_5. The result has 45 = 101101 nodes.

Merging Two Binomial Forests

Binomial Heaps

The explanation of binomial trees and forests so far was not specific to priority queues. Possibly these interesting structures may also be useful for realizing other ADTs. Now we will see how binomial forests can be used for a simple and efficient implementation of priority queues called binomial heaps.

Each node of the forest is used for storing one entry. Each tree is organized as a heap (here we encounter an example of heaps with a non-uniform structure), but there is no condition on how the keys are distributed over the trees. As a result the smallest element may stand in any of the trees. We give an example: a priority queue with 29 entries can be realized as a binomial forest with 29 nodes, binomial trees of size 16, 8, 4 and 1, each tree being a heap:

Binomial Heap with 29 Nodes

How to perform the operations? Findmin is easy, it can be performed in O(log n) time, by determining the minimum value of the keys stored in the roots of all of the at most log n trees in the forest.

The other operations are built on the merge operation, so let us first consider how a merge can be performed efficiently. We already know how to merge two binomial forests into one new binomial forest. The only open question is how to assure that all resulting trees have the heap property afterwards. However, this is trivial: when joining two BNT_d to one BNT_{d + 1}, the idea is to always hook the tree with the larger root to the tree with the smaller root. In this way the heap property holds for the root of the new tree and because the remaining structure is unchanged it holds for all nodes. Thus, each of these join operations can be performed in O(1) time and thus two forests with n_1 and n_2 nodes, respectively, can be merged in O(log (n_1 + n_2)) time.

Insert and Deletemin

Now that we know how to perform merges, all other operations are easy! For inserting we just create a new binomial forest with a single node. This takes constant time. Then we merge it with the existing forest. This takes O(log n) time.

For a deletemin, we look at the at most log n roots of the binomial trees. The minimum is the minimum of these roots. This minimum element is removed. If this is the root of a BNT_d, removing the root results in a bunch of new trees, a BNT_0, a BNT_1, ..., a BNT_{d - 1}. Each of these trees is a heap itself, and thus they constitute a binomial forest with 2^d - 1 nodes in their own right. This forest is merged with the rest of the binomial forest to obtain the resulting binomial forest. Finding the correct root and removing it takes O(log n), merging the two forests also takes O(log n). So, even deletemin can be performed in O(log n) time.
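
The merge-based operations can be sketched compactly. The code below is one possible realization under assumptions of its own: the class name BinomialHeap, the representation of a forest as a list indexed by depth (with null where no tree of that depth exists) and the storage of the children of a node in a list ordered by depth are illustrative choices, not prescribed by the text.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a binomial heap; class and method names are illustrative.
class BinomialHeap {
    private static final class Node {
        final int key;
        final List<Node> children = new ArrayList<>(); // children.get(i) is a BNT_i
        Node(int key) { this.key = key; }
    }

    private List<Node> trees = new ArrayList<>(); // trees.get(d): the BNT_d, or null
    private int size = 0;

    int size() { return size; }

    // Join two BNT_d into one BNT_{d+1}: hook the larger root under the smaller.
    private static Node link(Node a, Node b) {
        if (b.key < a.key) { Node t = a; a = b; b = t; }
        a.children.add(b);
        return a;
    }

    // Merge two forests like a binary addition, with at most one carry tree.
    private static List<Node> merge(List<Node> x, List<Node> y) {
        List<Node> result = new ArrayList<>();
        Node carry = null;
        for (int d = 0; d < Math.max(x.size(), y.size()) || carry != null; d++) {
            Node a = d < x.size() ? x.get(d) : null;
            Node b = d < y.size() ? y.get(d) : null;
            Node keep = null, next = null;
            for (Node t : new Node[] {a, b, carry}) {
                if (t == null) continue;
                if (keep == null) keep = t;                 // a single or third tree stays
                else { next = link(keep, t); keep = null; } // two trees give a carry
            }
            result.add(keep);
            carry = next;
        }
        return result;
    }

    void insert(int key) {
        List<Node> single = new ArrayList<>();
        single.add(new Node(key));
        trees = merge(trees, single);
        size++;
    }

    // Moves all entries of other (a different heap) into this heap.
    void meld(BinomialHeap other) {
        trees = merge(trees, other.trees);
        size += other.size;
        other.trees = new ArrayList<>();
        other.size = 0;
    }

    int deleteMin() {
        int best = -1;
        for (int d = 0; d < trees.size(); d++) {
            Node t = trees.get(d);
            if (t != null && (best < 0 || t.key < trees.get(best).key)) best = d;
        }
        if (best < 0) throw new IllegalStateException("heap is empty");
        Node root = trees.get(best);
        trees.set(best, null);                // the rest of the forest
        trees = merge(trees, root.children);  // children form a forest BNT_0..BNT_{d-1}
        size--;
        return root.key;
    }
}
```

Keeping the children of every root ordered by depth is what makes the deletemin short: the list of children can be passed to the merge routine as a forest without any conversion.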

Inserts can be performed also directly without relying on the merge routine. The idea is that we look for the smallest index j so that b_j = 0 (referring to the binary expansion of n). Then we know that the trees T_0, ..., T_{j - 1} + the new element have to be merged into one new tree with 2^j nodes which is replacing the smaller trees in the binomial forest. One can do this in several ways. One possible way is to add the new element as a new root and then to percolate down. This is correct but not very efficient: at the top level, we have j comparisons, at the next level up to j - 1, and so on. The whole operation takes O(j^2) operations. So, it appears better to nevertheless stick more closely to the merging pattern: first we add 1 to T_0 and create a new tree T'_1, which is added to T_1. This gives a new tree T'_2, which is added to T_2. Etc. This is simple and requires only O(j) time.

Operations on Binomial Forests

Expected Insertion Time

In this section n denotes the number of nodes in the binomial forest and d = round_down(log n). Above we have shown that inserts and deletemins can be performed in O(log n) time. But, in practice the inserts are mostly much faster.

We analyze the expected time for an insert. The time for inserting an element into a binomial forest with n elements depends on the binary expansion of the number n. Let this be (b_d, ..., b_3, b_2, b_1, b_0), and let z_n be the smallest number so that b_{z_n} = 0. Then the insert involves only the trees BNT_i with i < z_n. Thus, such an insert can be performed with z_n comparisons. If we have just an arbitrary forest, whose number of elements is uniformly distributed, then with 50% probability b_j = 0 for any j. Thus, the expected number of comparisons for an insert can be given by

T_exp <= sum_{j >= 0} j / 2^j = 2.
This shows that the expected time for insertion is constant, O(1).

The above result assumes nothing about the distribution of the keys: it only assumes that we have no a priori knowledge about the size n of the binomial forest. Therefore, this is already much stronger than the earlier result for binary heaps, where we needed that the keys were uniformly distributed, a fact which lies outside the influence of the programmer: for certain inputs it is good, for others it is not.

Amortized Insertion Time

We now analyze what happens if a sequence of consecutive insertions is performed. Even though we cannot exclude that a single operation is requiring O(log n), these unlucky events do not cluster: we will prove that any sequence of m >= log n operations takes at most O(m) time.

For this analysis we need some theory. First we consider a problem from daily life in the same spirit. Consider a person who wants to keep track of his or her expenses. There are numerous smaller and larger expenses, so this requires considerable bookkeeping and it is likely that some expenses are forgotten. Assume this person has a regular income of 1000 units per month and had 1270 units in the account at the beginning of the year and 490 units at the end of the year. Then without knowing how much was spent when and where, we can immediately conclude that the sum of all expenses during the year has been 12 * 1000 + 1270 - 490 = 12780.

When trying to determine cost functions in computer science quite often one has to perform "clever bookkeeping". Costs are allocated to operations that did not really cause them in order to later not have to care when they arise. This idea will prove effective here too. It is quite common to make this bookkeeping explicit using tokens. A token is a cost unit. It costs one unit to deposit a token. This can be viewed as a prepayment for future operations: consuming a token, that is, executing an operation for which a token was deposited earlier, is considered to be free. More precisely, the amortized time is given by

t_amortized = t_actual + number_of_deposited_tokens - number_of_consumed_tokens.
The number of tokens that are currently deposited (deposited and not yet consumed) gives the potential of the data structure. The amortized time equals the actual time plus the change of the potential.

If the amortized time as defined above for operations on a data structure of size n can be bounded to t(n) and p(n) gives an upper bound on the potential, then any sequence of m operations takes at most m * t(n) + p(n) time, which means that for m >= p(n) / t(n) the average time per operation for any sequence of operations is bounded by (m * t(n) + p(n)) / m <= 2 * t(n) = O(t(n)). So, the intuitive notion of amortized time as being the average time over a sufficiently long sequence of operations asymptotically coincides with the formalized definition in terms of a potential.

Lemma: The amortized time for performing insertions on a binomial forest is constant. For a structure with n nodes, the used potential has maximum value O(log n).

Proof: Above we noticed that an insert on a forest of size n costs O(1 + z_n) time. Thus, the real cost of an operation is proportional to 1 + z_n. z_n gives the number of ones in the binary expression of n which have to be turned into zeroes. This number can be as large as log n. However, for any number n, there is exactly one position in which n has a zero where n + 1 has a one. So, it does not cost much to deposit one token for every newly created one. Furthermore, if we start with one token for every one, we can assume that at all times, there is a token available for each one in the binary expression of n. Said otherwise, as potential we use the number of ones in the binary expression of n. For the amortized time this gives t_amortized = 1 + z_n + 1 - z_n = 2. End.

Corollary: Any sequence of m >= log n consecutive insertions to a binomial forest with n elements can be performed in O(m) time.
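
The token argument can be checked numerically: since the number of ones changes by 1 - z_n with each insertion, m consecutive insertions starting from n nodes perform exactly m + ones(n) - ones(n + m) links in total. The sketch below computes the left-hand side directly; the class name BinaryCounterCost and its method names are assumptions made for this illustration.

```java
class BinaryCounterCost {
    // z_n: the smallest index j with b_j = 0 in the binary expansion of n,
    // which equals the number of trailing ones of n (for 0 <= n < 2^31).
    static int z(int n) { return Integer.numberOfTrailingZeros(~n); }

    // Total number of links performed by m consecutive insertions
    // into a binomial forest that initially has n nodes.
    static long totalLinks(int n, int m) {
        long total = 0;
        for (int i = 0; i < m; i++) total += z(n + i);
        return total;
    }

    public static void main(String[] args) {
        int n = 1023, m = 1000;
        long links = totalLinks(n, m);
        long predicted = m + Integer.bitCount(n) - Integer.bitCount(n + m);
        System.out.println(links + " links, predicted " + predicted);
    }
}
```

Adding the m unit costs for creating the new nodes gives a total of at most 2 * m + log n, which is O(m) as soon as m >= log n.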

Applications and Extensions

Other Operations

So far we have discussed deletemin, insert and merge. In addition one could perform decreaseKey, increaseKey and delete. All these operations require that we can access a specified element. For this we need a secondary structure (for example a search tree or a hash table) which holds "pointers" to the elements in the heap.

A decreaseKey is performed by updating the key and percolating the element up. Both on d-ary heaps and on binomial forests this takes logarithmic time. An increaseKey can be performed similarly: update the key and percolate down. On d-ary heaps this takes O(d * log_d n), which is ok for any constant d, but on binomial forests a percolate down may be expensive. A delete can be performed in several ways. Lazy deletions are cheap and have little impact on the time for the other operations. Lazy deletion in combination with insert offers a very simple way to realize increase and decrease key on any priority queue. On heaps one can replace the node to delete by the last leaf which subsequently is percolated down, analogously to doing a deletemin.

Selection of k-th Smallest Element

Finding the minimum or maximum of a set of n numbers is easy: scanning through the numbers and keeping track of the current maximum or minimum solves the problem in O(n) time. The problem becomes harder when we want to find the k-th smallest element, that is the element which in the sorted order would come at position k. It is also said that this element has rank k.

This problem is called the selection problem. Selection can be solved trivially by sorting and then picking the right element. If the numbers do not allow a fast linear-time sorting method, then this is not a good method because of the O(n * log n) time consumption.

Another simple method is to perform k rounds of bubble-sort (in a variant that makes the smallest numbers reach their positions first) and then take the number at position k. This costs O(k * n), which is good for small k, but worse than O(n * log n) for large k.

The same time is obtained when generalizing the idea to find the minimum: a pocket with the currently smallest k numbers is maintained. Each new number to consider is compared with these k and if it is smaller than the largest one, this largest one is kicked out.

This last idea can be improved. Suppose we have a priority queue with an insert and deletemax operation requiring at most O(log k) time per operation. Then we can first enter the first k elements into the priority queue. We perform a deletemax and store the value in a variable x. The further elements are compared with x. If such a value y < x, then y is inserted in the priority queue and the result of deletemax is assigned to x. In this way any element of the array is processed at a cost of at most O(log k) time for a total time consumption of O(n * log k). In pseudocode the algorithm looks as follows:

  int select(int k, int[] a, int n)
  {
    create an empty priority queue pq;
    for (i = 0; i < k; i++)
      pq.insert(a[i]);
    x = pq.deletemax();
    for (i = k; i < n; i++)
      if (a[i] < x)
      {
        pq.insert(a[i]);
        x = pq.deletemax();
      }
    return x;
  }
The above algorithm is correct, because it can easily be shown that at all times x equals the element with rank k out of the subset of elements a[j] with j < i.
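
The pseudocode above can be made runnable with the priority queue from the Java standard library. In this sketch java.util.PriorityQueue with a reversed comparator plays the role of the queue with deletemax; the class name SelectWithMaxQueue, the 1-based rank convention and the exception for an invalid rank are assumptions of the illustration.

```java
import java.util.Collections;
import java.util.PriorityQueue;

class SelectWithMaxQueue {
    // Returns the element of rank k (1 <= k <= a.length) in O(n * log k) time.
    static int select(int k, int[] a) {
        if (k < 1 || k > a.length)
            throw new IllegalArgumentException("rank out of range");
        // reverseOrder() turns the library min-queue into a max-queue
        PriorityQueue<Integer> pq = new PriorityQueue<>(Collections.reverseOrder());
        for (int i = 0; i < k; i++)
            pq.add(a[i]);
        int x = pq.poll();               // rank k among the first k elements
        for (int i = k; i < a.length; i++) {
            if (a[i] < x) {
                pq.add(a[i]);
                x = pq.poll();           // new rank-k element of a[0..i]
            }
        }
        return x;
    }
}
```

The queue never holds more than k elements, which is where the O(log k) cost per processed element comes from.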

A simple alternative is to apply buildHeap to all n elements, building a heap in O(n) time, and then perform k deletemins. This has the same worst-case complexity. This idea can be improved further: there is no need to really delete the elements from the heap. The algorithm can also search from the root for the k smallest elements. The crucial observation is that an element with rank i + 1 is a neighbor of one of the elements with ranks i or less. From among these elements, it is the one with smallest value. Thus, the problem of finding the k smallest elements can be solved by a special kind of search through the tree:

  int select(int k, int[] a, int n)
  {
    if (k > n)
      return a special error value;

    construct a heap hp containing all elements in a[];
    create an empty priority queue pq;
    insert the key of the root of hp into pq;
    for (i = 1; i < k; i++) 
    {
      x = pq.deleteMin(); /* x has rank i */
      let u be the node of hp of which x is the key value;
      if (u.leftchild != null)
        pq.insert(u.leftchild.key);
      if (u.rightchild != null)
        pq.insert(u.rightchild.key);
    }
    return pq.deleteMin();
  }
This routine takes O(n + k * log k) time because the size of pq is increased by at most one in each iteration, and its size thus remains bounded by k. So, for all k = O(n / log n), this selection algorithm runs in linear time. This final idea is interesting because it uses heaps and priority queues in a non-trivial way. For k = o(n / log n) it is even quite efficient, because the time is dominated by the time to build the heap, which goes fast even in practice. However, there are deterministic and randomized selection algorithms running in O(n) time for all k. Particularly the randomized algorithms are simpler and more efficient.

Smallest Elements in a Heap
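
The search through the heap can be written concretely for an implicit heap stored in an array. In the sketch below the auxiliary queue stores array indices, ordered by the keys at those indices, which solves the problem of finding the node u that belongs to a key. The class name SelectInHeap, the 1-based rank and the use of java.util.PriorityQueue are assumptions made for the illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

class SelectInHeap {
    // Percolates a[i] down within the implicit binary min-heap a[].
    static void percolateDown(int[] a, int i) {
        while (true) {
            int j = i, k = 2 * i + 1;
            if (k < a.length && a[k] < a[j]) j = k;
            if (k + 1 < a.length && a[k + 1] < a[j]) j = k + 1;
            if (j == i) return;
            int tmp = a[j]; a[j] = a[i]; a[i] = tmp;
            i = j;
        }
    }

    // Returns the element of rank k (1 <= k <= n) in O(n + k * log k) time.
    static int select(int k, int[] input) {
        if (k < 1 || k > input.length)
            throw new IllegalArgumentException("rank out of range");
        int[] a = input.clone();                     // leave the input unchanged
        for (int i = a.length / 2 - 1; i >= 0; i--)  // buildHeap in O(n)
            percolateDown(a, i);
        // queue of heap positions, ordered by the key stored at each position
        PriorityQueue<Integer> pq =
            new PriorityQueue<>(Comparator.comparingInt((Integer i) -> a[i]));
        pq.add(0);
        for (int r = 1; r < k; r++) {
            int u = pq.poll();                       // position of the rank-r element
            if (2 * u + 1 < a.length) pq.add(2 * u + 1);
            if (2 * u + 2 < a.length) pq.add(2 * u + 2);
        }
        return a[pq.poll()];
    }
}
```

Each iteration removes one position and adds at most two, so the auxiliary queue grows by at most one per iteration, as used in the time bound.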

Heap Sort

If we have a priority queue, not necessarily a heap, then it can be used for sorting: first all elements are inserted one-by-one (alternatively one may call a more efficient building routine such as buildHeap), then we call deleteMin n times, and store the elements in the order they are returned by the priority queue. As any reasonable priority queue implementation performs insert and deletemin in O(log n) time or less, this gives a simple O(n * log n) sorting algorithm. This algorithm, which is known as heap sort, is rather good, but quick sort and merge sort are somewhat faster.

An advantage of heap sorting is that the answer is produced gradually. If we use a heap and call buildHeap first, then after O(n + k * log n) time the first k elements are available. Using quick sort, no output is available until shortly before the end. Of the discussed sorting methods, only bucket sort shares this property, but its efficiency is far worse. So, if the sorted sequence is the input to a follow-up routine, then applying heap sort allows the two operations to be pipelined.

Heapsort

Impossibility of Fast Deletemin and Fast Insert

The fact that we can apply priority queues for sorting implies that a fast (that is, O(1)) insert routine must imply a slow (that is, Omega(log n)) deletemin and vice versa. The reason is the lower bound of Omega(n * log n) for comparison-based sorting. As all our priority-queue operations are comparison-based, we must have n * T_insert(n) + n * T_deletemin(n) = Omega(n * log n). Dividing by n gives T_insert(n) + T_deletemin(n) = Omega(log n). So, at least one of the two operations must take at least logarithmic time. Improved variants of priority queues indeed achieve both extreme cases: one operation O(1), the other O(log n).

One might have thought that constructing a heap is closely related to sorting, but the above shows that it is actually much less so. It is much more closely related to finding the minimum.

Exercises

  1. Write a Java class which has public methods insert and deleteMin. Internally it uses an array for maintaining the integers. deleteMin must be implemented in O(1), the time of insert is not critical.

  2. Describe how a priority queue can be used as a queue and how it can be used as a stack.

  3. Assume that elements which are inserted later tend to have larger keys than elements inserted earlier. Describe a somewhat realistic context in which such a situation may arise. Describe a realization that performs inserts and deletemins in O(1) time if later insertions have strictly larger keys. If those which are not appearing in order have arbitrary keys, how many of them can be tolerated before it is better to use a conventional heap implementation in which all operations take O(log n)?

  4. Assume we have a priority queue supporting insert and deleteMin. This might be realized in a Java class MinPriorityQueue. Describe how based on these operations a class MaxPriorityQueue can be defined with operations insert and deleteMax.

  5. Prove that the heap-property implies that the element with the smallest key must stand in the root. For a binary heap, indicate with small sketches all positions where the second and third smallest element can stand.

  6. Consider a binary heap. Initially it is empty. Then a number of operations is performed: insert 17, 4, 6, 8, 2, 23, 12, 14 followed by 3 times deletemin. Draw the resulting heaps after each operation.

  7. Consider the example program. Make some modifications so that the number of calls to percolateUp and percolateDown can be determined. For n = 4^k, k = 6, ..., 11, determine the number of these calls performed when building a heap, when repeatedly inserting elements and when removing them one by one. Can n insertions indeed be performed in linear time? Match a function f(n) = a + b * n through the number of percolate ups, so that the relative deviation with the observed values is minimized.

  8. Consider the example program. In this program the percolates are implemented in a recursive fashion. Furthermore, the elements on the path are exchanged instead of using the more efficient idea to keep a free position, to shift the elements and finally to close the free position. Rewrite the procedures in a non-recursive version using this idea. Compare the time consumptions for n = 1,000,000.

  9. The underlying idea of buildheap is not limited to perfect binary trees. However, for that particular case an upper bound on the number of accessed nodes could be computed rather easily. In general by heapifying, we mean to allocate the keys in an arbitrary way to a tree with the desired structure, and then to work upwards level-by-level, performing percolate down on all nodes in the levels.
    1. Give the exact maximum value for the cost of heapifying a full perfect d-ary tree of depth k. The cost measure is the number of key values considered. Hint: first give the required expression for d = 2 and 3. Do not forget that when percolating down the keys of all children have to be considered.
    2. For the same number of nodes, is it cheaper or more expensive to heapify a d-ary tree for d > 2 than a binary tree?

  10. This question deals with the problem of how to build a binomial heap with n nodes for a given set of n elements.
    1. Estimate the time when performing n times an insert to an initially empty binomial heap.
    2. Consider a binomial tree of depth d. Assume that the heap property holds for all nodes except for the root. How can this tree be turned into a heap and how long does this take at most? Hint: be careful, the operation is more expensive than you might think at first.
    3. Suggest an algorithm for building a binomial heap which is analogous to the algorithm for building a binary heap in O(n) time.
    4. Show the most important stages of the above construction for the following set of 13 keys: 7, 98, 3, 5, 16, 15, 1, 17, 75, 22, 2, 23, 8.
    5. Prove an upper bound on the time consumption of your algorithm.

  11. How many merge operations are performed, when performing a sequence of insertions on a binomial forest changing the number of nodes from n_0 to n_1? Give an exact expression.

  12. Consider a sequence of n_i insertions and n_d deletemins on a binomial forest. Let n_0 be the initial number of nodes. Assume that the insertions and deletemins are randomly mixed. Uniformly select any of the insertions at random with probability 1 / n_i. Give a bound on the expected number of merges performed during this insertion.

  13. There are several ways to perform an increasekey operation. One possibility is to assign the new larger key value followed by a percolate down. How much does this increasekey algorithm cost for a binomial heap with n nodes? Describe a more efficient way of performing increasekey.

  14. Consider the selection algorithm using a priority queue of size k to maintain the k smallest elements encountered so far. The worst-case time is O(n * log k). Give an example of an input for which the time consumption is Omega(n * log k). In general the algorithm may run faster. Give an example of an input for which the time consumption is O(n). Give a more precise expression of the time consumption, using an additional parameter. Assume that the values in a[] are randomized (rearranged according to a uniformly selected permutation of the n indices). Give an expression of the expected time consumption involving both k and n for this case. For which values of k is the expected running time of the algorithm linear?

  15. Assume we have a priority queue supporting buildQueue and deleteMin. Let T_1(n) be the time for building such a priority queue with n elements. Let T_2(n) be the time for performing a single deletemin from a structure with n elements. Formulate an implication of the form: "if T_1(n) = o(f(n)), then T_2(n) = Omega(g(n))", and a similar one with T_1 and T_2 exchanged.


This page was created by Jop Sibeyn.
Last update Monday, 07 March 05 - 12:35.

Graph Algorithms

Definitions

Graphs are a mathematical concept that can be used to model many concepts from the real world. A graph consists of two sets. One set contains the nodes (also called vertices), the other the edges (sometimes also called arcs). The edges connect the nodes. A road map (without the background coloring) is an example of a graph: the cities and villages are nodes, the roads are the edges. This is an example of an undirected graph: typically a road on a map can be used in both directions. But, we can also make a sociogram: for each considered person there is a node, and there is an edge from person A to person B if A likes B. These edges are directed: liking someone is not always mutual. More mathematically, if there are n nodes, then we will number them from 0 to n - 1. These numbers are called indices. An edge can be given as a pair of nodes: (u, v) indicates the edge from node u to node v. With any graph it should be specified whether it is directed or not. In the latter case an edge (u, v) can be used both for going from u to v and for going from v to u. Any undirected graph can be replaced by a directed graph by replacing each edge by a pair of oppositely directed edges.

In connection with graphs there are many notions. Some of them are important already now. A neighbor of a node u is any node v for which there is an edge (u, v). In this case one also says that v is adjacent to u. The nodes u and v are called the endpoints of the edge. A path from u to w is a sequence of edges (u, v_1), (v_1, v_2), ..., (v_{k - 1}, w) connecting u with w. A path is called simple if all nodes on the path are different. A cycle is a path which has the same begin- and endpoint and on which all other nodes are different. In the example this means that u == w. A graph without cycles is called acyclic. The length of a path is the number of used edges, in the example the path has length k. The distance from u to v is the length of the shortest path from u to v. A graph is called connected if for any pair of nodes u, v there is a path running from u to v. For directed graphs one mostly speaks of strongly connected if we take the direction of the edges into account for these paths, otherwise one speaks of weakly connected. A connected component is a maximal subset of the nodes that is connected. For directed graphs one speaks of strongly (weakly) connected components, often these are also called strong (weak) components. An undirected graph is said to be a tree if it is acyclic and connected. A directed graph is a tree when it is a tree considered as an undirected graph. Mostly it is required in addition that all edges are directed towards or away from a specific node called root. A forest is a set of trees. A spanning tree is a tree that reaches all nodes of a connected graph. A spanning forest is a set of trees, one for each connected component of a graph. The degree of a node u is the number of edges incident on u. For directed graphs it is customary to distinguish indegree, the number of edges leading to u, and outdegree, the number of edges starting in u. The degree of the graph is the maximum of the degrees of all nodes.
If an edge (u, v) occurs more than once (that is, the "set" of edges is actually a multiset), then we will say that there are multiple edges. A self-loop is an edge (u, u). A graph without multiple edges and self-loops is said to be simple. It is common to assume that graphs are simple. The number of nodes of a graph is often denoted with n, the number of edges with m. Simple graphs have at most n * (n - 1) edges if they are directed and at most n * (n - 1) / 2 edges if they are undirected. If m = O(n) (as in road graphs), then the graph is said to be sparse. If m is much larger than linear the graph is called dense. This notion is not precisely defined, sometimes it means m = omega(n), sometimes it means m = Omega(n^2).

Directed Graph with 12 Nodes and 16 Edges

Representations

The simplest representation of a graph is as a list of edges. In that case the input has size 2 * m. This representation is unsuitable for solving most common graph problems, because there is no efficient way to access all edges incident on a node. But, one of the presented algorithms for computing the connected components of a graph can work with a list of edges. Furthermore, this may be the initial way the graph is specified. If this is the case and for the problem to solve this format is unsuitable, a more suitable representation must be constructed first.

A common way to represent graphs in the computer is by creating for each node a set of all its neighbors. This is a particularly good idea for sparse graphs. In general one might use linear lists for these sets. This is called the adjacency list representation of the graph. Such an implementation requires O(n + m) memory. The strong feature of using adjacency lists is that it becomes very easy to determine all neighbors of a node. Using linked lists, it is also trivial to add or delete a particular edge. If all nodes have about the same degree with a maximum g, then it is much more convenient to represent the graph as an array of arrays of length g, storing for each node its degree as well. Even for lists of variable length an array may be used, marking for each node where in the array its neighbors begin. This implementation requires n + m + O(1) memory. The major disadvantage of adjacency lists is that it takes time proportional to the degree of node u to test whether an edge (u, v) exists or not.
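
The single-array variant for lists of variable length might look as follows. This is a sketch under assumptions: the graph is directed and given as a list of edges, and the names AdjacencyArraySketch, start and target are hypothetical. It uses n + 1 start positions and m targets.

```java
class AdjacencyArraySketch {
    final int[] start;   // the neighbors of u are target[start[u] .. start[u + 1] - 1]
    final int[] target;  // all adjacency lists, stored one after the other

    // edges[i] = {u, v} is a directed edge from u to v; the nodes are 0 .. n - 1.
    AdjacencyArraySketch(int n, int[][] edges) {
        start = new int[n + 1];
        for (int[] e : edges)
            start[e[0] + 1]++;              // count the outdegree of each node
        for (int u = 0; u < n; u++)
            start[u + 1] += start[u];       // prefix sums give the start positions
        target = new int[edges.length];
        int[] next = new int[n];            // next free slot in the list of each node
        System.arraycopy(start, 0, next, 0, n);
        for (int[] e : edges)
            target[next[e[0]]++] = e[1];
    }

    int degree(int u) {
        return start[u + 1] - start[u];
    }

    // Takes time proportional to the degree of u.
    boolean hasEdge(int u, int v) {
        for (int i = start[u]; i < start[u + 1]; i++)
            if (target[i] == v)
                return true;
        return false;
    }
}
```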

For dense graphs the most appropriate representation is with an adjacency matrix: an n x n boolean matrix, where a 1 at position (u, v) indicates that there is an edge (u, v). If the graph is undirected, the adjacency matrix is symmetric. This representation requires n^2 bits of memory, independently of the number of edges m. Thus, for really large graphs this representation cannot be used. Furthermore, any operation that requires the access of all neighbors of a node takes O(n) time. On the other hand, this is the representation to use if frequently the existence of single edges must be tested. Furthermore, for rather dense graphs of moderate sizes storing n^2 bits may require less memory than storing n + m ints.

Undirected Graph with 13 Nodes and 20 Edges

Graph Traversal

An elementary but important problem on graphs, directed or undirected, occurring as a subproblem when solving the connected-components or spanning-tree problem, is graph traversal. This means some systematic way to visit all n nodes of the graph. In this section the nodes are simply numbered while they are visited, but later the same scheme is used to perform other operations. Sometimes a specific processing order is required (when computing shortest paths or when determining a topological sorting), but quite often it is not.

Basic Graph Traversal

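We have encountered several ADTs. An important ADT which was not discussed explicitly before is the bag. A bag supports three operations: add, which puts an element in the bag; remove, which takes some element out of the bag and returns it, without specifying which one; and notEmpty, which tests whether the bag still contains elements. Stacks and queues are bags. Leaving things unspecified allows the use of potentially even more efficient implementations (though it is unlikely that there exists anything better and simpler than a stack).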
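
A bag could be sketched in Java as follows. The names Bag, add, remove and notEmpty are those used by the procedures in this section; the array-based class StackBag is a hypothetical implementation which happens to behave as a stack and assumes that never more than capacity elements are stored at the same time.

```java
interface Bag {
    void add(int x);     // put x in the bag
    int remove();        // take out some element; which one is left unspecified
    boolean notEmpty();  // does the bag still contain elements?
}

class StackBag implements Bag {
    private final int[] items;
    private int size = 0;

    StackBag(int capacity) {
        items = new int[capacity];
    }

    public void add(int x) {
        items[size++] = x;
    }

    public int remove() {
        return items[--size]; // the most recently added element
    }

    public boolean notEmpty() {
        return size > 0;
    }
}
```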

In case no specific processing order of the nodes of a graph is required, the following non-recursive procedure might be the simplest and most efficient:

  void traversal(int[] number) 
  {
    for (v = 0; v < n; v++)
      number[v] = -1; // Flag value
    counter = -1;
    Bag b = new Bag(n);
    for (r = 0; r < n; r++)
      if (number[r] == -1) // r is the root of a new component
      {
        counter++;
        number[r] = counter;
        b.add(r);
        while (b.notEmpty()) 
        {
          v = b.remove();
          for (each neighbor w of v)
            if (number[w] == -1) // w has not been visited yet
            {
              counter++;
              number[w] = counter;
              b.add(w);
            } 
        } 
      } 
  }

Clearly every node is numbered only once, because only nodes that were not visited before are assigned a value. Because counter is increased just before numbering a node and never decreased, all numbers are different. All nodes are added to the bag exactly once, and upon their removal all their outgoing edges are considered. This means that the algorithm has running time O(n + m).

Traversal Numbers

Spanning Forests

The highlighted edges in the above picture constitute a spanning tree. This is not a lucky coincidence: the red edges are the ones along which the traversal algorithm was discovering the new nodes to add to the bag. Because all nodes are reached this is a spanning tree. Thus, with a tiny modification the above algorithm can be used to compute a spanning forest of an undirected graph.

The produced tree will be directed towards the root: for each node the next node on the path to the root will be computed. The roots are characterized by the fact that they are pointing to themselves. Because for each node only one value has to be stored, independently of the degree of the tree, the whole tree can simply be maintained in an array:

  void spanningForest(int[] parent) 
  {
    for (v = 0; v < n; v++)
      parent[v] = -1; // Flag value
    Bag b = new Bag(n);
    for (r = 0; r < n; r++)
      if (parent[r] == -1) // r is the root of a new component
      {
        parent[r] = r; // r points to itself
        b.add(r);
        while (b.notEmpty()) 
        {
          v = b.remove();
          for (each neighbor w of v)
            if (parent[w] == -1) // w has not been visited yet
            {
              parent[w] = v; // w is reached from v
              b.add(w);
            } 
        } 
      } 
  }

The time consumption is clearly the same as before: O(n + m). However, for a formal correctness proof, the above argument should be refined. It should be shown that all nodes in a connected component belong to the same tree. So, assume there is a node w belonging to the same component as r, which is not reached during the traversal starting from r. Consider a path from r to w. Such a path exists because r and w belong to the same component. Let u be the last node on the path for which the parent-value has been set during the traversal starting from r, and let v be the next node on the path. u has been added to the bag upon setting its parent-value. Because the traversal only stops once the bag is empty, u also has been removed from it. Upon that occasion all neighbors of u, including v, are inspected and when they were not yet reached at that point of time, their parent-value is set. This contradicts the assumption that there is a node v on the path from r to w which has not been reached during the traversal starting from r.
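
For experimenting, the spanning-forest procedure might be written in runnable Java as below. The sketch assumes that the graph is given as an array of neighbor arrays (adj[v] lists the neighbors of v) and uses a java.util.ArrayDeque as the bag; the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

class SpanningForestSketch {
    // adj[v] lists the neighbors of v in an undirected graph with nodes 0 .. n - 1.
    // Returns the parent array; roots point to themselves.
    static int[] spanningForest(int[][] adj) {
        int n = adj.length;
        int[] parent = new int[n];
        Arrays.fill(parent, -1);                 // flag value: not yet reached
        ArrayDeque<Integer> bag = new ArrayDeque<>();
        for (int r = 0; r < n; r++) {
            if (parent[r] != -1)
                continue;                        // r was reached in an earlier tree
            parent[r] = r;                       // r is the root of a new tree
            bag.push(r);
            while (!bag.isEmpty()) {
                int v = bag.pop();
                for (int w : adj[v])
                    if (parent[w] == -1) {       // w has not been visited yet
                        parent[w] = v;           // w is reached from v
                        bag.push(w);
                    }
            }
        }
        return parent;
    }
}
```

For the path 0 - 1 - 2 plus an isolated node 3, the result is the array {0, 0, 1, 3}: nodes 0 and 3 are roots.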

Spanning Forest

Breadth-First Search

Sometimes the nodes should be processed in the order in which they were discovered. The corresponding graph traversal is called breadth-first search and the produced numbering is called a BFS numbering of the graph. This type of graph traversal has many important applications of which we will encounter a few.
  void bfsTraversal(int[] number) 
  {
    for (v = 0; v < n; v++)
      number[v] = -1; // Flag value
    counter = -1;
    Queue q = new Queue(n);
    for (r = 0; r < n; r++)
      if (number[r] == -1) // r is the root of a new component
      {
        counter++;
        number[r] = counter;
        q.enqueue(r);
        while (q.notEmpty()) 
        {
          v = q.dequeue();
          for (each neighbor w of v)
            if (number[w] == -1) // w has not been visited yet
            {
              counter++;
              number[w] = counter;
              q.enqueue(w);
            } 
        } 
      } 
  }

The algorithm is identical with the previous one, except that we use a queue instead of a bag. Depending on how the bag is implemented, this may have a strong impact on the produced numbering, but not on the complexity: all nodes are enqueued and dequeued exactly once and upon dequeuing all their outgoing edges are considered. Thus, the complexity is O(n + m).
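
A runnable counterpart of bfsTraversal, given as a sketch: it assumes the graph is an array of neighbor arrays and takes java.util.ArrayDeque as the queue. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

class BfsSketch {
    // adj[v] lists the neighbors of v; returns the BFS numbers of the nodes.
    static int[] bfsNumbers(int[][] adj) {
        int n = adj.length;
        int[] number = new int[n];
        Arrays.fill(number, -1);                  // flag value: not yet reached
        int counter = 0;
        ArrayDeque<Integer> queue = new ArrayDeque<>();
        for (int r = 0; r < n; r++) {
            if (number[r] != -1)
                continue;
            number[r] = counter++;                // r is the root of a new component
            queue.addLast(r);
            while (!queue.isEmpty()) {
                int v = queue.removeFirst();      // oldest node first
                for (int w : adj[v])
                    if (number[w] == -1) {
                        number[w] = counter++;
                        queue.addLast(w);
                    }
            }
        }
        return number;
    }
}
```

Because the oldest node is always processed first, the nodes are numbered in order of discovery: all neighbors of the root get their numbers before any node further away.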

BFS Numbers

Depth-First Search

There is an alternative way of traversing a graph called depth-first search; the produced numbering is called a DFS numbering. This method, too, has many important applications. DFS uses (implicitly or explicitly) a stack instead of an unspecified bag or a queue. DFS can be implemented recursively and non-recursively. As usual the recursive algorithm is shorter, though most likely the non-recursive variant will be more efficient.

Non-Recursive Algorithm

First we consider a non-recursive variant of the DFS algorithm.
  void dfsTraversal(int[] number) 
  {
    for (v = 0; v < n; v++)
      number[v] = -1; // Flag value
    counter = -1;
    Stack s = new Stack(m);
    for (r = 0; r < n; r++)
      if (number[r] == -1) // r is the root of a new component
      {
        s.push(r);
        while (s.notEmpty()) 
        {
          v = s.pop();
          if (number[v] == -1) // v has not been visited yet
          {
            counter++;
            number[v] = counter;
            for (each neighbor w of v)
              s.push(w); 
          } 
        }
      }
  }

As before, this algorithm numbers each node only once, with different numbers from 0 to n - 1. Here the nodes are marked as visited only after they are popped from the stack. This implies that nodes may be pushed on the stack many times, and that the size of the stack may become as large as m (even 2 * m for undirected graphs). This is not such a large disadvantage: the graph itself also requires that much storage. If one nevertheless wants to prevent this, one should either apply the recursive algorithm, where the command "s.push(w)" is replaced by a conditional recursive call to the DFS procedure, or one should push, instead of just w, also v and the index of w within the adjacency list of v. When such an entry is popped, an entry for the next neighbor of v is pushed in its place.
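
The second way of bounding the stack can be sketched as follows. This is a variant of the described idea, not the exact scheme: instead of storing the index on the stack, the sketch keeps for each node v a position next[v] in its adjacency list, so that the stack only contains the nodes on the current search path and never holds more than n entries. The graph is assumed to be an array of neighbor arrays; all names are hypothetical.

```java
import java.util.Arrays;

class IterativeDfsSketch {
    // adj[v] lists the neighbors of v; returns the (preorder) DFS numbers.
    static int[] dfsNumbers(int[][] adj) {
        int n = adj.length;
        int[] number = new int[n];
        Arrays.fill(number, -1);     // flag value: not yet reached
        int[] stack = new int[n];    // the nodes on the current search path
        int[] next = new int[n];     // next[v]: index of the next neighbor of v to inspect
        int counter = 0;
        for (int r = 0; r < n; r++) {
            if (number[r] != -1)
                continue;
            int top = 0;
            stack[top] = r;
            number[r] = counter++;
            while (top >= 0) {
                int v = stack[top];
                if (next[v] < adj[v].length) {
                    int w = adj[v][next[v]++];
                    if (number[w] == -1) {   // descend into w
                        number[w] = counter++;
                        stack[++top] = w;
                    }
                } else {
                    top--;                   // all neighbors of v inspected
                }
            }
        }
        return number;
    }
}
```

In this variant the nodes are numbered when they are pushed, and the neighbors are inspected in the order of the adjacency list, so the numbering coincides with that of a recursive DFS.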

Recursive Algorithm

DFS can more easily be formulated with a recursive algorithm.
  void dfs(int v, int* preCounter, int* postCounter, 
         int[] preNumber, int[] postNumber) 
  {
    (*preCounter)++;
    preNumber[v] = *preCounter; // preorder number
    for (each neighbor w of v)
      if (preNumber[w] == -1)
        dfs(w, preCounter, postCounter, preNumber, postNumber); // the counters are pointers already
    (*postCounter)++;
    postNumber[v] = *postCounter; // postorder number 
  }

  void dfsTraversal(int[] preNumber, int[] postNumber) 
  {
    for (v = 0; v < n; v++)
      preNumber[v] = -1; // Flag value
    preCounter = postCounter = -1;
    for (r = 0; r < n; r++)
      if (preNumber[r] == -1)
        dfs(r, &preCounter, &postCounter, preNumber, postNumber); 
  }

Here we computed two types of numbers: preorder DFS numbers and postorder DFS numbers depending on whether they were assigned before or after the recursion. The preorder numbers are the same that were simply called "dfs numbers" before. The importance of computing both types of numbers will become clear soon.

The last code fragment is written in a C-like style. In Java, where integer parameters cannot be passed by reference, one should either make the counters global (ugly but efficient), or pass them in some kind of mutable object, for example a one-element int array (an object from the class "Integer" does not work, because it is immutable). Click here to see all presented traversal algorithms integrated in a working Java program.
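
One possible Java formulation is sketched below. It takes a middle road between the two options: the counters are instance fields of a small object, so they are neither global nor passed as parameters. The graph is assumed to be an array of neighbor arrays, and the class name DfsNumbersSketch is hypothetical.

```java
import java.util.Arrays;

class DfsNumbersSketch {
    final int[][] adj;                  // adj[v] lists the neighbors of v
    final int[] preNumber, postNumber;
    private int preCounter = -1, postCounter = -1; // take the role of the C-style pointers

    DfsNumbersSketch(int[][] adj) {
        this.adj = adj;
        preNumber = new int[adj.length];
        postNumber = new int[adj.length];
        Arrays.fill(preNumber, -1);     // flag value: not yet reached
        for (int r = 0; r < adj.length; r++)
            if (preNumber[r] == -1)
                dfs(r);
    }

    private void dfs(int v) {
        preNumber[v] = ++preCounter;    // preorder number, assigned before the recursion
        for (int w : adj[v])
            if (preNumber[w] == -1)
                dfs(w);
        postNumber[v] = ++postCounter;  // postorder number, assigned after the recursion
    }
}
```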

DFS Numbers

Edge Classification

Classification

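The pre- and postorder numbers are not just any arbitrary numbers. They are useful for classifying the edges. Generally, with respect to a spanning tree of a graph, the edges of the graph may be classified as tree edges (the edges of the spanning tree itself), forward tree edges (other edges leading from a node to one of its descendants), backward tree edges (edges leading from a node to one of its ancestors) and cross edges (edges between two nodes of which neither is an ancestor of the other). On undirected graphs, there is no distinction between forward and backward edges. If the spanning tree is a DFS tree of the graph, then for an undirected graph, there are no cross edges because of the way the DFS search is performed. For directed graphs, there are no forward cross edges with respect to a DFS tree, but there may be backward cross edges.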

Classification of Edges of Directed Graph

With respect to a given spanning tree, for which we have computed pre- and postorder numbers, edges can be classified in constant time per edge as follows:

This classification does not distinguish tree edges from forward tree edges. If this matters, they can be distinguished testing whether u == parent[v] or not, where parent[v] gives the node from which v was reached. The classification allows to test in O(n + m) time whether a given spanning tree of a graph G is a DFS tree of G: compute the pre- and postorder numbers and test whether there are forbidden cross edges.
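
A constant-time test per edge might be sketched as follows. The sketch relies on the standard fact that u is an ancestor of v exactly when u has the smaller preorder number and the larger postorder number. It further assumes that a cross edge is called forward when it leads to a node with larger numbers and backward otherwise; the enum and method names are hypothetical.

```java
class EdgeClassSketch {
    enum Kind { TREE_OR_FORWARD, BACKWARD, FORWARD_CROSS, BACKWARD_CROSS }

    // Classifies an edge (u, v) with u != v, using the pre- and postorder
    // numbers that were computed on the spanning tree.
    static Kind classify(int u, int v, int[] pre, int[] post) {
        boolean preLess = pre[u] < pre[v];
        boolean postLess = post[u] < post[v];
        if (preLess && !postLess)
            return Kind.TREE_OR_FORWARD;  // u is an ancestor of v
        if (!preLess && postLess)
            return Kind.BACKWARD;         // v is an ancestor of u
        if (preLess)
            return Kind.FORWARD_CROSS;    // both numbers of v are larger
        return Kind.BACKWARD_CROSS;       // both numbers of v are smaller
    }
}
```

For a tree with edges (0, 1), (0, 2) and (1, 3), numbered in this order, the preorder numbers are {0, 1, 3, 2} and the postorder numbers {3, 1, 2, 0}. An additional edge (3, 0) would then be classified as a backward tree edge and an additional edge (2, 1) as a backward cross edge.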

DFS Tree

Finding Cycles

Cycles are often problematic. For example, if the edges indicate some kind of order in which tasks are to be performed (precedence relations) then a cycle implies a deadlock situation. Fortunately it is easy to test for the existence of cycles: a graph is acyclic if and only if there are no backward tree edges. In principle this solves the problem, but there are more efficient special-purpose algorithms which will be presented in the following.

For undirected graphs this gives a particularly easy test for cycles:

  boolean isAcyclic() 
  {
    int[] parent = new int[n];
    for (v = 0; v < n; v++)
      parent[v] = -1; // Flag value
    Bag b = new Bag(n);
    for (r = 0; r < n; r++)
      if (parent[r] == -1) // r is the root of a new component
      {
        parent[r] = r;
        b.add(r);
        while (b.notEmpty()) 
        {
          v = b.remove();
          for (each neighbor w of v)
            if (parent[w] == -1) // w has not been visited yet
            {
              parent[w] = v;
              b.add(w);
            } 
            else
              if (w != parent[v]) // w was reached before and is not the parent of v
                return false;
        } 
      } 
    return true;
  }

The above algorithm is a minor modification of the spanning-forest algorithm and has running time O(n + m). On undirected graphs one must be careful not to detect "cycles of length two": an edge (u, v) followed by the same edge in the other direction (v, u). In the algorithm these cases are singled out by the test "w != parent[v]". The applied method is simple and efficient, but one may nevertheless wonder whether it is necessary to use an additional array of n ints. One of the exercises deals with this, and the conclusion is that an array of n booleans is sufficient as well.

For directed graphs, the above algorithm might detect false cycles: reaching an earlier reached node does not need to imply that a directed cycle has been closed. On undirected graphs any graph-traversal algorithm can be used, but on directed graphs it appears to be essential to process the nodes in DFS order. Doing this, cycles are created only by backward tree edges, edges leading from a descendant v to an ancestor w. The ancestors of a node v are easy to recognize: these are exactly those nodes which already have been reached by the traversal, but for which the search is not yet completed. This can be characterized with two booleans per node. More formally, during the search any node may be in three possible states:

  0. unreached
  1. reached but not finished
  2. finished

So, the idea can also be implemented conveniently with a three-valued array:
  boolean hasCycle(int v, byte[] status)
  {
    status[v] = 1; // Reached but not finished
    for (each neighbor w of v)
      if (status[w] == 0) // Unvisited node
      {
        if (hasCycle(w, status)) // Cycle in subtree
          return true;
      }
      else 
        if (status[w] == 1) // w is an ancestor of v
          return true;
    status[v] = 2; // Finished
    return false; // No cycles in any of the subtrees
  }

  boolean isAcyclic() 
  {
    byte[] status = new byte[n];
    for (v = 0; v < n; v++)
      status[v] = 0; // Unreached
    for (r = 0; r < n; r++)
      if (status[r] == 0) // Unvisited node
        if (hasCycle(r, status)) // Cycle in subtree
          return false;
    return true; // No cycles in any of the subtrees
  }

Connected Components

Undirected Graphs

One of the simplest applications of graph traversal algorithms is to determine the connected components of an undirected graph. This requires only a minimal modification of the traversal algorithm. The algorithm is very similar to the spanning-forest algorithm, which should come as no surprise: the trees of the spanning forest are in one-to-one correspondence with the connected components.
  void connectedComponents(int[] component) 
  {
    for (v = 0; v < n; v++)
      component[v] = -1; // Flag value
    Bag b = new Bag(n);
    for (r = 0; r < n; r++)
      if (component[r] == -1) // r is the root of a new component
      {
        component[r] = r;
        b.add(r);
        while (b.notEmpty()) 
        {
          v = b.remove();
          for (each neighbor w of v)
            if (component[w] == -1) // w has not been visited yet
            {
              component[w] = r;
              b.add(w);
            } 
        } 
      } 
  }

Hereafter, for all nodes v, component[v] gives the index of the "root" of the component. In this case, this index equals the smallest index of the nodes belonging to the component. The graph is connected if and only if component[v] == 0 for all v. Using a counter which is increased only when finding the root of a new component, the components can be numbered consecutively, which saves memory if the sizes of the connected components are to be stored in an int array size[]. Click here to see this algorithm integrated in a working Java program.
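
The variant with consecutive component numbers might be sketched as follows. The sketch assumes that the graph is an array of neighbor arrays and uses a plain int array as a stack-like bag; each node is added at most once, so n entries suffice. The class and method names are hypothetical.

```java
import java.util.Arrays;

class ComponentsSketch {
    // adj[v] lists the neighbors of v in an undirected graph.
    // Returns for each node the number of its component: 0, 1, 2, ...
    static int[] components(int[][] adj) {
        int n = adj.length;
        int[] component = new int[n];
        Arrays.fill(component, -1);          // flag value: not yet reached
        int[] bag = new int[n];
        int count = 0;                       // number of components found so far
        for (int r = 0; r < n; r++) {
            if (component[r] != -1)
                continue;
            int size = 0;
            component[r] = count;            // r is the root of a new component
            bag[size++] = r;
            while (size > 0) {
                int v = bag[--size];
                for (int w : adj[v])
                    if (component[w] == -1) {
                        component[w] = count;
                        bag[size++] = w;
                    }
            }
            count++;                         // increased once per component
        }
        return component;
    }

    // Sizes of the components; the array needs only one entry per component.
    static int[] sizes(int[] component) {
        int count = 0;
        for (int c : component)
            count = Math.max(count, c + 1);
        int[] size = new int[count];
        for (int c : component)
            size[c]++;
        return size;
    }
}
```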

Directed Graphs

Determining the weak components of a directed graph is trivial: replace each directed edge by an undirected edge and run the above algorithm for undirected graphs.

Much more interesting and relevant is to determine the strong components. This problem is non-trivial: just running a graph traversal does not suffice, because reaching a node v from a node r does not mean that there is also a way back from v to r. At first it may even appear that finding the strong components is of an essentially harder nature than the problems discussed so far. This is not true: O(n + m) time is enough.

The algorithm consists of two rounds of DFS. For a graph G it proceeds as follows:

  1. Perform DFS on G computing the postorder numbers;
  2. Construct the inverted graph G', with an edge (v, u) if and only if G has an edge (u, v);
  3. Perform DFS on G', addressing the nodes in the order given by their postorder numbers: u is inspected before v if postNumber[u] > postNumber[v].

Lemma: The nodes reached by a search starting from a node r during the second DFS performed on G' precisely give the strong component to which r belongs.

Proof: Denote the strong component to which r belongs by S_r. By definition of strong components, for any node u in S_r there is a path from u to r. Thus, there is also a path from r to u in G'. This implies that all of S_r is reached during the search starting from r. Now consider a node v which is reached during the search starting from r. Apparently there is a path in G' from r to v, and thus there is a path from v to r in G. It remains to prove that there is a path from r to v in G. r is considered before v. This implies that r has a larger postorder number than v. If we assume that during the first DFS v was reached before r, then we get a contradiction, because we know that there is a path from v to r, and thus the postorder number of v would be larger than that of r. So we may assume that during the first DFS r was reached before v. In that case r only gets a larger postorder number when v is a descendant of r in the DFS tree. That is, there must be a path from r to v.
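
The three steps can be sketched in Java as follows. The sketch makes several assumptions: the graph is an array of neighbor arrays, both searches are written recursively, and each strong component is labeled with the node from which the second search reached it. All names are hypothetical.

```java
import java.util.Arrays;

class StrongComponentsSketch {
    private final int[][] adj, rev;   // G and the inverted graph G'
    private final boolean[] visited;
    private final int[] order;        // order[p] is the node with postorder number p
    private int postCounter = 0;
    final int[] component;            // component[v]: start node of the search that reached v

    StrongComponentsSketch(int[][] adj) {
        int n = adj.length;
        this.adj = adj;
        // Construct G': count the indegrees, then fill the inverted lists.
        int[] indegree = new int[n];
        for (int[] out : adj)
            for (int v : out)
                indegree[v]++;
        rev = new int[n][];
        for (int v = 0; v < n; v++)
            rev[v] = new int[indegree[v]];
        for (int u = 0; u < n; u++)
            for (int v : adj[u])
                rev[v][--indegree[v]] = u;
        // First DFS on G, recording the nodes in postorder.
        visited = new boolean[n];
        order = new int[n];
        for (int r = 0; r < n; r++)
            if (!visited[r])
                first(r);
        // Second DFS on G', starting with the largest postorder number.
        component = new int[n];
        Arrays.fill(component, -1);
        for (int p = n - 1; p >= 0; p--)
            if (component[order[p]] == -1)
                second(order[p], order[p]);
    }

    private void first(int v) {
        visited[v] = true;
        for (int w : adj[v])
            if (!visited[w])
                first(w);
        order[postCounter++] = v;
    }

    private void second(int v, int root) {
        component[v] = root;
        for (int w : rev[v])
            if (component[w] == -1)
                second(w, root);
    }
}
```

For the graph with edges (0, 1), (1, 2), (2, 0), (2, 3), (3, 4) and (4, 3), the nodes 0, 1 and 2 get label 0, and the nodes 3 and 4 get label 3.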

Topological Sorting

Suppose we have to perform a certain number of tasks and that we can perform one task at a time. There are certain dependencies between the tasks: it may be the case that some task T_1 should definitely be performed before another task T_2. The order of other tasks may be irrelevant. The whole set of dependencies can be drawn as a graph: a node for every task and an edge leading from task T_1 to task T_2 if T_2 cannot be performed before T_1. We may hope the graph is acyclic, otherwise there is no feasible schedule. If the graph is acyclic we have a directed acyclic graph, abbreviated DAG. The result of this section will be that for any DAG G it is possible to compute a numbering so that for any edge (u, v) of G the number of u is smaller than the number of v. The process of computing this numbering, and even the numbering itself, is called topological sorting. A topological sorting of G provides a correct execution order for the tasks. Thus, a directed graph can be topologically sorted if and only if it is acyclic.

Indegree-0 Algorithm

Which nodes can be safely numbered? A node with indegree equal to zero can be given number 0 without any risk. Furthermore, in a finite acyclic graph there is always at least one node with indegree zero. This can be seen as follows: consider the graph with inverted edges G'. Start at any node r and walk until reaching a node u without outgoing edges. Ultimately we will reach such a node, because due to the acyclicity we cannot walk in circles, and therefore after each step we reach a hitherto unreached node. Because the graph is finite, we cannot continue to reach unvisited nodes forever. This u is a node with outdegree zero in G', which means that u has indegree zero in G.

The above idea immediately leads to an algorithm:

  1. Find a node u with indegree zero, give u the next number.
  2. Remove u and all its outgoing edges from the graph and continue with the reduced graph until there are no nodes left.
The correctness of this algorithm follows from the above observation and the fact that removing a node and its edges does not create cycles, and that thus the reduced graph is acyclic when the original graph was so.

It remains to find an efficient implementation of the above. The idea is to maintain for each node its current indegree in a separate array. The original values in this array can be computed in O(m). Then in one pass through the array all nodes with indegree zero are detected and entered in a bag. The nodes are taken out of the bag in arbitrary order. When removing an edge (u, v) the indegree of v is reduced by 1. In this way the whole algorithm has running time O(n + m). A nice feature of the algorithm is that it is not necessary to test beforehand whether the graph is acyclic or not: if the bag is empty before all nodes are numbered, then we know there is a cycle. If this does not happen, we know that apparently the graph must have been acyclic.

  boolean topologicalSort(int[] number) 
  {
    int[] degree = new int[n];
    for (v = 0; v < n; v++)
      degree[v] = 0;
    for (v = 0; v < n; v++)
      for (each neighbor w of v)
        degree[w]++;
    Bag b = new Bag(n);
    for (v = 0; v < n; v++)
      if (degree[v] == 0)
        b.add(v);
    counter = 0;
    while (b.notEmpty()) 
    {
      v = b.remove();
      number[v] = counter;
      counter++;
      for (each neighbor w of v)
      {
        degree[w]--;
        if (degree[w] == 0)
          b.add(w);
      }
    } 
    return counter == n; // true if and only if the graph is acyclic
  }

Click here to see this algorithm integrated in a working Java program. The algorithm is efficient in both time and memory consumption.
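The pseudocode above can be turned into compilable Java along the following lines. This is only a sketch under assumptions that are not part of the text: the graph is passed as an array adj[][] in which adj[v] lists the neighbors of v, an ArrayDeque plays the role of the bag, and the class name TopoSortSketch is hypothetical. Instead of a boolean, the method returns the numbering, or null if a cycle was detected.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

class TopoSortSketch {
    // Returns number[] with number[u] < number[v] for every edge (u, v),
    // or null if the bag runs empty before all nodes are numbered (cycle).
    static int[] topologicalSort(int[][] adj) {
        int n = adj.length;
        int[] degree = new int[n];              // current indegree of each node
        for (int v = 0; v < n; v++)
            for (int w : adj[v])
                degree[w]++;
        ArrayDeque<Integer> bag = new ArrayDeque<>();
        for (int v = 0; v < n; v++)
            if (degree[v] == 0)
                bag.add(v);
        int[] number = new int[n];
        int counter = 0;
        while (!bag.isEmpty()) {
            int v = bag.remove();
            number[v] = counter++;
            for (int w : adj[v])                // "remove" the outgoing edges of v
                if (--degree[w] == 0)
                    bag.add(w);
        }
        return counter == n ? number : null;
    }

    public static void main(String[] args) {
        int[][] dag = {{1, 2}, {3}, {3}, {}};   // edges 0->1, 0->2, 1->3, 2->3
        System.out.println(Arrays.toString(topologicalSort(dag)));
        int[][] cyclic = {{1}, {2}, {0}};       // cycle 0->1->2->0
        System.out.println(topologicalSort(cyclic) == null);
    }
}
```

Because the bag may hand out nodes in any order, different bag implementations can produce different, equally valid numberings.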

DFS-Based Algorithm

There is an alternative, somewhat different method for computing a topological sorting. The idea is as follows:
  1. Construct the graph with inverted edges G'
  2. Perform a DFS traversal of G' computing the postorder numbers

Lemma: For a graph G, the postorder numbers of the graph with inverted edges G' constitute a topological sorting of G.

Proof: Assume there is an edge (u, v) in G. This implies that in G' there is an edge (v, u). So, if during the DFS on G', v is reached before u, then u will be reached as a descendant, either directly or indirectly, from v and therefore u will get a smaller postorder number than v, as it should be. If u is reached before v, then u gets a smaller postorder number than v unless v is reached as a descendant from u. However, this would mean that there is a path in G' from u to v. Together with the edge (v, u), this means that u and v lie on a common cycle. That is, G', and hence G, is not acyclic, a contradiction.

This lemma immediately leads to an O(n + m) algorithm, even though inverting the graph is slightly technical if we are working with an array-based implementation of the adjacency lists. Furthermore, storing G' in addition to G doubles the memory requirement.
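The two steps can be sketched in Java as follows. The sketch assumes that the input graph is acyclic (cycles are not detected here), that the graph is given as an array adj[][] of neighbor arrays, and that recursion depth is not a concern; the class name DfsTopoSketch and the list-based representation of G' are choices made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

class DfsTopoSketch {
    // Returns the postorder numbers of a DFS on the inverted graph G'.
    // For an acyclic G these satisfy number[u] < number[v] for every edge (u, v).
    static int[] topologicalSort(int[][] adj) {
        int n = adj.length;
        // Step 1: construct G'. An edge (u, v) of G becomes (v, u) in G'.
        List<List<Integer>> inv = new ArrayList<>();
        for (int v = 0; v < n; v++)
            inv.add(new ArrayList<>());
        for (int u = 0; u < n; u++)
            for (int v : adj[u])
                inv.get(v).add(u);
        // Step 2: DFS on G', assigning postorder numbers.
        int[] number = new int[n];
        boolean[] visited = new boolean[n];
        int[] counter = {0};                    // shared counter for the recursion
        for (int r = 0; r < n; r++)
            if (!visited[r])
                dfs(r, inv, visited, number, counter);
        return number;
    }

    static void dfs(int v, List<List<Integer>> inv, boolean[] visited,
                    int[] number, int[] counter) {
        visited[v] = true;
        for (int u : inv.get(v))
            if (!visited[u])
                dfs(u, inv, visited, number, counter);
        number[v] = counter[0]++;               // v is finished: postorder number
    }

    public static void main(String[] args) {
        int[][] dag = {{1, 2}, {3}, {3}, {}, {0}};
        System.out.println(java.util.Arrays.toString(topologicalSort(dag)));
    }
}
```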

Shortest Paths

One of the most important problems on graphs is computing distances. Distances are not only distances in meters, but may also be time, cost, ... . The problem has many variants, the most important being

  1. Computing the distance from a given node s to a given node t.
  2. Computing the distances from a given node s to all other nodes.
  3. Computing the distances between all pairs of nodes.

Surprisingly, in the worst case the first problem is only marginally easier than the second, though on many graphs the problem can be solved much faster. For the third problem the best algorithms do not perform substantially better than simply solving the second problem for all s (this is not true if the edges may have negative weights). Thus, we can focus on the single-source-shortest-path, SSSP, problem.

BFS-Based Algorithm

For unweighted graphs the SSSP problem is easy: it can be solved by BFS, because the distance from s of a newly reached node is one larger than the distance of the current node from s. In code this looks as follows:
  void unweightedSSSP(int s, int[] dist) 
  {
    for (v = 0; v < n; v++)
      dist[v] = INFINITY; 
    Queue q = new Queue(n);
    dist[s] = 0;
    q.enqueue(s);
    while (q.notEmpty()) 
    {
      v = q.dequeue();
      for (each neighbor w of v)
        if (dist[w] == INFINITY) // w has not been visited yet
        {
          dist[w] = dist[v] + 1;
          q.enqueue(w);
        } 
    } 
  } 

The algorithm ensures that nodes v that are unreachable from s have dist[v] = INFINITY, which appears to be reasonable. Click here to see it integrated in a working Java program.
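For readers who want to experiment without the linked program, the routine might be written in compilable Java as below. The representation of the graph as an array adj[][] of neighbor arrays, the use of Integer.MAX_VALUE for INFINITY, and the class name BfsSketch are assumptions of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

class BfsSketch {
    static final int INFINITY = Integer.MAX_VALUE;

    // Returns dist[] with dist[v] = number of edges on a shortest path from s to v,
    // or INFINITY if v is unreachable from s.
    static int[] unweightedSSSP(int[][] adj, int s) {
        int[] dist = new int[adj.length];
        Arrays.fill(dist, INFINITY);
        ArrayDeque<Integer> q = new ArrayDeque<>();   // used as a FIFO queue
        dist[s] = 0;
        q.add(s);
        while (!q.isEmpty()) {
            int v = q.remove();
            for (int w : adj[v])
                if (dist[w] == INFINITY) {            // w has not been visited yet
                    dist[w] = dist[v] + 1;
                    q.add(w);
                }
        }
        return dist;
    }

    public static void main(String[] args) {
        int[][] adj = {{1, 2}, {3}, {3}, {}, {0}};    // node 4 is unreachable from 0
        System.out.println(Arrays.toString(unweightedSSSP(adj, 0)));
    }
}
```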

BFS Tree

Here and in the following dist[u] denotes the distance values in the algorithm, while distance(u, v) denotes the real distance in the graph from node u to v. Because the length of the concatenation of two paths is the sum of the lengths of each path, the triangle inequality holds for all triples of nodes u, v and w:

distance(u, w) <= distance(u, v) + distance(v, w).

Using induction over the number of performed enqueue operations, it is easy to prove that at all times for all nodes dist[v] >= distance(s, v). Initially this is true. So, assume it is true after t enqueuing operations. If in step t + 1 we set dist[w] = dist[v] + 1, then we may assume that dist[v] >= distance(s, v), and thus, using the triangle inequality, dist[w] = dist[v] + 1 >= distance(s, v) + 1 = distance(s, v) + distance(v, w) >= distance(s, w). Thus, the computed values are not too small. It remains to prove that they are not too large. This is not as easy as one might think.

Lemma: The values dist[v] of the enqueued (dequeued) nodes are monotonically non-decreasing.

Proof: We use induction over the number of performed enqueue (dequeue) operations. Initially the claim holds because any sequence of length at most 1 is monotonic. So, assume the claim holds for the first t operations. Then in operation t + 1 we are enqueuing a node w with value dist[w] = dist[v] + 1. Because v is the latest dequeued node, we may assume that dist[v] >= dist[u] for any earlier dequeued node u. But then dist[w] = dist[v] + 1 >= dist[u] + 1, the value of an arbitrary node on the queue. Here we essentially use that in a queue nodes that were enqueued earlier are also dequeued earlier (the FIFO order).

Theorem: At the end of the algorithm dist[v] = distance(s, v) for all v.

Proof: It remains to show that the values are not too large. The proof goes by contradiction. Assume that dist[w] > distance(s, w) for some w, and assume that w is the node lying closest to s which gets assigned too large a value dist[w]. Let u be the last node before w on a shortest path from s to w. So, distance(s, w) = distance(s, u) + 1. Because u lies closer to s than w, we may assume that u gets assigned the correct value: dist[u] = distance(s, u). Let v be the node which was responsible for enqueuing w. This implies dist[w] = dist[v] + 1. So, dist[v] = dist[w] - 1 > distance(s, w) - 1 = distance(s, u) = dist[u]. Thus, according to the previous lemma, node u will be dequeued before node v. Thus, w should have been enqueued by u instead of v, and we should have dist[w] = dist[u] + 1. This is a contradiction, which can be traced back to our assumption that there is a node w with dist[w] > distance(s, w).

In proofs of this kind it is very common to argue by contradiction, focusing on a supposed first occasion on which the algorithm makes a mistake. If there is no first mistake, then there is no mistake at all!

Dijkstra's Algorithm

If there are weights, then the simple queue-order processing of the elements is no longer correct. Particularly, it is not necessarily true that the first time one discovers a node this is along a shortest path: a path with 10 short edges may be shorter than a path with 5 long edges. However, a simple modification works. The algorithm is known under the name Dijkstra's algorithm. It is assumed that all edge weights are positive, or at least non-negative! If there are negative weights, the algorithm will run in the same time, but in that case the computed values only give an upper-bound on the distances.
  void weightedSSSP(int s, int[] dist) 
  {
    for (v = 0; v < n; v++)
      dist[v] = INFINITY; 
    PriorityQueue pq = new PriorityQueue(n, INFINITY);
    dist[s] = 0;
    pq.decreaseKey(s, dist[s]);
    while (pq.notEmpty()) 
    {
      v = pq.deleteMin();
      for (each neighbor w of v)
        if (dist[w] > dist[v] + weight[v, w]) // shorter path
        {
          dist[w] = dist[v] + weight[v, w];
          pq.decreaseKey(w, dist[w]);
        } 
    } 
  } 

In order to get a formulation without case distinctions, initially all nodes are inserted into the priority queue with infinite key value. This works correctly even if some nodes are not reachable from s: for these nodes the key value remains unchanged throughout the algorithm. Once all reachable nodes have been deleted from the priority queue, the unreachable nodes are deleted as well and processed just like reachable nodes, but this certainly does not lead to improved distance values. So, the existence of unreachable nodes only causes some useless work.

At most n elements are enqueued and dequeued. At most m decreasekey operations are performed. It depends on the priority queue used how much time this takes. Using a binary heap, the construction can be performed in O(n) time and each decreasekey and deletemin in O(log n) time. So, with an ordinary heap Dijkstra's algorithm has running time O((n + m) * log n). Better priority queues (Fibonacci heaps) allow the decreasekey operations to be performed in O(1) amortized time, reducing the running time to O(m + n * log n). Thus, for all m = Omega(n * log n), this is O(m), which is clearly optimal. Refined algorithms perform better for sparser graphs.

Click here to see the algorithm integrated in a working Java program. In this implementation the priority queue is realized in a primitive way using an array. This minimizes the memory consumption and allows a decrease-key operation to be performed in constant time, but makes deletemins expensive: they take O(n) time. So, the running time of the whole algorithm is O(n^2 + m), which for all simple graphs is O(n^2). For dense graphs this is acceptable, but for all other graphs it is much better to use a priority queue with faster deletemins. In general, implementing a decrease-key operation may require an additional search tree to find the nodes, but because in this particular case all node indices lie between 0 and n - 1, it is sufficient to maintain an extra array pos[] of length n, pos[u] giving the position of node u in the priority queue (one more application of direct addressing).
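The following Java sketch illustrates the O(n^2 + m) variant with a primitive array-based priority queue. It is not the linked program: here the array dist[] itself serves as the queue, a boolean array done[] (an assumption of this sketch) marks deleted nodes, and each deletemin is a linear scan, so no pos[] array is needed. The graph is assumed to be given by parallel arrays adj[v][i] and weight[v][i], all weights are assumed non-negative, and the class name DijkstraSketch is hypothetical.

```java
import java.util.Arrays;

class DijkstraSketch {
    static final long INFINITY = Long.MAX_VALUE;

    // adj[v][i] is the i-th neighbor of v, weight[v][i] the weight of that edge.
    static long[] weightedSSSP(int[][] adj, int[][] weight, int s) {
        int n = adj.length;
        long[] dist = new long[n];
        Arrays.fill(dist, INFINITY);
        boolean[] done = new boolean[n];     // done[v]: v was deleted from the queue
        dist[s] = 0;
        for (int round = 0; round < n; round++) {
            // deleteMin in O(n): scan all nodes that are still in the queue
            int v = -1;
            for (int u = 0; u < n; u++)
                if (!done[u] && (v == -1 || dist[u] < dist[v]))
                    v = u;
            if (dist[v] == INFINITY)
                break;                       // only unreachable nodes remain; skip them
            done[v] = true;
            for (int i = 0; i < adj[v].length; i++) {
                int w = adj[v][i];
                if (dist[w] > dist[v] + weight[v][i])     // shorter path
                    dist[w] = dist[v] + weight[v][i];     // decrease-key in O(1)
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        int[][] adj    = {{1, 2}, {3}, {1, 3}, {}, {}};
        int[][] weight = {{10, 1}, {1}, {2, 20}, {}, {}};
        System.out.println(Arrays.toString(weightedSSSP(adj, weight, 0)));
    }
}
```

Unlike the pseudocode, this sketch stops as soon as only unreachable nodes remain, which also avoids adding a weight to INFINITY.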

The proof of correctness of the algorithm is similar to the proof of correctness for the unweighted case. First one shows that the nodes that are dequeued have increasing distances.

Lemma: The values dist[v] of the dequeued nodes are monotonically non-decreasing.

Proof: We use induction over the number of performed dequeue operations. Initially the claim holds because any sequence of length at most 1 is monotonic. So, assume the claim holds for the first t operations. Let w be the node we are dequeuing in step t + 1. Let u be any node dequeued before. At the time u was dequeued using deletemin we must have had dist[u] <= dist[w]. So, consider possible updates to dist[w] after u was dequeued. Let v be the node which caused the latest update of dist[w]. In that case dist[w] = dist[v] + weight[v, w]. From our induction assumption it follows that dist[v] >= dist[u]. But then, dist[w] = dist[v] + weight[v, w] >= dist[u] + weight[v, w] >= dist[u]. It is at this point that we essentially use that weight[v, w] >= 0. Otherwise this lemma cannot be proven!

Theorem: At the end of the algorithm dist[v] = distance(s, v) for all v.

Proof: As before it can be shown that the values of dist[] at all times give an upper bound on the distances: as long as they are infinity, this is clear; once they have a finite value, the value corresponds to the length of a path: there may be shorter paths, but this path can be used for sure. So, we may assume that dist[v] >= distance(s, v) for all nodes at all times.

Consider the node w lying closest to s, that is, having the smallest value of distance(s, w), which upon dequeuing has dist[w] > distance(s, w). If weight[v, w] = 0, and the shortest path from s to w runs through v, then we will nevertheless say that v lies closer to s than w. Let v be the last node before w on a shortest path from s to w. Thus, distance(s, w) = distance(s, v) + weight[v, w]. Because weight[v, w] >= 0, we have distance(s, v) <= distance(s, w), and thus v gets the correct value, that is dist[v] = distance(s, v). So, dist[w] > distance(s, w) = distance(s, v) + weight[v, w] = dist[v] + weight[v, w] >= dist[v], and therefore, because of the previous lemma, v will be dequeued before w. But at that occasion the algorithm would have set dist[w] = dist[v] + weight[v, w] = distance(s, v) + weight[v, w] = distance(s, w), in contradiction with the assumption that dist[w] > distance(s, w).

In the following picture the action of Dijkstra's algorithm is illustrated. Edge weights are indicated along the edges, the current values of dist[] are indicated in the nodes. 99 stands for infinity. Nodes that have been removed from pq have final distance value. The (connected) subgraph with these nodes is marked.

Dijkstra's Algorithm

Finding Edges on Paths

One can easily keep track of the edges lying on the shortest paths during the algorithm, but it is just as easy to determine them afterwards as follows:
  void findShortestPathEdges(int[] dist, int[] parent) 
  {
    for (all nodes w)
      parent[w] = -1;
    for (all edges (v, w))
      if (dist[w] == dist[v] + weight[v, w])
        parent[w] = v; 
  }

The routine is given for directed graphs. For undirected graphs (depending on the input format) it may be necessary to consider an edge (v, w) both for w and for v. In the current version parent[w] may be set several times if there are several paths of the same length from s to w. This may be prevented by performing the assignment only when parent[w] == -1, but this does not make the routine faster.

This routine takes O(n + m) time independently of the type of graph, so these costs are negligible except for unweighted graphs. For weighted graphs it will certainly be more efficient to determine the edges by this separate procedure. But, even for unweighted graphs it may be profitable to perform a second pass over all edges: this reduces the amount of data handled in a single subroutine, and may therefore allow a larger fraction of the graph to be held in cache.

Afterwards every node reachable from s, except s itself, has a unique predecessor, and the graph defined by parent[] is acyclic provided that there are no zero-cost cycles. So, the whole graph defined by parent[] constitutes a tree directed towards s, spanning all nodes reachable from s. In particular: independently of m, shortest paths to all reachable nodes are described by at most n - 1 edges. Once parent[] has been computed, queries of the form "give the shortest path from s to v" can be solved using O(n) time and memory: start at v, push all edges on the path from v to s on a stack and print them while popping them.
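The computation of parent[] and the stack-based path query can be sketched together in Java. The sketch assumes a directed graph without zero-cost cycles, given as an edge array in which edges[i] = {v, w, weight}; it pushes nodes rather than edges on the stack, and the names PathSketch, findParents and pathTo are hypothetical. An extra guard, not present in the pseudocode above, skips edges that start at an unreachable node.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PathSketch {
    // parent[w] = v for one edge (v, w) with dist[w] == dist[v] + weight, else -1.
    static int[] findParents(int n, int[][] edges, long[] dist, long infinity) {
        int[] parent = new int[n];
        Arrays.fill(parent, -1);
        for (int[] e : edges) {
            int v = e[0], w = e[1];
            // skip unreachable v; keep the first parent that was found
            if (dist[v] != infinity && parent[w] == -1 && dist[w] == dist[v] + e[2])
                parent[w] = v;
        }
        return parent;
    }

    // Returns the nodes on a shortest path from s to v, or null if v is unreachable.
    static List<Integer> pathTo(int v, int s, int[] parent) {
        ArrayDeque<Integer> stack = new ArrayDeque<>();
        for (int u = v; u != s; u = parent[u]) {
            if (parent[u] == -1)
                return null;                 // no predecessor: v is not reachable from s
            stack.push(u);
        }
        stack.push(s);
        return new ArrayList<>(stack);       // iterates from the top of the stack: s first
    }

    public static void main(String[] args) {
        long inf = Long.MAX_VALUE;
        int[][] edges = {{0, 1, 10}, {0, 2, 1}, {1, 3, 1}, {2, 1, 2}, {2, 3, 20}};
        long[] dist = {0, 3, 1, 4, inf};     // distances from s = 0
        int[] parent = findParents(5, edges, dist, inf);
        System.out.println(pathTo(3, 0, parent));
    }
}
```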

It is not nice that the above algorithm does not handle zero-cost cycles correctly. This problem can most easily be overcome by an alternative algorithm which is only marginally slower than the given one: run a spanning-tree algorithm on the graph G' = (V, E'), where E' = {(v, w) in E| dist[w] == dist[v] + weight[v, w]}.

Exercises

  1. Describe how to convert a list of undirected edges into an adjacency-list representation.

  2. Write a class AdjacencyMatrix. It should have a constructor AdjacencyMatrix(int n), and possess methods void addEdge(int u, int v), void deleteEdge(int u, int v), int numberOfEdges() and boolean isEdge(int u, int v), which can be used to add an edge, delete an edge, ask for the number of edges and test whether an edge exists, respectively. All operations should run in O(1) time, independently of the number of nodes or edges. For a graph with n nodes, the whole data structure should require n^2 + O(n) bits. It is not correct to assume that booleans are realized as bits.

  3. Draw a directed graph with 14 nodes and 20 edges. Give an adjacency matrix and an adjacency list of the graph. Select an arbitrary start node and compute preorder and postorder numbers for all nodes. The edges of a node are considered in the order in which they appear in the adjacency list.

  4. A common operation on graphs is edge contraction. This means that two adjacent nodes are fused. The edge between them is eliminated and all edges that formerly were running towards / away from either of them are now running towards / away from the new node. The new node can be given the index of one of the two previous nodes. The other node may be left without any edges. More precisely, if the edge to contract is (u, v), then the tasks are to
    1. Find all edges (v, w) and to replace them by edges (u, w);
    2. Find all edges (w, v) and to replace them by edges (w, u);
    3. To not create self-loops;
    4. To not create multiple edges.

    Give pseudocode realizing this operation. Distinguish four cases: an adjacency-matrix representation and a representation with adjacency lists, both for directed and undirected graphs. Indicate the time consumption for each of them. The time bounds should be given in terms of n, the number of nodes; m, the number of edges; d_u, the degree of u; d_v, the degree of v; and d_G, the current degree of the graph G.

    For directed graphs it is hard to find the edges (w, v). But, choosing an appropriate graph representation requiring O(n + m) memory, a slightly more complicated algorithm can perform edge contraction efficiently: Point 1, 2 and 3 can be realized in O(d_v) time. Describe the necessary graph representation and the algorithm in full detail.

    Point 4 is somewhat harder. Give a trivial realization requiring O(d_v * d_G) time. Organizing the adjacency structures in a suitable way, this can be improved to O(d_v * log(d_G)). Describe how this can be achieved. Instead of changing the organization of the adjacency structures, we can also use some additional data structure, so that we reasonably can expect to perform this operation in O(d_u + d_v) time. Describe how this can be achieved.

  5. For unweighted graphs the connected-components and spanning-tree problem can be solved in O(n + m) time by modifications of the graph-traversal algorithm. In a theoretical sense these algorithms are optimal, but practically this may not be true: if we assume that n = 100,000 and m = 10 * n, then the graph does not fit into the cache and therefore a traversal will produce on the order of n cache faults. On the other hand, we can store one integer per node in the cache. Therefore it is attractive to perform in cases like this a simplified version of Kruskal's algorithm: using a union-find data structure, all edges are considered in the order they are stored and two sets are unified whenever they are connected by an edge. Work this idea out to an algorithm in pseudocode. What is the complexity of this algorithm using the best implementation of the union-find ADT? How do you estimate the performance in practice will be in comparison with a traversal-based algorithm?

  6. This exercise deals with the properties of acyclic undirected graphs and an algorithm to test acyclicity.
    1. What kind of graph structure does an acyclic connected undirected graph have? For a graph with n nodes, how many edges does such a graph have?
    2. What structure does an arbitrary acyclic undirected graph have? Give a concise and general expression for the number of edges m in terms of the number of nodes n and a third parameter. Prove the general correctness of your expression.
    3. Suggest a simple algorithm for testing whether an undirected graph is acyclic exploiting the given expression. How can you assure that for any graph the running time is bounded by O(n)?

  7. Write a non-recursive version of the DFS algorithm requiring a stack whose size is bounded by O(n). Compare the performance with that of the presented non-recursive algorithm which can be downloaded here. Test graphs with n = 100,000 and m = k * n, for k = 1, 2, 4, 8, 16.

  8. For a directed graph G, several of the presented algorithms required the construction of the graph with inverted edges G'. Assume that for G we use an array-based implementation of the adjacency lists: there is one array f[] of length n + 1 and an array a[] of length m so that the neighbors of node u stand in a[] between (inclusively) f[u] and (exclusively) f[u + 1]. Construct from this the corresponding arrays b[] and g[] giving the adjacency lists of G'. The algorithm should run in O(n + m) time and it should only use O(1) memory beyond the four mentioned arrays. Now try to improve the algorithm: rewrite it so that it becomes semi-insitu. Here an algorithm solving a graph problem is called semi-insitu if it uses m + O(n) storage.

  9. Consider a directed acyclic graph of a special form: there is only one node s, called source, without ingoing edges and only one node t, called sink, without outgoing edges. For an edge (u, v), let N(u, v) denote the number of different paths from s to t containing (u, v). The task is to present an efficient algorithm for computing N(u, v). Notice that N(u, v) = N^-(u) * N^+(v), where N^-(u) denotes the number of different paths from s to u and N^+(v) denotes the number of different paths from v to t. Define Pred(u) = {x | (x, u) is an edge} and Suc(v) = {x | (v, x) is an edge}.
    1. Give recursive expressions of N^-(u) and N^+(v) in terms of Pred(u) and Suc(v). Do not forget to treat the special cases u == s and v == t.
    2. Using these expressions, give a recursive algorithm for computing N(u, v).
    3. Analyze the time consumption of your algorithm.
    4. Processing the nodes in the right order, the value of N^-(u) can also be computed by a non-recursive algorithm. In which order should the nodes be processed?
    5. Outline a complete non-recursive algorithm for computing the values N(u, v) for all edges (u, v) in linear time.

  10. Consider the single-source-shortest-path problem on weighted graphs. The standard algorithm uses a priority queue, which implies some extra overhead (unstructured memory access, extra information for finding the nodes). One might consider alternatives. How about the following (Kruskal's minimum-spanning-tree algorithm):
    1. Sort all edges according to edge weight;
    2. Consider the edges in the order of increasing edge weight, an edge is added to the set of edges if it does not create a cycle.
    Is this algorithm correct? That is, does the shortest path from a node s to a node t always run over the edges of the constructed tree?

  11. Consider an arbitrary tree T with n nodes. Assume that there is a specified root node r and that for any node u in the tree the roots of the subtrees of u can somehow be accessed. Outline an algorithm for finding a separator of the tree. By separator we mean a node s so that removing s from T decomposes it into 2 or more components, all of them with size at most n / 2. Your algorithm must have O(n) running time.

  12. Consider the program implementing Dijkstra's algorithm. Design a method testDistances(int[] dist) to be added to the class Graph, which for a given graph tests in O(m) time that the computed distances are correct. In general for any non-trivial programming task it is a very good idea to add independent test routines whenever possible. In this case the task consists of two parts: testing that the computed distances are not too large and testing that they are not too small. Hint: for the second test it is handy to first determine the tree consisting of all edges lying on a shortest path.

  13. Test the program implementing Dijkstra's algorithm. Run experiments for n = 2^k for k = 12, 13, 14, 15 and m = l * n, for l = 10, 100, 1000. Fit a function involving four parameters through the measured values so that all time measurements are reasonably predicted. Do the experiments conform with the theoretical analysis?

    Now take the given heap implementation and complement it with a method decreaseKey(int i, int x) based on percolating up along the lines sketched in the text above. Replace the priority queue in the program testing Dijkstra's algorithm by this heap implementation.

    Repeat the experiments for the same pairs of n and m values. Compare the times. Fit a suitable function through your results. Do the experiments conform with the theoretical analysis?

  14. It was suggested how to solve the problem of finding the tree T giving all shortest paths in a weighted graph G = (V, E) by running a spanning tree algorithm on a reduced unweighted graph G' = (V, E'), where E' = {(v, w) in E| dist[w] == dist[v] + weight[v, w]}. Prove the correctness of this algorithm. That is, prove that the unique path in T from s to a node v indeed gives a shortest path from s to v.

    Actually the edges of T can be computed without explicitly constructing G'. Give an algorithm which, in addition to the memory required for storing G, essentially only requires memory for the array parent[].


This page was created by Jop Sibeyn.
Last update Monday, 07 March 05 - 12:35.