As html is not suited for mathematical formulas, some additional notation is used (as used in the typographical package Latex). a_i denotes a with subscript i. a^i denotes a to the power i. <= and >= are used as in most computer languages. Curly brackets, "{" and "}", are use to group things. sum_{i = 0}^j denotes the sum for i running from 0 to j. The same may also be written sum_{0 <= i <= j}. ~= stands for "approximately equal" and ~ for "proportional to". Greek letters are written out. For logical expressions we either use the notation from C or the operators are written out in text. So, "a && !b" is the same as "a and not b". sqrt stands for the square root function and log without specifying the base number for the logarithm to basis 2. There are a few more notations, but they should be understood easily. New notions are printed bold there where they are defined. Inside the chapters particularly important ideas and short key notes are highlighted without introductory text. There are many pictures. These are intended to be self-explanatory. Typically they are placed just after the text fragment to which they belong, generally there is no direct reference to them in the text.
There are very few references to the literature. Clearly most of the presented material is not new. A large part is even common knowledge. Next to several other lecture notes, the following book has been used as a source:
Notice: the following text may be overcomplete. At the examination it is expected that the students only know all that has been presented during the lectures.
In the course of the lecture we will encounter the following data structures:
Algorithms can be formulated in many ways. Consider the problem of finding the maximum from a set of n numbers which is stored in an array a[]. The algorithm can be formulated as follows:
Start by initializing a variable, which we will call M hereafter, with the value of a[0]. Then check the numbers a[i] starting from a[1] and going all the way to a[n - 1] in this order. Each of these numbers is compared with the current value of M, and if a[i] happens to be larger than M, then a[i] is assigned to M.This is definitely an algorithm. All the important details are mentioned. This is about how a (good) cooking recipe might be given or how one may be instructed to repair a punctured tire.
However, in computer science texts, it is common to use a slightly more formalized way of expression, using pseudo-code. Pseudo-code intermixes common program-language-like fragments with fragments in natural language. This is convenient, because this allows to express loops and tests more clearly. On the other hand, there is no need to declare variables. So, we might also write:
function maximum(array a[] of length n)
begin
if (n equal to 0) then
return some special value or generate an exception
else
M = a[0];
for (all i from 1 to n - 1) do
if (a[i] > M) then
M = a[i];
return M;
fi;
end.
In such a more formal notation one might also more easily detect that
we should also handle the case of an empty set correctly. In this
simple case, we could just as well have written C or Java code, but
in general this may distract from the essentials of the algorithm.
Actually all these notions should be written with the "element-of symbol", so one should say "T(n) = 6 * n^2 + 2 * n^3 + 345 is an element of O(n^3)", but it is very common to use the equality symbol. Nevertheless one should not be fooled by this: T(n) = O(f(n)) does not imply that O(f(n)) = T(n) (this is not even defined) or that f(n) = O(T(n)) (which might be true, but which does not need to be true).
By far the most common of these symbols is O(). This symbols gives us an upper-bound on the rate of growth of a function, allowing us to make an overestimate. In the chapter on union-find we will give an analysis of the time consumption and we will conclude that certain operations can be performed very efficiently, stating T(n) = O(f(n)) for some function f() which is almost constant. However, the actual result is even sharper, even in an asymptotic sense. Most other algorithms and data structures we study in this lecture are so simple that the time consumption of operations can be specified exactly. In that case we might even write T(n) = Theta(f(n)), but because it is typically the upper bound which interests us most, we will even in these cases mostly use O().
It is easy to show, just arguing formally, that if T_1(n) = O(f_1(n)) and T_2(n) = O(f_2(n)), that then T_1(n) + T_2(n) = O(max{f_1(n), f_2(n)}). As an example we will prove this. Other relations can be proven similarly. We know T_1(n) < c_1 * f_1(n) for all n > n_1, and T_2(n) < c_2 * f_2(n) for all n > n_2. This implies T_1(n) + T_2(n) < (c_1 + c_2) * max{f_1(n), f_2(n)}, for all n > max{n_1, n_2}.
Common terminology:
It happens, even in the literature, that people mix-up Omega, Theta and O. You will see that O is used where clearly one of the other two is understood. The most common usage of the above notation is that one tries to express the time complexity of an algorithm consisely as O(f(n)) for some suitable and simple function f(). For an expression f(n) + g(n), f(n) is said to be the leading term and g(n) a lower-order term when g(n) = o(f(n)). Typically the time complexity involves a number of contributions. In that case only the leading term should be retained, while the lower-order terms are scratched away. The leading term should be given without constants. For example, 23 * n * log + 2 * n^2 + 100 * n, is normally written as O(n^2). In some contexts it may make sense to specify the leading constant, the constant with which the leading term is multiplied. Then this same result can be given as 2 * n^2 + o(n^2), as (2 + o(1)) * n^2, or as 2 * n^2 + O(n * log n).
In order to determine the leading term, it is handy to know a few rules which all can be proven by using the definition above:
The last two relations can be proven conveniently using l'Hopital's rule. This rule states that for differentiable functions f() and g(), with lim_{n -> infinity} f(n) = infinity and lim_{n -> infinity} g(n) = infinity, lim_{n -> infinity} f(n) / g(n) = lim_{n -> infinity} f'(n) / g'(n). We now prove the fourth relation, leaving the last as an exercise. lim_{n -> infinity} log^c(n) / n = 0 <=> (lim_{n -> infinity} log^c(n) / n)^{1/c} = 0 <=> lim_{n -> infinity} log(n) / n^{1/c} = 0. The last transition is not trivial but may be found in textbooks on analysis. To prove this last we use l'Hopital's rule. First we notice that both f(n) = log(n) and g(n) = n^{1/c} are differentiable for n > 0 and go to infinity with n. f'(n) = 1 / n and g'(n) = 1/c * n^{1/c} / n. So, f'(n) / g'(n) = c / n^{1/c}. For any constant c, lim_{n -> infinity} f'(n) / g'(n) = 0, and therefore we conclude lim_{n -> infinity} f(n) / g(n) = 0, as was to be shown.
One should be aware that this model is only a coarse approximation of the reality that holds for modern computer systems. Of course, at least in theory, this is dealt with by the O() assumption: all operations are constant time, but the constants may be quite considerable. The most important aspect is the non-uniform cost of memory access. The following picture gives a very high-level view of a modern computer system.
Other factors may, however, be equally important. The main other factor is space. There are examples of problems that can be solved faster if we are willing to use more space. So, in that case we find a space-time trade-off. A trivial (but extreme) example is to create a dictionary for all words of at most five letters with 27^5 storage. After initialization, we can perform insertions, deletions and look-ups in constant time. Actually, applying the idea of virtual initialization (treated in one of the later chapters), the prohibitively expensive initialization does not need to be performed.
A further point that should be stressed here, is that when evaluating the performance of an algorithm, we consider the time the algorithm might take for the worst of all inputs. That is, we perform a worst-case analysis. Thus, if we state "this sorting algorithm runs in O(n * log n)", we mean that the running time is bounded by O(n * log n) for all inputs of size n. If on the other hand we say "this sorting algorithm has quadratic running time", we do not mean that it takes c * n^2 time for all inputs of size n, but that there are inputs (maybe not even for all values of n) for which the algorithm takes at least c * n^2 time for some constant c. This type of analysis is by far the most common in the study of algorithms, but occasionally one gets the feeling that this does not reflect the "real" behavior in practice. Maybe there are only some rare very artificial instances for which the algorithm is slow. In that case one may also perform some kind of average case analysis. The problem is to define what the average case is. For example, when sorting n numbers, can one assume that all n! arrangements are equally frequent? In the real practice there may be a tendency to have some kind of clustering. The strong point of a worst-case analysis is that it leaves no room for such discussions. Furthermore, one can easily find many contexts, where one wants guaranteed performance.
The mathematics in this lecture will mostly be of the elementary type, but we will also encounter a little bit of probability theory and we will deal with graphs, but not in a very theoretical way.
log_a n = log_b n / log_b a, for all positive a, b and n.This relation is mostly used with b = 2, showing that for any constant a > 0, log_a n = O(n).
(1 + x)^a ~= 1 + a * x,These are so-called first-order Taylor approximations, which are accurate only for x close to zero (positive or negative). All of the above approximate relations can be made more accurate by using the second-order Taylor approximations:
1 / (1 + x) ~= 1 - x,
log_e (1 + x) ~= x,
e^x ~= 1 + x,
sqrt(1 + x) ~= 1 + x / 2.
(1 + x)^a ~= 1 + a * x + a * (a - 1) * x^2 / 2,The size of the second-order term gives insight on how accurate the first-order estimate is. These first- and second-order approximations are particularly useful for obtaining inequalities. For example, 1 + x / 2 - x^2 / 8 <= sqrt(1 + x) <= 1 + x / 2, for all x >= 0.
1 / (1 + x) ~= 1 - x + x^2,
log_e (1 + x) ~= x - x^2 / 2,
e^x ~= 1 + x + x^2 / 2,
sqrt(1 + x) ~= 1 + x / 2 - x^2 / 8.
(1 - 1 / n)^n < 1 / e < (1 - 1 / (n + 1))^n, for all n >= 1.In particular this shows that (1 - 1 / n)^n converges to 1 / e for large n. In most applications the expression to estimate is slightly less beautiful. It is common to have a relation like (1 - a / n)^{b * n}. Substituting m = n / a, gives (1 - a / n)^{b * n} = (1 - 1 / m)^{a * b * m} = ((1 - 1 / m)^m)^{a * b} < 1 / e^{a * b}. The notation suggests that a and b are constants, but this is not essential.
n! ~= sqrt(2 * pi * n) * (n / e)^n.The above formula, which is known as Stirling's Formula, is quite exact. As a consequence, n! = Theta(sqrt(n) * (n / e)^n). Essentially n! grows as (n / e)^n. Another consequence is that log_2(n!) ~= log_2((n / e)^n) = n * (log_2 n - log_2 e) ~= n * log_2 n - 1.44 * n.
(n / k)^k <= (n over k) <= (n * e / k)^k.This estimate is strongly related to Stirling's formula: (n over k) = n! / ((n - k)! * k!) ~= sqrt(2 * pi * n) * (n / e)^n / (sqrt(2 * pi * (n - k)) * ((n - k) / e)^(n - k) * sqrt(2 * pi * k) * (k / e)^k) = sqrt(n / (2 * pi * (n - k) * k)) * 1 / (1 - k / n)^{n - k} * (n / k)^k. For small k, this essentially grows as (n / k)^k, for large k, we must also take into account the factor 1 / (1 - k / n)^{n - k}. Using m = n / k, we get 1 / (1 - k / n)^{n - k} = 1 / (1 - 1 / m)^{k * m - k)} = 1 / (1 - 1 / m)^{m * k * (1 - 1 / m)} ~= e^{k * (1 - 1 / m)}.
sum_{k = 0}^n (n over k) = 2^n, for all n >= 0.This tells that the sum of the values in row n of Pascal's triangle equals 2^n. Realizing that (n over k) gives the number of ways to select k numbers out off n numbers, this expression is equivalent to the statement that in total there are 2^n possible ways to flip n coins. From this it immediately follows that (n over k) <= 2^k, for all n >= 0 and 0 <= k <= n. It follows also that there are values of k for which (n over k) >= 2^n / n.
sum_{k >= 0} 1 / k! = e.This goes back on the fact that the derivative of e^x is e^x itself again. A consequence of the above is the slightly surprising fact that sum_{k >= 0} k / k! = sum_{k >= 1} k / k! = sum_{k >= 1} 1 / (k - 1)! = sum_{k >= 0} 1 / k! = e. So, the value is the same (analogously, it follows that sum_{k >= 0} k * (k - 1) * ... * (k - l) / k! = e for any constant l). This can be used in the following computation sum_{k >= 1} (k - 1) / k! = sum_{k >= 1} k / k! - sum_{k >= 1} 1 / k! = e - (e - 1) = 1. Here we used that under conditions which are mostly satisfied for the functions we are considering, the limit of a sum equals the sum of the limits.
What does proving something by induction mean?
We now apply this proof method to the formula above. First we notice that sum_{i = 0}^0 a^i = 1 = (1 - a^{0 + 1}) / (1 - a). This gives our base case. Now assume that we have proven the equality for some arbitrary n >= 0. Then we can write
sum_{i = 0}^{n + 1} a^i =def of sum=
sum_{i = 0}^n a^i + a^{n + 1} =induction hypothesis=
(1 - a^{n + 1}) / (1 - a) + a^{n + 1} =computation=
(1 - a^{n + 2}) / (1 - a).
So, using the claimed property for n, we have proven it for n + 1.
Together with the basis, the axiom of proving by induction now allows
to conclude that the property holds for all n >= 0.
In a similar way many more of these relations can be proven. Often one needs a small twist that is not entirely obvious. For example,
The above arguments suggest that f(n) ~= 1/2 * n^2. What is the exact value? A reasonable guess is that f(n) == g(n) for some polynomial of degree two g(n) = 1/2 * n^2 + b * n + c. We know that f(0) = 0 and that f(1) = 1. So, if f(n) == g(n), then we must have g(n) = 1/2 * n^2 + 1/2 * n + 0 = 1/2 * n * (n + 1). Notice that we do not yet claim that this is true: we have only explained how one can guess an expression and assuming its correctness determine the occurring parameters. Proving that indeed f(n) == g(n) for all n can be done with induction. The base case, n == 0, is ok because f(0) == sum_0^0 0 == 0 == 1/2 * 0 * (0 + 1) == g(0). So, assume f(n) == g(n) for some n >= 0 and all smaller values, then the following completes the proof:
f(n + 1) =def of f=
sum_{i = 0}^{n + 1} i =def of sum=
sum_{i = 0}^n i + (n + 1) =induction hypothesis=
1/2 * n * (n + 1) + (n + 1) =computation=
1/2 * (n + 1) * (n + 2) =def of g=
g(n + 1).
More generally, you should know that similar formulas exist for f_k(n) = sum_{i = 0}^n i^k, for all finite k. Again approximating the sum by an integral, we find f_k(n) ~= n^{k + 1} / (k + 1). Guessing that this constitutes the leading term in a polynomial of degree k + 1, the expression can be derived: in general, a polynomial of degree d is uniquely determined when we know its value in d + 1 points: using these values, the coefficients can be determined by solving a system with d + 1 unknowns and d + 1 equations. For f_k we can take the values f_k(0), ..., f_k(k).
This proof method can also be used to prove that any algorithm for determining the maximum element in an array a[] with n elements must inspect all n positions of the array. Assume the opposite, so assume there is a correct algorithm which does not inspect all positions for a certain input array a[]. Let a[i] be a position which is not inspected. Let m be the value which is output by the algorithm. If this is not the maximum the algorithm is not correct for the given array. Otherwise, we change a[] by setting a[i] = m + 1. Running the algorithm on the modified array will output the same value m, because position i is not inspected and for the algorithm there is thus no way to notice the difference. But m is not the maximum of the modified array, showing that the algorithm is not correct for all inputs. We conclude that there is no correct) maximum algorithm not inspecting all positions and that thus any algorithm must inspect all positions of the array at least once. This implies that any maximum algorithm has running time Omega(n). At the same time, the simple algorithm presented at the beginning of this chapter has running time O(n) because it consists of a loop which is executed n times while each pass of the loop takes O(1) time. So, the running time is n * O(1) = O(n). By the above argument we now know that there is very little space for improvements, up to constant factors this algorithm is optimal. The proven lower bound was not for the worst-case running time of a particular algorithm, but for all possible algorithms. That is, we have shown that the time complexity of the problem of computing the maximum of an array with n elements is Omega(n), and because there is an algorithm with matching time complexity, we can say that the complexity of maximum computation is Theta(n). There are not many problems for which matching upper and lower bounds are known.
0! = 1,
n! = n * (n - 1)!, for all n > 0.
The general idea of a recursive definition of a function f is that
Clearly there is a problem here: one should assure that for all values n, the recursion terminates. That is, one should make sure that expanding the recursion, going from f(n) to f(n') to f(n''), ..., one ultimately reaches one of the base cases.
Not only functions can be recursive, but also programs and algorithms (in the light of the existence functional programming this is not surprising). In a recursive algorithm, it is fixed how some special cases can be treated and for all other cases it is told how they can be solved, using the solution of some other cases. As for recursive functions, the problem is to assure that for any possible input ultimately a base case is reached. That is, one should make sure that for any finite input the computation reaches one of the base cases in a finite number of steps and thus terminates.
A problem which can very well be solved in a recursive way is sorting. The task is to design an algorithm that sorts sets of numbers. Let n denote the cardinality (the number of elements) of the set to sort. As base case we take n == 1. In that case the algorithm simply outputs the single element, constituting a sorted set of size 1. For any n > 1, it performs the following steps:
The correctness of recursive algorithms is proven by induction. Mostly this is an induction over the size of the input. This works in the case of our sorting algorithm: for sets of size 1 the correctness is obvious. This is the basis of our inductive proof. So, let us assume the algorithm is correct for all sets of size n and smaller, for some n >= 1. Consider a set of size n + 1. The algorithm splits this set in S_0, S_1 and S_2. S_0 and S_2 have size at most n, and thus we may assume by the induction hypothesis, that they are correctly sorted. But then the whole algorithm is correct, because all values in S_0 are strictly smaller than all those in S_1, which in turn are strictly smaller than all those in S_2.
As an example we consider the time consumption of the recursive sorting algorithm presented above. We can estimate
T(n) == T(n_0) + T(n_2) + c * n,for some constant c, where n_0 = |S_0| and n_2 = |S_2|. The reason behind this estimate, is that splitting a set as specified can somehow be performed by considering all elements once and then putting them in the right bag. In any reasonable implementation, this takes linear time, covered by the term c * n. Sorting S_0 takes T(n_0) and sorting S_2 takes T(n_2). As long as we do not spend too long on selecting x, all other operations take constant or at most linear time.
In this case the problem is that we do not know the values of n_0 and n_2, because these depend on the selected item x. The only thing we know is that n_0 + n_2 <= n - 1. In a later chapter this algorithm is considered again. Here we will only analyze its worst-case. The worst that can happen, is that we split the set very unevenly, for example by selecting the largest element. In that case n_0 == n - 1 and n_2 == 0. If this unlucky situation happens again and again, then the time consumption is given by the following recurrence relation:
T(1) = c,Here c is chosen so large that the expressions on the right actually give an upper bound on the time consumptions on the left. Expanding the recurrence relation several steps, one soon discovers that T(n) == f(n) for f(n) == sum_{i = 1}^n (c * i).
T(n) = T(n - 1) + c * n, for all n > 1.
The formal proof that T(n) == f(n) goes by induction. The basis of the induction is given by T(1) = c = f(1). Assuming that T(n) == f(n) for some value n and all smaller values, it follows that
T(n + 1) =def of T()= T(n) + c * (n + 1) =induction assumption= f(n) + c * (n + 1) =def of f(n)= f(n + 1).But now we can use our knowledge on the sum of a arithmetic progression to conclude that for this function T(), T(n) = c/2 * n * (n + 1) = Theta(n^2).
As soon as one believes to know the solution of a recurrence relation, it is normally not hard to verify the correctness of this believe. However, in general, it may be very hard to find such solutions. There are mathematical methods which allow to solve several classes of common recurrence relations. Here we will not discuss these methods (which are similar to solving differential equations). Rather we list some of the most common types for reference purposes.
Here a, b and c are arbitrary positive constants, mostly integers. For all recurrences we assumed that T(1) = c. The last recurrence is the worst: if we are trying to solve a problem of size n by reducing it to two subproblems of size n - 1, then the get four subproblems of size n - 2, ..., and finally 2^n subproblems of size 1. Fortunately, none of the algorithms presented in this course are working so inefficiently, but there are many problems, for which such a behavior cannot be excluded.
T(n) <= T(alpha * n) + c T(n) <= c * log_{1/alpha} n, for alpha < 1 T(n) <= T(alpha * n) + c * n T(n) <= c * n / (1 - alpha), for alpha < 1 T(n) <= T(n') + T(n - n') + c T(n) <= c * (2 * n - 1), for all 0 < n' < n T(n) <= T(alpha * n) + T((1 - alpha) * n) + c * n T(n) <= c * n * log_{1 / alpha} n, for 1/2 <= alpha < 1 T(n) <= T(n - a) + c T(n) <= c * n / a T(n) <= T(n - a) + c * n T(n) <= c * n^2 / a T(n) <= a * T(n / b) T(n) <= n^{log_b a} T(n) <= a * T(n - 1) + c T(n) <= c * (a^{n - 1} - 1) / (a - 1)
boolean linear_search(int* a, int n, int x) {
/* Test whether x occurs in the array a[] of length n. */
int i;
for (i = 0; i < n && a[i] != x; i++);
return i < n; }
This algorithm is clearly correct and takes O(n) time: each pass
through the loop takes constant time, and the loop is traversed at most
n times. For the general case, this is optimal, because if one is
looking for a number x, one cannot conclude that x does not
occur before having inspected all n numbers. Each inspection takes at
least a constant amount of time, so Omega(n) is a lower bound on the
set-membership problem for sets without special structure. Thus, in
that case the complexity of this problem is Theta(n), because we have
matching upper and a lower bounds.
On the other hand, if the elements stand in sorted order, that is a[i] <= a[i + 1], for all 0 <= i <= n - 2, this lower-bound argument does not apply: as soon as one finds an i so that a[i] < x and a[i + 1] > x, then there is no need to inspect any further number, so, we cannot argue that all numbers must be inspected in case x does not occur. And indeed, for sorted arrays, there is a much faster method for testing membership:
boolean binary_search_1(int* a, int n, int x) {
/* Test whether x occurs in the array a[] of length n. */
int i;
while (n > 0) {
i = n / 2;
if (a[i] == x)
return true;
if (a[i] < x) { /* Continue in right half */
a = &a[i + 1];
n = n - i - 1; }
else /* Continue in left half */
n = i; }
return false; }
The time consumption of this algorithm follows from the following claim: after k passes of the loop, i_hgh - i_low <= n / 2^k, for all k >= 0. To prove this claim is left as an exercise. This claim implies that after log n passes, the difference becomes smaller than 1. Because at the same time we know that these numbers are integral, they must be equal. So, after log n steps or less we will either have found x, or we can conclude that x does not occur. Because each pass of the loop takes constant time, the time consumption of the complete algorithm is bounded by O(log n). Under the assumption that the algorithm works by performing comparisons and that in constant time at most constantly many numbers can be compared, O(log n) is optimal. We do not prove this here, the topic of lower bounds is addressed again in the chapter on sorting.
The above algorithm also has a very simple recursive formulation:
boolean binary_search_2(int* a, int n, int x) {
/* Test whether x occurs in the array a[] of length n. */
int i;
if (n > 0) {
i = n / 2;
if (a[i] == x)
return true;
if (a[i] < x)
return binary_search_2(&a[i + 1], n - i - 1, x);
return binary_search_2(a, i, x); }
return false; }
The three subroutines for searching an element in an array integrated in a running C program can be downloaded here. Here the array a[] is initialized with increasing random numbers. On average every second number occurs. Testing for increasing values of n clearly shows that the two versions with O(log n) running time are always ready immediately, while for large n linear search starts to take noticeable time.
// In the following we have as invariant that at the
// beginning of each pass through the loop c == x^i,
// so in the end c = x^n.
for (c = 1, i = 0; i < n; i++)
c *= x;
Assuming that all the multiplications can be performed in unit time, this algorithm has complexity O(n). However, we can do this much faster! Supposing, for the time being, that n = 2^k, the following is also correct:
// In the following we have as invariant that at the
// beginning of each pass through the loop c == x^i,
// so in the end c = x^n.
for (c = x, i = 1; i < n; i *= 2)
c *= c;
Here the number of passes through the loop is equal to the number of
times we must double i to reach n. That is exactly k = log_2 n times.
This algorithm is of the same type as binary search: there is some
notion of repeated halving/doubling, which leads to logarithmic time,
whereas doing the operation in a linear way gives linear time.
Now, we consider the general case. Assume that n has binary expansion (b_k, b_{k - 1}, ..., b_1, b_0). Then we can write n = sum_{i = 0 | b_i == 1}^k 2^i. So, x^n = x^{sum_{i = 0 | b_i == 1}^k 2^i} = prod_{i = 0 | b_i == 1}^k x^{2^i}. If we now first perform the above computation and store all intermediate c values in an array of length k, then x^n can be computed from them with at most log n additional multiplications and a similar number of additions. That is, the whole algorithm has running time O(log n). Actually it is not necessary to store the c-values: the final value can also be computed by taking the interesting factors when they are generated. The complete routine may look as follows:
int exponent_1(int x, int n)
{
int c, z;
for (c = x, z = 1; n != 0; n = n >> 1)
{
if (n & 1) /* n is odd */
z *= c;
c *= c;
}
return z;
}
It is a good idea to try how the values of z, c and i develop for x =
2 and n = 11.
A slightly different idea works as well. The idea is to start from the top-side: x^99 = x * x^98, x^98 = x^49 * x^49, x^49 = x * x^48, x^48 = x^24 * x^24, x^24 = x^12 * x^12, x^12 = x^6 * x^6, x^6 = x^3 * x^3, x^3 = x * x^2, x^2 = x * x. This idea can be turned into code most easily using recursion:
int exponent_2(int x, int n)
{
if (n == 0) /* terminal case */
return 1;
if (n & 1) /* n is odd */
return x * exponent_2(x, n - 1);
return exponent_2(x, n >> 1) * exponent_2(x, n >> 1);
}
As usual, the recursive algorithm is easy to understand and its correctness is obvious, while the iterative algorithm was rather obscure. How about the time consumption? Check what happens for n = 32. Formally the time consumption can be analyzed by writing down a recurrence relation. For numbers n = 2^k for some positive k, the time consumption T(n) is given by
T(1) = c_2,The solution of this is given by T(n) = (c_1 + c_2) * n - c_1. Once this relation has been found somehow, for example by intelligent guessing after trying small values of n, it can be verified using induction. So, define the function f() by f(n) = (c_1 + c_2) * n - c_1. Then T(1) = c_2 = (c_1 + c_2) * 1 - c_1 = f(1). This gives a base case. Now assume the relation holds for some n. Then we get
T(n) = 2 * T(n / 2) + c_1, for all n > 1.
T(2 * n) =def T()= 2 * T(n) + c_1 =induction assumption= 2 * f(n) + c_1 =def f()= 2 * ((c_1 + c_2) * n - c_1) + c_1 =computation= (c_1 + c_2) * (2 * n) - c_1 =def f()= f(2 * n).Thus, assuming the equality for n = 2^k, we can prove it for 2 * n = 2^{k + 1}. Because it also holds for n = 1 = 2^0, it holds for all n which are a powers of two.
So, the running time of exponent_2 is at least linear! What went wrong? The problem is that we are recursively splitting one problem of size n in two subproblems of size n / 2. At the bottom of the recursion this inevitably leads to a linear number of subproblems. For other problems this may be inevitable, but here there is an easy solution:
int exponent_3(int x, int n)
{
int y;
if (n == 0) /* terminal case */
return 1;
if (n & 1) /* n is odd */
return x * exponent_3(x, n - 1);
y = exponent_3(x, n >> 1);
return y * y;
}
Algorithm exponent_3 performs the same number of multiplications as exponent_1 (the exact analysis is left as an exercise). Nevertheless, even though the difference will not be large, it will be somewhat slower because every recursive step means that the whole state vector must be pushed on the stack.
Let us now assume that the time for the multiplications increases with the size of the number. This is reasonable, because unless the numbers are small, c^n will soon become a very large number. In order to get an easy comparison, we assume that multiplying an n_1-digit number and an n_2-digit number costs O(n_1 * n_2). Under this assumption, the conventional algorithm takes
sum_{i = 0}^{n - 1} log c * log c * i = O(log^2 c * n*2).
The cost of the improved exponentiation can be estimated as
sum_{i = 0}{log n - 1} (log c * 2^i)^2 = O((log c * n)^2).
So, even though we have reduced the number of products to compute
from linear to logarithmic, we have not gained much when we look at
the time for the whole computation because it are the last few
products that dominate the cost.
If we are computing expo_mod, that is (c^n) mod m, then the situation is much better: generally we have (a * b) mod m = ((a mod m) * (b mod m)) mod m, and thus we can compute modulo after each multiplication: the numbers do not grow beyond the size of m, and therefore, all products cost the same (possibly with the exception of the first few). So, for expo_mod, the reduction of the number of the products gives the performance one would hope to achieve.
83415 * 61298 =
6 * 83415 shifted left 4 positions +
1 * 83415 shifted left 3 positions +
2 * 83415 shifted left 2 positions +
9 * 83415 shifted left 1 positions +
8 * 83415 shifted left 0 positions
How long does this take? When multiplying two n-digit numbers, there are n multiplications of a 1-digit number with an n-digit number, n shifts and n additions. Each such operation takes O(n) time, thus the total time consumption can be bounded by 3 * n * O(n) = O(n^2). Clever tricks may reduce the time in practice quite a bit, but this algorithm appears to really need Omega(n^2). This quadratic complexity is precisely the reason that it is so tedious to multiply two 4-digit numbers. There is an alternative method. It is a pearl of computer science, surprisingly simple and, for sufficiently long numbers, considerably faster, even in practice.
x * y = (x_1 * 10^m + x_0) * (y_1 * 10^m + y_1)
= x_1 * y_1 * 10^{2 * m} + x_1 * y_0 * 10^m + x_0 * y_1 * 10^m + x_0 * y_0.
This formula suggests the following recursive algorithm:
superlong prod(superlong x, superlong y, int n) {
/* add(x, y) adds x to y,
shift(x, n) shifts x leftwards n positions */
if (n == 1)
return x * y /* Product of ints */
if (n is odd)
add a leading 0 to x and y and increase n by 1;
compute x_1, x_0, y_1, y_0 from x and y;
xy_11 = prod(x_1, y_1, n / 2);
xy_10 = prod(x_1, y_0, n / 2);
xy_01 = prod(x_0, y_1, n / 2);
xy_00 = prod(x_0, y_0, n / 2);
xy = xy_00;
xy = add(xy, shift(xy_01, n / 2));
xy = add(xy, shift(xy_10, n / 2));
xy = add(xy, shift(xy_11, n));
return xy; }
How long does this take? Is it faster than before? Let us look what happens. Instead of one multiplication of two numbers of length n, we now have 4 multiplications of numbers of length m = n / 2 plus 3 shifts plus 3 additions. The additions and the shifts take time linear in n. So, all together, the second part takes linear time. That is, there is a d, so that the running time for this part is bounded by d * n. The first part is formulated recursively. So, it makes sense to formulate the time consumption as a recurrence relation:
T_prod(n) = 4 * T_prod(n / 2) + d * n T_prod(1) = 1
To solve recurrence relations it often helps to try a few values in order to get an idea:
T(1) = 1 T(2) = 4 * 1 + 7 * 2 = 18 T(4) = 4 * 18 + 7 * 4 = 100 T(8) = 4 * 100 + 7 * 8 = 456 T(16) = 4 * 456 + 7 * 16 = 1936 T(32) = 4 * 1936 + 7 * 32 = 7968 T(64) = 4 * 7968 + 7 * 64 = 32320Here we assumed d = 7 (estimating 1 * n for each linear time operation, counting the construction of the numbers x_0, x_1, y_0 and y_1 as one operation). Quite soon one starts to notice that actually this additional term c * n does not matter a lot. The main development is determined by the factor 4. Which function returns a four times larger value when taking twice as large an argument? How about n^2? Indeed, this algorithm has running time O(n^2).
Let us try to prove this, taking T(n) = a * n^2, and see whether it works, and for which value of d. Our estimate should be exact or an overestimate, so substitution should give:
a * n^2 <= a * 4 * (n / 2)^2 + 7 * n = a * n^2 + 7 * n.This does not work! There is no choice for the parameter a for which this relation is true. Bad luck, we apparently estimated T(n) wrong. Still we feel that the quadratic development is essentially correct.
The second thing one might try, motivated by the nature of the recurrence relation, is a polynomial of degree 2. That is, an expression of the form f(n) = a * n^2 + b * n + c, for some constants a, b and c. Any polynomial of degree d is entirely determined by its values in d + 1 points. So, assuming that T(n) = f(n), the parameters of f() can be determined by using T(1) = 1, T(2) = 18 and T(4) = 100. This gives three equations with three unknowns:
1 * a + 1 * b + 1 * c = 1,The solution is a = 8, b = -7, c = 0, so f(n) = 8 * n^2 - 7 * n. This does not yet prove that T(n) = f(n), for all n, but if T(n) is given by a polynomial of degree two at all, then it must be given by this function f(n). Checking that T(n) = f(n) for all n goes by induction. The basis is satisfied, because (due to the choice of the parameters) T(n) = 1 = f(n). So, assume T(n) = f(n) for some n, then T(2 * n) = 4 * T(n) + 7 * (2 * n) = 4 * f(n) + 7 * (2 * n) = 8 * (2 * n)^2 - 7 * (2 * n) = f(2 * n). This completes the proof.
4 * a + 2 * b + 1 * c = 18,
16 * a + 4 * b + 1 * c = 100.
superlong prod(superlong x, superlong y, int n) {
/* add(x, y) adds x to y,
shift(x, n) shifts x leftwards n positions */
if (n == 1)
return x * y /* Product of ints */
if (n is odd)
add a leading 0 to x and y and increase n by 1;
compute x_1, x_0, y_1, y_0 from x and y;
xy_11 = prod(x_1, y_1, n / 2);
x_sum = add(x_1, x_0);
y_sum = add(y_1, y_0);
xy_ss = prod(x_sum, y_sum, n / 2);
xy_00 = prod(x_0, y_0, n / 2);
xy = xy_00;
xy = add(xy, shift(xy_ss, n / 2));
xy = subtract(xy, shift(xy_00, n / 2));
xy = subtract(xy, shift(xy_11, n / 2));
xy = add(xy, shift(xy_11, n));
return xy; }
So, we compute x * y as x_0 * y_0 + (x_1 + x_0) * (y_1 + y_0) * 10^m - x_0 * y_0 * 10^m - x_1 * y_1 * 10^m + x_1 * y_1 * 10^n, which is just right. Clever or not? Let us write the time expression again.
T_prod(n) = 3 * T_prod(n / 2) + c * n T_prod(1) = 1Here c is somewhat larger than before. Estimating the cost of all linear-time operations as before gives c = 11. What matters much more is that now there are only three calls of the form prod(..., n / 2), giving 3 * T_prod(n / 2) instead of 4 * T_prod(n / 2).
Let us look at a few numbers again:
T(1) = 1 T(2) = 3 * 1 + 11 * 2 = 25 T(4) = 3 * 25 + 11 * 4 = 119 T(8) = 3 * 119 + 11 * 8 = 445 T(16) = 3 * 445 + 11 * 16 = 1511 T(32) = 3 * 1511 + 11 * 32 = 4885 T(64) = 3 * 4885 + 11 * 64 = 15359Again the development is dominated by the multiplication, so essentially, when doubling n, the time is multiplied by three. That is just what happens for the function n^{log_2 3}. Guessing, inspired by the solution of the above recurrence relation, that the solution has the form f(n) = a * n^{log_2 3} + b * n, and b are determined by the T(1) = 1 and T(2) = 25. This gives
1 * a + 1 * b = 1,The solution is a = 23 and b = -22, that is, f(n) = 23 * n^{log_2 3} - 22 * n. Proving that T(n) = f(n) for all n goes by induction again. The assumption is ok for n = 1, because T(n) = 1 = f(n). Assuming T(n) = f(n), we get T(2 * n) = 3 * T(n) + 11 * (2 * n) = 3 * f(n) + 11 * (2 * n) = 23 * (2 * n)^{log_2 3} - 11 * (2 * n) = f(2 * n).
3 * a + 2 * b = 25.
An experiment is the only way to verify such an expectation. We have implemented several variants:
The program is written in C. All algorithms are somewhat optimized without going to the bottom. Divisions and modulo computations are replaced by biwtwise operations which can be performed in one clock cycle. Particularly for Karatsuba's algorithm the time might be further reduced by reducing the amount of copying. Tests have been performed on a computer with a 2.66 GHz Pentium IV processor running Linux.
log_2 n school opt. school opt. recrs Karatsuba
10 16E-3 17E-3 13E-3 11E-3
11 48E-3 41E-3 44E-3 16E-3
12 192E-3 150E-3 189E-3 44E-3
13 781E-3 608E-3 809E-3 191E-3
14 4644E-3 5220E-3 4625E-3 605E-3
15 24894E-3 19073E-3 23936E-3 2255E-3
16 176E+0 109E+0 118E+0 7634E-3
17 660E+0 454E+0 486E+0 25619E-3
18 4527E+0 2652E+0 1965E+0 72144E-3
19 17394E+0 12740E+0 7818E+0 218E+0
20 70936E+0 53073E+0 31616E+0 787E+0
The recursive algorithm without optimization is several times slower
than any of the other algorithms. This is mainly due to the additional
overhead at the lowest levels of the recursion. Karatsuba's algorithm
is the best of all. For small n the difference is not that big, but
for n = 2^20 it is almost 100 times faster than the standard
implementation of the school algorithm. For n which are so large
that n integers do not fit in the first- or second-level cache, the
performance of all algorithms deteriorates. This effect is strongest
for the school algorithm because it does not have a local character.
For large n the improved method is 30% faster. This difference is
mainly due to the reduced number of cache faults. The optimized
recursive algorithm and the blocked algorithm are about equally good,
for n >= 2^18 they are more than twice as fast as the standard
algorithm.
As usual, the times increase slightly faster than predicted theoretically. This cannot be explained by considering the lower-order terms, but is rather caused by slower memory access when working on larger data sets. For the school method the development of the running time can be estimated as t(n) ~ n^{2.18}. For Karatsuba we get t(n) ~ n^{1.66}.
0! = 1
n! = n * (n - 1), for all n > 0.
This formulation can immediately be turned into a recursive algorithm:
int fac_1(int n)
{
if (n == 0)
return 0;
return n * fac(n - 1);
}
This is elegant and correct, but not terribly efficient because of the overhead due to making procedure calls. The time consumption T_fac(n) is given by the recurrence relation
T_fac(0) = b,It is easy to guess and prove that the solution of this is given by T(n) = a * n + b. Thus, T(n) = O(n).
T_fac(n) = T_fac(n - 1) + a, for all n > 0.
Linear time consumption is also achieved by the following simple iterative implementation:
int fac_2(int n)
{
int f = 1;
for (int i = 2; i <= n; i++)
f *= i;
return f;
}
fib(0) = 0,
fib(1) = 1,
fib(n) = fib(n - 1) + fib(n - 2), for all n > 1.
Turning this formulation into a recursive algorithm gives
int fib_1(int n)
{
if (n == 0)
return 0;
if (n == 1)
return 1;
return fib(n - 1) + fib(n - 2);
}
This program is clearly correctly computing the function. The time consumption T_fib(n) is given by the recurrence relation
T_fib(0) = b,Because the constants a and b are positive, it can be easily shown by induction that T_fib(n) >= fib(n), which is not good, because fib(n) is exponential. More precisely, fib(n) ~= alpha^n, for alpha = (1 + sqrt(5)) / 2 ~= 1.61. So, for n = 100, there are already 2.5 * 10^20 calls, which take years to perform.
T_fib(1) = b,
T_fib(n) = T_fib(n - 1) + T_fib(n - 2) + a, for all n > 1.
The following simple non-recursive algorithm has linear running time, computing fib(100) goes so fast that you cannot even measure it (but it does not fit in a normal 32-bit integer):
int fib_2(int n)
{
int x, y, z, i;
if (n == 0)
return 0;
if (n == 1)
return 1;
for (x = 0, y = 1, i = 1; i < n; i++)
// Invariant: y = fib(i); x = fib(i - 1)
{
z = x + y;
x = y;
y = z;
}
return y;
}
Even a recursive algorithm can be made to run in linear time, but the formulation lacks the typical elegance of recursive algorithms:
int rec_fib(int* f0, int n)
{
int f1, f2;
if (n == 1)
{
*f0 = 0;
return 1;
}
f1 = rec_fib(f0, n - 1);
f2 = *f0 + f1;
*f0 = f1;
return f2;
}
int fib_3(int n)
{
int f;
if (n == 0)
return 0;
return rec_fib(&f, n);
}
The three subroutines for computing Fibonacci numbers integrated in a running C program can be downloaded here. Testing for increasing values of n clearly shows that the two versions with O(n) running time are ready immediately, while for n > 30, the simple recursive version starts to take noticeable time.
The examples in this section show that often recursion is elegant, and sometimes iterative alternatives are much harder to write and understand. However, using recursion always implies a certain extra overhead and a careless use of recursion may even be fatal for the performance.
Create two programs: one based on the non-recursive and one based on the recursive variant of binary search. In each program, generate arrays of length 2^k, for k = 12, 16 and 20, containing all numbers starting from 0. Test for all numbers in the range whether they occur, measuring how long this takes in total. Compute the time per operation and compare the times for the recursive and non-recursive variants.
The problem of computing reciprocals can be solved by a method called Newton iteration. It starts by making a coarse estimate e_0 of the value 1 / y to compute. Then it computes e_1, e_2, ..., giving better and better estimates of 1 / y. These e_i are computed according to the following rule (which is presented here in an ad hoc way, but which is actually a special case of a more general method for computing functional inverses):
e_{i + 1} = 2 * e_i - e_i * e_i * y.It can be shown that the number of correct positions doubles in every iteration (but starting with a very bad initial value it may take a while before we have one correct digit or it may not converge at all).
Any program is created inside an editor, it does not matter which one is used. It is common, maybe even required, to give programs names which end in ".c", so for example "my_program.c". Then within a unix-like environment, the program is compiled by calling one of the available compilers followed by the name of the program and compiler options. A compiler is a program which tests the source code and, when it does not find obvious errors, translates it into machine code.
Available compilers are at least gcc and cc. gcc is a real C compiler, cc is a C++ compiler. These compilers do not generate the same code and there is no guarantee that a program which runs when compiled with gcc also runs when compiled with cc or vice versa. There are two main reasons for this:
Compiler options can tell the compiler all kind of useful
things: the degree of optimization, the file the code should be written
to, the amount of warnings that should be printed, how tolerant the
compilation should be performed, which libraries should be loaded, etc.
A possible compile command looks like
gcc my_program_c -O3 -Wall -ansi -o my_executable
Here "-O3" means that we want optimized code, "-Wall" means that
we want to hear all warnings, "-ansi" means that we want strict
enforcement of the ansi rules, "-o my_executable" indicates that
the compiled code should be written to the file my_executable.
Of course, most programs, especially those of beginning programmers, contain syntactical errors when they are compiled for the first time. Syntactical errors are deviations from the syntax. The compiler is checking whether all syntactical rules have been applied and only when there are no violations an executable is generated. If there are no syntactical errors, there may nevertheless be all kind of other errors. Possibly the programming is crashing at runtime, because one is performing a division by zero or running out off an array. In this case we say that the program has runtime errors. But even if the program has neither syntactical nor runtime errors, it does not need to be correct. Turning on the warnings with -Wall will find some of the non-syntactical errors, for example it detects that one is using an uninitialized variable, but no software whatever clever, can detect that the program is not fulfilling the specification (unless the program is fed with the specification).
With help of a compiler the source code of a program is translated to executable machine code.
C provides several primitive numerical types and characters, but no primitive types for strings or booleans.
#include "stdio.h"
int main() {
int a, b, c;
printf("Give the value of a >>> ");
scanf("%d", &a);
printf("Give the value of b >>> ");
scanf("%d", &b);
c = a * b;
printf("The product of a and b equals %1d\n", c);
return 1; }
Except for the (conditional) jump statements and calls to subroutines
which are discussed later, a program is executed in linear order,
starting with the first line of the section called "main".
The first line tells that before the actual compilation is started, routines from the library stdio.h must be loaded. This library contains IO routines. Other libraries which one may need are math.h (mathematical routines such as exp, cos and sqrt), sys/time.h (routines and types for determining setting clocks), string.h (routines for manipulating strings). Using mathematical routines does not only require the inclusions of math.h, but also the compiler option -lm. So, then the compilation command might look "gcc my_prog.c -lm".
Then the program starts in the section main. The "{" indicates the beginning of the text. The end is indicated by "}". In between we find lines of code, giving several instructions. Each instruction is ended with a ";". The piece of code "int main()" is called the program header. It tells us that the return value of the program is an int, and, in this case, that the program does not have external parameters.
Following the header, it is told which variables we are going to use: "int a, b, c;". Such a statement is called a declaration. This means that we declare three integer variables. Later we can use these names to store and retrieve integer values. Variables of other types are declared in a similar way. It is quite common to write the declarations at the beginning, but in C one can also write the declarations where they are needed. In any case: a variable must be declared before its first usage; otherwise the compiler would not know how much memory to allocate for the variable.
After the variable declaration, we find a statement producing a line of output. The format of IO statements varies strongly from language to language. C is quite convenient. In a print statement, the text to print is enclosed between a pair of """ symbols. Possibly we want to print the value of a variable. In that case it is indicated what kind of variable it is and how much space one can use for it by a combination like "%1d". Here the "%" indicates "take care, here a variable value must be inserted". The "1" indicates that the default number of used positions is 1, always using at least the number of positions the number actually requires. The "d" indicates that here we are going to print an integer. For a float one uses "f", for a char one uses "c", for a string one uses "s".
Reading a value is analogous. Only now we must precede the name of the variable by the symbol "&". In the section on procedures we will see that this symbol means that we are passing the address of the variable, which allows to return a value into it. Several values can be read in a single statement: "scanf("%d%f", &a, &x);" can be used to read an integer a and a float x.
The statement "c = a * b" is an example of a simple computation followed by an assignment. In an assignment the value of an expression on the right is assigned to a variable on the left. All of the usual operators are available in C: +, -, *, /, % for manipulating numbers; |, & and ^ for bitwise operations; ||, && and ! for boolean operations; <, <=, ==, !=, >= and > for comparison. The first four numerical operators are defined for all numerical types, but the division has different meaning: on integers "/" is division while throwing away the non-integral part of the result. So, 11 / 3 == 3. "%" is only defined for integers. It returns the remainder of the division, 11 % 3 == 2. Said otherwise, "%" corresponds to a modulo computation. Finally there are the shift operators >> and <<. They work on integers. a >> b, returns the value of a when shifting its bit pattern rightwards over b positions, throwing away the least significant b bits. Thus, 55 >> 3 == 6, while 55 == 110111 and 110 == 6. a >> b is equivalent to a / 2^b, but it can be performed faster because it is one of the primitive operations of most processors. In the same way a << b returns the value of a shifted leftwards over b positions. It is equivalent to a * 2^b. 6 << 3 == 48, because 6 == 110 and 110000 == 48. Notice that (a >> b) << b in general does not give a again. On the other hand, (a << b) >> b == a, provided that no overflow occurs.
There are even some special operators which are merely shorthands for combinations of the above operators:
i++; <-> i = i + 1; i--; <-> i = i - 1; i += j; <-> i = i + j; i -= j; <-> i = i - j; i *= j; <-> i = i * j; i /= j; <-> i = i / j; i %= j; <-> i = i % j; i &= j; <-> i = i & j; i ^= j; <-> i = i ^ j; i |= j; <-> i = i | j; i <<= j; <-> i = i << j; i >>= j; <-> i = i >> j;
All operators belong to 1 of 15 priority levels. Any book on C provides a complete table. Here we give a shortened version:
Here in every row we first indicate the priority (higher priority operators are executed first), then the operators and finally the execution order for operators in the same priority class. "Normal" indicates execution from left to right, "reversed" indicates execution from right to left. One must not know all this. In case of doubt one should rather use brackets than look it up: the compiled code does not become longer because of this, but it becomes much easier to understand the program!
15 bracketlike () [] .normal 14 unary operators ++ -- ! + - * &reversed 13 multiplicationlike * / %normal 12 additionlike + -normal 11 shifts << >>normal 10 comparisons < <= > >=normal 9 equalitylike == !=normal 8 ... 4 logical & ^ | && ||normal 2 assignment = += -= *= /= %= &= ^= |= <<= >>=reversed 1 comma ,normal
The last line of the program is "return 1;". Not all compilers require this, but some of them insist that "main" is returning an integer value. That is why we have written "int main()". Any value will do. The value can be used to flag where the program is left.
The execution of any C program starts at "main". The usage of any variable must be preceded by a declaration. In expressions it is essential to assure the correct execution order.
a[0] = b[0];
a[1] = b[1];
...
a[99] = b[99];
In the following section we will see how to write this more concisely.
Compound types can also be declared. In general this is a very important construction, but in our applications of C we will rarely encounter it: we use C for simple programs which should be ready as soon as possible and/or which should run very fast. For more elaborate programming an object-oriented language, such as C++ and Java, is much more suitable. Therefore, we only indicate a few aspects of this constructions, further details can be found in any book on C.
For example for playing cards we can write
struct {
char color;
int value; } my_card;
This declares a variable my_card with two fields: a character and a
value. A correct assignment is "my_card.value = 11", setting the
value field of my_card to 11. Accessing the fields of a compound
variable is done in most languages with help of the dot operator
".".
This idea becomes much more useful, when we define a new compound type, so that we can reuse it for several declarations:
struct playing_card {
char color;
int value; };
Hereafter, we can write:
struct playing_card my_first_card; struct playing_card my_second_card;
We can go one step further by using the keyword "typedef" for defining special types. This can be integrated into the struct definition, but there is no need to do so. If we write
typedef struct playing_card playing_card_type;then we can write later in the program
playing_card_type my_card; playing_card_type deck_of_cards[10];The latter declares an array of playing_card. After this declaration we can write "deck_of_cards[8].value = my_card.value;".
Unlike with arrays, and this is not very intuitive, writing "deck_of_cards[8] = my_card" copies the values of the fields of my_card into the values of the fields of deck_of_cards[8]. Consider the following program:
int main() {
struct playing_card {
char color;
int value; };
typedef struct playing_card playing_card_type;
playing_card_type my_card;
playing_card_type deck_of_cards[10];
my_card.color = 's';
my_card.value = 11;
deck_of_cards[8] = my_card;
my_card.color = 'h';
my_card.value = 6;
printf("deck0 = %10X, deck8 = %10X, my_card = %10X\n",
deck_of_cards[0], deck_of_cards[8], my_card);
printf("deck_of_cards[8]: (%1c, %1d)\n",
deck_of_cards[8].color, deck_of_cards[8].value);
return 1; }
The output reads
deck0 = FFBEECD0, deck8 = FFBEECC8, my_card = FFBEECC deck_of_cards[8]: (s, 11)
Nevertheless, one should be very careful with such assignments. In Java this would not have worked! There a struct variable is actually a pointer and the assignment would result in deck_of_cards[8] pointing to the same object as my_card. The memory space which deck_of_cards[8] was originally pointing to would have become unreachable by this.
Explicitly defining types with help of "typedef" makes it much easier to work with more complex types. Consider for example the declaration of the array playing_card. It is now entirely analogous to declaring an array of int by writing "int a[10]".
Arrays and structs are the most important derived data types. One should be very careful with assignments of derived types. Explicitly defining new types is an important structuring step.
Working with gotos however tends to lead to programs which are very hard to follow (so-called "spaghetti code"). Therefore, this instruction has been banned (though it is still available even in most modern languages). Instead of the unconditional goto jumps, there are conditional jumps provided by the if-then-else mechanism and by the repetitive statements: statements telling that a block of code must be executed one or more times repeatedly until a boolean condition at the beginning or end becomes false. In the following we describe these instructions for C with tiny variations they can be found in other computer languages.
if (a == 1000)
a = 0;
The keyword "then" is not written (C likes to save writing, even when
this goes at the expense of clearness). The boolean condition, is
written between round brackets.
The operational semantic of the if statement is that if evaluating the boolean condition results in true, in the example this happens when the variable a equals 1000, that then the following statement is executed and otherwise this statement is skipped.
The if statement only applies to the statement immediately following it. If one wants to apply it to several statements, these should be compounded by writing them between curly brackets:
if (a == 1000) {
a = 0;
b++; }
The whole section of the program within the curly brackets is also
referred to as a compound statement.
The spacing and the placement of the brackets is not prescribed (though clearly it has a strong impact on the readability). The above code can also be written as
if (a == 1000)
{
a = 0;
b++;
}
Generally one should choose one fixed way of writing and stick to it.
But, their are some conventions which are widely accepted. The most
important is indentation, which means that for every level the
program opens (like going into an if statement), the code is shifted
right a fixed number of positions. 2 positions is a convenient choice.
It is also a good idea to surround all binary operators, including "=",
by a blank and to add a blank between if and other operators and the
following conditional. It is a very good idea to put matching curly
brackets above each other as is done in the last example: this
highlights the structure of the program and you will never have to
search for the corresponding bracket. Some people even write curly
brackets around a single statement. Even this is a good idea, because
if later you (or someone else editing your program!) are adding a
statement you cannot forget to add the brackets. Consider the
following:
if (a == 1000)
a = 0;
b++;
Even this is a correct fragment of C code. The compiler is not so
clever to notice the indentation and to guess that you actually forgot
to write the brackets. However, the meaning of this is different from
the fragment above: here the statement "b++;" is always executed.
The second form of the if statement also has an else-part:
if (a == 1000)
a = 0;
else {
a += 4;
s += a[i]; }
The if statement is the main conditional statement.
i = 0;
while (n > 0) {
i++;
n >>= 1; }
Notice that the (compound) statement following the while ( ... ) may be executed zero times. If we start with n <= 0, the compound statement is not executed at all. If there is something important happening there, like a declaration or an initialization, we may run into trouble further down in the program.
The while statement has a second form with the conditional at the end of the iterated statement:
int main() {
int a, b;
do {
printf("\nGive the next a and b value to multiply >>> ");
scanf("%d%d", &a, &b);
printf("a * b = %8d\n", a * b);
printf("Continue? (type 0 to stop, 1 to continue) >>> ");
scanf("%d", &a); }
while (a == 1);
printf("\n");
return 1; }
A possible session looks like
Give the next a and b value to multiply >>> 24 5 a * b = 120 Continue? (type 0 to stop, 1 to continue) >>> 1 Give the next a and b value to multiply >>> 7 8 a * b = 56 Continue? (type 0 to stop, 1 to continue) >>> 1 Give the next a and b value to multiply >>> 4 5 a * b = 20 Continue? (type 0 to stop, 1 to continue) >>> 0Here we really want that the loop is executed at least once, and therefore the usage of the do-while construction is sensible.
The while statement offers an elegant way to realize a conditional repetition.
The for statement is a variant of the while statement most appropriately used in cases where a variable is repeatedly increased by a fixed amount, especially in combination with operations on arrays. The following program initializes all fields of an array and then computes the sum of all values.
#define SIZE 100
int main() {
int i, a[SIZE], sum;
for (i = 0; i < SIZE; i++)
a[i] = i;
for (i = 0, sum = 0; i < SIZE; i++)
sum += a[i];
printf("The average value is %5.2f\n", (float) sum / SIZE);
return 1; }
The output is
The average value is 49.50
The above program contains several new concepts. First we use "#define" to set the value of a constant. The preprocessor simply replaces all occurrencies of SIZE by the value 100. So, at runtime this is just as fast as when writing 100 everywhere, but it nevertheless gives us the possibility to later change the array size in a simple way. We also see how to print a floating point number. Here we specify that it should take at least 5 positions in total, of which exactly 2 are used for the decimal positions. Then in "(float) sum / SIZE" we are dividing two integers. The default way of doing this is to use the integer division, which means that the result is rounded down. Here we want the exact answer (within the limits of the precision of floats), so we want that the floating-point division is applied. Therefore we write "(float)". This is called a cast, which means a forced type conversion: the variable sum is for the time of this statement converted to a float, and the rule is that any operation is performed at the most precise level of any of its arguments: int plus int gives int; int + float gives float; int + double gives double; float + double gives double; ... .
Let us now consider the actual for loops. The first is the simplest. Between the round brackets we find three parts separated by semicolons. The first part is executed once before anything else. The second part is a boolean condition which is evaluated before each execution of the following (compound) statement. The third part is executed after each execution of the compound statement. Thus
for (i = 0; i < SIZE; i++)
a[i] = i;
is equivalent to
i = 0;
while (i < SIZE) {
a[i] = i;
i++; }
The first and the third part may also be empty, thus we might
also write
i = 0;
for ( ; i < SIZE; ) {
a[i] = i;
i++; }
though this cannot be called an improvement. Sometimes it is mainly a
matter of taste whether one wants to use a for loop or a while loop. In
the second for loop we see that the first part may actually consist of
several instructions. These instructions must be separated by the
comma operator. The same applies to the third part. Thus, we might also
have written
for (i = 0, sum = 0; i < SIZE; sum += a[i], i++);
This is nice and short, but does not necessarily express so clearly
what is happening. Here we see that the following statement in
particular may also be empty: No-statement is also a statement! We also
see the extreme importance of being precise: it is the semicolon which
marks the end of the for loop, without it, the following statement, if
any, would have been considered to constitute the body of the loop,
giving a quite different computation.
The for statement should be used instead of a while statement when the repetition has a counting nature.
/* Written by Jop Sibeyn, 25.10.2002
This program computes the average value of the elements
of an array */
#define SIZE 100 /* The length of the used array */
int main() {
int i, a[SIZE], sum;
/* Initialization */
for (i = 0; i < SIZE; i++)
a[i] = i;
/* The main part of the program */
for (i = 0, sum = 0; i < SIZE; i++)
sum += a[i];
/* Printing the results */
printf("The average value is %5.2f\n", (float) sum / SIZE);
/* Finishing */
return 1; }
You should not write books, but concise explaining phrases are essential. Any complete program of more than 50 lines should be commented.
Comment is essential for understanding a program.
In C, procedures returning a value have the following form:
return_type procedure_name(parameter_type parameter_name, ... )
{
instruction;
...
instruction;
return variable_of_return_type;
}
Here the first line is called the header. Procedures not
returning a value look slightly differently:
void procedure_name(parameter_type parameter_name, ... )
{
instruction;
...
instruction;
}
The special word void stands for no type and the return
statement is omited. Inside the procedure new variables may be declared
and used in the same way as the parameters.
A procedure is called by writing its name and specifying all parameters:
my_procedure(parameter_name, ... );For procedures returning a value it is most common to use this value in some way, for example in an assignment:
variable_of_return_type = my_procedure(parameter_name, ... );
Procedures offer the possibility to reuse the same piece of code at several places in the program and are the main way of structuring a program in C.
The most important non-trivial aspect of handling procedures, is the status of the parameters. What happens with them inside the procedure? In any case their value outside is available within the procedure, so they are more than just variables: they have a value from the start. Can we assign values to the parameters and what happens then? In most languages one can assign to all parameters. But it depends on the kind of variables whether this effect becomes visible outside the procedure or not. At this point there are differences between computer languages. In many languages, C is among these, the user can specify this, in some languages, Java is one of them, the status of the parameters only depends on their type.
Consider the following two C procedures:
void local_swap(int lx, int ly) {
int z;
z = lx;
lx = ly;
ly = z;
printf("Values in local_swap:\n");
printf(" x = %10d, y = %10d\n", lx, ly);
printf(" ax = %10X, ay = %10X\n", &lx, &ly); }
void global_swap(int* rx, int* ry) {
int z;
z = *rx;
*rx = *ry;
*ry = z;
printf("Values in global_swap:\n");
printf(" x = %10d, y = %10d\n", *rx, *ry);
printf(" ax = %10X, ay = %10X\n", rx, ry); }
int main() {
int x, y;
x = 17;
y = 23;
printf("Values in main at beginning:\n");
printf(" x = %10d, y = %10d\n", x, y);
printf(" ax = %10X, ay = %10X\n", &x, &y);
local_swap(x, y);
printf("Values in main after local_swap:\n");
printf(" x = %10d, y = %10d\n", x, y);
printf(" ax = %10X, ay = %10X\n", &x, &y);
global_swap(&x, &y);
printf("Values in main after global_swap:\n");
printf(" x = %10d, y = %10d\n", x, y);
printf(" ax = %10X, ay = %10X\n", &x, &y);
return 1; }
It is not serious if not all details are clear. Actually we are not
doing more than calling two simple procedures and generating some
formatted output. Running the program gives the following output:
x = 17, y = 23 ax = FFBEED2C, ay = FFBEED28 Values in local_swap: x = 23, y = 17 ax = FFBEED0C, ay = FFBEED10 Values in main after local_swap: x = 17, y = 23 ax = FFBEED2C, ay = FFBEED28 Values in global_swap: x = 23, y = 17 ax = FFBEED2C, ay = FFBEED28 Values in main after global_swap: x = 23, y = 17 ax = FFBEED2C, ay = FFBEED28Integer values are given in decimal notation, addresses are printed hexadecimally.
Apparently, the exchange of x and y in local_swap does not become visible outside the procedure itself. The reason is that in this case the procedure is called by value: the parameters are nothing else than local variables which are initialized with the values passed when calling the procedure.
On the other hand, the exchange of x and y in global_swap does become visible outside the procedure. The reason is that in this case the procedure is called by reference: when calling the procedure we do not pass the value of the parameters, but their addresses. This is indicated by the address operator "&" (the same symbol is also used as the binary logic operator bitwise and). In the header of the procedure this is reflected in the usage of the symbol "*": the parameter x is not an integer, but a pointer to an integer, also called a reference of an integer. Accessing the value of a pointer type variable can be done by preceding it by the dereferencing operator also called indirection operator "*". In the above example the printed values of the addresses depend on the computer the program is run on and the used compiler.
There is an essential difference in call-by-value and call-by-reference.
int a[100];This instruction creates 400 bytes of space for storing 100 integers. This space can be accessed with the help of the name a. Actually there is an internal mechanism, which is called address arithmetic, which is used to access the stored values: if we write "b = a[23]", then the value of a is fetched, this is a memory address, namely the address of a[0], then 23 * 4 is added to this value, and the integer at this address (the value represented by the 4 bytes starting at the specified address) is returned. Therefore, the above assignment is equivalent to writing "b = *(a + 23)". The most remarkable feature is that we only need to add 23 and not 4 * 23, because internally it is known that a is an array of integers.
The header of a procedure with an integer argument may be written in two ways. At a first glance the most correct way of doing is to write
int sum(int a[], int n) {
int i, s;
for (i = s = 0; i < n; i++)
s += a[i];
return s; }
In this way it is explicitly told that the argument must be an integer
array. Alternatively, one may write
int sum(int* a, int n) {
int i, s;
for (i = s = 0; i < n; i++)
s += a[i];
return s; }
Both procedures are correct C.
A possible way of doing this is by simply exchanging all values:
void initialize(int a[], int b[], int n) {
int i;
for (i = 0; i < n; i++) {
a[i] = i;
b[i] = -i; } }
void simple_exchange(int a[], int b[], int n) {
int i, c;
for (i = 0; i < n; i++) {
c = a[i];
a[i] = b[i];
b[i] = c; } }
void print_arrays(int a[], int b[], int n) {
int i;
printf("\n");
for (i = 0; i < n; i++)
printf("a[%2d] = %4d, b[%2d] = %4d\n", i, a[i], i, b[i]); }
int main() {
int n = 10, a[n], b[n];
initialize(a, b, n);
print_arrays(a, b, n);
simple_exchange(a, b, n);
print_arrays(a, b, n); }
This procedure takes time proportional to n. One might fear that it has
the same problem as local_swap above, but that is not the case. The
reason is that the array variables a and b are actually pointers and
that these pointers are handed over as parameters. The output of the
program is:
a[ 0] = 0, b[ 0] = 0 a[ 1] = 1, b[ 1] = -1 a[ 2] = 2, b[ 2] = -2 a[ 3] = 3, b[ 3] = -3 a[ 4] = 4, b[ 4] = -4 a[ 5] = 5, b[ 5] = -5 a[ 6] = 6, b[ 6] = -6 a[ 7] = 7, b[ 7] = -7 a[ 8] = 8, b[ 8] = -8 a[ 9] = 9, b[ 9] = -9 a[ 0] = 0, b[ 0] = 0 a[ 1] = -1, b[ 1] = 1 a[ 2] = -2, b[ 2] = 2 a[ 3] = -3, b[ 3] = 3 a[ 4] = -4, b[ 4] = 4 a[ 5] = -5, b[ 5] = 5 a[ 6] = -6, b[ 6] = 6 a[ 7] = -7, b[ 7] = 7 a[ 8] = -8, b[ 8] = 8 a[ 9] = -9, b[ 9] = 9
This is nice, but not very efficient. This operation can also be performed in constant time: we do not have to exchange all elements, it is sufficient to exchange the values of a and b. So, afterwards a will point to the first position of b and vice versa. That is, we want to perform a procedure like global_exchange with parameters of type "array of integer". For more complex operations like this, it becomes much more convenient not to mix the array notation with the pointer notation:
void initialize(int a[], int b[], int n) {
int i;
for (i = 0; i < n; i++) {
a[i] = i;
b[i] = -i; } }
void fast_exchange(int** aa, int** ab) {
int* c ;
c = *aa;
*aa = *ab;
*ab = c; }
void print_arrays(int a[], int b[], int n) {
int i;
printf("\n");
for (i = 0; i < n; i++)
printf("a[%2d] = %4d, b[%2d] = %4d\n", i, a[i], i, b[i]); }
int main() {
int n = 10;
int* a = (int*) malloc(n * sizeof(int));
int* b = (int*) malloc(n * sizeof(int));
initialize(a, b, n);
print_arrays(a, b, n);
fast_exchange(&a, &b);
print_arrays(a, b, n);
free(b);
free(a); }
This produces the same output as the program above.
Different from an array declaration, declaring a variable of type int* does not immediately allocate a whole lot of memory. A variable of int* has size four (possibly eight) bytes. The standard procedure malloc is used to allocate memory. The number of bytes is passed as an argument. We might write 4 * n, but then we would explicitly use that integers are four bytes long. Doing this, the program would not work on a more modern system were integers consist of eight bytes. The procedure malloc returns a typeless pointer, void*, which cannot be assigned in a correct way to an int* without forcing the system to do so. Therefore we precede the procedure by "(int*)", enforcing a type conversion of the result. Said otherwise, the result type is cast to int*.
At the end of the program we find the calls to the standard procedure free. This procedure deallocates the memory a pointer is pointing to. Of course, at the end of a program all memory is deallocated, so in this case these statements are superfluous. However, in general it is important to carefully manage the memory making sure that the program does not create garbage: allocated memory which cannot be reached anymore by following any of the pointers. Garbage would be created if at some stage in the program we would write "a = b;" or if we would have a second malloc statement for a. Forgetting to free memory is an important source of problems. Suppose that instead of simple_exchange we were doing the following:
void stupid_exchange(int a[], int b[], int n) {
int i;
int* c = (int*) malloc(n * sizeof(int));
for (i = 0; i < n; i++);
c[i] = a[i];
for (i = 0; i < n; i++);
a[i] = b[i];
for (i = 0; i < n; i++);
b[i] = c[i]; }
Constructions of this kind, reducing the number of different variables
in each loop, might have advantages when the cache associativity is
low (stupid_exchange is guaranteed to work fine already for a two-way
associative cache). However, this procedure leaves behind n *
sizeof(int) bytes of garbage. If we are calling this procedure many
times, we will run out of memory, even when the program actually needs
only a small fraction of it. Every good programmer is so disciplined
to match each malloc (or similar operation) with a corresponding free,
in the same way any "{" is matched by a corresponding "}".
Now one might become afraid that we have the same problem for the following procedure:
void not_so_stupid_exchange(int a[], int b[], int n) {
int i;
int c[n];
for (i = 0; i < n; i++);
c[i] = a[i];
for (i = 0; i < n; i++);
a[i] = b[i];
for (i = 0; i < n; i++);
b[i] = c[i]; }
But here we touch on the counterpart of the automatic memory
allocation of an array: just as the memory is allocated implicitly,
it is also automatically deallocated at the end of the procedure.
In particular this means that one should not assign a local array to a pointer variable which is going to be used outside the procedure. Consider the following program:
void incorrect_initialize(int a[], int n) {
int i;
int b[n];
for (i = 0; i < n; i++);
b[i] = i;
a = b; }
void print_array(int a[], int n) {
int i;
printf("\n");
for (i = 0; i < n; i++)
printf("a[%2d] = %4d\n", i, a[i]); }
int main() {
int n = 10;
int* a;
initialize(a, n);
print_array(a, n); }
This program is syntactically correct. However, when
running it, it will probably crash with "segmentation fault", one of
the most common errors. It means that the program is trying to access
memory which it does not own. Another common error is "bus error". In
most cases this also means segmentation fault. So, not withstanding
the syntactical correctness, this program has runtime errors.
int main() {
int i, size, sum;
printf("Give size >>> ");
scanf("%d", &size);
int a[size];
for (i = 0; i < size; i++)
a[i] = i;
for (i = 0, sum = 0; i < size; i++)
sum += a[i];
printf("The average value is %5.2f\n", (float) sum / size);
return 1; }
Nothing speaks against this except the practice: for some reason (which goes back on the place where the memory is allocated), one cannot allocate very large arrays at runtime. On my computer the maximum size is only about 8 MB. This problem can be remedied in two possible ways. The easiest is to declare the array before the start of the program and to use only the fraction which is needed:
#include "stdlib.h"
#define MAXSIZE 10000000
int a[MAXSIZE];
int main() {
int i, size, sum;
printf("Give size >>> ");
scanf("%d", &size);
if (size > MAXSIZE) {
printf("Size is too large, exiting the program!\n");
exit(1); }
for (i = 0; i < size; i++)
a[i] = i;
for (i = 0, sum = 0; i < size; i++)
sum += a[i];
printf("The average value is %5.2f\n", (float) sum / size);
return 1; }
Here we touch on one more important aspect of variables: their visibility. A variable is visible, that is can be used only within a certain scope. A variable which is declared within a procedure can be used anywhere within this procedure, but not in another procedure: it is a local variable. This also means that it is no problem to use the same variable names in several procedures. This is essential! Otherwise, it would be very hard to add a new procedure at a later time to an existing program: one should have a complete overview of all names used anywhere in the program. Variables can be even more local then the procedure: new variables may be defined inside an if, for, while or compound statement. These are invisible outside it. Even these variables may have the same names as other variables within the same procedure. In that case we say that these other variables are shielded. In general shielding a variable is not a good idea, because it is confusing. The other extreme is variables which are not local to any procedure: global variables. In C these are declared at the beginning of the program like the array a[] above. These are visible everywhere. Not only there visibility is different: because the compiler already knows from the start that these variables are going to be there, the memory for them is allocated in a different way than for local variables. This makes that in the above program it is no problem to declare a global array with 10.000.000 ints, whereas this is not possible inside main.
So, the above program works fine, but it is kind of stupid to declare an array of maximum size for just in case. Using malloc gives us the best of both: we can declare very large arrays, without wasting memory when size is chosen small.
int main() {
int i, size, sum;
printf("Give size >>> ");
scanf("%d", &size);
int* a = (int*) malloc(size * sizeof(int));
for (i = 0; i < size; i++)
a[i] = i;
for (i = 0, sum = 0; i < size; i++)
sum += a[i];
printf("The average value is %5.2f\n", (float) sum / size);
free(a);
return 1; }
The above program works fine except for possible number overflow. The reason is that the final value of sum = size * (size - 1) / 2, which is too large for size > 100.000. A possibility is to use a long for size. However, on most current computers a long is not longer than an int. An ugly, but practical, solution is to use a double. A double is guaranteed to have 52 bits for its mantissa.
There is a close but imperfect relation between arrays and pointers. Arrays are convenient, but often it is better to use pointers and allocate the memory with malloc and deallocate it with free.
#define false 0 #define true 1 typedef char boolean;Here a boolean is defined to be a char. In this way every boolean is stored in one byte (8 bits). Because booleans are bi-valued, in principle a single bit would be sufficient. For single boolean variables this does not matter. But, arrays of booleans offer a natural way of realizing a set data structure: if set[i] is true, then the element with index i is an element of the set, otherwise it is not. If the set is large, then it is undesirable to use a byte for each possible element. The underlying problem is that the byte is the smallest addressable memory unit: the memory is organized as a set of bytes.
The solution is to view a byte not as a number from 0 to 255, but as 8 bits packed together. So, for storing an array of n booleans, we use an array of n / 8 (rounded up) bytes, and store 8 booleans in each of them. The values of the individual bits can be set and read in a constant number (one or two) of clock cycles using the bitwise operations: in most programming languages there are not only instructions to perform operations on booleans, characters and numbers, but one can also perform 8-, 32- or even 64-bit operations in one stroke. Because there is a one-to-one correspondence between bit operations and boolean operations this feature allows to perform 32 or even 64 boolean operations in one clock cycle, provided that all these operations are of the same kind. The language C provides such bitwise operations: bitwise-and, bitwise-or and bitwise-exor. Bitwise-not can be obtained by computing bitwise-exor with FFFFFFFF.
Memory-efficient computation requires that arrays of booleans are packed with 8 booleans per byte. This feature is normally not provided defaultly.
In C all this is easy to realize:
#define ALL_ONE 0xFFFFFFFFu
typedef unsigned int boolean;
void set_value(boolean* a, int i, boolean x) {
/* Assign value x to boolean i of the array. */
int j = i >> 5; /* This is equivalent to j = i / 32. */
int k = i & 31; /* This is equivalent to k = i % 32. */
int l = 1 << k; /* This is equivalent to l = 2^k. */
if (x)
{
a[j] = a[j] | l;
}
else
{
l = l ^ ALL_ONE; /* This inverts all bits in l. */
a[j] = a[j] & l;
} }
boolean get_value(boolean* a, int i) {
/* Return the value of boolean i of the array. */
int j = i >> 5; /* This is equivalent to j = i / 32. */
int k = i & 31; /* This is equivalent to k = i % 32. */
int l = 1 << k; /* This is equivalent to l = 2^k. */
return a[j] & l;
For a boolean variable x it is ugly to write something like "if (x ==
true)", x has a truth-value itself, so one should rather write "if
(x)". In the above we heavily use the bitwise operations. If
applicable, these are much faster than the equivalent formulations
with divisions: all bitwise operations may be assumed to take a single
clock cycle, while division is not an elementary operation.
The subroutines can be written more compactly as follows:
#define ALL_ONE 0xFFFFFFFFu
typedef unsigned int boolean;
void set_value(boolean* a, int i, boolean x) {
/* Assign value x to boolean i of the array. */
if (x)
a[i >> 5] |= 1 << (i & 31);
else
a[i >> 5] &= (1 << (i & 31)) ^ ALL_ONE; }
boolean get_value(boolean* a, int i) {
/* Return the value of boolean i of the array. */
return a[i >> 5] & (1 << (i & 31)); }
This shows clearly that the operations are very simple and can be
expected to take very little time. Because the value of x is mostly
known by the caller, it is more efficient to replace set_value() by
set_true() and set_false(). If efficiency is really a concern, then
the procedure call should be avoided, inlining the above instructions.
Because of cache effects, saving memory in many cases implies that all operations go faster. Therefore, the four or five extra elementary operations that must be performed to access the fields of the array are typically outweighed by faster memory access, and therefore packing booleans will mostly even lead to faster execution: there is no memory-time trade-off. These subroutines integrated in a running C program can be downloaded here.
In the above implementation, instead of chars we have chosen to use unsigned ints for the array of booleans. An unsigned does not have a reserved sign bit, and is therefore ideally suited for our purposes. Using ints may imply the waste of at most three bytes at the end of the array, but this has the advantage that many operations which involve all booleans of the array can be performed with 32-fold parallelism. This is particularly important when using such an array of booleans to implement a set: the union and intersection of two sets with n elements each can be computed with round_up(n / 32) operations. Exploiting the w-fold parallelism provided by the bitwise operations of a computer with w-bit word length is called bit parallelism. For example, allocating an array of booleans and initializing all of them at false can be performed as follows:
boolean* make_array(int n) {
int i;
n = (n + 31) / 32;
boolean* a = (boolean*) malloc(n * sizeof(boolean));
for (i = 0; i < n; i++)
a[i] = 0;
return a; }
The task is to verify this property for various n and x and to measure the time consumption. This is done by using a second unsigned integer array b[], which is used for counting the frequencies of the numbers in a[]. It is initialized at zero and in a final pass the maximum of all values in b[] is determined and printed.
Times can be measured with the following procedure:
long dclock() {
/* Returns the time in milliseconds */
struct timeval tp;
struct timezone tzp;
gettimeofday(&tp, &tzp);
return 1000 * (tp.tv_sec % 1000000) + tp.tv_usec / 1000; }
This is not the most scientific way of measuring times, but it is
simple and works quite well. In order to be able to use this
routine it is necessary to include the system library "sys/time.h",
which is done in the same way as the inclusion of "stdio.h".
For n you must test n = 2^k, for all k >= 12 as far as the computer allows you to solve the problem in less than 1 minute. For x you must test x = 1, 2, 4, 11, 19, 1007, 99991. The time measurement should only reflect the time for counting the frequencies of the numbers in a[], not the initialization or finding the maximum. To get stable measurements, the experiments should be repeated until the sum of the measured times exceeds 1000 ms (and then of course you must divide by the number of experiments to get the average time per experiment). Plot the resulting average time consumptions as a function of n using a doubly logarithmic scale (that is, both along the x-axis and along the y-axis the scaling is so that each factor two is one unit distance) connecting the points belonging to the same x value. Consider the developments and the differences and explain them.
Generate three random sets of size 100.000.000 each: S1 are the lotto prices for the first draw, the probability that a number gives a price in the is 0.05. S_2 are the lotto prices for the second draw, again a fraction 0.05 of them is 1. S_3 gives the lotto bets, the probability that a number is selected is 0.2. Now compute the number of bets resulting in a price (each bet gets at most one price). That is, you should first compute the union of S_1 and S_2, then intersect with S_3 and finally compute the size of the resulting set. Print this resulting number (if it does not lie between 1.940.000 and 1.960.000, then probably there is something wrong with your program).
Random numbers can be generated with help of the function random. See the online manual for the details (type "man random" inside a Unix or Linux environment).
This algorithm can be made more efficient by explicitly dealing with some special cases. For example, it is not necessary to ever test even numbers: these can also be thrown out by a modified initialization. Larger improvements can be achieved by not testing multiples of 2, 3, 5, 7, ... , either. And, when one is not testing them, why should one have storage for these numbers? None of these improvements you must implement, it is just pointed out here that this fast algorithm for finding prime numbers can be improved further.
Program two variants with the following features:
For each of the two versions determine the largest power of 2 for which the program runs in less than 1 minute. Which variant is best?
The program should also produce some output. After performing the sieving, you should determine for each number 2 <= k == 2^i <= n / 2 the number of primes between k and 2 * k and the resulting average distance between two primes in these intervals. For each of these intervals the program should also print the maximum distance between any two consecutive primes.
We want to know how efficient the Eratosthenes algorithm for computing primes is. Not in a concrete sense by measuring seconds, but in an abstract sense by counting some specific operations which give a good measure for the amount of work performed. In our case, such a measure is given by the number of visited multiples of the prime numbers. This does not account for the initialization and the testing, but this amount of work is easy to estimate: it is proportional to n.
Determine this number for several values of n and speculate how it develops as a function of n. You can choose from simple functions of the following types: c * n^2, c * n^{3/2}, c * n * log n, c * n * loglog n, and variants. Of course you do not need to speculate: using your measurement of the development of the average distance between primes, it is not hard to derive this development.
The matrices should be initialized as follows:
A_{ij} = 1, for all i, j with i + j evenFor C = A * B this gives a simple regular pattern, which can be used to check that the three procedures all compute the same product.
A_{ij} = -1, for all i, j with i + j odd
B_{ij} = i, for all i, j
Measure the time for each of these methods for n = 2^k, for k = 4, 5, ..., 10 or 11. The time for possibly transposing the matrix must also be taken into account, but not the time for allocating and initializing the matrices. The first time you are using C after allocating it, all its fields must be accessed once to make sure that C is actually loaded in to the cache/memory. For the small matrices the experiments must be repeated many times to get stable time measurements.
Plot the results in a suitable way: along the x-axis you should give the k values, along the y-axis you should give log_2 T(2^k), where T(2^k) gives the time for an experiment with n = 2^k. The graphs should be about lines. Explain the irregularities in the development and the differences between the methods. Which method is best?
Java was originally designed as an interpreted language. However, not the source code is interpreted, but something called byte-code. This byte-code is generated by a program (one could say a compiler) called "java.c". So, once the program is written, one types "java_c MainClassName.java". If there are no syntactical errors, then a new file with name "MainClassName.class" is generated. The program can now be executed by typing "java MainClassName".
The following gives a very simple program computing the average value of the fields of an array:
class ArrayAverage
{
static final int SIZE = 100;
public static void main(String ps[])
{
int i, sum;
int[] a = new int[SIZE];
for (i = 0; i < SIZE; i++)
a[i] = i;
for (i = 0, sum = 0; i < SIZE; i++)
sum += a[i];
System.out.println("\nThe average value is " +
(float) sum / SIZE + "\n");
}
}
Comparing with the C program doing the same, we see that details have changed, but that the program as a whole is more or less the same. The differences are
In Java there are quite generally applied name-giving conventions:
At the most basic level, the difference between Java and C is small.
Classes, structured data types defined along with the operations that can be performed on them, their functionality, are the central notion of object-oriented programming languages.
class Employee
{
String name;
int number;
double salary;
public Employee(String theName, int theNumber, int theSalary)
{
name = theName;
number = theNumber;
salary = theSalary;
}
public void increaseSalary(double salaryIncrease)
{
salary += salaryIncrease;
}
public String toString()
{
return "(" + name + ", " + number + ", " + salary + ")";
}
public void setName(String newName)
{
name = newName;
}
}
Here we see many important aspects of classes. First the header. A
class header always consists of the word "class" followed by the name
of the class, in our case "Employee". In the basic case we are
considering here, we then get a "{", which is matched by a "}" at the
end of the definition of a class. It should be noticed that here we
only describe a class, we do not create an object of this class.
Then we see a list of variables: a String, an int and a double. These variables will be called instance variables. Other names are in use as well. Any Employee object (= instance) has the instance variables name, number and salary. So far a class is just like a struct in C.
The difference with a struct is that the definition of a class contains also the definition of the methods that are working on objects (= variables) of this class. Further down we will see how this goes, but here we can notice already that there are four such methods: Employee, increaseSalary, toString and setName.
The latter three resemble procedures in C: they have a name, parameters and a return type. In addition we find the word "public", which is an example of an access modifier. "public" means that these methods are accessible from outside. If we would have written "private" instead, these methods could only have been called from inside the class itself. In total there are four of these access modifiers, they will be discussed further down.
The method Employee is more special. It is a so-called constructor. When calling this method in combination with the keyword new, then memory is allocated and the instructions in the constructor are executed. Typically a constructor contains instructions to initialize the instance variables, but it may also do more or less. In any case: each class must have at least one constructor, otherwise no objects can be generated. The name of a constructor is always the same as the name of the class and therefore one does not indicate the return type: by default it returns an object of the class.
Inside a class a list of instance variables together with their type is followed by all methods. Each class definition must contain at least one constructor, which is called when creating new objects.
class IntegerMatrix
{
int n;
int[][] a;
public IntegerMatrix(int size)
// Initializes all positions with 0
{
n = size;
a = new int[n][n];
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
a[i][j] = 0;
}
public IntegerMatrix(IntegerMatrix matrix)
// Creates a copy of matrix
{
n = matrix.n;
a = new int[n][n];
for (int i = 0; i < n; i++)
for (int j = 0; j < n; j++)
a[i][j] = matrix.a[i][j];
}
public int trace()
// Computes the trace (= sum of diagonal elements) of a matrix
{
int s = 0;
for (int i = 0; i < n; i++)
s += a[i][i];
return s;
}
public boolean findValue(int x)
// Checks whether the value x occurs in the matrix
{
int i;
for (int j = i = 0; i < n && a[i][j] != x; i++)
for (j = 0; j < n && a[i][j] != x; j++);
return i < n;
}
}
Here we see all the things we saw above plus some new features.
Most noticeable is that there are two constructors. They have
the same name, inside the same scope (both names are visible inside
and outside the class). Nevertheless this is correct: the names are
the same, but their signature is not. The signature of a method
is the whole set of name, parameter list and return type. For two
parameter lists to be the same, parameters of the same types must
appear in the same order. Here the first constructor has an int
parameter, the second has an IntegerMatrix parameter. When calling
these methods from outside, the compiler/interpreter has no problem
in figuring out which of the two is meant: it just has to check the
type of the parameters and to match it with one of the specified
methods. This is a first example of polymorphism about which we
will hear more further down.
Now it is also time to notice that inside the class the methods can work with the instance variables. In general, in a method there are three kinds of variables:
Local variables must not be declared at the beginning of a method. On the contrary: in Java it is considered to be good style to declare a variable locally. Also it is considered good to initialize a variable upon declaration. One must be careful with the scope of a local variable. By scope we mean the "visibility range" of a variable. The scope of a variable stretches from its declaration to the end of the level at which it was declared. So, a variable declared at the beginning of a method is visible anywhere in the method, but not outside the method. That is why we call it a local variable. The variable i in findValue is of this type. A variable which is declared in the header of a for loop is visible inside this loop, but not outside of it. The variable j in findValue is of this type. The reason that i was not declared in the header of the first loop is that we wanted to use it in the final comparison. A variable declared inside a compound statement is visible only within this compound statement.
It is correct and perfectly accepted to use the same variable name in many different methods. This possibility assures that program fragments can be combined without extensive effort to trace all common variable names. If from a method in which a variable x is used another method is called in which also a variable x is used, then of course this latter x is the valid one, because the scope of the x in the calling method is limited to its own method.
Slightly less clear that the following is also correct:
int i = 10;
for (int i = 0; i < 1000; i++)
a[i] = 2 * i;
System.out.println("i = " + i);
What is going to be printed? 10 of course! Outside the for loop the
locally defined variable i is not existing, the scope of this variable
is limited to the loop. On the other hand, inside the loop the original
variable i is not visible: it is shielded by the more local
variable. The reason why this works, not only in Java, is that the
compiler creates its own list of variables and has no problem to keep
the variables apart. Even though this works, there is rarely a good
reason to program this way, and therefore this confusing style of
programming should be avoided.
In the second constructor, we see that there may be a parameter of the same type as the class. One might fear that this leads to confusion. However, the instance variables of such a parameter are accessed like the instance variables of any other object with help of the dot-operator, ".", just like in C. So, n is the instance variable of the instance under consideration, while mat.n denotes the corresponding instance variable of the parameter mat.
In both constructors we see how an array is allocated. In C there is no distinction between declaring an array variable and the allocation of its memory. In Java, writing int a[][] creates an array variable without allocating memory (which would be hard, because the compiler still does not know how big it should be). An array variable is actually a reserved memory location in which pointers to arrays of the appropriate type can be stored. The call "new int[n][n]" allocates space for n * n integers and returns a pointer to this space. This pointer is assigned to the array variable a. All this is very clean. A similar construction we have in C when we use int** a and malloc to allocate memory, but this is quite ugly.
Now that we speak about memory allocation: in Java one does not have to bother about cleaning up (though it is possible to do so): the system runs a garbage collector in the background. A garbage collector is a program which checks for objects to which no pointers are pointing anymore and then deallocates their memory.
A variable of a class type is actually only a pointer to an object of this type. This object can be generated calling a constructor and then it can be assigned to the variable.Not all data types are classes: the primitive data types, several numerical types, characters and booleans, are not classes. All derived types are classes. Only the variables of class types are called objects. This distinction is important: when calling a method, objects are passed "by reference", while non-objects (= normal variables) are passed "by value". Actually this is not an entirely correct view: a variable of a class type is actually a pointer. So, if we pass a class variable as parameter in a method call, then the value of this pointer variable, an address, is copied into the corresponding parameter. This is the same as in C, the only difference is that for variables which are not objects, variables of the primitive types, there is no way to specify that we want to pass their address.
At this point Java is rigid, and sometimes this makes it hard to do easy things. In C it is trivial to write a procedure "swap" for exchanging the value of two variables which are passed as parameters: one passes their address instead of their value, which is done with help of the address operator "&". In the procedure one can access the values of these parameters with help of the value-of operator "*". In Java this simple and common task can only be realized in a quite elaborate way, using a so-called wrapper class: a class with a single instance variable of a primitive type, which thus obtains object status. The following example, which can be downloaded here, gives a possible work-out of this idea. Java provides also predefined wrapper classes: Integer, Float, Boolean, ... .
class Int // A self-defined wrapper class
{
public int v; // The wrapped value
public Int(int x)
{
v = x;
}
static public void swap(Int a, Int b)
{
int c;
c = a.v;
a.v = b.v;
b.v = c;
}
}
public class Swap
{
static public void swap(int a, int b)
{
int c;
c = a;
a = b;
b = c;
}
public static void main(String[] args)
{
int a = 4;
int b = 7;
System.out.print("a = " + a + ", b = " + b + "\n");
swap(a, b); // Swapping without effect
System.out.print("a = " + a + ", b = " + b + "\n");
Int aWrap = new Int(a); Int bWrap = new Int(b); // Wrapping
Int.swap(aWrap, bWrap); // Swapping
a = aWrap.v; b = bWrap.v; // Unwrapping
System.out.print("a = " + a + ", b = " + b + "\n");
}
}
When calling methods class variables, objects, are passed by reference, while variables of primitive types are passed by value. Wrapper classes grant object status to primitive types, allowing to pass variables of primitive types by reference.
class Node
{
int key;
Node next;
Node(int key, Node next)
{
this.key = key;
this.next = next;
}
Node(int key)
{
this(key, null);
}
}
class Chain
{
Node first;
public Chain()
{
first = null;
}
private Node getLast()
// Return the last node of a chain
{
if (first == null)
return null;
Node node = first;
while (node.next != null)
node = node.next;
return node;
}
public void addFirst(int key)
// Add a new node at the beginning of the chain
{
first = new Node(key, first);
}
public void addLast(int key)
// Add a new node at the end of the chain
{
if (first == null)
first = new Node(key);
else
getLast().next = new Node(key);
}
public void addChain(Chain chain)
// Attach the Chain chain at the end of the considered chain
{
if (first == null)
first = chain.first;
else
getLast().next = chain.first;
}
public boolean findValue(int x)
// Test whether there is a node with key value x
{
Node node = first;
while (node != null && node.key != x)
node = node.next;
return node != null;
}
public void print()
// Print all the keys together with their position in the list
{
int counter = 0;
Node node = first;
while (node != null)
{
System.out.println("Node " + counter + " has key " + node.key);
counter++;
node = node.next;
}
}
}
In Node there are two instance variables: key and node. key is a simple
integer instance variable as we have seen before. The exiting thing
is that node is of type Node. Is this possible? What does it mean?
Here it is crucial that an object, and any variable of type Node is
an object because Node is a class, is only a pointer and not the thing
itself (otherwise we would get an explosion). So, upon calling one of
the constructors with "new Node( ... )", space is allocated for one
integer and for one pointer to a Node object (each takes either 4 or 8
bytes) and a pointer to this space is returned.
Linked structures can be defined by defining a class with an instance variable with the data type of the class itself. Because memory for an object is only allocated when explicitly calling a constructor, this does not lead to a recursive explosion.
The constructors of Node contain the special word this. this has several related meanings. It means either: "this class", or "the current object". In our example we find examples of both applications:
The first constructor of Node is of a conventional type: two parameters are passed and assigned to the instance variables. Slightly problematic might be the assignment "this.next = next". Here a Node object is assigned to another Node object. What does it mean? If one realizes that an object is a pointer, the answer is clear: afterwards next points to the same object as next. This is general: an assignment "x = y" can always be performed when x and y are variables (y may also be a constant) of the same type (or more generally when the type of y may be converted to the type of x). In case x and y are of a primitive type, then afterwards x has the same value as y. In case x and y are objects, then afterwards x points to the same object as y (even in this case one can say that x has the same value as y, namely the same address).
The class Chain has only one instance variable: the Node first. The single constructor is trivial: no parameters, first is set to null. null is a constant value which can be assigned to any pointer variable (that is, object). It means something like "to_nothing". The important thing is that it can be used in tests. If first (or any other object) has value null, that it would be fatal (that is, leading to a runtime error) to use first.key: first == null means that the pointer of first has no specific value, in particular it is not pointing to a storage space of a Node. Thus, first.key, which means so much as the value of the int stored in the storage space first is pointing to, is not defined. Errors of this kind are very common. In the above example we were careful not to run into it.
The other methods of Chain are for adding nodes either at the beginning or at the end, for checking whether an element exists or for printing all keys in the order they appear in the list. Further methods can be added to make it more useful, here we only give an example. The method getLast is declared private. This is because we decided that it should be only for internal usage. The reason for this is that we maybe do not want to guarantee that it is always there or not in exactly this form. This prevents users from using features which they are not supposed to use. This is a first example of encapsulation about which we will hear more further down.
Now that we are presenting the Chain, we should also try to understand how exactly it works under addition of nodes. Initially we have an empty structure: first == null. Then, the first addition (it does not matter which of the additions is used) creates a new initialized Node by calling "new Node(key)" and assigns the returned value, a pointer to a Node to first. The later additions are of two kinds.
addFirst performs
first = new Node(key, first);Here many things are happening! First the value of first, a pointer to a Node or null, is looked up and together with the new key it is passed to the Node constructor. This creates a new Node object with the same next value as first had so far. Then the resulting pointer is assigned to first.
addLast performs
getLast().next = new Node(key);Here a new Node object with the new key value is created. Its next value is set to null. Then the resulting pointer is assigned to the next field of the Node which is found by calling getLast. Here getLast walks along the chain until coming to the last node and returns this object (of course it would be handy to have a second instance variable "last" in order to access this position faster, but this would be less instructive).
It was pointed out that one should be very careful not to access the instance variables of a null-object. Is this not exactly what we are doing in the following loop in findValue?
while (node != null && node.key != x)
node = node.next;
No! The reason is that in an expression involving && the left-hand
side is evaluated first. If node == null, it is certain that the whole
evaluation will result with false and therefore it is interrupted.
On the other hand, it would have been fatal to write
while (node.key != x && node != null)
node = node.next;
Even though this works, depending on the programming language there
may be no guarantee that it does. Therefore this is an example of a
possibly risky programming style which might better be avoided. In
this case this goes at little extra cost by rewriting findValue() as
follows:
public boolean findValue(int x)
{
if (first == null)
return false;
Node node = first;
while (node.next != null && node.key != x)
node = node.next;
return node.key == x;
}
Using linked structures in an object-oriented way requires that one or more objects of some node-type occur as instance variables in the definition of another class. These instance variables give the access points to the structure.
class Employee
{
protected String name;
protected int number;
protected double salary;
public Employee(String theName, int theNumber, double theSalary)
{
name = theName;
number = theNumber;
salary = theSalary;
}
public double getSalary()
{
return salary;
}
public double getNumber()
{
return number;
}
public void increaseSalary(double salaryIncrease)
{
salary += salaryIncrease;
}
public String toString()
{
return "(" + name + ", " + number + ", " + salary + ")";
}
public void setName(String newName)
{
name = newName;
}
}
class Company
{
protected int size;
protected int maxSize;
protected Employee staff[];
public Company(int theMaxSize)
{
size = 0;
maxSize = theMaxSize;
staff = new Employee[maxSize];
}
public int getSize()
{
return size;
}
public int getMaxSize()
{
return maxSize;
}
public void setName(int number, String name)
{
int i = 0;
while (i < size && staff[i].getNumber() != number)
i++;
if (i == size)
System.out.print("Number not found, ignoring instruction!\n");
else
staff[i].setName(name);
}
public void addEmployee(String name, int number, double salary)
{
if (size == maxSize)
System.out.print("No space left, ignoring instruction!\n");
else
{
staff[size] = new Employee(name, number, salary);
size++;
}
}
public void increaseSalary(double factor, double leastIncrease)
{
for (int i = 0; i < size; i++)
{
double increase = factor * staff[i].getSalary();
if (increase < leastIncrease)
staff[i].increaseSalary(leastIncrease);
else
staff[i].increaseSalary(increase);
}
}
public void print()
{
System.out.print("\nOverview of employees:\n");
for (int i = 0; i < size; i++)
System.out.print("Employee[" + i + "] = " + staff[i] + "\n");
}
}
class CompanyTest
{
public static void main(String ps[])
{
Company myCompany = new Company(100);
myCompany.addEmployee("Becker, Boris", 235521, 4500.00);
myCompany.addEmployee("Hecht, Edgar", 878722, 6500.00);
myCompany.addEmployee("Albers, Marianne", 456212, 1554.00);
myCompany.addEmployee("Krauser, Angela", 426578, 1954.00);
myCompany.addEmployee("Noack, Christina", 663738, 5646.00);
myCompany.print();
myCompany.increaseSalary(0.04, 50.0);
myCompany.addEmployee("Brauer, Harald", 568900, 2200.00);
myCompany.setName(456212, "Becker Marianne");
myCompany.print();
System.out.print("\n");
}
}
Here we have slightly changed even Employee. We have made the instance
variables "protected" in order to restrict the access from outside the
class. Instead special access methods are supplied.
These methods are typically given names like "getNumber" and "setName".
Of course this means that many extra calls to methods are made, but
errors in future extensions is worse! Never forget that in Java the
prime consideration is correctness, not speed. If speed is really
critical (as it is in programs solving very large problems and in
games), then in some small well-documented sections in which most of
the computation is performed you may do ugly things. However, if speed
really matters, then one can better write a hack in C.
One should also notice that once we have defined Employee how amazingly simple it is to build Company on top of it: we just declare an array of Employee and add a few methods for performing operations on the Company as a whole. Then the main program is more or less an empty shell. The good thing is that even without knowing about the underlying organization, any reader who understands the format immediately grasps what is going on. This is partially because of the names that were chosen, but even more because of the usage of powerful subroutines and the object-oriented programming style.
Here we touch on the most important new point. What does writing "myCompany.addEmployee( ... )" or "staff[i].increaseSalary( ... )" mean? Here we see the second usage of the dot-operator. Before we have seen that it can be used for accessing the instance variables of an object. Here we use it to connect an object with a method from its class. The semantic of this is, that the system first determines the class of the object, then searches for a method with matching signature in the class and then executes the method working on the instance variables of the object. In object-oriented languages, this is the major way of calling methods.
An exception are the static methods. A static method is any method which in its definition is preceded by the keyword "static". Static methods can be called without passing an object of the class on which it works. This implies that inside a static method there are no instance variables to use. A static method corresponds to a procedure in C and other non-object-oriented languages. The non-static methods are something new, the static ones we already know! Even in Java we already know one important example: main. Of course main should be callable without object, because by the time it is called there is not yet any object!
Static methods are encapsulated inside their classes. That is, if they are called from outside the class, it is not obvious where to find such a method (there might be static methods with the same name in several classes). Therefore, when calling a static method it is necessary to indicate where they can be found. This is done by prefixing the name of a static method with the name of the class connected by ".": another usage of the dot operator.
In principle it is possible to program in Java as in C: make one big class without instance variables and declare all methods to be static. This is against the whole concept of object-oriented programming, and therefore considered to be extremely bad style. Sometimes it is very handy though to have static methods, sometimes it more clearly expresses what is going on (a call with an object puts one an object in the foreground, but maybe the operation uses several objects as arguments in a symmetric way), and sometimes there is no alternative: as we mentioned before variables of the primitive data types are no objects. So, how should one compute e^x for a double x? The exponent function, and many other mathematical functions alike, are therefore static. This allows to call them the conventional way, without first converting a double into a Double (Double is the class with a double as an instance variable). Therefore, inside the class Math the method exp is defined as
public static double exp(double a)It can be called by writing Math.exp(x).
A somewhat strange case are the constructors. These are called by only giving the name of the method, but because this name is identical with the name of the class, it is clear where to find them. No object is passed, in this sense it resembles a static method, but the constructor allocates the object, and therefore the instance variables are available like in non-static methods.
In object-oriented programming, the default way of calling methods is by connecting an object of the appropriate class to the method with help of the dot operator. The method is working on this object. Static methods are called by specifying the class without passing an object.
Computing fib(n), the n-th Fibonacci number using directly fib(n) = fib(n - 1) + fib(n - 2) gives an algorithm whose time consumption increases exponentially with n. Of course Fibonacci numbers can easily be computed in an itterative way, but that is not the point here: this problem stands for a whole class of problems. An efficient recursive algorithm can be obtained by not only computing fib(n), but also fib(n - 1). From these two values fib(n + 1) and fib(n) can be computed in constant time and thus the time for computing fib(n) increases linearly with n, as it should do.
In C the two computed values may be handed over using variables of type int*. In Java each variable can be individually wrapped, but doing that means ignoring the structure of the problem: if the method should return a pair of values, then we should use objects of some class which can hold a pair of integers. This can now easily be turned into a correct and efficient program, but this leads to a functional rather than to an object-oriented approach. In an object-oriented context, it is cleaner to let the method work on objects of some class, than to let a static method return objects of this class. Taking all these considerations into account, we get the following program which can be downloaded here:
import java.io.*;
class IO
{
public static int readInt()
// Reads an int from standard input.
{
String input = "";
try
{
BufferedReader bufRead = new BufferedReader
(new InputStreamReader (System.in));
input = bufRead.readLine();
}
catch (java.io.IOException e)
{
System.out.print("Error while reading input line!\n");
}
return Integer.valueOf(input).intValue();
}
}
class Fibonacci
{
int x, y;
Fibonacci()
{
x = y = 0;
}
void recFib(int n)
{
if (n == 1)
{
x = 0;
y = 1;
}
else
{
recFib(n - 1);
y += x;
x = y - x;
}
}
static int fib(int n)
{
if (n == 0)
return 0;
Fibonacci p = new Fibonacci();
p.recFib(n);
return p.y;
}
}
class FibonacciTest
{
public static void main(String[] args)
{
System.out.print("\nGive n >>> ");
int n = IO.readInt();
System.out.println("Computed value = " + Fibonacci.fib(n) + "\n");
}
}
Here we will not try to understand the method readInt(). In Java even
IO is handled in a clean object-oriented way, but one would prefer C's
basic but convenient routines. Unformatted writing is easy, but reading
and formatted writing require quite elaborate methods. More interesting
is the class Fibonacci. It contains a static method fib(), which is
called from main(). We see how when calling readInt() and fib(), the
name of the class is indicated by prefixing it with the respective
class names.
In fib() an object p of the class Fibonacci is created. recFib() is called with this object. In recFib() recursive calls are made. Remind that the statement "recFib(n - 1)" is equivalent to "this.recFib(n - 1)", and in this way p, or more correctly a pointer to p, is handed all the way down until reaching the bottom of the recursion. There the values x and y are given values. Remind that whenever working inside a class with the instance variables, these are the instance variables of the current object. In our case this is the object p. Then the recursion returns step-by-step, eventually computing fib(n - 1) and fib(n). The second of these values is returned by fib().
class Node
{
static int totalSize = 0;
int key;
Node next;
Node(int key, Node next)
{
this.key = key;
this.next = next;
totalSize++;
}
Node(int key)
{
this(key, null);
}
protected void finalize()
{
totalSize--;
}
}
class Chain
{
Node first;
public Chain()
{
first = null;
}
private Node getLast()
// Return the last node of a chain
{
if (first == null)
return null;
Node node = first;
while (node.next != null)
node = node.next;
return node;
}
public void addFirst(int key)
// Add a new node at the beginning of the chain
{
first = new Node(key, first);
}
public void addLast(int key)
// Add a new node at the end of the chain
{
if (first == null)
first = new Node(key);
else
getLast().next = new Node(key);
}
public void addChain(Chain chain)
// Attach the Chain chain at the end of the considered chain
{
if (first == null)
first = chain.first;
else
getLast().next = chain.first;
chain.first = null;
}
public boolean findValue(int x)
// Test whether there is a node with key value x
{
Node node = first;
while (node != null && node.key != x)
node = node.next;
return node != null;
}
public void print()
// Print all the keys together with their position in the list
{
int counter = 0;
Node node = first;
while (node != null)
{
System.out.println("Node " + counter + " has key " + node.key);
counter++;
node = node.next;
}
}
}
class ChainTest
{
public static void main(String ps[])
{
Chain c1 = new Chain();
Chain c2 = new Chain();
System.out.println("\nCreating chain 1\n");
c1.addFirst(12);
c1.addFirst(22);
c1.addFirst(16);
c1.addFirst(14);
c1.addFirst(20);
c1.addFirst(18);
for (int i = 0; i < 100; i++)
if (c1.findValue(i))
System.out.print(i + " is among the stored values\n");
c1.print();
System.out.println("Total number of nodes = " + Node.totalSize);
System.out.println("\nCreating chain 2\n");
c2.addLast(11);
c2.addLast(23);
c2.addLast(19);
c2.addLast(37);
c2.addLast(21);
for (int i = 0; i < 100; i++)
if (c2.findValue(i))
System.out.print(i + " is among the stored values\n");
c2.print();
System.out.println("Total number of nodes = " + Node.totalSize);
System.out.println("\nConcatenating chains\n");
c1.addChain(c2);
for (int i = 0; i < 100; i++)
if (c1.findValue(i))
System.out.print(i + " is among the stored values\n");
System.out.println("\nChain 1:\n");
c1.print();
System.out.println("\nChain 2:\n");
c2.print();
System.out.println("Total number of nodes = " + Node.totalSize);
System.out.println("\nRemoving chain 1\n");
c1 = null;
System.gc();
System.out.println("Total number of nodes = " + Node.totalSize);
}
}
Class Chain is the same as before. Node is augmented by a static variable. A static variable is the fourth kind of variables next to instance variables, parameters and local variables. These might best be called class variables, so belonging to the class and not to the instance: for all the objects of a class there is only one copy of a static variable. This is the ideal way to maintain information pertaining to the class as a whole. The prime example of this is a counter which keeps track of the number of objects extent. For example, it may be counted how many external ports are in use, and once a new port is requested when the maximum number is already used, some special action must be taken. Inside the class these variables can be accessed just like the instance variables. Outside the class they are accessed analogously to the way a static method is accessed: the name of the static variable is prefixed with the name of the class connected by ".". An example is found in the instruction
System.out.println("Total number of nodes = " + Node.totalSize);
Of course this access is possible only if the variable is not private.
In Node we now also find a new method called finalize. The method finalize is called automatically by the system when an object is removed by the garbage collector, once for each removed object. It is by default part of any class definition (in an unvisible way) doing nothing, but one can choose to give it a certain functionality. Especially when one uses static variables to count occupied resources, it is important to also decrease there value when these resources are freed again.
Now one might think that in our example the value printed in the last line is 0: because there are no pointers anymore to the chain, all nodes in it have become garbage, unaccessible allocated parts of the memory. So, they could be removed. However, the garbage collection is done in a lazy way: typically it is only performed when need arises or when the processor is waiting anyway. Therefore, the printed value will most likely be 11. If one wants to force the garbage collector to run, then one should add a call to the static method gc from System:
System.gc();Notice that in addChain the final instruction is deleting the link of the attached chain. Without this instruction, the second half of the chain would still have been reachable, and the garbage collector would only throw away 6 of the nodes. It is strongly suggested that the readers actually try these variants of the program and understand what is happening.
Static variables are class variable: one copy exists for all objects of a given class. This is particularly useful for counters. In order to keep the counting up-to-date in the context of automatic garbage collection one should overwrite the method finalize().
Harder is it if we do not want to add an instance variable or a method but to change it. For example we may want to change the type of the node first in Chain from Node to BetterNode or we may want to replace a method which is good in a general case by a method which is better in a special case. Of course we can give it a new name and add it nevertheless. This is however quite ugly and confusing. It would at least require a very good documentation to make sure that later updates indeed choose the right methods. In any case it increases the number of variables and methods unnecessarily.
Now assume that we want to maintain objects with slightly different features in a common structure, for example an array. One can think of a shop having all kind of things to sell. For food articles there is an ultimate selling day, for non-food articles there may be seasons to respect. But all of them have a price. So, it makes sense to maintain all objects in an array and to call a method price increase. In C this is really hard to realize.
All mentioned aspects are dealt with in a trivial way by the idea of inheritance. Inheritance means that one defines a new class as an extension of an existing class. Such a new class is called a derived class, the class which it extends will be called mother class or base class.
A derived class inherits all the instance variables and methods of its mother class. In addition new instance variables and methods may be added. Instance variables from the mother class may even be defined again, shielding the variable from the mother class. Methods can be overwritten. Frequently a method in a derived class is merely a small modification of a method in the mother class. In that case it is natural and possible to reuse the code from the mother class by a special calling mechanism.
Inheritance is the key concept of object-oriented programming. It allows to add, extend and adapt the functionality of methods and to add instance variables to a class in a hierarchical way.
All what has been mentioned so far, holds true for any object-oriented language, possibly with some differences in terminology. The concrete example brings us back to Java. The class definitions of Employee and Company are not repeated, these classes are considered as being fixed. All of the following classes are all build on top of these two. It turns out that while designing these original classes, we might have been slightly more extension oriented: one method is formulated in an unsuitable way, another is not defined at all, even though it will arise in all derived classes. Therefore the following construction is slightly more complex than necessary. This might be considered as a realistic example therefore. Click here if you want to download the complete program.
class FixedEmployee extends Employee
{
public FixedEmployee(String name, int number, double salary)
{
super(name, number, salary);
}
public void endOfYear()
{
}
}
class Director extends FixedEmployee
{
private double yearlyBudget;
private double budget;
public Director(String name, int number, double salary,
double theBudget)
{
super(name, number, salary);
yearlyBudget = budget = theBudget;
}
public void endOfYear()
{
budget = budget / 2 + yearlyBudget;
}
public void expense(double amount)
{
budget -= amount;
}
public void increaseSalary(double salaryIncrease)
{
if (budget >= 0)
super.increaseSalary(2.0 * salaryIncrease);
}
public String toString()
{
return "(" + name + ", " + number + ", " + salary +
", director, " + yearlyBudget + ", " + budget + ")";
}
}
FidexEmployee is only used to add the method endOfYear, which is
defined in all derived classes.
Director is defined as an extension of FixedEmployee. Director has two additional instance variables: "yearlyBudget" and "budget". The new constructor has one more parameter. It performs first a call super( ... ). In this case this means a call to the constructor of the mother class. However, the usage of super is not limited to this case: it generally denotes methods or instance variables in the mother class. The opposite is this, which we encountered already in class Node. It generally denotes the current object or a method, particularly a constructor, from the current class.
"endOfYear" and "expense" are new methods. More interesting are the methods which existed already before: "increaseSalary" and "toString". These are overwriting the methods with the same name in the mother class. increaseSalary calls the method in the mother class by specifying this with super.
class LowerEmployee extends FixedEmployee
{
protected int vacationDays;
protected int yearlyVacationDays;
public LowerEmployee(String name, int number, double salary,
int theYearlyVacationDays)
{
super(name, number, salary);
vacationDays = 0;
yearlyVacationDays = theYearlyVacationDays;
}
public int applyVacation(int numberOfDays)
{
if (numberOfDays > vacationDays)
numberOfDays = vacationDays;
vacationDays -= numberOfDays;
return numberOfDays;
}
public void endOfYear()
{
vacationDays = vacationDays / 2
+ yearlyVacationDays;
}
}
The class LowerEmployee has the same features as Director: a few new
instance variables and methods. In the constructor the constructor of
the mother class is again called. It is a requirement that this call is
the first statement of any constructor in a derived class.
class Staff extends LowerEmployee
{
private int overTime;
public Staff(String name, int number, double salary,
int yearlyVacationDays)
{
super(name, number, salary, yearlyVacationDays);
overTime = 0;
}
public void addOvertime(int hours)
{
overTime += hours;
}
public void endOfYear()
{
super.endOfYear();
vacationDays += overTime / 10;
overTime = 0;
}
public String toString()
{
return "(" + name + ", " + number + ", " + salary +
", staff, " + yearlyVacationDays + ", " + vacationDays + ")";
}
}
class Worker extends LowerEmployee
{
private static int shiftVacationDays = 5;
private boolean shiftDuty;
public Worker(String name, int number, double salary,
int yearlyVacationDays, boolean theShiftDuty)
{
super(name, number, salary, yearlyVacationDays);
shiftDuty = theShiftDuty;
}
public void increaseSalary(double salaryIncrease)
{
if (shiftDuty)
super.increaseSalary(1.1 * salaryIncrease);
}
public void endOfYear()
{
super.endOfYear();
if (shiftDuty)
vacationDays += shiftVacationDays;
}
public String toString()
{
return "(" + name + ", " + number + ", " + salary +
", worker, " + yearlyVacationDays + ", " + vacationDays +
", " + shiftDuty + ")";
}
}
The variable shiftVacationDays is static. This means that this is not
an individual quantity, but common to all members of the class.
class BetterCompany extends Company
{
public BetterCompany(int maxSize)
{
super(maxSize);
}
public void addEmployee(FixedEmployee newEmployee)
{
if (size == maxSize)
System.out.print("No space left, ignoring instruction!\n");
else
{
staff[size] = newEmployee;
size++;
}
}
public void endOfYear()
{
for (int i = 0; i < size; i++)
if (staff[i] instanceof FixedEmployee)
((FixedEmployee) staff[i]).endOfYear();
}
public void expense(int number, double amount)
{
int i = 0;
while (i < size && staff[i].getNumber() != number)
i++;
if (i == size)
System.out.print("Number not found, ignoring instruction!\n");
else
if (staff[i] instanceof Director)
((Director) staff[i]).expense(amount);
else
System.out.print("Employee with number " + number +
" is not a director, ignoring instruction!\n");
}
}
The class BetterCompany corrects an omission in Company: the method
addEmployee with an Employee parameter. Notice that in this case we
do not say that addEmployee is overwriting the method with the same
name in the mother class: the signature of these methods is not the
same. Here we rather encounter polymorphic variants.
The new method endOfYear makes it possible to perform endOfYear in the same way as increaseSalary in the original version. The new method expense makes it possible to call the method expense in Director in the same way as before we could call changeName.
In endOfYear we see the operator "instanceof". The reason for this is that staff[] is an array of Employee objects. Even though we might believe that these are actually of type fixedWorker, for which the method endOfYear is defined, there might also be a derived class TemporaryAid for which endOfYear is not defined. At this point it is important to introduce the difference between the declared type and the actual type of a variable. The declared type of staff[i] is Employee, the actual type may be any of the derived classes. instanceof determines at runtime the actual type of a variable and returns true if this matches the specified type.
Even though we now are sure that the application of endOfYear is correct, it still does not work to simply write
staff[i].endOfYear();
The problem is that endOfYear is not mentioned in class Employee. Thus,
at compile time, this looks wrong. Therefore it is required to add a
so-called cast. A cast is a forced type conversion. So, we
convert staff[i] in a FixedEmployee, or our own responsibility. Not
withstanding the cast, at runtime the actual type determines which
method to select.
Now we have obtained all we need to get a main program with considerably larger functionality. The changes to make are small. If Company would have been designed better, with a method addEmployee with Employee parameter, the changes would have been even less.
class CompanyTest
{
public static void main(String ps[])
{
BetterCompany myCompany = new BetterCompany(100);
myCompany.addEmployee(
new Staff("Becker, Boris", 235521, 4500.00, 28));
myCompany.addEmployee(
new Director("Hecht, Edgar", 878722, 6500.00, 10000000));
myCompany.addEmployee(
new Worker("Albers, Marianne", 456212, 1554.00, 23, false));
myCompany.print();
myCompany.endOfYear();
myCompany.print();
myCompany.addEmployee(
new Worker("Krauser, Angela", 426578, 1954.00, 25, true));
myCompany.addEmployee(
new Staff("Noack, Christina", 663738, 5646.00, 32));
myCompany.print();
myCompany.increaseSalary(0.04, 50.0);
myCompany.addEmployee(
new Worker("Brauer, Harald", 568900, 2200.00, 25, true));
myCompany.setName(456212, "Becker, Marianne");
myCompany.expense(878722, 73000);
myCompany.print();
System.out.print("\n");
}
}
Running the program gives the following output, clearly showing the
result of the more individual treatment.
Overview of employees: Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 0) Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.0E7) Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 0, false) Overview of employees: Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 28) Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.5E7) Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 23, false) Overview of employees: Employee[0] = (Becker, Boris, 235521, 4500.0, staff, 28, 28) Employee[1] = (Hecht, Edgar, 878722, 6500.0, director, 1.0E7, 1.5E7) Employee[2] = (Albers, Marianne, 456212, 1554.0, worker, 23, 23, false) Employee[3] = (Krauser, Angela, 426578, 1954.0, worker, 25, 0, true) Employee[4] = (Noack, Christina, 663738, 5646.0, staff, 32, 0) Overview of employees: Employee[0] = (Becker, Boris, 235521, 4680.0, staff, 28, 28) Employee[1] = (Hecht, Edgar, 878722, 7020.0, director, 1.0E7, 1.4927E7) Employee[2] = (Becker, Marianne, 456212, 1554.0, worker, 23, 23, false) Employee[3] = (Krauser, Angela, 426578, 2039.976, worker, 25, 0, true) Employee[4] = (Noack, Christina, 663738, 5871.84, staff, 32, 0) Employee[5] = (Brauer, Harald, 568900, 2200.0, worker, 25, 0, true)
The above gives an example of polymorphism in the more strict sense: polymorphism means that variables can actually stand for different kinds of objects. This implies that parts of the program which are formulated in general terms can be applied to different kinds of objects. This notion is closely linked to the notion of dynamic binding: the above described phenomenon, that at runtime the actual type is used to determine which of the methods with identical signature is going to be used.
At compile time, it is checked that any method is connected by the dot operator to an object of a class in which this method is defined. This is done by checking the declared type of the object. At run time, the method to execute is chosen by looking at the actual type of the object connected to the method.
Now it is time to mention that any class is implicitly defined as an extension of class Object. Object is at the top of the class hierarchy. Without knowing this, we have already been using this fact implicitly. Consider a print statement of the following kind:
System.out.print("Employee[" + i + "] = " + staff[i] + "\n");
How does this work? First the expression between the round brackets is
evaluated. Here we use that the operator "+" is polymorphic, although
for operators we rather say that they are overloaded. So,
depending on the types of the arguments, "+" has a different effect.
This is nothing new, we already know that 3 / 4 < 0.5, while 3.0 / 4 > 0.5. The reason is here that in the first case "/" is evaluated as an integer operation, while in the second case it is evaluated as an operation between doubles. The rule for "/" is that it is evaluated as an integer operator if both its arguments are integers. If one of the arguments is a float or a double, then the other argument is converted to this type as well before the division is performed between floats or doubles. Notice that the resulting type has no impact: if x is a double, then "x = 3 / 4" is equivalent to writing "x = 0". Slightly more tricky is that "x = 3 / 4 * 10.0" has the same effect. The reason is that among operators with the same priority, the evaluation order goes from left to right (in this case).
The rules for "+" are different but similar. "+" between two String objects performs a concatenation of these. If the arguments are objects of other classes, then first the method toString is called. Because toString is defined in Object, this always works. Not overwriting toString results in a standard layout. Overwriting toString, as is done in Employee, allows to tune the output. Only when both arguments of "+" are of a numerical type, it is assumed that an addition is to be performed. Therefore we have
"Value = " + i + i != "Value = " + (i + i) i + i + "= Value" != i + (i + "= Value")
Casts are sometimes needed to obtain a forced type conversion.
Unfortunately there is no modifier for "the own class + all derived classes". The only way to obtain this is to define a method / instance variable as "protected" and not integrating any non-derived classes in the package.
A careful choice of the applied modifiers is of great importance: making everything public is convenient, but implies that external applications may essentially use features of the internal realization of a class. If later one wants to change this internal realization, then it may happen that these applications do not run correctly anymore.
It is good practice to fix a well-defined interface between the class and the outside world: that is to fix which instance variables of an object should be visible and which methods should be callable. Less visibility gives more flexibility! Classes should be defined according to their functionality, not according to how it is realized. For example: a Chain has the functionality of a special kind of (multi) set, with two insert operations and the possibility to unify to Chain objects. The general idea of limiting the access is called encapsulation, it is one of the corner stones of object-oriented programming.
The above argument should have made clear that it is wrong to only use public. But only using "private" or "public" is not good either. Sometimes classes are designed with the explicit purpose that they are going to be derived. One can consider Employee to be of this type. One may consider that the structure of Employee is so reasonable that there will never arise need to modify it. At the same time derivations are considerably facilitated if the instance variables and methods are accessible from the derived classes. Therefore, we have chosen to use "protected" for the instance variables in Employee.
The access modifiers allow the programmer to fix the degree of encapsulation of classes, objects and methods. Mostly instance variables are private or protected and can be accessed only by special access methods
Abstract is more or less the opposite of final: an "abstract" class must be derived. One cannot create any objects of an abstract class. Likewise, an abstract method must be overwritten. To make this consistent, the designers of Java have decided that abstract methods can only appear in abstract classes (but an abstract class can have methods that are not abstract).
There are good reasons to allow polyinheritance: many objects incorporate aspects of several more general classes. A person can both be an Employee and a ClubMember, an article in a shop may both be a FoodArticle and a LuxuryArticle. However, polyinheritance may also lead to consistency problems: if BClass and CClass each extend AClass and DClass extends both BClass and CClass, then methods from AClass are inherited in two possible ways. If a method from AClass has been overwritten in BClass and/or CClass, then at runtime it would not be clear which one to take.
To exclude this kind of problems in Java polyinheritance is generally forbidden. In other languages this problem is addressed differently: One might generally allow polyinheritance, but forbid inheritances which result in having equally valid variants of methods. One might allow any kind of inheritance, and in case a method is inherited several times, one might for example always select the variant from the first listed class in which it is defined. Java has chosen the most restrictive approach, assuring correctness and facilitating the task of the compiler, at the expense of programming possibilities.
An interface is something like a class, but different. It is close to being a fully-abstract class: it has only abstract methods (because this is default, one does not have to define them as such); it has no instance variables. The only thing it may have is interface constants: static final variables. For the rest an interface is like a normal class: one can define variables, parameter and return values of an interface type.
Interfaces are typically used to express properties. Therefore, it is customary to give an interface a name ending on "able". In the earlier example "BetterCompany", the class FixedEmployee was only defined to obtain a common platform for all the derived classes implementing the method endOfYear. This could better have been done in the following way using an interface (click here if you want to download the complete modified program):
interface EndOfYearAble
{
public void endOfYear();
}
class Director extends Employee implements EndOfYearAble
{
...
public void endOfYear()
{
budget = budget / 2 + yearlyBudget;
}
...
}
class LowerEmployee extends Employee implements EndOfYearAble
{
...
public void endOfYear()
{
vacationDays = vacationDays / 2
+ yearlyVacationDays;
}
...
}
class Staff extends LowerEmployee
{
...
public void endOfYear()
{
super.endOfYear();
vacationDays += overTime / 10;
overTime = 0;
}
...
}
class Worker extends LowerEmployee
{
...
public void endOfYear()
{
super.endOfYear();
if (shiftDuty)
vacationDays += shiftVacationDays;
}
...
}
class BetterCompany extends Company
{
...
public void endOfYear()
{
for (int i = 0; i < size; i++)
if (staff[i] instanceof EndOfYearAble)
((EndOfYearAble) staff[i]).endOfYear();
}
...
}
Because interfaces have neither instance variables nor worked-out methods, there are no problems related to having implementations of several interfaces and therefore a class may implement any number of interfaces.
Each class extends at most one other class, assuring that the inheritance hierarchy has a tree structure. But, classes may implement many interfaces, telling which methods certainly exist.
How does this work? If an error occurs, then one of the following things happens:
There are two types of exceptions: runtime and general exceptions. General exceptions must be dealt with, runtime exceptions might be dealt with. An example of a general exception is when reading: Java obliges you to take into account that you are trying to read beyond EOF. Thus, every read must be surrounded by a try-catch. The following piece of code gives a class which contains a static method for reading an integer. In case something goes wrong while reading, the user is informed, and 0 is returned from the method.
import java.io.BufferedReader;
import java.io.InputStreamReader;
class IntReader
{
static int readInt()
// Reads an integer from input
{
try
{
return Integer.valueOf(
(new BufferedReader(
new InputStreamReader(System.in)).readLine())).intValue();
}
catch (java.io.IOException e)
{
System.out.print("IO Exception occurred, returning 0");
return 0;
}
}
}
class ExceptionTest
{
public static void main(String ps[])
{
int i;
System.out.print("Give i >>> ");
i = IntReader.readInt();
System.out.print("i = " + i + "\n");
}
}
An example of a runtime exception is division-by-zero: you are free to test for this and to choose an appropriate reaction, but you are free to not do it: things become very slow if you test everything. Important in this context is the keyword throws: this allows to handle general exceptions at a higher level. Using throws indicates that one is aware of the possibility that something might go wrong, but that one does not want to deal with it at this level. Using throws may help to save many try-catch pairs.
import java.io.BufferedReader;
import java.io.InputStreamReader;
class IntReader
{
static int readInt() throws java.io.IOException
// Reads an integer from input
{
return Integer.valueOf(
(new BufferedReader(
new InputStreamReader(System.in)).readLine())).intValue();
}
}
class ExceptionTest
{
public static void main(String ps[])
{
int i;
System.out.print("Give i >>> ");
try
{
i = IntReader.readInt();
}
catch (java.io.IOException e)
{
System.out.print("IO Exception occurred, continuing with i == 0");
i = 0;
}
System.out.print("i = " + i + "\n");
}
}
Everything is classes, and so are exceptions. By deriving from the class Exception, you can define your own exception classes, which might be useful to test for non-standard exceptions. Any Java book gives you examples in case you might need something of this kind.
Exceptions are there to assure that in case something goes wrong a decent output is produced and resources are freed before crashing or going on in an alternative way.
At a superficial level, the object-orientedness of Java is expressed by the way methods are called: an object is connected to a method of the class of this method with the dot-operator, putting the object in the foreground. Much more important are the following general concepts of object-oriented programming:
IntegerMatrix should also have a static variable totalSize keeping track of the sum of all sizes of all matrices, and the constructors should refuse to allocate new memory when totalSize would exceed MAX_TOTAL_SIZE, for some constant. In that case some output is produced, ideally this is handled by a self-defined exception, but this is not required. The method finalize() should be overwritten to assure that totalSize remains accurate even when IntegerMatrix objects are removed by the garbage collector.
Integrate class IntegerMatrix into a program which creates several matrices, makes some assignments and performing some operations. More concretely, we want you to create matrices A, B and C, as specified below, and to compute A = A . (B + C).
( 1 7 2) ( 3 -7 -3) ( 0 4 -2)
A = (-1 2 7) B = (-4 2 3) C = ( 0 -1 -5)
( 1 4 -5) (-6 -1 3) (10 5 -2)
The initial, intermediate and final matrices should be printed.
Check that the computed results make sense:
( 3 -3 -5) (-17 12 -17)
B + C = (-4 1 -2) A . (B + C) = ( 17 33 8)
( 4 4 1) (-33 -19 -18)
This array should be sorted. To this end you should define a class Sort which has a static method sort() which has an array of Pairs as parameter. Sort has another parameter which is used to pass the value of m. Here we are not so much interested in efficiency but in handling classes. Define a further class, called Node. A Node has two instance variables: a Node and a Pair. The class NodeArray mainly consists of an array of Nodes. In our application this array has length m. Because the Nodes will be linked to each other so that they form lists, an object of NodeArray can be viewed as a set of m linked lists. In NodeArray there is a method which allows to insert a Node at the beginning of a list at a specified position of the array. NodeArray also has methods which allow to enumerate all Nodes in all lists in a systematic way, starting with the list at array position 0. The sorting can now be performed by sort() as follows:
Fill in the details yourself and work this out to a running program. Test it for m = 10 and n = 20.
In the current version, if there are several Pairs with the same key, then the order of these Pairs will get reversed. This is undesirable: in many applications it is required that a sorting subroutine is stable. With a minimal change the above sorting method can be made stable. How?
What is the running time of your algorithm expressed in terms of n and m? What do you get for m = O(n)?
Of course you should define a class Set for this. All operations should be perfectly intransparent and the instance variables should not be visible outside the class. All calls to the methods of set must be performed in an object-oriented way, none of the mentioned methods may be static.
Random numbers can be generated with help of the methods in the class Random in java.util. Use this to generate three random sets of size 100.000.000 each:
The task is to compute the number of bets resulting in a price (each bet gets at most one price). That is, you should first compute the union of S_1 and S_2, then intersect with S_3 and finally compute the size of the resulting set. Print this resulting number (if it does not lie between 1.940.000 and 1.960.000, then probably something is wrong with your program).
Integrate LChain into a program: take the program ChainTest from the text above and change the type of c1 and c2 from Chain to LChain. The text of the program can be downloaded here.
In a search tree, the keys are not are not arranged in just any possible way, but so that for any node the key of its left child (if existing) is smaller than its own key, and that the key in its right child is larger. This arrangement allows to easily perform the operation find: determining whether an element with a specified key exists or not. This is done in the following way: If the value x is smaller than the key y of the current node, then, if x occurs at all in the tree, it must occur in the left child or the nodes which can be reached from there. If the current node has no left child, then x does not occur. In case x > y, we must go right. If x is equal to the key, then we have found the value.
Create a class SearchTree implementing the above ideas. The class has an instance variable Node root. "root" corresponds to "first" in Chain: this is the node from which the structure is entered. There must be a trivial constructor, a method find along the above guidelines and a method print. The return type of find should be Node: it returns null when the value x we were looking for does not occur, otherwise it returns the Node with key equal to x. "print" should print all nodes in some systematic way. A very good idea is to do it recursively, A method is called recursive when it works by calling itself again (with a certain stopping condition). This recursive printing should however be handed over to a method within the class Node or an extension thereof (after testing that root != null). It has a structure of the following kind:
void print()
{
if (left != null)
{
System.out.print("Going left\n");
left.print();
}
System.out.print("Key value = " + key + "\n");
if (right != null)
{
System.out.print("Going right\n");
right.print();
}
}
It is a good idea, but not required, to also hand over find to
a method in the class Node.
Create the search tree from the picture "by hand", that is, by creating nodes with appropriate keys one by one and hooking them in the correct way. Then call print for the tree.
Inserting a node with key x in a search tree is also easy: Search for x. If x already occurs, we do not insert it again. Otherwise if the search ends in a node with key y != x, then if y > x, a new Node with key x is added as left child, otherwise as right child. Delete can be performed by marking the deleted nodes in a special way, if this value is inserted again later on, the marking must be undone.
Create a derived class MarkNode of Node which has one additional instance variable: boolean deleted. Of course this class also needs a constructor. The class Dictionary is a derived class of SearchTree. It has additional instance variables int size and int realSize. "size" indicates the number of non-deleted nodes, while "realSize" indicates how many nodes are physically there. Methods insert and delete are added. The actual work should best be done at the level of MarkNode.
Now create the same tree again by inserting the elements in appropriate order. For two trees to be the same the structure and the keys in corresponding nodes must be the same.
Create an empty Dictionary. Generate 100,000 random values in the range 0, ... , 199,999 and insert these in the order they are generated. Print the size of the tree. It should lie between 77000 and 80000. Generate 100,000 random numbers in the same range and count how many of them occur in the tree. It should be about 39300. Generate 200,000 random numbers in the same range and perform a delete for all of them. Print again the number of remaining nodes, now print both size and realSize. size should now lie between 46000 and 49000. Again generate 100,000 random values in the range 0, ... , 199,999 and insert these in the order they are generated. Print size and realSize. size is about 107600, realSize should be about 126400.
Create an empty Dictionary. Insert the numbers 0, 1, ..., 99,999 in this give order. What do you notice. What is the reason? Why did this not happen before? What is your conclusion about the suggested data structure Dictionary?
(a, b) + (c, d) = (a + b, c + d),Here the symbols +, - and * inside the brackets denote the operations on doubles. Define a class ComplexRing implementing these operations as methods. The methods should be called add, subtract and multiply. They should be non-static: the return value is the object the method is called with: computing x = y + z is performed by calling x.add(y, z). The method isZero is non-static. It returns a boolean when the complex number passed as an object is equal to zero. A complex number (a, b) is zero when a == 0 and b == 0.
(a, b) - (c, d) = (a - b, c - d),
(a, b) * (c, d) = (a * b - b * d, b * c + a * d).
Add a constructor which can be called with two double arguments. Also add a method (called with an object connected to the method name with the dot operator) readComplex. readComplex asks for two doubles which are read with help of the class DoubleReader. It can be downloaded here. Overwrite the method toString from Object to enable printing complexNumbers in a decent way. The instance variables should be "protected" and the methods "public".
We build on on the class ComplexRing. Define a derived class ComplexField. This class has one extra instance variable and some extra methods. The private double instance variable "norm" at all times gives the norm of the complex number, which for a number (a, b) is defined as a^2 + b^2.
The method isZero is overwritten: the test on zero is simplified to norm == 0. reciprocal is a private static method returning the reciprocal of the complex number passed as an argument. This is defined as follows: for a number (a, b) the reciprocal is given by the number
(a, b)^{-1} = (b / norm, -a / norm).Use this method to define a method divide which returns x / y = x * y^{-1} in the object z with which it is called for two complex numbers x and y passed as arguments. So, the method can be called as z.divide(x, y).
In all methods both these of ComplexRing and CompledField one should be careful to assure that if the arguments overlap with the calling object, that then the correct result is computed. Either one might test for this and write a special strategy, or one should work with some dummy variable.
Define an exception divisionByZero. This exception is thrown by the method divide when the second argument equals zero. Read the above text on exceptions to see how this is done and consider the example in class Seven.
Create a program with main embedded in a class called ComplexTest. In main five complex numbers are created: u, v, x, y, z. u, x and y are read in. v is initialized at zero. Then compute z = x + y - z, and subsequently z = z * v. Print the results. Then compute x / x x / y and z / z and print the results.
Depending on the precise operations that can be performed each ADT has variants that may be best realized with different implementations. Sometimes also the frequence with which the operations are performed may have an impact on the choice of the implementation. For example, priority queues may or may not support decreasKey. If decreaseKey is supported, this operation may be far more frequent than insert and deleteMin. Such aspects have resulted in a multitude of priority-queue implementations.
In the following we look at implementations, but it is essential to keep in mind the separation between an ADT and its implementation. If we speak about the list ADT, that this does not refer to a special actual data structure. And although priority queues are mostly implemented with heaps (which themselves can be implemented in various ways), we do not even want to know this when we talk about priority queues. In this context programming languages like Java that allow encapsulation are at their best.
We want to be able to perform a sub- or superset of the following operations:
So, we have a "pointer" to the first element, and then there follows a number of elements that are linked together, until a final element that points to "null", the special element that means "no object". In Java there is no explicit concept of pointers. Instead in Java one can define a class with a field of the class itself, which is typically called next. Writing x = y.next brings us to the next element in the list, leaving all address and pointer management implicit.
We consider how we can perform the above operations with a linked list data structure. The main difference with before, is that once we are at a position (getting there is expensive because we cannot jump to an arbitrary position but must plow forward through the list) insertions and deletions are trivial: by simply relinking an element can be in/excluded in constant time. So, now we have
class Node
{
Object key;
Node next;
Node(Object key, Node next)
{
this.key = key;
this.next = next;
}
Node(Object key)
{
this(key, null);
}
}
public class Iterator
{
Node current;
public void Iterator(Node current)
{
this.current = current;
}
public void Iterator(LinkedListIterator iterator)
{
current = iterator.current;
}
public boolean isLast()
{
return current == null;
}
public void next()
{
if(!isLast())
current = current.next;
}
public Object getKey()
{
if(isLast())
return null;
else
return current.key;
}
}
It is intentional that LinkedListIterator and all its methods are
public, thus being accessible everywhere, whereas ListNode and current
are unmodified thus being accessible only from within the package.
Because current is not public, the second constructor is added which
allows to create a new iterator "pointing" to the same node as before.
The class LinkedList can now be build up on top of the iterator. Using
the class LinkedListIterator as an intermediate gives a clear
separation of the concepts of the list and a position in it.
The requirements are more limited than for lists, and therefore we may expect all operations to be performable efficiently. In this context, that means in O(1) time. Stacks are of great importance in many contexts, particularly also in the computer itself: a procedure call results in pushing the current state of the calling routine on the stack, allowing to resume execution here at a later stage after restoring the state.
public class Stack
{
// Invariant: at all times after completion of the routines,
// numberOfElements always gives the current number of elements.
private int[] a;
private int size;
private int numberOfElements;
public Stack(int size)
{
this.size = size;
a = new int[size];
numberOfElements = 0;
}
public boolean isEmpty()
{
return numberOfElements == 0;
}
public boolean isFull()
{
return numberOfElements == size;
}
public void push(int x)
{
if (! isFull())
{
a[numberOfElements] = x;
numberOfElements++;
}
else
System.out.println("Stack full, ignoring addition!");
}
public int pop()
{
if (! isEmpty())
{
numberOfElements--;
return a[numberOfElements];
}
else
{
System.out.println("Stack empty, returning 0!");
return 0;
}
}
}
We work out the details. If we have just started a fresh structure, then we may assume the size of the stack, n, to be half of the size of the array, 2 * n. That is, if we are growing beyond the size of the array, then we have been performing at least n pushes to the stack. That is, creating a new array of double size and copying all the data from the stack into it would cost at most 2 copy operations per performed push operations. If the size of the stack is shrinking, then we are going to create a new array of size n and have to copy the remaining n / 2 elements of the stack into it. It takes at least n / 2 pops before we have to do this. Thus, amortizing these costs, this adds the cost for one copy for every performed pop.
These ideas require only minor modifications to push and pop. For x = 2, this has been implemented in a program, which can be downloaded here. In this program the effectiveness of this strategy is tested by performing many pushes and pops on an initially empty stack of size 2. More precisely, n steps are performed. At most 2 pushes are performed, each of these pushes is performed with 50% probability. Then a single pop is performed. So, the expected number of pushes equals the number of pops, but because pops are occasionally performed on an empty stack, the expected average stack size nevertheless grows with n. Running a test for n = 10^9 gives:
Number of pushes = 1000029574 Number of pops = 1000000000 Number of pops on empty = 2647 Number of copied elements = 201196 Maximum number on stack = 37315 Average number on stack = 15456 Final number on stack = 32221In this case the maximum size is not so much larger than the average size. A more important observation is that the number of copied elements is negligible in comparison with the number of operations. In other words, the close-to-optimal size of the array is obtained at negligible additional cost. This program may be understood as an abstraction of some clerk doing work at a constant pace. In principle he is fast enough for the work supplied, but nevertheless, in the course of the years, the pile of work on his desk tends to grow.
An array-based implementation of a stack combines simplicity and efficiency. Dynamically adapting the size of the array to the number of elements on the stack, the memory usage can be kept to within a constant factor from the optimum, while assuring O(1) amortized time per operation both for pushes and for pops.
We work out the idea outlined above. In practice, depending on the distribution of pops and pushes, it may be best to work with green and yellow zones, but the easiest is to start building new structures immediately. As soon as we start working with the new array of size 2 * n, we create a new array: A_s of size n and A_l of size 4 * n. Then, whenever the size of the stack reaches a new maximum value n + x, we copy two more elements from the stack to A_l (the elements with indices 2 * x - 2 and 2 * x - 1). Whenever we reach a new minimum value n - x, we copy one element to A_s (the element with index x - 1). In this way, the copying has just been completed when the stack has 2 * n or n / 2 elements.
Because of the very special (simple) properties of the insertions and deletions to a stack, all this is rather easy. For other data structures, one can often apply the same idea, but not so easily. For a list, to which insertions can be performed anywhere one must be more careful, and updates may also have to be made to the half-ready structure. Actually, for stacks the cost for creating new smaller structures can be saved all together, when we do not throw away the previously used smaller arrays: the contents of these are still valid.
Starting the construction of the new larger array before it is actually needed, the memory usage can be kept to within a constant factor from the optimum, while assuring O(1) worst-case time per operation both for pushes and for pops.
It is not hard to achieve O(1) for both operations. There are variables head and tail. Head indicates the position of the array where the next element can be dequeued. Tail indicates where the next element can be enqueued. Initially we set head = 0 and tail = 0. The following operations then correctly perform the queue operations as long as the array is long enough:
class Queue
{
// Invariant: at all times after completion of the routines,
// head indicates position to dequeue, tail position to enqueue
protected int a[];
protected int size;
protected int head;
protected int tail;
public Queue(int s)
{
size = s;
a = new int[size];
head = 0;
tail = 0;
}
public boolean isEmpty()
{
return tail == head;
}
public boolean isFull()
{
return tail == size;
}
public void enqueue(int x)
{
if (!isFull())
{
a[tail] = x;
tail++;
}
else
System.out.println("No space left, ignoring!");
}
public int dequeue()
{
if (!isEmpty())
{
int x = a[head];
head++;
return x;
}
else
{
System.out.println("Attempt to dequeue from empty queue!");
return 0;
}
}
}
The problem with this approach is that if we are performing many operations, then we will run out off the array, even when the maximum number of elements enqueued at the same time is small. There is a much better construction with arrays, which is hardly more complicated and still guarantees that all operations (as long as the size of the array is sufficiently large to store all elements) can be performed in O(1): head and tail are computed modulo the length n of the array a[]. There are three situations to distinguish:
class BetterQueue extends Queue
{
protected boolean isFull;
public BetterQueue(int s)
{
super(s);
isFull = false;
}
public boolean isEmpty()
{
return (head == tail) && !isFull;
}
public boolean isFull()
{
return isFull;
}
public void enqueue(int x)
{
if (!isFull())
{
a[tail] = x;
tail++;
if (tail == size)
tail = 0;
isFull = tail == head;
}
else
System.out.println("No space left, ignoring!");
}
public int dequeue()
{
if (!isEmpty())
{
int x = a[head];
head++;
if (head == size)
head = 0;
isFull = false;
return x;
}
else
{
System.out.println("Attempt to dequeue from empty queue!");
return 0;
}
}
}
Modify the program so that in every round exactly one pop is performed, while two pushes are performed with probability p each, 0 <= p <= 1/2. For p = 0.45, 0.40, 0.30, determine the average stack size when performing 10^7, 10^8 and 10^9 rounds. Are the results in line with the claimed constant average size? Give the average size as a function of p. Depending on the context, the maximum stack size may be more important than the average stack size. Also measure these. If necessary you should repeat the experiments to get more stable results. Is even the maximum size independent of the number of rounds?
These experiments are important because of the following. For the analogous dynamized implementation of a queue the average and maximum sizes are the same. The queue process can be interpreted as customers arriving at the cash of a supermarket being serviced one at a time in order of arrival. The attempts to dequeue from an empty queue can be viewed as "idle time". From the point of view of the shopkeeper, this is wasteful. On the other hand, if waiting times become too long, customers may prefer another shop next time. Customers are probably most sensitive to long waiting times. How much over-capacity must a shop offer in order to assure that less than one percent of the customers enters the queue when five or more other customers are waiting already?
Binary trees can be generated starting from trees with zero or one nodes in a recursive way. Generally, a tree is given by one of the following structures:
In addition to the edges directed away from the root, other edges may be added in order to facilitate certain operations. It may be handy to add links to the parent nodes. Because every node has at most one parent, this does not cost much. If the leafs of T are linked together, it becomes easy to walk along the bottom of the tree. If, as in most binary-tree implementations the nodes have storage for links to two children, this can be implemented without using additional memory.
The following gives a basic Java class for the nodes of a binary tree:
class TreeNode
{
int key;
TreeNode leftChild;
TreeNode rightChild;
TreeNode(int key)
{
this.key = key;
leftChild = null;
rightChild = null;
}
}
A ternary tree may be realized by adding an instance variable
TreeNode middleChild. This may also require a second key value.
If the degree of the tree is bounded by some fixed value k, but not
necessarily a small constant, then it is convenient to work with an
array of length k containing the possible children. Because nodes like
this require O(k) storage this should only be applied if all internal
nodes have degree k or some value close to it. For the leafs another
type of nodes should be used. If the degree is unbounded or if the
tree is very irregular, then the children should be maintained in some
list data structure.
Notice that any tree in which all nodes that are not leafs have degree 2 has at least n / 2 leafs. More precisely, such a tree has (n + 1) / 2 leafs. The proof goes by induction. The claim is true for a tree with one leaf. Now consider a tree T with at least one internal node. Let u be one of the deepest lying internal nodes. Because of this choice, both children v and w of u are leafs. Let T' be the tree obtained from T by replacing u and its two children by a single leaf node u'. T' is again a tree satisfying the condition that all non-leafs have degree 2, having n - 2 nodes. So, it has ((n - 2) + 1) / 2 = (n + 1) / 2 - 1 leafs because of the induction hypothesis. Thus, T has (n + 1) / 2 leafs, because it has v and w as leafs instead of u'.
Trees will be used to store information in the nodes, that can be found with help of keys that are used for guiding us on a path down from the root to the information we are looking for. The fact that more than half of the nodes are leafs implies, that if for some reason we prefer to only store information in the leafs, and not in the internal nodes (this may make updates easier), then we can do so at the expense of at most doubling the size of the tree.
In applications, a node may stand for a personal record which contains much more information than a single integer. A person has a first and a second name, a gender, an age an address, a personal number, etc. This can be handled in two rather different ways:
So, on a binary search tree (hereafter the word "binary" will be omited, as we will mainly consider binary trees), testing whether a certain key x occurs in the set of stored keys can be performed as follows:
boolean find(int x)
// Test whether there is a node with key x.
{
if (x < key)
if (leftChild != null) // go left
return leftChild.find(x);
else
return false;
else if (x > key)
if (rightChild != null) // go right
return rightChild.find(x);
else
return false;
else // key == x
return true;
}
This method is called from outside with root.find(x) (assuming root
!= null). It takes time proportional to the number of traversed links.
If there is a node with key x, this equals the depth of x. In
general, the running time of find is bounded by O(depth), where
depth gives the depth of the tree.
The above method find() is correct and easy to understand, but it can be formulated much more concisely and slightly faster as follows:
boolean find(int x)
// Test whether there is a node with key x.
{
if (x < key)
return leftChild != null && leftChild.find(x);
if (x > key)
return rightChild != null && rightChild.find(x);
return true;
}
Here we are explicitly using that boolean expressions are evaluated
lazily: once it is known that leftChild == null, the second part of
the expression is not evaluated.
In a binary tree each node has at most two children. This immediately gives a simple recurrency relation for the maximum number n_max(k) of nodes in a binary tree of depth k:
n_max(0) = 1,Trying a few values gives n_max(0) = 1, n_max(1) = 3, n_max(2) = 7, n_max(3) = 15. This should suffice to formulate the hypothesis that generally n_max(k) = 2^{k + 1} - 1. This can easily be checked using induction. More generally, for g-ary trees, we have
n_max(k) = 1 + 2 * n_max(k - 1), for all k > 0.
n_max(g, 0) = 1,n_max(g, 0) = 1, n_max(g, 1) = 1 + g, n_max(g, 2) = 1 + g + g^2 and n_max(g, k) = sum_{i = 0}^k g^k = (g^{k + 1} - 1) / (g - 1). Notice that n_max(g, k) >= g^k for all g and k, which is easy to remember. So, the number of nodes may be exponential in the depth of the tree.
n_max(g, k) = 1 + g * n_max(g, k - 1), for all k > 0.
A level in a tree is constituted by all nodes at the same distance from the root. Level k of a binary tree T is said to be full if there are 2^k nodes at level k of T. A binary tree is called perfect if all levels are full except possibly the deepest one. A kind of a reversal of the above analysis of N(k) is given by:
Lemma: A perfect binary tree with n nodes has depth equal to round_down(log_2 n).
Proof: A perfect binary tree with 2^k - 1 nodes has depth k - 1: it consists of k full levels. Thus, the perfect binary tree with 2^k nodes must have depth k. At this new level, 2^k nodes can be accommodated. So, any perfect binary tree with n nodes, for 2^k <= n < 2^{k + 1} has depth k. For all these n, the depth is given by k = round_down(log_2 n). End.
The depth of trees ranges from logarithmic to linear. Therefore, bounding the time of operations in the depth only makes sense if at the same time it is assured that the depth remains bounded.
The depth of all nodes can be computed by a simple recursive procedure, exploiting the structural characterization of a tree (empty, a node or a node which is the root of two trees):
void computeDepth(int depth)
{
this.depth = depth;
if (leftChild != null)
leftChild.computeDepth(depth + 1);
if (rightChild != null)
rightChild.computeDepth(depth + 1);
}
This method is called from outside with root.computeDepth(0) (assuming
root != null). Because a single execution of the body of this method
takes O(1) and because it is called once for every node, computeDepth()
takes time linear in the number of nodes. This can also be proven in a
more formal way:
Lemma: computeDepth() takes O(n) for a tree with n nodes.
Proof: The time consumption T(n) for applying computeDepth() to a tree with n nodes is given by
T(0) = 0, T(1) <= c, T(n) <= c + T(n_left) + T(n_right).Here n_left and n_right denote the size of the left and right subtree of the root respectively. n_left + n_right = n - 1. The solution is T(n) = c * n. This can be proven by induction. For n = 0 and n = 1, this is correct. Now assume the claim has been proven for all n' < n, then we get T(n) <= c + T(n_left) + T(n_right) = c + c * n_left + c * n_right = c * (n_left + n_right + 1) = c * n. End.
void preorder()
// Print all nodes in preorder.
{
System.out.println("Key = " + key);
if (leftChild != null)
leftChild.preorder();
if (rightChild != null)
rightChild.preorder();
}
This method is called from outside with root.preorderEnumerate()
(assuming root != null). Just as computeDepth() it takes O(n) time for
a tree with n nodes. Instead of the print statement any other
instruction might be executed with the data of this node. The method
performs a so-called preorder traversalinorder traversal, if the operation is performed after the
second recursive call, it is called a postorder traversal.
Of these three traversals the inorder traversal is the most interesting if the key values are printed. Because of the definition of a search tree, this results in all key values being printed in sorted order. The correctness of this claim can be proven by structural induction. For any tree, we may assume the keys of the nodes in the left subtree of the root are printed in order. All these values are smaller than the key of the root, which is printed after them. This root key is printed before any of the keys of the nodes in the right subtree, which are printed after it, and which may be assumed to be printed in sorted order.
The traversal can also be performed in an iterative way using a stack: before continuing with the left child, the current node is pushed; when getting stuck the top element is popped from the stack and we continue at its right child. This idea can be worked out as follows:
static void preorder(TreeNode current)
// Print all nodes in preorder.
{
Stack stack = new Stack(); // A stack of TreeNode
while (current != null)
{
System.out.println("Key = " + key);
stack.push(current);
current = current.leftChild;
while (current == null && stack.isNotEmpty())
current = stack.pop().rightChild;
}
}
This is less elegant but may save time and storage. In order to save
time the stack management should be integrated into the method. Memory
is saved because for each node on the path only a single pointer is
pushed on the stack. Very deep recursions, such as they may arise when
applying the recursive algorithm to a very deep tree, are problematic
because they may lead to an overflow of the internally managed
recursion stack. The iterative algorithm does not share this problem
because a user-defined stack can be as large as the available memory.
A binary search tree can be built in a direct way by first sorting the key values then selecting the middle key for the root r and building recursively trees for the sets of smaller and larger values, which are attached to r as left and right child, respectively. For a set of keys stored in an array a[] of length n, this can be worked out as follows:
static TreeNode build(TreeNode[] a, int l, int h)
// Build a tree from the nodes in the subarray a[l] ... a[h].
{
if (h < l)
return null;
int m = (l + h + 1) / 2;
a[m].leftChild = build(a, l, m - 1);
a[m].rightChild = build(a, m + 1, h);
return a[m];
}
static TreeNode build(TreeNode[] a, int n)
{
Sort.sort(a, n);
return build(a, 0, n - 1);
}
How long does this take? First we are sorting. For general instances,
this takes O(n * log n), but we should not forget that it sometimes can
be performed faster. Then we have a recursive procedure for which the
time consumption can be written as follows:
T(0) <= cThe most easy way to proof that the time consumption is linear is by considering the development of the times bottom-up: T(0) <= c, T(1) <= 3, T(3) <= 7 * c, T(7) <= 15 * c. Generally: T(n) <= (2 * n + 1) * c, for all n = 2^k - 1, for some k >= 0. This can be proven by induction: T(0) = c = (2 * 0 + 1) * c. Assume the relation holds for all k' < k, then T(n) = T(2^k - 1) <= c + 2 * T(2^{k - 1} - 1) = c + 2 * (2^k - 1) * c = (2 * n + 1) * c, as it should be. So, we need T_sort + O(n) in total. The constructed tree is a perfect tree, but at the lowest level the nodes are not "right-aligned".
T(n) <= c + T(m) + T(n - m - 1), for all n > 0.
We claim that buildTree has the optimal time consumption. For instances for which T_sort(n) = O(n), this is clear: in that case T_build(n) = O(n) and building a tree with n nodes takes at least linear time. So, assume T_sort(n) = omega(n). Above we have seen that, given a binary search tree with n keys, an inorder traversal can be used to output all keys in sorted order in O(n) time. Thus, denoting by T_build(a, n) the minimal time for constructing a binary search tree for the n keys in a[], we have
T_build(a, n) + c * n >= T_sort(a, n).Thus, T_build(a, n) >= T_sort(a, n) - c * n. If T_sort(a, n) = omega(n), then T_sort(a, n) - c * n >= T_sort(a, n) / 2 for sufficiently large n, showing that T_build(a, n) = Omega(T_sort(a, n)). So, the optimality is of the strongest kind: for every input the time lies within a constant factor from the optimum for that input.
Using arrays it is no problem to either perform insert in O(1) and find in O(n), or, when keeping the array sorted at all times, to do the find in O(log n) and the insert in O(n). Trees are the key to greater efficiency. Let us first consider how these operations can be implemented, without paying too much attention to the efficiency.
If there are rare updates, then it is a good idea to keep in addition to the tree (or simply the sorted array of elements), a list with "new issues". If an element is added, then it is put in this list. If an element is deleted, then it is marked as deleted in the existing tree. Doing this, a deletion is trivial: the same time as a find. An insertion takes O(1) after testing whether the element is already there. A find takes O(log n + length of additional list). So, this is fine as long as the additional list contains at most O(log n) elements. Because of the extreme simplicity, this may indeed be a good idea in some practical cases.
A disadvantage is that every now and then the tree has to be rebuild entirely. However, there is no problem to amortize these costs. The idea is very similar to what we have been doing for queues and stacks implemented array: start rebuilding the tree already before we need the new one.
Even amortizing the cost will not bring us so much: already after O(log n) additions, we may need a new tree. This new construction costs us O(n) (most keys are already sorted), so n / log n per performed insertion. Thus, only if we perform many more finds and deletes than insertions (the factor is n / log^2 n), this works well.
Actually, this whole idea has been applied for centuries in the context of printed media. Think of an encyclopedia: the original work appears, soon followed by a list of errata. Then, every year, updates are added. The speed at which information can be found goes down with every update. Finally a completely new issue appears and the process begins again. One of the biggest advantages of the use of electronic storage devices is that information can be changed or deleted without leaving traces.
void insert(int x)
// Insert a node into the tree.
{
if (x < key)
if (leftChild != null) // go left
leftChild.insert(x);
else
leftChild = new TreeNode(x);
else if (x > key)
if (rightChild != null) // go right
rightChild.insert(x);
else
rightChild = new TreeNode(x);
else // key == x
System.out.println(
"\nKey " + x + " exists, insertion ignored!");
}
The simplest general solution is to perform lazy deletions: a node to delete is marked by setting an additional boolean instance variable of TreeNode to false. Otherwise the node is left unchanged. This idea guarantees to not impair the search-tree property because the key of a deleted node is still available for guiding the search through the tree. So, the only change to the procedure find is to test whether a node with key == x is deleted or not before returning true. When inserting a node with a key x, while a deleted node with key x is still present in the tree, then the deleted node should be replaced by the new one. Any newly inserted node gets its mark set to true.
This is a very good solution if we have a finite set of elements which are repeatedly inserted and deleted. However, in that case we can better use a sorted array, which is even simpler. In general, this idea has the problem that over time the structure may become more and more polluted. The obvious solution is to rebuild the tree without the deleted nodes as soon as their number exceeds 50% (or some other constant fraction) of the total number of nodes. This implies a certain waste of memory space, but typically it has very little impact on the time for searching: if we can guarantee that the depth of a tree with n nodes is bounded by c * log n, for some constant c, then the time to search in a tree with n real nodes polluted with at most n deleted nodes is c * log (2 * n) = c * log n + c instead of c * log n for a clean tree.
The rebuilding is not too expensive: performing an inorder tree traversal, all non-deleted nodes can be extracted from the tree in sorted order in linear time. Building a new perfect tree out of these takes linear time as well. If the resulting structure has n nodes, then at least n / 2 delete operations can be performed before the next rebuilding. Thus, the rebuilding has an amortized cost of O(1) per deletion, which is negligible in comparison to the time needed for finding the nodes.
The general solution is to find a replacement v for u. The key of v should be such that all nodes in the left subtree have smaller values and all nodes in the right subtree larger values. The largest key value m in the left subtree has the desired property: by definition m is larger than all other keys in the left subtree, and because we may assume (induction) that so far the tree had the search-tree property, m is smaller than all keys in the right subtree. Also the smallest key value from the right subtree has the desired property. The problem is that the node v with key m is not necessarily a leaf itself. However, the node with the largest key in the left subtree of u has no right child and does has degree at most 1. The same is true for the node with the smallest key in the right subtree of u. So, the deletion ends here. This can be worked out as follows:
void delete(TreeNode parent, int x)
// Delete the node with key x from the tree.
{
if (x < key)
if (leftChild != null) // go left
leftChild.delete(this, x);
else
System.out.println(
"\nKey " + x + " does not exist, deletion ignored!");
else if (x > key)
if (rightChild != null) // go right
rightChild.delete(this, x);
else
System.out.println(
"\nKey " + x + " does not exist, deletion ignored!");
else // x == key
{
if (leftChild == null)
if (this == parent.leftChild)
parent.leftChild = rightChild;
else
parent.rightChild = rightChild;
else if (rightChild == null)
if (this == parent.leftChild)
parent.leftChild = leftChild;
else
parent.rightChild = leftChild;
else // this has degree 2
key = leftChild.deleteMax(this);
}
}
int deleteMax(TreeNode parent)
{
if (rightChild != null)
return rightChild.deleteMax(this);
else
{
if (this == parent.leftChild)
parent.leftChild = leftChild;
else
parent.rightChild = leftChild;
return key;
}
}
Here parent is always taken along in the searches in order to easily
get access to the parent of the current node which is called child.
The above shows that the time consumption for all three operations is bounded by O(depth of tree). As long as the tree is as it should be: balanced in the sense that for n nodes the maximum depth is bounded by O(log n), we are happy. However, very natural operations may lead to trees that are degenerated: having depth omega(log n). This already happens when the elements are inserted in sorted order: all elements are inserted as right childs, leading to a tree with the structure of a linear list.
If it can somehow be guaranteed that the tree remains balanced in the dynamic context of insertions and deletions, then all operations can be performed in O(log n) time.
class Tree
{
TreeNode root;
public Tree()
{
root = new TreeNode(-1);
}
static TreeNode build(TreeNode[] a, int l, int h)
// Build a tree from the nodes in the subarray a[l] ... a[h].
{
if (h < l)
return null;
int m = (l + h + 1) / 2;
a[m].leftChild = build(a, l, m - 1);
a[m].rightChild = build(a, m + 1, h);
return a[m];
}
public void rebuild()
// Rebuild the tree in a perfect way.
{
if (root.leftChild != null)
{
TreeNode[] a = root.leftChild.treeToArray();
root.leftChild = build(a, 0, a.length - 1);
}
}
public boolean find(int x)
// Test whether there is a node with key x.
{
return root.find(x);
}
public void insert(int x)
// Insert a node into the tree.
{
root.insert(x);
}
public void delete(int x)
// Delete a node with key x from the tree.
{
root.delete(null, x);
}
public int size()
// Compute the size of the tree.
{
return root.size();
}
public void preorder()
// Enumerate the nodes of the tree in preorder.
{
System.out.println("\nPrinting nodes preordered:");
root.preorder(0);
System.out.println(
"Tree has " + IO.intToString(size(), 4) + " nodes in total");
}
public void inorder()
// Enumerate the nodes of the tree in inorder.
{
System.out.println("\nPrinting nodes inordered:");
root.inorder(0);
System.out.println(
"Tree has " + IO.intToString(size(), 4) + " nodes in total");
}
}
Click here to see most of the presented code fragments integrated into a running program. The final tree is the perfect tree with keys 1, 13, 27, 44, 54, 64, 71, 89, 92, which is shown in a picture given above.
The balance of a node is the difference between the depth of the left and right subtree. The balance can be computed easily by a modification of the method computeDepth(). A node is said to have the AVL property, if its balance is -1, 0 or 1. A tree is said to be an AVL tree if all its nodes have the AVL property. This definition leaves considerable flexibility. One of the positive aspects is that the AVL property is local, this implies that an insertion or deletion only has impact for the nodes along the search path. Any perfect tree is an AVL tree, but an AVL tree does not need to look like a perfect tree at all: the leafs may lie at many different levels. A priori it is neither clear that the depth of such trees is bounded to O(log n) nor that we can perform inserts and deletes on AVL trees without needing reconstructions that cost more than O(log n) time.
The above given java program has been extended to incorporate the notion of balance. This has been realized by defining subclasses of the classes TreeNode and Tree. A HeightedTreeNode has one additional instance variable, height. This is used for storing the height of a node in the tree. In this implementation the height is not updated during insertion and deletion. Instead it is recomputed for all nodes before producing output. The balance of a node can be computed in constant time from the heights of its children. So, in an AVL-tree implementation, there is no need to save and update the heights of both its children. In the extended Java program, first the above shown AVL tree is constructed, then it is turned into a perfect tree. The perfect tree has smaller depth and less unbalance illustrating that in general an AVL tree is far from perfect.
N(0) = 1,Thus, the N(d) are very similar to the Fibonacci numbers, and even slightly larger. How do these numbers develop exactly? We have N(0) = 1, N(1) = 2, N(2) = 4, N(3) = 7, N(4) = 12, N(5) = 20, N(6) = 33, N(7) = 54, N(8) = 88, N(9) = 143, N(10) = 232, ... . We see that they grow fast! Actually, looking at them we get the strong feeling that they grow exponentially, even though they do not double every step.
N(1) = 2,
N(d) = N(d - 1) + N(d - 2) + 1, for all d > 1.
How can one prove such a fact? In (almost) all such cases, one should use induction. For induction we need an induction hypothesis. The problem here is that we do not know exactly what we want to prove. If we just want to prove that the development is exponential, we can use an underestimate. It is easy to check that all N(d) are positive. From this it follows that N(d) > N(d - 1) for all d > 1. Thus, N(d) > 2 * N(d - 2). So, as an hypothesis we could use N(d) >= sqrt(2)^d. We must check the basis of the induction. This is ok, because sqrt(2)^0 = 1 <= N(0), and sqrt(2)^1 = sqrt(2) <= N(1). After that we must only check that indeed N(d + 2) > 2 * N(d) >= 2 * sqrt(2)^d = sqrt(2)^{d + 2}.
In principle this is good enough. If, however, we want a more accurate estimate, then we can do the following which is a special case of an approach that works in general for estimating the exponential development of a recurrency relation. We assume that the development is as a^d, for the time being we forget about constant factors and additional terms. How big is a? Because the exponential behavior dominates all other things, we should basically have a^d = a^{d - 1} + a^{d - 2}. Dividing left and right side by a^{d - 2} gives a quadratic equation: a^2 - a - 1 = 0. Solving with the ABC-rule gives a = (1 + sqrt(5)) / 2 ~= 1.618, the famous golden ratio. For this a the above induction proof can be repeated. We are lucky that we have "+ 1" and not "- 1". Otherwise the development would still have been of the same order, but the proof would have been harder. So, we know now that the depth of an AVL tree with a given number of nodes N is at most log_a N = log_2 N / log_2 a = 1.440 * log_2 N. In most cases it will be less, but the main point is that this is logarithmic.
Another observation is that a single insertion can change the depth of any subtree by at most one. So, if after the insertion there are nodes in unbalance, then this is because one of their subtrees has depth exactly 2 larger than the other subtree.
Let w be the deepest node with unbalance. Without loss of generality we may assume that the left subtree of w, rooted at w_l is too deep in comparison to the right subtree rooted at w_r. More precisely, we assume that the balance of w is +2. The subtrees of w_l are denoted by w_ll and w_lr. Possibly these subtrees are empty, but at the level of this theoretical discussion it does not matter. It helps to consider the performed operations in an abstract way.
A crucial observation is that the tree was balanced before the insertion. Because now the depth of the subtree of w_l is two larger than that of w_r, it must have been exactly one larger before. The insertion must have changed the depth of w_l, and consequently of either w_ll or w_lr, otherwise the balance would still be ok. In order to increase the depth of a tree, one must perform the insertion in the subtree whose depth is larger or equal than the other. If this depth would have been larger already before the insertion, then it would now be larger by two. But, if the depth of either w_ll or w_lr exceeds the other by two, then w_l is unbalanced after the insertion, in contradiction with our assumption that w is the deepest unbalanced node. So, now we know incredibly precisely how the tree looked before and after the insertion:
The subtrees rooted at w_ll, w_lr and w_r all had the same depth. As a result of the insertion, the depth of w_ll or w_lr has increased by one.
The first case is the simplest. In that case, we make w_l to the new root of this subtree, hook w to it as right child and w_ll as left child. w gets as right child w_r and as left child w_lr. The ordering is preserved by these operations, and after this, the balance has become perfect, because the tree of w_ll has moved one level up, and the tree of w_r has moved one level down. This operations is called single rotation.
This operation works because the subtree rooted at w_ll, whose increased depth was the cause of all trouble, is on the outside. So, it remains attached to w_l, which is lifted, so it is lifted along. Thus, this effectively reduces the depth of the left subtree by one. The depth of the subtree rooted at w_r, which was two smaller, is increased by one, so it now becomes equal. The subtree rooted at w_lr is thrown over to w_r. Because w_r now has the same level as w_l before, its depth remains unchanged. We conclude that afterwards all depths are equal.
So, we must apply a more elaborate technique called double rotation. The idea is to make w_lr the root of a newly constructed subtree with w_l as its leftchild and w as its rightchild (among other things you should notice that indeed the key of w_lr is larger than that of w_l and smaller than that of w). w_l gets new children w_ll and w_lrl. w gets new children w_lrr and w_r.
The subtree that was deepest, the one that was rooted at w_lrl or w_lrr is indeed raised one level by this operation (because these nodes now come at distance two from the root, whereas before this was three). The subtree of w_ll remains at the same level, and the subtree of w_rr is going down one level, but this is fine. So, afterwards, we have three of the four subtrees at the same level, and one of them one less.
For stacks and queues implemented with arrays, it was quite easy to turn an amortized time bound into a worst-case time bound, by starting the rebuilding before it was actually needed. For search trees this is much harder because the structure does not only change on one side, but everywhere. Even the structure may change. So, it is not easy to make a "snapshot" of a tree while operations are performed on it. Therefore, for applications in which a time-out for rebuilding is unacceptable, it is desirable to have a real deletion running in O(log n) time.
If the balance remains within the acceptable bounds, then we are ready. Otherwise, we look for the deepest node w whose balance is no longer ok. From the fact that w was ok before the deletion, we conclude that one of the subtrees of w, say the right one, has depth exactly two smaller than the other. Denote by w_l and w_r the left and right child of w, and by w_ll, w_lr, w_rl and w_rr the grandchildren. The depth of w_r must have been reduced. Because it was balanced before, the deeper of its subtrees must have had depth exactly one larger than the other, and thus now the trees rooted at w_rl and w_rr have the same depth. For the depths of the trees rooted at w_ll and w_lr there are three possibilities: (+2, +1), (+1, +2) and (+2, +2). Here a number +x denotes how much deeper this subtree is than the trees rooted at w_rl and w_rr. There must be a +2 because w is unbalanced now and there cannot be anything else because w was balanced before.
The treatment of these cases is completely analogous to what we have done before. The simplest case is again (+2, +1). Then it is sufficient to perform a single rotation: w_l becomes the new root. w becomes its right child. w gets w_lr as new left child.
The case (+1, +2) can be treated with a simple double rotation: w_lr becomes the new root, with w as right child and w_l as left child. w has w_r as right child and w_lrr as left child. w_l has w_lrl as right child and w_ll as left child.
The only really new case is (+2, +2). Fortunately, performing a single rotation, helps in this case: single rotation makes the subtree rooted at w_r one deeper, the subtree rooted at w_ll becomes one undeeper, and the subtree rooted at w_lr remains at the same level.
In this case, the internal nodes typically do not hold information, but one or two keys that guide the searching process. If there is one key k, then k could be the largest value of the left subtree. If there are two values k_1 and k_2, then k_1 is the largest value of the left subtree and k_2 the largest value of the middle subtree.
if (x <= k_1)
goleft();
else if (x <= k_2)
gomiddle();
else // x > k_2
goright();
In this way we continue until we reach a leaf. There we test whether
the key is equal or not and perform a suitable operation in reaction.
The construction and these more elaborate comparison imply a certain overhead, but all the rest becomes much simpler because of this. If you would implement a search tree ADT based on 2-3 trees, then it is a good idea to have different classes for internal nodes and leafs, as they allow very different operations and have different fields as well.
It is essential that 3 + 1 = 2 + 2. This implies that, once the maximum degree of a node is exceeded, it can be split in two nodes with degree within the legal range. This implies that we could make a 3-5 tree in the same way, because 5 + 1 = 3 + 3, but that a 3-4 tree would be clumsy, because 4 + 1 = 5 < 3 + 3. These generalizations are called a-b trees. The degrees of the nodes lie between a and b and we must have b + 1 >= 2 * a. 2-3 trees have the problem that after splitting a node (as the result of an insertion) the new nodes have degree 2, which is the minimum. Therefore, it may happen that a subsequent deletion immediately leads to a fusion again. These structural changes may go all the way up to the root every time. Even though all these takes only logarithmic time, it still will mean a considerable increase of the cost. If we would use 2-5 trees instead, then after a splitting operation we obtain nodes of degree 3, which are not critical, and which cannot be split immediately afterwards. For 2-5 trees one can even show that the amortized number of restructuring operations is constant.
That all leafs lie at the same depth is an advantage: this means that it is trivial to determine that a search has reached the level above the leafs. The find algorithm can now also be written using a for loop, because the number of links to follow is known from the start. The distinction between leafs and internal nodes is very natural. Because leafs will never become internal nodes, there is no need to reserve storage for pointers to other nodes in the leafs. This gives a considerable saving.
The most serious disadvantage of 2-3 trees, is the possibility that alternating inserts and deletes result in restructuring again and again the whole path up to the root. Though this will rarely happen in practice, it is rather unpleasant, because restructurings are more involved than the one or two comparisons which are performed when passing through a node on the path from the root towards the leafs.
The notion of 2-3 trees can be generalized in a natural way to a-b trees. An a-b tree is just like a 2-3 tree, but now the degree of an internal node must be at least a and at most b. If after an insertion a node has b + 1 children it is split into two nodes with round_down((a + 1) / 2) and round_up((a + 1) / 2) children. If after deletion a node has degree a - 1, it either takes over a child from a sibling or fuses with a sibling. In order to work, this imposes the conditions b + 1 >= 2 * a and 2 * a - 1 >= b. Both are the same. 2-3 trees are a-b trees with a = 2 and b = 3, these are the smallest possible choices of a and b. There are many good choices, for example, 2-4, 2-5, 3-5 and 3-6. On the other hand, 3-4 trees are not practical, because after splitting a node of degree 5, we obtain a node of degree 2 which is not allowed.
For 2-5 trees it can be shown that any sequence of n insertions and deletions causes at most O(n) splitting and fusing operations in total. This means that on 2-5 trees insertions and deletions can be performed with O(1) amortized time per operation in addition to the time for the finds. The reason why this works for 2-5 trees and not for 2-3 trees, is that after splitting or fusing the new nodes have degree 3. 3 is not a critical value in the sense that reducing or increasing it by one immediately leads to a node with too low or too high degree again. So, in between every restructering of a node, it is accessed at least once without being restructured.
Nodes of higher degree lead to a shallower tree. A shallow tree is fine because this may reduce the number of cache misses: the information concerning a node may be assumed to stand contiguously in the memory, but going from one node to another typically means making an expensive jump through the memory space. How about the costs of the operations on an a-b tree? To answer this question it must be established how exactly the keys in the nodes are organized. The easiest idea is to maintain these keys in a sorted array. Doing this, we can use binary search to find the link that must be followed. This takes O(log b) time (in practice, for small b, a linear search might be better). Because the depth of an a-b tree is at most log_a n = log n / log_a, the time for find becomes O(log b / log a * log n). Thus, if a and b differ by a constant factor, and there is no reason to take b larger than 4 * a, finds can be performed in O(log n) time, just as on a 2-3 tree, while visiting only log_a n nodes. Unfortunately, such a good result is not automatically obtained for inserts and deletes. If a node is split or if two nodes get fused, this means splitting or fusing also the set of keys of these nodes. When using arrays, this costs time proportional to the size of these arrays. There are two solutions:
This can be realized in several ways, but two very simple ways are based on search trees. First we assume that the information is only stored in the leafs of the tree and that the leafs are connected in a doubly linked way to each other (if for the leafs we use the same objects as for the internal nodes, then we have some spare links, which can be used for this!). In this case we search for low and then walk along the leafs until hitting high or a larger value. This whole procedure takes O(depth of tree + number of nodes in the range). This is clearly optimal, because any search takes at least O(depth of tree) time and handling all elements takes at least O(number of nodes in the range). With minor modifications to the insertion and deletion algorithms, the linking of the leafs can be kept up-to-date at all times.
Range queries can also be performed on a conventional binary tree with the information stored in all nodes. The idea is to walk down the tree until low and high do no longer follow the same path. Then, whenever the search for low moves left in a node u, u and all keys in its right subtree should be enumerated. Analogously, whenever the search for high moves right in a node v, v and all keys in its left subtree should be enumerated. There is a very simple recursive work-out of this algorithm, a minor modification of the inorder traversal.
One of the subsequent chapters is entirely devoted to priority queues. Here we consider a basic realization with search trees in which the priorities are used as keys. On this structure insert is trivial: use the insert operation of the search tree.
An operation that can be implemented very easily on search trees is "find the node with smallest key": just walk left all the time until hitting a node without left child. Thus, the following can be used to implement a deletemin:
int getSmallest()
{
if (leftChild != null)
return leftChild.getSmallest();
else
return key;
}
int deleteMin()
{
int x = getSmallest();
delete(x);
return x;
}
Of course, in an implementation these two subroutines can be integrated
so that the path from the root to the node holding x needs to be
traversed only once.
Decreasekey is also simple: go to the specified node, delete it from the search tree and reinsert it with modified priority. The only problem is that typically the node of which the priority should be modified is not specified by its priority but by another tag. For example, if we are managing the waiting list of a hospital, a typical instruction will not be decreaseKey(89, 74), but rather decreaseKey("Baker", 74), indicating that the priority of "Baker" should be set to 74. The solution is to use a second search tree in which the names are used as key.
The simplest realization of this idea is to have two trees in which all nodes have two keys. One tree is ordered according to the priorities, the other according to the names. Inserts are performed in each tree. For a deletemin, one searches the node with minimum priority in the tree which is ordered according to the priorities. There one finds the corresponding name which can subsequently be searched for in the other tree. When performing a decreasekey, one first searches for the name in the tree which is ordered according to the names, there one finds the corresponding priority which can subsequently be searched for in the other tree. In this way the information in both trees remains the same after completion of each operation.
The same idea can be realized more efficiently if there are links from one tree to the other in both directions. In that case decreaseKey("Baker", 74) is performed by searching for "Baker" in the name tree and then following the link to the node with its priority in the other tree. When performing deletemin one searches in the tree with priorities for the smallest key. From there one follows the link to the other tree. The structure is more complex, but the additional links reduce the time for finding the element in the second tree.
Each of these realizations assures that, as long as we use one of the balanced search trees, all three priority-queue operations can be performed in O(log n) time. This is not the best achievable, but quite acceptable. As we will see, either insert or deletemin must take Omega(log n) time.
Implicit trees are most efficient for complete perfect d-ary trees because the array which is used for storing the tree must have the maximum size any way. That is, for a d-ary tree of depth k the array must have length N(d, k) = sum_{i = 0}^k d^i = (d^{k + 1} - 1) / (d - 1) even if the tree has n << N nodes.
Implicit trees might be suited for storing the sets of key values in the internal nodes of search trees of high degree. Maintaining these sets in sorted arrays leads to linear-time updates. At the expense of more complex operations and a waste of memory, on an implicit tree inserts and deletes can be performed in logarithmic time. On implicit trees it takes linear time to perform the split and fuse operations needed in a-b trees because this implies creating and initializing arrays of linear size. This is not that serious, because taking b sufficiently larger than a, the cost of each split or fuse can be amortized over a large number of inserts and deletes.
The canonical way of implementing this ADT is by using a search tree, but this is not the only possible way. If we really only want to implement the above three operations (and do not want to perform range queries or findmins as well), there are alternative ways, that are
void insert(int k)
{
a[k] = true;
}
void delete(int k)
{
a[k] = false;
}
boolean find(int k)
{
return a[k];
}
Of course, in most real-life applications some additional information
will be stored along with the entry in a[], but this holds for any of
the discussed data structures: here we focus on how to perform the
basic operations.
This special case is not called hashing, but direct addressing.
The name is inspired by the fact that the key values are used directly,
without any modification or computation, for determining where data are
stored. Its efficiency essentially depends on the availability of RAM.
Direct addressing is based on the same underlying idea as bucket sort. Like bucket sort, direct addressing is applicable only for moderate M. There is a difference with bucket sort though: bucket sort even requires O(M) time, whereas direct addressing, applying virtual initialization, only requires sufficient memory.
More generally, we have at most N numbers from 0 to M - 1. Then we have one integer array a[] of size M and one integer array b[] of size N plus one counter c, giving the number of inserted elements, initially with value 0. To insert an element k, we perform
void insert(int k)
{
a[k] = c;
b[c] = k;
c++;
}
To check whether x is there we perform
boolean find(int k)
{
return a[k] < c && b[a[k]] == k;
}
Deletions are simple to perform:
void delete(int k)
{
a[k] = c;
}
One might also use any other value which is guaranteed not to be equal
to the original value of a[k]. A disadvantage is that the entry in b[]
is not removed. So, if insertions and deletions are alternated then c
will increase to a value larger than M.
This is unnecessary, the unused space can be made available again by a slightly more complicated delete procedure. In this procedure, the last inserted element is deleted and reinserted at the position of k:
void delete(int k)
{
c--;
a[b[c]] = a[k];
b[a[k]] = b[c];
}
If we compare with the simple approach in which the values are first initialized, then we see that now all operations are more expensive by a small constant factor. A larger disadvantage is the additional memory usage: Before we had one array of N bits. Now we have two arrays with N and M integers, respectively. Even when we may assume that M is much smaller than N (otherwise the whole idea of virtual initialization does not make sense), this is considerably more. Therefore, virtual initialization appears to be a nice idea with limited practical interest.
Suppose for a second that f maps the keys injectively to {0, ..., N - 1}, that is, for any two keys k_1 and k_2, f(k_1) != f(k_2). Then we have a great search structure: the three operations can be performed in constant time almost just as with direct addressing. The main difference is that instead of accessing position k, we must access position f(k):
void insert(int k)
{
a[f(k)] = k;
}
void delete(int k)
{
a[f(k)] = -1;
}
boolean find(int k)
{
return a[f(k)] == k;
}
There is one more difference with direct addressing: because many
values may be mapped to the same position, it does not suffice to set
a boolean. It is essential to mark a position by the key itself, so
that when performing a find it can be tested whether it was really k
that was mapped here or some other value k' with f(k') = f(k). Thus,
a[] must be an integer array. Just as with direct addressing, the
efficiency of hashing depends on the availability of RAM: once we know
the index we can go to the desired position in the array without
noticeable delay.
Because the keys are not known beforehand, we cannot hope to choose the hash function f in an optimal way, avoiding all collisions. Such a function might also be expensive to compute. The best we can reasonably hope for is that f is as good as randomly scattering the elements over the array. We cite some basic facts from probability theory:
In Java, with objects of a type ListNode with fields key and next, the chaining idea can be implemented as follows:
void insert(int k)
{
int i = f(k);
a[i] = new ListNode(k, a[i]);
}
void delete(int k)
{
int i = f(k);
if (a[i] != null) // not an empty list
{
if (a[i].key == k)
{
a[i] = a[i].next;
System.out.println("Key " + k + " deleted");
}
else
{
ListNode prev = a[i];
ListNode node = a[i].next;
while (node != null && node.key != k)
{
prev = node;
node = node.next;
}
if (node != null)
{
prev.next = node.next;
System.out.println("Key " + k + " deleted");
}
}
}
}
boolean find(int k)
{
ListNode node = a[f(k)];
while (node != null && node.key != k)
node = node.next;
return node != null;
}
This implementation is complicated by the fact that when deleting we
must take care of the special case of empty lists. At the expense of
O(N) extra storage, the operations can be made easier by adding
sentinels to the lists. Click here
to see the above piece of code integrated in a working Java program.
For a successful find we only have to traverse half the list on average. The most expensive operation is an unsuccessful find or an attempt to delete a non-existing element: in these cases the whole list must be traversed. If before inserting an element k it should first be verified that k does not yet occur, than even each insertion is preceded by an unsuccessful find. The average cost of unsuccessful finds can be reduced by a factor two, by keeping the lists sorted. Doing this, a search for k can be terminated as soon as we reach a node with key k' > k. If now we add sentinels with key value infinity, we can even save a lot of tests.
We give an example. Suppose we would like to build a data base for the about one million inhabitants of northern Sweden. More precisely, for the inhabitants of the five provinces (län) which constitute Norrland: Gävleborgslän, Västernorrland, Jämtland, Västerbotten and Norrbotten. As key we use the personal number and the array has length N = 1,000,000. The first six positions of the Swedish personal number consists of the date of birth (in the format yymmdd). Position 7 and 8 indicated until recently the län in which the person was born and only the two last digits are slightly arbitrary (though the gender is encoded in these). If we now simply use f(k) = k mod N, then we will get a very poor distribution: most people living in Norrland are born in this area, so most people are sharing the same 5 possibilities for digit 7 and 8. Furthermore, there are only 31 days, so digits 5 and 6 have only 31 values. Thus, the big majority of the one million people is going to be mapped to at most 15.000 positions, at least 60 for each slot. In this case it is quite obvious and we know an easy way out. Out of the personal number, we should first construct a condensed number, taking into account that there are only 12 months with at most 31 days, and 6 län (the mentioned five and outsiders), and not all final two digits.
We work out the above idea in a more general setting. Suppose we have keys composed of n >= 1 "digits" (which may consist of more than one actual digit). The keys in the above example may be decomposed in 5 of these digits, each being a group of two actual digits. Assume that digit i may assume m_i different values. In the above example, the digit giving the month assumes 12 values, while the digit giving the day assumes 31 values. Thus, the total number of different values is M = prod_{0 <= i < n} m_i. For a key k, let k_i, 0 <= i < n, denote digit i. The reduced key k' is given by k' = k_0 + m_0 * k_1 + m_0 * m_1 * k_2 + ... + m_0 * m_1 * ... * m_{n - 2} * k_{n - 1}. This reduced key k' is much better suited for hashing then k itself. For example, one could use f(k) = k' mod N. From the above expression it may appear that computing k' from k takes quadratic time. However, a variant of Horner's rule computes k' in linear time: k' = k_0 + m_0 * (k_1 + m_1 * (k_2 + m_2 * ( ... + (k_{n - 2} + m_{n - 2} * k_{n - 1}) ) ) ).
The above describes an idealized situation: we assumed that there are digits for which it is known how many different values are assumed, and that these values are assumed uniformly. If we would like to apply hashing to create a dictionary of words, then it is clear that not all letters are equally frequent ('e' and 'a' are much more frequent than 'q' and 'y'), but it is not obvious what to do with such observations.
Not knowing the structure of the numbers we can never guarantee that the chosen hash function is good, however, this can be made independent of the input. The idea is to choose the hash function uniformly from a large class of good hash functions. Even though for any particular hash function there are sets of keys which are mapped to few different positions, it can be achieved in this way that the a priori probability that any two keys k_1 and k_2 are mapped to the same position is exactly 1 / N, that is, just as if they are randomly scattered over the array. This idea is called universal hashing.
Problems may occur but mostly we will be lucky. In that case the elements are distributed more or less evenly. Consider the case that the number of distributed elements n equals the size of the array N. Some longer lists will occur, but most elements are close to the root of their list. If the element to search for is selected uniformly from among the n elements, then it can easily be shown that the expected time for a find operation is constant. Then this also holds for deletes and the time for inserts is constant independently of the length of the lists. The statement may be strengthened further: every individual find may take O(log n) time, but with high probability the time for a sequence of log n finds for independently selected keys is bounded by O(log n) as well. Thus, amortizing over O(log n) operations, the time per operation is O(1) with high probability.
An alternative is to store elements in the array itself. For a while this works fine, but what do we do when an element gets mapped to a position that is already taken?
How could we do a find in this case? We go to position f(k). If it is empty we are done. If we find k as well. But, if position f(k) is occupied by another element, then we do not know anything. Subsequent positions must be tested until finding k or an empty position. When performing deletions we have to be careful: simply removing an element k does not work. It may namely be the case that other elements which lie after k become unfindable when we would remove k. Therefore, we should perform lazy deletion by somehow marking k as deleted. When performing a find these deleted positions are handled as filled positions, when performing subsequent inserts, they are handled as empty positions, reusing the memory. This technique works fine because the memory space is available anyway.
If all key values are positive, the above ideas may be implemented as follows:
// -1 means empty
// -2 means deleted
void initialize()
{
for (i = 0; i < N; i++)
a[i] = -1;
}
void insert(int k)
{
for (i = f(k); a[i] >= 0; i = (i + 1) % N);
a[i] = k;
}
void delete(int k)
{
for (i = f(k); a[i] != k && a[i] != -1; i = (i + 1) % N);
if (a[i] == k)
a[i] = -2;
}
boolean find(int k)
{
for (i = f(k); a[i] != k && a[i] != -1; i = (i + 1) % N);
return a[i] == k;
}
Click here to see slightly
modified code integrated in a working Java program. This program
is conceived so that it can be modified very easily to work for the
better conflict-resolution strategies presented in the following.
Not all simple ideas are good ideas. Linear probing has lousy performance once the array becomes slightly fuller. In this relation we need the term load factor, which is the ratio n / N. Mostly expressed in %. So, if we say that the hash table has a load factor of 80%, we mean that we are storing n = 0.8 * N keys in an array of length N. The problem with linear probing is that once we have a small sequence of consecutive positions that are full, that then it is quite likely that the next insertion hits this chain. So, chains tend to grow very fast. This problem is called primary clustering. More precisely, it means that all elements that get mapped to any of the positions of a cluster will make the cluster larger.
Without going into detail we look at some numbers. Suppose we have an array of length 100. Suppose we have inserted 20 elements. Suppose all elements are still isolated (this is quite unlikely). For isolated elements, the probability that they are hit and that the length of their chain becomes 2 is 0.01. If this happens (20% probability, so within a few insertions it will indeed happen), then hereafter, the probability that the chain gets hit is no longer 0.01, but 0.02. When it is hit, the probability becomes larger again. In addition, if the array gets fuller, one large chain may swallow the following chain, further increasing the rate at which chains are growing. A simulation, generating uniformly distributed random numbers in the range 0, ..., N - 1, shows that the load factor should be at most 60 or 70%. Otherwise the performance breaks down. This means that we must always waste at least 30 to 40% of the memory.
void insert(int k) {
for (t = 0; a[p(k, t)] is occupied; t++);
a[p(k, t)] = k; }
Here p(k, t) gives the probe sequence of k. For linear probing, p(k, t)
= (f(k) + t) mod N. The next simplest idea is to try p(k, t) = (f(k) +
t^2) mod N. This, or similar variants, is called quadratic
probing.
Linear probing had the problem of primary clustering, the problem that any key mapped to any position in a chain follows the same trajectory through the array. This leads to a self reinforcing clustering in which the clusters tend to grow very fast. With quadratic probing, we have a less serious form of clustering: all keys that get mapped to the same position f(k) follow the same trajectory. This is called secondary clustering. Primary clustering is worse because it arises even if the hash function distributes the keys over the array completely uniformly. Secondary clustering is a problem only when there are many key values k with the same f(k).
The first point implies that we should not take a simple modulo function. The second point implies that for all k, f_2(k) should be relatively prime to the size N of the hash table. This is most easily assured if N itself is a prime number, then f_2 can assume any value different from multiples of N. If f_1(k) = k mod N, then f_2 can be based on k / N. For example, f_2(k) = 1 + (k / N) mod (N - 1).
Of course we can continue inserting until the array is entirely full, but this is not clever: the last insertions will cost linear time. The simplest criterion for rebuilding is the load factor. For example, one can rebuild as soon as the load factor exceeds 80%. This is quite static though and does not tak into account that sometimes we are running into bad hashing behavior because of an unlucky choice of the hash function. Therefore, it is better to rebuild as soon as performance becomes too poor. For example, one can rebuild as soon as the latest 1000 insertions took more than c * 1000 steps, for some small constant c.
If it is important to keep the maximum time per operation bounded, then the rebuilding can be started somewhat earlier and run in the background until the new structure is ready. In this way the amortized cost of the rebuilding can be reduced to O(1).
The developed data structure can be used for sorting. Suppose it is given that the numbers in an array b[] of length n originate from a bounded, but arbitrarily large, range. That is, 0 <= b[i] < M, for all i, 0 <= i < n. Create a hash table with N = n. Use as hash function f(x) = x / c, where c = M / N. In general this is not a good choice, because if the numbers in the input are not uniformly distributed, this will lead to some long lists. But for sorting it is good: using this function, all numbers in list a[i] are smaller than all numbers in list a[i + 1]. So, first inserting all elements of b[] into the hash table and then enumerating the elements of the lists gives the numbers in sorted order.
The above leaves much freedom in how to perform the unions and how to choose the names. A possibility is to apply a rule like "add first to second", meaning that an operation union(x, y) is performed by adding all elements of the subset in which x lies to the subset in which y lies. The new name for this set then becomes the name of the subset in which y lies.
The subset ADT may sound unimportant, but it shows up everywhere. Subsets are the canonical example of the mathematical concept of equivalence relations. An equivalence relation is a binary operator "~" on elements of a set with the property that
The subset ADT allows to maintain equivalence relations dynamically: relations may be added by applying the union operation. The only limitation is that there is no de-union: once unified, the sets cannot be split anymore. This latter feature would be much more expensive to implement, in particular it would require that all previous operations are recorded.
There are several implementations, ranging from extremely simple giving modest performance (one operation O(1), the other O(n)), to slightly less simple giving excellent performance (almost constant time for both operations).
A union is more complicated: if all the elements of a subset S have to be renamed, then, in the simplest implementation, we have to scan the whole array to find those which belong to S. This takes O(n) time. Thus, for all n - 1 unions (after n - 1 non-trivial all elements have been unified into one set ), we need O(n^2) time.
void initialize() {
for (int i = 0; i < n; i++)
a[i] = i; }
int find(int k) {
return a[k]; }
void union(int k1, int k2) {
k1 = find(k1);
k2 = find(k2);
if (k1 != k2) // rename all elements in subset of k1
for (int i = 0; i < n; i++)
if (a[i] == k1)
a[i] = k2; }
If we are slightly more clever, we might maintain the elements that belong to a set in a linked list. In that case, we do not have to scan through all the elements when performing a union operation. A union is now performed by traversing one of the lists, accessing a[i] for each listed element i and updating it with the index of the other set. Then this list is hooked to the other list.
We consider an implementation of this. The lists are also implemented in an integer array b[]. In general the value b[i] gives the successor of i in its list. It is convenient to maintain a set of circular lists. The initial situation for this can be established by setting b[i] = i, for all 0 <= i < n. These ideas are worked out in the following piece of code:
void initialize() {
for (int i = 0; i < n; i++)
a[i] = b[i] = i; }
int find(int k) {
return a[k]; }
void union(int k1, int k2) {
int i, j;
k1 = find(k1);
k2 = find(k2);
if (k1 != k2) { // rename all elements in subset of k1
i = k1;
do {
a[i] = k2;
j = i;
i = b[i]; }
while (i != k1);
// glue list of k1 into the list of k2
b[j] = b[k2];
b[k2] = k1; } }
At a first glance this appears to be a very good idea: the implementation is simple and does not cause too much overhead. The work of a union is now proportional to the number of renamings that is, it is proportional to the size of the subset of k1 and no longer Theta(n).
However, consider the following sequence of unions
for (int i = 1; i < n; i++)
union(i - 1; i)
If always the first set is joined to the second, then in total we have
to rename sum_{i = 0}^{n - 1} i = (n - 1) * n / 2 = Omega(n^2)
elements. This is half as much as with the trivial implementation, and
in view of the extra work for each renaming, it is actually no
improvement at all.
How do we prove such a thing? The usual way of bounding the time of a sequence of operation is to put a bound on the time per operation. Here this approach does not work, because union operations may take liner time. In this case one should not perform an operation-based analysis, but an element-based analysis: bounding the cost per operation instead of bounding the cost per element. The idea is to consider the maximum number of times that any node may be given a new name. We show that this happens at most log n times. Then it follows that in total there are at most n * log n renamings.
Lemma: A node that has been renamed k times belongs to a set of size at least 2^k.
Proof: To be formal, we use induction. We must check a base case and a step. The base case is easy: a node that has been renamed 0 times belongs to a set of size at least 1 right from the start. Now suppose the lemma holds for all k' < k, for some k > 0. Consider a node x that has been renamed k - 1 times. The induction assumption implies that the size n_x of the subset to which x belongs satisfies n^x >= 2^{k - 1}. Assume x gets renamed again when performing union(x, y). Because unions are performed by-size, this implies that n_y >= n_x, where n_y gives the size of the subset to which y belongs. For the size n_{xy} of the subset created by the union, this gives n_{xy} = n_x + n_y >= 2 * n_x >= 2 * 2^{k - 1} = 2^k. End.
Corollary: A node can be renamed at most log n times.
Proof: Assume some node x is renamed k > log n times. Then according to the lemma it belongs to a subset of size n_x >= 2^k > 2^{log n} = n, which is impossible. End.
Theorem: When performing union-by-size, the time consumption of any sequence of n - 1 unions is bounded by O(n * log n) time.
This result is sharp: there is a sequence of unions that actually requires Omega(n * log n) renamings:
for (int i = 1; i <= log n; i++)
for (int j = 0; j < n / 2^i; j++)
union(j * 2^i; j * 2^i + 2^{i - 1})
The number of renamings is log n * n / 2: in every round n/2 of the
nodes get a new name. The factor two difference with the upper bound
comes from the way the upper bound was proven: though every individual
element may have to be renamed log n times, it is not possible that all
elements are renamed that often. Mostly we do not care about such small
factors.
Maintaining the size of the sets is trivial: if two sets are joined, the new size is the sum of the two old sizes. It is trivial to implement this using an additional array s[] for storing the sizes, initialized at 1. If we are very tricky (this is quite ugly hacking) then we can also do without this extra array: Normally, we find at position i of a[] the index of the subset to which node i belongs. If a[i] = i, then we can also flag this by just putting a[i] = -1 (or any other value that does not lie in the range [0, n - 1]). But then we can also store there the size of the subset of i. Here we use that we have one spare bit. If n is extremely large, needing all bits, then this idea does not work. In code this may be implemented as follows:
void initialize() {
for (int i = 0; i < n; i++) {
a[i] = -1;
b[i] = i; } }
int find(int k) {
if (a[k] < 0)
return k;
return a[k]; }
void union(int k1, int k2) {
int i, j;
k1 = find(k1);
k2 = find(k2);
if (k1 != k2) {
if (a[k1] < a[k2]) { // the set of k1 is larger
i = k1;
k1 = k2;
k2 = i; }
// rename all elements in subset of k1
a[k2] += a[k1];
i = k1;
do {
a[i] = k2;
j = i;
i = b[i]; }
while (i != k1);
// glue list of k1 into the list of k2
b[j] = b[k2];
b[k2] = k1; } }
Click here to see the above piece
of code integrated in a working Java program. This program is executing
the same example with n = 10 as shown in the pictures.
What is special about the application of union-find, that we are using arrays here to realize linked structures, whereas before we were using a structure build of list nodes linked together? The answer is that in the current case, there is a fixed number of nodes which have keys from 0 to n - 1. This makes that we can apply direct addressing: the information for node k, including the information of its next field, is stored at position k of one or more arrays.
In the example above we are working with two arrays. Alternatively, we might also work with one array of objects of the following type:
class ArrayNode
{
int a;
int b;
ArrayNode(int i)
{
a = -1;
b = i;
}
}
It becomes even more elegant if a boolean instance variable is added
indicating whether a gives the size or the name of the list. An
organization with ArrayNode objects is more object oriented then an
organization with several arrays.
Which of these two organizations is more efficient depends on the memory access pattern. If there are several arrays, each array may be assumed to stand consecutively in the memory. This allows for speedy access to all information of one kind. This is more true if this information is accessed in a consecutive way, then if it is accessed by single accesses such as in the find operation. In an organization with one array of ArrayNodes, the information belonging to each node stands together. This makes it cheaper to access several fields of a node as is done in the union operation. Of course using ArrayNodes causes some overhead because there is an extra indirection: accessing nodes[i].b is more involved then b[i].
This idea can be realized very simply using an array to represent the set of links:
void initialize() {
for (int i = 0; i < n; i++)
a[i] = i; }
int find(int k) {
while (a[k] != k)
k = a[k];
return k; }
void union(int k1, int k2) {
k1 = find(k1);
k2 = find(k2);
if (k1 != k2) // hook k1 to k2
a[k1] = k2;
Here a root k of a tree is characterized by the fact that it has a[k]
== k. The good thing is that with the tree-based implementation, there
is no need to access all elements of a set. Thus, there is no need to
have the additional list structure requiring a second array.
Conceptually using trees is a step, but practically it is even easier
than the array-based implementation.
How about the efficiency? A find requires that one runs up the tree to find the index of the root. A union, once the two finds have been performed, is trivial, it just requires that one new link is created. So, here we have reduced the cost of the union at the expense of more expensive finds. The finds can actually be arbitrarily expensive: if the tree degenerates, it can have depth close to n. In that case finds may take linear time.
void initialize() {
for (int i = 0; i < n; i++)
a[i] = -1; }
int find(int k) {
while (a[k] >= 0)
k = a[k];
return k; }
void union(int k1, int k2) {
k1 = find(k1);
k2 = find(k2);
if (k1 != k2) {
if (a[k1] < a[k2]) { // the set of k1 is larger
int i = k1;
k1 = k2;
k2 = i; }
a[k2] += a[k1];
a[k1] = k2; } }
Here a root k of a tree is characterized by the fact that it has a[k]
< 0. In that case -a[k] gives the size of the tree.
Lemma: A tree of depth k has at least 2^k nodes.
Proof: The proof goes by induction. To settle the base case, we fix that a tree with one node has depth 0. Now assume the claim is correct for given k. How do the depths develop? If T_1 is joined to T_2, and T_1 has depth smaller than T_2, then nothing changes. If T_1 has depth >= the depth of T_2, then the new depth equals the depth of T_1 + 1. While performing unions, new trees of depth k + 1 can thus only arise when a tree T_1 of depth k is joined to another tree T_2, which, because of our clever joining technique, must have at least as many nodes as T_1. Because of our induction assumption, T_1 has at least 2^k nodes, and thus T_1 + T_2 has at least 2^{k + 1} nodes. End.
Corollary: Using trees and performing union-by-size, union takes O(1), while the time for a find is bounded by O(log n).
Proof: The time to perform find(x) is proportional to the depth of node x in its tree. So, assume some node x has depth k > log n. Because the depth of a tree equals the maximum of the depths of all its nodes, this implies that the depth of the tree of x is at least k. According to the lemma this implies that the size n_x of the tree satisfies n_x >= 2^k > 2^{log n} = n, which is impossible. End.
The given bound is sharp: trees of logarithmic depth may really arise when repeatedly performing union for trees of equal size. If we are mainly interested in limiting the depth of our tree, then we can just as well perform union by depth: the shallower of the two trees (if any) is hooked to the other. Doing this, it is even easier to prove that the number of nodes in a tree with depth k is at least 2^k and that consequently the depth is bounded by log n.The only further algorithmic idea in this domain is path contraction. That means, that when we are performing find(k), after we have found that find(k) = r, we start once more at k and link all nodes on the path directly to r. This makes the individual finds twice as expensive, but has a very positive impact on the structure of the tree. The idea that expensive operations lead to an improvement of a search structure is not limited to union-find. Similar ideas are also applied for search trees and priority queues. Notice that even a union operation involves two finds, so even a union may lead to changes in the trees more than just hooking one to the other.
Using the same initialize as before, the code for find now looks as follows:
int find(int k) {
int l = k;
while (a[l] >= 0)
l = a[l];
// Now l == find(k)
while (a[k] > 0) {
int m = a[k];
a[k] = l;
k = m; }
// Now all nodes on the path point to l
return l; }
Click here to see the above piece
of code integrated in a working Java program. This program is executing
the same example with n = 10 as shown in the pictures.
The idea of path contraction is that we invest something extra right now, in order to exclude that in the future we have to walk this long way again.
There are alternative implementations of this idea. We can also save the second run, by keeping a trailer and just reducing the depth of the search path by a factor two. This reduces the number of elements to address and may therefore be slightly faster in practice:
int find(int k) {
int l = k;
int m = k;
while (a[l] >= 0) {
a[m] = a[l];
m = l;
l = a[l]; }
return l; }
The combination of union-by-size (or by height, although the height information may become inaccurate due to the finds) and path contraction leaves very little to desire. A partial analysis is given separately in the next section. Even understanding how good exactly the algorithm is is not trivial.
Theorem: The time for an arbitrary sequence of m unions and finds is bounded by O(m * log^* n) provided m >= n.
Here log^*, pronounced log-star, is the function informally defined as "the number of times the log function must be applied to reach 1 or less". More formally, log* n = min{i >= 0| log^(i) n <= 1}. Here log^(i) n denotes i the function obtained by i times applying the log-function. More generally, for any function f, f^(i) is defined byf^(1)(n) = f(n)log* 1 == 0, log* 2 == 1, log* 4 == 2, log* 16 == 3, log* 65536 == 4, log* 2^65536 == 5. In practice log* cannot be distinguished from a constant. The actual results is even much stronger: the time for any sequence of m >= n unions and finds is bounded by O(m * alpha(m, n)), where alpha(,) is called the inverse Ackermann function. For any m slightly larger than linear, alpha(m, n) is constant.
f^(i)(n) = f(f^(i - 1)(n)
It costs very little extra to perform the union by size, but still one may wonder whether it is necessary. Possibly just performing the finds with path contraction might be enough. This makes the union procedure even simpler and faster. Trying examples suggests that this has almost no negative impact on the time of the finds. However, doing this, there is a sequence of n unions and n finds requiring Omega(n * log n) time. This shows that, at least in theory, one really needs the combination of union-by-size and path contraction to obtain the best achievable performance.
A(1, j) == 2^j, for j > 0Ackermann's function grows terribly fast. It is instructive to fill in a small square of values. Now one can define alpha(m, n) == min{i >= 1| A(i, m / n) > log n}. Because of the growth rate of A(i, j), is alpha(m, n) practically bounded by 4, even for m / n == 1.
A(i, 1) == A(i - 1, 2), for i > 1
A(i, j) == A(i - 1, A(i, j - 1)), for i, j > 1.
Theorem: The time for an arbitrary sequence of m unions and finds is bounded by O(m * alpha(m, n)) provided m >= n. This bound is sharp.
The proof of this is technical and omited. Instead we prove the weaker theorem above, which is still incredibly strong in itself. Comparing the the two theorems, we see that the "weakness" of the first is most notable for slightly larger values of m: Because A(2, j) == 2^2^ ... ^2, with in total j exponentiations, we have that A(2, log* n) == n. Thus, if m / n > log^* n, then alpha(m, n) == 2, and thus we find that already for m which are only a little bit larger than n, m operations can be performed in O(m) time and not something that is super-linear in m.Let us summarize the algorithm:
Ranks may only change with unions, so for proving results on the numbers of nodes with given ranks, we can forget the path compressions.
When only performing a set of unions, a node of rank r has at least 2^r descendants (counting the node itself as well).The proof goes by induction over time t. In other words, we reformulate the claim as an invariant property: at all times any tree with rank r has at least 2^r nodes. First we check that the claim is true for t = 0: initially all trees have size 1 and rank 0, so this is ok. At any given time two trees are hooked together. If the rank of the new tree is unchanged, then certainly no tree violating the condition is resulting. The rank only increases when two trees with the same rank r are hooked. Each of them has at least 2^r nodes because of the invariant property. The resulting tree thus has at least 2^{r + 1} nodes, preserving the invariant.
The ranks decrease strongly monotonically on a path away from the root.Because of the union-by-rank rule, this is obvious when we would not perform path contraction. However, if a node v is a descendant of w after path contraction, then it was already a descendant of w before path contraction, and thus must v have smaller rank than w.
There are at most n / 2^r nodes of rank r.Consider a node of rank r. Without path compression it would be the root of a subtree of size at least 2^r. All these subtrees are disjoint. The path compression has no consequences for the rank, and the unions are also performed independently of them, as they only consider ranks and not the actual depths. So, there can be at most n / 2^r nodes of rank r.
The actual proof is slightly technical but the idea is really simple and beautiful. The ranks are somehow divided in rank groups consisting of consecutive ranks. F(g) gives the largest rank in group g. So, group g comprises all of the F(g) - F(g - 1) ranks F(g - 1) + 1, ..., F(g). Let G(n) be the total number of rank groups.
Consider a find starting in a node v and leading to the root of its tree r. Whenever we follow a link (w, w') leading to another rank group, or when w' = r, or when w = r (the final step), we account one cost unit to the find operation. Because there are only G(n) rank groups and 2 final steps, this allocates at most G(n) + 2 cost units to any find. Notice that this result holds independently of the path-contraction we are performing.
For following all other links (that is all non-final links within a rank group), the cost unit is accounted to the node w and not to the find operation. In order to bound these costs over the course of the operations, we need that we apply path-contraction. Because of this, we know that w will get a new link, leading to a node w'' higher in the tree. By our earlier proof, we know that the rank of w'' is strictly larger than the rank of w'. Thus, if w belongs to rank group g, we are allocating at most F(g) - F(g - 1) cost units to w under this rule. Notice that the above argument applies just as well to the alternative path-contraction technique in which the path length is only halved.
The total cost is now bounded by
m * (G(n) + 2) + sum_{g = 0}^{G(n) - 1} #{nodes in rank group_g} * (F(g) - F(g - 1))The number of nodes in rank group g, starting with rank F(g - 1) + 1, can easily be estimated because we know that there are at most n / 2^x nodes with rank x. This gives that there are at most sum_{r = F(g - 1) + 1}^F(g) n / 2^r <= n / 2^F(g - 1) nodes in rank group g. So, our formula becomes
m * (G(n) + 2) + sum_{g = 0}^{G(n) - 1} n * (F(g) - F(g - 1)) / 2^F(g - 1)Which we simplify to
m * G(n) + n * sum_{g = 0}^G(n) F(g) / 2^F(g - 1)A clever choice is F(g) = 2^F(g - 1). Then the left term gives m * log* n, and the right term only n * G(n) = n * log* n.
First build tree structures, performing n - 1 union operations by picking at any time two of the remaining roots at random. Then perform k * n find instructions. Count the number of links traversed until reaching the roots. The cost measure is the average number per operation over the last n find instructions.
Perform the above tests for one given large n, for example n = 2^25 and consider how the numbers develop for k = 1, 2, 3, ... . Perform the above tests for k = 1 and n = 2^x, for x = 10, 11, 12, ... . The experiments for small x must be repeated sufficiently often. Plot the results as a function of x. Which strategy appears to be the best choice in practice considering both performance and overhead?".
Sorting is one of the few problems for which a non-trivial lower bound can be proven relatively easily. We will do this, showing that under reasonable assumption sorting requires at least Omega(n * log n) time. The remarkable thing is that at the same time we know algorithms running in O(n * log n). So, the sorting problem has been solved in the sense that there are optimal algorithms, algorithms whose running time matches the lower bound.
boolean isSmaller(String s1, String s2) {
if (s1 == "" && s2 == "") // equal strings
return false;
if (s1 == "") // shorter string with common prefix
return true;
if (s2 == "") // longer string with common prefix
return false;
Char c1 = head(s1); // first character of s1
Char c2 = head(s2); // first character of s2
if (c1 == c2) // common first character
return isSmaller(tail(s1), tail(s2)); // compare remainders
return c1 < c2; } // compare first character
The lexicographical ordering is not limited to strings. The only typical thing is that strings may have arbitrary length. More generally an ordering can be defined on a product set S_1 x S_2, whenever there is an ordering both on S_1 and on S_2. These do not need to be the same. The ordering relations are denoted "<_1" and "<_2", respectively. The ordering "<" is then defined by (x_1, x_2) < (y_1, y_2) = (x_1 <_1 y_1) or (x_1 == y_1 and x_2 <_2 y_2). If a product of more than two sets S_1 x S_2 x ... x S_n is interpreted as S_1 x (S_2 x ... x S_n), the definition of the lexicographical ordering extends in a natural way.
The objects in the array may be large (for example, records of personal data). Fortunately, this does not need to bother us: an array of objects actually consists of an array of pointers to the objects. Thus, independently of their size two objects can be swapped in constant time:
if (a[i].key > a[j].key)
{
myClass x = a[i];
a[i] = a[j];
a[j] = x;
}
Alternatively one might copy all instance variables one-by-one, but
for objects with many instance variables this would be much more
expensive.
The most common cost measure is the number of comparisons made. One might also count the number of assignments, but because essentially these cost the same as comparisons, it does not make sense to reduce the number of assignments at the expense of substantially more comparisons.
So, the main goal in this chapter is to find sorting algorithms that minimize the number of comparisons. Only at the end we will encounter a special case which can be solved by counting rather than comparing.
void sort(int[] a, int n)
{
for (int r = n - 1; r > 0; r--)
for (int i = 0; i < r; i++)
if (a[i] > a[i + 1])
{
int x = a[i];
a[i] = a[i + 1];
a[i + 1] = x;
}
}
Click here to see the above piece
of code integrated in a working Java program.
The underlying algorithm is called bubble sort. The name comes from the bubble-like way the elements move through the array. The algorithm is correct because the following invariant property holds:
Lemma: At the end of round r, n >= r >= 0, the largest n - r elements have reached their destination positions.
Proof: The proof goes by induction. For r == n, it is clearly true. So, assume the statement holds for a given r. Because the largest n - r elements are already positioned correctly and are not addressed any more, they are of no importance: we are just working on the subarray of r elements. Consider the maximum m of this subarray. Assume at the beginning of round r - 1 it is located in a[j]. When i == j, m is exchanged and put at position i + 1. Then i is increased and m is exchanged again. In this way it bubbles until it reaches position r - 1 at the end of round r - 1, proving the induction step. End.
Thus, at the end of round 1, the largest n - 1 elements have reached their destination positions. For the smallest element, this leaves no place to stay but position 0. Thus, by then all elements have reached their destination positions.
One can easily invent variants of bubble sort. In the following we present some of these. All of them have quadratic running time, but they may be faster by a constant factor.
void sort(int[] a, int n)
{
for (int r = 0; r < n; r++)
for (int i = r & 1; i + 1 < n; i += 2)
if (a[i] > a[i + 1])
{
int x = a[i];
a[i] = a[i + 1];
a[i + 1] = x;
}
}
Click here to see the above
piece of code integrated in a working Java program.
The number of operations is about the same as with bubble sort. The correctness is not obvious. It can be proven most easily by arguing that any sequence consisting of only zeroes and ones gets sorted correctly. Then applying the so-called "zero-one sorting lemma" gives the result. This proof can be found in most text books on parallel algorithms.
Applying odd-even transposition sort the order in which the operations in a round are performed does not matter, which means that on a computer with more than one processor these operations can be executed in parallel. It also means that it is very easy to restructure the inner loop so that it can be executed faster on a modern processor with deep pipelines and some kind of parallelism in the processor itself.
void sort(int[] a, int n)
{
for (int r = n - 1; r > 0; r--)
{
int x = 0;
for (int i = 1; i <= r; i++)
if (a[i] > a[x])
x = i;
int y = a[x];
a[x] = a[r];
a[r] = y;
}
}
Click here to see the above
piece of code integrated in a working Java program.
The number of comparisons is the same as before, but the maximum number of assignments is now reduced to n^2 / 2 + O(n). For randomly permuted elements, the number of assignments is even much smaller (it is bounded by O(n * log n)). This means that in practice the time consumption is determined by the comparisons. This is better than with the basic algorithm, because there most comparisons result in an exchange, even for random inputs.
In an implementation it is handy to first determine the maximum element and placing it in the last position. This element then can be used as a sentinel: having the largest element in place implies that there is no possibility to run out off the array. Thereby, it is not necessary to test whether the indices are still valid. After the maximum element has been swapped to position n - 1 of a[], we can be sure that a[n - 2] <= a[n - 1], and therefore we can skip round n - 2.
void sort(int[] a, int n)
{
int x = 0;
for (int i = 1; i < n; i++)
if (a[i] > a[x])
x = i;
int y = a[x];
a[x] = a[n - 1];
a[n - 1] = y;
for (int r = n - 3; r >= 0; r--)
{
x = a[r];
int i = r;
while (x > a[i + 1])
{
a[i] = a[i + 1];
i++;
}
a[i] = x;
}
}
Click here to see the above
piece of code integrated in a working Java program.
If the array is sorted in the wrong order, the new number must be compared with all existing numbers. In that case the number of comparisons and assignments is about n^2 / 2. For a random input, the new number only goes half-way on average. So, for random inputs the expected number of comparisons and assignments is about n^2 / 4. So, the number of passes through a loop tends to be only half as large as for selection sort. This makes that in practice the algorithm runs almost twice as fast. Of course the number of comparisons of elements can easily be reduced to O(n * log n), determining the position of the new number with binary search. However, this complicates the algorithm and some loop condition must be tested anyway, so this will be profitable only when comparing keys is more expensive than comparing integers.
static int[] merge(int[] a, int[] b)
{
int i, j, k, n = a.length + b.length;
int[] c = new int[n];
for (i = j = k = 0; k < n; k++)
{
if (a[i] <= b[j])
{
c[k] = a[i];
i++;
}
else
{
c[k] = b[j];
j++;
}
}
return c;
}
One has to be careful with the end of the arrays a[] and b[]. In the
current routine we may run beyond it. A handy idea is to add a
sentinel: an element with key infinity at the end of each of
them: this will prevent going beyond the ends because all elements in
the array have smaller key values, and thus these will be written
first. As an alternative, one can check again and again whether i <
a.length and j < b.length. Once this is no longer true, one should copy
the possible remainder of a[] or b[] to c[].
Merging takes linear time: the algorithm writes one element for every comparison performed. The last comparison can actually be saved, because as soon as there is only one non-sentinel left, it can be written away without comparison. The operation would be perfect if we would not need the extra space. Theoretically this is not so beautiful, and practically it means that we need memory access to twice as many data. These data must be brought in the cache which takes some extra time. It is quite likely that this time for bringing data in cache dominates the total costs and thus this may take almost twice as long as a comparable routine not requiring additional space. If the merge procedure is called many times, the allocation and deallocation may be expensive too.
static void merge(int[] a, int b[], int l, int m, int h)
{
int i = l, j = m, k = l;
while (i < m && j < h)
if (a[i] <= a[j])
b[k++] = a[i++];
else
b[k++] = a[j++];
while (i < m)
b[k++] = a[i++];
while (j < h)
b[k++] = a[j++];
for (i = l; i < h; i++)
a[i] = b[i];
}
static void sort(int[] a, int n)
{
int[] b = new int[n]; // scratch space
for (int d = 1; d < n; d *= 2)
{
for (int l = 0; l < n; l += 2 * d)
if (l + 2 * d <= n)
merge(a, b, l, l + d, l + 2 * d);
else if (l + d < n)
merge(a, b, l, l + d, n);
}
}
static void merge(int[] a, int b[], int l, int m, int h)
{
int i = l, j = m, k = l;
while (i < m && j < h)
if (a[i] <= a[j])
b[k++] = a[i++];
else
b[k++] = a[j++];
while (i < m)
b[k++] = a[i++];
while (j < h)
b[k++] = a[j++];
}
static void sort(int[] a, int n)
{
int[] b = new int[n], c; // scratch space and dummy
int r = 0;
for (int d = 1; d < n; d *= 2)
{
r++;
for (int l = 0; l < n; l += 2 * d)
if (l + 2 * d <= n)
merge(a, b, l, l + d, l + 2 * d);
else if (l + d < n)
merge(a, b, l, l + d, n);
else
for (int i = l; i < n; i++)
b[i] = a[i];
c = a; a = b; b = c; // swap a and b
}
if ((r & 1) == 1) // odd number of rounds
for (int i = 0; i < n; i++)
b[i] = a[i];
}
Click here to see the above piece of
code integrated in a working Java program.
The running time in practice depends on two factors: the number of operations performed and the amount of data that must be brought into the cache. The first is closely related to the number of comparisons made, the second is linear in the number of rounds performed. Most likely, it will be profitable to reduce the number of rounds even if this requires extra comparisons (within reasonable limits).
One idea is to start by sorting runs of some well chosen length by an asymptotically slower procedure that is faster for small numbers. Maybe one can find an optimal method for sorting all sequences of 8 elements. Then one saves three passes through the loop. If n = 1024, this means 7 passes instead of 10. In general, this idea reduces the number of rounds by a constant. If the initial sorting is implemented in an optimal way, this does not increase the number of comparisons.
With one comparison we can find the maximum of two elements. With two comparisons we can find the minimum of three elements:
int minimum(int a, int b, int c)
{
if (a <= b)
// b is not the minimum
{
if (a <= c)
return a;
return c;
}
// a is not the minimum
if (b <= c)
return b;
return c;
}
This can be used to merge three arrays into one: compare the current
elements of the arrays and write away the smallest. Using this, we
can repeatedly multiply the length of the sorted sequences by three
and thus perform the whole sorting in log_3 n rounds. As log_3 n =
log_2 n * ln 2 / ln 3 = 0.63 * log_2 n, this reduces the number of
rounds by a constant factor. The number of comparisons has increased
from about n * log_2 n to 2 * n * log_3 n, which is larger by a factor
2 * ln 2 / ln 3 = 1.26. This is so little, that it will certainly be
profitable to perform this 3-way merge. One can push further
and apply 4-way or even multi-way merge. In the latter case, the
header elements of the arrays to be merged are inserted in a priority
queue and one repeatedly calls deleteMin().
void sort(int[] a, int n)
// sort n elements standing in positions 0, ..., n - 1 of a[]
{
if (n > 1)
{
int i;
int a_smaller[n], a_equal[n], a_larger[n];
int n_smaller = 0, n_equal = 0, n_larger = 0;
int s = a[randomly generated number x, 0 <= x < n];
// Split the set of elements in three subsets using s
for (i = 0; i < n; i++)
if (a[i] < s)
a_smaller[n_smaller++] = a[i];
else if (a[i] == s)
a_equal[n_equal++] = a[i];
else
a_larger[n_larger++] = a[i];
// Solve two recursive subproblems
sort(a_smaller, n_smaller);
sort(a_larger, n_larger);
// Combine the results
for (i = 0; i; < n_smaller; i++)
a[i] = a_smaller[i];
for (i = 0; i < n_equal; i++)
a[i + n_smaller] = a_equal[i];
for (i = 0; i < n_larger; i++)
a[i + n_smaller + n_equal] = a_larger[i];
}
}
The worst case is that the rank of s lies close to 0 or n. In that case, only a few elements are split off. If this happens again and again, we get a recursion tree of depth Omega(n), and the total running time becomes Theta(n^2).
What happens really? Clearly we can not hope to choose medians every time. But, 50% of the selected s are good, in the sense that n/4 < rank(x) < 3/4 n. This probability is independent of n. Every time we are hitting a good splitter, the problem size is reduced by at least a factor 3 / 4. So, the sequence of recursions in which an element x is involved certainly terminates after log_{4/3} n good splitters have been selected from the set to which it belongs. Even the splitters which are not good give some reduction of the problem size, so the expected number of recursive levels in which x is involved is bounded by 2 * log_{4/3} n. Thus, the expected time spent on comparing and rearranging x is O(log n). Because of the so-called "linearity of expectation", the expected time for the whole algorithm equals the sum of the expected times spent for each element. Thus, the expected running time of the quick-sort algorithm is bounded by O(n * log n). A more careful analysis shows that the probability to need substantially more than the expected time is very small.
In practice the most common strategy of this kind is to take "middle of 3" or "middle of 5". It is a good idea to adapt the value of k to n: the larger n, the more one can invest on selecting a good splitter.
Even more so than for merge sort, it is better to no perform the recursion until we have reached a subproblem of size 1: for very small subsets it is rather likely that the division is very uneven. It is much better to apply bubble-sort for all sets with size smaller than some constant m. The optimal choice for m depends on the details of the hardware, but it will not be large. 10 might be reasonable. One can also switch to merge sort: because merge sort is efficient itself, this may be done already for quite large subproblems.
In the case of quick sort a simple modification makes the algorithm in-situ and even saves on the number of elements that are copied. The idea is not to use the additional arrays a_smaller[], a_equal[] and a_larger[], but to swap the elements in the array a[] itself. This can be realized by starting from both sides of the array, searching for elements which stand at the wrong side. If all elements have different values, the splitting may be implemented as shown in the following sorting routine:
void sort(int[] a, int l, int h, Random random)
// sort h - l + 1 elements standing in positions l, ..., h of a[]
{
if (h > l) // At least two elements
{
int low = l;
int hgh = h;
int s = a[random(l, h)]; // choose random element
// Splitting
while (low < hgh)
{
while (low <= h && a[low] <= s)
low++;
while (hgh >= l && a[hgh] > s)
hgh--;
if (low < hgh) // swap a[low] and a[hgh]
{
int x = a[low]; a[low] = a[hgh]; a[hgh] = x;
low++;
hgh--;
}
}
// Recursion
sort(a, l, hgh, random);
sort(a, low, h, random);
}
The additional tests, low <= h and hgh >= l, are necessary, otherwise
we might run out of the interval if s happens to be the smallest or
largest value in this subarray.
After the splitting, low > hgh and the following holds:
As we have seen, the fourth solution is easy to implement if we use one array for each subset to create, but this is a waste. Even now we would like to perform the sorting in-situ. This is possible, though not as easily as before. It is doubtful whether in practice this is better than the second solution. The idea is to maintain 5 regions instead of 3. There are 4 variables, eql, low, hgh, eqh. At all times eql <= low <= hgh <= eqh. The following properties are maintained invariant:
As long as low < hgh, out of a situation in which the four invariants hold, a new such situation with larger low and/or smaller hgh can be established as follows:
while (low <= h && a[low] <= s)
{
if (a[low] == s)
{
int x = a[low]; a[low] = a[eql]; a[eql] = x;
eql++;
}
low++;
}
while (hgh >= l && a[hgh] >= s)
{
if (a[hgh] == s)
{
int x = a[hgh]; a[hgh] = a[eqr]; a[eqlr = x;
eqr--;
}
hgh--;
}
if (low < hgh)
{
int x = a[low]; a[low] = a[hgh]; a[hgh] = x;
low++;
hgh--;
}
Nevertheless, one might wonder whether O(n * log n) is optimal. For n = 2, one can sort with one comparison. For n = 3, one can find the smallest element with two comparisons, and one more comparison sorts the other two. For n = 4 it already becomes a nice puzzle to just use 5 comparisons. A convenient notation helps: sorting algorithms can concisely be presented by a set of arrows as in the following:
a b a b c a b c d
1 <-> <-> <->
2 <-----> <->
3 <-> <----->
4 <----->
5 <->
The above shows how to sort 2, 3, 4 numbers with 1, 3, 5 comparisons.
Every x <--> y denotes the two elements to compare. If x > x, then the
elements are swapped. So, for any input sequence, at the bottom we
should find the elements in sorted order, the smallest on the left.
The notation with arrows can easily be converted into an algorithm:
void swap(int[] a, int i, int j) {
int z = a[i]; a[i] = a[j]; a[j] = z; }
void sortTwo(int[] a) {
if (a[0] > a[1]) swap(a, 0, 1); }
void sortThree(int[] a) {
if (a[0] > a[1]) swap(a, 0, 1);
if (a[0] > a[2]) swap(a, 0, 2);
if (a[1] > a[2]) swap(a, 1, 2); }
void sortFour(int[] a) {
if (a[0] > a[1]) swap(a, 0, 1);
if (a[2] > a[3]) swap(a, 2, 3);
if (a[0] > a[2]) swap(a, 0, 2);
if (a[1] > a[3]) swap(a, 1, 3);
if (a[1] > a[2]) swap(a, 1, 2); }
sortFour() is correct because a[0] holds the minimum after Step 3,
because a[3] holds the maximum after Step 4, and the two middle
elements are sorted in Step 5.
In this case the second approach is more successful: we prove that sorting requires Omega(n * log n) comparisons. For n numbers, there are n! arrangements. Our sorting algorithm must somehow decide which of them corresponds to the current input. We assume that the algorithm is only making comparisons, and is not performing operations on the values. By making a comparison, we can exclude some of the possible arrangements: comparing x = a[i] with y = a[j] either excludes all arrangements with x < y or all arrangements with x > y. So, every comparison gives a certain reduction of the number of possible arrangements. Only once we have reduced the number of possible arrangements to a single one, we may stop and output this single remaining arrangement as the sorted order.
Drawing the set of all possible arrangements at the top and then indicating how, by making comparisons, each of the n! possibilities is singled out gives a tree: In the root represents the set of cardinality n!, the n! leaves the sets of cardinality 1, and the branching in each node is given by the comparisons. This tree is called a decision tree.
Just like the decision tree, a sorting algorithm is a recipe on how to perform comparisons until only one possible arrangement remains. Actually there is a one-one correspondence between them. Algorithms can readily be turned into trees (though it would be no fun to actually do this for n > 5) and trees can be turned into algorithms with one if-statement for every internal node of the tree. For example, corresponding to the above decision tree we have the following algorithm:
void sortFour(int A, int B, int C, int D)
{
if (A < B)
if (C < D)
if (A < C)
if (B < C)
printf("The order is ABCD\n");
else
if (B < D)
printf("The order is ACBD\n");
else
printf("The order is ACDB\n");
else
...
else
if (A < D)
...
else
...
else
if (C < D)
if (B < C)
...
else
...
else
if (B < D)
...
else
...
In the case of sorting we are quite happy: the one-one correspondence between comparison-based sorting algorithms and decision trees, makes it much easier to abstract from the details of the algorithms. In the case of sorting it suffices to prove that there is no decision tree of depth smaller than x. Because we are working in a world of binary comparisons, the tree is binary (in the worst case that all keys are different). A decision tree for sorting n elements must have at least n! leaves. We know that a binary tree of depth k has at most 2^k trees. So, for t_n, the minimum number of comparisons for sorting n numbers, we have the following condition
t_n >= round_up(log_2(n!)).
For small values of n, we can use this formula to find t_2 = 1, t_3 = 3, t_4 = 5, t_5 = 7, t_6 = 10. More in general we get:
Theorem: Sorting n numbers requires Omega(n * log n) comparisons.
Proof: n! = Prod_{i = 1}^n i > Prod_{i = n/2}^n i > (n/2)^{n/2}. Thus. log_2 n! > n/2 * (log_2 n - 1) = Omega(n * log n). Stirlings formula can be used to obtain a more accurate estimate: n! ~= (n/e)^n, thus log_2 n! ~= n * log_2(n / e) = n * (log_2 n - log_2 e) ~= n * (log_2 n - 1.44). End.
So, except for constant factors we now know how many comparisons are needed for sorting n numbers. At first one may believe that in principle one does not need more than round_up(log_2(n!)) comparisons, because a decision tree can always be constructed as follows: at any node choose the comparison that divides the set of remaining possibilities as evenly as possible in two. For all small values of n, this idea leads to trees with the minimal depth of round_up(log_2(n!)). However, for n = 12 this does not work: up(log_2 n!) = 29, and it has been shown that 30 comparisons are necessary (and sufficient) for sorting 12 numbers.
First we consider the special case M == 2. So, all values in a[] are 0 or 1. This is an important special case: for example, the zeroes can stand for personal records of men, the ones for records of women. How should we sort? Quite easy: create two buckets of sufficient size, throw all zeroes in one bucket and all ones in the other. Finally copy all elements back to a[], first the zeroes, then the ones:
void sort(int[] a, int n)
// All values in a[] are 0 or 1.
{
int[] b0 = new int[n];
int[] b1 = new int[n];
int c0 = 0, c1 = 0;
// Throw ones and zeroes in different buckets
for (int i = 0; i < n; i++)
if (a[i] == 0)
{
b0[c0] = a[i];
c0++;
}
else
{
b1[c1] = a[i];
c1++;
}
// Write all elements back to a[]
for (int i = 0; i < c0; i++)
a[i] = b0[i];
for (int i = 0; i < c1; i++)
a[i + c0] = b1[i];
}
Click here to see this piece of code
integrated in a working Java program. This is certainly a very fast
sorting routine. For n = 10^6 it is 10 times faster then quick sort!
In the above example and in the following ones, there is a possibly confusing double usage of a[i]. In a comparison like "a[i] == 0", a[i] denotes the key value. In an assignment like "b0[c0] = a[i]", it stands for the object. Only because we are here sorting integers these are the same. In this case the sorting can be simplified even further: it satisfies to count the number c0 of zeroes in a[], and then we can set the first c0 positions to 0 and the remaining ones to 1. But this is cheating: for the sake of simplicity we are considering here how to sort integers, but always we have applications in mind in which these are keys from larger objects.
Suppose now that M is a small constant. How do we extend the above idea? Still simple: create M buckets and throw the elements with value i in bucket i; finally write them back to a[] bucket by bucket. Now it becomes handy to use the key values as indices instead of using a chain of if-statements. So, this is an application of direct addressing.
void sort(int[] a, int n, int M)
// All values in a[] are in {0, ..., M - 1}
{
int[][] b = new int[M][n];
int[] c = new int[M];
for (int j = 0; j < M; j++)
c[j] = 0;
// Throw elements in buckets according to values
for (int i = 0; i < n; i++)
{
int j = a[i]; // key used as index
b[j][c[j]] = a[i];
c[j]++;
}
// Write all elements back to a[]
int s = 0;
for (int j = 0; j < M; j++)
{
for (int i = 0; i < c[j]; i++)
a[i + s] = b[j][i];
s += c[j];
}
}
Click here to see this piece of
code integrated in a working Java program. For M == 2, the performance
is as good as before, but for larger M it deteriorates.
What is worse, is that the algorithm needs n + M + M * n memory. This is acceptable only for very small M. An algorithm is said to be efficient if its consumption of a resource (time or memory) exceeds that of the best solution known by at most a constant factor. In this formal sense the above algorithm is not memory-efficient for M = omega(1), that is, for non-constant M.
The problem of excessive memory usage can be overcome at the expense of some more operations. If we first count the number n_j of elements with value j, 0 <= j < M, then b[j][] can be defined to be of length n_j. Because sum_j n_j = n, this is memory-efficient even for larger M. It is even more efficient to arrange all these buckets after each other in a single array b[]. This leads to the following efficient implementation:
void sort(int[] a, int n, int M)
{
int b[] = new int[n];
int c[] = new int[M];
// Count the number of occurrencies of each value
for (int i = 0; i < M; i++)
c[i] = 0;
for (int i = 0; i < n; i++)
c[a[i]]++;
// Determine the starting points of the buckets
int sum = 0;
for (int i = 1; i < M; i++)
{
int dum = c[i];
c[i] = sum;
sum += dum;
}
// Throw elements in buckets according to values
for (int i = 0; i < n; i++)
{
b[c[a[i]]] = a[i];
c[a[i]]++;
}
// Write all elements back to a[]
for (int i = 0; i < n; i++)
a[i] = b[i];
}
Click here to see this piece of
code integrated in a working Java program. For M == 2, the performance
is even somewhat better than the above versions: saving memory often
goes hand-in-hand with saving time. In C we can save the final copying
from a[] to b[] by passing a[] by reference and instead of the loop
making the assignment *a = b.
The above sorting algorithm is known as bucket sort. Clearly its time and memory consumption is bounded by O(n + M). For all M = O(n) this is memory-efficient. Not a single comparison between elements of a[] is made. Bucket sort does not contradict the proven lower bound, because we very explicitly use the integer-nature of the keys: using direct addressing gives us something like the power of a n-way comparison in constant time.
A disadvantage of bucket sort is that there are quite a lot of operations. More costly is that the memory access in the loop in which the elements are allocated to their buckets is very unstructured. As long as the cache is large enough to hold one cache line for each bucket, this is no big deal, but for large M this causes n cache faults. Therefore, for large M, bucket sort is hardly faster than an optimized version of quick sort.
The correctness is less obvious. It essentially requires that the sorting in phase 2 preserves the order of elements with the same first digit. More generally, a sorting algorithm which does not change the relative order of elements with the same key is called stable. The given implementation of bucket sort is stable and is therefore suitable as sorting method in phase 2 of radix sort. However, some optimized bucket sort versions are not stable and could not be used. This is not an exceptional situation. Also merge sort and quick sort can be implemented in a stable way, but there are also non-stable implementations. For example, the given in-situ implementation of quick sort is not stable.
Lemma: Radix sort is correct.
Proof A sorting algorithm is correct if and only if for an arbitrary input and an arbitrary pair of numbers z and z' with z < z' the number z are correctly arranged in the array after the sorting. That is, if z stands in position i and z' in position i', we must have i < i'. Let z = x * m + y and z' = x' * m + y'. z < z' implies that (x, y) < (x', y') in the lexicographical ordering. There are two cases to distinguish:
The main disadvantage of bucket sort is that for large M the loop in which the elements are allocated to their buckets causes n cache misses. For the extremely important special case that M == n, this is quite unfortunate. However, if the main memory can accommodate n integers, then on any reasonable system the cache can accommodate sqrt(n) cache lines. So, when applying a two-level radix sort for the special case that M == n, we may assume that m = sqrt(M) cache lines can be accommodated, which implies that the handing-out operations have reasonable cache behavior. In addition this idea reduces the memory consumption from 3 * n to 2 * n + sqrt(n). Comparing the performance of the given bucket sort program with that of the radix sort program, shows that for large n a two-level radix sort is several times faster than bucket sort, clearly showing that the sheer number of performed operations is no longer what determines the time consumption. This also becomes clear when comparing bucket sort and quick sort. For large n, due to its O(n * log n) time consumption, quick sort performs considerably more operations. Nevertheless, it is about equally fast as bucket sort.
By a subkey we mean one of the keys constituting the entire lexicographical key. If the subkeys lie in a finite range, it may appear natural to use bucket sort again and again. For example, when sorting a set of words, one may start by throwing words in 26 buckets according to their first letters, than splitting the buckets by looking at the second letters, etc. However, one should be aware that bucket sort costs O(M) even if there are very few elements to sort. So, as soon as there are fewer than M elements, another sorting algorithm should be used. Otherwise, when sorting words with five levels of bucket sort, we would create 26^5 buckets, most of them not getting a single word.
Radix sort is different: sorting n numbers up to M - 1 for M = m^2, we do not create m^2 buckets at the second level, but only n. At the second (and deeper) level all elements are distributed over one set of buckets again. The advantage is the small number of operations, the disadvantage is that the sorting does not involve ever fewer elements.
while (i < m && j < h)
if (a[i] <= a[j])
b[k++] = a[i++];
else
b[k++] = a[j++];
The comparison of a[i] and a[j] is essential, but we do not really
need two tests in the condition of the while-loop.
if (h > l)
if (h < l + c)
// Run insertion sort on <= c elements
else
// Run quick sort on > c elements
Plot the resulting times for c = 1, 2, ..., 20. What is the
optimal choice for c? How much faster is this best variant than
the original routine?
A priority queue ADT is a set on which at least the following two operations are supported:
It is also possible to maintain the array sorted. In that case the insertions are expensive, and the deletemins are O(1). The advantage of this is that it allows to implement a findmin in O(1).
A much better idea is to use a balanced search tree. In that case, deletemin can be performed in logarithmic time, the same as required for inserts (and all other operations). Maintaining an extra pointer to the smallest element (using 2-3 trees this is always the leftmost leaf), the findmin operation can even be performed in O(1) time.
A heap is a tree for which any node v has a key that is smaller (or equal) than the keys of all its children (if any).The above property will be referred to as the heap property. It clearly implies that the smallest key must stand in the root of the tree, and that the second smallest element is one of its children. Thus, findmin can be done in O(1): just return the key of the root.
A deletemin is slightly harder. It is not hard to remove the root, but we should write another element instead of it, so that afterwards the tree is again a heap. But, this is not too hard either. If the root r of a heap is deleted, it may be replaced by the smallest of its children. In this way the heap property is preserved at the level of the root. Recursively deleting the roots of the heaps at the lower levels gives a correct deletion. When reaching a leaf node, the recursion stops, deleting the whole node. This process can be viewed as a free place, a hole, moving downwards until exiting the tree. Therefore, this is also called deletion by performing a percolate down. In pseudo-code the deletemin looks as follows:
void percolateDown(Node v)
{
if (v has children) // v is not a leaf
{
determine the child w of v with the smallest key;
v.key = w.key; // maybe even other data to copy
percolateDown(w);
}
else
remove v;
}
int deleteMin(Node r)
{
int x = r.key;
percolateDown(r);
return x;
}
An insert is similar. The new node v can in principle be attached to any node w. If the key of v is smaller than the key of w, then the heap property is restored by exchanging v and w, but possibly v may have to bubble up even further. At a certain level, possibly at the root, v will have found a place in which it is no longer violating the heap property and we are done. This operation in which the inserted element is bubbling upwards through the tree is most commonly called a percolate up.
void percolateUp(Node w)
{
if (w has a parent v) // w is not the root
if (v.key > w.key)
{
int x = v.key; v.key = w.key; w.key = x;
percolateUp(v);
}
}
int insert(int x)
{
create a new node w;
w.key = x;
attach w to an appropriate node v;
percolateUp(w);
}
Percolate up and down are symmetric, but there are also important differences: when percolating up, the key value needs to be compared with only one other value at each level. In a percolate down it must to be compared with the minimum of the children which must be determined first. So, the cost of an insert is O(depth_of_tree), while the cost of a deletemin is O(depth_of_tree * degree_of_nodes). Furthermore, while percolating up, we move along the unique path leading towards the root. When percolating down, it is not a priori known which path will be followed. Another important difference is that a deletemin always goes the whole way until reaching a leaf, while an insert stops as soon as the new key has reached a level where it does not conflict with the key of its parent.
From the above, we have seen that one has a lot of freedom for doing things. This will be exploited to come with a very simple and very efficient implementation. The tree will always be kept perfectly balanced: that is, it will always be a binary tree with all levels completely filled, except for possibly the lowest.
This means, that if we are adding a node, performing insert, we must insert it at the next free place in the lowest level. If the last level is just full, we must create a new level, inserting the element as the left child of the leftmost node in the bottom level.
A deletemin cannot be performed as before, because then we cannot control where the hole is going. Therefore, we are modifying the routine. The last position in the bottom level is freed, possibly cancelling the whole level. The key v of this node is temptatively placed in the root, and then it percolates down by exchanges with the smaller child. The whole deletemin now looks like
void percolateDown(Node v)
{
if (v has children) // v is not a leaf
{
determine the child w of v with the smallest key;
if (w.key < v.key)
{
int x = v.key; v.key = w.key; w.key = x;
percolateDown(w);
}
}
}
int deleteMin(Node r)
{
int x = r.key;
let v be the rightmost node at the bottom level of the tree;
r.key = v.key; // maybe even other data to copy
remove v;
percolateDown(r);
return x;
}
Lemma: The deletemin procedure is correct: it removes the entry with minimum key value; preserves the heap property and returns the minimum key value.
Proof: Because we may assume the heap property was given before the u operation, the root is the entry with minimum key. This value is overwritten and returned. It remains to check that the heap property is preserved. A crucial observation is that the it might only be disturbed along the processed path. A formal correctness proof goes by some kind of induction. The assumption is that at any time, the current node v is the only node of the tree in which the heap property may possibly be violated. Once this is proven, the correctness follows, because when the process terminates, the heap property is assured in v because either v.key <= w.key, w being the node with minimum key value among the children of v, or v is a leaf, for which the heap condition is void. Initially the tree is unchanged except for the root, the node at which percolateDown starts. So, assume the hypothesis holds at the beginning of some step. Then if v.key < w.key, the values of v and w are exchanged. After this, key.v < key.w and because w was the node with minimum key among the children of v, it is even true that after swapping key.v <= key.w' for all other children of v. So, the heap property has been established in v. The only node whose key has changed is w, the node which is considered in the next round. End.
Using a perfect binary tree, a heap with n entries has depth round_down(log n), so both operations can be performed in O(log n) time. In a really efficient implementation we do not perform exchanges but keep the element for which the position in the heap still has to be determined in an additional memory position and shift the elements on the path simply one level up or down. Doing this, the number of assignments is reduced from 3 * length_of_path to 1 * length_of_path + 2.Another observation that is essential for very efficient implementations in practice is that a perfect binary tree can very well be maintained in an array, avoiding all pointers. The idea is to number the nodes level by level from left to right, starting with the root which gets number 0. In that case, for any node with index i, the leftchild has index 2 * i + 1 and the rightchild has index 2 * i + 2. This allows to access the children of a node by a simple computation, which requires two clock cycles (maybe even one because often additions and multiplications can be executed in parallel), which is certainly not more than the cost for fetching the address of leftchild. At the same time it gives a considerable reduction of the memory consumption, saving n pointers. If we start with index 1 for the root, then the left child of node i has index 2 * i and the right child 2 * i + 1, saving half of the additions. This indexing idea even works for d-ary heaps which are based on perfect d-ary trees. A perfect tree which is maintained in an implicit way in an array without any pointers is called an implicit tree.
When performing a deletemin, the element that is put tentatively in the root will typically have a rather large key because before it was a leaf. This is not always so: it is possible that many nodes with small keys stand deep in the tree, but in general this will not be the case. Therefore, typically this element will be percolated down rather far before the process stops: even in practice the cost of a deletemin is proportional to log n.
The situation for insertions is different: better. In practice it turns out that insertions go very fast. Much faster than O(log n). The reason is simple, a precise analysis is quite hard. Of course an analysis requires an assumption: practice cannot be analyzed. So, we assume that the keys are sampled uniformly from an interval, say they are reals in [0, 1]. Let us try to estimate the expected number of calls to the routine percolateUp under this assumption.
Consider the case that we are only performing inserts and no deletemin operations. Randomly and uniformly select a key. It is essential that the previous nodes were sampled from the same probability distribution. The node is only moving up k levels, if it has the smallest key in its subtree of depth k - 1. The lowest level of this tree may have been empty except for the node itself, but all the other levels are full. So, only if the node is the smallest among 2^k or more nodes (also counting the node itself) it is moving up k levels. This means that the expected distance it is moving upwards can be estimated as follows: the probability that the node is moving up exactly k levels is at most 1 / 2^k for all k. Denoting the upwards movement by the random variable X, we get
Exp[X] <= sum_{k > 0} k / 2^k = 1 * 1/2 + 2 * 1/4 + 3 * 1/8 + ... = 2.Here Exp[X] denotes the expected value of X, which is defined by
Exp[X] = sum_{all possible values i of X} i * Prob[X == i].
The above analysis is not entirely correct: the keys in the lowest levels of the heap are not entirely uniformly distributed. The fact that they are standing there implies that they are somewhat larger than average. However, this dependency is very weak (because there are so few elements at the higher levels of the tree), and the analysis is correct up to a small correction. The computed constant is not that important. The important point to remember is that the expected time for an insert is O(1).
sum_{i = 1}^n log i > sum_{i = n / 2}^n log i > sum_{i = n / 2}^n log (n / 2) = n/2 * (log n - 1) = Omega(n * log n).
Can we hope to do better? Yes. The fact that the expected cost for an insertion is O(1) hints, but not more than that, that we may hope to do it in O(n) time. Notice the fundamental difference with a search-tree structure: because the elements of a search tree with n elements can be output in sorted order in O(n) time (by running an inorder traversal), the Omega(n * log n) lower bound on sorting implies that the construction of any search tree, balanced or not, takes Omega(n * log n) time. For heaps there is no such fundamental obstacle against an efficient construction: the elements are only very weakly sorted.
A first idea is to randomize the input and then perform n inserts. This overcomes the problem that the elements may stand in the wrong order: with high probability (meaning that the probability of failure is bounded by O(n^-alpha) for some constant alpha > 0) the whole sequence of insertions costs only O(n) time.
However, this bound can also be established deterministically by a rather simple algorithm. The idea is that we do not maintain a heap at all times (this is not necessary as we are not going to do deletemins during the buildheap). We simply create a perfect binary with n nodes, and then heapify it. That is, we are going to plow through it until everything is ok.
How to proceed? Our strategy must in any case guarantee that the smallest element appears in the root of the tree. This seems to imply that the root must (also) be considered at the end, to guarantee that it is correct. From this observation one might come with the idea to work level by level bottom up, guaranteeing that after processing level i, all subtrees with a root at level i are heaps. Let us consider the following situation: we have two heaps of depth d - 1, and one extra node with a key x, which connects the two heaps:
Now, for the whole algorithm we start at level 1 (the leafs being at level 0) and proceed up to level log n. In pseudo-code this gives the following algorithm:
void heapify(Heap h, int n)
{
// The nodes at level 0 constitute heaps of depth 0
for (l = 1; l <= round_down(log_2 n); l++)
for (each node v of h at level l) do
percolateDown(v); // Now v is the root of a heap of depth l
// Now all nodes at level l are the root of a heap of depth l
// Now the root is the root of a heap of depth round_down(log_2 n)
}
The correctness immediately follows from the easily checkable claims
(invariants) written as comments within the program, which hold because
the above observation about the effect of percolating down.
How about the time consumption? At level l we are processing fewer than n / 2^l nodes, and each operation takes O(l) time. Let c be the constant so that the time for processing a node at level l is bounded by c * l, then the total time consumption can be estimated as follows:
sum_{l = 1}^log n c * l * n / 2^l < c * n * sum_{l >= 1} l / 2^l = 2 * c * n. = O(n).
The given algorithm can easily be generalized for any kind of heaps, but for heaps in which all nodes have the same degree implemented in arrays, it can be written even simpler. Assume an array a[] of length n should be turned into a binary heap, then we can do the following:
void percolateDown(int i)
{
int j = i;
int k = (i << 1) + 1;
if (k < n && a[k] < a[j])
j = k;
k++;
if (k < n && a[k] < a[j])
j = k;
if (j != i)
{
k = a[j]; a[j] = a[i]; a[i] = k;
percolateDown(j);
}
}
void buildHeap()
{
for (int i = (n >> 1) - 1; i >= 0; i--)
percolateDown(i);
}
Click here to see the
above piece of code integrated in a working Java program.
Practically there are reasons to choose d to be a power of two: in that case the array implementation requires only bit shifts for the computation of the location of the parent and the children (and no divisions which might be more expensive). For d-heaps mapping the nodes to consecutive numbers in a way so that the indices of the children can be computed easily is the same as before: start with the root, and number on level by level. Giving number 0 to the root, the children of a node with index i are the nodes with indices d * i + 1, ..., d * i + d.
We prove that this is correct. Denote by f_d(k, i) the index in a perfect d-ary tree at position i of level k (the root being at position (0, 0)). We should prove that the children of node with index f_d(k, i) have indices d * f_d(k, i) + 1, ..., d * f_d(k, i) + d. This can be shown by first analyzing the relation between the indices of the leftmost nodes. f_d(k + 1, 0) = sum_{l = 0}^k d^l = d * sum_{l = 0}^{k - 1} d^l + 1 = d * f_d(k, 0) + 1. Now consider the other nodes. Node f_d(k, i) has index f_d(k, 0) + i and its children have indices f_d(k + 1, 0) + d * i, ..., f_d(k + 1, 0) + d * i + d - 1. Substituting f_d(k + 1, 0) = d * f_d(k, 0) + 1 the result follows.
Choosing a tree of degree d reduces the depth of the tree from log_2 n to log_d n. Thus, a deletemin now takes O(d * log_d n). This is more than before. On the other hand, the insert has become cheaper: it only takes O(log_d n). Practically this is not such an interesting improvement as even degree 2 gives us expected time O(1), but theoretically it might be. For example, if we take d = log n, then the cost for the inserts has been reduced to O(log n / loglog n), which is asymptotically faster.
A more important reason, just as for d-ary search trees, is that every access of a node means a cache or page fault. If the tree is shallow, then the number of these accesses is reduced, which in practice will imply a reduction of the time to perform the operations. The right choice of d depends on the type of application. As long as all data fit in the main memory a good choice might be d = 4: this reduces the depth of the tree by a factor two at the expense of few extra operations.
If we consider an application in which the data do not fit into the main memory, then most accesses imply a page fault. In that case the tree should be kept as flat as possible by taking d very large. In that case a good idea is to take d = sqrt(n), assuring that the whole tree has depth 2. More generally, for any d = n^eps, for some constant eps > 0, the depth is constant, assuring that inserts can be performed in constant time.
A problem with such large d is, of course, that when percolating downwards, the minimum has to be selected out of d elements which becomes rather costly: a deletemin takes O(n^eps), which is not good. A solution is to maintain the children of a node not in an array or list, but in a priority queue. Keeping these priority queues up-to-date is not trivial, but clearly any percolate-step, both up and down, can be performed in O(log d) time when using conventional binary heaps for these priority queues of size d. The time for both inserts and deletemins is then bounded by log_d n * O(log_2 d) = O(log_d n * log_2 d) = O(log n). This is the same as before, and due to the more complicated structure this will actually be slower if the data set is small. However, if we have a problem whose size exceeds that of the main memory by a factor 100, then with this approach everything can be organized with at most 2 page faults per operation, whereas otherwise we would need 6 or 7.
Lemma: A binomial tree of depth d has 2^d nodes.
Proof: The proof goes by induction. The lemma is ok for d = 0 because 2^0 = 1. This is the basis. So, assume the lemma is ok, for all depths up to d - 1. Then, the tree of depth d has 1 + sum_{i = 0}^{d - 1} 2^i = 2^d nodes, because sum_{i = 0}^{d - 1} 2^i = 2^d - 1 (this might again be proven by induction). End.
There is an alternative definition of binomial trees, which gives rise to the same structures:
Using the second definition of binomial trees and the binary addition, it is easy to merge a binomial forest with n_1 nodes with a binomial forest with n_2 nodes: starting with the smallest trees in each forest, if there are two or three trees of the same depth d, two of them are linked to one tree of depth d + 1. The number of these operations is bounded by the length of the binary expansions of n_1 and n_2 and is thus bounded by O(log (n_1 + n_2)). In the literature the operation of combining two data structures to a single one is mostly called melding rather than merging.
As an example we consider n_1 = 22 = 10110 and n_2 = 10111. There is a single BNT_0, which remains unchanged. The two BNT_1 are linked and give a BNT_2. Now there are three BNT_2, two of which are linked and give a BNT_3. This is the sole BNT_3 and it survives. Finally, the two BNT_4 are merged to one BNT_5.
Each node of the forest is used for storing one entry. Each tree is organized as a heap (here we encounter an example of heaps with a non-uniform structure), but there is no condition on how the keys are distributed over the trees. As a result the smallest element may stand in any of the trees. We give an example: a priority queue with 29 entries can be realized as a binomial forest with 29 nodes, binomial trees of size 16, 8, 4 and 1, each tree being a heap:
How to perform the operations? Findmin is easy, it can be performed in O(log n) time, by determining the minimum value of the keys stored in the roots of all of the at most log n trees in the forest.
The other operations are build on the merge operation, so let us first consider how a merge can be performed efficiently. We already know how to merge two binomial forests into one new binomial forest. The only open question is how to assure that all resulting trees have the heap property afterwards. However, this is trivial: when joining two BNT_d to one BNT_{d + 1}, the idea is to always hook the tree with the larger root to the tree with the smaller root. In this way the heap property holds for the root of the new tree and because the remaining structure is unchanged it holds for all nodes. Thus, each of these join operations can be performed in O(1) time and thus two forest with n_1 and n_2 nodes, respectively, can be merged in O(log (n_1 + n_2)) time.
For a deletemin, we look to the at most log n roots of the binomial trees. The minimum is the minimum of these roots. This minimum element is removed. If this is the root of a BNT_d, removing the root results in a bunch of new trees, a BNT_0, a BNT_1, ..., a BNT_{d - 1}. Each of these trees is a heap itself, and thus they constitute a binomial Forest with 2^d - 1 nodes in their own right. This forest is merged with the rest of the binomial forest to obtain the resulting binomial forest. Finding the correct root and removing it takes O(log n), merging the two forests also takes O(log n). So, even deletemin can be performed in O(log n) time.
Inserts can be performed also directly without relying on the merge routine. The idea is that we look for the smallest index j so that b_j = 0 (referring to the binary expansion of n). Then we know that the trees T_0, ..., T_{j - 1} + the new element have to be merged into one new tree with 2^j nodes which is replacing the smaller trees in the binomial forest. One can do this in several ways. One possible way is to add the new element as a new root and then to percolate down. This is correct but not very efficient: at the top level, we have j comparisons, at the next level up to j - 1, and so on. The whole operation takes O(j^2) operations. So, it appears better to nevertheless stick more closely to the merging pattern: first we add 1 to T_0 and create a new tree T'_1, which is added to T_1. This gives a new tree T'_2, which is added to T_2. Etc. This is simple and requires only O(j) time.
We analyze the expected time for an insert. The time for inserting an element to a binomial forest with n elements depends on the binary expansion of the number n. Let this be (b_d, ..., b_3, b_2, b_1, b_0), and let z_n be the smallest number so that b_{z_n} = 0. Then the insert involves only the trees with BNT_i with i < z_n. Thus, such an insert can be performed with z_n comparisons. If we have just an arbitrary tree, whose number of elements is uniformly distributed, then with 50% probability b_j = 0 for any j. Thus, the expected number of comparisons for an insert can be given by
T_exp <= sum_{j >= 0} j / 2^j = 2.This shows that the expected time for insertion is constant, O(1).
The above result, assumes nothing about the distribution of the keys it only assumes that we have no a priori knowledge about the size n of the binomial forest. Therefore, this is already much stronger than the earlier result for binary heaps, where we needed that the keys were uniformly distributed, a fact which lies outside the influence of the programmer: for certain inputs it is good, for others it is not.
For this analysis we need some theory. First we consider a problem from daily life with the same spirit. Consider a person who wants to keep track of his expenses. There are numerous smaller and larger expenses, so this requires quite a considerable bookkeeping and it is likely that some expenses are forgotten. Assume this person has a regular income of 1000 units per month and he/she had 1270 units on his account at the beginning of the year and 490 units at the end of the year. Then without knowing how much was spent when and where, we can immediately conclude that the sum of all expenses during the year has been 12 * 1000 + 1270 - 490 = 12780.
When trying to determine cost functions in computer science quite often one has to perform "clever bookkeeping". Costs are allocated to operations that did not really cause them in order to later not have to care when they arise. This idea will prove effective here too. It is quite common to make this bookkeeping explicit using tokens. A token is a cost unit. It costs one unit to deposit a token. This can be viewed as a prepayment for future operations: to consume a token, that is executing an operation which earlier has been deposited a token for, is namely considered to be free. More precisely, the amortized time is given by
t_amortized = t_actual + number_of_deposited_tokens - number_of_consumed_tokens.The total amount of deposited tokens gives the potential of the data structure. The amortized time equals the actual time plus the change of the potential.
If the amortized time as defined above for operations on a data structure of size n can be bounded to t(n) and p(n) gives an upper bound on the potential, then any sequence of m operations takes at most m * t(n) + p(n) time, which means that for m >= p(n) / t(n) the average time per operation for any sequence of operations is bounded by (m * t(n) + p(n)) / m <= 2 * t(n) = O(t(n)). So, the intuitive notion of amortized time as being the average time over a sufficiently long sequence of operations asymptotically coincides with the formalized definition in terms of a potential.
Lemma: The amortized time for performing insertions on a binomial tree is constant. For a structure with n nodes, the used potential has maximum value O(log n).
Proof: Above we noticed that an insert on a forest of size n costs O(1 + z_n) time. Thus, the real cost of an operation is proportional to 1 + z_n. z_n gives the the number of ones in the binary expression of n which have to be turned into zeroes. This number can be as large as log n. However, for any number n, there is exactly one position in which n has a zero where n + 1 has a one. So, it does not cost much to deposit one token for every newly created one. Furthermore, if we start with one token for every one, we can assume that at all times, there is a token available for each one in the binary expression of n. Said otherwise, as potential we use that number of ones in the binary expression of n. For the amortized time this gives t_amortized = 1 + z_n + 1 - z_n = 2. End.
Corollary: Any sequence of m >= log n consecutive insertions to a binomial forest with n elements can be performed in O(m) time.
A decreaseKey is performed by updating the key and percolating the element up. Both on d-ary heaps and on binomial forests this takes logarithmic time. An increaseKey can be performed similarly: update the key and percolate down. On d-ary heaps this takes O(d * log_d n), which is ok for any constant d, but on binomial forests a percolate down may be expensive. A delete can be performed in several ways. Lazy deletions are cheap and have little impact on the time for the other operations. Lazy deletion in combination with insert offers a very simple way to realize increase and decrease key on any priority queue. On heaps one can replace the node to delete by the last leaf which subsequently is percolated down, analogously to doing a deletemin.
This problem is called the selection problem. Selection can be solved trivially by sorting and then picking the right element. If the numbers do not allow a fast linear-time sorting method, then this is not a good method because of the O(n * log n) time consumption.
Another simple method is to perform k rounds of bubble-sort (in a variant that makes the smallest numbers reach their positions first) and then take the number at position k. This costs O(k * n), which is good for small k, but worse than O(n * log n) for large k.
The same time is obtained when generalizing the idea to find the minimum: a pocket with the currently smallest k numbers is maintained. Each new number to consider is compared with these k and if it is smaller then the largest one, this largest one is kicked out.
This last idea can be improved. Suppose we have a priority queue with an insert and deletemax operation requiring at most O(log k) time per operation. Then we can first enter the first k elements into the priority queue. We perform a deletemax and store the value in a variable x. The further elements are compared with x. If such a value y < x, then y is inserted in the priority queue and the result of deletemax is assigned to x. In this way any element of the array is processed at a cost of at most O(log k) time for a total time consumption of O(n * log k). In pseudocode the algorithm looks as follows:
int select(int k, int[] a, int n)
{
create and empty priority queue pq;
for (i = 0; i < k; i++)
pq.insert(a[i]);
x = pq.deletemax();
for (i = k; i < n; i++)
if (a[i] < x)
{
pq.insert(a[i]);
x = pq.deletemax();
}
return x;
}
The above algorithm is correct, because it can easily be shown that at
all times x equals the element with rank k out of the subset of
elements a[j] with j < i.
A simple alternative is to apply buildHeap to all n elements, building a heap in O(n) time, and then perform k deletemins. This has the same worst-case complexity. This idea can be improved further: there is no need to really delete the elements from the heap. The algorithm can also search from the root for the k smallest elements. The crucial observation is that an element with rank i + 1 is a neighbor of one of the elements with ranks i or less. From among these elements, it is the one with smallest value. Thus, the problem of finding the k smallest elements can be solved by a special kind of search through the tree:
int select(int k, int[] a, int n)
{
if (k > n)
return a special error value;
construct a heap hp containing all elements in a[];
create an empty priority queue pq;
insert the key of the root of hp into pq;
for (i = 1; i < k; i++)
{
x = pq.deleteMin(); /* x has rank i */
let u be the node of hp of which x is the key value;
if (u.leftchild != null)
pq.insert(u.leftchild.key);
if (u.lrightchild != null)
pq.insert(u.rightchild.key);
}
return pq.deleteMin();
This routine takes O(n + k * log k) time because the size of pq is
increased by at most one in each iteration, and its size thus remains
bounded by k. So, for all k = O(n / log n), this selection algorithm
runs in linear time. This final idea is interesting because it uses
heaps and priority queues in a non-trivial way. For k = o(n / log n)
it is even quite efficient, because the the time is dominated by the
time to build the heap, which goes fast even in practice. However,
there are deterministic and randomized selection algorithms running in
O(n) time for all k. Particularly the randomized algorithms are simpler
and more efficient.
An advantage of heap sorting is that the answer is produced gradually. If we use a heap and call buildHeap first, then after O(n + k * log n) time the first k elements are available. Using quick sort, no output is available until shortly before the end. Of the discussed sorting methods, only bucket sort shares this property, but its efficiency is far worse. So, if the sorted sequence is the input to a follow-up routine, then applying heap sort allows to pipeline the two operations.
One might have thought that constructing a heap is closely related to sorting, but the above shows that it is actually much less. It is much closer related to finding the minimum.
In connection with graphs there are many notions. Some of them are important already now. A neighbor of a node u is any node v for which there is an edge (u, v). In this case one also says that v is adjacent to u. The nodes u and v are called the endpoints of the edge. A path from u to w is a sequence of edges (u, v_1), (v_1, v_2), ..., (v_{k - 1}, w) connecting u with w. A path is called simple if all nodes on the path are different. A cycle is a simple path which has the same begin- and endpoint. In the example this means that u == w. A graph without cycles is called acyclic. The length of a path is the number of used edges, in the example the path has length k. The distance from u to v is the length of the shortest path between u and v. A graph is called connected if for any pair of nodes u, v there is a path running from u to v. For directed graphs one mostly speaks of strongly connected if we take the direction of the edges into account for these paths, otherwise one speaks of weakly connected. A connected component is a maximum subset of the nodes that is connected. For directed graphs one speaks of strongly (weakly) connected components, often these are also called strong (weak) components. An undirected graph is said to be a tree if it is acyclic and connected. A directed graph is a tree when it is a tree considered as an undirected graph. Mostly it is required in addition that all edges are directed towards or away from a specific node called root. A forest is a set of trees. A spanning tree is a tree that reaches all nodes of a connected graph. A spanning forest is a set of trees, one for each connected component of a graph. The degree of a node u is the number of edges incident on u. For directed graphs it is customary to distinguish indegree, the number of edges leading to u, and outdegree, the number of edges starting in u. The degree of the graph is the maximum of the degrees of all nodes. If an edge (u, v) occurs more than once (that is, the "set" of edges is actually a multiset), then we will say that there are multiple edges. A self-loop is an edge (u, u). A graph without multiple edges and self-loops is said to be simple. It is common to assume that graphs are simple. The number of nodes of a graph is often denoted with n, the number of edges with m. Simple graphs have at most n * (n - 1) edges if they are directed and at most n * (n - 1) / 2 edges if they are undirected. If m = O(n) (as in road graphs), then the graph is said to be sparse. If m is much larger than linear the graph is called dense. This notion is not precisely defined, sometimes it means m = omega(n), sometimes it means m = Omega(n^2).
A common way to represent graphs in the computer is by creating for each node a set of all its neighbors. This is a particularly good idea for sparse graphs. In general one might use linear lists for these sets. This is called the adjacency list representation of the graph. Such an implementation requires O(n + m) memory. The strong feature of using adjacency lists, is that it becomes very easy to determine all neighbors of a node. Using linked lists, it is also trivial to add or delete a particular edge. If all edges have about the same degree with a maximum g, then it is much more convenient to represent the graph as an array of arrays of length g, storing for each node its degree as well. Even for lists of variable length an array may be used, marking for each node where in the array its neighbors are beginning. This implementation requires n + m + O(1) memory. The major disadvantage of adjacency lists is that it takes time proportional to the degree of node u to test whether an edge (u, v) exists or not.
For dense graphs the most appropriate representation is with an adjacency matrix: an n x n boolean matrix, where a 1 at position (u, v) indicates that there is an edge (u, v). If the graph is undirected, the adjacency matrix is symmetric. This representation requires n^2 bits of memory, independently of the number of edges m. Thus, for really large graphs this representation cannot be used. Furthermore, any operation that requires the access of all neighbors of a node takes O(n) time. On the other hand, this is the representation to use if frequently the existence of single edges must be tested. Furthermore, for rather dense graphs of moderate sizes storing n^2 bits may require less memory than storing n + m ints.
In case no specific processing order of the nodes of a graph is required, the following non-recursive procedure might be the simplest and most efficient:
void traversal(int[] number)
{
for (v = 0; v < n; v++)
number[v] = -1; // Flag value
counter = -1;
Bag b = new Bag(n);
for (r = 0; r < n; r++)
if (number[r] == -1) // r is the root of a new component
{
counter++;
number[r] = counter;
b.add(r);
while (b.notEmpty())
{
v = b.remove();
for (each neighbor w of v)
if (number[w] == -1) // w has not been visited yet
{
counter++;
number[w] = counter;
b.add(w);
}
}
}
}
Clearly every node is numbered only once, because only nodes that were not visited before are assigned a value. Because counter is increased just before numbering a node and never decreased, all numbers are different. All nodes are pushed exactly once, and upon popping all their outgoing edges are considered. This means that the algorithm has running time O(n + m).
The produced tree will be directed towards the root: for each node the next node on the path to the root will be computed. The roots are characterized by the fact that they are pointing to themselves. Because for each node only one value has to be stored, independently of the degree of the tree, the whole tree can simply be maintained in an array:
void spanningForest(int[] parent)
{
for (v = 0; v < n; v++)
parent[v] = -1; // Flag value
Bag b = new Bag(n);
for (r = 0; r < n; r++)
if (parent[r] == -1) // r is the root of a new component
{
parent[r] = r; // r points to itself
b.add(r);
while (b.notEmpty())
{
v = b.remove();
for (each neighbor w of v)
if (parent[w] == -1) // w has not been visited yet
{
parent[w] = v; // w is reached from v
b.add(w);
}
}
}
}
The time consumption is clearly the same as before: O(n + m). However, for a formal correctness proof, the above argument should be refined. It should be shown that all nodes in a connected component belong to the same tree. So, assume there is a node w belonging to the same component as r, which is not reached during the traversal starting from r. Consider a path from r to w. Such a path exists because r and w belong to the same component. Let u be the last node on the path for which the parent-value has been set during the traversal starting from r, and let v be the next node on the path. u has been added to the bag upon setting its parent-value. Because the traversal only stops once the bag is empty, u also has been removed from it. Upon that occasion all neighbors of u, including v, are inspected and when they were not yet reached at that point of time, their parent-value is set. This contradicts the assumption that there is a node v on the path from r to w which has not been reached during the traversal starting from r.
void bfsTraversal(int[] number)
{
for (v = 0; v < n; v++)
number[v] = -1; // Flag value
counter = -1;
Queue q = new Queue(n);
for (r = 0; r < n; r++)
if (number[r] == -1) // r is the root of a new component
{
counter++;
number[r] = counter;
q.enqueue(r);
while (q.notEmpty())
{
v = q.dequeue();
for (each neighbor w of v)
if (number[w] == -1) // w has not been visited yet
{
counter++;
number[w] = counter;
q.enqueue(w);
}
}
}
}
The algorithm is identical with the previous one, except that we use a queue instead of a bag. Depending on how the bag is implemented, this may have a strong impact on the produced numbering, but not on the complexity: all nodes are enqueued and dequeued exactly once and upon dequeuing all their outgoing edges are considered. Thus, the complexity is O(n + m).
void dfsTraversal(int[] number)
{
for (v = 0; v < n; v++)
number[v] = -1; // Flag value
counter = -1;
Stack s = new Stack(m);
for (r = 0; r < n; r++)
if (number[r] == -1) // r is the root of a new component
{
s.push(r);
while (s.notEmpty())
{
v = s.pop();
if (number[v] == -1) // v has not been visited yet
{
counter++;
number[v] = counter;
for (each neighbor w of v)
s.push(w);
}
}
}
}
As before this algorithm numbers each node only once with different numbers from 0 to n - 1. Here the nodes are marked as visited only after they are popped from the stack. This implies that nodes may be pushed on the stack many times, and that the size of the stack may become as large as m (even 2 * m for undirected graphs). This is not such a large disadvantage: the graph itself also requires so much storage. If one nevertheless wants to prevent this, one should either apply the recursive algorithm, where the command "s.push(w)" is replaced by a conditional recursive call to the DFS procedure, or one should push instead of just w also v, and the index of w within the adjacency list of v. When popping this entry w, the next neighbor of v should be written instead of it.
void dfs(int v, int* preCounter, int* postCounter,
int[] preNumber, int[] postNumber)
{
(*preCounter)++;
preNumber[v] = *preCounter; // preorder number
for (each neighbor w of v)
if (preNumber[w] == -1)
dfs(w, &preCounter, &postCounter, preNumber, postNumber);
(*postCounter)++;
postNumber[v] = *postCounter; // postorder number
}
void dfsTraversal(int* preNumber, int* postNumber)
{
for (v = 0; v < n; v++)
preNumber[v] = -1; // Flag value
preCounter = postCounter = -1;
for (r = 0; r < n; r++)
if (preNumber[r] == -1)
dfs(r, &preCounter, &postCounter, preNumber, postNumber);
}
Here we computed two types of numbers: preorder DFS numbers and postorder DFS numbers depending on whether they were assigned before or after the recursion. The preorder numbers are the same that were simply called "dfs numbers" before. The importance of computing both types of numbers will become clear soon.
The last code fragment is written in a C-like style. In Java where integer parameters cannot be passed by reference, one should either make the counters global (ugly but efficient), or pass it in some kind of object, for example as an object from the class "Integer". Click here to see all presented traversal algorithms integrated in a working Java program.
With respect to a given spanning tree, for which we have computed pre- and postorder numbers, edges can be classified in constant time per edge as follows:
This classification does not distinguish tree edges from forward tree edges. If this matters, they can be distinguished testing whether u == parent[v] or not, where parent[v] gives the node from which v was reached. The classification allows to test in O(n + m) time whether a given spanning tree of a graph G is a DFS tree of G: compute the pre- and postorder numbers and test whether there are forbidden cross edges.
For undirected graphs this gives a particularly easy test for cycles:
boolean isAcyclic()
{
int[] parent = new int[n];
for (v = 0; v < n; v++)
parent[v] = -1; // Flag value
Bag b = new Bag(n);
for (r = 0; r < n; r++)
if (parent[r] == -1) // r is the root of a new component
{
counter++;
parent[r] = r;
b.add(r);
while (b.notEmpty())
{
v = b.remove();
for (each neighbor w of v)
if (parent[w] == -1) // w has not been visited yet
{
parent[w] = v;
b.add(w);
}
else
if (w != parent[v])
return false;
}
}
return true;
}
The above algorithm is a minor modification of the spanning-forest algorithm and has running time O(n + m). On undirected graphs one must be careful not to detect "cycles of length two": an edge (u, v) followed by the same edge in the other direction (v, u). In the algorithm these cases are singled out by the test "w != parent[v]". The applied method is simple and efficient, but one may nevertheless wonder whether it is necessary to use an additional array of n ints. One of the exercises deals with this, and the conclusion is that an array of n booleans is sufficient as well.
For directed graphs, the above algorithm might detect false cycles: reaching an earlier reached node does not need to imply that a directed cycle has been closed. On undirected graphs any graph-traversal algorithm can be used, but on directed graphs it appears to be essential to process the nodes in DFS order. Doing this, cycles are created only by backward tree edges, edges leading from a descendant v to an ancestor w. The ancestors of a node w are easy to recognize: these are exactly these nodes which already have been reached by the traversal, but for which the search is not yet completed. This can be characterized with two booleans per node. More formally, during the search any node may be in three possible states:
So, the idea can also be implemented conveniently with a three-valued array:
- 0.
- unreached
- 1.
- reached but not finished
- 2.
- finished
boolean hasCycle(int v, boolean[] status)
{
status[v] = 1;
for (each neighbor w of v)
if (status[w] == 0) // Unvisited node
{
if (hasCycle(w, reached, finished)) // Cycle in subtree
return true;
}
else
if (status[w] == 1) // w is an ancestor of v
return true;
status[v] = 2;
return false; // No cycles in any of the subtrees
}
boolean isAcyclic()
{
byte[] status = new byte[n];
for (v = 0; v < n; v++)
status[v] = 0;
for (r = 0; r < n; r++)
if (status[r] == 0) // Unvisited node
if (hasCycle(r, reached, finished)) // Cycle in subtree
return false;
return true; // No cycles in any of the subtrees
}
void connectedComponents(int[] component)
{
for (v = 0; v < n; v++)
component[v] = -1; // Flag value
Bag b = new Bag(n);
for (r = 0; r < n; r++)
if (component[r] == -1) // r is the root of a new component
{
component[r] = r;
b.add(r);
while (b.notEmpty())
{
v = b.remove();
for (each neighbor w of v)
if (component[w] == -1) // w has not been visited yet
{
component[w] = r;
b.add(w);
}
}
}
}
Hereafter, for all nodes v, component[v] gives the index of the "root"
of the component. In this case, this index equals the smallest index of
the nodes belonging to the component. The graph is connected if and
only if component[v] == 0 for all v. Using a counter which is increased
only when finding the root of a new component, the components can be
numbered consecutively, which saves memory if the sizes of the
connected components are to be stored in an int array size[]. Click here to see this algorithm integrated
in a working Java program.
Much more interesting and relevant is to determine the strong components. This problem is non-trivial: just running a graph traversal does not bring anything, reaching a node v from a node r does not mean that there is also a way back from v to r. At first it may even appear that finding the strong components is of an essentially harder nature than the problems discussed so far. This is not true: O(n + m) time is enough.
The algorithm consists of two rounds of DFS. For a graph G it proceeds as follows:
Lemma: The nodes reached by a search starting from a node r during the second DFS performed on G' precisely give the strong component to which r belongs.
Proof: Denote the strong component to which r belongs by S_r. By definition of strong components, for any node u in S_r there is a path from u to r. Thus, there is also a path from r to u in G'. This implies that all of S_r is reached during the search starting from r. Now consider a node v which is reached during the search starting from r. Apparently there is a path in G' from r to v, and thus there is a path from v to r in G. It remains to prove that there is a path from r to v in G. r is considered before v. This implies that r has larger postorder number than v. If we assume that during the first DFS v was reached before r, then we get a contradiction, because we know that there is a path from v to r, and thus the postorder number of v would be larger than that of r. So we may assume that during the first DFS r was reached before v. In that case r only gets a larger postorder number when v is a descendant from r in the DFS tree. That is, there must be a path from r to v.
The above idea immediately leads to an algorithm:
It remains to find an efficient implementation of the above. The idea is to maintain for each node its current indegree in a separate array. The original values in this array can be computed in O(m). Then in one pass through the array all nodes with indegree zero are detected and entered in a bag. The nodes are taken out of the bag in arbitrary order. When removing an edge (u, v) the indegree of v is reduced by 1. In this way the whole algorithm has running time O(n + m). A nice feature of the algorithm is that it is not necessary to test beforehand whether the graph is acyclic or not: if the bag is empty before all nodes are numbered, then we know there is a cycle. If this does not happen, we know that apparently the graph must have been acyclic.
boolean topologicalSort(int[] number)
{
int[] degree = new int[n];
for (v = 0; v < n; v++)
degree[v] = 0;
for (v = 0; v < n; v++)
for (each neighbor w of v)
degree[w]++;
Bag b = new Bag(n);
for (v = 0; v < n; v++)
if (degree[v] == 0)
b.add(v);
counter = 0;
while (b.notEmpty())
{
v = b.remove();
number[v] = counter;
counter++;
for (each neighbor w of v)
{
degree[w]--;
if (degree[w] == 0)
b.add(w);
}
}
return counter == n; // Graph is acyclic
}
Click here to see this algorithm integrated in a working Java program. The algorithm is efficient both considering time and memory consumption:
Lemma: For a graph G, the postorder numbers of the graph with inverted edges G' constitute a topological sorting of G.
Proof: Assume there is an edge (u, v) in G. This implies that in G' there is an edge (v, u). So, if during the DFS on G', v is reached before u, then u will be reached as a descendant, either directly or indirectly, from v and therefore u will get a smaller postorder number than v, as it should be. If u is reached before v, then u gets a smaller postorder number than v unless v is reached as a descendant from u. However, this would mean that there is a path in G' from u to v. Together with the edge (v, u), this means that u and v lie on a common cycle. That is, G', and hence G, is not acyclic, a contradiction.
This algorithm immediately leads to an O(n + m) algorithm even though inverting the graph is slightly technical if we are working with an array-based implementation of the adjacency lists. Furthermore, this doubles the memory requirement.
void unweightedSSSP(int s, int[] dist)
{
for (v = 0; v < n; v++)
dist[v] = INFINITY;
Queue q = new Queue(n);
dist[s] = 0;
q.enqueue(s);
while (q.notEmpty())
{
v = q.dequeue();
for (each neighbor w of v)
if (dist[w] == INFINITY) // w has not been visited yet
{
dist[w] = dist[v] + 1;
q.enqueue(w);
}
}
}
The algorithm is so that nodes v that are unreachable from s have
distance[v] = infinity, which appears to be reasonable. Click here to see it integrated in a working
Java program.
Here and in the following dist[u] denotes the distance values in the algorithm, while distance(u, v) denotes the real distance in the graph from node u to v. Because the length of the concatenation of two paths is the sum of the lengths of each path, the triangle inequality holds for all triples of nodes u, v and w:
distance(u, w) <= distance(u, v) + distance(v, w).
Using induction over the number of performed enqueue operations, it is easy to prove that at all times for all nodes dist[v] >= distance(s, v). Initially this is true. So, assume it is true after t enqueuing operations. If in step t + 1 we set dist[w] = dist[v] + 1, then we may assume that dist[v] >= distance(s, v), and thus, using the triangle inequality, dist[w] = dist[v] + 1 >= distance(s, v) + 1 = distance(s, v) + distance(v, w) >= distance(s, w). Thus, the computed values are not too small. It remains to prove that they are not too large. This is not as easy as one might think.
Lemma: The values dist[v] of the enqueued (dequeued) nodes is monotonically increasing.
Proof: We use induction over the number of performed enqueue (dequeue) operations. Initially the claim holds because any sequence of length at most 1 is monotonic. So, assume the claim holds for the first t operations. Then in operation t + 1 we are enqueuing a node w with value dist[w] = dist[v] + 1. Because v is the latest dequeued node, we may assume that dist[v] >= dist[u] for any earlier dequeued node u. But then dist[w] = dist[v] + 1 >= dist[u] + 1, the value of an arbitrary node on the queue. Here we essentially use that in a queue nodes that were enqueued earlier are also dequeued earlier (the FIFO order).
Theorem: At the end of the algorithm dist[v] = distance(s, v) for all v.
Proof: It remained to show that the values are not too large. The proof goes by contradiction. Assume that dist[w] > distance(s, w) for some w, and assume that w is the node lying closest to s which gets assigned too large a value dist[w]. Let u be the last node before w on a shortest path from s to w. So, distance(s, w) = distance(s, u) + 1. Because u lies closer to s than w, we may assume that u gets assigned the correct value: dist[u] = distance(s, u). Let v be the node which was responsible for enqueuing w. This implies dist[w] = dist[v] + 1. So, dist[v] = dist[w] - 1 > distance(s, w) - 1 = distance(s, u) = dist[u]. Thus, according to the previous lemma, node u will be dequeued before node v. Thus, w should have been enqueued by u instead of v, and we should have dist[w] = dist[u] + 1. This is a contradiction, which can be traced back to our assumption that there is a node w with dist[w] > distance(s, w).
In this kind of proofs it is very common to argue by contradiction, focusing on a supposed first occasion for which the algorithm makes a mistake. If there is no first mistake, then there is no mistake at all!
void weightedSSSP(int s, int[] dist)
{
for (v = 0; v < n; v++)
dist[v] = INFINITY;
PriorityQueue pq = new PriorityQueue(n, INFINITY);
dist[s] = 0;
pq.decreaseKey(s, dist[s]);
while (pq.notEmpty())
{
v = pq.deleteMin();
for (each neighbor w of v)
if (dist[w] > dist[v] + weight[v, w]) // shorter path
{
dist[w] = dist[v] + weight[v, w];
pq.decreaseKey(w, dist[w]);
}
}
}
In order to get a formulation without case distinctions, initially all nodes are inserted into the priority queue with infinite key value. This works correct even if some nodes may not be reachable from s: for these nodes the key value remains unchanged throughout the algorithm. As soon as all reachable elements are deleted from the priority queue, these are deleted and they are processed just as reachable nodes, but certainly this does not lead to improved distance values. So, the existence of unreachable nodes only causes some useless work.
At most n elements are enqueued and dequeued. At most m decreasekey operations are performed. It depends on the priority queue used how much time this takes. Using a binary heap, the construction can be performed in O(n) time and each decreasekey and deletemin in O(log n) time. So, with an ordinary heap Dijkstra's algorithm has running time O((n + m) * log n). Better priority queues (Fibonacci heaps) allow to perform the decreasekey operations in O(1) amortized time, reducing the running time to O(m + n * log n). Thus, for all m = Omega(n * log n), this is O(m), which is clearly optimal. Refined algorithms perform better for sparser graphs.
Click here to see the algorithm integrated in a working Java program. In this implementation the priority queue is realized in a primitive way using an array. This minimizes the memory consumption and allows to perform a decrease-key operation in constant time, but makes deletemins expensive: they take O(n) time. So, the running time of the whole algorithm is O(n^2 + m), which for all simple graphs is O(n^2). For dense graphs this is ok, but for all other graphs it is much better to use a priority queue with faster deletemins. In general, implementing a decrease-key operation may require an additional search tree to find the nodes, but because in this particular case all node indices lie between 0 and n - 1, it is sufficient to maintain an extra array pos[] of length n, pos[u] giving the position of node u in the priority queue (one more application of direct addressing).
The proof of correctness of the algorithm is similar to the proof of correctness for the unweighted case. First one shows that the nodes that are dequeued have increasing distances.
Lemma: The values dist[v] of the dequeued nodes is monotonically increasing.
Proof: We use induction over the number of performed dequeue operations. Initially the claim holds because any sequence of length at most 1 is monotonic. So, assume the claim holds for the first t operations. Let w be the node we are dequeuing in step t + 1. Let u be any node dequeued before. At the time u was enqueued using deletemin we must have had dist[u] <= dist[w]. So, consider possible updates to dist[w] after u was dequeued. Let v be the node which caused the latest update of dist[w]. In that case dist[w] = dist[v] + weight[v, w]. From our induction assumption it follows that dist[v] >= dist[u]. But then, dist[w] = dist[u] + weight[v, w] >= dist[u]. It is at this point that we essentially use that weight[v, w] >= 0. Otherwise this lemma cannot be proven!
Theorem: At the end of the algorithm dist[v] = distance(s, v) for all v.
Proof: As before it can be shown that the values of distance[] at all times give an upper bound on the distances: as long as they are infinity, this is clear, once they have a finite value, the value corresponds to the length of a path: there may be shorter paths, but this path can be used for sure. So, we may assume that dist[v] >= distance(s, v) for all nodes at all times.
Consider the node w lying closest to s, having smallest value of distance(s, w), which upon dequeuing has dist[w] > distance(s, w). If weight[v, w] = 0, and the shortest path from s to w runs through v, then we will nevertheless say that v lies closer to s than w. Let v be the last node before w on a shortest path from s to w. Thus, distance(s, w) = distance(s, v) + weight[v, w]. Because weight[v, w] >= 0, we have dist(s, v) <= dist(s, w), and thus v gets correct value, that is dist[v] = dist(s, v). So, dist[w] > distance(s, w) = distance(s, v) + weight[v, w] = dist[v] + weight[v, w] >= dist[v], and therefore, because of the previous lemma, v will be dequeued before w. But at that occasion the algorithm would have set dist[w] = dist[v] + weight[v, w] = distance(s, v) + weight[v, w] = distance(s, w), in contradiction with the assumption that distance[w] > dist[s, v].
In the following picture the action of Dijkstra's algorithm is illustrated. Edge weights are indicated along the edges, the current values of dist[] are indicated in the nodes. 99 stands for infinity. Nodes that have been removed from pq have final distance value. The (connected) subgraph with these nodes is marked.
void findShortestPathEdges(int[] dist, int[] parent)
{
for (all nodes w)
parent[w] = -1;
for (all edges (v, w))
if (dist[w] == dist[v] + weight[v, w])
parent[w] = v;
}
The routine is given for directed graphs. For undirected graphs (depending on the input format) it may be necessary to consider an edge (v, w) both for w and for v. In the current version parent[w] may be set several times if there are several paths of the same length from s to w. This may be prevented by performing the assignment only when parent[w] == w, but this does not make the routine faster.
This routine takes O(n + m) time independently of the type of graph, so these costs are negligible except for unweighted graphs. For weighted graphs it will certainly be more efficient to determine the edges by this separate procedure. But, even for unweighted graphs it may be profitable to perform a second pass over all edges: this reduces the amount of data that are handled in a single subroutine, and may therefore allow to hold a larger fraction of the graph in cache.
Afterwards every node has a unique predecessor, and the graph defined by parent[] is acyclic provided that there are no zero-cost cycles. So, the whole graph defined by parent[] constitutes a tree directed towards s, spanning all nodes reachable from s. In particular: independently of m, the set of all shortest paths has size n. Once parent[] has been computed, queries of the form "give the shortest path from v" can be solved using O(n) time and memory: start at v, push all edges on the path from v to s on a stack and print them while popping them.
It is not nice that the above algorithm does not handle zero-cost cycles correctly. This problem can most easily be overcome by an alternative algorithm which is only marginally slower than the given one: run a spanning-tree algorithm on the graph G' = (V, E'), where E' = {(v, w) in E| dist[w] == dist[v] + weight[v, w]}.
Give pseudocode realizing this operation. Distinguish four cases: an adjacency-matrix representation and a representation with adjacency lists, both for directed and undirected graphs. Indicate the time consumption for each of them. The time bounds should be given in terms of n, the number of nodes; m, the number of edges; d_u, the degree of u; d_v, the degree of v; and d_G, the current degree of the graph G.
For directed graphs it is hard to find the edges (w, v). But, choosing an appropriate graph representation requiring O(n + m) memory, a slightly more complicated algorithm can perform edge contraction efficiently: Point 1, 2 and 3 can be realized in O(d_v) time. Describe the necessary graph representation and the algorithm in full detail.
Point 4 is somewhat harder. Give a trivial realization requiring O(d_v * d_G) time. Organizing the adjacency structures in a suitable way, this can be improved to O(d_v * log(d_G)). Describe how this can be achieved. Instead of changing the organization of the adjacency structures, we can also use some additional data structure, so that we reasonably can expect to perform this operation in O(d_u + d_v) time. Describe how this can be achieved.
Now take the given heap implementation and complement it with a method decreaseKey(int i, int x) based on percolating up along the lines sketched in the text above. Replace the priority queue in the program testing Dijkstra's algorithm by this heap implementation.
Repeat the experiments for the same pairs of n and m values. Compare the times. Fit a suitable function through your results. Do the experiments conform with the theoretical analysis?
Actually the edges of T can be computed without explicitly constructing G'. Give an algorithm which, in addition to the memory required for storing G, essentially only requires memory for the array parent[].