A dictionary is an ADT supporting the operations find, insert and delete. Sometimes this data structure is also called a search tree. The best-known tree implementation of a dictionary guaranteeing O(log n) time per operation is the AVL tree, which is not considered here. Another well-known structure is the 2-3 tree. Here we consider its immediate generalization, the a-b trees. The analysis of their amortized behavior is particularly enlightening. Then we consider two less organized structures which nevertheless guarantee good amortized or expected behavior.

a-b Trees


a-b trees provide an interesting alternative to AVL trees. They are characterized by the following properties: Here a and b are parameters with a >= 2 and b >= 2 * a - 1. The depth of an a-b tree with n leaves is at least round_up(log_b n) and at most round_down(log_a n). If a > 2, then some special cases arise for trees with 2, ..., a - 1 leaves. Adding dummy elements makes life easier. In the following description we first consider the case a = 2, b = 3, these being the smallest values for which the idea works.

The internal nodes hold one or two keys guiding the searching process. In a binary node there is one key k, which is larger than or equal to the values in the left subtree and smaller than the values in the right subtree. In a ternary node there are two keys k_1 and k_2: k_1 is larger than or equal to the values in the left subtree and smaller than the values in the middle subtree; k_2 is larger than or equal to the values in the middle subtree and smaller than the values in the right subtree.

2-3 Tree


Searching is just as easy as in a binary search tree. Suppose we are looking for a value x. Then, in a binary node with a single key k, we perform
  if      (x <= k) go to the left subtree;
  else             go to the right subtree;
In a ternary node, with keys k_1 and k_2, we perform
  if      (x <= k_1) go to the left subtree;
  else if (x <= k_2) go to the middle subtree;
  else               go to the right subtree;
In this way we continue until we reach a leaf. There we test whether its key equals x and perform a suitable operation in reaction.

The construction and these more elaborate comparisons imply a certain overhead, but everything else becomes much simpler because of this. When implementing a dictionary ADT based on 2-3 trees, it is a good idea to have different classes for internal nodes and leaves, as they allow very different operations and have different fields as well.
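With separate classes for internal nodes and leaves, the search can be sketched as follows. The class names and fields here are illustrative assumptions, not the text's implementation:

```java
// Hypothetical sketch of searching in a 2-3 tree with separate
// classes for internal nodes and leaves.
abstract class Node23
{
  abstract boolean find(int x);
}

class Leaf23 extends Node23
{
  int key;
  Leaf23(int key) { this.key = key; }
  boolean find(int x) { return x == key; }
}

class Inner23 extends Node23
{
  int[]    keys;     // one key (binary node) or two keys (ternary node)
  Node23[] children; // two or three subtrees

  Inner23(int[] keys, Node23[] children)
  {
    this.keys = keys;
    this.children = children;
  }

  boolean find(int x)
  { // descend into the subtree belonging to the first guiding key >= x;
    // if all guiding keys are smaller than x, take the rightmost subtree
    int l = 0;
    while (l < keys.length && x > keys[l])
      l++;
    return children[l].find(x);
  }
}
```

Leaves answer the equality test directly; internal nodes only route the search, exactly as described above.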


For an insertion, we search where the new key should go. If it is not there, we create a new leaf with the appropriate key. If the internal node to which it should be attached had degree two so far, then everything is fine: the new leaf is attached, and we are done. Otherwise, this parent node (which was on the point of getting an illegal degree four) is split into two internal nodes of degree two each. The new internal node must be added to the internal node one level up. There we must check the degree and possibly split the node again. In this way we either find a node with degree two and stop, or ultimately split the root in two and add a new root.


Deletions can be performed by just marking deleted nodes and rebuilding the structure if it gets too polluted. However, in this case, there is a rather simple inverse of the insertion operation. First we look for the element to delete. If we have found it, then there are three cases to distinguish for deleting a child w from node v:

It is important to point out that a-b trees grow or shrink by adding or deleting the root. So, changes of height arise at the top, not at the bottom as in AVL trees.

It is essential that 3 + 1 = 2 + 2 and that 1 + 2 = 3. The first implies that, once the maximum degree of a node is exceeded, it can be split into two nodes with degree within the legal range; the second implies that when two nodes are fused, the resulting node cannot have too many children from the start. Thus, we could proceed similarly for a 3-5 tree, because 5 + 1 = 3 + 3 and 3 + 2 = 5. But a 3-4 tree would be clumsy, because 4 + 1 = 5 < 3 + 3. For this reason we have required for a-b trees that b >= 2 * a - 1, assuring that b + 1 >= 2 * a and that a + a - 1 <= b.

Insert and Delete on 2-3 Tree

Amortized Performance

2-3 trees have the problem that after splitting a node (as the result of an insertion) the new nodes have degree 2, which is the minimum. Therefore, it may happen that a subsequent deletion immediately leads to a fusion again. These structural changes may go all the way up to the root every time. For example, this happens if we have a 2-3 tree with n = 2^k leaves, for some k >= 1, which has only binary internal nodes. Deleting the largest element causes all the nodes on the rightmost path to fuse; subsequently reinserting this element splits them again. Even though all this takes only logarithmic time, it still means a considerable increase of the cost.

Expensive 2-3 Tree Operations

This does not sound very efficient. The AVL trees gave us (in addition to the time for searching) only O(1) time per operation for restoring the structure. For the 2-3 trees above, it is possible that alternating inserts and deletes result in restructuring the whole path up to the root again and again. We now show that an arbitrary sequence of n insertions and deletions into an arbitrary 2-5 tree with at most n leaves causes at most O(n) restructuring operations. This shows that, in an amortized sense, 2-5 trees share the good properties of AVL trees: O(1) time per operation (in addition to the time for searching).

The idea is simple and provides a very nice example of an amortized analysis. For a-b trees, we will call nodes with degree a or b critical. All other nodes are said to be non-critical. In the case of 2-5 trees, splitting a node happens when a child is added to a node which had degree 5 before. The result is two nodes of degree 3. These resulting nodes cannot be split or fused immediately again. So, here we have a situation where splitting a critical node results in two non-critical nodes. Likewise, when fusing, we originally must have had two critical nodes of degree 2 each (one of which has just lost a child), and the result is a non-critical node of degree 3. So, splitting and fusing can be viewed as an investment: we are achieving more than just maintaining the defining property. This is a typical situation in which an amortized analysis may be effective: a single operation may be expensive, but this can be viewed as an investment. The rest is a matter of good bookkeeping.

In our case we will deposit a token on every node whose degree is changed by the addition or removal of a child or when a child is stolen by a sibling. In particular this implies that all critical nodes carry tokens, because any node is non-critical upon creation. The tokens on a node can be viewed as a prepayment for its possible later restructuring. As an invariant we want to maintain that any critical node has a token. Because initially all internal nodes may be critical, we must be willing to make an initial investment of up to n - 1 tokens, one for each internal node.

Inserting a node may result in a sequence of splitting operations. The rules are such that only critical nodes get split, so the cost of this is covered by consuming the deposited tokens. The sequence of splittings ends either by creating a new root or by increasing the degree of a node. In either case we pay one token.

Deleting a node may result in a sequence of fusion operations. The rules are such that v and v' get fused only when neither of them had more than 2 children before the deletion. So, each of them was critical and had tokens. These tokens are consumed to cover the cost of the fusion. The sequence of fusions may end in two possible ways. Either the degree of the node v in which the fusion ends is decreased, or v steals a child from a sibling v'. In the first case a token is deposited on v, in the second case on v'.

Summarizing, we see that if we start in a situation with a token on each critical node, then

So, the time is proportional to the number of deposited tokens, which equals the number of critical nodes in the beginning plus one token for each operation. For any sequence of n operations we need at most 2 * n tokens in total: 2 tokens per operation.

Alternative Analysis

The given analysis using tokens is simple and works well in this case. However, it is not always possible to work with tokens, because they cannot be used in a flexible way. A token-based analysis only works if we can assure that every node addressed has tokens; it is not good enough to assure that there are tokens somewhere. The more general idea is to work with a potential, a non-negative function of the data structure. For a data structure with 0 elements the potential should be 0. In our case the potential could be the number of critical nodes. The amortized cost of an operation is defined to be
t_amortized = t_actual - potential_before + potential_after.

Lemma: If, for some data structure with n elements, t(n) gives an upper bound on the amortized time for an operation and p(n) an upper bound on the used potential function, then any sequence of m >= p(n) / t(n) operations takes at most 2 * m * t(n) time.

Proof: For any sequence of m operations, taking actual times t_actual(i), 1 <= i <= m, we have

    T_tot =  sum_{i = 1}^m t_actual(i)
          =  sum_{i = 1}^m (t_amortized(i) + pot_before(i) - pot_after(i))
          =  sum_{i = 1}^m  t_amortized(i) + potential(0) - potential(m)
          <= sum_{i = 1}^m  t_amortized(i) + potential(0)
          <= m * t(n) + p(n).
For m >= p(n) / t(n), p(n) <= m * t(n), and thus, for these m, T_tot <= 2 * m * t(n). End.

The lemma implies that the given definition of amortized time is closely related to the intuitive definition of amortized time as the average time per operation over a sufficiently long sequence of operations. It also expresses the value of m that must be taken in order to assure an asymptotically optimal claim: m must be Omega(p(n) / t(n)). When defining a potential, the main objective is that it gives a small value of t(n). But, in order to obtain the strongest performance claims, even the value of p(n) should be taken into account.

In the case of the 2-5 trees, the actual time of an operation can be equated to the number of nodes addressed. When inserting, a certain number h of nodes is split and somewhere one node is addressed which is not split. So, t_actual = h + 1. These h nodes were all critical and the splitting does not introduce new critical nodes. So, this reduces the potential by h. The node at the top may be made critical, so

t_amortized_insert = h + 1 - h + 1 = 2.
When deleting a node, along a path of length h nodes get fused. These 2 * h nodes were all critical before. Here we are using in an essential way that we first try to steal a child before fusing. At the top one node is addressed which is not fused, but which may become critical. This gives
t_amortized_delete = 2 * h + 1 - 2 * h + 1 = 2.

Here both approaches can be applied and both give O(1) amortized time. In general an argument with tokens is clearer and may have didactic advantages where it can be applied. On the other hand, analyzing the amortized time with the definition based on potentials is more mathematical and can also be applied to harder problems.

Insertion Sort

Balanced search trees are suitable for sorting: numbers are inserted successively, and finally they are output in order. This sorting strategy is called insertion sort. a-b trees are particularly suited for this purpose, because it is easy to keep the leaves connected in a doubly linked list. It is convenient that all leaves lie at the same level and that the tree is not growing at the bottom. This makes it possible to efficiently perform range queries, operations like "output all elements with keys between x and y". Outputting all elements in order is a special case of a range query.

Insertion sort has the interesting property that it is adaptive, that is, for easy instances it runs faster than for hard instances. An easy instance in this case means an instance which is already almost sorted. For a more precise statement we define the notion of an inversion. In a sequence of numbers S, an inversion is a pair of positions i < j with S_i > S_j. That is, S_i and S_j are wrongly arranged. In a sorted sequence there are no inversions, and reversing a sorted sequence results in one inversion for every pair of numbers, n * (n - 1) / 2 in total. Let generally I(S) denote the number of inversions in a sequence S. We will show that with a slightly modified search procedure a 2-3 tree can be used to perform insertion sort for a sequence S in O(n + n * log(I(S) / n)) time.
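As an illustration of the definition, the number of inversions I(S) can be computed by checking every pair. This is a hypothetical helper, not part of the text's algorithms; its O(n^2) time is fine for small sequences:

```java
// Count the inversions I(S) of a sequence: pairs i < j with s[i] > s[j].
class Inversions
{
  static int count(int[] s)
  {
    int inv = 0;
    for (int i = 0; i < s.length; i++)
      for (int j = i + 1; j < s.length; j++)
        if (s[i] > s[j]) // s[i] and s[j] are wrongly arranged
          inv++;
    return inv;
  }
}
```

For a sorted sequence this returns 0, and for a reversed sorted sequence of length n it returns n * (n - 1) / 2, matching the extremes mentioned above.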

There is no need to start the search for an element at the root: if we already have an idea where it should be inserted, then we can try to reduce the time for the searching by starting close to the expected target leaf. If this is successful, then we can get very good performance, because the amortized time for insertions and deletions is constant. In case we want to reduce the time for performing insertion sort on almost sorted sequences, it is natural to assume that the elements are supplied in increasing order. So, then we should start the search at the rightmost leaf. From there the search moves up until the new value is no longer smaller than all keys in the node and then moves down along the usual path.

In an a-b tree all internal nodes have degree at least a >= 2. So, a subtree of height h contains at least 2^h leaves. The search for a value x moves up h + 1 levels to a node v only if all the leaves in the right subtree of v have values larger than x. So, this accounts for at least 2^h inversions. The total time for all searches is given by O(n + sum_i h_i), where h_i + 1 is the number of levels the search for element S_i moves up. We know that

sum_i 2^{h_i} <= I(S).

So, we want to put an upper bound on sum_i h_i under this condition. Because 2^{h_i} strongly increases with h_i, the sum of the h_i is maximized when all h_i have the same value (this argument can be formalized using the Lagrange multiplier method for computing extremal values on a surface). In that case, for all i, 2^{h_i} <= I(S) / n, which implies that h_i <= log(I(S) / n), and thus sum_i h_i <= n * log(I(S) / n).

A practical disadvantage is that the tree now must have links in both directions. In a normal a-b tree this is not needed, because the way back can be pushed on a stack while searching forward: as splitting and fusion operations are performed only along the search path, this is all one needs to know. So, we need quite an elaborate data structure for solving a simple problem like sorting. On the other hand, this is a result which cannot easily be achieved in another way: most sorting methods (merge sort, quick sort, heap sort) are not adaptive at all. Bubble-sort is adaptive, but in a much weaker sense: bubble-sort is fast when each element stands close to its final position: if the maximum distance is d, then it can be implemented in O(n * d) time. In this case the number of inversions is bounded by O(d * n) and insertion sort takes O(n + n * log d), which is better for any non-constant d. Furthermore, bubble-sort is really bad if a single element has to move far.

Splay Trees

A splay tree is a very weakly balanced binary search tree, offering an alternative to AVL trees. The basic balancing operation of splay trees is the rotation just as for AVL trees. However, no balance information is maintained. The advantage is that one saves overhead, both for storing and for updating. So splay trees are self-adjusting: some kind of balancedness is established without enforcing it. Any single operation (insert, find or delete) on a splay tree with up to n nodes may cost O(n), but, performing m >= n operations, the amortized operation time is bounded by O(log n).

Self-adjustment typically implies that one cannot prevent a bad situation from building up, but once one is performing an exceptionally expensive operation, the structure gets improved. A well-known example of a self-adjusting structure is the union-find structure with path contraction: long paths cannot be excluded, but once one has to traverse such a long path, performing an expensive find operation, all the nodes on the path are hooked directly to the root node, implying that a subsequent find for any of the nodes on the path is much cheaper.

Simple Strategy

Splay trees may have any binary structure; in particular they may consist of a chain of n nodes. Because in this case accessing the deepest node takes linear time, the structure must be changed, because otherwise this operation might be repeated again and again. A simple idea is to turn the accessed element into the root of the tree by a sequence of single rotations. The rationale behind this is that if an element is accessed, it is likely that it will be accessed again. In practice access sequences are mostly not random: rather, a small subset of elements is accessed much more frequently than the others (as an example one can think of the names the police type into their database).
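A single rotation, which lifts a child above its parent, might be sketched as follows. The node class and method names are illustrative assumptions, not from the text; applied repeatedly along the search path, these rotations move the accessed node up to the root:

```java
// Illustrative binary-tree node; not part of the text's implementation.
class SNode
{
  int   key;
  SNode left, right;
  SNode(int key, SNode left, SNode right)
  {
    this.key = key;
    this.left = left;
    this.right = right;
  }
}

class Rotations
{
  // Right rotation: the left child x of y becomes the parent of y.
  static SNode rotateRight(SNode y)
  {
    SNode x = y.left;
    y.left  = x.right; // x's right subtree becomes y's left subtree
    x.right = y;
    return x;          // x is the new root of this subtree
  }

  // Mirror image: the right child x of y becomes the parent of y.
  static SNode rotateLeft(SNode y)
  {
    SNode x = y.right;
    y.right = x.left;
    x.left  = y;
    return x;
  }
}
```

Note that a rotation rearranges only two nodes and three subtree links, so each step costs O(1).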

Splay-Tree Rotation

Unfortunately, this strategy is too simple. If first the keys 0, 1, ..., n - 1 are inserted in increasing order, a tree of depth n - 1 will be the result: every new node is first inserted as a right child of the root, and then a single rotation turns it into the root. This sequence of insertions is fine: it takes O(n) time. However, now accessing the keys in increasing order is very expensive: the time of accessing key i is O(n - i), for a total of O(sum_{i = 1}^n i) = O(n^2). The reason is that after searching 0 the depth is still n - 1, and more generally after searching i the depth is n - 1 - i.

Expensive Splay-Tree Operations

Good Strategy

The above single rotations are apparently too simple to achieve any kind of self-adjustment. We would like expensive accesses to deep-lying nodes to reduce the depth of most nodes on the search path. Cheap accesses to shallow nodes need not lead to an improved structure: we cannot hope to always improve the structure; if the tree is perfectly balanced, any restructuring can only make it worse.

Now we distinguish three cases:

Notice that the zig and zig-zag cases are treated in the same way as in the simple strategy: in the zig case a single rotation is performed, in the zig-zag case a so-called double rotation, which has the same effect as performing two single rotations. Only in the zig-zig case is the performed operation different. At first glance it is not clear at all that this gives an improvement: it appears to be an unnecessarily unbalanced operation.

Splay Operations for Node X

The above strategy is called splaying. Without providing a proof we can see already now that splaying is much better than performing single rotations. As an example we consider the tree that is constructed by inserting the elements 0, 1, ..., n - 1, in order. This generates a tree of depth n - 1, a chain with only left children. Accessing these nodes now in the same order not only turns the accessed element into the root, but also reduces the depth of the tree by almost a factor of two with each access.

Let us state more precisely how the three dictionary operations, find, insert and delete are performed:

Splay-Tree Operations

Amortized Performance

Any of the operations, find, insert and delete, has cost proportional to the cost of the involved splaying operations (one for find and insert, two for delete). Thus, in order to put a bound on the amortized time of the splay-tree operations, it suffices to put a bound on the amortized time of the splaying operation. The cost of the splaying operation is proportional to the number of accessed nodes: accessing node x has cost proportional to the number of nodes on the path leading to x.

It is not easy to bound these costs in an amortized argument. Somehow we must define a potential function which does not increase too much (that is, not more than O(log n)) in any operation, and at the same time strongly decreases when accessing a deep-lying node. A first idea might be to take as potential for a tree T the function p(T) = sum_{u in T} depth(u), where depth(u) gives the depth of node u in T. However, this potential may increase too strongly: adding a new smallest or largest value pushes all existing nodes one level deeper. Thus, if there were n nodes, p(T_new) - p(T_old) = n, giving an amortized time of at least n.

An idea which at first glance works better is to take p(T) = sum_{u in T} log(depth(u) + 1). Even this potential may increase too strongly: if the tree T with n nodes is perfectly balanced, then there are (n + 1) / 2 nodes at depth d = log(n + 1) - 1. Adding a new smallest or largest element, all these elements are pushed one level down. So, just considering these elements, p(T_new) - p(T_old) >= (n + 1) / 2 * (log(d + 2) - log(d + 1)) = (n + 1) / 2 * log(1 + 1 / (d + 1)) ~= n / (2 * d).

A potential function that really works, is given by

p(T) = sum_{u in T} log(s(u)).
Here s(u) gives the size of the subtree rooted at u, counting u itself as well. Defining the rank of a node u as r(u) = log(s(u)) simplifies the formulation of p():
p(T) = sum_{u in T} r(u).
A very nice property of ranks based on the size of the subtree is that a rotation changes the ranks only for the nodes involved in the rotation. The depths of the nodes in the subtrees change, but their ranks remain the same. Clearly the rank of any node of a tree with n nodes is at most log n. So, p(T) <= n * log n.
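To make the definition concrete, here is a sketch (with a hypothetical node class, not the text's implementation) that computes p(T) by summing log_2 of the subtree sizes:

```java
// Hypothetical binary-tree node for illustrating the potential function.
class PNode
{
  int   key;
  PNode left, right;
  PNode(int key, PNode left, PNode right)
  {
    this.key = key;
    this.left = left;
    this.right = right;
  }
}

class Potential
{
  // s(u): size of the subtree rooted at u, counting u itself
  static int size(PNode t)
  {
    return t == null ? 0 : 1 + size(t.left) + size(t.right);
  }

  // p(T) = sum over all nodes u of r(u) = log_2(s(u))
  static double potential(PNode t)
  {
    if (t == null)
      return 0.0;
    return Math.log(size(t)) / Math.log(2)
         + potential(t.left) + potential(t.right);
  }
}
```

For a single node the potential is log_2(1) = 0; for a left chain of three nodes the subtree sizes are 3, 2, 1, giving log_2(3) + 1. This recomputation takes linear time, of course; in the analysis the potential is only a bookkeeping device and is never actually computed.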

Theorem: The amortized time to access any node of a tree with n nodes is bounded by 2 + 3 * log n = O(log n).

Proof (outline): Recall that the amortized time of an operation is given by the actual time plus the change of potential. The change of potential equals the potential after the operation minus the potential before the operation. In our case the total potential is the sum of the potentials of all nodes. Because the potential of the nodes changes only along the path from the root to the node x we are accessing, the amortized time can be written as

t_amortized = sum_{u in path to x} (1 + r_new(u) - r_old(u))
Here we account one cost unit for each accessed node.

We want to rewrite this sum as a sum over the operations: a zig-zig or a zig-zag operation is attributed a real cost of 2 cost units, a zig operation is attributed 1 cost unit, and handling the root stands for one cost unit as well. The amortized cost of any of these operations is equal to this cost plus the sum of the new potentials of all involved nodes minus the sum of their old potentials. Thus, by regrouping the sum giving t_amortized, we get

t_amortized = 1 + sum_{operation} amortized_cost_of_operation

The trick is that we will show that the amortized cost of a zig-zag or a zig-zig operation for any node x on the path can be bounded by 3 * (r_new(x) - r_old(x)) and that of a zig operation by 1 + 3 * (r_new(x) - r_old(x)). This estimate takes into account the cost for the other involved nodes and the changes of their potentials. Such an estimate is very effective in this case, because even though the splaying gets higher and higher in the tree, it is always the same node x we are working on. Thus, all but two terms of the sum cancel each other; only the negative contribution from the first summand and the positive contribution from the last summand remain. This gives the result of the theorem:

t_amortized = 1 + 1 + 3 * (r_after(x) - r_before(x)) <= 2 + 3 * log n.

Lemma: The amortized time for a zig operation at node x is bounded by 1 + 3 * (r_new(x) - r_old(x)).

Proof: The amortized time of a zig operation involving nodes x and y as in the picture illustrating the operations, can be written as

t_amortized_zig = 1 + r_new(x) + r_new(y) - r_old(x) - r_old(y).
Using that r_new(y) < r_old(y) gives
t_amortized_zig <= 1 + r_new(x) - r_old(x).
Because r_new(x) > r_old(x) we have r_new(x) - r_old(x) >= 0, and therefore we may add 2 * (r_new(x) - r_old(x)) on the right-hand side. This amount is added to get inequalities of a common form for all three operations. It is desirable to have inequalities of the same form because this allows us to estimate t_amortized by telescoping the sum. End.

For the analysis of the zig-zag and zig-zig operations we need a simple inequality which helps us to estimate the logarithmic contributions:

Lemma: If a + b <= c, then log a + log b <= 2 * log c - 2.

Proof: Because the log function is monotonically increasing, it suffices to check the inequality for c = a + b. In that case, the relation to check can be rewritten as log(4 * a * b) <= log((a + b)^2). This is satisfied if and only if 4 * a * b <= a^2 + b^2 + 2 * a * b, which can easily be verified by considering the function f(a, b) = a^2 + b^2 - 2 * a * b = (a - b)^2: it assumes its minimum value 0 for a = b. End.

Lemma: The amortized time for a zig-zag operation at node x is bounded by 3 * (r_new(x) - r_old(x)).

Proof: The amortized time of a zig-zag operation involving nodes x, y and z as in the picture illustrating the operations, can be written as

t_amortized_zig-zag = 2 + r_new(x) + r_new(y) + r_new(z) - r_old(x) - r_old(y) - r_old(z).
We have s_old(z) = s_new(x) and s_old(y) > s_old(x). Thus, also r_old(z) = r_new(x) and r_old(y) > r_old(x). This can be used to eliminate r_old(z) and r_old(y):
t_amortized_zig-zag <= 2 + r_new(y) + r_new(z) - 2 * r_old(x).
Because s_new(y) + s_new(z) < s_new(x), we get, using the above estimate, log(s_new(y)) + log(s_new(z)) < 2 * log(s_new(x)) - 2, which is equivalent to r_new(y) + r_new(z) < 2 * r_new(x) - 2. Substitution gives
t_amortized_zig-zag <= 2 * (r_new(x) - r_old(x)).
Because r_new(x) > r_old(x) we have r_new(x) - r_old(x) >= 0, and therefore we may add r_new(x) - r_old(x) on the right-hand side. This amount is added to get inequalities of a common form. End.

Lemma: The amortized time for a zig-zig operation at node x is bounded by 3 * (r_new(x) - r_old(x)).

Proof: The amortized time of a zig-zig operation involving nodes x, y and z as in the picture illustrating the operations, can be written as

t_amortized_zig-zig = 2 + r_new(x) + r_new(y) + r_new(z) - r_old(x) - r_old(y) - r_old(z).
We have s_old(z) = s_new(x), s_new(y) < s_new(x) and s_old(y) > s_old(x). Thus, also r_old(z) = r_new(x), r_new(y) < r_new(x) and r_old(y) > r_old(x). This can be used to eliminate r_old(z), r_new(y) and r_old(y):
t_amortized_zig-zig < 2 + r_new(x) + r_new(z) - 2 * r_old(x).
Because s_old(x) + s_new(z) < s_new(x), we get, using the above estimate, log(s_old(x)) + log(s_new(z)) < 2 * log (s_new(x)) - 2, which is equivalent to r_old(x) + r_new(z) < 2 * r_new(x) - 2. Using this to eliminate r_new(z) gives the claimed result. End.

In this case it appears that we cannot easily replace the argument with the potential function by an equivalent argument involving tokens. The reason is that tokens are supposed to be integral, whereas here we may have a large number of small contributions which together are bounded by O(log n).

Splay trees offer a self-adjusting alternative to AVL and other balanced search trees. An additional advantage is that recently visited nodes stand close to the root. A disadvantage is the large number of performed rotations.

Amortized Cost for Splay Tree

Skip Lists


There are many different dictionary data structures and several of them have practical importance. Some are used because they are simple, others because they have good guaranteed performance, others because they are fast in practice. The reason for the existence of splay trees is that they are simple to implement and that their operations have less overhead than those of AVL trees. There is no longer a worst-case guarantee, but at least we could prove good amortized behavior. In this section we go one step further. We present a data structure with extremely simple operations, requiring very little memory. However, the performance guarantee is even weaker: we will show that the expected time for an operation is bounded by O(log n), but there is no hard guarantee, not even for a sequence of operations.

In binary search the nodes whose values are inspected are found by computation, but the same search pattern may be obtained by using a linked structure with nodes of different degrees. For supporting find operations on the values of a sorted array a[] of length n = 2^k, this structure consists of n nodes. Node 0 has degree k. More generally, the degree of node i is given by the number of zeroes at the lower end of the binary expansion of i. That is, if i = sum_{0 <= l < k} b_l * 2^l, b_l in {0, 1} for all l, 0 <= l < k, then the degree of node i equals max{j | 0 <= j < k and b_l = 0 for all l <= j} + 1. A node of degree j is called a level-j node.

A level-j node i has links to the nodes s_l, 0 <= l < j, with s_l = i + 2^{j - l - 1}. For example, node 0 has links to the nodes s_0(0) = n / 2, s_1(0) = n / 4, ..., s_{k - 1}(0) = 1. Node n / 2 has links to the nodes s_0(n / 2) = n / 2 + n / 4, s_1(n / 2) = n / 2 + n / 8, ..., s_{k - 2}(n / 2) = n / 2 + 1.

There is 1 level-k node and there are 2^{k - j - 1} level-j nodes for all j, 0 <= j < k. So, the total number of links is k + sum_{j = 1}^{k - 1} j * 2^{k - j - 1} = k + 1 * n / 4 + 2 * n / 8 + 3 * n / 16 + ... + (k - 1) * 1 = n - 1, which is not much: on average there is less than one link per node. This result could also have been obtained by noticing that the obtained structure is a binomial tree with 2^k nodes. Any tree with n nodes has n - 1 links, because each node has indegree 1, except for the root which has indegree 0. Once the structure has been constructed, it can be used for performing find operations without any further computation by starting at node 0 and then testing the child at which to continue the search. These ideas can be worked out as follows:

  class Node
  {
    int    v; // key value
    int    j; // level = number of outgoing links
    Node[] s; // s[l] points to node i + 2^{j - l - 1}

    Node(int i, int v, int k)
    {
      this.v = v;
      if (i == 0)
        j = k;
      else
      {
        j = 0;
        while ((i & 1) == 0) // count the trailing zero bits of i
        {
          i = i >> 1;
          j++;
        }
      }
      s = new Node[j];
    }

    boolean find(int x)
    {
      int l = 0;
      while (l < j && s[l].v > x) // skip links to values larger than x
        l++;
      if (l == j)
        return x == v;
      return s[l].find(x);
    }
  }

  class FindTree
  {
    int  k;
    Node root;

    FindTree(int[] a, int k)
    { // a[] has length 2^k
      this.k = k;
      int n = 1 << k;
      int[] b = new int[n];
      for (int i = 0; i < n; i++)
        b[i] = a[i];
      Sort.sort(b, 0, n - 1);
      Node[] nodes = new Node[n];
      for (int i = 0; i < n; i++)
        nodes[i] = new Node(i, b[i], k);
      for (int i = 0; i < n; i++)
      {
        int j = nodes[i].j;
        Node[] s = nodes[i].s;
        for (int l = 0; l < j; l++)
          s[l] = nodes[i + (1 << (j - l - 1))];
      }
      root = nodes[0];
    }

    boolean find(int x)
    {
      return root.find(x);
    }
  }


In the case of binary heaps (presented in the chapter on priority queues) an explicit representation of the tree, based on pointers, is usually replaced by an implicit representation, computing the indices of the children. This is done in order to reduce the memory consumption and to assure that the data of nodes whose indices differ little lie close together in memory. These facts together are the main reason that, in practice, binary heaps are among the best priority queue implementations. Here we do the opposite: a binary search, which can be viewed as a traversal of an implicit tree, is replaced by a search on an explicit tree. The reason to present this alternative search, which on most computer systems will be slower, is that a similar data structure can be used to even support inserts and deletes. In such a dynamic context, binary search cannot be used anymore, because it requires that the array is kept sorted at all times. Deletions might be performed lazily, but keeping an array sorted under insertions may lead to operations taking linear time.

The presented structure is too static for updates. The idea is to relax the precise condition on the degree of the nodes: instead of fixing the level of a node by its index, when inserting a node its level is selected at random, in such a way that level-j nodes occur with the same frequency as before. That is, a newly inserted node should become a level-j node with probability 2^{-j - 1}. This can be realized easily by generating a random k-bit number and determining the number of trailing zero bits. Suppose that when inserting a value x we have chosen to construct a level-j node. Then, all links at level j or lower on the search path pointing to a value larger than x are redirected to the new node with key value x, and the new node takes over the links these nodes held before. It is handy to start with two sentinel nodes: one with key value -infinity, the other with key value +infinity. The first of these should have a height which is at least equal to the maximum height of all real nodes.
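The level selection described above can be sketched as follows. This is a minimal sketch; the class and method names are our own and not part of the program linked below. A random k-bit number is drawn, and the level is the number of its trailing zero bits, which equals j with exactly the desired probability 2^{-j - 1}:

```java
import java.util.Random;

// Sketch (our own naming): choosing the level of a new node by counting
// the trailing zero bits of a random k-bit number. The all-zero pattern
// is capped at level k - 1.
class LevelChooser
{
  static final Random rng = new Random();

  // Level derived from the bit pattern r: the number of trailing zero
  // bits, at most k - 1.
  static int levelFromBits(int r, int k)
  {
    int j = 0;
    while (j < k - 1 && (r & 1) == 0)
    {
      r >>= 1;
      j++;
    }
    return j;
  }

  static int randomLevel(int k)
  {
    return levelFromBits(rng.nextInt(1 << k), k);
  }
}
```

A pattern ending in a one bit gives level 0 (probability 1/2), a pattern ending in exactly one zero bit gives level 1 (probability 1/4), and so on.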

Deleting a node with key value x appears to be harder, because now we must find all nodes pointing to the level-j node with key x. But this can be achieved with a slightly modified search: normally the search jumps to the next node as soon as its key equals the value we are searching for, but now the search continues descending until it reaches the node to remove at level 1. In this way all nodes pointing to it are encountered, and each of them takes over the corresponding link of the node to remove.


How about the performance? The analysis of the memory usage is simple: the expected number of level j nodes is the same as before. Using that the expected value of a sum equals the sum of the expected values, it follows immediately that the expected number of links is n - 1 = O(n). In an implementation we may choose to increase the degree of all nodes by 1, assuring that in any case each node has a link to the next node in the list. Even when doing this, the number of links is bounded by O(n). The running time is harder to bound.

Lemma: The probability that the height of a skip list with n elements exceeds 2 * log n is at most 1 / n.

Proof: The node levels are chosen independently at random and distributed according to the geometric distribution with parameter 1 / 2. Thus, the probability that any node has level l > j is less than 2^{-j}. Using that Prob(A or B) <= Prob(A) + Prob(B), we get Prob(maximum height > j) <= n / 2^j. Taking j = 2 * log n, this probability is less than 1 / n. End.

Theorem: The expected time for an operation on a skip list with n elements is bounded by O(log n).

Proof: Because the time for an insert and a delete is of the same order as for a find, we limit our analysis to that of a find. We must show that the path from the node with key -infinity to the node with the value x we are looking for has expected length O(log n). Let X be the random variable giving the length of this path. In any case X = O(n), and by the lemma the probability that there are more than 2 * log n levels is at most 1 / n, so these cases contribute at most O(1) to the expected value. So, we may assume that there are at most 2 * log n levels. At each level one vertical step is performed going to the next lower level, so the vertical steps contribute at most 2 * log n. It remains to show that the expected number of horizontal steps on any given level is O(1). Let L_i, i >= 1, denote the number of horizontal steps on level i, going from a node at level i to the next node on level i. Because Exp[sum_{i >= 1} L_i] = sum_{i >= 1} Exp[L_i] by the linearity of expectation, it suffices to prove that there is a constant c so that Exp[L_i] <= c, for all i >= 1. For proving this, it is very handy to view the construction of the skip list in a different way. Instead of choosing the level of each node independently, the construction may also be viewed as a repeated selection of subsets: at level 1 the set is S_1 = S, containing all elements. In general, each of the elements in the subset S_i, i >= 1, is also present in S_{i + 1}, independently of the others, with probability 1 / 2. This gives the same distribution as before. If the search moves through k nodes at level i, then this means that none of the elements corresponding to these nodes was promoted to S_{i + 1} and that the element in position k + 1 was promoted. The probability that this happens is 1 / 2^{k + 1}. So, the expected number of elements visited at level i is given by 0 * 1/2 + 1 * 1/4 + 2 * 1/8 + ... = 1. End.
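The series at the end of the proof can be checked numerically. The following throwaway snippet (our own, not part of the skip list code) sums the first terms of sum_{k >= 0} k * 2^{-(k + 1)}:

```java
// Numeric check of the series from the proof: the expected number of
// horizontal steps per level is 0 * 1/2 + 1 * 1/4 + 2 * 1/8 + ... = 1.
class SeriesCheck
{
  static double partialSum(int terms)
  {
    double s = 0;
    for (int k = 0; k < terms; k++)
      s += k / Math.pow(2.0, k + 1);
    return s;
  }
}
```

Already after a few dozen terms the partial sum is indistinguishable from 1 in double precision.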

A nice feature of skip lists is that memory usage can be traded against performance. Choosing the probability that a level-j node is created smaller, for example 3^{-j}, the memory usage goes down. At the same time the expected access time goes up. A generalization of the above proof shows that when choosing a node to be at level j with probability a^{-j}, a >= 2, the average time for a search is proportional to (a - 1) / log a * log n, with an expected memory usage proportional to a / (a - 1) * n. Taking a = 4, the searches become about 50% more expensive, but the memory usage drops from 2 * n to 4 / 3 * n.
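The trade-off can be made concrete with a small computation (helper names are our own, only to evaluate the two formulas above): relative to log_2 n comparisons and n links, the search-cost factor is (a - 1) / log_2 a and the memory factor is a / (a - 1):

```java
// Evaluates the factors from the text: search time proportional to
// (a - 1) / log_2(a) * log n, memory to a / (a - 1) * n links.
class TradeOff
{
  static double searchFactor(double a)
  {
    return (a - 1) / (Math.log(a) / Math.log(2));
  }

  static double memoryFactor(double a)
  {
    return a / (a - 1);
  }
}
```

For a = 2 the search factor is 1 and the memory factor 2; for a = 4 they are 1.5 and 4/3, matching the 50% increase and the drop from 2 * n to 4/3 * n links.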

Skip List


The main argument in favor of skip lists is their simplicity in combination with low overhead. To check whether there are no hidden complications, we consider an implementation in Java. The nodes consist of a key and an array of pointers. This array has length equal to the level of the node. This gives the following implementation:
  class Node
    int    key;
    int    level;
    Node[] next;
    public Node(int key, int level)
      this.key   = key;
      this.level = level;
           next  = new Node[level];
So, a node has a size of level + 3 words. The above analysis has shown that the expected level equals 2, so the expected size of a node is 5 words. This includes the key. This is not much, but it is not better than an AVL tree.

The operations are indeed really simple. The method find is comparable in complexity to a find on a binary tree. At any point of the search there are three possible actions: go one level down in the same node; go to the next node, which is found by following the next-link at this level; quit.

  public boolean find(int x)
    // Checks whether there is a node with key x.
    Node node  = head;
    int  level = maxLevel - 1;
    while (level >= 0)
      while (node.next[level].key < x)
        node = node.next[level];
      if (node.next[level].key == x)
        return true;
      level--;
    return false;

Inserts and deletes are simpler than operations on balanced binary trees because there are no restructurings. So, the cost of an insert or delete is equal to that of a find, except for some assignments. There are two assignments for every level of the node to insert or delete, so the expected number of assignments is 4 per insertion. This is almost negligible in comparison to the cost of testing and following the links.

  public void insert(int x)
    // Inserts a node with key x. 
    // Does not test whether key x exists.
    int l = randomLevel();
    Node node = head;
    Node nwnd = new Node(x, l);
    if (l > maxLevel)
      maxLevel = l;
    int level = maxLevel - 1;
    while (level >= 0)
      while (node.next[level].key < x)
        node = node.next[level];
      if (level < l)
        nwnd.next[level] = node.next[level];
        node.next[level] = nwnd;
      level--;

  public void delete(int x)
    // Deletes a node with key x. 
    // Correct when there is no such node.
    // Each key should occur at most once.
    Node node = head;
    int level = maxLevel - 1;
    while (level >= 0)
      while (node.next[level].key < x)
        node = node.next[level];
      if (node.next[level].key == x)
        node.next[level] = node.next[level].next[level];
      level--;

Click here to see the above code fragments integrated in a working Java program. This implementation is rather slow. The main reason appears to be that there are many indirections: at every step the key is compared with the key of a successor node, rather than with the key of the current node, as is done in AVL trees. Skip lists also have some very nice features which are harder to realize with other dictionary implementations. Because at the bottom level the structure is simply a linear list, it is trivial to perform a range query in expected time O(log(number_of_entries_in_tree) + number_of_elements_in_range). Because of this it is also no problem to have elements with the same key.
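The range query mentioned above can be sketched as follows. This is our own reduced example, not the linked program: Node is stripped to its level-0 link, and we assume an ordinary find, in expected O(log n) time, has already positioned us just before the range (here, at the head sentinel, for simplicity):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a range query along the bottom level of a skip list.
class RangeWalk
{
  static class Node
  {
    int  key;
    Node next0;   // level-0 successor

    Node(int key) { this.key = key; }
  }

  // Collects all keys in [lo, hi], walking the bottom-level list from
  // the node just before the range.
  static List<Integer> range(Node start, int lo, int hi)
  {
    List<Integer> result = new ArrayList<>();
    for (Node node = start.next0; node != null && node.key <= hi; node = node.next0)
      if (node.key >= lo)
        result.add(node.key);
    return result;
  }
}
```

The walk visits one node per reported element plus at most one extra node, which is where the claimed O(log n + number_of_elements_in_range) bound comes from.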

Skip lists offer a conceptually simple dictionary data structure. Their time and memory consumption are comparable to those of more sophisticated data structures such as AVL trees.


  1. Show that the time for the restructuring operations needed when performing an arbitrary sequence of k >= n insertions (and no deletions) on a 2-3 tree with at most n leafs is bounded by O(k).

  2. Show that the time for the restructuring operations needed when performing an arbitrary sequence of k >= n insertions and deletions on a 2-4 tree with at most n leafs is bounded by O(k).

  3. In a-b trees, before fusing two nodes, it is first considered whether a child can be borrowed. This idea is required to assure that the degree condition holds after fusing. However, it also has a positive impact on the amortized number of performed restructuring operations. It turns out to be advantageous to even try to give away a child before splitting a node.

  4. a-b trees are also particularly interesting for large values of a and b. This offers the possibility to reach any element by traversing very few levels of the tree. This can bring substantial improvements on a non-uniform memory system, having several levels of cache, a main memory and a hard disk with ever larger access times. Of course maintaining an a-b tree the conventional way, with an array or linked list of values in the internal nodes, implies that finding out the next node to visit costs O(b) time. So, the total time for a search in an a-b tree becomes O(b * log_a n). However, the values in the internal nodes can be maintained in some tree-like dictionary structure as well.

  5. In the chapter on union-find we have been considering the performance of several strategies. One of them was find with path contraction and simple first-to-second unions. This strategy, and the others as well, can also be analyzed in a very clean way using a suitably defined potential function.

  6. The analysis of insertion sort was given for an a-b tree. Repeat the analysis for an AVL tree, showing that even for them the time to perform insertion sort can be bounded by O(n + n * log(I(S) / n)). Formulate a general condition that a tree-based dictionary must satisfy in order to assure this result.

  7. Perform the following operations on a splay tree obtained by first performing insert(0), ..., insert(8): find(4), find(5), insert(9), delete(3). Show the situation after each complete operation.

  8. It is considered how many operations must be performed on a splay tree with n elements in order to guarantee an amortized time of O(log n).

  9. The rank of an element x in a set S is the number of elements in S that are smaller than x, where it is assumed that all elements are different. For dictionaries we consider the additional operations compute_rank and find_rank. The first determines for a key x its rank r in the set of keys stored in the dictionary T. The second finds the key x corresponding to a specified rank r.

  10. The standard operations on dictionaries are find, insert and delete, but there are other interesting operations as well. In the question above on high-degree a-b trees, we considered using a tree-like dictionary for maintaining the up to b - 1 values which have to be stored in a node. In this context it was important that this data structure also supports the operations fuse and split. Describe how to perform these operations on skip lists and specify the corresponding time consumption.

  11. It is considered how to construct and merge dictionaries.

  12. Related to merging is the operation insert_batch. This operation inserts m new elements into a dictionary with initially n elements. Clearly this can be performed by m conventional insertions. This takes O(m * log n) time using any reasonable dictionary implementation. In general it will be hard to come below this bound, because most data structures somehow sort. However, assuming that the m numbers are provided in sorted order, we can hope to do as well as O(m + log n).

  13. Constructing a skip list with n nodes by performing n insertions takes O(n * log n) expected time. In the case of general key values, this is optimal, because this construction implies sorting the keys, but if for some reason the sorting can be performed faster, we may hope to do better. Investigate whether it is possible to construct a skip-list structure from a sorted array containing n key values in O(n) time.

  14. The performance of skip lists depends on the probability distribution used for selecting the level of newly inserted nodes. It was suggested that the standard choice of creating an l-level node with probability 2^{-l} is not necessarily optimal. Use the given program to investigate this. First perform tests with the current setting, measuring time, followed links and memory consumption for n = 2^k, for k = 10, 12, ... . Then modify the program so that an l-level node is generated with probability 4^{-l} and test again.

  15. In the program it is specified for the delete method that the keys should occur at most once.

This page was created by Jop Sibeyn.
Last update Sunday, 05 December 04 - 12:45.
For any comments: send an email.