The internal nodes hold one or two keys that guide the search. A binary node has one key k, which is larger than or equal to the values in the left subtree and smaller than the values in the right subtree. A ternary node has keys k_1 and k_2: k_1 is larger than or equal to the values in the left subtree and smaller than the values in the middle subtree; k_2 is larger than or equal to the values in the middle subtree and smaller than the values in the right subtree.
In a binary node with key k, we perform

if (x <= k) goleft(); else goright();

In a ternary node, with keys k_1 and k_2, we perform

if (x <= k_1) goleft(); else if (x <= k_2) gomiddle(); else goright();

In this way we continue until we reach a leaf. There we test whether the key is equal and perform a suitable operation in reaction.
The construction and these more elaborate comparisons imply a certain overhead, but everything else becomes much simpler because of this. When implementing a dictionary ADT based on 2-3 trees, it is a good idea to have different classes for internal nodes and leaves, as they allow very different operations and have different fields as well.
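Such a class layout might look as follows. This is a minimal sketch with hypothetical names (Node23, Leaf, Internal); splitting, fusing and the update operations are omitted, and a binary node is represented by leaving the third child null.

```java
// Minimal sketch of the class layout: internal nodes and leaves are
// separate classes with different fields (names are hypothetical).
abstract class Node23 {
    abstract boolean find(int x);
}

class Leaf extends Node23 {
    int key;
    Leaf(int key) { this.key = key; }
    boolean find(int x) { return x == key; }
}

class Internal extends Node23 {
    int k1, k2;            // only k1 is used in a binary node
    Node23 c0, c1, c2;     // c2 == null in a binary node
    Internal(int k1, Node23 c0, Node23 c1) {
        this.k1 = k1; this.c0 = c0; this.c1 = c1;
    }
    Internal(int k1, int k2, Node23 c0, Node23 c1, Node23 c2) {
        this(k1, c0, c1); this.k2 = k2; this.c2 = c2;
    }
    boolean find(int x) {   // exactly the comparisons described above
        if (x <= k1) return c0.find(x);
        if (c2 == null || x <= k2) return c1.find(x);
        return c2.find(x);
    }
}
```

The find method mirrors the comparison scheme given above; the abstract superclass makes the recursion end naturally at a leaf.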
It is important to point out that a-b trees grow or shrink by adding or deleting the root. So, changes of height arise at the top, not at the bottom as in AVL trees.
It is essential that 3 + 1 = 2 + 2 and that 1 + 2 = 3. The first implies that, once the maximum degree of a node is exceeded, it can be split into two nodes whose degrees lie within the legal range; the second implies that fusing two nodes does not create a node with too many children. Thus, we could proceed similarly for a 3-5 tree, because 5 + 1 = 3 + 3 and 3 + 2 = 5. A 3-4 tree, however, would be clumsy, because 4 + 1 = 5 < 3 + 3. For this reason we have required for a-b trees that b >= 2 * a - 1, assuring that b + 1 >= 2 * a and that a + a - 1 <= b.
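These conditions can be encoded in a one-line check (a hypothetical helper, not part of any tree implementation), which reproduces the examples above:

```java
// Sketch: checks whether a and b admit legal split and fuse operations.
// Splitting a node with b + 1 children into two legal halves requires
// b + 1 >= 2 * a; fusing a minimal node with a sibling that had the
// minimal degree a before the deletion requires a + a - 1 <= b.
class DegreeCheck {
    static boolean legalAB(int a, int b) {
        return b + 1 >= 2 * a && a + a - 1 <= b;
    }
}
```

legalAB(2, 3), legalAB(2, 5) and legalAB(3, 5) hold, while legalAB(3, 4) fails, matching the discussion above.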
This does not sound very efficient. AVL trees gave us (in addition to the time for searching) only O(1) time per operation for restoring the structure. For the 2-3 trees above, alternating insertions and deletions may cause the whole path up to the root to be restructured again and again. We now show that an arbitrary sequence of n insertions and deletions into an arbitrary 2-5 tree with at most n leaves causes at most O(n) restructuring operations. So, in the amortized sense, 2-5 trees share the good property of AVL trees: O(1) time per operation (in addition to the time for searching).
The idea is simple and provides a very nice example of an amortized analysis. For a-b trees, we will call nodes with degree a or b critical. All other nodes are said to be non-critical. In the case of 2-5 trees, splitting a node happens when a child is added to a node which had degree 5 before. The result is two nodes of degree 3. These resulting nodes cannot be split or fused again immediately. So, here we have a situation where splitting a critical node results in two non-critical nodes. Likewise, fusing happens only when two siblings each had degree 2 before the deletion, so both were critical; after the deletion the fused node has degree 3 and is non-critical. So, splitting and fusing can be viewed as an investment: we achieve more than just maintaining the defining property. This is a typical situation in which an amortized analysis may be effective: a single operation may be expensive, but it can be viewed as an investment. The rest is a matter of good bookkeeping.
In our case we deposit a token on every node whose degree is changed by the addition or removal of a child, or when a child is stolen by a sibling. In particular this implies that all critical nodes carry tokens, because any node is non-critical upon creation. The tokens on a node can be viewed as a prepayment for its possible later restructuring. As an invariant we want to maintain that every critical node carries a token. Because initially all internal nodes may be critical, we must be willing to make an initial investment of up to n - 1 tokens, one for each internal node.
Inserting a node may result in a sequence of splitting operations. The rules are such that only critical nodes get split, so the cost of this is covered by consuming the deposited tokens. The sequence of splittings ends either by creating a new root or by increasing the degree of a node. In either case we deposit one token.
Deleting a node may result in a sequence of fusion operations. The rules are such that v and v' get fused only when neither of them had more than 2 children before the deletion. So, each of them was critical and carried a token. These tokens are consumed to cover the cost of the fusion. The sequence of fusions may end in two ways: either the degree of the node v in which the fusion ends is decreased, or v steals a child from a sibling v'. In the first case a token is deposited on v, in the second case on v'.
Summarizing, we see that if we start in a situation with a token on each critical node, then this invariant is maintained, each insertion or deletion deposits O(1) new tokens, and every restructuring operation is paid for by consuming tokens. Hence a sequence of n insertions and deletions causes at most O(n) restructuring operations.
The token argument can be cast in a more general framework based on a potential function. The amortized time of an operation is defined by

t_amortized = t_actual - potential_before + potential_after.
Lemma: If, for some data structure with n elements, t(n) gives an upper bound on the amortized time for an operation and p(n) an upper bound on the used potential function, then any sequence of m >= p(n) / t(n) operations takes at most 2 * m * t(n) time.
Proof: Consider any sequence of m operations with actual times t_actual(i), 1 <= i <= m. Then

T_tot = sum_{i = 1}^m t_actual(i)
      = sum_{i = 1}^m (t_amortized(i) + pot_before(i) - pot_after(i))
      = sum_{i = 1}^m t_amortized(i) + potential(0) - potential(m)
      <= sum_{i = 1}^m t_amortized(i) + potential(0)
      <= m * t(n) + p(n).

For m >= p(n) / t(n) we have p(n) <= m * t(n), and thus, for these m, T_tot <= 2 * m * t(n). End.

The lemma implies that the given definition of amortized time is closely related to the intuitive definition of amortized time as the average time per operation over a sufficiently long sequence of operations. It also expresses the value of m that must be taken in order to obtain an asymptotically optimal claim: m must be Omega(p(n) / t(n)). When defining a potential, the main objective is a small value of t(n). But, in order to obtain the strongest performance claims, the value of p(n) should be taken into account as well.
In the case of the 2-5 trees, the actual time of an operation can be equated to the number of nodes accessed. When inserting, a certain number h of nodes is split and one further node is accessed which is not split. So, t_actual = h + 1. These h nodes were all critical and the splitting does not introduce new critical nodes. So, this reduces the potential by h. The node at the top may be made critical, so
t_amortized_insert = h + 1 - h + 1 = 2.

When deleting a node, nodes get fused along a path of length h. These 2 * h nodes were all critical before; here we use in an essential way that we first try to steal a child before fusing. At the top one node is accessed which is not fused, but which may become critical. This gives
t_amortized_delete = 2 * h + 1 - 2 * h + 1 = 2.
Here both approaches can be applied and both give O(1) amortized time. In general an argument with tokens is clearer and may have didactical advantages where it can be applied. On the other hand, analyzing the amortized time with the definition based on potentials is more mathematical and can also be applied for harder problems.
Insertion sort has the interesting property that it is adaptive, that is, it runs faster on easy instances than on hard instances. An easy instance in this case is an instance which is already almost sorted. For a more precise statement we define the notion of an inversion. In a sequence of numbers S, an inversion is a pair of positions i < j with S_i > S_j. That is, S_i and S_j are wrongly arranged. A sorted sequence has no inversions, and reversing a sorted sequence yields one inversion for every pair of numbers, n * (n - 1) / 2 in total. Generally, let I(S) denote the number of inversions in a sequence S. We will show that with a slightly modified search procedure a 2-3 tree can be used to perform insertion sort on a sequence S in time O(n + n * log(I(S) / n)).
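The definition can be made concrete by counting all wrongly arranged pairs directly; a quadratic sketch suffices for illustration (the class name is hypothetical):

```java
// Counts the inversions I(S) of a sequence by inspecting every pair
// i < j; O(n^2), but it makes the definition concrete.
class Inversions {
    static int count(int[] s) {
        int inv = 0;
        for (int i = 0; i < s.length; i++)
            for (int j = i + 1; j < s.length; j++)
                if (s[i] > s[j]) inv++;
        return inv;
    }
}
```

A sorted array gives 0, a reversed array of length n gives n * (n - 1) / 2.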
There is no need to start the search for an element at the root: if we already have an idea where it should be inserted, then we can try to reduce the time for the searching by starting close to the expected target leaf. If this is successful, then we can get very good performance, because the amortized time for insertions and deletions is constant. In case we want to reduce the time for performing insertion sort on almost sorted sequences, it is natural to assume that the elements are supplied in increasing order. So, then we should start the search at the rightmost leaf. From there the search moves up until the new value is no longer smaller than all keys in the node and then moves down along the usual path.
In an a-b tree all internal nodes have degree at least a >= 2. So, a subtree of height h contains at least 2^h leaves. The search for a value x moves up h + 1 levels to a node v only if all the leaves in the right subtree of v have values larger than x. So, this accounts for at least 2^h inversions. The total time for all searches is O(n + sum_i h_i), where h_i + 1 is the number of levels the search for element S_i moves up. We know that
sum_i 2^{h_i} <= I(S).
So, we want to upper-bound sum_i h_i under this condition. Because 2^{h_i} grows strongly with h_i, the sum of the h_i is maximized when all h_i have the same value (this argument can be formalized with the Lagrange multiplier method for computing extremal values under a constraint). In that case, for all i, 2^{h_i} <= I(S) / n, which implies that h_i <= log(I(S) / n), and thus sum_i h_i <= n * log(I(S) / n).
A practical disadvantage is that the tree now must have links in both directions. In a normal a-b tree this is not needed, because the way back can be pushed on a stack while searching forward: as splitting and fusion operations are performed only along the search path, this is all one needs to know. So, we need quite an elaborate data structure for solving a simple problem like sorting. On the other hand, this is a result which cannot easily be achieved in another way: most sorting methods (merge sort, quick sort, heap sort) are not adaptive at all. Bubble sort is adaptive, but in a much weaker sense: bubble sort is fast when each element stands close to its final position; if the maximum distance is d, then it can be implemented in O(n * d) time. In this case the number of inversions is bounded by O(d * n) and the adaptive insertion sort takes O(n + n * log d), which is better for any non-constant d. Furthermore, bubble sort is really bad if even a single element has to move far.
Self-adjustment typically implies that one cannot prevent a bad situation from building up, but once an exceptionally expensive operation is performed, the structure gets improved. A well-known example of a self-adjusting structure is the union-find structure with path contraction: long paths cannot be excluded, but once one has to traverse such a long path, performing an expensive find operation, all the nodes on the path are hooked directly to the root node, implying that a subsequent find for any of the nodes on the path is much cheaper.
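As a reminder, this structure can be sketched as follows (class and field names are hypothetical; union by rank is omitted to keep the sketch short):

```java
// Sketch of union-find with path contraction: an expensive find
// rehooks every node on the traversed path directly to the root.
class UnionFind {
    int[] parent;

    UnionFind(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;  // each element is its own root
    }

    int find(int x) {
        if (parent[x] == x) return x;
        int root = find(parent[x]);
        parent[x] = root;   // path contraction: hook x directly to the root
        return root;
    }

    void union(int x, int y) {
        parent[find(x)] = find(y);
    }
}
```

After a single find on the deepest node of a long path, every node on that path points directly to the root, so repeating the find costs only one step.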
Unfortunately, the simple strategy of moving the accessed node to the root by a sequence of single rotations is too simple. If the keys 0, 1, ..., n - 1 are first inserted in increasing order, a tree of depth n - 1 is the result: every new node is first inserted as the right child of the root, and then a single rotation turns it into the root. This sequence of insertions is fine: it takes O(n) time. However, now accessing the keys in increasing order is very expensive: the time for accessing key i is proportional to n - i, for a total of O(sum_{i = 1}^n i) = O(n^2). The reason is that after searching 0 the depth is still n - 1, and more generally after searching i the depth is n - 1 - i.
Now we distinguish three cases, depending on the positions of the accessed node x, its parent y and its grandparent z:
- zig: y is the root; x is a left or right child of y.
- zig-zig: x and y are both left children or both right children.
- zig-zag: one of x and y is a left child and the other a right child.
Notice that the zig and zig-zag cases are treated in the same way as in the simple strategy: in the zig case a single rotation is performed, in the zig-zag case a so-called double rotation, which has the same effect as performing two single rotations. Only in the zig-zig case the performed operation is different. At first glance it is not at all clear that this gives an improvement: it appears to be an unnecessarily unbalanced operation.
The above strategy is called splaying. Without giving a proof yet, we can already see that splaying is much better than performing single rotations. As an example we consider the tree that is constructed by inserting the elements 0, 1, ..., n - 1, in this order. This generates a tree of depth n - 1, a chain with only left children. Accessing the nodes in the same order now not only moves each accessed element to the root, but also reduces the depth of the tree by almost a factor of two with each access.
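The three cases can be sketched in code. This is a bottom-up formulation with explicit parent pointers; the class and method names are hypothetical, and searching and tree bookkeeping are omitted:

```java
// Sketch of splaying with parent pointers (names are hypothetical).
class SplayNode {
    int key;
    SplayNode left, right, parent;
    SplayNode(int key) { this.key = key; }
}

class Splayer {
    // Single rotation moving x above its parent y.
    static void rotateUp(SplayNode x) {
        SplayNode y = x.parent, z = y.parent;
        if (y.left == x) {
            y.left = x.right;
            if (x.right != null) x.right.parent = y;
            x.right = y;
        } else {
            y.right = x.left;
            if (x.left != null) x.left.parent = y;
            x.left = y;
        }
        y.parent = x;
        x.parent = z;
        if (z != null) { if (z.left == y) z.left = x; else z.right = x; }
    }

    // Splays x to the root, distinguishing zig, zig-zig and zig-zag.
    static SplayNode splay(SplayNode x) {
        while (x.parent != null) {
            SplayNode y = x.parent, z = y.parent;
            if (z == null) {
                rotateUp(x);                       // zig: y is the root
            } else if ((z.left == y) == (y.left == x)) {
                rotateUp(y);                       // zig-zig: rotate the parent
                rotateUp(x);                       // first, then x
            } else {
                rotateUp(x);                       // zig-zag: double rotation,
                rotateUp(x);                       // two single rotations at x
            }
        }
        return x;
    }
}
```

The only difference to the simple strategy is the zig-zig case, where the parent is rotated before x; exactly this reordering produces the depth-halving effect described above.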
Let us state more precisely how the three dictionary operations, find, insert and delete, are performed (the delete variant given here is one of several equivalent possibilities):
- find(x): search for x as in an ordinary binary search tree; then splay the last visited node to the root.
- insert(x): insert x as in an ordinary binary search tree; then splay the new node to the root.
- delete(x): find x and splay it to the root; remove the root and join the two remaining subtrees, for example by splaying the maximum of the left subtree to its root and attaching the right subtree to it.
It is not easy to bound these costs in an amortized argument. Somehow we must define a potential function which does not increase too much (that is, by no more than O(log n)) in any operation, and at the same time decreases strongly when a deep-lying node is accessed. A first idea might be to take as potential of a tree T the function p(T) = sum_{u in T} depth(u), where depth(u) gives the depth of node u in T. However, this potential may increase too strongly: adding a new smallest or largest value pushes all existing nodes one level deeper. Thus, if there were n nodes, p(T_new) - p(T_old) = n, giving an amortized time of at least n.
An idea which at first glance works better is to take p(T) = sum_{u in T} log(depth(u) + 1). Even this potential may increase too strongly: if the tree T with n nodes is perfectly balanced, then there are (n + 1) / 2 nodes at depth d = log(n + 1) - 1. When a new smallest or largest element is added, all these nodes are pushed one level down. So, considering these nodes alone, p(T_new) - p(T_old) >= n / 2 * (log(d + 2) - log(d + 1)) = n / 2 * log(1 + 1 / (d + 1)) ~= n / (2 * d), which is still much more than O(log n).
A potential function that really works is given by

p(T) = sum_{u in T} log(s(u)).

Here s(u) gives the size of the subtree rooted at u, counting u itself as well. Defining the rank of a node u as r(u) = log(s(u)) simplifies the formulation of p():

p(T) = sum_{u in T} r(u).

A very nice property of ranks based on subtree sizes is that a rotation changes the ranks only of the nodes involved in the rotation. The depths of the nodes in the subtrees change, but their ranks remain the same. Clearly the rank of any node of a tree with n nodes is at most log n. So, p(T) <= n * log n.
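These definitions translate directly into code (a hypothetical minimal node class; logarithms taken to base 2):

```java
// Sketch: subtree sizes, ranks r(u) = log2(s(u)) and the potential
// p(T) = sum of the ranks of all nodes of the tree.
class TreeNode {
    TreeNode left, right;
}

class Potential {
    static int size(TreeNode u) {
        return u == null ? 0 : 1 + size(u.left) + size(u.right);
    }
    static double rank(TreeNode u) {
        return Math.log(size(u)) / Math.log(2);   // log2 of the subtree size
    }
    static double potential(TreeNode u) {
        return u == null ? 0 : rank(u) + potential(u.left) + potential(u.right);
    }
}
```

For a three-node balanced tree the two leaves have rank 0 and the root has rank log2(3), so the potential is log2(3).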
Theorem: The amortized time to access any node of a tree with n nodes is bounded by 2 + 3 * log n = O(log n).
Proof (outline): Recall that the amortized time of an operation is given by the actual time plus the change of potential, where the change of potential equals the potential after the operation minus the potential before the operation. In our case the total potential is the sum of the potentials of all nodes. Because the potential changes only for the nodes along the path from the root to the accessed node x, the amortized time can be written as
t_amortized = sum_{u in path to x} (1 + r_new(u) - r_old(u)).

Here we account one cost unit for each accessed node.
We want to rewrite this sum as a sum over the operations: a zig-zig or a zig-zag operation is attributed a real cost of 2 cost units, a zig operation is attributed 1 cost unit, and handling the root accounts for one cost unit as well. The amortized cost of any of these operations is equal to this cost plus the sum of the new potentials of all involved nodes minus the sum of their old potentials. Thus, regrouping the sum for t_amortized, we get
t_amortized = 1 + sum_{operation} amortized_cost_of_operation
The trick is to show that the amortized cost of a zig-zag or a zig-zig operation at any node x on the path can be estimated by 3 * (r_new(x) - r_old(x)), and that of a zig operation by 1 + 3 * (r_new(x) - r_old(x)). This estimate takes into account the cost for the other involved nodes and the changes of their potentials. Such an estimate is very effective here, because even though the splaying moves higher and higher in the tree, it is always the same node x we are working on. Thus, in the telescoping sum all but two terms cancel; only the negative contribution from the first summand and the positive contribution from the last summand remain. This gives the result of the theorem:
t_amortized = 1 + 1 + 3 * (r_after(x) - r_before(x)) <= 2 + 3 * log n.

Here the first 1 accounts for handling the root and the second for the single zig operation; furthermore r_after(x) <= log n and r_before(x) >= 0. End.
Lemma: The amortized time for a zig operation at node x is bounded by 1 + 3 * (r_new(x) - r_old(x)).
Proof: The amortized time of a zig operation involving nodes x and y as in the picture illustrating the operations, can be written as
t_amortized_zig = 1 + r_new(x) + r_new(y) - r_old(x) - r_old(y).

Using that r_new(y) < r_old(y) gives

t_amortized_zig <= 1 + r_new(x) - r_old(x).

Because r_new(x) > r_old(x), we have r_new(x) - r_old(x) >= 0, and therefore we may add 2 * (r_new(x) - r_old(x)) on the right-hand side. This amount is added to obtain inequalities of a common form for all three operations; this common form allows us to estimate t_amortized by telescoping the sum. End.

For the analysis of the zig-zag and zig-zig operations we need a simple inequality which helps us to estimate the logarithmic contributions:
Lemma: If a + b <= c, then log a + log b <= 2 * log c - 2.
Proof: Because the log function is monotonically increasing, it suffices to check the inequality for c = a + b. In that case, the relation to check can be rewritten as log(4 * a * b) <= log((a + b)^2). This holds if and only if 4 * a * b <= a^2 + b^2 + 2 * a * b, which follows from a^2 + b^2 - 2 * a * b = (a - b)^2 >= 0. End.
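The inequality is easy to check numerically for concrete values (logarithms to base 2; the small tolerance absorbs floating-point error, and the check is only meaningful when the precondition a + b <= c holds):

```java
// Sketch: numeric check of the lemma "a + b <= c implies
// log a + log b <= 2 * log c - 2".
class LogLemma {
    static double log2(double x) { return Math.log(x) / Math.log(2); }

    static boolean holds(double a, double b, double c) {
        return log2(a) + log2(b) <= 2 * log2(c) - 2 + 1e-9;
    }
}
```

For instance, the bound is tight for a = b = 1, c = 2, and it fails for a = b = c = 4, where the precondition is violated.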
Lemma: The amortized time for a zig-zag operation at node x is bounded by 3 * (r_new(x) - r_old(x)).
Proof: The amortized time of a zig-zag operation involving nodes x, y and z as in the picture illustrating the operations, can be written as
t_amortized_zig-zag = 2 + r_new(x) + r_new(y) + r_new(z) - r_old(x) - r_old(y) - r_old(z).

We have s_old(z) = s_new(x) and s_old(y) > s_old(x). Thus, also r_old(z) = r_new(x) and r_old(y) > r_old(x). This can be used to eliminate r_old(z) and r_old(y):

t_amortized_zig-zag <= 2 + r_new(y) + r_new(z) - 2 * r_old(x).

Because s_new(y) + s_new(z) < s_new(x), the above lemma gives log(s_new(y)) + log(s_new(z)) <= 2 * log(s_new(x)) - 2, which is equivalent to r_new(y) + r_new(z) <= 2 * r_new(x) - 2. Substitution gives

t_amortized_zig-zag <= 2 * (r_new(x) - r_old(x)).

Because r_new(x) > r_old(x), we have r_new(x) - r_old(x) >= 0, and therefore we may add r_new(x) - r_old(x) on the right-hand side. This amount is added to obtain inequalities of a common form. End.
Lemma: The amortized time for a zig-zig operation at node x is bounded by 3 * (r_new(x) - r_old(x)).
Proof: The amortized time of a zig-zig operation involving nodes x, y and z as in the picture illustrating the operations, can be written as
t_amortized_zig-zig = 2 + r_new(x) + r_new(y) + r_new(z) - r_old(x) - r_old(y) - r_old(z).

We have s_old(z) = s_new(x), s_new(y) < s_new(x), and s_old(y) > s_old(x). Thus, also r_old(z) = r_new(x), r_new(y) < r_new(x), and r_old(y) > r_old(x). This can be used to eliminate r_old(z), r_new(y) and r_old(y):

t_amortized_zig-zig < 2 + r_new(x) + r_new(z) - 2 * r_old(x).

Because s_old(x) + s_new(z) < s_new(x), the above lemma gives log(s_old(x)) + log(s_new(z)) <= 2 * log(s_new(x)) - 2, which is equivalent to r_old(x) + r_new(z) <= 2 * r_new(x) - 2. Using this to eliminate r_new(z) gives the claimed result. End.

In this case it appears that we cannot easily replace the argument with the potential function by an equivalent argument involving tokens. The reason is that tokens are supposed to be integral, whereas here we may have a large number of small contributions which together are bounded by O(log n).
Splay trees offer a self-adjusting alternative for AVL and other balanced search trees. An additional advantage is that recently visited nodes stand close to the root. A disadvantage is the large number of performed rotations.
In binary search the nodes whose values are inspected are found by computation, but the same search pattern can be obtained with a linked structure with nodes of different degrees. For supporting find operations on the values of a sorted array a[] of length n = 2^k, this structure consists of n nodes. Node 0 has degree k. More generally, the degree of node i is given by the number of zero bits at the lower end of the binary expansion of i. That is, if i = sum_{0 <= l < k} b_l * 2^l, with b_l in {0, 1} for all l, 0 <= l < k, then the degree of node i equals min{l | b_l = 1}, the number of trailing zero bits of i (in particular, odd nodes have degree 0). A node of degree j is called a level-j node. A level-j node i has links to the nodes s_l, 0 <= l < j, with s_l = i + 2^{j - l - 1}. For example, node 0 has links to the nodes s_0(0) = n / 2, s_1(0) = n / 4, ..., s_{k - 1}(0) = 1. Node n / 2 has links to the nodes s_0(n / 2) = n / 2 + n / 4, s_1(n / 2) = n / 2 + n / 8, ..., s_{k - 2}(n / 2) = n / 2 + 1. There is 1 level-k node and there are 2^{k - j - 1} level-j nodes for all j, 0 <= j < k. So, the total number of links is k + sum_{j = 1}^{k - 1} j * 2^{k - j - 1} = k + 1 * n / 4 + 2 * n / 8 + 3 * n / 16 + ... + (k - 1) * 1 = n - 1, which is not much: on average there is less than one link per node. This result could also have been obtained by noticing that the obtained structure is a binomial tree with 2^k nodes, and any tree with n nodes has n - 1 links, because each node has indegree 1, except for the root which has indegree 0.

Once the structure has been constructed, it can be used for performing find operations without any index computation, by starting at node 0 and testing at which child to continue the search. These ideas can be worked out as follows:
class Node {
    int v;
    int j;
    Node[] s;

    Node(int i, int v, int k) {
        this.v = v;
        if (i == 0) j = k;
        else {
            j = 0;
            while ((i & 1) == 0) { j++; i = i >> 1; }
        }
        s = new Node[j];
    }

    boolean find(int x) {
        int l = 0;
        while (l < j && s[l].v > x) l++;
        if (l == j) return x == v;
        return s[l].find(x);
    }
}

class FindTree {
    int k;
    Node root;

    FindTree(int[] a, int k) { // a[] has length 2^k
        this.k = k;
        int n = 1 << k;
        int[] b = new int[n];
        for (int i = 0; i < n; i++) b[i] = a[i];
        Sort.sort(b, 0, n - 1); // sort b[] with any sorting routine
        Node[] nodes = new Node[n];
        for (int i = 0; i < n; i++) nodes[i] = new Node(i, b[i], k);
        for (int i = 0; i < n; i++) {
            int j = nodes[i].j;
            Node[] s = nodes[i].s;
            for (int l = 0; l < j; l++) s[l] = nodes[i + (1 << (j - l - 1))];
        }
        root = nodes[0];
    }

    boolean find(int x) { return root.find(x); }
}
In the case of binary heaps (presented in the chapter on priority queues) an explicit representation of the tree, based on pointers, is usually replaced by an implicit representation, computing the indices of the children. This is done to reduce the memory consumption and to assure that the data of nodes with close indices lie close together in memory. Together these facts are the main reason that, in practice, binary heaps are among the best priority queue implementations. Here we do the opposite: a binary search, which can be viewed as a traversal of an implicit tree, is replaced by a search on an explicit tree. The reason for presenting this alternative search, which on most computer systems will be slower, is that a similar data structure can be used to support even inserts and deletes. In such a dynamic context, binary search can no longer be used, because it requires that the array is kept sorted at all times. Deletions might be performed lazily, but keeping an array sorted under insertions may lead to operations taking linear time.
The presented structure is too static for updates. The idea is to relax the precise condition on the degree of the nodes: instead of fixing the degree based on the index, the level of a newly inserted node is selected at random in such a way that level-j nodes occur with the same frequency as before. That is, a newly inserted node should become a level-j node with probability 2^{-j - 1}. This can be realized easily by generating a random k-bit number and determining its number of trailing zero bits. Suppose that when inserting a value x we have chosen to construct a level-j node. Then all links at level j or lower on the search path that point to a value larger than x are redirected to the new node, and the new node takes over the old targets of these links. It is handy to start with two sentinel nodes: one with key value -infinity, the other with key value +infinity. The first of these should have a height at least equal to the maximum height of all real nodes.
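A level generator in this spirit can be sketched as follows. Instead of counting trailing zero bits of one random number, this version draws single random bits, which gives the same distribution; it uses the convention of the implementation further below, where levels start at 1, so level j is chosen with probability 2^{-j} and the expected level is 2. The cap is a hypothetical safeguard against absurdly high levels:

```java
// Sketch of a random level generator for skip list nodes: returns
// level j with probability 2^{-j}, j = 1, 2, ..., capped at cap.
class Levels {
    static int randomLevel(java.util.Random rnd, int cap) {
        int level = 1;
        while (level < cap && rnd.nextBoolean()) level++;  // promote with prob. 1/2
        return level;
    }
}
```

Over many draws the average level is close to 2, matching the analysis of the node size below.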
Deleting a node with key value x appears to be harder, because now we must find all nodes pointing to the level-j node with key x. But this can be achieved with a slightly modified search: normally, if the key of the next node equals the value we are searching for, the search goes there directly; now, instead, the search continues downwards, so that at every level it reaches the last node before the node with key x. All these nodes take over the corresponding links from the node to remove.
Lemma: The probability that the height of a skip list with n elements exceeds 2 * log n is at most 1 / n.
Proof: The node levels are chosen independently at random and distributed according to the geometric distribution with parameter 1 / 2. Thus, the probability that a given node has level l > j is 2^{-j}. Using that Prob(A or B) <= Prob(A) + Prob(B), we get Prob(maximum height > j) <= n / 2^j. Taking j = 2 * log n, this probability is at most 1 / n. End.
Theorem: The expected time for an operation on a skip list with n elements is bounded by O(log n).
Proof: Because the time for an insert and a delete is of the same order as that for a find, we limit our analysis to the find. We must show that the path from the node with key -infinity to the node with the value x we are looking for has expected length O(log n). Let X be the random variable giving the length of this path. In any case X = O(n), and by the previous lemma the probability that there are more than 2 * log n levels is at most 1 / n, so these cases contribute at most O(1) to the expected value. So, we may assume that there are at most 2 * log n levels. At each level one vertical step is performed, going to the next lower level. So, the vertical steps contribute at most 2 * log n. It remains to show that the expected number of horizontal steps on any given level is O(1). Let L_i, i >= 1, denote the number of horizontal steps on level i, going from a node at level i to the next node on level i. Because Exp[sum_{i >= 1} L_i] = sum_{i >= 1} Exp[L_i] by the linearity of expectation, it suffices to prove that there is a constant c with Exp[L_i] <= c for all i >= 1. For proving this, it is handy to view the construction of the skip list in a different way. Instead of choosing the level of each node independently, we may view it as a repeated selection of subsets: at level 1 the set is S_1 = S, containing all elements. In general, each of the elements in the subset S_i, i >= 1, is also present in S_{i + 1} independently of the others with probability 1 / 2. This gives the same distribution as before. If the search moves through k nodes at level i, then none of the elements corresponding to these nodes was promoted to S_{i + 1}, while the element in position k + 1 was promoted. The probability of this event is 1 / 2^{k + 1}. So, the expected number of elements visited at level i is 0 * 1/2 + 1 * 1/4 + 2 * 1/8 + ... = 1. End.
A nice feature of skip lists is that memory usage can be traded against performance. Choosing the probability that a level-j node is created smaller, for example 3^{-j}, the memory usage goes down. At the same time the expected access time goes up. A generalization of the above proof shows that when choosing a node to be at level i with probability a^{-j}, a >= 2, the average time for a search is proportional to (a - 1) / log a * log n with an expected memory usage proportional to a / (a - 1) * n. Taking a = 4, the searches become about 50% more expensive, but the memory usage drops from 2 * n to 4 / 3 * n.
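The trade-off is easy to tabulate. The following hypothetical helper uses logarithms to base 2, so that the baseline a = 2 gives search factor 1:

```java
// Sketch: relative search cost (a - 1) / log2(a) and memory factor
// a / (a - 1) when a node is promoted to the next level with
// probability 1 / a.
class TradeOff {
    static double searchFactor(double a) {
        return (a - 1) / (Math.log(a) / Math.log(2));
    }
    static double memoryFactor(double a) {
        return a / (a - 1);
    }
}
```

For a = 4 the search factor is 3 / 2 and the memory factor 4 / 3, reproducing the numbers quoted above.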
class Node {
    int key;
    int level;
    Node[] next;

    public Node(int key, int level) {
        this.key = key;
        this.level = level;
        next = new Node[level];
    }
}

So, a node has a size of level + 3 words. The above analysis has shown that the expected level equals 2, so the expected size of a node is 5 words, including the key. This is not much, but it is no better than an AVL tree.
The operations are indeed really simple. The method find is comparable in complexity to a find on a binary tree. At any point of the search there are three possible actions: go one level down in the same node; go to the next node, found by following the next-link at the current level; or quit.
public boolean find(int x) {
    // Checks whether there is a node with key x.
    Node node = head;
    int level = maxLevel - 1;
    while (level >= 0) {
        while (node.next[level].key < x) node = node.next[level];
        if (node.next[level].key > x) level--;
        else return true;
    }
    return false;
}
Inserts and deletes are simpler than operations on balanced binary trees because there are no restructurings. So, the cost of an insert or delete is equal to that of a find, except for some assignments. There are two assignments for every level of the node to insert or delete, so the expected number of assignments is 4 per insertion. This is almost negligible in comparison to the cost of testing and following the links.
public void insert(int x) {
    // Inserts a node with key x.
    // Does not test whether key x exists.
    int l = randomLevel();
    Node node = head;
    Node nwnd = new Node(x, l);
    if (l > maxLevel) maxLevel = l;
    int level = maxLevel - 1;
    while (level >= 0) {
        while (node.next[level].key < x) node = node.next[level];
        if (level < l) {
            nwnd.next[level] = node.next[level];
            node.next[level] = nwnd;
        }
        level--;
    }
}

public void delete(int x) {
    // Deletes a node with key x.
    // Correct when there is no such node.
    // Each key should occur at most once.
    Node node = head;
    int level = maxLevel - 1;
    while (level >= 0) {
        while (node.next[level].key < x) node = node.next[level];
        if (node.next[level].key > x) level--;
        else node.next[level] = node.next[level].next[level];
    }
}
This implementation is rather slow. The main reason appears to be the many indirections: in every step the key is compared with the key of a successor node, rather than with the key of the current node as in AVL trees. Skip lists also have some very nice features which are harder to realize with other dictionary implementations. Because at the bottom level the structure is simply a linear list, it is trivial to perform a range query in expected time O(log n + m), where n is the number of entries in the structure and m the number of elements in the range. For the same reason it is also no problem to have elements with the same key.
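A range query only needs the bottom-level list. The following sketch isolates that part: a plain sorted linked list (the hypothetical ListNode) stands in for level 0; in the real structure the first step would use the skip links to reach the start of the range in expected O(log n) instead of walking linearly.

```java
// Sketch: reporting all keys in [lo, hi] by walking a sorted,
// singly linked bottom level.
class ListNode {
    int key;
    ListNode next;   // the bottom-level (level-0) link
    ListNode(int key) { this.key = key; }
}

class RangeQuery {
    static java.util.List<Integer> range(ListNode head, int lo, int hi) {
        java.util.List<Integer> result = new java.util.ArrayList<>();
        ListNode node = head;
        while (node != null && node.key < lo) node = node.next;   // find range start
        while (node != null && node.key <= hi) {                  // report until past hi
            result.add(node.key);
            node = node.next;
        }
        return result;
    }
}
```

The reporting phase clearly costs one step per reported element, which gives the m term of the bound.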
Skip lists offer a conceptually simple dictionary data structure. Their time and memory consumption are comparable to those of more sophisticated data structures such as AVL trees.