BAM
Abstract Machine for Bottom-Up Evaluation with the Push Method
#include <set_e.h>
Public Member Functions
    set_e_c (str_t set_e_name)
    bool insert (T elem)
    bool contains (T elem) const
    int hash_size () const
    int num_buckets () const
    int num_overflow () const
    int size_value (relsize_t relsize) const
    str_t check () const
    void dump (str_t headline=STR_NULL) const

Public Member Functions inherited from rel_c
    str_t name () const
    bool mem_err () const
    int num_rows () const
    int num_cols () const
    int bound_args ()
    int free_args ()
    str_t impl_name ()
    int num_pages () const
    void dump (str_t headline=STR_NULL) const

Additional Inherited Members

Protected Member Functions inherited from rel_c
    rel_c (str_t rel_name, int bound, int free, str_t impl)

Protected Attributes inherited from rel_c
    mpool_c MemPool
    str_t Name
    bool MemErr
    int NumRows
    int BoundArgs
    int FreeArgs
    str_t ImplName
Implementation of a set data structure with an extendible hash table.
This is an implementation of a set data structure, i.e. a relation that is accessed with the binding pattern "bb...b" (all arguments bound). The set data structure mainly offers an insert method for inserting a tuple into the relation, and a contains method for checking whether a given tuple is contained in the set. The insert method also performs an element check: it returns false if the tuple cannot be inserted because it is already contained in the set, and true if the insertion was successful. There is no method for deleting elements.
This implementation uses an extendible hash table.
Note that any reliable implementation must handle special cases, such as several tuples hashing to the same value (more than fit into one bucket), or a hash table that cannot be extended further. Our implementation does have a strict upper limit on the hash table size; under bad circumstances, it uses a chain of overflow buckets. Overflow buckets are used only as a last resort, when the hash table has already grown to its maximal size and no further splitting of the bucket is possible. Therefore, once a hash table entry contains a chain of overflow buckets, we do not need to program any further splitting of it, but simply add more buckets to the chain; we are out of luck in this case anyway. On the other hand, we do have to split the same bucket iteratively if a single split does not yield a better distribution of the elements (this should be a very uncommon case, though).
The alternative would have been to split only once per insertion, and to use an overflow bucket if that does not suffice. When the next element is inserted, we try again to split. With this strategy, overflow chains could exist temporarily and be removed again after a later insertion. This is NOT the way we do it here.
Note that in both cases empty buckets are possible if the split does not lead to a redistribution of entries. However, only a logarithmic number of them can be produced in a single insertion. We cannot simply remove the empty bucket and replace it by a null pointer because the information about the local bit depth is important.
Of course, all of these are border cases, which occur seldom. However, the code has to handle them.