BAM
Abstract Machine for Bottom-Up Evaluation with the Push Method
ydb

Introduction

 This is a prototype of a deductive database system,
 i.e. an implementation of the programming and query language Datalog.
 Datalog is a simplified and logically more pure version of
 the programming language Prolog.

Deductive Databases

 So far, most database applications use (at least) two languages:
 SQL for database queries and updates, and a standard programming
 language like Java, C++ or PHP for the programming part.
 In SQL, the declararive programming paradigm has been applied
 very successfully: One only writes logical conditions for the data
 one needs, and the DBMS automatically finds an efficient algorithm
 to compute these data. I.e. one specifies WHAT is searched,
 but not HOW it is computed. There are two main advantages of this:
 (1) Increased productivity: A declarative specification is much
 shorter than an equivalent standard program.
 (2) Better adaption to new hardware: Because the algorithm is
 not prescribed as in a standard program, new hardware developments
 can be used by updating the DBMS, with the queries in the applications
 unchanged.

 The goal of deductive databases is to extend the part of the
 application which can be specified declaratively in this way. 
 Ideally, the entire application is written in a single Datalog-like
 language.

Our Project ydb

 In the ydb project, our approach is to translate Datalog to C++,
 which is then compiled. In this way, we hope to reach a high
 efficiency. Furthermore, the integration with library code
 written in C++ is simplified. We also try to generate readable code,
 so that in the end a comparison with hand-crafted code should be
 possible.

 Our system uses the "Push" method developed by Stefan Brass and
 Heike Stephan. This uses a derived tuple immediately for deriving
 further tuples (as far as possible). In contrast, the standard
 textbook implementation of Datalog first derives all tuples
 which are derivable by a rule, and stores them in a temporary relation,
 before execution moves to the next rule.

 One advantage of the "Push" method is that it avoids copying of
 data values as far as possible. A derived tuple is represented
 by values in C++ variables and a position in the code.
 If from this tuple, another tuple can be derived, which contains
 some of these values (in possibly other argument positions),
 no copying is usually done. Furthermore, constants contained
 in the program rules often do not have to be represented explicitly,
 because we specialize rules in the Datalog-to-C++ compilation
 (this feature can be selectively switched off in case the generated
 program becomes too big).

 Of course, in order to detect duplicates and avoid possible
 non-termination because of the repeated generation of duplicates,
 sometimes generated tuples must be stored.
 Because the only purpose of this is the detection of duplicates,
 a hash table or other data structure for efficient element test
 can be used.
 We also plan to use nested data structures (NF^2 tables)
 in case one variable changes more often than another one:
 The "Push" method can be seen as working with "virtual NF^2 tables",
 and for the duplicate test, these virtual tables must be materialized.

 Another case where intermediate relations must be stored
 are rules containing more than one body literal with a derived
 predicate. In this case, a tuple generated for one such body literal
 must be stored until all possible join partners for the other such
 body literals have been generated. However, a Datalog-to-Datalog
 translation like SLDMagic which is applied before the Datalog-to-C++
 transformation, mainly produces rules of this simple type
 (only one body literal with a derived predicate).