
Bottom-Up Evaluation with the "Push Method"

C++ Source Code for Library and Benchmarks

This is the version from May 27, 2016. It manages tests and benchmarks more uniformly (a few tests from main.cpp still have to be migrated). The new version prints a help text if called with the option "-h" and a list of available benchmarks if called with the option "-l". It also gives much more control over the amount of output.

The DBLP Benchmark Version 2 is the first with a cursor interface, which makes the answers actually accessible to the caller (previously, we only counted the answers within the generator). (The other systems also use anonymous variables in the query.) Interestingly, the runtime has not changed significantly. The same holds for the transitive closure benchmark tc(X,Y): there is no significant runtime difference between the old version (static method with local variables) and the new version (object with cursor interface for the query results, with all variables in attributes). For the transitive closure benchmark, there is also no significant runtime difference between the hand-crafted loader and the general, table-based loader, so we will soon eliminate the hand-crafted loaders. We are still running some more performance checks with the old relation data structures before eliminating them as well.
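
To illustrate the difference between counting answers inside the generator and handing them out through a cursor, here is a minimal sketch. All names (Edge, transitive_closure, TcCursor, fetch) are hypothetical and not taken from the ydb sources. The closure is computed eagerly with a simple worklist, which is an assumption made for brevity and not the push method itself.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical illustration only: these names do not come from the ydb sources.
using Edge = std::pair<int, int>;

// Computes tc(X,Y) bottom-up for the rules
//   tc(X,Y) :- par(X,Y).
//   tc(X,Z) :- par(X,Y), tc(Y,Z).
// The result vector doubles as the worklist of facts still to be processed.
std::vector<Edge> transitive_closure(const std::vector<Edge>& par) {
    std::unordered_map<int, std::vector<int>> parents_of;  // Y -> all X with par(X,Y)
    for (const Edge& e : par) parents_of[e.second].push_back(e.first);

    std::set<Edge> seen;      // duplicate check
    std::vector<Edge> facts;  // derived tc facts in derivation order
    for (const Edge& e : par)
        if (seen.insert(e).second) facts.push_back(e);

    for (std::size_t i = 0; i < facts.size(); ++i) {
        const Edge f = facts[i];  // copy: push_back below may reallocate
        auto it = parents_of.find(f.first);
        if (it == parents_of.end()) continue;
        for (int x : it->second) {
            const Edge derived(x, f.second);
            if (seen.insert(derived).second) facts.push_back(derived);
        }
    }
    return facts;
}

// Cursor-style access: the caller fetches one answer at a time and reads
// the variable bindings from the object instead of only getting a count.
class TcCursor {
public:
    explicit TcCursor(const std::vector<Edge>& par)
        : answers_(transitive_closure(par)), next_(0), valid_(false) {}

    // Advances to the next answer; returns false when there are no more.
    bool fetch() {
        valid_ = next_ < answers_.size();
        if (valid_) ++next_;
        return valid_;
    }
    int x() const { assert(valid_); return answers_[next_ - 1].first; }
    int y() const { assert(valid_); return answers_[next_ - 1].second; }

private:
    std::vector<Edge> answers_;
    std::size_t next_;
    bool valid_;
};
```

A caller would loop with `while (cur.fetch()) { ... cur.x() ... cur.y() ... }`. Merely counting the answers corresponds to taking the size of the result of `transitive_closure`.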

Note: This is only a research prototype under construction. It is not intended for real applications where bugs could do harm.

tar-Archive:
[cpp.tar]

Project Management:

Makefile (Project Build File):
[Makefile]
Shell Script used for "make depend" (edits the Makefile to bring the dependency list up to date):
[makedep]
Project Files for Microsoft Visual Studio:
[ydb.sln] [ydb.vcxproj] [ydb.vcxproj.filters]

Basic Definitions and Programming Support:

Version Settings:
[ver.h]
String Data Type:
[str.h] [str.cpp]
Error Messages:
[err.h] [err.cpp]
Assertions:
[check.h] [check.cpp]
Minimum and Maximum:
[min.h]
Macros for Rounded Integer Division and Percent:
[idiv.h]
Timer (Performance Measurements):
[perf.h] [perf.cpp]

Memory Management:

Memory Page:
[page.h]
Big Pieces of Dynamic Memory, from which Memory Pages are taken:
[mem.h] [mem.cpp]
Memory Pool (keeps Track of Allocated Memory Pages of this Pool, so they can be Deallocated in a Destructor):
[mpool.h] [mpool.cpp]
String Memory (Storage Space for Strings, Permits Copying/Saving Strings):
[strmem.h] [strmem.cpp]
Stack (Simple Stack Implementation, Class Template):
[stack.h]
Flexible Array (List with Index Access, Class Template):
[flexarr.h]
String Table (Assigning Unique Numbers to Strings):
[strtab.h] [strtab.cpp]
Atoms (Short Strings, Symbolic Constants):
[atom.h] [atom.cpp]
Atom Table (Stores a Set of Atoms, Similar to Enumeration Type):
[atomtab.h] [atomtab.cpp]
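
The idea behind the string table listed above (assigning unique numbers to strings) can be sketched as follows. This is only an illustration built on standard containers. The class name StringTable and its methods are hypothetical, and the actual strtab.h presumably uses the project's own memory management.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch, not the interface of strtab.h.
class StringTable {
public:
    // Returns the number of the string, assigning the next free number
    // (0, 1, 2, ...) if the string has not been seen before.
    int intern(const std::string& s) {
        auto it = ids_.find(s);
        if (it != ids_.end()) return it->second;
        const int id = static_cast<int>(strings_.size());
        strings_.push_back(s);
        ids_.emplace(s, id);
        return id;
    }

    // Maps a number back to its string.
    const std::string& lookup(int id) const {
        assert(id >= 0 && static_cast<std::size_t>(id) < strings_.size());
        return strings_[static_cast<std::size_t>(id)];
    }

    std::size_t size() const { return strings_.size(); }

private:
    std::unordered_map<std::string, int> ids_;  // string -> number
    std::vector<std::string> strings_;          // number -> string
};
```

Once strings are replaced by such numbers, relations over strings can be stored with the same integer-column row types as relations over numbers.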

Main Memory Relations:

Row Type (One Integer Column):
[row_1.h] [row_1.cpp]
Row Type (Two Integer Columns):
[row_2.h] [row_2.cpp]
Binding Patterns:
[bind.h] [bind.cpp]
Superclass of all Data Structures for Relations:
[rel.h] [rel.cpp]
List (Class Template):
[list.h]
List with One Integer Column:
[list_1.h] [list_1.cpp]
List with Two Integer Columns:
[list_2.h] [list_2.cpp]
Cursor for General Lists (C++ Template):
[cur_list.h]
Cursor for Lists with One Integer Column:
[cur_1.h] [cur_1.cpp]
Cursor for Lists with Two Integer Columns:
[cur_2.h] [cur_2.cpp]
Old Version of a List (Relation with Two Integer Columns) - will be removed:
[rel_ii.h] [rel_ii.cpp]
Old Version of a List Cursor (Cursor for Relation with Two Integer Columns) - will be removed:
[cur_ii.h] [cur_ii.cpp]
General Set (Dynamic Hash Table, Class Template):
[set.h] [set.cpp]
Set of Rows with One Integer Column:
[set_1.h] [set_1.cpp]
Set of Rows with Two Integer Columns:
[set_2.h] [set_2.cpp]
Old Set Implementation with Simple Hashtable (Duplicate Check for Rows with One Integer Column):
[dup_i.h] [dup_i.cpp]
Old Set Implementation with Simple Hashtable (Duplicate Check for Rows with Two Integer Columns):
[dup_2i.h] [dup_2i.cpp]
Set of Rows with Two Integer Columns and Small Domain 0..1023 (10 Bits), implemented with Bitmaps:
[set_tt.h] [set_tt.cpp]
Map from Natural Number to Natural Number (implemented with flexible array):
[rel_n_n.h] [rel_n_n.cpp]
Cursor for Map from Natural Number to Natural Number:
[cur_n_n.h] [cur_n_n.cpp]
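
For the bitmap-based set of rows with two columns over the domain 0..1023, a pair (a, b) fits into 20 bits, so one bit per possible pair needs 1024 * 1024 bits = 128 KiB. The following sketch shows this encoding. The names kDomain and PairBitmapSet are hypothetical, and the layout is an assumption, not necessarily the one used in set_tt.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch, not the interface of set_tt.h.
const unsigned kDomain = 1024;  // values 0..1023, i.e. 10 bits per column

class PairBitmapSet {
public:
    PairBitmapSet() : bits_(kDomain * kDomain / 64, 0) {}

    // Inserts (a, b); returns true if the pair was new, false if it was
    // already present (so it can serve as a duplicate check).
    bool insert(unsigned a, unsigned b) {
        const std::size_t pos = position(a, b);
        const std::uint64_t mask = std::uint64_t(1) << (pos % 64);
        std::uint64_t& word = bits_[pos / 64];
        if (word & mask) return false;
        word |= mask;
        return true;
    }

    bool contains(unsigned a, unsigned b) const {
        const std::size_t pos = position(a, b);
        return (bits_[pos / 64] >> (pos % 64)) & 1u;
    }

private:
    // Concatenates the two 10-bit values into one 20-bit bit index.
    static std::size_t position(unsigned a, unsigned b) {
        assert(a < kDomain && b < kDomain);
        return (static_cast<std::size_t>(a) << 10) | b;
    }

    std::vector<std::uint64_t> bits_;
};
```

Insertion and membership test take constant time without hashing, at the price of a fixed memory footprint that only pays off for small domains.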

Input, Scanner, Data Loader:

Superclass of Input Sources:
[in.h]
Input from File:
[in_file.h] [in_file.cpp]
Input from Keyboard with Prompt (currently not used):
[in_kbd.h] [in_kbd.cpp]
Input from Web Form via CGI (currently not used):
[in_cgi.h] [in_cgi.cpp]
Character Classification:
[ch.h] [ch.cpp]
Tokens / Word Symbol Types (Enumeration Type):
[tok.h] [tok.cpp]
Syntax Errors (Enumeration Type):
[syn.h] [syn.cpp]
Lexical Scanner:
[lex.h] [lex.cpp]
Argument Types of Predicates:
[argtype.h] [argtype.cpp]
Predicate Declarations for Table-Based Loader:
[pred.h] [pred.cpp]
Table-Based Data Loader:
[load.h] [load.cpp]

Test Support I:

Selective Output for Tests:
[out.h] [out.cpp]
Abstract Superclass of Standard Benchmarks:
[bench.h] [bench.cpp]

Performance Tests (Benchmarks):

Old Hand-Written Data Loader for DBLP Data:
[load_dblp.h] [load_dblp.cpp]
DBLP Benchmark, Version 1 (Handwritten loader, static method, relations in attributes, only counts results):
[bench_dblp_1.h] [bench_dblp_1.cpp]
DBLP Benchmark, Version 2 (Handwritten loader, relations etc. in attributes, Object with cursor interface):
[bench_dblp_2.h] [bench_dblp_2.cpp]
Data Loader for Transitive Closure Benchmark (with Binding Pattern "ff"):
[load_tc.h] [load_tc.cpp]
Transitive Closure Benchmark tc(X,Y) (Binding Pattern "ff"), Version 1 (handcrafted loader, static function with relations etc. in local variables):
[bench_tcff_1.h] [bench_tcff_1.cpp]
Transitive Closure Benchmark tc(X,Y) (Binding Pattern "ff"), Version 2 (with general loader, relations etc. in attributes, and standard cursor interface to retrieve answers):
[bench_tcff_2.h] [bench_tcff_2.cpp]
Data Loader for Transitive Closure Benchmark (with Binding Pattern "bf"):
[load_tcbf.h] [load_tcbf.cpp]
Transitive Closure Benchmark (with Binding Pattern "bf"), Version 1:
[bench_tcbf_1.h] [bench_tcbf_1.cpp]
Join 1 Benchmark (with Binding Pattern "ff"), Ver. 1 (Relations in Attributes):
[bench_j1axy_1.h] [bench_j1axy_1.cpp]
Join 1 Benchmark (with Binding Pattern "ff"), Ver. 2 (Without goto):
[bench_j1axy_2.h] [bench_j1axy_2.cpp]
Join 1 Benchmark (with Binding Pattern "ff"), Ver. 3 (Static Method):
[bench_j1axy_3.h] [bench_j1axy_3.cpp]
Join 1 Benchmark (with Binding Pattern "ff"), Ver. 4 (With Duplicate Test for b1, b2, c1):
[bench_j1axy_4.h] [bench_j1axy_4.cpp]
Join 1 Benchmark (with Binding Pattern "ff"), Ver. 5 (Without goto, With Complete Duplicate Check):
[bench_j1axy_5.h] [bench_j1axy_5.cpp]
Join 1 Benchmark (with Binding Pattern "ff"), Ver. 6 (Without goto, With Complete Duplicate Check Using Bitmaps):
[bench_j1axy_6.h] [bench_j1axy_6.cpp]
Wine Ontology Benchmark:
[bench_wine.h] [bench_wine.cpp]

Test Support II and Main Program:

Test Execution:
[test.h] [test.cpp]
Main Program:
[main.h] [main.cpp]

Installation:

Unpack the archive with the source files and just type "make". This should generate an executable program called "ydb". The compilation should also work with Microsoft Visual Studio 2015, Express Edition. If the compilation does not work, please contact brass@informatik.uni-halle.de. The makefile also provides the following targets:

make clean: This removes the object files and several other temporary files.
make depend: This refreshes the dependencies of object files from header files in the Makefile.

Data Files:

The data files are taken from the OpenRuleBench Benchmark Suite, see OpenRuleBench Download. Our test program expects the necessary data files in a subdirectory "data" of the directory in which the program "ydb" is executed. Since some of the data files are randomly generated by OpenRuleBench scripts, we publish them here (to make everything reproducible). You will also need "dblp.data" (in the directory "large-joins/data" of the OpenRuleBench archive) and "wine_data.P" (in the directory "recursion/data"). All files are in the XSB syntax (standard Datalog facts), which our loader can read.

For the DBLP Test:
- Small Test File (to check correctness):
  [dblp_test.data]
- For the real file (122 MB) please use the OpenRuleBench Download.
For the Transitive Closure Tests:
- Small Test File (to check correctness):
  [tc_test.P]
- Graph with 1000 nodes and 50,000 edges, should not have cycles (but may have some):
  [tc_d1000_parsize50000_xsb_nocyc.P]
- Graph with 1000 nodes and 50,000 edges, with cycles:
  [tc_d1000_parsize50000_xsb_cyc.P]
- Graph with 1000 nodes and 250,000 edges, should not have cycles (but may have some):
  [tc_d1000_parsize250000_xsb_nocyc.P]
- Graph with 1000 nodes and 250,000 edges, with cycles:
  [tc_d1000_parsize250000_xsb_cyc.P]
- Graph with 1000 nodes and 500,000 edges, should not have cycles (but may have some):
  [tc_d1000_parsize500000_xsb_nocyc.P]
- Graph with 1000 nodes and 500,000 edges, with cycles:
  [tc_d1000_parsize500000_xsb_cyc.P]
- Graph with 2000 nodes and 500,000 edges, should not have cycles (but may have some):
  [tc_d2000_parsize500000_xsb_nocyc.P]
- Graph with 2000 nodes and 500,000 edges, with cycles:
  [tc_d2000_parsize500000_xsb_cyc.P]
- Graph with 2000 nodes and 1,000,000 edges, should not have cycles (but may have some):
  [tc_d2000_parsize1000000_xsb_nocyc.P]
- Graph with 2000 nodes and 1,000,000 edges, with cycles:
  [tc_d2000_parsize1000000_xsb_cyc.P]
For the Join 1 Test:
- 10,000 rows per predicate (i.e. 50,000 rows in total), Domain with 1000 values: [d1000_relsize10000_xsb_cyc.P]
- 50,000 rows per predicate (i.e. 250,000 rows in total), Domain with 1000 values: [d1000_relsize50000_xsb_cyc.P]
- 250,000 rows per predicate (i.e. 1.25 million rows in total), Domain with 1000 values: [d1000_relsize250000_xsb_cyc.P]
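
Since the data files consist of plain Datalog facts, reading a fact with two integer arguments needs only a very small scanner. The following sketch parses a single line of the assumed form name(int,int). and is meant purely as an illustration. The names Fact2 and parse_fact2 are hypothetical, the predicate name used in the usage note is an assumption, and the real loader (lex.h, load.h) is table-based and handles more than this.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical sketch, not the interface of the ydb loader.
struct Fact2 {
    std::string pred;  // predicate name
    int arg1;
    int arg2;
};

// Parses one fact of the assumed form  name(int,int).
// Whitespace between tokens is allowed. Returns false on malformed input;
// in that case `out` may be partially filled. Integer overflow is not checked.
bool parse_fact2(const std::string& line, Fact2& out) {
    std::size_t i = 0;
    auto skip_ws = [&]() {
        while (i < line.size() && std::isspace(static_cast<unsigned char>(line[i]))) ++i;
    };
    auto expect = [&](char c) -> bool {
        skip_ws();
        if (i < line.size() && line[i] == c) { ++i; return true; }
        return false;
    };
    auto read_int = [&](int& value) -> bool {
        skip_ws();
        const std::size_t start = i;
        long v = 0;
        while (i < line.size() && std::isdigit(static_cast<unsigned char>(line[i]))) {
            v = v * 10 + (line[i] - '0');
            ++i;
        }
        if (i == start) return false;  // no digits found
        value = static_cast<int>(v);
        return true;
    };

    // Predicate name: letters, digits, and underscores.
    skip_ws();
    const std::size_t start = i;
    while (i < line.size() &&
           (std::isalnum(static_cast<unsigned char>(line[i])) || line[i] == '_')) ++i;
    if (i == start) return false;
    out.pred = line.substr(start, i - start);

    return expect('(') && read_int(out.arg1) && expect(',') &&
           read_int(out.arg2) && expect(')') && expect('.');
}
```

For example, a line such as `par(12, 7).` would yield the predicate name `par` with the arguments 12 and 7, while a line without the final period is rejected.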

Using the Program:

A number of tests and benchmarks are compiled into the program. The test or benchmark to run is selected with a command line argument. A list of tests is printed if the program is called with the option -l. The most important benchmarks are:

ydb dblp_2:
DBLP Benchmark (Ver. 2, Implemented with Cursor Interface).
ydb tcff_1_f2:
Transitive Closure Benchmark with Binding Pattern "ff",
Implementation 1, Data File 2: tc_d1000_parsize50000_xsb_cyc.P.
ydb -a30 -d tcff_2_test:
Transitive Closure Benchmark with Binding Pattern "ff",
Implementation 2, Debug output with a trace, showing at most 30 answers (there are 20).
ydb tcbf_1_f2:
Transitive Closure Benchmark with Binding Pattern "bf",
Implementation 1, Data File 2: tc_d1000_parsize50000_xsb_cyc.P.
ydb wine:
Wine Ontology Benchmark.
ydb j1axy_6_10k:
Join 1, a(X,Y) Benchmark, Implementation 6 (with duplicate test using bitmaps),
with 10,000 Rows per Predicate.

The program is called in the form ydb <Options> <Test/Benchmark IDs>. It understands the following options:

-a: Switch off printing of answers (by default, the first 10 answers are shown).
-aN: Set the limit for the number of answers to print to N (e.g. -a1000).
-d: Switch debug output on. The amount of debug output varies by test (it was manually added).
-h: Print help.
-l: Print list of tests/benchmarks (ID and short description).
-m: Print memory pools.
-r: Print relations.
-s: Print only the summary line ("silent mode"). The summary line contains the test ID, times in milliseconds (load time, eval time, and total time in case of benchmarks), and the test result (status code). The test result "Warn" usually means that the difference between real time (wall-clock time) and CPU time was quite large. The summary line contains the greater of the two times.
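
The rule for the summary line (report the greater of wall-clock and CPU time, and warn if the two differ a lot) can be sketched as follows. This is only an illustration with standard library timers. The names Timing, measure, reported_ms, and status are hypothetical, the tolerance parameter and the status text "OK" are assumptions, and perf.h may measure time differently.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <ctime>
#include <string>

// Hypothetical sketch, not the interface of perf.h.
struct Timing {
    long wall_ms;  // real (wall-clock) time in milliseconds
    long cpu_ms;   // CPU time in milliseconds
};

// Runs `work` once and measures both wall-clock time and CPU time.
// Note: std::clock reports processor time on POSIX systems, but some
// platforms (e.g. the Microsoft C runtime) return elapsed wall-clock time.
template <typename Work>
Timing measure(Work work) {
    const auto wall_start = std::chrono::steady_clock::now();
    const std::clock_t cpu_start = std::clock();
    work();
    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.wall_ms = static_cast<long>(
        std::chrono::duration_cast<std::chrono::milliseconds>(wall_end - wall_start).count());
    t.cpu_ms = static_cast<long>(
        1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC);
    return t;
}

// The time shown in the summary line: the greater of the two measurements.
long reported_ms(const Timing& t) { return std::max(t.wall_ms, t.cpu_ms); }

// "Warn" if wall-clock and CPU time differ by more than the tolerance
// (the threshold is an assumption; the source does not state one).
std::string status(const Timing& t, long tolerance_ms) {
    assert(tolerance_ms >= 0);
    return std::labs(t.wall_ms - t.cpu_ms) > tolerance_ms ? "Warn" : "OK";
}
```

A large gap between the two times typically indicates that the process was waiting (for I/O or for other processes on a loaded machine), so the measurement should be repeated.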

Note: The current version writes a file "alert" with error messages and object dumps (in debug mode). I will soon add an option to suppress this. For the time being, one needs write permission for the directory in which one executes ydb because of this error log file (which is opened even if there are no errors).

Stefan Brass (brass@informatik.uni-halle.de), May 27, 2016

Original URL: http://www.informatik.uni-halle.de/~brass/push/cpp.html

MARTIN-LUTHER-UNIVERSITÄT HALLE-WITTENBERG	RESEARCH PROJECT
Institut für Informatik	Deductive Databases
Prof. Dr. Stefan Brass	Push Method