Bottom-Up Evaluation with the "Push Method"
C++ Source Code for Library
and Benchmarks
This is the version from May 27.
It has a more uniform management of tests
and benchmarks
(a few tests from main.cpp still have to be migrated).
The new version prints some help if called with the option "-h"
and a list of available benchmarks if called with the option "-l".
It permits much more control on the quantity of output.
The DBLP Benchmark Version 2 is the first with
a cursor interface,
which makes answers really accessible
(previously, we only counted the answers within the generator).
(The other systems also use anonymous variables in the query.)
Interestingly, the runtime has not significantly changed.
The same is true also for the transitive closure benchmark
tc(X,Y):
There is also no significant runtime difference
between the old version
(static method with local variables)
and the new version
(object with cursor interface for the query results,
with all variables in attribute).
In case of the transitive closure benchmark,
there is also no significant runtime difference
between the hand-crafted loader and the general, table-based loader.
So we will soon eliminate the hand-crafted loaders.
We still do some more performance checks with the old
relation data structures before eliminating them.
Note: This is only a research prototype under construction.
It is not intended for real applications
where bugs could do harm.
Project Management:
Basic Definitions and Programming Support:
Memory Management:
- Memory Page:
[page.h]
- Big Pieces of Dynamic Memory, from which Memory Pages are taken:
[mem.h]
[mem.cpp]
- Memory Pool (keeps Track of Allocated Memory Pages of this Pool,
so they can be Deallocated in a Destructor):
[mpool.h]
[mpool.cpp]
- String Memory (Storage Space for Strings,
Permits to Copy/Save Strings):
[strmem.h]
[strmem.cpp]
- Stack (Simple Stack Implementation, Classe Template):
[stack.h]
- Flexible Array (List with Index Access, Class Template):
[flexarr.h]
- String Table (Assigning Unique Numbers to Strings):
[strtab.h]
[strtab.cpp]
- Atoms (Short Strings, Symbolic Constants):
[atom.h]
[atom.cpp]
- Atom Table (Stores a Set of Atoms, Similar to Enumeration Type):
[atomtab.h]
[atomtab.cpp]
Main Memory Relations:
Input, Scanner, Data Loader:
Test Support I:
Performance Tests (Benchmarks):
- Old Hand-Written Data Loader for DBLP Data:
[load_dblp.h]
[load_dblp.cpp]
- DBLP Benchmark, Version 1 (Handwritten loader, static method,
relations in attributes,
only counts results):
[bench_dblp_1.h]
[bench_dblp_1.cpp]
- DBLP Benchmark, Version 2 (Handwritten loader,
relations etc. in attributes,
Object with cursor interface):
[bench_dblp_1.h]
[bench_dblp_1.cpp]
- Data Loader for Transitive Closure Benchmark
(with Binding Pattern "ff"):
[load_tc.h]
[load_tc.cpp]
- Transitive Closure Benchmark tc(X,Y) (Binding Pattern "ff"),
Version 1 (handcrafted loader, static function with
relations etc. in local variables):
[bench_tcff_1.h]
[bench_tcff_1.cpp]
- Transitive Closure Benchmark tc(X,Y) (Binding Pattern "ff"),
Version 2 (with general loader, relations etc. in attributes,
and standard cursor interface to retrieve answers):
[bench_tcff_2.h]
[bench_tcff_2.cpp]
- Data Loader for Transitive Closure Benchmark
(with Binding Pattern "bf"):
[load_tcbf.h]
[load_tcbf.cpp]
- Transitive Closure Benchmark
(with Binding Pattern "bf"),
Version 1:
[bench_tcbf_1.h]
[bench_tcbf_1.cpp]
- Join 1 Benchmark (with Binding Pattern "ff"),
Ver. 1 (Relations in Attributes):
[bench_j1axy_1.h]
[bench_j1axy_1.cpp]
- Join 1 Benchmark (with Binding Pattern "ff"),
Ver. 2 (Without goto):
[bench_j1axy_2.h]
[bench_j1axy_2.cpp]
- Join 1 Benchmark (with Binding Pattern "ff"),
Ver. 3 (Static Method):
[bench_j1axy_3.h]
[bench_j1axy_3.cpp]
- Join 1 Benchmark (with Binding Pattern "ff"),
Ver. 4 (With Duplicate Test for b1, b2, c1):
[bench_j1axy_4.h]
[bench_j1axy_4.cpp]
- Join 1 Benchmark (with Binding Pattern "ff"),
Ver. 5 (Without goto, With Complete Duplicate Check):
[bench_j1axy_5.h]
[bench_j1axy_5.cpp]
- Join 1 Benchmark (with Binding Pattern "ff"),
Ver. 6
(Without goto, With Complete Duplicate Check Using Bitmaps):
[bench_j1axy_6.h]
[bench_j1axy_6.cpp]
- Wine Ontology Benchmark:
[bench_wine.h]
[bench_wine.cpp]
Test Support II and Main Program:
Installation:
Unpack the archive with the source files
and just type "make
".
This should generate an executable program called
"ydb
".
The compilation should also work with
Microsoft Visual Studio 2015, Express Edition.
If the compilation does not work,
please contact
brass@informatik.uni-halle.de.
The makefile also knows about the following goals:
make clean
:
This removes the object files and several other temporary files.
make depend
:
This refreshes the dependencies of object files from header files
in the Makefile.
Data Files:
The data files are taken from the OpenRuleBench
Benchmark Suite, see
OpenRuleBench Download.
Our test program expects that the necessary data files
are available in a subdirectory "data
"
of the directory, in which the program "ydb
" is executed.
Since some of the data files
are randomly generated by OpenRuleBench scripts,
we publish them here
(to make everything reproducible).
You will also need "dblp.data
"
(in the directory "large-joins/data
"
of the OpenRuleBench archive)
and "wine_data.P
"
(in the directory "recursion/data
").
All files are in the XSB syntax (standard Datalog facts),
which our loader can read.
- For the DBLP Test:
- For the Transitive Closure Tests:
- Small Test File
(to check correctness):
[tc_test.P]
- Graph with 1000 nodes and 50.000 edges,
should not have cycles (but may have some):
[tc_d1000_parsize50000_xsb_nocyc.P]
- Graph with 1000 nodes and 50.000 edges,
with cycles:
[tc_d1000_parsize50000_xsb_cyc.P]
- Graph with 1000 nodes and 250.000 edges,
should not have cycles (but may have some):
[tc_d1000_parsize250000_xsb_nocyc.P]
- Graph with 1000 nodes and 250.000 edges,
with cycles:
[tc_d1000_parsize250000_xsb_cyc.P]
- Graph with 1000 nodes and 500.000 edges,
should not have cycles (but may have some):
[tc_d1000_parsize500000_xsb_nocyc.P]
- Graph with 1000 nodes and 500.000 edges,
with cycles:
[tc_d1000_parsize500000_xsb_cyc.P]
- Graph with 2000 nodes and 500.000 edges,
should not have cycles (but may have some):
[tc_d2000_parsize500000_xsb_nocyc.P]
- Graph with 2000 nodes and 500.000 edges,
with cycles:
[tc_d2000_parsize500000_xsb_cyc.P]
- Graph with 2000 nodes and 1.000.000 edges,
should not have cycles (but may have some):
[tc_d2000_parsize1000000_xsb_nocyc.P]
- Graph with 2000 nodes and 1.000.000 edges,
with cycles:
[tc_d2000_parsize1000000_xsb_cyc.P]
- For the Join 1 Test:
Using the Program:
The program has compiled in a number of tests
and benchmarks.
The test/benchmark can be selected with a command line argument.
A list of tests is printed if the program is called
with the option -l
.
The most important benchmarks are:
- ydb dblp_2:
DBLP Benchmark (Ver. 2, Implemented with Cursor Interface).
- ydb tcff_1_f2:
Transitive Closure Benchmark with Binding Pattern "ff",
Implementation 1,
Data File 2:
tc_d1000_parsize50000_xsb_cyc.P.
- ydb -a30 -d tcff_2_test:
Transitive Closure Benchmark with Binding Pattern "ff",
Implementation 2,
Debug output with a trace,
showing at most 30 answers (there are 20).
- ydb tcbf_1_f2:
Transitive Closure Benchmark with Binding Pattern "bf",
Implementation 1,
Data File 2:
tc_d1000_parsize50000_xsb_cyc.P.
- ydb wine:
Wine Ontology Benchmark.
- ydb j1axy_6_10k:
Join 1, a(X,Y) Benchmark,
Implementation 6 (with duplicate test using bitmaps),
with 10.000 Rows per Predicate.
The program is called in the form
ydb <Options> <Test/Benchmark IDs>
.
It understands the following options:
- -a:
Switch off printing of answers
(by default, the first 10 answers are shown).
- -aN:
Set the limit for the number of answers to print
to N (e.g. -a1000).
- -d:
Switch debug output on.
The amount of debug output varies by test
(it was manually added).
- -h:
Print help.
- -l:
Print list of tests/benchmarks (ID and short description).
- -m:
Print memory pools.
- -r:
Print relations.
- -s:
Print only summary line ("silent mode")
The summary line contains the test ID,
times in milliseconds
(load time, eval time, total time in case of benchmarks)
and the test result (status code).
The test result "Warn" usually means that the difference
between real lime (wallclock time) and CPU time was quite big.
The summary line contains the greater of the two times.
Note: The current version writes a file "alert"
with error messages and object dumps (in debug mode).
I will soon add an option to suppress this.
For the time being,
one needs write permission for the directory
in which one executes ydb because of this error log file
(which is opened even if there are no errors).
Stefan Brass
(brass@informatik.uni-halle.de),
May 27, 2016
Original URL:
http://www.informatik.uni-halle.de/~brass/push/cpp.html
[XHTML 1.0]
[CSS]
[Links]
[Legal Info]