MARTIN-LUTHER-UNIVERSITÄT HALLE-WITTENBERG

Institut für Informatik

Prof. Dr. Stefan Brass

Paper about Datalog with ordered predicates, i.e. predicates which are lists/arrays of facts instead of sets of facts. [Slides of the talk at Datalog 2.0 2012]

The program basically does now what is promised in the paper and on this page, with the following exceptions (features are mentioned in the paper, but not yet implemented):

- The pattern syntax for output. This will be included only in the next version (it is not important for the semantical core of the language, although it is a nice feature for practical applications and should be done in a few weeks).
- More built-in predicates. Currently, only arithmetic comparisons are possible, no computations like successor or sum. This will soon be implemented.
- The default order specification. Again, semantically this is not important, it only permits to make the programs shorter.

But currently the main thing to do is more testing (the program currently consists of 3328 lines of Prolog).

- Current version of the Prolog program [Version 0.1f from May 20]
- Simple Example Input
- Example: Hello, World
- Example: Output of HTML Table
- SWI-Prolog (the program is currently being tested only under SWI-Prolog, more Prologs will be tried later)

The input syntax is Datalog with extensions for ordered predicates. First, Datalog is a simplified and purified variant of Prolog (a logic programming language). Datalog is intended mainly for database applications. A program consists of a set of rules and facts.

A fact consists of a predicate,
followed by arguments in parentheses `(...)`

,
separated by commas.
An argument in a fact is a constant,
i.e.

- an integer like
`123`

, - a string like
`'abc'`

, or - an identifier like
`abc`

.

Identifiers must start with a lowercase letter
to distinguish them from variables,
e.g. `X`

.
Variables are not allowed in facts.
An example for a fact is

`emp('Andrew', 4000, 'Manager').`

This fact consists of the predicate `emp`

and three arguments.
For instance,
the first argument is the string `'Andrew'`

.
A fact corresponds to a tuple/row in a relational database,
and the predicate corresponds to a relation name.
The arguments correspond to the column values.
In standard Datalog,
the columns are identified by position,
not by name as in relational databases.
The contents of a database relation
is specified in Datalog as a set of facts,
for instance:

`% Table emp(EName, Sal, Job).`

`emp('Andrew', 4000, 'Manager').`

`emp('Betty', 3000, 'Programmer').`

`emp('Chris', 3000, 'Programmer').`

`emp('Doris', 2000, 'Clerk').`

`emp('Eddy', 1000, 'Salesman').`

`emp('Fred', 1000, 'Programmer').`

The first line is an example for a comment:
A comment starts with a percent sign and extends to the end of the line.
Note that every fact (and every rule)
must be delimited with a full stop "`.`

".

A rule consists of a head and a body.
The head is a single literal,
where a literal looks like a fact,
but can contain variables as arguments
(the optional order specification is discussed below).
The body consists of several literals,
separated by commas.
A rule states that if the body is true,
the head must be true,
too.
Therefore,
head and body are separated by
`<-`

(in Prolog, one would use `:-`

).
An example for a rule is:

`programmer(X) <- emp(X, Y, 'Programmer').`

This rule defines a derived predicate `programmer`

(corresponding to a view in a relational database).
Whenever the system finds a match for the body literal,
i.e. there is a way to assign values to the variables
to that the body is already known to be true,
the system derives the corresponding instance of the head literal.
From the facts given above and from the rule,
the system automatically derives the following facts:

`programmer('Betty').`

`programmer('Chris').`

`programmer('Fred').`

As in Prolog,
predicates without arguments do not use `()`

:

`it_rains.`

`use_umbrella <- it_rains.`

It is possible to define a derived predicate with several rules. Then all these rules can be used to derive facts about the new predicate. This can be used to express disjunctive conditions (logical "or"). For instance, the following rules define administrative employees as managers or clerks:

`admin_emp(X) <- emp(X, Y, 'Manager').`

`admin_emp(X) <- emp(X, Y, 'Clerk').`

Note that variable names are local to rules:
A variable `X`

in one rule
has no connection to a variable `X`

in another rule.

Derived predicates can be used in rule bodies, and recursive definitions are possible, too. Suppose the following database relation containing direct supervisors is given:

`supervisor('Betty', 'Andrew').`

`supervisor('Chris', 'Betty').`

`supervisor('Doris', 'Andrew').`

`supervisor('Eddy', 'Andrew').`

`supervisor('Fred', 'Betty').`

Now it is possible to define indirect supervisors (the "transitive closure" of the supervisor relation) as follows:

`boss(X, Y) <- supervisor(X, Y).`

`boss(X, Z) <- supervisor(X, Y), boss(Y, Z).`

At the beginning,
only the first non-recursive rule can be applied.
It basically copies the `supervisor`

-facts
to `boss`

-facts.
In the second round,
one already has these `boss`

-facts,
and can use them in the second, recursive rule.
Thus,
one now derives the `boss`

-relations over two hierarchy steps.
In the third round,
one could use the newly derived facts,
but since the company has no more hierarchy levels,
one does not find matching `supervisor`

-facts.
Since no new facts can be derived,
the recursion ends here.

In contrast to Prolog, termination can be guaranteed. Even programs like the following do not cause an infinite recursion:

`p(X) <- p(X).`

`p(a).`

One the recursive rule is applied, the system detects that the derived fact is a duplicate, and eliminates it. When nothing new can be derived, query evaluation stops. Note also that the sequence of facts and rules is not important (also in contrast to Prolog).

Besides predicates defined by facts and rules,
rule bodies can also contain built-in predicates,
which are defined by code inside the system.
At the moment,
our system contains only the comparisons
`<`

,
`<=`

,
`=`

,
`!=`

,
`>=`

,
`>`

.
More arithmetic operations will follow soon.
Of course,
for comparison operators the usual infix syntax is used,
e.g.

`good_salary(Name) <- emp(Name, Sal, Job), Sal > 2500.`

This is an example of a rule with two body literals.
The body literals are always conjunctively connected (logical "and"),
i.e. both body literals must be satisfied
for the rule to "fire",
and derive a fact about `good_salary`

(with the same value for the variable `Name`

which was used to satisfy the body of the rule).
In the example,
the following facts are derived:

`good_salary('Andrew').`

`good_salary('Betty').`

`good_salary('Chris').`

In the current version,
a comparison silently fails,
if the arguments have incompatibel types,
e.g. a string is compared with a number.
Do not rely on this behaviour,
a later version may print an error message.
However,
that is not as easy as it might seem first.
In Datalog,
the order of body literals is not important.
Thus,
it would not be correct to print an error message
as long as another body literal could fail.
The correct way to do this is to use another logic value "error"
besides "true" and "false".
If the body is evaluated to "error",
one can print the message.
This was all too complicated for the first version,
therefore,
the comparison simply returns "false"
if a string and a number is compared
(or if two identfiers/atoms are compared:
there is no defined order on them).
However,
`!=`

("not equals") returns true in such cases.

All variables used in a rule must be bound to a concrete value when the rule is applied. Therefore, every variable must appear in a normal body literal (not with a built-in predicate, not negated). For instance, it is not permissible that a variable is used only in the rule head, but not in the body. Otherwise a single application of the rule could generate an unbounded number of new facts, which would at least be difficult.

In connection with built-in predicates,
there are many cases where the range-restriction rule could be extended,
e.g. a comparison like `X = 1`

obviously binds `X`

to a value.
However,
to simplify the implementation,
such cases are not considered,
and `X`

must in addition appear in normal body literal.
At the moment this is anyway not very important,
but when built-in predicates are added to perform arithmetic computations,
the range-restriction rule will be relaxed.

At the moment,
queries are handled in a simple way:
The derived facts for a special predicate `answer`

are printed at the end.
At the moment,
this is only a small prototype to allow experimentation with the language.
But it will develop over time.
Of course,
in a real system,
one would distinguish between

- the database, which is stored like a standard relational database (not as a set of facts in the program itself),
- the views and stored procedures, i.e. derived predicates defined by rules, which are stored persistently in the system,
- the query, which might need auxillary predicates, defined only for the duration of the single interaction with the database.

At the moment,
all three components are contained in a single input file.
The predicate `answer`

is similar to the function `main`

in C/C++:
It is the top derived predicate,
all other predicates are relevant only
if they are used directly or indirectly by `answer`

.
Probably,
in a future version,
an alternative syntax
(more similar to the goals in Prolog)
will be introduced,
but at the moment,
the special predicate `answer`

suffices.
Anyway,
for the bottom-up evaluation used in deductive databases,
such a solution is very natural.

For printing the arguments of the derived `answer`

-facts,
the variables in the head of the first rule about `answer`

are used.
Thus,
in order to get more readable output,
it is probably advisable
to have normally only one rule about `answer`

,
with only variables as arguments in the rule head.
This is no real restriction,
since one can always use another auxiallary predicate
to first compute the answers.
For a future version,
it is planned to have also a tabular output as in a relational system,
currently these variable bindings are printed.
This is semantically no difference,
the tabular output would only look nicer in most cases.

Body literals may be negated,
which means that the corresponding positive literal
is not derivable.
The negation operator is written `\+`

as in many Prolog systems.
This is similar to the "not derivable" symbol in logic.
E.g.,
the following rule finds the top manager
of the company,
who has no supervisor:

`has_supervisor(X) <- supervisor(X,Y).`

`top_manager(X) <- emp(X, Y, Z), \+ has_supervisor(X).`

The auxillary predicate `has_supervisor`

is necessary to satisfy the range restriction requirement:
Every variable must appear in a positive (i.e. non-negated) body literal
(with a normal, i.e. not built-in predicate).
Actually,
this rule could be relaxed:
It is no problem if a variable appears only
in a single negative body literal,
and nowhere else in the rule.
Probably,
a future version will be more generous
in this respect,
but the current version has only the simple range restriction requirement.

In order to avoid inconsistencies like "p is derivable if p is not derivable", recursion through negation is forbidden. Formally, it must be possible to assign levels (non-negative integers) to the predicates such that

- the predicate in the head has an equal or higher level than all predicates in the body (i.e. all predicates it depends on), and
- the level is strictly higher than that of all predicates used negatively in the body.

There are many proposals for relaxing this condition, and a few applications which would really need this. However, since this is not the main emphasis of the prototype, the standard stratification will do for the first version. Furthermore, the stratification is also used in connection with ordered predicates, and there possible generalizations are subject of future research.

In standard Datalog,
predicates are sets of facts:
There are no duplicates,
and the order/sequence is not defined.
This corresponds to relational databases,
where relations are sets (or multisets in SQL) of tuples.
However,
in relational databases,
at least for the output,
it is possible to use an `ORDER BY`

clause in SQL.
Whereas this could long be seen as only a cosmetical part
of the final presentation of the query result,
newer developments of SQL made order important
also during query evaluation,
for the main logic of the query.
For instance,
in Oracle it is possible to retrieve the three employees with highest salary
as follows:

`SELECT ENAME, SAL`

FROM (SELECT ENAME, SAL

FROM EMP

ORDER BY SAL DESC)

WHERE ROWNUM <= 3

This example is interesting,
because it shows that a subquery,
corresponding to a view or a derived predicate,
might need a defined order.
In older SQL standards,
it was not possible to use `ORDER BY`

in subqueries
or view definitions.
Now it has become important
and was legalized in SQL-2008.

Our language extension is to permit "ordered predicates" in addition to the standard predicates. The semantics is a multiset of facts, on which a partial order is defined. This includes especially arrays, where the facts have a linear order. In order to distinguish the two types of predicates, a special declaration of the ordered predicates is needed, e.g.

`ordered emp_by_sal/2.`

As usual in Prolog,
it is possible to use the same predicate name with different arities
(number of arguments).
Therefore it is necessary to say explicitly
that only `emp_by_sal`

with two arguments
is an ordered predicate.
Predicates for which no `ordered`

-declaration is given
are considered as standard predicates.
The declaration must precede the first use of the predicate
in a rule head or body.

One can also view the ordered predicates
as standard predicates
with an additional, usually hidden, list-valued argument
which defines the order.
It is list-valued because there can be several ordering criteria
of different priority.
If two lists differ already in the first element,
that defines the order.
If they are equal in the first element,
but differ in the second,
that defines the order.
If one list is a prefix of the other list,
the prefix comes first.
The lists can contain special terms of the form `desc(X)`

,
which are sorted in inverse order.
So the list elements are sorted as follows:

- numbers come first (in the usual order),
- then strings (in alphabetic order),
- then terms of the form
`desc(S)`

, where`S`

is a string (in inverse alphabetic order), - then terms of the form
`desc(N)`

, where`N`

is a number (in inverse numeric order).

If different kinds of terms are compared,
this is likely to be an error.
The current implementation prints no warning,
but this might be added in future.
There is also a slight incompatibility with the comparison lterals:
While `'a' < 1`

and `'a' >= 1`

currently both fail because of incompatible types,
in the order routine this behaviour is not possible:
One must come first, and currently this is the number.
If a later version produces a type error at runtime,
this asymmetric behaviour is removed.

It is possible that distinct facts
have idential ordering values:
For the functions `rank`

and `dense_rank`

(see below)
this is important,
for the function `row_number`

(and thus for output) the implementation can decide
which fact to put first.
The inverse case is also possible:
Two facts can differ only in the special ordering argument,
then they are basically duplicates.

Because the ordering argument is often hidden,
and it might not be explicitly stored
to improve efficiency,
it is not contained in the standard list of arguments,
but it is written in angle brackets
between the predicate and the normal argument list.
In the example,
we need to order employees by salary in descending order.
Thus, the ordering argument is `desc(Sal)`

.
However,
the term would look a little long,
therefore the notation `^Sal`

is used.
The symbol `^`

is the nearest one can get in ASCII
to an arrow in upward direction,
which symbolizes the direction of increasing values.
Thus,
the predicate `emp_by_sal`

,
which corresponds to the subquery in the SQL query above,
is defined by the following rule:

`emp_by_sal<^Sal>(EName, Sal) <- emp(EName, Sal, Job).`

If for equal salaries, one wants to order by employee name (ascending), one can add a sorting criterion of smaller priority:

`emp_by_sal<^Sal,EName>(EName, Sal) <- emp(EName, Sal, Job).`

It is possible to define ordered predicates
with several rules,
then the union of the derived facts is computed,
and the ordering values of all derived facts for the predicate
determine the sort order.
I.e. the facts derivable with the different rules
can appear intermixed in the resulting sequence.
Often,
however,
one wants that the facts derivable with the first rule come first,
and then the facts derivable with the second rule,
and so on.
It is possible to use constants like `1`

, `2`

, ...
in the ordering argument,
but that remindes of the infamous line numbers in basic.
Therefore,
one can use the special ordering argument `@`

,
which is automatically replaced
by the current rule number.
The paper also contains a proposal
for a default ordering value,
which often helps to leave out the ordering argument completely,
but that is not yet implemented in the current version.
It is high on the TODO-list
and will appear in the next version.
With the default sort order,
one would basically get the same order as in Prolog
(Prolog has a defined order in which it produces answers,
but it is not based on data values like the salary,
but on the execution algorithm).

An important function of the system is that it
condenses the ordering arguments,
which are lists of values,
basically to a single index:
The position in the sorted sequence.
While `<...>`

can be used only in the rule head,
in the body, one can access the "array index" of a fact.
Therefore,
our goal to select the three employees
with highest salary can be solved as follows:

`ordered emp_by_sal/2.`

`emp_by_sal<^Sal>(EName, Sal) <- emp(EName, Sal, Job).`

`answer(EName, Sal) <- emp_by_sal[N](EName, Sal), N <= 3.`

This is equivalent to the above SQL query (repeated here for convenience):

`SELECT ENAME, SAL`

FROM (SELECT ENAME, SAL

FROM EMP

ORDER BY SAL DESC)

WHERE ROWNUM <= 3

In the above solution,
SQL as well as our Datalog extension,
the order of the result tuples is not defined.
Of course,
one can decide to order the three employees
by salary (highest first),
i.e. to attach an
`ORDER BY SAL DESC`

clause to the outer query, too.
In our Datalog extension,
we declare `answer`

as an ordered predicate,
and can either use the salary as in SQL
or the array index to specify the order.
This latter solution is interesting,
because it shows how an order can be preserved
from a body literal
to the defined predicate:

`ordered answer/2.`

`answer<N>(EName, Sal) <- emp_by_sal[N](EName, Sal), N <= 3.`

The default order specification, mentioned above, would do this automatically: If the predicate in the head is declared as an ordered predicate, and no order is specified, the default is first the rule number, and then the array indices of all body literals with ordered predicates in the sequence in which they appear in the body. In this way, rules are a bit like comprehension syntax: One first constructs a list of facts or fact-combinations and then can filter it with additional conditions.

Of course, it is also possible to use a different ordering sequence for the query result, i.e. we could sort the three employees with highest salary by name. This shows that more than one sort order can be used in a single query.

Suppose we want the employee with the highest salary for each job. E.g. Doris, a clerk, will be printed, because she is the only clerk, although she earns less than the manager Andrew. Then we need not a linear order on all facts for a predicate, but several chains (one for each job). Between the chains, the facts should be uncomparable. This is done by first partioning the facts by the job, and then ordering them only within a partition.

`ordered emp_job/2.`

`emp_job<Job|^Sal>(EName, Sal, Job) <- emp(EName, Sal, Job).`

This is somewhat similar to a two-dimensional array:
The first dimension is the job
(without particular order, so no indexes can be used here),
the second dimension is the position in the sequence
sorted by salary (for the given job).
However,
the syntax permits to access only the index in `[...]`

.
If the job is needed,
it must be a standard argument of the fact.
Given the above predicate,
the top earner for each job is computed as follows:

`answer(EName, Sal, Job) <- emp_job[1](EName, Sal, Job).`

In the last example,
it is implementation-depended
whether Betty or Chris is printed,
because both earn 3000,
which is the maximum for programmers in the example.
Of course,
such a behaviour is often not wanted,
and may be considered unfair in other applications
(e.g., when the best student is selected).
Therefore,
the SQL standard now contains a function `RANK()`

,
used as in the following example:

`SELECT ENAME, SAL`

FROM (SELECT ENAME, SAL,

RANK() OVER (PARTITION BY JOB ORDER BY SAL DESC) AS POS

FROM EMP)

WHERE POS = 1

The function `RANK()`

counts the number of tuples
which have a strictly "better" ordering value
and then adds 1 to start counting with 1 and not with 0.
So in case several tuples are equal with respect to the ordering value,
all get the same rank.
However,
afterwards it jumps to the actual number of tuples,
so there are holes the in sequence of `RANK()`

values.
In contrast,
`DENSE_RANK`

has no holes,
but then it loses the connection to the number of tuples.
The standard index
(which has no holes, corresponds to the number of tuples, but is sometimes
arbitrary) is called `ROW_NUMBER()`

.
Here are the values of the three functions in the example
(without partitioning):

EName Sal ROW_NUMBER RANK DENSE_RANK 'Andrew' 4000 1 1 1 'Betty' 3000 2 2 2 'Chris' 3000 3 2 2 'Doris' 2000 4 4 3 'Eddy' 1000 5 5 4 'Fred' 1000 6 5 4

In our proposal, one can choose a ranking function by using the corresponding keyword in the example, e.g.

`answer(EName, Sal, Job) <- emp_job[rank:1](EName, Sal, Job).`

This ensures that both, Betty and Chris, will be printed, because they have the same salary.

If one needs several ranking functions at the same time, one can access them in a single pair of brackets. E.g. suppose one wants to print the values of the three ranking functions (without partitioning), one could do that as follows:

`ordered emp_by_sal/2.`

`emp_by_sal<^Sal>(EName, Sal) <- emp(EName, Sal, Job).`

`answer(EName, Sal, N, R, D) <- emp_by_sal[N,rank:R,dense_rank:D](EName, Sal).`

It is not surprising that with recursion and the possibility to determine the first literal, one can get contradictory/inconsistent situations, as with recursion through negation. E.g. the following input has no reasonable semantics and must therefore be excluded:

`ordered p/1.`

`p<10>(a) <- p[1](b).`

`p<20>(b).`

If `p(b)`

is the first element
in the sorted list `p`

,
then `p(a)`

is true,
which then would come first.
But then `p(b)`

is no longer the first element.

The solution is the same as in the case of negation:
We require that it is possible to assign non-negative integers as
"levels" to the predicates,
such that for rules containing `p[...](...)`

in the body,
and `q`

in the head,
the level of `q`

is strictly greater than
the level of `p`

.
The same must hold for negation:
For rules containing `\+ p(...)`

in the body,
and `q`

in the head,
the level of `q`

must also be strictly greater than
the level of `p`

.
Furthermore, in any case,
if `p`

appears in the body of a rule,
and `q`

appears in the head of the same rule
(i.e. `q`

depends on `p`

),
the level of `q`

must be greater than or equal to
the level of `p`

.

These levels determine a computation sequence: The system can first compute all facts about predicates of level 0. There can be recursion between these predicates, but it is not yet possible to refer to the position of a fact in a sorted sequence, or the non-existence (non-derivability) of a fact. Once all derivable facts about predicates of level 0 are computed, the system sorts them and assigns positions. Now the facts for the predicates of level 1 can be computed. Their rules can refer to positions of facts of level 0 predicates, and also use level 0 predicates in negated body literals. This is possible because all derivable facts of level 0 predicates have been computed already and we know that no new facts can become derivable later.

Of course, the system can do query evaluation more efficiently, and does not necessarily have to really compute all derivable facts. However, it is important to have a very simple evaluation algorithm as a constructive definition of the semantics. The current prototype actually computes all derivable facts, but its purpose is currently only a demonstration of the language. We will work on more efficient query evaluation later.

The make this clear:
Recursion with ordered predicates is possible,
however,
one cannot use the ordering position `[...]`

in the recursive calls.
In predicates of higher strata
(i.e. which are not mutually recursive with the current predicate),
one can of course use the ordering position.

The starting point for the development of the language
was the need to specify output declaratively.
The idea was to define the output of a Datalog program
by permitting to derive text pieces
which are printed in some defined order.
While `answer`

is the "main"-predicate for standard queries,
an ordered predicate `output`

of arity 1
(i.e. with one argument)
is the "main" predicate for doing output with Datalog.
The difference is that if `answer`

-facts are derivable,
all arguments are printed with TABs to separate them,
and a newline after each fact.
For `output`

,
there is only a single argument,
and this is printed for all derivable facts
(in the defined order)
without any separation (not even a space).
In this way,
a text can be constructed.
A simple example is:

`ordered output/1.`

`output<@>('Hello, ').`

`output<@>(Name) <- name(Name).`

`output<@>('.\n').`

`name('Nina').`

With `<@>`

,
the current rule number is used to define the output sequence,
i.e. the three derivable facts are printed
in the sequence the facts/rules are written in the program.
In the paper,
a default order specification is proposed,
which is precisely `<@>`

in this example,
so this part would not be needed,
one could write for instance simply `output('Hello ').`

This is not implemented yet,
but will soon be added.
The paper also proposes a "pattern syntax" for output,
which is a big simplification and makes output very natural,
but again it is only an abbreviation for standard rules,
and will only be added in a later version.

In the example,
three `output`

facts are derivable:

`output[1]('Hello, ').`

`output[2]('Nina').`

`output[3]('.\n').`

So as explected, the program prints:

`Hello, Nina.`

(followed by a line break). The power lies in the possibility to use auxillary predicates to compute parts of the text. Suppose we want to compute an HTML table with the employee names and their salary ordered by employee names. This can be done as follows:

`ordered sal_table/1.`

`ordered sal_table_row/1.`

`sal_table<@>('<table>\n').`

`sal_table<@>('<tr> <th>Employee</th> <th>Salary</th> </tr>\n').`

`sal_table<@,Pos>(Text) <- sal_table_row[Pos](Text).`

`sal_table<@>('</table>\n').`

`sal_table_row<EName,@>('<tr><td>') <- emp(EName, Sal, Job).`

`sal_table_row<EName,@>(EName) <- emp(EName, Sal, Job).`

`sal_table_row<EName,@>('</td><td>') <- emp(EName, Sal, Job).`

`sal_table_row<EName,@>(Sal) <- emp(EName, Sal, Job).`

`sal_table_row<EName,@>('</td></tr>\n') <- emp(EName, Sal, Job).`

As mentioned before, this can be greatly simplified with the upcoming pattern syntax (see paper). Nevertheless it shows that predicates can be used to define pieces of a text, even parameterized pieces. The rule

`sal_table<@,Pos>(Text) <- sal_table_row[Pos](Text).`

shows how text from one predicate can be inserted into the text for another predicate.

The complete example can be downloaded here.

The possibility to number facts
and to compute the next number
permits to write loops over sets of facts.
This can be used to write aggregation functions.
E.g., the following program computes
the sum of all salaries
(the predicate `is`

still misses
in the current version of the prototype,
therefore,
this program cannot be testet yet):

`ordered emp_list/1.`

`emp_list<EName>(EName, Sal) <- emp(EName, Sal, Job).`

`% sal_sum(N, S) means that the sum of the first N-1 elements is S.`

`% I.e. N is the index of the next element to add to the sum.`

`sal_sum(1, 0).`

`sal_sum(N1, S1) <-`

`sal_sum(N, S),`

`emp_list[N,next:N1](EName, Sal),`

`S1 is S + Sal.`

`answer(S) <- sal_sum(nil, S).`

The keyword `next:`

in an array index
binds the following variable to the index of the next array element,
or `nil`

,
if the current element is the last.
This simplifies the writing of loops,
but is not strictly necessary.
One could write the recursion also as follows:

`sal_sum(1, 0).`

`sal_sum(N1, S1) <-`

`sal_sum(N, S),`

`emp_list[N](EName, Sal),`

`N1 is N + 1,`

`S1 is S + Sal.`

`answer(S) <-`

`sal_sum(N, S),`

`\+ emp_list[N](_,_).`

This is not strictly equivalent to the previous solution,
because in case `emp_list`

is empty,
this produces the sum `0`

,
whereas the solution with `next:`

gives the empty set of answers.

Of course, the possibility to compute aggregates does not mean that the language should not contain explicit aggregate functions. Aggregate functions are well-known from SQL and are obviously useful. These examples are intended only to demonstrate the expressive power of the sorting and numbering construct. Actually, it is no surprise that this is possible, since the LDL++ system can compute aggregates with nondeterministic choice. Our construct permits a kind of deterministic choice. Although a future version of the language will of course include aggregate functions, it is good to see that one can compute them directly, so one can also handle non-standard cases.

One can also use the symbol `last`

in the array index
to access the last fact in the sorted list.
E.g.,
one can compute the minimal and maximal salary as follows:

`ordered sal_list/1.`

`sal_list<Sal>(Sal) <- emp(EName, Sal, Job).`

`sal_range(Min, Max) <-`

`sal_list[1](Min),`

`sal_list[last](Max).`

The complete example can be downloaded here.

Stefan Brass (brass@informatik.uni-halle.de), May 26, 2012

Original URL: http://www.informatik.uni-halle.de/~brass/order/ [XHTML 1.0 Checked] [CSS Checked] [Links Checked]