22 Deductive Databases

Dr R. Baskaran

  • SQL-92 cannot express some queries:
  • Are we running low on any parts needed to build a ZX600 sports car?
  • What is the total component and assembly cost to build a ZX600 at today’s part prices?
  • Can we extend the query language to cover such queries?
  • Yes, by adding recursion.

What is a deductive database system?

 

A deductive database can be defined as an advanced database augmented with an inference system.

 

 

 By evaluating rules against facts, new facts can be derived, which in turn can be used to answer queries. It makes a database system more powerful.

Some basic concepts from logic

To understand the deductive database system well, some basic concepts from mathematical logic are needed.

  • – term
  • – n-ary predicate
  • – literal
  • – (well-formed) formula
  • – clause and Horn-clause
  • – facts
  • – logic program
  • Term

     A term is a constant, a variable or an expression of the form f(t1, t2, …, tn), where t1, t2, …, tn are terms and f is a function symbol.

Example: a, b, c, f(a, b),

  • n-ary predicate

    An n-ary predicate symbol is a symbol p appearing in an expression of the form p(t1, t2, …, tn), called an atom, where t1, t2, …, tn are terms. p(t1, t2, …, tn) can only evaluate to true or false.

Example: p(a, b), q(a, f(a, b)), p(x, y)

  • Clause

   -A clause is an expression of the following form:

ØA1 Ú ØA2 Ú … Ú ØAn Ú B1 Ú … Ú Bm

where Ai and Bj are atoms.

-The above expression can be written in the following equivalent form:

B1 Ú … Ú Bm ¬ A1 Ù … Ù An

or

B1, …, Bm ¬ A1 , …, An

  • Clause
  • Horn clause

A Horn clause is a clause with the head containing only one positive atom.

Bm ¬ A1 , …, An

  • Fact

A fact is a special Horn clause of the following form:

B ¬ with all variables in B being instantiated.

(B ¬ can be simply written as B.)

  • logic program

A logic program is a set of Horn clauses.

 

  • SQL queries can be read as follows:
  • “If some tuples exist in the From tables that satisfy the Where conditions,then the Select tuple is in the answer.”
  • Datalog is a query language that has the same if-then flavor:
  • New: The answer table can appear in the From clause, i.e., be defined recursively.
  • Prolog style syntax is commonly used.
  • Find the components of a trike?
  • We can write a relational algebra query to compute the answer on the given instance of Assembly.
  • But there is no R.A. (or SQL-92) query that computes the answer on all Assembly instances.
  • Intuitively, we must join Assembly with itself to deduce that trike contains spoke and tire.
  • Takes us one level down Assembly hierarchy.
  • To find components that are one level deeper (e.g., rim), need another join.
  • To find all components, need as many joins as there are levels in the given instance!
  • For any relational algebra expression, we can create an Assembly instance for which some answers are not computed by including more levels than the number of joins in the expression!

Can read the second rule as follows:

 

For all values of Part, Subpt and Qty,

if there is a tuple (Part, Part2, Qty) in Assembly and a tuple (Part2, Subpt) in Comp,

then there must be a tuple (Part, Subpt) in Comp.”

  • Each rule is a template: by assigning constants to the variables in such a way that each body “literal” is a tuple in the corresponding relation, we identify a tuple that must be in the head relation.
  • By setting Part=trike, Subpt=wheel, Qty=3 in the first rule, we can deduce that the tuple <trike,wheel> is in the relation Comp.
  • This is called an inference using the rule.
  • Given a set of tuples, we apply the rule by making all possible inferences with these tuples in the body.

For any instance of Assembly, we can compute all Comp tuples by repeatedly applying the two rules. (Actually, we can apply Rule 1 just once, then apply Rule 2 repeatedly.)

 

Don’t let the rule syntax of Datalog fool you: a collection of Datalog rules can be rewritten in SQL syntax, if recursion is allowed.

 

WITH RECURSIVE Comp(Part, Subpt) AS

(SELECT A1.Part, A1.Subpt FROM Assembly A1)

UNION

(SELECT A2.Part, C1.Subpt

FROM Assembly A2, Comp C1

WHERE A2.Subpt=C1.Part)

SELECT * FROM Comp C2

 

  • Let f be a function that takes values from domain D and returns values from D. A value v in D is a fixpoint of f if f(v)=v.
  • Consider the fn double+, which is applied to a set of integers and returns a set of integers (I.e., D is the set of all sets of integers).
  • E.g., double+({1,2,5})={2,4,10} Union {1,2,5}
  • The set of all integers is a fixpoint of double+.
  • The set of all even integers is another fixpoint of double+; it is smaller than the first fixpoint.
  • The least fixpoint of a function “f” is a fixpoint “v” of “f” such that every other fixpoint of “f” is smaller than or equal to “v”.
  • In general, there may be no least fixpoint (we could have two minimal fixpoints, neither of which is smaller than the other).
  • If we think of a Datalog program as a function that is applied to a set of tuples and returns another set of tuples, this function always has a least fixpoint.

    Big(Part) :- Assembly(Part, Subpt, Qty), Qty >2, not Small(Part).

Small(Part) :- Assembly(Part, Subpt, Qty), not Big(Part).

  • If rules contain not there may not be a least fixpoint. Consider the Assembly instance; trike is the only part that has 3 or more copies of some subpart. Intuitively, it should be in Big, and it will be if we apply Rule 1 first.
  • But we have Small(trike) if Rule 2 is applied first!
  • There are two minimal fixpoints for this program: Big is empty in one, and contains trike in the other (and all other parts are in Small in both fixpoints).
  • T depends on S if some rule with T in the head contains S or (recursively) some predicate that depends on S, in the body.
  • Stratified program: If T depends on not S, then S cannot depend on T (or not T).
  • If a program is stratified, the tables in the program can be partitioned into strata:
  • Stratum 0: All database tables.
  • Stratum I: Tables defined in terms of tables in Stratum I and lower strata.
  • If T depends on not S, S is in lower stratum than T.

      Repeated inferences: When recursive rules are repeatedly applied in the naïve way, we make the same inferences in several iterations.

Unnecessary  inferences:  Also,  if  we  just  want  to  find  the components  of  a  particular  part,  say  wheel,  computing  the fixpoint of the Comp program and then selecting tuples with wheel in the first column is wasteful, in that we compute many irrelevant facts.