5 Relational Algebra

Dr R. Baskaran

Relational Algebra

An algebra is a formal structure consisting of sets and operations on those sets.

Relational algebra is a formal system for manipulating relations.

Operands of this algebra are relations.
Operations of this algebra include the usual set operations (since relations are sets of tuples), and special operations defined for relations selection
projection
join

A predicate is a boolean expression whose operators are the logical connectives (and, or, not) and arithmetic comparisons (LT, LE, GT, GE, EQ, NE), an

Implementing Set Operations

To implement R1 U R2 (while eliminating duplicates) we can

sort R1 in O(N log N)
sort R2 in O(M log M)
merge R1 and R2 in O(N M)

If we allow duplicates in union (and remove them later) we can

copy R1 to R3 in O(N)
insert R2 in R3 in O(M)

If we have an index and don’t want duplicates we can

copy R1 to R3 in O(N)
for each tuple in R2 (which is O(M))
use index to lookup tuples in R1 with the same index value O(1)
if R2 tuple equals some such R1 tuple, don’t add R2 tuple to R3

Intersection and set difference have corresponding implementations.

Implementing Projection

To implement projection we must

process every tuple in the relation
remove any duplicates that result To avoid duplicates we can
sort the result and remove consecutive tuples that are equal
requires time O(N log N) where N is the size of the original relation
implement the result as a set
set insertion guarantees no duplicates
by using a hash table, insertion is O(1), so projection is O(N)

Implementing Selection

In the absence of an index we

apply the predicate to every tuple in the relation
insert matches in the resulting relation
duplicates can’t occur
take O(N) time

Given an index, and a predicate that uses the index key, we

Lookup tuples using the key
evaluate only those tuples with the predicate
take O(K) time, where K tuples match the key

Implementing Join with Nested Loops

A nested loop join on relations R1 (with N domains) and R2 (with M domains), considers all |R1| x |R2| pairs of tuples.

R3= join(R1,Ai,R2,Bj)

for each tuple t in R1 do

for each tuple s in R2 do

if t.Ai = s.Bj then

insert(R3, t.A1, t.A2, …, t.AN,

s.B1, …, s.B(j-1), s.B(j 1), …, s.BM)

This implementation takes time O(|R1|*|R2|).

Index Join

An index join exploits the existence of an index for one of the domains used in the join to find matching tuples more quickly.

R3= join(R1,Ai,R2,Bj)

for each tuple t in R1 do

for each tuple s in R2 at index(t.Ai) do

insert(R3, t.A1, t.A2, …, t.AN,

s.B1, …, s.B(j-1), s.B(j 1), …, s.BM)

We could choose to use an index for R2, and reverse the order of the loops.

The decision on which index to use depends on the number of tuples in each relation.

Sort Join

If we don’t have an index for a domain in the join, we can still improve on the nested-loop join using sort join.

R3= join(R1,Ai,R2,Bj)

Merge the tuples of both relations into a single list
list elements must identify the original relation
Sort the list based on the value in the join domains Ai and Bj
all tuples on the sorted list with a common value for the join domains are consecutive
Pair all (consecutive) tuples from the two relations with the same value in the join domains

Comparison of Join Implementations

Assumptions

Join R1 and R2 (on domain D) producing R3
R1 has i tuples, R2 has j tuples
|R3| = m, 0 <= m <= i * j
Every implementation takes at least time O(m)

Comparison

Nested-loop join takes time O(i * j)
Index join (using R2 index) takes time O(i m)
lookup is O(1) for each tuple in R1
at most O(m) tuples match
Sort join takes time O(m (i j)log(i j))
O(i j) to merge the tuples in R1 and R2
O((i j) log (i j)) to sort the list
O(m) to produce the output (0 <= m <= i*j)

Expressing Queries in Relational Algebra

Relational algebra is an unambiguous notation (or formalism) for expressing queries.

Queries are simply expressions in relational algebra.

Expressions can be manipulated symbolically to produce simpler expressions according to the laws of relational algebra.

Expression simplification is an important query optimization technique, which can affect the running time of queries by an order of magnitude or more.

early “selection” reduces the number of tuples
early “projection” reduces the number of domains

Algebraic Laws for Join

Commutativity (assuming order of columns doesn’t matter) join(R1, Ai, R2, Bj) = join(R2, Bj, R1, Ai)

Nonassociativity join (join(R1, Ai, R2, Bj),Bj,R3,Ck) is not the same as join (R1,Ai,join(R2, Bj, R3, Ck),Bj)

Algebraic Laws for Selection

Commutativity

select(select(R1,P1),P2) = select(select(R1,P2),P1)

Selection pushing

if P contains attributes of R

select(join(R,Ai,S,Bj),P) = join(select(R,P),Ai,S,Bj)

if P contains attributes of S

selection (join(R,Ai,S,Bj),P) = join(R,Ai,select(S,P),Bj)

Selection Splitting (where P = A and B)

select(R,P) = select(select(R,A),B)

select(R,P) = select(select(R,B),A)

Example: Selection Pushing and Splitting

Consider the following 4 relation database

CSG: Course-StudentID-Grade
SNAP: StudentID-Name-Address-Phone
CDH: Course-Day-Hour
CR: Course-Room

Implement the query “Where is Amy at Noon on Monday?”

Let P be (Name=”Amy” and Day=”Monday” and Hour=”Noon”)

We can use a brute-force approach that joins all the data in the relations into a single large relation, selects those tuples that meet the query criteria, and then isolates the answer field using projection.

R1 = join(CSG,SNAP)
R2 = join(R1,CDH)
R3 = join(R2,CR)
R4 = select(R3,P)
R5 = project(R4,Room)

project(select(join(join(join(CSG,SNAP),CDH),CR),P),Room)

Selection Pushing and Splitting (cont)

The selection uses only Name, Day, and Hour attributes (and not Course or Room), so we can push the selection inside the outermost join.

R1 = join(CSG,SNAP)
R2 = join(R1,CDH)
R3 = select(R2,P)
R4 = join(R3,CR)
R5 = project(R4,Room)

We cannot push selection further, because the predicate involves attributes from both operands of the next innermost join (R1,CDH).

We can split the selection into two, one based on Name, and the other based on Day-Hour.

R1 = join(CSG,SNAP)
R2 = join(R1,CDH)
R3 = select(R2,Day=”Monday” and Hour=”Noon”)
R4 = select(R3,Name=”Amy”)
R5 = join(R4,CR)
R6 = project(R5,Room)

Selection Pushing and Splitting (cont 2)

Now we can push the first selection inside the join, since it involves only attributes from the CDH relation.

R1 = join(CSG,SNAP)
R2 = select(CDH,Day=”Monday” and Hour=”Noon”)
R3 = join(R1,R2)
R4 = select(R3,Name=”Amy”)
R5 = join(R4,CR)
R6 = project(R5,Room)

Similarly we can push the second selection inside the preceding join, since it involves only attributes from R1 (ie, Name).

R1 = join(CSG,SNAP)
R2 = select(CDH,Day=”Monday” and Hour=”Noon”)
R3 = select(R1,Name=”Amy”)
R4 = join(R2,R3)
R5 = join(R4,CR)
R6 = project(R5,Room)

Continuing to push the second select inside the first join

R1 = select(SNAP,Name=”Amy”)
R2 = join(CSG,R1)
R3 = select(CDH,Day=”Monday” and Hour=”Noon”)
R4 = join(R2,R3)
R5 = join(R4,CR)
R6 = project(R5,Room)

Algebraic Laws for Projection

Projection pushing

To push a projection operation inside a join requires that the result of the projection contain the attributes used in the join.

project(join(R,Ai,S,Bj),D1,D2,…Dn)

In this case, we know that the domains in the projection will exist in the relation that results from the join.

In performing projection first (on the two join relations)

we should only project on those domains that exist in each of the two relations
we must ensure that the join domains Ai and Bj exist in the resulting two relations Let PDR = {D|D domain in R, D in {D1…Dn}} U Ai

Let PDS = {D|D domain in S, D in {D1…Dn}} U Bi

R1 = project(R,PDR)

R2 = project(S,PDS)

R3 = join(R1,Ai,R2,Bj) = project(join(R,Ai,S,Bj),D1,D2,…Dn)

Example: Projection Pushing

Implement the query “Where is Amy at Noon on Monday?”

R1 = select(SNAP,Name=”Amy”)
R2 = join(CSG,R1)
R3 = select(CDH,Day=”Monday” and Hour=”Noon”)
R4 = join(R2,R3)
R5 = join(R4,CR)
R6 = project(R5,Room)

This approach carries along unnecessary attributes every step of the way.

R1 carries Address and Phone attributes
R4 carries Grade attribute

We use projection pushing to eliminate unnecessary attributes early in the implementation.

R1 = select(SNAP,Name=”Amy”)
R2 = join(CSG,R1)
R3 = select(CDH,Day=”Monday” and Hour=”Noon”)
R4 = join(R2,R3)
R5 = project(CR, Course, Room)
R6 = project(R4, Course)
R7 = join(R5,R6)
R8 = project(R7,Room)

Note that R5 is unnecessary, since the domains in the projection are all the domains of CR.

Projection Pushing (cont)

Implement the query “Where is Amy at Noon on Monday?”

R1 = select(SNAP,Name=”Amy”)
R2 = join(CSG,R1)
R3 = select(CDH,Day=”Monday” and Hour=”Noon”)
R4 = join(R2,R3)
R5 = project(R4, Course)
R6 = join(CR,R5)
R7 = project(R6,Room)

We can continue pushing the projection on Course below the join for R4.

R1 = select(SNAP,Name=”Amy”)
R2 = join(CSG,R1)
R3 = select(CDH,Day=”Monday” and Hour=”Noon”)
R4 = project(R2,Course)
R5 = project(R3,Course)
R6 = join(R4,R5)
R7 = join(CR,R6)
R8 = project(R7,Room)

Projection Pushing (cont2)

We can continue pushing the projection on Course for R4 below the join for R2.

R1 = select(SNAP,Name=”Amy”)\
R2 = project(CSG,Course,StudentID)
R3 = project(R1,StudentID)
R4 = join(R2,R3)
R5 = project(R4,Course)
R6 = select(CDH,Day=”Monday” and Hour=”Noon”)
R7 = project(R6,Course)
R8 = join(R6,R7)
R9 = join(CR,R8)
R10 = project(R9,Room)

A query language is a language in which user requests information from the database. it can be categorized as either procedural or nonprocedural. In a procedural language the user instructs the system to do a sequence of operations on database to compute the desired result. In non procedural language the user describes the desired information without giving a specific procedure for obtaining that information.

The relational algebra is a procedural query language. It consists of a set of operations that take one or two relations as input and produces a new relation as output.

Fundamental Operations

SELECT
PROJECT
UNION
SET DIFFERENCE
CARTESIAN PRODUCT
RENAME

Select and project operations are unary operation as they operate on a single relation.Union, set difference, Cartesian product and rename operations are binary operations as they operate on pairs of relations.

Other Operations

SET INTERSECTION
NATURAL JOIN
DIVISION
ASSIGNMENT

The select operation: – to identify a set of tuples which is a part of a relation and to extract only these tuples out. The select operation selects tuples that satisfy a given predicate or condition.

It is a unary operation defined on a single relation.
It is denoted as σ.

The union operation: – is used when we need some attributes that appear in either or both of the two relations.

It is denoted as U.

example:

Borrower (customer-name, loan-number)

Depositor (customer-name, account-number)

Customer (customer-name, street-number, customer-city)

For a union operation r U s to be valid, two conditions must hold:

The relation r and s must be of the same arity, i.e. they must have the same number of attributes.
The domains of the ith attribute of r and the ith attribute of s must be the same for all i.

The set difference operation: – finds tuples in one relation but not in other.

It is denoted as –

The Cartesian product operation: – allows combining information from two relations.

It is denoted as r X s where r and s are relations.

If relation r has n1 tuples and relation s has n2 tuples then r X s has n1*n2 tuples.

Example:

Borrower (customer-name, loan-number)

Loan (loan-number, branch-name, city, amount)

The rename operation: – used to rename.

It is denoted as ρ.

E : relational algebra expression

ρ x (E): returns the result of expression E under the name x.

ρ x (A1, A2, A3… An) (E): returns the result of expression E under the name x with attributes renamed to A1, A2, A3… An.

The set intersection operation: – finds tuples in both the relations.

It is denoted as ∩.

Example:

Borrower (customer-name, loan-number)

Depositor (customer-name, account-number)

Customer (customer-name, street-number, customer-city)

The natural join operation: – it is a binary operation and a combination of certain selections and a Cartesian product into one operation.

It is denoted as |X| .
It is associative.

It forms a Cartesian product of its two arguments.

Then performs a selection forcing equality on those attributes those appear in both the relations.

And finally removes duplicates attributes.

r(R): r is a relation with attributes R.

s(S): s is a relation with attributes S.

If R ∩ S = Ф i.e. they have no attributes in common then r |X| s = r X s

The division / quotient operation: –

It is denoted as ÷. Letr(R) and s(S) be relations

r ÷ s: – the result consists of the restrictions of tuples in r to the attribute names unique to R, i.e. in the Header of r but not in the Header of s, for which it holds that all their combinations with tuples in s are present in r.

Extended Relational Algebra Operations

GENERALIZED PROJECTION: – It extends the projection operation by allowing arithmetic functions to be used in projection list.

Π F1,F2 … Fn (E)

Where E: relational algebra expression

Fi: arithmetic expression

AGGREGATE FUNCTION:-It takes a collection of values and returns a single value as a result.

Limitations Of Relational Algebra

Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural operators on relations which cannot be expressed by relational algebra. The transitive closure of a binary relation is one of them.

Relational Algebra Implemented In SQL

SQL (Structured query Language) is the most popular computer language used to create, modify, retrieve data from relational database management system.The basic structure of an SQL expression consists of three clauses:

SELECT: – This clause corresponds to the projection operation of the relational algebra. It is used to list the attributes of the result of a query.

FROM: -It corresponds to the Cartesian product operation of the relational algebra. It lists the relations scanned in the evaluation of an expression.

WHERE: – This clause corresponds to selection predicate of relational algebra. It consists of a predicate involving attributes of the relations that appear in the FROM clause.

SQL QUERY FORM:

Select A1, A2….An

From r1, r2…rm

Where P

Ai : attribute

Ri : relation

P : predicate

SELECT clause- specifies the table columns retrieved.

FROM clause- specifies the tables to be accessed.

WHERE clause- which rows in the FROM tables to use.

Joining Tables

The FROM clause allows more than 1 table in its list. The rows from one table must be correlated with the rows of the others. This correlation is known as joining.

Set Operations

UNION, INTERSECT and EXCEPT operations can be done in SQL corresponding to their operations U, ∩ and – in relational algebra only if the domains of the attributes of the relations match and the relations have same arity i.e same number of attributes.