5 Relational Algebra
Dr R. Baskaran
Relational Algebra
An algebra is a formal structure consisting of sets and operations on those sets.
Relational algebra is a formal system for manipulating relations.
- Operands of this algebra are relations.
- Operations of this algebra include the usual set operations (since relations are sets of tuples), and special operations defined for relations selection
- projection
- join
A predicate is a boolean expression whose operators are the logical connectives (and, or, not) and arithmetic comparisons (LT, LE, GT, GE, EQ, NE), an
Implementing Set Operations
To implement R1 U R2 (while eliminating duplicates) we can
- sort R1 in O(N log N)
- sort R2 in O(M log M)
- merge R1 and R2 in O(N M)
If we allow duplicates in union (and remove them later) we can
- copy R1 to R3 in O(N)
- insert R2 in R3 in O(M)
If we have an index and don’t want duplicates we can
- copy R1 to R3 in O(N)
- for each tuple in R2 (which is O(M))
- use index to lookup tuples in R1 with the same index value O(1)
- if R2 tuple equals some such R1 tuple, don’t add R2 tuple to R3
Intersection and set difference have corresponding implementations.
Implementing Projection
To implement projection we must
- process every tuple in the relation
- remove any duplicates that result To avoid duplicates we can
- sort the result and remove consecutive tuples that are equal
- requires time O(N log N) where N is the size of the original relation
- implement the result as a set
- set insertion guarantees no duplicates
- by using a hash table, insertion is O(1), so projection is O(N)
Implementing Selection
In the absence of an index we
- apply the predicate to every tuple in the relation
- insert matches in the resulting relation
- duplicates can’t occur
- take O(N) time
Given an index, and a predicate that uses the index key, we
- Lookup tuples using the key
- evaluate only those tuples with the predicate
- take O(K) time, where K tuples match the key
Implementing Join with Nested Loops
A nested loop join on relations R1 (with N domains) and R2 (with M domains), considers all |R1| x |R2| pairs of tuples.
R3= join(R1,Ai,R2,Bj)
for each tuple t in R1 do
for each tuple s in R2 do
if t.Ai = s.Bj then
insert(R3, t.A1, t.A2, …, t.AN,
s.B1, …, s.B(j-1), s.B(j 1), …, s.BM)
This implementation takes time O(|R1|*|R2|).
Index Join
An index join exploits the existence of an index for one of the domains used in the join to find matching tuples more quickly.
R3= join(R1,Ai,R2,Bj)
for each tuple t in R1 do
for each tuple s in R2 at index(t.Ai) do
insert(R3, t.A1, t.A2, …, t.AN,
s.B1, …, s.B(j-1), s.B(j 1), …, s.BM)
We could choose to use an index for R2, and reverse the order of the loops.
The decision on which index to use depends on the number of tuples in each relation.
Sort Join
If we don’t have an index for a domain in the join, we can still improve on the nested-loop join using sort join.
R3= join(R1,Ai,R2,Bj)
- Merge the tuples of both relations into a single list
- list elements must identify the original relation
- Sort the list based on the value in the join domains Ai and Bj
- all tuples on the sorted list with a common value for the join domains are consecutive
- Pair all (consecutive) tuples from the two relations with the same value in the join domains
Comparison of Join Implementations
Assumptions
- Join R1 and R2 (on domain D) producing R3
- R1 has i tuples, R2 has j tuples
- |R3| = m, 0 <= m <= i * j
- Every implementation takes at least time O(m)
Comparison
- Nested-loop join takes time O(i * j)
- Index join (using R2 index) takes time O(i m)
- lookup is O(1) for each tuple in R1
- at most O(m) tuples match
- Sort join takes time O(m (i j)log(i j))
- O(i j) to merge the tuples in R1 and R2
- O((i j) log (i j)) to sort the list
- O(m) to produce the output (0 <= m <= i*j)
Expressing Queries in Relational Algebra
Relational algebra is an unambiguous notation (or formalism) for expressing queries.
Queries are simply expressions in relational algebra.
Expressions can be manipulated symbolically to produce simpler expressions according to the laws of relational algebra.
Expression simplification is an important query optimization technique, which can affect the running time of queries by an order of magnitude or more.
- early “selection” reduces the number of tuples
- early “projection” reduces the number of domains
Algebraic Laws for Join
Commutativity (assuming order of columns doesn’t matter) join(R1, Ai, R2, Bj) = join(R2, Bj, R1, Ai)
Nonassociativity join (join(R1, Ai, R2, Bj),Bj,R3,Ck) is not the same as join (R1,Ai,join(R2, Bj, R3, Ck),Bj)
Algebraic Laws for Selection
Commutativity
select(select(R1,P1),P2) = select(select(R1,P2),P1)
Selection pushing
- if P contains attributes of R
select(join(R,Ai,S,Bj),P) = join(select(R,P),Ai,S,Bj)
- if P contains attributes of S
selection (join(R,Ai,S,Bj),P) = join(R,Ai,select(S,P),Bj)
Selection Splitting (where P = A and B)
select(R,P) = select(select(R,A),B)
select(R,P) = select(select(R,B),A)
Example: Selection Pushing and Splitting
Consider the following 4 relation database
- CSG: Course-StudentID-Grade
- SNAP: StudentID-Name-Address-Phone
- CDH: Course-Day-Hour
- CR: Course-Room
Implement the query “Where is Amy at Noon on Monday?”
Let P be (Name=”Amy” and Day=”Monday” and Hour=”Noon”)
We can use a brute-force approach that joins all the data in the relations into a single large relation, selects those tuples that meet the query criteria, and then isolates the answer field using projection.
- R1 = join(CSG,SNAP)
- R2 = join(R1,CDH)
- R3 = join(R2,CR)
- R4 = select(R3,P)
- R5 = project(R4,Room)
project(select(join(join(join(CSG,SNAP),CDH),CR),P),Room)
Selection Pushing and Splitting (cont)
The selection uses only Name, Day, and Hour attributes (and not Course or Room), so we can push the selection inside the outermost join.
- R1 = join(CSG,SNAP)
- R2 = join(R1,CDH)
- R3 = select(R2,P)
- R4 = join(R3,CR)
- R5 = project(R4,Room)
We cannot push selection further, because the predicate involves attributes from both operands of the next innermost join (R1,CDH).
We can split the selection into two, one based on Name, and the other based on Day-Hour.
- R1 = join(CSG,SNAP)
- R2 = join(R1,CDH)
- R3 = select(R2,Day=”Monday” and Hour=”Noon”)
- R4 = select(R3,Name=”Amy”)
- R5 = join(R4,CR)
- R6 = project(R5,Room)
Selection Pushing and Splitting (cont 2)
Now we can push the first selection inside the join, since it involves only attributes from the CDH relation.
- R1 = join(CSG,SNAP)
- R2 = select(CDH,Day=”Monday” and Hour=”Noon”)
- R3 = join(R1,R2)
- R4 = select(R3,Name=”Amy”)
- R5 = join(R4,CR)
- R6 = project(R5,Room)
Similarly we can push the second selection inside the preceding join, since it involves only attributes from R1 (ie, Name).
- R1 = join(CSG,SNAP)
- R2 = select(CDH,Day=”Monday” and Hour=”Noon”)
- R3 = select(R1,Name=”Amy”)
- R4 = join(R2,R3)
- R5 = join(R4,CR)
- R6 = project(R5,Room)
Continuing to push the second select inside the first join
- R1 = select(SNAP,Name=”Amy”)
- R2 = join(CSG,R1)
- R3 = select(CDH,Day=”Monday” and Hour=”Noon”)
- R4 = join(R2,R3)
- R5 = join(R4,CR)
- R6 = project(R5,Room)
Algebraic Laws for Projection
Projection pushing
To push a projection operation inside a join requires that the result of the projection contain the attributes used in the join.
project(join(R,Ai,S,Bj),D1,D2,…Dn)
In this case, we know that the domains in the projection will exist in the relation that results from the join.
In performing projection first (on the two join relations)
- we should only project on those domains that exist in each of the two relations
- we must ensure that the join domains Ai and Bj exist in the resulting two relations Let PDR = {D|D domain in R, D in {D1…Dn}} U Ai
Let PDS = {D|D domain in S, D in {D1…Dn}} U Bi
R1 = project(R,PDR)
R2 = project(S,PDS)
R3 = join(R1,Ai,R2,Bj) = project(join(R,Ai,S,Bj),D1,D2,…Dn)
Example: Projection Pushing
Implement the query “Where is Amy at Noon on Monday?”
- R1 = select(SNAP,Name=”Amy”)
- R2 = join(CSG,R1)
- R3 = select(CDH,Day=”Monday” and Hour=”Noon”)
- R4 = join(R2,R3)
- R5 = join(R4,CR)
- R6 = project(R5,Room)
This approach carries along unnecessary attributes every step of the way.
- R1 carries Address and Phone attributes
- R4 carries Grade attribute
We use projection pushing to eliminate unnecessary attributes early in the implementation.
- R1 = select(SNAP,Name=”Amy”)
- R2 = join(CSG,R1)
- R3 = select(CDH,Day=”Monday” and Hour=”Noon”)
- R4 = join(R2,R3)
- R5 = project(CR, Course, Room)
- R6 = project(R4, Course)
- R7 = join(R5,R6)
- R8 = project(R7,Room)
Note that R5 is unnecessary, since the domains in the projection are all the domains of CR.
Projection Pushing (cont)
Implement the query “Where is Amy at Noon on Monday?”
- R1 = select(SNAP,Name=”Amy”)
- R2 = join(CSG,R1)
- R3 = select(CDH,Day=”Monday” and Hour=”Noon”)
- R4 = join(R2,R3)
- R5 = project(R4, Course)
- R6 = join(CR,R5)
- R7 = project(R6,Room)
We can continue pushing the projection on Course below the join for R4.
- R1 = select(SNAP,Name=”Amy”)
- R2 = join(CSG,R1)
- R3 = select(CDH,Day=”Monday” and Hour=”Noon”)
- R4 = project(R2,Course)
- R5 = project(R3,Course)
- R6 = join(R4,R5)
- R7 = join(CR,R6)
- R8 = project(R7,Room)
Projection Pushing (cont2)
We can continue pushing the projection on Course for R4 below the join for R2.
- R1 = select(SNAP,Name=”Amy”)\
- R2 = project(CSG,Course,StudentID)
- R3 = project(R1,StudentID)
- R4 = join(R2,R3)
- R5 = project(R4,Course)
- R6 = select(CDH,Day=”Monday” and Hour=”Noon”)
- R7 = project(R6,Course)
- R8 = join(R6,R7)
- R9 = join(CR,R8)
- R10 = project(R9,Room)
A query language is a language in which user requests information from the database. it can be categorized as either procedural or nonprocedural. In a procedural language the user instructs the system to do a sequence of operations on database to compute the desired result. In non procedural language the user describes the desired information without giving a specific procedure for obtaining that information.
The relational algebra is a procedural query language. It consists of a set of operations that take one or two relations as input and produces a new relation as output.
Fundamental Operations
- SELECT
- PROJECT
- UNION
- SET DIFFERENCE
- CARTESIAN PRODUCT
- RENAME
Select and project operations are unary operation as they operate on a single relation.Union, set difference, Cartesian product and rename operations are binary operations as they operate on pairs of relations.
Other Operations
- SET INTERSECTION
- NATURAL JOIN
- DIVISION
- ASSIGNMENT
The select operation: – to identify a set of tuples which is a part of a relation and to extract only these tuples out. The select operation selects tuples that satisfy a given predicate or condition.
- It is a unary operation defined on a single relation.
- It is denoted as σ.
The union operation: – is used when we need some attributes that appear in either or both of the two relations.
- It is denoted as U.
example:
Borrower (customer-name, loan-number)
Depositor (customer-name, account-number)
Customer (customer-name, street-number, customer-city)
For a union operation r U s to be valid, two conditions must hold:
- The relation r and s must be of the same arity, i.e. they must have the same number of attributes.
- The domains of the ith attribute of r and the ith attribute of s must be the same for all i.
The set difference operation: – finds tuples in one relation but not in other.
- It is denoted as –
The Cartesian product operation: – allows combining information from two relations.
- It is denoted as r X s where r and s are relations.
If relation r has n1 tuples and relation s has n2 tuples then r X s has n1*n2 tuples.
Example:
Borrower (customer-name, loan-number)
Loan (loan-number, branch-name, city, amount)
The rename operation: – used to rename.
- It is denoted as ρ.
E : relational algebra expression
ρ x (E): returns the result of expression E under the name x.
ρ x (A1, A2, A3… An) (E): returns the result of expression E under the name x with attributes renamed to A1, A2, A3… An.
The set intersection operation: – finds tuples in both the relations.
- It is denoted as ∩.
Example:
Borrower (customer-name, loan-number)
Depositor (customer-name, account-number)
Customer (customer-name, street-number, customer-city)
The natural join operation: – it is a binary operation and a combination of certain selections and a Cartesian product into one operation.
- It is denoted as |X| .
- It is associative.
It forms a Cartesian product of its two arguments.
Then performs a selection forcing equality on those attributes those appear in both the relations.
And finally removes duplicates attributes.
r(R): r is a relation with attributes R.
s(S): s is a relation with attributes S.
If R ∩ S = Ф i.e. they have no attributes in common then r |X| s = r X s
The division / quotient operation: –
- It is denoted as ÷. Letr(R) and s(S) be relations
r ÷ s: – the result consists of the restrictions of tuples in r to the attribute names unique to R, i.e. in the Header of r but not in the Header of s, for which it holds that all their combinations with tuples in s are present in r.
Extended Relational Algebra Operations
GENERALIZED PROJECTION: – It extends the projection operation by allowing arithmetic functions to be used in projection list.
Π F1,F2 … Fn (E)
Where E: relational algebra expression
Fi: arithmetic expression
AGGREGATE FUNCTION:-It takes a collection of values and returns a single value as a result.
Limitations Of Relational Algebra
Although relational algebra seems powerful enough for most practical purposes, there are some simple and natural operators on relations which cannot be expressed by relational algebra. The transitive closure of a binary relation is one of them.
Relational Algebra Implemented In SQL
SQL (Structured query Language) is the most popular computer language used to create, modify, retrieve data from relational database management system.The basic structure of an SQL expression consists of three clauses:
SELECT: – This clause corresponds to the projection operation of the relational algebra. It is used to list the attributes of the result of a query.
FROM: -It corresponds to the Cartesian product operation of the relational algebra. It lists the relations scanned in the evaluation of an expression.
WHERE: – This clause corresponds to selection predicate of relational algebra. It consists of a predicate involving attributes of the relations that appear in the FROM clause.
SQL QUERY FORM:
Select A1, A2….An
From r1, r2…rm
Where P
Ai : attribute
Ri : relation
P : predicate
SELECT clause- specifies the table columns retrieved.
FROM clause- specifies the tables to be accessed.
WHERE clause- which rows in the FROM tables to use.
Joining Tables
The FROM clause allows more than 1 table in its list. The rows from one table must be correlated with the rows of the others. This correlation is known as joining.
Set Operations
UNION, INTERSECT and EXCEPT operations can be done in SQL corresponding to their operations U, ∩ and – in relational algebra only if the domains of the attributes of the relations match and the relations have same arity i.e same number of attributes.