20 Introduction to relation databases
Hiren Joshi
Introduction
Though, a file can be used to read and write data, the flat file has some limitations like
- When a file size is increased, working with flat file can be very slow
- Searching for a specific record(s) is difficult in flat file as it does not provide querying facility.
- Managing simultaneous access of the records is problematic as though the lock mechanism is used for locking a file, it is not sufficient for managing concurrent access.
- The file processing is sequential, so insert a new record or delete an existing record the process makes large overhead, especially in a large file.
- File processing provides only file permission which is not sufficient to enforce different levels of access to data.
The Relational Database Management System (RDBMS) helps to solve all of the above issues.
- RDBMSs can provide much faster access to data compare to the flat file.
- RDBMSs can be queried to extract specific record or set of records.
- RDBMSs have built-in mechanism for simultaneous access of the records.
- RDBMSs provide random access of the data.
- RDBMSs have built-in privilege systems which facilitates different levels of access to the data.
Relational Database System Concept
Dr. E.F.Codd developed relation database model. Relational database eliminated some of the problems which are associated with standard file and other database. By using RDBMS , data redundancy can be reduced which helps to save disk storage and leads to efficient data retrieval. Today, Relational database is the de facto standard for database application.
Relational databases are the most commonly used type of databases. Relation databases are developed on the sound theoretical basis in relational algebra. It is not mandatory to understand relation algebra theory to use relational databases.
Let us understand basic relational database concept.
Table
Relational database is consists of relations. A relation is also called as tables. There are differences between relation and table, though these terms are used interchangeably. A table is structured made up of rows and columns. The following figure shows a sample table. This table stores information of customers.
Customer
CustomerID | Name | Address | City |
1 | Anand Mehta | 19- Rajni Society | Ahmedabad |
2 | Bhavesh Rajvir | 25- Ravi Park | Rajkot |
3 | Chetan Pandya | 22-Panchvati Society | Ahmedabad |
4 | Dharmarth Shah | 12 ABC Park | Ahmedabad |
The table name is: Customer. Customer table has 4 columns which represent different piece of data and 4 rows that correspondent to individual customer.
Column
Each column in a table represent different piece of data. For example, in the above customer table has 4 columns namely CustomerID,Name,Address and City. Every column in a table must have unique name. Each column is associated with a precise data type. For instance, in the above customer table CustomerID column has integer data type while Name, Address and city has string data type. Columns are also known as fields or attributes.
Row
Each row in the table represents a record. For example, in the customer table there are 4 customers. It is not compulsory that a table must have same number of columns and rows. In the above customer table to demonstrate the concept of rows and columns there are 4 columns and 4 rows. Rows are also known as records or tuples.
Values
Each row consists of a collection of individual values. The intersection of column and row is known as cell. Each cell represents a single value. Each value must have the data type as specified by its corresponding column.
Primary Key
To uniquely identify a record in a table, the attribute or set of attribute is used. This attribute is known as primary key. In case the primary key consist of set of attributes, it is known as composite primary key. For example, in the customer table customerID is defined as primary key. The reason for defining customerID is a primary key is that names may be repeated for customers. You can be defined composite primary key consist of – Name, Address, City -attributes.
Primary key is used to main entity integrated for a table. Entity integrity ensures that the attribute must be unique within a table and cannot be null.
The table has primary key defined is referred as master table or parent table.
Foreign Key
The database is relational database because there is a relation can be set between tables of the database. The relationship between tables is defined by using foreign key. CustomerID is a primary key for customer table. In Order table customerID can be defined as foreign key.
Foreign key is used to provide referential integrity between tables. Referential integrity means an attribute can take only values available in parent table. For instance, considered we have an order table. Then in order table only those customer can be placed order which are available in customer table.
The table in which foreign key is defined is known as child table or detail table.
The following figure depicts the relationship between customer and order table.
Customer
A table to qualify as a relation must have following characteristics.
I. A table is recognized as 2-Dimensional structure combined of rows and columns.
II. A table must have a unique name.
III. Each column name must be unique name within the table.
IV. Each row-column intersection represent a single data value (not multivalued, not composed)
V. Each column has a specific range of values known as the attribute domain
VI. The order rows and columns are not significant within a table.
VII. Each row with the table must be unique. That means there cannot be two rows with exactly the same values for all their columns.
Schemas
The table design having attribute name, primary key and foreign key attribute is known as schema of a table. The complete set of table schemas is considered as database schema.
Primary key is differentiated using underline.
The schema for customer table is shown below
Customer(CustomerID, Name, Address, City)
Relationships
A relationship in a database table can be assigned. Hence this database is known as relation database. Foreign key is used to set relationship between data of two tables. For example, in the order and customer table customerid set relationship between order table and customer table.
There are 3 types of relationship between two tables. These relationships are classified based on the records on each side of relation.
1. One-to-one relationship
2. One-to-many relationship
3. Many-to-many relationship
One-to-one relationship: A one-to-one relationship make sure that the record from one table to another matches only one. For example, the relationship between bank AccountNumber and bank customer is one-to-one. Each accountnumber is associated with only one customer.
One-to-many relationship: A one-to-many relationship allows record from one table to matches many records from another table. For example, the relationship between bank customer and accountnumber is one-to-many as one customer can have many bank-accounts. Many –to-one relationship is another type of relation which is the same as one-to-many relationship with only difference in direction.
Many-to-many relationship: In many-to-many relationship, records from one table matches records to another table. For example, the relationship between author and book is many-to-many as one book can have many authors and one author can write many books.
Database Design
Database is generally designed by database administrator (DBA) or database design specialist. Generally, DBA or database specialist design database for large database which has thousands of users. For small and medium size web sites, the database for the website is often design by the web programmer.
Database design affects to use SQL to work with database. In general, a well-designed database is easy to understand and query, while a poorly designed database is difficult to work with.
A relational database system should model the real-world environment where it is used. The job a designer is to analyze and to map the real world environment into relational database system. An entity or object in real world is represented as a table in relational database. The attributes of real world entity is become attributes of table and each row of a table represents one instance of the entity. In simple way, you can say that a relational table is an entity set and a record of a relation table is an entity. Though in spoken world, entity is also used to represent entity set.
To model a database and the relationship between relations after a real world system, ER (Entity-Relationship) modeling is used.
The six basic steps for designing data structure of database
1. Recognize data elements
2. Subdivide data elements into smallest useful elements
3. Identify tables and assign the columns
4. Find the primary key and foreign key
5. Check whether the data structure is normalized
6. Identify the indexes
1. Identify data elements
Data elements are identified by many ways. It is depend on the nature of the system. It includes analyzing existing system if it is available, interviewing users, evaluating comparable systems. The documents used by a real world system (i.e. invoice) is helped to identify the data elements of the system. Once data elements are identified begin to think about the entities which are associated with these elements. This will help to identify tables of database.
2. Subdivide data elements into smallest useful elements
If a data element consists of two or more component, then it should be subdividing the element into those components. At what extent to subdivide the data element depends on how it will be used. Because it is difficult to predict all the future uses for the data, most designers subdivide data elements as much as possible. For example, the attribute name : Tejas Trivedi consist of firstname Tejas and lastname Trivedi, It is subdivided into two attributes(fields) named firstname and lastname. The data element can be easily rebuild by concatenating subdivided data elements.
3. Identify tables and assign the columns
After the identification and subdivision of all data elements of database group them by entities with which they are associated. These entities will later become tables of the database and the data elements will become attributes of a table. In case a data element relates to more than one entity, it can be included for all the entities it relates to. The duplicates data elements can be removed when the database is normalized. At this step, elements can be excluded that are not required and any additional element can be added.
4. Identify the primary key and foreign key
Each table should have a primary key. If possible, an existing column should be used as primary key. In case, an existing column is not suitable to define primary key, an ID column can be defined as primary key which is auto increment by 1 for each record. Keep in mind that the value of primary key is not changed or seldom changed.
If two tables have one-to-one relationship, they should be related by their primary keys.
If two tables have one-to-many relationship, a foreign key is defined to the many side of table. The foreign key must have the same data type as the primary key column which is related with.
If two tables have many-to-many relationship, a third new table (linking table) is required to relate these two tables. So each table of many-to-many relationship will have a one-to-many relationship with the linking table. The linking table is consist of primary key of two tables. Generally, linking table does not have a primary key as it is only used to relate the two tables.
5. Review whether the data structure is normalized
Normalization is a process to reduce data redundancy and anomalies. There are various normal forms (NFs) named – First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Forth Normal Form (4NF) , Fifth Normal Form (5NF), Domain Key Normal Form(DKNF).
2NF is better than 1NF. 3NF is better than 2NF, BCNF is better than 3NF and so on.
For the most business database design process, 3NF is sufficient. The highest level normal form is not always most desirable.
Keys and Functional Dependency
Key: The key is an attribute or set of attributes which determine other attribute or attributes. If you know the value of attribute A, then you can determine the value of attribute B.
Functional Dependency: The value of attribute or attributes determines the value of another attribute or attributes. If the value of attribute A determines the value of attribute B, then it is said as A->B (read as A functionally determines B). In that case, A (the left hand side attribute or attributes) is known as determinant or functional determinant and B (the right hand side attribute or attributes) is known as dependent or functional dependent.
The following table shows characteristics for each normal-form.
The table is not in normal form, and then it is considered as in 0th normal form.
To convert the above relation into 1 NF,
I. there are not any multivalued attributes and
II. Every attribute value is atomic
III. Define Primary key
So, let us make the table as below.
The primary key of the above relation is (orderID, ProductID)
Functional dependencies for the relation in 1 NF.
OrderID,ProductID -> OrderDate, CustomerID, CustomerName, CustomerAddress
ProductDescription, ProductFinish, UnitPrice ,OrderedQuantity
If all the relations of a database is in 1NF, then the database is called in 1NF.
There are anomalies in 1NF. The anomalies are I. insertion anomaly II. Update anomaly and III. Delete anomaly.
Let us understand these anomalies using an example.
The above relation has composite primary key (orderID, ProductID).
New product cannot be added without order. So, there is an insertion anomaly.
If Dining table delete from orderID 1611, then information related to the product finish and unit price is also lost. So, there is delete anomaly.
If the price of productid 4 is changed, then it is required to be updated in multiple records.
So, there is an update anomaly.
Let us note down functional dependencies for the relation in 1 NF.
OrderID,ProductID -> OrderDate, CustomerID, CustomerName, CustomerAddress ProductDescription, ProductFinish, UnitPrice ,OrderedQuantity
Relation Name: Order_customer_product
As all the relation are in 1NF, the database is in 1NF.
Now to normalize table into 2 NF, there must be fully functional dependencies.
Fully functional dependency: If attribute B is functionally dependent on a key A but not on any subset of key A, then attribute B is fully functionally dependent on attribute A. This generally happens when attribute A is composite key.
In the above relation , there is a partial functional dependency for functional dependnecy
OrderID,ProductID -> OrderDate, CustomerID, CustomerName, CustomerAddress ProductDescription, ProductFinish, UnitPrice ,OrderedQuantity
To determine OrderDate, CustomerID, CustomerName, CustomerAddress only OrderID is required. There is no need for productID.
To determine ProductDescription, ProductFinish and UnitPrice only productID is required.
There is no need for orderID.
So it is not fully functional dependency. Following attribute shows functional dependency
OrderID -> OrderDate, CustomerID, CustomerName, CustomerAddress
ProductID -> ProductDescription, ProductFinish, UnitPrice
OrderID, ProductID -> OrderedQuantity
All the relation are in 2NF, the database is in 2NF.
Now to normalize table into 3NF, there must not be transitive functional dependencies.
Transitive functional dependency: If attribute A is functionally determine attribute B, and attribute B functionally determine attribute C, then attribute C is transitively functional dependent on attribute A.
In the above relation , there is a transitive functional dependency for
OrderID -> OrderDate, CustomerID, CustomerName, CustomerAddress
The reason is OrderID -> CustomerID and
CustomerID -> CustomerName, CustomerAddress
So, the relations are in 3NF.
In the order relation customerID is foreign key referencing customerID attributeof Customer relation.
To summarize the original relation is converted into following relations to normalize upto 3NF.
ProductID -> ProductDescription, ProductFinish, UnitPrice
CustomerID -> CustomerName, CustomerAddress
OrderID -> orderdate, CustomerID
OrderID, ProductID -> OrderedQuantity
- Identify the indexes
The index should be created for
- When the column is Foreign key
- When the column is used frequently in search condition or join
- When the column has large number of distinct values
- When the column is not updated frequently.
- An index is a structure which provides for location one or more rows directly. Without index, the entire table has to scan for matching criteria. In case, index is not created then to search particularly records may take more time.
- By Default MySQL create index for primary key and unique constraint.
- A composite index can be created which consist of two or more columns. Composite index is created for columns which are not updated frequently or when the index will cover almost every search condition on the table.
you can view video on Introduction to relation databases |
References:
1. Luke Welling, Laura Thomson: PHP and MySQL Web Development, Pearson,
2. W. Jason Gilmore: Beginning PHP and MySQL 5 From Novice to Professional, Apress
3. Elizabeth Naramore, Jason Gerner, Yann Le Scouarnec, Jeremy Stolz, Michael K. Glass:Beginning PHP5, Apache, and MySQL Web Development, Wrox,
4. Robin Nixon: Learning PHP, MySQL, and JavaScript, O’Reilly Media
5. Ed Lecky-Thompson, Heow Eide-Goodman, Steven D. Nowicki, Alec Cove: Professional PHP,Wrox
6. Tim Converse, Joyce Park, Clark Morgan: PHP5 and MySQL Bible
7. Joel Murach, Ray Harris: Murach’s PHP and MySQL, Shroff/Murach
8. Ivan Bayross, Web Enabled Commercial Application Development Using HTML/Javascript/DHTML/PHP , BPB Publications
9. Joel Murach, “Murach’s MySQL”, Shroff/Murach
10. Julie C. Meloni, Sams Teach Yourself PHP, MySQL and Apache All in One, Sams
11. Larry Ullman, PHP and MySQL for Dynamic Web Sites: Visual QuickPro Guide, Pearson Education
12. http://www.php.net/
13. http://www.w3schools.com/
14. http://www.tutorialspoint.com/