2 Introduction

1. MODULE

This module deals with the basic concepts of data management systems.

2. INTRODUCTION

This module explains the various data languages. Then it deals with the definition of database views. The benefits of database views are explained. Then the data models are explained in detail.

3. LEARNING OUTCOME

The outcome of this module is to explain the basic concepts of database management systems and clarify the misconceptions about the various topics in the subject.

4. LANGUAGES

The languages involved in database management systems are

(i) Data Definition Language

(ii) Data Manipulation Language

(iii) Data Control Language

4.1 DATA DEFINITION LANGUAGE

It is a language used to define the various structure of various database objects such as tables, views, schemas, indexes, etc. The common DDL commands are CREATE, ALTER and DROP.It permits the specification of data types. structures and any data constraints. All these specifications are stored in the database.DDL statements are used to build and modify the structure of your tables and other objects in the database. When you execute a DDL statement, it takes effect immediately.

• The create table statement does exactly that:

CREATE TABLE <table name> (

<attribute name 1> <data type 1>,

…

<attribute name n> <data type n>);

The data types that you will use most frequently are character strings, which might be called VARCHAR or CHAR for variable or fixed length strings; numeric types such as NUMBER or INTEGER, which will usually specify a precision; and DATE or related types. Data type syntax is variable from system to system; the only way to be sure is to consult the documentation for your own software.

• The alter table statement may be used as you have seen to specify primary and foreign key constraints, as well as to make other modifications to the table structure. Key constraints may also be specified in the CREATE TABLE statement.

ALTER TABLE <table name>

ADD CONSTRAINT <constraint name> PRIMARY KEY (<attribute list>);

You get to specify the constraint name. Get used to following a convention of tablename_pk (for example, Customers_pk), so you can remember what you did later. The attribute list contains the one or more attributes that form this PK; if more than one, the names are separated by commas.

• The foreign key constraint is a bit more complicated, since we have to specify both the FK attributes in this (child) table, and the PK attributes that they link to in the parent table.

ALTER TABLE <table name>

ADD CONSTRAINT <constraint name> FOREIGN KEY (<attribute list>)

REFERENCES <parent table name> (<attribute list>);

Name the constraint in the form childtable_parenttable_fk (for example, Orders_Customers_fk). If there is more than one attribute in the FK, all of them must be included (with commas between) in both the FK attribute list and the REFERENCES (parent table) attribute list.You need a separate foreign key definition for each relationship in which this table is the child.

• If you totally mess things up and want to start over, you can always get rid of any object you’ve created with a drop statement. The syntax is different for tables and constraints.

DROP TABLE <table name>;

ALTER TABLE <table name>

DROP CONSTRAINT <constraint name>;

This is where consistent constraint naming comes in handy, so you can just remember the PK or FK name rather than remembering the syntax for looking up the names in another table. The DROP TABLE statement gets rid of its own PK constraint, but won’t work until you separately drop any FK constraints (or child tables) that refer to this one. It also gets rid of all data that was contained in the table—and it doesn’t even ask you if you really want to do this!

• All of the information about objects in your schema is contained, not surprisingly, in a set of tables that is called the data dictionary. There are hundreds of these tables most database systems, but all of them will allow you to see information about your own tables, in many cases with a graphical interface. How you do this is entirely system-dependent.

4.2 DATA MANIPULATION LANGUAGE

It deals with the general enquiry facility of the data. DML statements are used to work with the data in tables. When you are connected to most multi-user databases (whether in a client program or by a connection from a Web page script), you are in effect working with a private copy of your tables that can’t be seen by anyone else until you are finished (or tell the system that you are finished). You have already seen the SELECT statement; it is considered to be part of DML even though it just retreives data rather than modifying it.

• The insert statement is used, obviously, to add new rows to a table.

INSERT INTO <table name>

VALUES (<value 1>, … <value n>);

The comma-delimited list of values must match the table structure exactly in the number of attributes and the data type of each attribute. Character type values are always enclosed in single quotes; number values are never in quotes; date values are often (but not always) in the format ‘yyyy-mm-dd’ (for example, ‘2006-11-30’).

Yes, you will need a separate INSERT statement for every row.

• The update statement is used to change values that are already in a table.

UPDATE <table name>

SET <attribute> = <expression>

WHERE <condition>;

The update expression can be a constant, any computed value, or even the result of a SELECT statement that returns a single row and a single column. If the WHERE clause is omitted, then the specified attribute is set to the same value in every row of the table (which is usually not what you want to do). You can also set multiple attribute values at the same time with a comma-delimited list of attribute=expression pairs.

• The delete statement does just that, for rows in a table.

DELETE FROM <table name>

WHERE <condition>;

If the WHERE clause is omitted, then every row of the table is deleted (which again is usually not what you want to do)—and again, you will not get a ―do you really want to do this?‖ message.

• If you are using a large multi-user system, you may need to make your DML changes visible to the rest of the users of the database. Although this might be done automatically when you log out, you could also just type:

COMMIT;

• If you’ve messed up your changes in this type of system, and want to restore your private copy of the database to the way it was before you started (this only works if you haven’t already typed COMMIT), just type:

ROLLBACK;

Although single-user systems don’t support commit and rollback statements, they are used in large systems to control transactions, which are sequences of changes to the database. Transactions are frequently covered in more advanced courses.

SELF CHECK EXERCISE

State whether the following statement is true or false.

Procedural DML allows user to tell system exactly how to manipulate data.

Answer : True.

Explanation : Procedural DML describes how the manipulation is to be done.

4.3 DATA CONTROL LANGUAGE

A controlled access to database may include

– a security system

– an integrity system

– a concurrency control system

– a recovery control system

– user accessible catalog

5. DATABASE VIEWS

The following unit deals with the database views.

5.1 VIEWS

Allows each user to have his or her own view of the database.

A view is essentially some subset of the database.

5.2 BENEFITS

Reduce complexity

Provide a level of security

Provide a mechanism to customize the appearance of the database

Present a consistent, unchanging picture of the structure of the database, even if the underlying database is changed.

6. COMPONENTS OF DBMS

Hardware : can range from a PC to a network of computers.

Software : DBMS, operating system, network software (if necessary) and also the application programs.

Data – used by the organization and a description of this data called the schema.

Procedures – instructions and rules that should be applied to the design and use of the database and DBMS.

People – database administrator, database programmer, database user

7. ROLES

Data Administrator(DA)

Database Administrator(DBA) – where exactly to store

Database Designers(Logical and Physical) – structure of the database

Application Programmers – merge the differences between the exggternal and internal level

End users(naive and sophisticated) – never exposed to this kind of system, to who know the query language.

8. DATABASE MODELS

We have to talk about the evolution of database.

8.1 INFORMATION MODEL

Network (CODASYL) – 1970s

Hierarchical (IMS) – Late 1960 and 1970s

Relational – 1970s and early 1980s

Entity – Relationship – 1970s

Extended relational – 1980s

Semantic 1970s and early 1980s

Object oriented – Late 1980’s and early 1990’s

Object relational – Late 1980’s and early 1990’s

Semi structured – Late 1990’s

SELF CHECK EXERCISE

State whether the following statement is true or false.

Categories of data model does not include object based.

Answer : False.

Explanation : Object based, record based, physical are some of the categories of data model.

8.1.1 NETWORK MODEL

The popularity of the network data model coincided with the popularity of the hierarchical data model. Some data were more naturally modeled with more than one parent per child. So, the network model permitted the modeling of many-to-many relationships in data. In 1971, the Conference on Data Systems Languages (CODASYL) formally defined the network model. The basic data modeling construct in the network model is the set construct. A set consists of an owner record type, a set name, and a member record type. A member record type can have that role in more than one set, hence the multiparent concept is supported. An owner record type can also be a member or owner in another set. The data model is a simple network, and link and intersection record types (called junction records by IDMS) may exist, as well as sets between them . Thus, the complete network of relationships is represented by several pairwise sets; in each set some (one) record type is owner (at the tail of the network arrow) and one or more record types are members (at the head of the relationship arrow). Usually, a set defines a 1:M relationship, although 1:1 is permitted. The CODASYL network model is based on mathematical set theory.

8.1.2 HIERARCHICAL MODEL

The hierarchical data model organizes data in a tree structure. There is a hierarchy of parent and child data segments. This structure implies that a record can have repeating information, generally in the child data segments. Data in a series of records, which have a set of field values attached to it. It collects all the instances of a specific record together as a record type. These record types are the equivalent of tables in the relational model, and with the individual records being the equivalent of rows. To create links between these record types, the hierarchical model uses Parent Child Relationships. These are a 1:N mapping between record types. This is done by using trees, like set theory used in the relational model, “borrowed” from maths. For example, an organization might store information about an employee, such as name, employee number, department, salary. The organization might also store information about an employee’s children, such as name and date of birth. The employee and children data forms a hierarchy, where the employee data represents the parent segment and the children data represents the child segment. If an employee has three children, then there would be three child segments associated with one employee segment. In a hierarchical database the parent-child relationship is one to many. This restricts a child segment to having only one parent segment. Hierarchical DBMSs were popular from the late 1960s, with the introduction of IBM’s Information Management System (IMS) DBMS, through the 1970s.

8.1.3 RELATIONAL MODEL

A relational database is based on the relational model developed by E.F. Codd. A relational database allows the definition of data structures, storage and retrieval operations and integrity constraints. In such a database the data and relations between them are organized into tables. A table is a collection of records and each record in a table contains the same fields. The contents of a table can be permanently saved for future use.

Properties of the relational database model

Properties of Relational Tables:

1. Data is presented as a collection of relations.

2. Each relation is depicted as a table.

3. Columns are attributes that belong to the entity modeled by the table (ex. In a student table, you could have name, address, student ID, major, etc.).

4. Each row (“tuple”) represents a single entity (ex. In a student table, John Smith, 14 Oak St, 9002342, Accounting, would represent one student entity).

5. Every table has a set of attributes that taken together as a “key” (technically, a “superkey”) uniquely identifies each entity (Ex. In the student table, ―student ID‖ would uniquely identify each student – no two students would have the same student ID).

Overview

Certain fields may be designated as keys, which means that searches for specific values of that field will use indexing to speed them up and more importantly, uniquely identify each entity. There are many types of keys, however, quite possibly the two most important are the primary key and the foreign key. The primary key is what uniquely identifies each entity. The foreign key is a primary key of one table that also sits in another table. Ultimately, the use of foreign keys is the heart of the relational database model. This linkage that the foreign key provides is what allows tables to pull data from each other and link data together. Where fields in two different tables take values from the same set, a join operation can be performed to select related records in the two tables by matching values in those fields. Most often, but not always, the fields will have the same name in both tables. For example, an “orders” table might contain (customer-ID – primary key, product-code – foreign key) pairs and a “products” table might contain (product-code – primary key, price) pairs so to calculate a given customer’s bill you would sum the prices of all products ordered by that customer by joining on the product-code fields of the two tables. This can be extended to joining multiple tables on multiple fields. Because these relationships are only specified at retrieval time, relational databases are classed as dynamic database management system.

8.1.4 OBJECT ORIENTED MODEL

Object DBMSs add database functionality to object programming languages. They bring much more than persistent storage of programming language objects. Object DBMSs extend the semantics of the C++, Smalltalk and Java object programming languages to provide full-featured database programming capability, while retaining native language compatibility. A major benefit of this approach is the unification of the application and database development into a seamless data model and language environment. As a result, applications require less code, use more natural data modeling, and code bases are easier to maintain. Object developers can write complete database applications with a modest amount of additional effort.

According to Rao (1994), “The object-oriented database (OODB) paradigm is the combination of object-oriented programming language (OOPL) systems and persistent systems. The power of the OODB comes from the seamless treatment of both persistent data, as found in databases, and transient data, as found in executing programs.”

In contrast to a relational DBMS where a complex data structure must be flattened out to fit into tables or joined together from those tables to form the in-memory structure, object DBMSs have no performance overhead to store or retrieve a web or hierarchy of interrelated objects. This one-to-one mapping of object programming language objects to database objects has two benefits over other storage approaches: it provides higher performance management of objects, and it enables better management of the complex interrelationships between objects. This makes object DBMSs better suited to support applications such as financial portfolio risk analysis systems, telecommunications service applications, world wide web document structures, design and manufacturing systems, and hospital patient record systems, which have complex relationships between data.

8.1.5 OBJECT RELATIONAL MODEL

Object/relational database management systems (ORDBMSs) add new object storage capabilities to the relational systems at the core of modern information systems. These new facilities integrate management of traditional fielded data, complex objects such as time-series and geospatial data and diverse binary media such as audio, video, images, and applets. By encapsulating methods with data structures, an ORDBMS server can execute complex analytical and data manipulation operations to search and transform multimedia and other complex objects.

As an evolutionary technology, the object/relational (OR) approach has inherited the robust transaction- and performance-management features of it s relational ancestor and the flexibility of its object-oriented cousin. Database designers can work with familiar tabular structures and data definition languages (DDLs) while assimilating new object-management possibilities. Query and procedural languages and call interfaces in ORDBMSs are familiar: SQL3, vendor procedural languages, and ODBC, JDBC, and proprietary call interfaces are all extensions of RDBMS languages and interfaces. And the leading vendors are, of course, quite well known: IBM, Inform ix, and Oracle.

8.1.6 SEMI STRUCTURED MODEL

In semistructured data model, the information that is normally associated with a schema is contained within the data, which is sometimes called “self-describing”. In such database there is no clear separation between the data and the schema, and the degree to which it is structured depends on the application. In some forms of semistructured data there is no separate schema, in others it exists but only places loose constraints on the data. Semi-structured data is naturally modelled in terms of graphs which contain labels which give semantics to its underlying structure. Such databases subsume the modelling power of recent extensions of flat relational databases, to nested databases which allow the nesting (or encapsulation) of entities, and to object databases which, in addition, allow cyclic references between objects.

Semistructured data has recently emerged as an important topic of study for a variety of reasons. First, there are data sources such as the Web, which we would like to treat as databases but which cannot be constrained by a schema. Second, it may be desirable to have an extremely flexible format for data exchange between disparate databases. Third, even when dealing with structured data, it may be helpful to view it as semistructured for the purposes of browsing.

8.1.7 ER DIAGRAM

An Entity Relationship Diagram (ERD) is a data model describing how entities (or concepts or things) relate to one another. When created by business analysts, ERDs can be used to understand the business domain, clarify business terminology, and connect business concepts to database structures.

Essentially, a conceptual or logical ERD will visually show how the terms in your glossary relate to one another. They are especially helpful in clarifying information models for relational databases and helping business users understand database structures at a high level and without details.

The Key Elements of an ERD

Although they look complex, an ERD has 3 simple components.

Entities – An entity is a thing. In business domain terms, it’s a concept or glossary-level term. In relational database terms, it’s the table.
Relationships – The real insight from this type of diagram comes when we see how entities relate to one another, or relationships. Relationships can be thought of as verbs that link two or more nouns. Relationships can be modeled numerically, using the multiplicity syntax from a class diagram, or using Crows Foot Notation.
Attributes – Within each entity, there can be more than one attribute. Attributes provide detailed information about the concept. In a relational database, attributes are represented by the fields where the information inside a record is held.

Like any analysis model, creating an ERD is an iterative process that involves elicitation, analysis, and review with stakeholders. Here are some steps you’ll go through as you create an ERD.

1. Create boxes for each entity or primary business concept relevant to your model.

2. Model the relationships between each by drawing lines to connect related entities. Label the relationships using verbs or a numeric notation. Crows Foot Notation is common for ERDs, but you can also use the multiplicity notation from UML’s Class Diagrams.

3. Identify relevant attributes within each entity. For a conceptual model, focus on the most important attributes. As your model evolves, make your attribute lists more specific.

4. Review your model with business and technical stakeholders.

5. Repeat until your domain is well-represented by your model.

As an end result, you’ll have clearly defined how different business concepts relate to one another, and created a solid conceptual foundation for designing a relational database to support your business requirements, as well as a way to get business and technical stakeholders on the same page about how these concepts relate.

9. ADVANTAGES OF DBMS

Balance conflicting requirements

Improved data accessibility and reponsiveness

Increased productivity

Improved maintenance through data independance

Increased concurrency

Improved backup and recovery services.

SUMMARY

This session we have dealt with the introduction to database management systems.

We have also discussed about the roles and components of database management system.

The various information models have been discussed. We have listed the advantages of DBMS.