3 Database System Architecture

Dr R. Baskaran

Over the past years, conceptually different DBMSs have been developed, and these DBMSs have different capabilities, both regarding organisation of data, modelling of data and access to data. In this section I will dive a little further into the different DBMSs and especially concentrate on ORDBMSs and OODBMSs. I will also present the recent academic approaches towards persistent languages and explain what orthogonal persistence is, but first I will explain the concepts of hierarchical, network and relational DBMSs.

Hierarchical

Hierarchical DBMSs became commercial available in the late 1960s with IBMs IMS and the DL/1-language. The hierarchical DBMSs organise and model their data in a hierarchical fashion as a collection of trees. All data elements have an owner or root (one and only one) – the hierarchical mother of the data-instance, and accordingly all other records have a unique parent record. Finally, the access to data in a hierarchical database is done by using low-level calls which the programmer writes into his/hers programs for access purposes using a navigational language, navigating from the root records to the actual record of interest (Silberschatz et al. 1991). To be able to do this, the programmer must know the physical representation of the database.

This approach is well suited for large systems containing a lot of data where the data can be organised in a hierarchical way without compromising the information. Hierarchical databases support two types of information, the record type which is a record containing data, and parent-child relations (PCR) which defines a 1:N relationship between one parent record and N child-records. Hierarchical databases are still used in many systems and IMS is still the leading hierarchical DBMS. On the other hand, this approach has several major limitations due to its representation of data. It is not trivial to represent a non-hierarchical structure of information in such a database (Elmasri & Navathe 1994).

Network

C.W. Bachman developed the first commercial network DBMS database (Bachman 1965). The system – IDS – was available in 1964. The modelling paradigm of the network database is somewhat different than its hierarchical counterpart. The network databases arranges its data as a directed graph and has a standard navigational language (DBTG 1971). This paradigm made it possible to move directly from one specific entry point in a data set to another record in another data set (Silberschatz et al.1991).

The network databases offers an efficient access-path to its data and is capable to represent almost any informational structure containing simple types (e.g. integers, floats, strings and characters). This is accomplished using different kinds of mapping mechanisms called sets. A set is a container of pointers identifying which sets of data can be reached from the current record. Three sets are defined by the CODASYL standard – singular/system sets, multimember sets and recursive sets. Using these sets, the database designer and programmer may represent and navigate on 1:1, 1:N and N:M relationships (Elmasri & Navathe 1994). To be able to do this, the programmer has to know the physical representation of the database and access the database using a low-level navigational language (Bachman 1973). This approach to DBMS is more flexible than the hierarchical approach, but still the programmer has to know the physical representation of data to be able to access it, and accordingly applications using a network database has to be changed every time the structure of the database changes.

Relational

The relational data-model was first represented by Edgar F. Codd in (Codd 1970) and later in a series of papers up to 1972 (Codd 1971a, Codd 1971b, Codd 1971c, and Codd 1972). The model offers a conceptually different approach to data storage. In the relational database, all data is represented as simple tabular data structures (relations) which may be accessed using a high-level non-procedural language. This language is used to gain access to the relations and the desired set of data and the programmer does not have to write algorithms for navigation. By using this approach the physical implementation of the database is hidden, thus the programmer does not have to know the physical implementation to be able to access the data. In 1974 Chamberlain and others at IBM proposed such a high-level non-procedural language – SEQUEL, which later, due to legal problems, was renamed into SQL (Date 1990). In 1986 the ANSI X3H2 committee accepted SQL as an ANSI standard, and soon SQLs biggest competitor, QUEL, vanished.

The relational approach separates the program from the physical implementation of the database, making the program less sensitive to changes of the physical representation of the data by unifying data and metadata in the database. This makes the development of programs more effective and less dependent on changes in the physical representation of data (Gray 97).

SQL and RDBMSs have become widely used due to the separation of the physical and logical representation (and marketing of course). It is much easier to understand tables and records than pointers and pointer?s pointers to records. Most significantly, research showed that a high-level relational query language could give performance comparable to the best record-oriented databases (Gray 97).

Object-relational

The object-relational DBMS (ORDBMS) is the newest commercial breed of DBMSs which embraces some object-oriented features and encapsulate these features into an RDBMS, creating an ORDBMS. ORDBMSs are mainly based upon the criteria defined by Stonebraker et al.(1990). This manifesto is built as an opposing manifesto to Atkinson et al. (1989). Stonebraker et al. suggests to extend the capabilities of an RDBMS to include support for richer object structures and rules and still be open to other systems. This is done by thirteen propositions of requirements to ORDBMSs as extensions to RDBMS.

The first tenet in (Stonebraker et al. 1990) is concerned with richer objects and rules. This includes inheritance, advanced data type (ADT), and a number of different constructors, e.g. sets, lists and bags, needed to operate on objects or collections of objects. Stonebraker et al. (1990) suggest that all functions which involves data should be written in a high-level, non-procedural access language (HLL) to avoid programming towards low-level interfaces being dependent on physical implementation of the database.

The second tenet is concerned with the increasing DBMS functions, how the functions should be written and how they should access data. Stonebraker et al. (1990) state that essentially all programmatic access to DBMS should be using the HLL while ensuring data independence. This should be accomplished by including updateable views, enumeration of members in collections combined with the HLLs capabilities to specify membership, and by avoiding all kinds of low-level access dependent on the physical implementation. The ORDBMS should also, according to Stonebraker et al. (1990), keep a backward compatibility to RDBMS, making porting from RDBMS to ORDBMS easy.

The third tenet concerns openness and the ORDBMSs application-programming interface (API). Stonebraket et al. (1990) put forward the idea that ORDBMSs should be accessible from multiple HLLs, and that all of these HLLs should be based upon SQL. Persistent programming languages should be supported on top of a single DBMS by compiler extensions to the programming language, and a runtime-system.

The ORDBMS camp is however split. Darwen & Date (1995) have written an opposing manifesto to Stonebraker et al. (1990) and Atkinson et al. (1989) suggesting a completely different set of guidelines to the future DBMSs. First of all, Darwen & Date, state that SQL is not suited as the base language for the future DBMSs (they feel strongly that “any attempt to move forward, if it is to stand time, must reject SQL unequivocally”). They claim that the foundation of the DBMSs must be firmly rooted in the relational model of data, that the relational model is capable to include object-oriented features and that a new HLL has to be developed (and SQL omitted).

Darwen & Date and Stonebraker et al. agree on several important issues (e.g. use of the relational model and development of a new HLL), but disagree on other (e.g. SQL and multiple inheritance).

In 1992 the ANSI X3H2 committee started its work on a new object-relational standard SQL3 (SQL3 is the ORDBMS standard) which was scheduled to be finished in 1995. The committee is still working on it and the expected timeframe for completion is currently 1999 (JCC 1998).

Since no standard currently exists, no vendors can truthfully claim to be SQL3 compliant, but some vendors have done major changes to their RDBMS to include object-relational features (e.g. ORACLE,IBMs DB2, and Informix). The ORDBMS models its data based upon the extensions to the relational model and relational calculus. ORDBMS is capable of storing ADTs or basal types in records,lists, bags etc. and can access the data using an extended version of SQL. The result – some kind of hybrid DBMS, half relational, half object-oriented. Whether or not this is a good idea, is not the subject for this essay.

Object-oriented

The conceptual paradigm of the object-oriented DBMS (OODBMS) is quite different from the other approaches presented. OODBMSs offer persistence to objects, including the object?s associations and methods. Atkinson et al. (1989) gave a guideline on the requirements to an OODBMS, and this manifesto is still a very good guideline on OODBMS. In 1993 ODMG put forward its ODMG-93, and later in 1997 ODMG 2.0 standard which today is the de facto standard for OODBMSs (Cattell 1994 and Cattell et al. 1997). Before I look deeper into the concepts of the OODBMS, orthogonal persistence has to be defined and understood.

A Database is a collection of related data organised in a way that data can be easily accessed, managed and updated. Any piece of information can be a data, for example name of your school. Database is actualy a place where related piece of information is stored and various operations can be performed on it.

DBMS

A DBMS is a software that allows creation, definition and manipulation of database. Dbms is actualy a tool used to perform any kind of operation on data in database. Dbms also provides protection and security to database. It maintains data consistency in case of multiple users. Here are some examples of popular dbms, MySql, Oracle, Sybase, Microsoft Access and IBM DB2 etc.

Users : Users may be of various type such as DB administrator, System developer and End users.

Database application : Database application may be Personal, Departmental, Enterprise and Internal
DBMS : Software that allow users to define, create and manages database access, Ex: MySql, Oracle etc.
Database : Collection of logical data.

Functions of DBMS

Provides data Independence
Concurrency Control
Provides Recovery services
Provides Utility services
Provides a clear and logical view of the process that manipulates data.

Advantages of DBMS

Segregation of applicaion program.
Minimal data duplicacy.
Easy retrieval of data.
Reduced development time and maintainance need.

Disadvantages of DBMS

Complexity
Costly
Large in size

Data architecture should be defined in the planning phase of the design of a new data processing and storage system. The major types and sources of data necessary to support an enterprise should be identified in a manner that is complete, consistent, and understandable. The primary requirement at this stage is to define all of the relevant data entities, not to specify computer hardware items. A data entity is any real or abstracted thing about which an organization or individual wishes to store data.

EXAMPLE SCENARIOS

Hierarchical Model

In this model each entity has only one parent but can have several children . At the top of hierarchy there is only one entity which is called Root.

Network Model

In the network model, entities are organised in a graph,in which some entities can be accessed through sveral path

Relational Model

In this model, data is organised in two-dimesional tables called relations. The tables or relation are related to each other.

Two-tier Client / Server Architecture

Two-tier Client / Server architecture is used for User Interface program and Application Programs that runs on client side. An interface called ODBC(Open Database Connectivity) provides an API that allow client side program to call the dbms. Most DBMS vendors provide ODBC drivers. A client program may connect to several DBMS’s. In this architecture some variation of client is also possible for example in some DBMS’s more functionality is transferred to the client including data dictionary, optimization etc. Such clients are called Data server.

Three-tier Client / Server Architecture

Three-tier Client / Server database architecture is commonly used architecture for web applications. Intermediate layer called Application server or Web Server stores the web connectivty software and the business logic(constraints) part of application used to access the right amount of data from the database server. This layer acts like medium for sending partially processed data between the database server and the client.

E.F Codd was a Computer Scientist who invented Relational model for Database management. Based on relational model, Relation database was created. Codd proposed 13 rules popularly known as Codd’s 12 rules to test DBMS’s concept against his relational model. Codd’s rule actualy define what quality a DBMS requires in order to become a Relational Database Management System(RDBMS). Till now, there is hardly any commercial product that follows all the 13 Codd’s rules. Even Oracle follows only eight and half out(8.5) of 13. The Codd’s 12 rules are as follows.

Rule zero

This rule states that for a system to qualify as an RDBMS, it must be able to manage database entirely through the relational capabilities.

Rule 1 : Information rule

All information(including metadeta) is to be represented as stored data in cells of tables. The rows and columns have to be strictly unordered.

Rule 2 : Guaranted Access

Each unique piece of data(atomic value) should be accesible by : Table Name + primary key(Row) + Attribute(column).

NOTE : Ability to directly access via POINTER is a violation of this rule.

Rule 3 : Systemetic treatment of NULL

Null has several meanings, it can mean missing data, not applicable or no value. It should be handled consistently. Primary key must not be null. Expression on NULL must give null.

Rule 4 : Active Online Catalog

Database dictionary(catalog) must have description of Database. Catalog to be governed by same rule as rest of the database. The same query language to be used on catalog as on application database.

Rule 5 : Powerful language

One well defined language must be there to provide all manners of access to data. Example: SQL. If a file supporting table can be accessed by any manner except SQL interface, then its a violation to this rule.

Rule 6 : View Updation rule

All view that are theoretically updatable should be updatable by the system.

Rule 7 : Relational Level Operation

There must be Insert, Delete, Update operations at each level of relations. Set operation like Union, Intersection and minus should also be supported.

Rule 8 : Physical Data Independence

The physical storage of data should not matter to the system. If say, some file supporting table were renamed or moved from one disk to another, it should not effect the application.

Rule 9 : Logical Data Independence

If there is change in the logical structure(table structures) of the database the user view of data should not change. Say, if a table is split into two tables, a new view should give result as the join of the two tables. This rule is most difficult to satisfy.

Rule 10 : Integrity Independence

The database should be able to conforce its own integrity rather than using other programs. Key and Check constraints, trigger etc should be stored in Data Dictionary. This also make RDBMS independent of front-end.

Rule 11 : Distribution Independence

A database should work properly regardless of its distribution across a network. This lays foundation of distributed database.

Rule 12 : Nonsubversion rule

If low level access is allowed to a system it should not be able to subvert or bypass integrity rule to change data. This can be achieved by some sort of looking or encryption.

A Relational Database management System(RDBMS) is a database management system based on relational model introduced by E.F Codd. In relational model, data is represented in terms of tuples(rows).

RDBMS is used to manage Relational database. Relational database is a collection of organized set of tables from which data can be accessed easily. Relational Database is most commonly used database. It consists of number of tables and each table has its own primary key.

What is Table ?

In Relational database, a table is a collection of data elements organised in terms of rows and columns. A table is also considered as convenient representation of relations. But a table can have duplicate tuples while a true relation cannot have duplicate tuples. Table is the most simplest form of data storage. Below is an example of Employee table.

What is a Record ?

A single entry in a table is called a Record or Row. A Record in a table represents set of related data.

For example, the above Employee table has 4 records. Following is an example of single record.

What is Field ?

A table consists of several records(row), each record can be broken into several smaller entities known as Fields. The above Employee table consist of four fields, ID, Name, Age and Salary.

What is a Column ?

In Relational table, a column is a set of value of a particular type. The term Attribute is also used to represent a column. For example, in Employee table, Name is a column that represent names of employee.

Database Keys

Keys are very important part of Relational database. They are used to establish and identify relation between tables. They also ensure that each record within a table can be uniquely identified by combination of one or more fields within a table.

Super Key

Super Key is defined as a set of attributes within a table that uniquely identifies each record within a table. Super Key is a superset of Candidate key.

Candidate Key

Candidate keys are defined as the set of fields from which primary key can be selected. It is an attribute or set of attribute that can act as a primary key for a table to uniquely identify each record in that table.

Primary Key

Primary key is a candidate key that is most appropriate to become main key of the table. It is a key that uniquely identify each record in a table.