15 Distributed Database Systems

A distributed database system allows applications to access data from local and remote databases. In a homogenous distributed database system, each database is an Oracle database. In a heterogeneous distributed database system, at least one of the databases is a non-Oracle database. Distributed databases use a client/server architecture to process information requests.

 

Homogenous Distributed Database Systems

 

A homogenous distributed database system is a network of two or more Oracle databases that reside on one or more machines. An application can simultaneously access or modify the data in several databases in a single distributed environment. For example, a single query from a Manufacturing client on local database mfg can retrieve joined data from the products table on the local database and the dept table on the remote hq database.

 

For a client application, the location and platform of the databases are transparent. You can also create synonyms for remote objects in the distributed system so that users can access them with the same syntax as local objects. For example, if you are connected to database mfg but want to access data on database hq, creating a synonym on mfg for the remote dept table enables you to issue this query:

 

SELECT * FROM dept;

 

In this way, a distributed system gives the appearance of native data access. Users on mfg do not have to know that the data they access resides on remote databases.
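In Oracle, such a synonym might be created as follows, assuming a database link named hq to the remote database already exists (the link name is illustrative):

CREATE SYNONYM dept FOR dept@hq;

After this, users on mfg can issue SELECT * FROM dept without referencing the remote database explicitly.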

 

Distributed Databases Versus Distributed Processing

 

The terms distributed database and distributed processing are closely related, yet have distinct meanings. Their definitions are as follows:

  • Distributed database

A set of databases in a distributed system that can appear to applications as a single data source.

  • Distributed processing

The operations that occur when an application distributes its tasks among different computers in a network. For example, a database application typically distributes front-end presentation tasks to client computers and allows a back-end database server to manage shared access to a database. Consequently, a distributed database application processing system is more commonly referred to as a client/server database application processing system.

 

Oracle distributed database systems employ a distributed processing architecture. For example, an Oracle database server acts as a client when it requests data that another Oracle database server manages.
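In Oracle, this kind of server-to-server access typically runs over a database link. A minimal sketch, with a hypothetical service name and credentials:

-- On the local server, define a link to the remote hq database
CREATE DATABASE LINK hq
  CONNECT TO scott IDENTIFIED BY tiger
  USING 'hq_service';

-- The local server acts as a client when it executes this query
SELECT * FROM dept@hq;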

 

The terms distributed database system and database replication are related, yet distinct. In a pure (that is, not replicated) distributed database, the system manages a single copy of all data and supporting database objects. Typically, distributed database applications use distributed transactions to access both local and remote data and modify the global database in real time.
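For instance, a single distributed transaction can modify both local and remote tables and commit them atomically; the table, column, and link names below are illustrative:

UPDATE products SET price = price * 1.05 WHERE prodid = 100;  -- local table
UPDATE dept@hq SET loc = 'DALLAS' WHERE deptno = 20;          -- remote table
COMMIT;  -- both updates commit, or both roll back, together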

 

The term replication refers to the operation of copying and maintaining database objects in multiple databases belonging to a distributed system. While replication relies on distributed database technology, database replication offers applications benefits that are not possible within a pure distributed database environment.

 

Most commonly, replication is used to improve local database performance and protect the availability of applications, because alternate data access options exist. For example, an application may normally access a local database rather than a remote server to minimize network traffic and achieve maximum performance. Furthermore, the application can continue to function after a failure of the local server, as long as other servers with replicated data remain accessible.

 

Heterogeneous Distributed Database Systems

 

In a heterogeneous distributed database system, at least one of the databases is a non-Oracle system. To the application, the heterogeneous distributed database system appears as a single, local Oracle database. The local Oracle database server hides the distribution and heterogeneity of the data.

 

The Oracle database server accesses the non-Oracle system using Oracle Heterogeneous Services in conjunction with an agent. If you access the non-Oracle data store using an Oracle Transparent Gateway, then the agent is a system-specific application. For example, if you include a Sybase database in an Oracle distributed system, then you need to obtain a Sybase-specific transparent gateway so that the Oracle databases in the system can communicate with it.

 

Alternatively, you can use generic connectivity to access non-Oracle data stores so long as the non-Oracle system supports the ODBC or OLE DB protocols.

 

Heterogeneous Services

 

Heterogeneous Services (HS) is an integrated component within the Oracle database server and the enabling technology for the current suite of Oracle Transparent Gateway products. HS provides the common architecture and administration mechanisms for Oracle gateway products and other heterogeneous access facilities. Also, it provides upwardly compatible functionality for users of most of the earlier Oracle Transparent Gateway releases.

 

Transparent Gateway Agents

 

For each non-Oracle system that you access, Heterogeneous Services can use a transparent gateway agent to interface with the specified non-Oracle system. The agent is specific to the non-Oracle system, so each type of system requires a different agent.

 

The transparent gateway agent facilitates communication between Oracle and non-Oracle databases and uses the Heterogeneous Services component in the Oracle database server. The agent executes SQL and transactional requests at the non-Oracle system on behalf of the Oracle database server.

 

Schema Objects and Naming in a Distributed Database

 

A schema object (for example, a table) is accessible from all nodes that form a distributed database. Therefore, just as a non-distributed local DBMS architecture must provide an unambiguous naming scheme to distinctly reference objects within the local database, a distributed DBMS must use a naming scheme that ensures that objects throughout the distributed database can be uniquely identified and referenced.

 

To resolve references to objects (a process called name resolution) within a single database, the DBMS usually forms object names using a hierarchical approach. For example, within a single database, a DBMS guarantees that each schema has a unique name, and that within a schema, each object has a unique name. Because uniqueness is enforced at each level of the hierarchical structure, an object’s local name is guaranteed to be unique within the database and references to the object’s local name can be easily resolved.

 

Distributed database management systems simply extend the hierarchical naming model by enforcing unique database names within a network. As a result, an object’s global object name is guaranteed to be unique within the distributed database, and references to the object’s global object name can be resolved among the nodes of the system.
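In Oracle, a global object name is formed from the schema name, the object name, and the unique global database name. A hedged example, with illustrative names:

-- schema.object@global_database_name
SELECT * FROM scott.emp@sales.us.example.com;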

 

Transparency in a Distributed Database System

 

The functionality of a distributed database system must be provided in such a manner that the complexities of the distributed database are transparent to both the database users and the database administrators.

 

For example, a distributed database system should provide methods to hide the physical location of objects throughout the system from applications and users. Location transparency exists if a user can refer to the same table the same way, regardless of the node to which the user connects. Location transparency is beneficial for the following reasons:

  • Access to remote data is simplified, because the database users do not need to know the location of objects.
  • Objects can be moved with no impact on end-users or database applications.

A distributed database system should also provide query, update, and transaction transparency. For example, standard SQL commands, such as SELECT, INSERT, UPDATE, and DELETE, should allow users to access remote data without the requirement for any programming. Transaction transparency occurs when the DBMS provides the functionality described below using standard SQL COMMIT, SAVEPOINT, and ROLLBACK commands, without requiring complex programming or other special operations to provide distributed transaction control.

  • The statements in a single transaction can reference any number of local or remote tables.
  • The DBMS guarantees that all nodes involved in a distributed transaction take the same action: they either all commit or all roll back the transaction.
  • If a network or system failure occurs during the commit of a distributed transaction, the transaction is automatically and transparently resolved globally; that is, when the network or system is restored, the nodes either all commit or all roll back the transaction.
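These guarantees rely only on standard SQL. A sketch of a distributed transaction, using illustrative table and link names:

UPDATE products SET qty = qty - 10 WHERE prodid = 100;  -- local table
SAVEPOINT local_done;
UPDATE dept@hq SET dname = 'OPS' WHERE deptno = 30;     -- remote table
ROLLBACK TO local_done;  -- undoes only the remote update
COMMIT;  -- the DBMS coordinates the commit across all participating nodes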

A distributed DBMS architecture should also provide facilities to transparently replicate data among the nodes of the system. Maintaining copies of a table across the databases in a distributed database is often desirable for the following reasons:

  • Tables that have high query and low update activity can be accessed faster by local user sessions because no network communication is necessary.
  • If a database that contains a critical table experiences a prolonged failure, replicas of the table in other databases can still be accessed.

 

A DBMS that manages a distributed database should make table replication transparent to users working with the replicated tables.

 

Finally, the functional transparencies explained above are not sufficient alone. The distributed database must also perform with acceptable speed.

 

Traditionally in many organizations, the control of data resources has been centralized because data management originated in the mainframe environment. However, with the emergence of client/server technology and the blending of data and process in object-oriented methodologies, many organizations are questioning the need to retain centralized data and database management functions. These organizations are experimenting with the concept of decentralized data and database management, in which application developers, programmers, and integrators perform many, if not all, of the functions of a data analyst or database analyst. There are numerous risks associated with such a decision and few benefits.

 

Data analysis, database administration, and application development/integration are very separate functions and require separate skill sets. A data analyst, who is responsible for the conceptual and logical gathering and organization of information facts, is a person with broad-based analytical talents who is a good and discerning listener and has excellent oral and written communication skills.

 

These talents serve the role of data analyst by allowing the analyst to determine the relevant facts (data) in a business user's description of the information needed to perform a function.

 

Frequently, this description flows in a “stream-of-consciousness” manner and the listening and analytical skills help the data analyst focus on the real entities and attributes instead of the inconsequential information. This skill is developed through training and practice and is essential to the proper collection and organization of relevant data.

 

As the area responsible for the establishment and reusability of data, a data analyst is expected to understand the uses of each entity and its role in the corporate data management scheme. This duty requires the data analyst to provide flexible yet solid definitions and usage of the logical entities and attributes found in all the organization's data and file structures. By advocating and participating in the planning and coordination of the information resource across related applications and business areas, the data analyst can maximize data sharing and minimize design and data redundancy. Data analysts are also concerned with the metadata (definitions, standard names, formats, common code sets, etc.) of an object and with its accessibility and central storage.

 

Perhaps more than any other of the discrete disciplines within IS, Data Administration requires a concrete grasp of the real business the company is in, not just the technical aspects of interaction with a computer. Database administrators and application developers/integrators are not required to possess this level of business understanding.

 

The database administrator, a function separate from a data analyst, is a person with special skills relating to the DBMS under their control. This physical data management function requires intimate knowledge of the DBMS, the platform it operates upon, and the performance and technical usage requirements of the application under construction or enhancement. Proper database analysis and database structure and design can prevent the problems of poor performance and high maintenance databases, and the creation of unsharable data.

 

Defining proper access to the database, providing appropriate storage parameters, and executing regular and robust maintenance routines (backup and recovery, performance monitoring, etc.) are all the responsibility of a database administrator. These functions require technical expertise and tenacious problem solving. They also require detailed training in the DBMS's operations, acquired through courses and practice. Database administrators are usually less concerned with the business content of the data under their control than are data analysts, but they must understand the expected usage to design and enhance optimally performing databases. The enhancement of database structures (adding or deleting columns, renaming columns or tables, etc.) must be done judiciously and by a technician skilled in the nuances of the database. For example, Oracle does not provide a facility for dropping and renaming columns, a point known to Oracle DBAs but not to many other IS professionals.

 

Application developers/integrators are expected to code and design the applications that provide data to the databases and present that data to the users. Application developers are usually trained in the languages and interfaces of their applications, but are not usually concerned with the analysis of that data from a business perspective. Since they work with data after the database has been structured, they frequently do not understand fully the need for normalized logical design. This lack of understanding can result in incorrect normalizing of data if application developers perform database design or enhancement. This improper normalizing can cause a database to perform poorly and require users to re-enter rather than reuse data in the application.

 

Also, application developers concentrate on a single application at a time. Frequently, they do not have the broad, enterprise perspective necessary for the reduction or elimination of redundancy that is essential if data is to be used as a corporate resource. The development of many stovepipe applications in the past is a result of data structure design by application specialists who were not considering the broader implications of sharing data and reducing data redundancy.

 

Many organizations considering the combination of data management and application development cite the need for swifter implementation of databases and more rapid enhancements to existing databases. Sensitivity to deadline pressures is a constant throughout all development projects. Decentralizing data management (logical and/or physical) appears to offer some slight advantages in faster application development. However, the actual exposure to poorly defined data, poorly structured databases, incorrect enhancement procedures, and unsharable data far outweighs the small saving in time that results from application developers performing data management functions. Industry studies have consistently shown high costs for redesign and re-enhancement when the logical and physical data management functions are not performed by data management specialists (data analysts and database administrators). Effective project management practices suggest dividing labor into discrete tasks, each performed by a specialist in that area. Employing this practice in the area of data management in a system development or enhancement project enables an organization to control the costs of that project. Simply stated, faster application development is not the objective. The correct objective is the development and enhancement of high-quality, high-integrity applications as efficiently as possible.

 

Data is rapidly growing in stature as a recognized corporate resource. A centralized approach to logical and physical data management will promote the development and use of integrated, sharable data throughout applications, preserve the quality of that data, and serve the needs of the business more effectively.

 

Distribution transparency is the property of distributed databases by virtue of which the internal details of the distribution are hidden from users. The DDBMS designer may choose to fragment tables, replicate the fragments, and store them at different sites. Because users are unaware of these details, they find the distributed database as easy to use as any centralized database.

 

The three dimensions of distribution transparency are:

  • Location transparency
  • Fragmentation transparency
  • Replication transparency

Location Transparency

 

Location transparency ensures that a user can query any table, or any fragment of a table, as if it were stored locally at the user's site. The fact that the table or its fragments are stored at a remote site in the distributed database system should be completely hidden from the end user. The address of the remote site(s) and the access mechanisms are completely hidden.

 

To provide location transparency, the DDBMS must have access to an updated and accurate data dictionary and DDBMS directory that contain the details of the locations of data.

 

Fragmentation Transparency

 

Fragmentation transparency enables users to query upon any table as if it were unfragmented. Thus, it hides the fact that the table the user is querying on is actually a fragment or union of some fragments. It also conceals the fact that the fragments are located at diverse sites.

 

This is somewhat similar to users of SQL views, where the user may not know that they are using a view of a table instead of the table itself.
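The analogy can be made concrete: a union view over horizontally fragmented tables presents a single logical table, much as fragmentation transparency does. The fragment and link names below are hypothetical:

CREATE VIEW emp AS
  SELECT * FROM emp_east@east_site
  UNION ALL
  SELECT * FROM emp_west@west_site;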

 

Replication Transparency

 

Replication transparency ensures that the replication of databases is hidden from users. It enables users to query a table as if only a single copy of the table exists.

 

Replication transparency is associated with concurrency transparency and failure transparency. Whenever a user updates a data item, the update is reflected in all the copies of the table; however, this operation should not be visible to the user. This is concurrency transparency. Also, in case of failure of a site, the user can still proceed with queries using replicated copies, without any knowledge of the failure. This is failure transparency.

 

Combination of Transparencies

 

In any distributed database system, the designer should ensure that all the stated transparencies are maintained to a considerable extent. The designer may choose to fragment tables, replicate them, and store them at different sites, all invisible to the end user. However, complete distribution transparency is a difficult goal and requires considerable design effort.

 

In the synchronous replication approach, the database is synchronized so that all the replicas always have the same value. A transaction requesting a data item will see the same value at all the sites. To ensure this uniformity, a transaction that updates a data item is expanded so that it makes the update in all the copies of the data item. Generally, the two-phase commit protocol is used for this purpose.

 

In the asynchronous replication approach, the replicas do not always hold the same value. One or more replicas may store an outdated value, and a transaction may see different values. The process of bringing all the replicas to the current value is called synchronization.
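In Oracle, asynchronous replication is commonly implemented with materialized views that are refreshed on a schedule; between refreshes the local replica may hold outdated values. A sketch with illustrative names:

CREATE MATERIALIZED VIEW dept_copy
  REFRESH FORCE
  START WITH SYSDATE NEXT SYSDATE + 1/24  -- synchronize hourly
  AS SELECT * FROM dept@hq;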

 

To summarize, we looked at the definitions of distributed databases, and then discussed the hierarchy of the distributed environment in detail. The need for data replication was presented along with the various types of replication, and the issues pertaining to data replication were also addressed in this session.