20 Database Replication
Dr R. Baskaran
REPLICATION OVERVIEW
Many business services have gone online. For instance, online banking and online shopping have become standard activities in our daily lives, and we expect a smooth user experience. The services need to be available around the clock, and respond within seconds – whether there are many thousands of users accessing the system at the same time or one is the only person in the world using it. How do the service providers achieve this? In fact, they use the same principle that was applied in the old times: replication. A generation ago, when banking and shopping were still tasks that forced you to leave your home, when still real people were in charge of serving your requests, replication was standard practice. The more customers lived in an area and demanded the service, the more bank clerks and cashiers were hired. Backups were on call should the main staff become sick. Now, compute nodes1 replace the human beings. Service execution is distributed over all nodes. If the number of users increases, more nodes are added to share the load, guaranteeing that the quality of service remains acceptable. If a node fails, a failover procedure transfers the tasks executing on the failed node onto another node, making the failure imperceptible to the end user. Replication is relatively straightforward if a service only requires computation but has no critical data or information associated with it, i.e., the service is stateless.
However, many services are stateful, meaning they use and manipulate business critical data such as money transfers of a bank client or purchase information in a bookstore. Such data are typically stored in a database system. Then, coordination among nodes becomes more challenging. In fact, in many cases, not only the service functionality is replicated but also the data so that each node can access its own copy of the data. Informally, database replication means that the logical data items of the database (e.g., the tuples in a relational database or the objects in the object-oriented world) have multiple physical copies, located on different nodes. Database replication is precisely the topic of this book. Database replication is used by many different applications, not only banking and e-commerce. It is used for many different purposes and deployed over a range of computer configurations. It can be small- or large-scale, affecting a few data items or terabytes of data. Obviously, given the diversity of applications and environments, there is no one-fits-all replication solution. Instead, a replication approach has to consider many different issues and must be specifically designed and adjusted to the particular task on hand.
Let us have a closer look at a replication solution. For better illustration, we use the example of an online store selling some products, say puppets. The store’s web site offers functionality to query the catalogue of puppets it offers and to purchase the ones selected. Behind the web server, the company maintains a database that keeps all the product information including pictures, the stock of each of the products and pricing information. It also maintains information about customers that are currently or have recently been involved in a purchase. So why would our company want to replicate the data? The first reason is fault-tolerance. Our puppet company definitely wants to guarantee its customers 24/7 access to their system, despite failures. A failure can mean that the process running the database software fails, that the physical machine crashes, e.g., due to malfunctioning of the hardware, or that the connectivity between the client and the server system is (temporarily) interrupted, due to some network failure. To handle these failure cases, our company installs two copies of the database on different nodes. In such a case, where the entire database is replicated, a node is also often referred to as a replica. Now, if one of the replicas fails, then there is still one replica up and running. The system is able to tolerate the failure of one of the copies. This is also referred to as high availability as the service remains available despite failures. In most high-availability solutions, the replicas reside in the same local area network to allow fast communication between the two.
A failure detection mechanism detects any failure, and the clients connected to the failed node are reconnected to the available node where request execution simply resumes. Such a solution, however, cannot handle network failures between clients and database as the client will not be able to reach any of the replicas. To address this problem, wide area replication can be used. With wide area replication, the database replicas are geographically spread and each client connects to the closest replica. Even if a remote replica is not reachable, access to the local replica is usually provided. However, the latency of the wide area network introduces interesting challenges, and failover becomes more complex due to the possibility of network partitions. A second major use for replication is performance, as it can help to increase throughput and reduce response time. Let us come back to our puppet store. As a first option, the company installs multiple replicas within a local area network, resulting in what is often called a cluster. The cluster appears as one unit to the outside, and as requests come in, they are distributed across the replicas. By adding new replicas, the system can scale up to increasing demands. In this case, replication serves the purpose of scalability as it is able to provide increasing throughput. However, as our company expands and attracts customers worldwide, it realizes that the quality of service for users that are far away from the company’s headquarters suffers due to the long network delay for each message exchange. Therefore, the company also replicates its database at geographically strategic locations, providing acceptable response times to all their customers as they can now connect to a close-by replica. We call this replication for fast local access. At the same time, as more replicas are added across the world, the company is also able to increase the overall throughput of their system.
Cluster and wide area replication are not the only replication configurations. For example, assume the company also sells their puppets to toy stores and large department stores. Its saleswomen traverse the country, packed with their laptops to show the products to these stores and sell them in large scale. For that, they replicate at least parts of the database onto their laptops. This is a scenario where replicas are often in a disconnected mode, and the different replicas have quite different scale. Nevertheless, the purpose of replication remains the same: data remain available for the sales people although no network connection might exist, and the access is fast because it is local.
CHALLENGES
Despite its advantages, replication is not a straightforward technique to apply, and there are many hurdles to overcome before one has a suitable replication solution that fits the application requirements. We will discuss some of the issues in this book. At the forefront will be replica control: assuring that data copies remain consistent. Other issues are the architectural design options and autonomic support to provide self-management.
REPLICA CONTROL
When a data item is updated, its physical copies need to be updated. As easy as this might sound, this task, called replica control, is not a straightforward approach as there are many possible approaches, each having its advantages and its drawbacks depending on the application and the configuration. Let us illustrate this with an example. Assume our puppet company has decided to deploy a cluster of database replicas as depicted in Figure 1.4. Each node maintains a full copy of the database. When a client request arrives, it is redirected to one of the replicas that controls its execution. In most applications, there are two major request types: update requests, such as purchase() in the figure, update at least one data item; and read-only requests, such as check-status(), only read data items. Our company employes a Read-one-write-all (ROWA) replication strategy: the update of a data item is performed at all replicas, while a read operation accesses a single replica. ROWA can be implemented in various ways. The fundamental differences between existing approaches lie in where and when copies are updated [Gray et al., 1996]. In regard to where, our company uses a primary copy approach. There is one database replica that is considered the primary (replica). It holds the primary copy of the database. The other replicas hold secondary copies and are called secondary (replicas). All update requests are sent to the primary replica and are first executed there. An update request might read and write several data items. All writes are forwarded to the secondary’s where they are also executed. Read-only requests can be executed at the primary or the secondary’s. They can execute completely locally without coordination with other replicas. In regard to when copies are updated, our company uses an eager approach, also referred to as synchronous replication. The secondary’s apply the changes to their own copies immediately when they receive them, and then send a confirmation to the primary. Only when the primary knows that all secondary’s have the changes, it confirms to the user that the execution was successful.
Primary copy vs. update anywhere. The use of a primary replica forces all updates to be executed first at a single node. This simplifies the coordination of concurrent update requests. However, it has several disadvantages. For example, the primary can become a bottleneck. The alternative is to use an update anywhere approach (also called update everywhere). Each replica accepts both update and read-only requests and is responsible for their execution. While it avoids the pitfalls of the primary copy approach, it has its own problems. In particular, it is much harder to guarantee consistency as data copies might now be updated concurrently at different replicas. Eager vs. lazy. By using eager replication, the primary only returns a confirmation to the user once all secondary’s have the updates executed. Thus, copies are “virtually” consistent. However, clients might experience prolonged response times due to the replication coordination.
Especially with wide area replication, this can become a serious problem. Also network connectivity can be spotty in wide area networks, and the entire service might render unavailable if one replica is not reachable due to network problems. The alternative is to use lazy replication, also called asynchronous replication. With lazy replication, an update request is completely executed at one replica, which propagates the writes to the other replicas after returning a confirmation to the client. Thus, the coordination tasks do not affect the user. However, maintaining consistency is more difficult, as the same update takes place at different times at the various replicas. Additionally, when lazy replication is combined with update anywhere, two updates concurrently submitted to two replicas can update the same data item and succeed. Later, when the updates are propagated, the system has to detect such conflict, possibly undo one of the updates despite the fact that the client was already informed that the update was successful. Transactions. Although we have not yet mentioned it, but to many readers, it will probably already be obvious that many of the applications that require replication will also require transactions and their properties – in particular atomicity, isolation and durability. In fact, it is pretty straightforward to map the execution of a client request to a transaction. Each request reads and/or writes several data items. From the outside, the execution of the request should appear as one logical execution unit. That is exactly the definition of a transaction. A transaction is a user-defined sequence of read and write operations on data items, and the system provides a set of properties for their execution. Atomicity guarantees that a transaction either executes entirely or commits, or it aborts not leaving any changes in the database.
Thus, database systems provide rollback mechanisms to abort transactions and provide distributed commit protocols for distributed transactions, i.e., for transactions that access data items residing on different nodes. Isolation provides a transaction with the impression that no other transaction is currently executing in the system. Concurrency control mechanisms such as locking are in charge of that. Durability guarantees that once the initiator of the transaction has received the confirmation of commit, the changes of the transaction are, indeed, reflected in the database (they can, of course, later be overwritten by other transactions). Sophisticated logging protocols guarantee durability despite individual node failures. Replication does not make it easier to achieve these properties. In fact, replica control, atomic commit protocols and concurrency control often work tightly together to let a replicated system appear as a single transactional system. In this book, transactions will be first-class citizens in considering and analyzing replica control algorithms. Note that while the next chapter describes the transactional properties in more detail, we assume the reader to be familiar with transactions and the basic mechanisms to implement them.
REPLICATION MODEL
A database consists of a set of data items x, y,… . In a replicated database, there is a set of database nodes RA, RB, … each of them having copies of data items. Thus, we refer to x, y, … as logical data items, and each logical data item x has physical copies xA, xB, … where RA is the node (replica) on which xA resides. From the perspective of the application, a transaction is a sequence of read and write operations on the logical data items of the database. The transaction is ended with a commit or an abort request. The latter indicates that the updates executed so far need to be rolled back. One of the tasks of replica control is to map the operations on the logical data items onto operations on the physical copies. The most common execution model is to translate a logical read operation ri(x) of transaction Ti to one physical read operation ri(xA) on one particular copy xA. And a logical write operation wi(x) is mapped to physical write operations wi(xA), wi(xB),… on all copies of x. This is called a read-one-write-all (ROWA) approach. Thus, a transaction Ti can have sub-transactions on many nodes, namely on each node on which it accesses at least one physical copy. For simplicity of the discussion, we assume in this and most of the other chapters full replication where each node in the system has a full copy of the database, i.e., copies of all data items. The terms “node” and “replica” are then used interchangeably. With full replication, an update transaction, i.e., a transaction that has at least one write operation, has sub-transactions on all nodes, while read-only transactions typically only access the copies of a single node, albeit it is possible to distribute the reads among several nodes. ROWA works fine because in most applications reads by far outnumber writes.
Hence, it makes sense to keep the overhead for read operations as small as possible. ROWA is not suitable when failures occur, as an update transaction cannot complete anymore once a single copy becomes unavailable. Therefore, a derivation is the read-one-write-all-available, or ROWAA, approach where write operations execute only on all copies that are currently available. We will see later what that exactly means. Performing the mapping between logical and physical operations is not sufficient. Replica control must be tightly coupled with the mechanisms that achieve the transactional ACID properties: atomicity, consistency, isolation and durability. In fact, the ultimate goal is that the replicated system provides the same semantics as the original non-replicated system. This is what is termed as 1- copy-equivalence: the replicated system behaves like a 1-copy non-replicated system [Bernstein et al., 1987]. The ACID properties are all related to providing well-defined consistency in the advent of concurrent access and failures. When replicating a database, due to the distributed execution and the possibility of node failures, if no extra measures are taken, one can easily end up with transaction executions that would be disallowed in a non-replicated system. This means that designing a database replication solution implies to take care of 1-copy equivalence. In this chapter, we look at each of the ACID properties individually and discuss what it means to provide this property in a replicated environment, i.e., what does it mean to extend it with 1-copy-equivalence.