Conflict -aware replication for dynamic content Web sites
Doctor of Philosophy
Conflict-aware replication is a novel lazy replication technique for scaling the back-end database of a dynamic content web server using a cluster of commodity computers. This technique provides both throughput scaling and 1-copy serializability. It has generally been believed that this combination is hard to achieve through replication because of the growth of the number of conflicts. Conflict-aware replication interposes a (possibly replicated) scheduler between the database and application server tiers. The conflict-aware scheduler directs incoming queries in such a way that the overall execution is serializable and the number of conflicts is reduced. The technique requires that the incoming transactions specify the tables that they access at the beginning of the transaction. Using this information, conflict-aware replication provides both scaling and 1-copy serializability, while it avoids making any changes to the application server or database. We have implemented a prototype of the conflict-aware scheduler in a cluster-based dynamic content site. We have also implemented various other scheduler algorithms in this prototype for comparison purposes, including conflict-aware and oblivious, with 1-copy serializability and with different looser consistency models. We have evaluated this method using the industry standard TPC-W e-commerce benchmark, an auction site benchmark, modeled after eBay.com, and a bulletin board benchmark, modeled after slashdot.org. For these applications, we have found that pre-specifying what tables are accessed involves very little work on behalf of the programmer and could easily be automated. For clusters with small number of database machines (up to 8) we have measured an implementation of the algorithms. We use simulation to extend our measurement results to larger clusters, faster database engines, and lower conflict rates. This dissertation shows that conflict-awareness brings considerable benefits in terms of both overall throughput scaling and latency reduction compared to both eager and conflict-oblivious lazy replication for a large range of cluster configurations and conflict rates. Furthermore, for all our applications, except those with very high conflict rates, the performance of conflict-aware replication equals or approaches that of looser consistency models. The dissertation also shows that the cost of conflict-aware replication is minimal in terms of data availability and fault tolerance.
Computer science; Applied sciences; Conflict-aware; Replication Scheduling Web sites