Geographically Distributed Database Management at the Cloud’s Edge

by

Cătălin-Alexandru Avram

A thesis presented to the University of Waterloo in fulfillment of the thesis requirement for the degree of Doctor of Philosophy in Computer Science

Waterloo, Ontario, Canada, 2017

© Cătălin-Alexandru Avram 2017

Examining Committee Membership

The following served on the Examining Committee for this thesis. The decision of the Examining Committee is by majority vote.

External Examiner: Patrick Martin, Professor, School of Computing, Queen’s University
Supervisor: Ken Salem, Professor, Cheriton School of Computer Science, University of Waterloo
Internal Members: Bernard Wong, Associate Professor, Cheriton School of Computer Science, University of Waterloo; Tamer Özsu, Professor, Cheriton School of Computer Science, University of Waterloo
Internal-External Member: Wojciech Golab, Assistant Professor, Electrical and Computer Engineering, University of Waterloo

I hereby declare that I am the sole author of this thesis. This is a true copy of the thesis, including any required final revisions, as accepted by my examiners. I understand that my thesis may be made electronically available to the public.

Abstract

Request latency resulting from the geographic separation between clients and remote application servers is a challenge for cloud-hosted web and mobile applications. Numerous studies have shown the importance of low latency to the end user experience. Small response time increases on the order of a few hundred milliseconds directly translate to reduced user satisfaction and loss of revenue that persist even after a low latency environment is restored. One way to address this challenge in geo-distributed settings is to push all or part of the application, along with the data it requires, to the edge of the cloud, closer to application clients.
This thesis explores the idea of taking advantage of clients’ proximity to the edge of the network in order to reduce request latencies. SpearDB is a prototype replicated distributed database system which operates in a star network topology, with a core site and a large number of edge sites that are close to clients. Clients access the nearest edge, which holds replicas of locally relevant portions of the database. SpearDB’s edge sites coordinate through the core to provide a global transactional consistency guarantee (parallel snapshot isolation, or PSI), while handling as much work locally as possible. SpearDB provides full general-purpose transactional semantics with ACID guarantees. Experiments show that SpearDB is effective at reducing workload latencies for applications whose access patterns are geographically localizable. Many applications fit this criterion: bulletin boards (e.g., Craigslist, Kijiji), local commerce or services (e.g., Groupon, Uber), booking and ticketing (e.g., OpenTable, StubHub), location-based services (mapping, directions, augmented reality), local news outlets and client-centric services (e-mail, RSS feeds, gaming). SpearDB introduces protocols for executing application transactions in a geo-distributed setting under strong consistency guarantees. These protocols automatically hide the complexity as well as much of the latency introduced by geo-distribution from applications.

The effectiveness of SpearDB depends on the placement of primary and secondary replicas at core and edge sites. The secondary replica placement problem is shown to be NP-hard. Several algorithms for automatic data partitioning and replication are presented to provide approximate solutions. These algorithms work in a geo-distributed core-edge setting under partial replication. Their goal is to bring data closer to clients in order to lower request latencies. Experimental comparisons of the resulting placements’ latency impact show good results.
Surprisingly, however, the placements produced by the simplest of the proposed algorithms are comparable in quality to those produced by more complex approaches.

Acknowledgments

I would like to express my gratitude towards my supervisor, Dr. Ken Salem, who has been incredibly helpful throughout the long process of developing the work described in this document and instrumental to its completion. I have learned a lot from him both personally and professionally.

A special thanks to Dr. Bernard Wong, who also took part in the projects that have led to this thesis and provided me with much needed guidance along the way.

I would like to thank the other members of my committee as well, Dr. Tamer Özsu, Dr. Patrick Martin and Dr. Wojciech Golab, for taking the time to read and review this thesis.

Finally, I would like to thank my family for all the support and understanding they have provided throughout my studies.