
Cherry Garcia: Transactions across Heterogeneous Data Stores PDF

255 Pages·2016·2.68 MB·English
by Akon Samir Dey

Preview Cherry Garcia: Transactions across Heterogeneous Data Stores

Copyright and use of this thesis

This thesis must be used in accordance with the provisions of the Copyright Act 1968. Reproduction of material protected by copyright may be an infringement of copyright and copyright owners may be entitled to take legal action against persons who infringe their copyright.

Section 51 (2) of the Copyright Act permits an authorized officer of a university library or archives to provide a copy (by communication or otherwise) of an unpublished thesis kept in the library or archives, to a person who satisfies the authorized officer that he or she requires the reproduction for the purposes of research or study.

The Copyright Act grants the creator of a work a number of moral rights, specifically the right of attribution, the right against false attribution and the right of integrity. You may infringe the author’s moral rights if you:
- fail to acknowledge the author of this thesis if you quote sections from the work
- attribute this thesis to another author
- subject this thesis to derogatory treatment which may prejudice the author’s reputation

For further information contact the University’s Copyright Service.
sydney.edu.au/copyright

CHERRY GARCIA: TRANSACTIONS ACROSS HETEROGENEOUS DATA STORES

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in the School of Information Technologies at The University of Sydney

Akon Samir Dey
October 2015

© Copyright by Akon Samir Dey 2016
All Rights Reserved

Abstract

In recent years, cloud or utility computing has revolutionised the way software, hardware and network infrastructure is provisioned and deployed into production.
A key component of these vast, diverse and heterogeneous systems is the persistence layer provided by a variety of data store and database services, broadly categorised as NoSQL (Not only SQL) databases or data stores. These come in many flavours, from simple key-value stores and column stores to database services with support for SQL-like interfaces. These systems are primarily designed to operate at internet scale, with high scalability and fault tolerance in mind. As a result, they typically sacrifice consistency guarantees and often support only single-item consistent operations or no transactions at all. While these consistency limitations are fine for a wide class of applications, there are a few applications, or sometimes only parts of larger applications, that need ACID transactional guarantees in order to function correctly.

To address this, we define a data store client API, which we call REST+T (REST with Transactions), an extension of HTTP that supports transactions on one store. Then, we use this to define a client-coordinated transaction commitment protocol and library, called Cherry Garcia, to enable easy application development across diverse, heterogeneous data stores that each support single-item transactions. We extend the well-known YCSB benchmark into YCSB+T, which enables us to group multiple data store operations into ACID transactions and evaluate properties such as throughput. YCSB+T also provides the ability to detect and quantify data store anomalies that result from the execution of the workload. Finally, we describe our prototype implementations of REST+T in a system called Tora, and our client-coordinated transaction library, also called Cherry Garcia, that supports transactions across Windows Azure Storage (WAS), Google Cloud Storage (GCS) and Tora. We evaluate these using both YCSB+T and micro-benchmarks.

Acknowledgements

This thesis has enabled me to discover my strengths and overcome my weaknesses.
It has taught me to appreciate the friendship and support of family, friends and colleagues who have enabled, empowered and encouraged me in this endeavour.

I would like to express my deepest gratitude to my supervisor, Prof. Alan Fekete, for his support and guidance. His kindness, resourcefulness and willingness to help in my research work and outside of it made this dissertation possible.

My heartfelt thanks go to my co-supervisor, Assoc. Prof. Uwe Röhm, for his guidance, support, encouragement and ideas on presenting my work. He has complemented Alan as a guiding force behind this thesis.

I thank members of the Database Research Group, Michael Cahill, Jack Galilee, Hyungsoo Jung, Meena Rajani, Ying Zhou, Vincent Gramoli and Paul Greenfield, for their feedback and support. I am also grateful to Arunmoezhi Ramachandran, Raghunath Nambiar, Sherif Sakr, David Bermbach and Jörn Kuhlemkamp for their collaboration. Many thanks to Lynne Hutton-Williams and Robyn Kemmis for their friendship, kindness and timely help.

I am eternally grateful to my parents, Bharati and Samir Dey, for their unconditional love, a wonderful childhood, and for instilling in me the curiosity to learn. I thank my brother, Kautuk Dey, for his encouragement and support during this work.

Most of all, I would like to thank my wife, Aditi, for her love and understanding during this very challenging time of our life. I could not have done this without you.

Contents

Abstract
Acknowledgements
List of Figures
List of Algorithms

1 Introduction
  1.1 Problem definition
  1.2 Contributions
  1.3 Relevant publications
  1.4 Organisation of the Thesis

2 Background
  2.1 Motivating example
  2.2 Database Management Systems
    2.2.1 Transactions in Database Management Systems
    2.2.2 Properties of Transactions
    2.2.3 Transaction Programming Interface
    2.2.4 Requirements of Transactional Systems
  2.3 Distributed Concurrency Control
  2.4 Serializability
    2.4.1 Serializability in Distributed Databases
    2.4.2 Serializability in Heterogeneous Federation
    2.4.3 Weak Isolation
  2.5 Distributed Transaction Recovery
    2.5.1 The Basic Two-Phase Commit Algorithm
    2.5.2 The Transaction Tree Two-Phase Commit Algorithm
    2.5.3 Optimised Algorithms for Distributed Commit
  2.6 Distributed systems
  2.7 NoSQL and NewSQL database systems
  2.8 Transactions in NoSQL database systems
  2.9 Evaluating database systems
    2.9.1 TPC
    2.9.2 Benchmarks
    2.9.3 Evaluation metrics
    2.9.4 Properties of a good benchmark
  2.10 Chapter Summary

3 Related Work
  3.1 Cloud and web services
  3.2 Distributed Concurrency Control
  3.3 Distributed Transaction Recovery
  3.4 Transaction Models
  3.5 Transactions over HTTP
  3.6 Transaction support in NoSQL and Cloud Storage Systems
  3.7 Middleware coordinated transactions
  3.8 Client coordinated transactions
    3.8.1 Comparison to our approach
  3.9 Time in Distributed Systems
  3.10 Database benchmarks
    3.10.1 Cloud services benchmarks
    3.10.2 YCSB
  3.11 Chapter Summary

4 REST+T: Scalable transactions over HTTP
  4.1 State and state change
  4.2 Challenges
  4.3 REST with Transaction support
  4.4 The REST+T Proposal
    4.4.1 REST+T data header metadata
    4.4.2 REST+T methods
  4.5 Limiting REST+T to HTTP verbs
  4.6 A Case Study - Tora: a transaction-aware NoSQL data store
    4.6.1 REST+T interface
    4.6.2 Storage layer
  4.7 Chapter Summary

5 Cherry Garcia - The Protocol
  5.1 Challenges
  5.2 Intuition
    5.2.1 Overview of Two-Phase Commit
    5.2.2 Distributed Two-Phase Commit
    5.2.3 Write-Ahead Log (WAL) Protocol
  5.3 An example application
  5.4 Assumptions on the platforms
  5.5 Protocol overview
    5.5.1 Phase 1
    5.5.2 Phase 2
  5.6 Time
  5.7 Data item records
  5.8 Data store abstraction
  5.9 Transaction abstraction
  5.10 Cherry Garcia Protocol
    5.10.1 Start transaction
    5.10.2 Transactional read
    5.10.3 Transactional write
    5.10.4 Transaction commit
    5.10.5 Transaction abort
    5.10.6 Transaction recovery
    5.10.7 Performance Analysis of the Protocol
  5.11 Deadlock detection and avoidance
    5.11.1 First preparer wins
    5.11.2 Ensuring at least one winner
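
The abstract describes a client-coordinated commit across data stores that each support only single-item transactions. The following is a minimal sketch of that general idea, not the thesis's actual protocol: it assumes a hypothetical in-memory `Store` whose only atomic primitive is a version-checked conditional put, and a hypothetical `commit` helper that runs a two-phase pass from the client. All names (`Record`, `Store`, `put_if_version`, `commit`) are illustrative assumptions, and a real protocol would also need a durable commit decision and crash recovery, which are omitted here.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Record:
    value: Any = None            # last committed value
    version: int = 0             # bumped on every successful conditional put
    pending: Any = None          # tentative value written during phase 1
    owner: Optional[str] = None  # id of the transaction that prepared the item


class Store:
    """Hypothetical store offering only single-item atomic operations."""

    def __init__(self) -> None:
        self._items: Dict[str, Record] = {}

    def get(self, key: str) -> Record:
        rec = self._items.setdefault(key, Record())
        # Return a copy so callers cannot mutate stored state directly.
        return Record(rec.value, rec.version, rec.pending, rec.owner)

    def put_if_version(self, key: str, expected: int, new: Record) -> bool:
        """Conditional put: succeeds only if the stored version matches."""
        rec = self._items.setdefault(key, Record())
        if rec.version != expected:
            return False
        self._items[key] = Record(new.value, expected + 1, new.pending, new.owner)
        return True


def commit(txn_id: str, writes: List[Tuple[Store, str, Any]]) -> bool:
    """Client-side two-phase pass over items that may live in different stores."""
    prepared: List[Tuple[Store, str]] = []

    # Phase 1: tag each item with this transaction and its tentative value.
    for store, key, value in writes:
        cur = store.get(key)
        ok = cur.owner is None and store.put_if_version(
            key, cur.version, Record(cur.value, 0, value, txn_id)
        )
        if not ok:
            # Another transaction holds the item (or it changed): undo and abort.
            for s, k in prepared:
                c = s.get(k)
                s.put_if_version(k, c.version, Record(c.value))
            return False
        prepared.append((store, key))

    # Phase 2: promote every tentative value to the committed value.
    for store, key in prepared:
        cur = store.get(key)
        store.put_if_version(key, cur.version, Record(cur.pending))
    return True
```

In this sketch, an item already prepared by another transaction causes the later transaction to abort and release whatever it had prepared, which is one simple way to keep a conflict from leaving items held indefinitely.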

Description:
Cherry Garcia is the name of a Ben & Jerry's ice-cream flavour with heterogeneous aspects of chocolate and fruit. The books by Mullender [98], Lynch [90], and Coulouris [31] are good sources on broader topic .. JSON and JavaScript and frameworks like Twitter's Bootstrap and Google's AngularJS.