ebook img

Fast and Scalable Cloud Data Management PDF

199 Pages·2020·6.625 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Fast and Scalable Cloud Data Management

Felix Gessert Wolfram Wingerath Norbert Ritter Fast and Scalable Cloud Data Management Fast and Scalable Cloud Data Management Felix Gessert • Wolfram Wingerath (cid:129) Norbert Ritter Fast and Scalable Cloud Data Management FelixGessert WolframWingerath BaqendGmbH BaqendGmbH Hamburg,Germany Hamburg,Germany NorbertRitter DepartmentofInformatics UniversityofHamburg Hamburg,Germany ISBN978-3-030-43505-9 ISBN978-3-030-43506-6 (eBook) https://doi.org/10.1007/978-3-030-43506-6 ©SpringerNatureSwitzerlandAG2020 Thisworkissubjecttocopyright.AllrightsarereservedbythePublisher,whetherthewholeorpartof thematerialisconcerned,specificallytherightsoftranslation,reprinting,reuseofillustrations,recitation, broadcasting,reproductiononmicrofilmsorinanyotherphysicalway,andtransmissionorinformation storageandretrieval,electronicadaptation,computersoftware,orbysimilarordissimilarmethodology nowknownorhereafterdeveloped. Theuseofgeneraldescriptivenames,registerednames,trademarks,servicemarks,etc.inthispublication doesnotimply,evenintheabsenceofaspecificstatement,thatsuchnamesareexemptfromtherelevant protectivelawsandregulationsandthereforefreeforgeneraluse. Thepublisher,theauthors,andtheeditorsaresafetoassumethattheadviceandinformationinthisbook arebelievedtobetrueandaccurateatthedateofpublication.Neitherthepublishernortheauthorsor theeditorsgiveawarranty,expressedorimplied,withrespecttothematerialcontainedhereinorforany errorsoromissionsthatmayhavebeenmade.Thepublisherremainsneutralwithregardtojurisdictional claimsinpublishedmapsandinstitutionalaffiliations. ThisSpringerimprintispublishedbytheregisteredcompanySpringerNatureSwitzerlandAG. Theregisteredcompanyaddressis:Gewerbestrasse11,6330Cham,Switzerland Preface Our research for this book goes back a long way. It all started as early as 2010 with a bachelor’s thesis on how to make use of the purely expiration-based cachingmechanismsofthewebinadatabaseapplicationwithrigorousconsistency requirements.Strongencouragementbyfellowresearchersandgrowingdeveloper interesteventually madeusrealizethatthetaskofprovidinglowlatency forusers in globally distributed applications does not only pose an interesting research challengebutalsoanactualreal-worldproblemthatwasstillmostlyunsolved. ThisrevelationeventuallyledtothecreationofBaqend,aBackend-as-a-Service platform designed for developing fast web applications. We built Baqend on knowledgegatheredinmanybachelor’sandmaster’sandanumberofPhDtheses. Technically,Baqendisrootedinourresearchsystems,Orestesforwebcachingwith tunable consistency, itsextension Quaestor for query resultcaching, and InvaliDB for scalable push-based real-time queries for end-users. We are telling you all this for a reason: Because given its origin, this book does not only condense our knowledgeafteryearsofresearchdoneinapracticalcontextbutitalsoencapsulates ourviewontheconceptsandsystemsthatarecurrentlyoutthere.Whilewetryto provideabalancedoverviewofthecurrentstateofaffairsindatamanagementand webtechnology,weareclearlyopinionatedwithregardtocertainbestpracticesand architecturalpatterns.Wewouldliketoconsiderthisapositivetraitofthisbook— andwehopeyouagree;-) Hamburg,Germany FelixGessert Hamburg,Germany WolframWingerath Hamburg,Germany NorbertRitter December2019 v Contents 1 Introduction .................................................................. 1 1.1 ModernDataManagementandtheWeb............................... 2 1.2 Latencyvs.Throughput................................................. 3 1.3 ChallengesinModernDataManagement ............................. 3 1.4 OutlineoftheBook..................................................... 6 References..................................................................... 8 2 LatencyinCloud-BasedApplications ..................................... 13 2.1 Three-TierArchitectures................................................ 14 2.1.1 RequestFlow.................................................... 15 2.1.2 Implementation ................................................. 15 2.1.3 ProblemsofServer-RenderedArchitectures .................. 16 2.2 Two-TierArchitectures ................................................. 17 2.2.1 RequestFlow.................................................... 19 2.2.2 Implementation ................................................. 19 2.2.3 ProblemsofClient-RenderedArchitectures................... 21 2.3 LatencyandRound-TripTime ......................................... 21 2.4 CloudComputingasaSourceofLatency ............................. 23 2.4.1 Characteristics .................................................. 23 2.4.2 ServiceModels.................................................. 24 2.4.3 DeploymentModels............................................ 25 2.4.4 LatencyinCloudArchitectures................................ 26 References..................................................................... 27 3 HTTPforGloballyDistributedApplications............................. 33 3.1 HTTPandtheRESTArchitecturalStyle .............................. 33 3.2 LatencyontheWeb:TCP,TLSandNetworkOptimizations......... 35 3.3 WebCachingforScalabilityandLowLatency........................ 38 3.3.1 TypesofWebCaches........................................... 39 3.3.2 ScalabilityofWebCaching .................................... 40 3.3.3 Expiration-BasedWebCaching................................ 41 vii viii Contents 3.3.4 Invalidation-BasedWebCaching .............................. 43 3.3.5 ChallengesofWebCachingforDataManagement........... 44 3.4 TheClientPerspective:Processing,Rendering,andCaching forMobileandWebApplications...................................... 45 3.4.1 Client-SideRenderingandProcessing......................... 46 3.4.2 Client-SideCachingandStorage .............................. 48 3.5 ChallengesandOpportunities:UsingWebCachingforCloud DataManagement....................................................... 50 References..................................................................... 50 4 SystemsforScalableDataManagement .................................. 57 4.1 NoSQLDatabaseSystems.............................................. 57 4.2 DataModels:Key-Value,Wide-ColumnandDocumentStores...... 59 4.2.1 Key-ValueStores ............................................... 59 4.2.2 DocumentStores................................................ 60 4.2.3 Wide-ColumnStores ........................................... 60 4.3 PivotalTrade-Offs:Latencyvs.Consistencyvs.Availability......... 61 4.3.1 CAP ............................................................. 62 4.3.2 PACELC......................................................... 63 4.4 RelaxedConsistencyModels........................................... 63 4.4.1 StrongConsistencyModels .................................... 65 4.4.2 Staleness-BasedConsistencyModels.......................... 66 4.4.3 Session-BasedConsistencyModels ........................... 68 4.5 Offloading Complexity to the Cloud: Database- and Backend-as-a-Service................................................... 70 4.5.1 Database-as-a-Service.......................................... 70 4.5.2 Backend-as-a-Service........................................... 72 4.5.3 Multi-Tenancy .................................................. 73 4.6 Summary ................................................................ 75 References..................................................................... 75 5 CachinginResearchandIndustry......................................... 85 5.1 ReducingLatency:Replication,CachingandEdgeComputing...... 87 5.1.1 EagerandLazyGeo-Replication .............................. 87 5.1.2 Caching.......................................................... 88 5.1.3 EdgeComputing................................................ 88 5.1.4 Challenges....................................................... 89 5.2 Server-Side,Client-SideandWebCaching............................ 89 5.2.1 Server-SideCaching............................................ 89 5.2.2 Client-SideDatabaseCaching ................................. 90 5.2.3 CachinginObject-RelationalandObject-Document Mappers......................................................... 92 5.2.4 WebCaching.................................................... 93 5.3 CacheCoherence:Expiration-Basedvs.Invalidation-Based Caching.................................................................. 94 5.3.1 Expiration-BasedCaching...................................... 94 Contents ix 5.3.2 Leases ........................................................... 94 5.3.3 Piggybacking.................................................... 95 5.3.4 Time-to-Live(TTL) ............................................ 96 5.3.5 Invalidation-BasedCaching.................................... 96 5.3.6 BrowserCaching................................................ 99 5.3.7 WebPerformance............................................... 100 5.4 QueryCaching .......................................................... 100 5.4.1 Peer-to-PeerQueryCaching ................................... 101 5.4.2 Mediators........................................................ 101 5.4.3 QueryCachingProxiesandMiddlewares ..................... 101 5.4.4 SearchResultCaching.......................................... 102 5.4.5 SummaryDataStructuresforCaching ........................ 103 5.5 Eagervs.LazyGeoReplication........................................ 104 5.5.1 ReplicationandCaching ....................................... 104 5.5.2 EagerGeo-Replication ......................................... 105 5.5.3 LazyGeo-Replication .......................................... 108 5.6 Summary ................................................................ 113 References..................................................................... 114 6 TransactionalSemanticsforGloballyDistributedApplications........ 131 6.1 Latencyvs.DistributedTransactionProcessing....................... 131 6.1.1 DistributedTransactionArchitectures......................... 132 6.2 EntityGroupTransactions.............................................. 137 6.3 Multi-ShardTransactions............................................... 138 6.4 Client-CoordinatedTransactions....................................... 139 6.5 Middleware-CoordinatedTransactions ................................ 140 6.6 DeterministicTransactions ............................................. 141 6.7 Summary:Consistencyvs.LatencyinDistributedApplications ..... 142 References..................................................................... 143 7 PolyglotPersistenceinDataManagement ................................ 149 7.1 FunctionalandNon-functionalRequirements......................... 150 7.1.1 ImplementationofPolyglotPersistence....................... 152 7.2 Multi-TenancyandVirtualizationinCloud-BasedDeployments..... 154 7.2.1 DatabasePrivacyandEncryption.............................. 156 7.2.2 ServiceLevelAgreements...................................... 157 7.3 Auto-ScalingandElasticity............................................. 158 7.4 DatabaseBenchmarking................................................ 159 7.4.1 ConsistencyBenchmarking .................................... 159 7.5 RESTAPIs,Multi-ModelDatabasesandBackend-as-a-Service..... 160 7.5.1 Backend-as-a-Service........................................... 161 7.5.2 PolyglotPersistence ............................................ 163 7.6 Summary ................................................................ 164 References..................................................................... 164 x Contents 8 TheNoSQLToolbox:TheNoSQLLandscapeinaNutshell............ 175 8.1 Sharding................................................................. 176 8.1.1 RangePartitioning.............................................. 176 8.1.2 HashPartitioning ............................................... 177 8.1.3 Entity-GroupSharding ......................................... 177 8.2 Replication .............................................................. 178 8.3 StorageManagement.................................................... 179 8.4 QueryProcessing........................................................ 182 8.5 Summary:SystemStudies.............................................. 183 References..................................................................... 187 9 SummaryandFutureTrends............................................... 191 9.1 FromAbstractRequirementstoConcreteSystems.................... 192 9.2 FutureProspects......................................................... 193 About the Authors Felix Gessert is the CEO and co-founder of the Backend-as-a-Service company Baqend.1 During his PhD studies at the University of Hamburg, he developed the coretechnologybehindBaqend’swebperformanceservice.Heispassionateabout making the web faster by turning research results into real-world applications. He frequentlytalksatconferencesaboutexcitingtechnologytrendsindatamanagement andwebperformance.AsaJuniorFellowoftheGermanInformaticsSociety(GI), heisworkingonnewideastofacilitatetheresearchtransferofacademiccomputer scienceinnovationintopractice. Wolfram“Wolle”WingerathistheleadingdataengineeratBaqend1 whereheis responsible for data analytics and all things related to real-time query processing. During his PhD studies at the University of Hamburg, he conceived the scalable designbehindBaqend’sreal-timequeryengineandtherebyalsodevelopedastrong background in real-time databases and related technology such as scalable stream processing, NoSQL database systems, cloud computing, and Big Data analytics. Eager to connect with others and share his experiences, he regularly speaks at developerandresearchconferences. NorbertRitterisafullprofessorofcomputerscienceattheUniversityofHamburg, whereheheadstheDatabasesandInformationSystems(DBIS)group.Hereceived his PhD from the University of Kaiserslautern in 1997. His research interests includedistributedandfederateddatabasesystems,transactionprocessing,caching, clouddatamanagement,informationintegration,andautonomousdatabasesystems. He has been teaching NoSQL topics in various courses for several years. Seeing the many open challenges for NoSQL systems, he, Wolle, and Felix have been organizing the annual Scalable Cloud Data Management Workshop2 to promote researchinthisarea. 1Baqend:https://www.baqend.com/. 2ScalableCloudDataManagementWorkshop:https://scdm.cloud. xi

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.