High Performance in-memory computing with Apache Ignite
Building low latency, near real time application
Shamim Ahmed Bhuiyan, Michael Zheludkov and Timur Isachenko

This book is for sale at http://leanpub.com/ignite
This version was published on 2017-05-09

This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do.

© 2016-2017 Shamim Ahmed Bhuiyan

In memory of my Father. - Shamim Ahmed Bhuiyan

Contents

Preface
    What this book covers
    Code Samples
    Support
About the authors
Acknowledgments
Introduction
    What is Apache Ignite?
    Modern application architecture with Apache Ignite
    Who uses Apache Ignite?
    Why Ignite instead of others?
    Our Hope
Chapter one: Installation and the first Ignite application
    Pre-requisites
    Installation
    Run multiple instances of Apache Ignite in a single host
    Configure a multi-node cluster in different host
    Rest client to manipulate with the Apache Ignite
    Java client
    SQL client
    Conclusion
    What's Next
Chapter two: Architecture overview
    Functional overview
    Cluster Topology
        Client and Server
        Embedded with the application
        Server in separate JVM (real cluster topology)
        Client and Server in separate JVM on single host
    Caching Topology
        Partitioned caching topology
        Replicated caching topology
        Local mode
    Caching strategy
        Cache-aside
        Read-through and Write-through
        Write behind
    Data model
    CAP theorem and where does Ignite stand in?
    Clustering
        Cluster group
        Data collocation
        Compute collocation with Data
        Zero SPOF
    How SQL queries works in Ignite
    Multi-datacenter replication
    Asynchronous support
    Resilience
    Security
    Key API
    Conclusion
    What's next
Chapter three: In-memory caching
    Apache Ignite as a 2nd level cache
        MyBatis 2nd level cache
        Hibernate 2nd level cache
    Java method caching
    Web session clustering with Apache Ignite
    Apache Ignite as a big memory, off-heap memory
    Conclusion
    What's next
Chapter four: Persistence
    Persistence Ignite's cache
        Persistence in RDBMS (PostgreSQL)
        Persistence in MongoDB
    Cache queries
        Scan queries
        Text queries
    SQL queries
        Projection and indexing with annotations
        Query API
        Collocated distributed Joins
        Non-collocated distributed joins
        Performance tuning SQL queries
    Apache Ignite with JPA
    Expiration & Eviction of cache entries in Ignite
        Expiration
        Eviction
    Transaction
        Ignite transactions
        Transaction commit protocols
        Optimistic Transactions
        Pessimistic Transactions
        Performance impact on transaction
    Conclusion
    What's next
Chapter five: Accelerating Big Data computing
    Hadoop accelerator
        In-memory Map/Reduce
        Using Apache Pig for data analysis
        Near real-time data analysis with Hive
        Replace HDFS by Ignite In-memory File System (IGFS)
        Hadoop file system cache
    Ignite for Apache Spark
        Apache Spark - an introduction
        IgniteContext
        IgniteRDD
        Preparing the sandbox
        Spark-shell to run Spark jobs
        Spark application example in Scala to share states
    Conclusion
    What's next
Chapter six: Streaming and complex event processing
    Introducing data streamer
    IgniteDataStreamer
        StreamReceiver
        StreamVisitor
    Camel data streamer
        Direct Ingestion
        Mediated Ingestion
    Flume streamer
    Storm data streamer
    Conclusion
    What's next
Chapter seven: Distributed computing
    Compute grid
        Distributed Closures
        MapReduce and Fork-join
        Per-Node share state
        Distributed task session
        Fault tolerance and checkpointing
        Collocation of computation and data
        Job scheduling
    Service Grid
        Developing services
        Cluster singleton
        Service management and configuration
    Developing microservices in Ignite
    Conclusion

Preface

My first acquaintance with high-load systems was at the beginning of 2007, and I started working on a real-world project in 2009. From that moment, I spent most of my office time with Cassandra, Hadoop, and numerous CEP tools. Our first Hadoop project (2011-2012), with a cluster of 54 nodes, often disappointed me with its long startup time. I was never satisfied with the performance of our applications and was always looking for something new to boost the performance of our information systems. During this time, I tried Hazelcast, Ehcache, and Oracle Coherence as in-memory caches to improve the performance of the applications.
I was usually disappointed by the complexity of using these libraries or by their functional limitations.

When I first encountered Apache Ignite, I was amazed! It was the platform I had long been waiting for: a simple Spring-based framework with a lot of awesome features such as database caching, Big Data acceleration, streaming, and compute/service grids.

In 2015, I participated in the Russian HighLoad++ conference¹ with my presentation and started blogging on DZone/JavaCodeGeeks and on my personal blog² about developing high-load systems. The posts soon became popular, and I received a lot of feedback from readers. Through that feedback, I clarified the idea behind the book. The goal of the book is to provide a guide for those who really need to implement an in-memory platform in their projects. At the same time, the idea behind the book is not to write a manual. Although the Apache Ignite platform is very big and growing day by day, we concentrate only on the features of the platform that (from our point of view) can really help to improve the performance of applications.

We hope that High-performance in-memory computing with Apache Ignite will be the go-to guide for architects and developers, both new and at an intermediate level, to get up and running with as little friction as possible.

Shamim Ahmed

What this book covers

Introduction gives an overview of the trends that have made in-memory computing such an important technology today. By the end of this chapter, you will have a clear idea of what Apache Ignite is and how you can design applications with Apache Ignite to get maximum performance from them.

Chapter one - Installation and the first Ignite application walks you through the initial setup of an Ignite grid and the running of some sample applications.
At the end of the chapter, you will implement your first simple Ignite application to read and write entries from the cache. You will also learn how to install and configure an SQL IDE to run SQL queries against Ignite caches.

¹ http://www.highload.ru/2015/abstracts/1875.html
² http://frommyworkshop.blogspot.ru

Chapter two - Architecture overview covers the functional and architectural overview of the Apache Ignite data fabric. Here you will learn the concepts and terminology of Apache Ignite. This chapter introduces the main features of Apache Ignite such as cluster topology, caching topology, caching strategies, transactions, the Ignite data model, data collocation, and how SQL queries work in Apache Ignite. You will also become familiar with other concepts such as multi-datacenter replication, Ignite asynchronous support, and resilience.

Chapter three - In-memory caching presents some of the popular Ignite data grid features, such as 2nd level cache, Java method caching, web session clustering, and off-heap memory. This chapter covers developments and techniques to improve the performance of your existing web applications without changing any code.

Chapter four - Persistence guides you through the implementation of transactions and persistence of the Apache Ignite cache. This chapter explores in depth the SQL features and transactions of Apache Ignite.

Chapter five - Accelerating Big Data computing focuses on more advanced features and extensions to the Ignite platform. In this chapter, we discuss the main problems of the Hadoop ecosystem and how Ignite can help to improve the performance of existing Hadoop jobs. We detail the three main features of the Ignite Hadoop accelerator: in-memory Map/Reduce, IGFS, and Hadoop file system cache. We also provide examples of using Apache Pig and Hive to run Map/Reduce jobs on top of the Ignite in-memory Map/Reduce. At the end of the chapter, we show how to easily share state in memory across different Spark applications.
Chapter six - Streaming and complex event processing takes the next step and goes further, using Apache Ignite to solve complex real-time event processing problems. This chapter covers how Ignite can easily be used with other Big Data technologies such as Flume, Storm, and Camel to solve various business problems. We guide you through a few complete examples of developing real-time data processing on Apache Ignite.

Chapter seven - Distributed computing covers how Ignite can help you easily develop microservice-like applications that run in parallel to gain high performance, low latency, and linear scalability. You will learn about Ignite MapReduce & ForkJoin, distributed closure execution, continuous mapping, etc. for data processing across multiple nodes in the cluster.

Code Samples

All code samples, scripts, and more in-depth examples can be found on GitHub at GitHub repo³

³ https://github.com/srecon/ignite-book-code-samples