ebook img

Automatic scaling of Cassandra clusters Master thesis Tor Andreas Baakind PDF

168 Pages·2013·2.16 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Automatic scaling of Cassandra clusters Master thesis Tor Andreas Baakind

UNIVERSITY OF OSLO Department of Informatics Automatic scaling of Cassandra clusters Master thesis Tor Andreas Baakind April 30, 2013 Automatic scaling of Cassandra clusters TorAndreasBaakind April30,2013 ii Abstract Thepurposeofthisthesisistocreateanautomaticscalingimplementation forCassandraclusters. Theautomaticscalershouldneverlowertheoverall performance of the cluster in a way that results in a bad user experience. It should also be able to successfully scale up and down nodes, and the cluster should continue as if nothing happened. Last but not least, it is desirable that the automatic scaler performs equally, or better than, the personwhoisinchargeofadministratingthedatabase. In this thesis we have developed an early version of an autoscaler that may run alongside a Cassandra instance. The implementation is split into two separate implementations: a master-, and an agent-implementation. Themasterwillbedeployedtothesameserverastheapplicationusingthe cluster,eventhoughthisisnotrequired. Theagentimplementationwillbe deployed to, and run alongside, all nodes that are a part of the cluster. The agent will monitor the node’s resource usage, and send messages backtothemasteriftheusageincreasesabove,ordecreasesbelowcertain thresholds. Weperformedasetoftestcasestoprovethattheimplementationworks asintended. Thetestcasesrecordedthenodesresource-usagetodetermine theimpactourimplementationmakestotheoverallperformance. iii iv Acknowledgments I would like to thank my supervisors, Ketil Velle and Dag Langmyhr, for their guidance and valuable feedback. This thesis would not have been completed without them. I would also like to thank Jørgen Sørensen for proofreadingthethesis. I also want to thank my girlfriend Anniken, for being so supportive and understanding throughout the thesis work. And finally, I would like tothankmyparentsforbelievinginme. v vi Contents I Introduction 1 1 Introduction 3 1.1 Problemdefinition . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 II Background 7 2 Motivation 9 3 NoSQL 13 3.1 Databasetransactions . . . . . . . . . . . . . . . . . . . . . . . 13 3.1.1 TheACIDsacrifice . . . . . . . . . . . . . . . . . . . . 14 3.2 Brewer’sCAPtheorem . . . . . . . . . . . . . . . . . . . . . . 16 3.3 NoSQLdatastores . . . . . . . . . . . . . . . . . . . . . . . . 18 3.3.1 Extensiblerecordstores . . . . . . . . . . . . . . . . . 18 3.3.2 Key-valuestores . . . . . . . . . . . . . . . . . . . . . 20 3.3.3 Documentstores . . . . . . . . . . . . . . . . . . . . . 21 3.3.4 Graphdatabases . . . . . . . . . . . . . . . . . . . . . 22 3.3.5 Otherknowndatamodelsthatarenotcovered . . . . 22 3.3.6 NoSQLadvantagesanddisadvantages . . . . . . . . 23 4 Cassandra 27 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.2 Datamodel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 4.3 Nodeandclusterconfiguration . . . . . . . . . . . . . . . . . 30 4.4 Thegossipprotocol . . . . . . . . . . . . . . . . . . . . . . . . 31 4.4.1 Hintedhandoffs . . . . . . . . . . . . . . . . . . . . . 33 4.5 Merkletree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.6 Stages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.6.1 Single-threadedstages . . . . . . . . . . . . . . . . . . 34 4.6.2 Multi-threadedstages . . . . . . . . . . . . . . . . . . 35 4.7 NodeTool . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 vii 5 Relatedwork 39 5.1 Netflix’sPriam. . . . . . . . . . . . . . . . . . . . . . . . . . . 39 5.1.1 AmazonWebServices . . . . . . . . . . . . . . . . . . 39 5.1.2 Netflix’smotivation . . . . . . . . . . . . . . . . . . . 39 5.1.3 WhywedidnotchoosePriam . . . . . . . . . . . . . 40 5.2 Hector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.2.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 41 5.2.2 WhywedidnotuseHector . . . . . . . . . . . . . . . 42 III Theproject 43 6 Introduction 45 6.1 NamingtheimplementationHecuba . . . . . . . . . . . . . . 45 7 Goalsandmethodology 47 7.1 Goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7.2 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7.2.1 Kanban . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 7.2.2 Storypoints . . . . . . . . . . . . . . . . . . . . . . . . 49 7.2.3 Velocitytrack . . . . . . . . . . . . . . . . . . . . . . . 49 7.3 Sourcecontrol . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 7.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 8 Failedattempts 53 8.1 IncludeHecubaintoCassandra’ssourcecode . . . . . . . . . 53 8.2 ImplementedasanextensiontoexistingJava-projects . . . . 54 8.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 9 Hecubadesign 57 9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 9.2 Loadbalanceissues . . . . . . . . . . . . . . . . . . . . . . . . 57 9.2.1 Tokenrange . . . . . . . . . . . . . . . . . . . . . . . . 59 9.2.2 Token-generation . . . . . . . . . . . . . . . . . . . . . 60 9.2.3 LoadbalancedifferencesbetweenHecubaandPriam 60 9.3 Communication . . . . . . . . . . . . . . . . . . . . . . . . . . 60 9.4 Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 9.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 10 Hecubaimplementation 65 10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 10.2 Codeseparation . . . . . . . . . . . . . . . . . . . . . . . . . . 65 10.2.1 Autoscale . . . . . . . . . . . . . . . . . . . . . . . . . 66 10.2.2 Autoscale-common . . . . . . . . . . . . . . . . . . . . 69 10.2.3 Autoscale-agent . . . . . . . . . . . . . . . . . . . . . . 70 10.3 Toolsandframeworks . . . . . . . . . . . . . . . . . . . . . . 73 10.3.1 Maven . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 10.3.2 SigarAPI . . . . . . . . . . . . . . . . . . . . . . . . . . 73 10.4 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 viii

Description:
hard to determine what generated the memory increase, but it may have been a result of Cassandra executing background processes. Cassandra periodically compact data and keeps as much data as possible in main memory for faster access. Even though the cluster was “empty”, there were still a few
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.