Cassandra High Availability: Harness the power of Apache Cassandra to build scalable, fault-tolerant, and readily available applications

248 Pages·2014·3.82 MB·English

Preview Cassandra High Availability: Harness the power of Apache Cassandra to build scalable, fault-tolerant, and readily available applications

Cassandra High Availability

Table of Contents

Front matter
Cassandra High Availability; Credits; About the Author; About the Reviewers; www.PacktPub.com; Support files, eBooks, discount offers, and more; Why subscribe?; Free access for Packt account holders; Preface; What this book covers; What you need for this book; Who this book is for; Conventions; Reader feedback; Customer support; Errata; Piracy; Questions

1. Cassandra’s Approach to High Availability
ACID; The monolithic architecture; The master-slave architecture; Sharding; Master failover; Cassandra’s solution; Cassandra’s architecture; Distributed hash table; Replication; Replication across data centers; Tunable consistency; The CAP theorem; Summary

2. Data Distribution
Hash table fundamentals; Distributing hash tables; Consistent hashing; The mechanics of consistent hashing; Token assignment; Manually assigned tokens; vnodes; How vnodes improve availability; Adding and removing nodes; Node rebuilding; Heterogeneous nodes; Partitioners; Hotspots; Effects of scaling out using ByteOrderedPartitioner; A time-series example; Summary

3. Replication
The replication factor; Replication strategies; SimpleStrategy; NetworkTopologyStrategy; Snitches; Maintaining the replication factor when a node fails; Consistency conflicts; Consistency levels; Repairing data; Balancing the replication factor with consistency; Summary

4. Data Centers
Use cases for multiple data centers; Live backup; Failover; Load balancing; Geographic distribution; Online analysis; Analysis using Hadoop; Analysis using Spark; Data center setup; RackInferringSnitch; PropertyFileSnitch; GossipingPropertyFileSnitch; Cloud snitches; Replication across data centers; Setting the replication factor; Consistency in a multiple data center environment; The anatomy of a replicated write; Achieving stronger consistency between data centers; Summary

5. Scaling Out
Choosing the right hardware configuration; Scaling out versus scaling up; Growing your cluster; Adding nodes without vnodes; Adding nodes with vnodes; How to scale out; Adding a data center; How to scale up; Upgrading in place; Scaling up using data center replication; Removing nodes; Removing nodes within a data center; Decommissioning a data center; Other data migration scenarios; Snitch changes; Summary

6. High Availability Features in the Native Java Client
Thrift versus the native protocol; Setting up the environment; Connecting to the cluster; Executing statements; Prepared statements; Batched statements; Caution with batches; Handling asynchronous requests; Running queries in parallel; Load balancing; Failing over to a remote data center; Downgrading the consistency level; Defining your own retry policy; Token awareness; Tying it all together; Falling back to QUORUM; Summary

7. Modeling for High Availability
How Cassandra stores data; Implications of a log-structured storage; Understanding compaction; Size-tiered compaction; Leveled compaction; Date-tiered compaction; CQL under the hood; Single primary key; Compound keys; Partition keys; Clustering columns; Composite partition keys; The importance of the storage model; Understanding queries; Query by key; Range queries; Denormalizing with collections; How collections are stored; Sets; Lists; Maps; Working with time-series data; Designing for immutability; Modeling sensor data; Queries; Time-based ordering; Using a sentinel value; Satisfying our queries; When time is all that matters; Working with geospatial data; Summary

8. Antipatterns
Multikey queries; Secondary indices; Secondary indices under the hood; Distributed joins; Deleting data; Garbage collection; Resurrecting the dead; Unexpected deletes; The problem with tombstones; Expiring columns; TTL antipatterns; When null does not mean empty; Cassandra is not a queue; Unbounded row growth; Summary

9. Failing Gracefully
Knowledge is power; Monitoring via Java Management Extensions; Using OpsCenter; Choosing a management toolset; Logging; Cassandra logs; Garbage collector logs; Monitoring node metrics; Thread pools; Column family statistics; Finding latency outliers; Communication metrics; When a node goes down; Marking a downed node; Handling a downed node; Handling slow nodes; Backing up data; Taking a snapshot; Incremental backups; Restoring from a snapshot; Summary

Index

Cassandra High Availability

Copyright © 2014 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: December 2014
Production reference: 1221214
Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.
ISBN 978-1-78398-912-6
www.packtpub.com

Credits

Author: Robbie Strickland
Reviewers: Richard Low, Jimmy Mårdell, Rob Murphy, Russell Spitzer
Commissioning Editor: Kunal Parikh
Acquisition Editors: Richard Harvey, Owen Roberts
Content Development Editors: Samantha Gonsalves, Azharuddin Sheikh
Technical Editor: Ankita Thakur
Copy Editors: Pranjali Chury, Merilyn Pereira
Project Coordinator: Sanchita Mandal
Proofreaders: Simran Bhogal, Maria Gould, Ameesha Green, Paul Hindle
Indexer: Rekha Nair
Graphics

Description:
Apache Cassandra is a massively scalable, peer-to-peer database designed for 100 percent uptime, with deployments in the tens of thousands of nodes supporting petabytes of data. This book offers readers a practical insight into building highly available, real-world applications using Apache Cassandr