Table of Contents

Mastering Apache Cassandra
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers and more
Why Subscribe?
Free Access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions

1. Quick Start
  Introduction to Cassandra
  Distributed database
  High availability
  Replication
  Multiple data centers
  A brief introduction to a data model
  Installing Cassandra locally
  CRUD with cassandra-cli
  Cassandra in action
  Modeling data
  Writing code
  Setting up
  Application
  Summary

2. Cassandra Architecture
  Problems in the RDBMS world
  Enter NoSQL
  The CAP theorem
  Consistency
  Availability
  Partition-tolerance
  Significance of the CAP theorem
  Cassandra
  Cassandra architecture
  Ring representation
  How Cassandra works
  Write in action
  Read in action
  Components of Cassandra
  Messaging service
  Gossip
  Failure detection
  Partitioner
  Replication
  Log Structured Merge tree
  CommitLog
  MemTable
  SSTable
  Bloom filter
  Index files
  Datafiles
  Compaction
  Tombstones
  Hinted handoff
  Read repair and Anti-entropy
  Merkle tree
  Summary

3. Design Patterns
  The Cassandra data model
  The counter column
  The expiring column
  The super column
  The column family
  Keyspaces
  Data types – comparators and validators
  Writing a custom comparator
  The primary index
  The wide-row index
  Simple groups
  Sorting for free, free as in speech
  An inverse index with a super column family
  An inverse index with composite keys
  The secondary index
  Patterns and antipatterns
  Avoid storing an entity in a single column (wherever possible)
  Atomic update
  Managing time series data
  Wide-row time series
  High throughput rows and hotspots
  Advanced time series
  Avoid super columns
  Transaction woes
  Use expiring columns
  batch_mutate
  Summary

4. Deploying a Cluster
  Evaluating requirements
  Hard disk capacity
  RAM
  CPU
  Nodes
  Network
  System configurations
  Optimizing user limits
  Swapping memory
  Clock synchronization
  Disk readahead
  The required software
  Installing Oracle Java 6
  RHEL and CentOS systems
  Debian and Ubuntu systems
  Installing the Java Native Access (JNA) library
  Installing Cassandra
  Installing from a tarball
  Installing from ASFRepository for Debian/Ubuntu
  Anatomy of the installation
  Cassandra binaries
  Configuration files
  Setting up Cassandra's data directory and commit log directory
  Configuring a Cassandra cluster
  The cluster name
  The seed node
  Listen, broadcast, and RPC addresses
  Initial token
  Partitioners
  The random partitioner
  The byte-ordered partitioner
  The Murmur3 partitioner
  Snitches
  SimpleSnitch
  PropertyFileSnitch
  GossipingPropertyFileSnitch
  RackInferringSnitch
  EC2Snitch
  EC2MultiRegionSnitch
  Replica placement strategies
  SimpleStrategy
  NetworkTopologyStrategy
  NetworkTopologyStrategy and multiple data center setups
  Launching a cluster with a script
  Creating a keyspace
  Authorization and authentication
  Summary

5. Performance Tuning
  Stress testing
  Performance tuning
  Write performance
  Read performance
  Choosing the right compaction strategy
  Size tiered compaction strategy
  Leveled compaction
  Row cache
  Key cache
  Cache settings
  Enabling compression
  Tuning the bloom filter
  More tuning via cassandra.yaml
  index_interval
  commitlog_sync
  column_index_size_in_kb
  commitlog_total_space_in_mb
  Tweaking JVM
  Java heap
  Garbage collection
  Other JVM options
  Scaling horizontally and vertically
  Network
  Summary

6. Managing a Cluster – Scaling, Node Repair, and Backup
  Scaling
  Adding nodes to a cluster
  Removing nodes from a cluster
  Removing a live node
  Removing a dead node
  Replacing a node
  Backup and restoration
  Using Cassandra bulk loader to restore the data
  Load balancing
  Priam – managing large clusters on AWS
  Summary

7. Monitoring
  Cassandra JMX interface
  Accessing MBeans using JConsole
  Cassandra nodetool
  Monitoring with nodetool
  cfstats
  netstats
  ring and describering
  tpstats
  compactionstats
  info
  Administrating with nodetool
  drain
  decommission
  move
  removetoken
  repair
  upgradesstable
  snapshot
  DataStax OpsCenter
  OpsCenter Features
  Installing OpsCenter and an agent
  Prerequisites
  Running a Cassandra cluster
  Installing OpsCenter from Tarball
  Setting up an OpsCenter agent
  Monitoring and administrating with OpsCenter
  Other features of OpsCenter
  Nagios – monitoring and notification
  Installing Nagios
  Prerequisites
  Preparation
  Installation
  Installing Nagios
  Configuring Apache httpd
  Installing Nagios plugins
  Setting up Nagios as a service
  Nagios plugins
  Nagios plugins for Cassandra
  Executing remote plugins via an NRPE plugin
  Installing NRPE on host machines
  Installing NRPE plugin on a Nagios machine
  Setting things up to monitor
  Monitoring and notification using Nagios
  Cassandra log
  Enabling Java Options for GC Logging
  Troubleshooting
  High CPU usage
  High memory usage
  Hotspots
  OpenJDK may behave erratically
  Disk performance
  Slow snapshot
  Getting help from the mailing list
  Summary

8. Integration
  Using Hadoop
  Hadoop and Cassandra
  Introduction to Hadoop
  HDFS – Hadoop Distributed File System
  Data management
  NameNode
  DataNodes
  Hadoop MapReduce
  JobTracker
  TaskTracker
  Reliability of data and process in Hadoop
  Setting up local Hadoop
  Testing the installation
  Cassandra with Hadoop MapReduce
  ColumnFamilyInputFormat
  ColumnFamilyOutputFormat
  ConfigHelper
  Wide-row support
  Bulk loading
  Secondary index support
  Cassandra and Hadoop in action
  Executing, debugging, monitoring, and looking at results
  Hadoop in Cassandra cluster
  Cassandra filesystem
  Integration with Pig
  Installing Pig
  Integrating Pig and Cassandra
  Cassandra and Solr
  Development note on Solandra
  DataStax Enterprise – the next level
  Solr integration
  Summary

9. Introduction to CQL 3 and Cassandra 1.2
  CQL – the Cassandra Query Language
  CQL 3 for Thrift refugees
  Wide rows
  Composite columns
  CQL 3 basics
  The CREATE KEYSPACE query
  The CREATE TABLE query
  Compact storage
  Creating a secondary index
  The INSERT query
  The SELECT query
  select expression
  The WHERE clause
  The ORDER BY clause
  The LIMIT clause
  The USING CONSISTENCY clause
  The UPDATE query
  The DELETE query
  The TRUNCATE query
  The ALTER TABLE query
  Adding a new column
  Dropping an existing column
  Modifying the data type of an existing column
  Altering table options
  The ALTER KEYSPACE query
  BATCH querying
  The DROP INDEX query
  The DROP TABLE query
  The DROP KEYSPACE query
  The USE statement
  What's new in Cassandra 1.2?
  Virtual Nodes
  Off-heap Bloom filters
  JBOD improvements
  Parallel leveled compaction
  Murmur3 partitioner
  Atomic batches
  Query profiling
  Collections support
  Sets
  Lists
  Maps
  Support for programming languages
  Summary

Index
Description: