
Pro Couchbase Server PDF

329 Pages·2014·9.921 MB·English

Preview Pro Couchbase Server

BOOKS FOR PROFESSIONALS BY PROFESSIONALS®

Pro Couchbase Server
Ostrovsky, Rodenski

Pro Couchbase Server is a hands-on guide for developers and administrators who want to take advantage of the power and scalability of Couchbase Server in their applications. This book takes you from the basics of NoSQL database design, through application development, to Couchbase Server administration. Never have document databases been so powerful and performant.

The NoSQL movement has fundamentally changed the database world in recent years. Influenced by the growing needs of web-scale applications, NoSQL databases such as Couchbase Server provide new approaches to scalability, reliability, and performance. With the power and flexibility of Couchbase Server, you can model your data however you want, and easily change the data model any time you want.

Pro Couchbase Server provides all that you need to take full advantage of Couchbase for production workloads. Learn to install and configure Couchbase, and to manage your installation day-by-day. Design good data models that can scale. Implement advanced query techniques such as ElasticSearch for access to full-text query support. Pro Couchbase Server shows what is possible and helps you take full advantage of Couchbase Server and all the performance and scalability that it offers.

• Helps you design and develop a document database using Couchbase Server
• Takes you through deploying and maintaining Couchbase Server
• Gives you the tools to scale out your application as needed

Shelve in: Databases/General
User level: Intermediate–Advanced
ISBN 978-1-4302-6613-6
Source code online
www.apress.com

For your convenience Apress has placed some of the front matter material after the index. Please use the Bookmarks and Contents at a Glance links to access them.
Contents at a Glance

About the Authors
About the Technical Reviewers
Acknowledgments
Introduction

Part I: Getting Started
Chapter 1: Getting Started with Couchbase Server
Chapter 2: Designing Document-Oriented Databases with Couchbase

Part II: Development
Chapter 3: The Couchbase Client Libraries
Chapter 4: CRUD and Key-Based Operations
Chapter 5: Working with Views
Chapter 6: The N1QL Query Language
Chapter 7: Advanced Couchbase Techniques
Chapter 8: ElasticSearch Integration

Part III: Couchbase at Scale
Chapter 9: Sizing and Deployment Considerations
Chapter 10: Basic Administration
Chapter 11: Monitoring and Best Practices
Chapter 12: Couchbase Server in the Cloud
Chapter 13: Cross-Datacenter Replication (XDCR)

Part IV: Mobile Development with Couchbase
Chapter 14: Couchbase Lite on Android
Chapter 15: Couchbase Lite on iOS
Chapter 16: Synchronizing Data with the Couchbase Sync Gateway

Index

Introduction

Ever since we decided to start writing this book, there has been one question which kept popping up whenever someone heard about it: why Couchbase Server? The immediate answer was obvious: because we absolutely love it. But putting aside our natural enthusiasm for every piece of new technology that comes out, Couchbase Server does have a few distinct characteristics that make it stand out from other NoSQL solutions.

The first distinguishing feature of Couchbase Server is that it’s blazingly fast. Couchbase Server keeps coming out at the top of every performance benchmark, some of which were commissioned by its competitors. This is mostly due to a solid caching layer it inherited from one of its ancestors: memcached.

Next is the fact that Couchbase Server scales exceedingly well. While the NoSQL movement promotes scalability and some products imply scalability in their name, only a few products have actually proven themselves at large scale.
Couchbase Server scales and does so in a very easy and streamlined manner. Moreover, Couchbase Server can also scale down if needed, making it a perfect match to run in an elastic cloud environment.

High availability is another important aspect of Couchbase Server architecture. There is no single point of failure in a Couchbase Server cluster, since the clients are aware of the topology of the entire cluster, including where every document is located. In addition, the documents are replicated across multiple nodes and can be accessed even if some nodes are unavailable.

For those reasons and many others, we found Couchbase Server to be a fascinating technology, one worth investing long months of study in to create a solid knowledge base that others can use. We hope this book will be helpful to all who wish to make the most of Couchbase Server.

Part I: Getting Started

Chapter 1: Getting Started with Couchbase Server

Relational databases have dominated the data landscape for over three decades. Emerging in the 1970s and early 1980s, relational databases offered a searchable mechanism for persisting complex data with minimal use of storage space. Conserving storage space was an important consideration during that era, due to the high price of storage devices. For example, in 1981, Morrow Designs offered a 26 MB hard drive for $3,599, a good deal compared to the 18 MB North Star hard drive that had appeared just six months earlier for $4,199.

Over the years, the relational model progressed, with the various implementations providing more and more functionality. One of the things that allowed relational databases to provide such a rich set of capabilities was the fact that they were optimized to run on a single machine. For many years, running on a single machine scaled nicely, as newer and faster hardware became available at frequent intervals. This method of scaling is known as vertical scaling.
And while most relational databases could also scale horizontally (that is, scale across multiple machines), doing so introduced additional complexity to the application and database design, and often resulted in inferior performance.

From SQL to NoSQL

This balance was finally disrupted with the appearance of what is known today as Internet scale, or web scale, applications. Companies such as Google and Facebook needed new approaches to database design in order to handle the massive amounts of data they had. Another aspect of the rapidly growing industry was the need to cope with constantly changing application requirements and data structure. Out of these new necessities for storing and accessing large amounts of frequently changing data, the NoSQL movement was born. These days, the term NoSQL is used to describe a wide range of mechanisms for storing data in ways other than with relational tables. Over the past few years, dozens of open-source projects, commercial products, and companies have begun offering NoSQL solutions.

The CAP Theorem

In 2000, Eric Brewer, a computer scientist from the University of California, Berkeley, proposed the following conjecture: It is impossible for a distributed computer system to satisfy the following three guarantees simultaneously (which together form the acronym CAP):

• Consistency: All components of the system see the same data.
• Availability: All requests to the system receive a response, whether success or failure.
• Partition tolerance: The system continues to function even if some components fail or some message traffic is lost.

A few years later, Brewer further clarified that consistency and availability in CAP should not be viewed as binary, but rather as a range, and that distributed systems can compromise with weaker forms of one or both in return for better performance and scalability. Seth Gilbert and Nancy Lynch of MIT offered a formal proof of Brewer’s conjecture.
While the formal proof spoke of a narrower use of CAP, and its status as a “theorem” is heavily disputed, the essence is still useful for understanding distributed system design. Traditional relational databases generally provide some form of the C and A parts of CAP and struggle with horizontal scaling because they are unable to provide resilience in the face of node failure. The various NoSQL products offer different combinations of CA/AP/CP. For example, some NoSQL systems provide a weaker form of consistency, known as eventual consistency, as a compromise for having high availability and partition tolerance. In such systems, data arriving at one node isn’t immediately available to others, so the application logic has to handle stale data appropriately. In fact, letting the application logic make up for weaker consistency or availability is a common approach in distributed systems that use NoSQL data stores. As you’ll see in this book, Couchbase Server provides cluster-level consistency and good partition tolerance through replication.

NoSQL and Couchbase Server

NoSQL databases have made a rapid entrance onto the main stage of the database world. In fact, it is the wide variety of available NoSQL products that makes it hard to find the right choice for your needs. When comparing NoSQL solutions, we often find ourselves forced to compare different products feature by feature in order to make a decision. In this dense and competitive marketplace, each product must offer unique capabilities to differentiate itself from its brethren.

Couchbase Server is a distributed NoSQL database that stands out due to its high performance, high availability, and scalability. Reliably providing these features in production is not trivial, but Couchbase achieves this in a simple and easy manner. Let’s take a look at how Couchbase deals with these challenges.
• Scaling: In Couchbase Server, data is distributed automatically over nodes in the cluster, allowing the database to share and scale out the load of performing lookups and disk IO horizontally. Couchbase achieves this by storing each data item in a vBucket, a logical partition (sometimes called a shard), which resides on a single node. The fact that Couchbase shards the data automatically simplifies the development process. Couchbase Server also provides a cross-datacenter replication (XDCR) feature, which allows Couchbase Server clusters to scale across multiple geographical locations.
• High availability: Couchbase can replicate each vBucket across multiple nodes to support failover. When a node in the cluster fails, the Couchbase Server cluster makes one of the replica vBuckets available automatically.
• High performance: Couchbase has an extensive integrated caching layer. Keys, metadata, and frequently accessed data are kept in memory in order to increase read/write throughput and reduce data access latency.

To understand how unique Couchbase Server is, we need to take a closer look at each of these features and how they’re implemented. We will do so later in this chapter, because first we need to understand Couchbase as a whole.

Couchbase Server, as we know it today, is the progeny of two products: Apache CouchDB and Membase. CouchOne Inc. was a company founded by Damien Katz, the creator of CouchDB. The company provided commercial support for the Apache CouchDB open-source database. In February 2011, CouchOne Inc. merged with Membase Inc., the company behind the open-source Membase distributed key-value store. Membase was created by a few of the core contributors of Memcached, the popular distributed cache project, and provided persistence and querying on top of the simple, high-performance key-value mechanism provided by Memcached.
The new company, called Couchbase Inc., released Couchbase Server, a product that was based on Membase’s scalable high-performance capabilities, to which they eventually added capabilities from CouchDB, including storage, indexing, and querying. The initial version of Couchbase Server included a caching layer, which traced its origins directly back to Membase, and a persistence layer, which owed a lot to Apache CouchDB. Membase and CouchDB represent two of the leading approaches in the NoSQL world today: key-value stores and document-oriented databases. Both approaches still exist in today’s Couchbase Server.

Couchbase as Key-Value Store vs. Document Database

Key-value stores are, in essence, managed hash tables. A key-value store uses keys to access values in a straightforward and relatively efficient way. Different key-value stores expose different functionality on top of the basic hash-table-based access and focus on different aspects of data manipulation and retrieval. As a key-value store, Couchbase is capable of storing multiple data types. These include simple data types such as strings, numbers, datetime, and booleans, as well as arbitrary binary data. For most of the simple data types, Couchbase offers a scalable, distributed data store that provides both key-based access and minimal operations on the values. For example, for numbers you can use atomic operations such as increment and decrement. Operations are covered in depth in Chapter 4.

Document databases differ from key-value stores in the way they represent the stored data. Key-value stores generally treat their data as opaque blobs and do not try to parse it, whereas document databases encapsulate stored data into “documents” that they can operate on. A document is simply an object that contains data in some specific format.
For example, a JSON document holds data encoded in the JSON format, while a PDF document holds data encoded in the Portable Document binary format.

Note: JavaScript Object Notation (JSON) is a widely used, lightweight, open data interchange format. It uses human-readable text to encode data objects as collections of name-value pairs. JSON is a very popular choice in the NoSQL world, both for exchanging and for storing data. You can read more about it at: www.json.org.

One of the main strengths of this approach is that documents don’t have to adhere to a rigid schema. Each document can have different properties and parts that can be changed on the fly without affecting the structure of other documents. Furthermore, document databases actually “understand” the content of the documents and typically offer functionality for acting on the stored data, such as changing parts of the document or indexing documents for faster retrieval. Couchbase Server can store data as JSON documents, which lets it index and query documents by specific fields.

Couchbase Server Architecture

A Couchbase Server cluster consists of between 1 and 1024 nodes, with each node running exactly one instance of the Couchbase Server software. The data is partitioned and distributed between the nodes in the cluster. This means that each node holds some of the data and is responsible for some of the storing and processing load. Distributing data this way is often referred to as sharding, with each partition referred to as a shard.

Each Couchbase Server node has two major components: the Cluster Manager and the Data Manager, as shown in Figure 1-1. Applications use the Client Software Development Kits (SDKs) to communicate with both of these components. The Couchbase Client SDKs are covered in depth in Chapter 3.
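As an aside on the document model described earlier in this section, the following minimal Python sketch illustrates what it means for documents to have no rigid schema and for a store to index them by a specific field. It does not use the Couchbase SDK; the `documents` dictionary, the keys, the field names, and the `build_index` helper are all hypothetical and exist only for illustration.

```python
import json
from collections import defaultdict

# A stand-in for a store that maps keys to raw values (hypothetical data).
# The two JSON documents have different fields; the third value is not JSON
# and can only be treated as an opaque blob.
documents = {
    "user::1001": json.dumps({"type": "user", "name": "Ada", "email": "ada@example.com"}),
    "user::1002": json.dumps({"type": "user", "name": "Grace", "languages": ["COBOL"]}),
    "blob::logo": "raw:0xDEADBEEF",
}


def build_index(store, field):
    """Return a mapping from each scalar value of `field` to the keys that hold it."""
    index = defaultdict(list)
    for key, raw in store.items():
        try:
            doc = json.loads(raw)
        except ValueError:
            continue  # not parseable as JSON: skip it, as a pure key-value store would
        if not isinstance(doc, dict) or field not in doc:
            continue  # documents without the field are simply left out of the index
        value = doc[field]
        if isinstance(value, (str, int, float, bool)):
            index[value].append(key)
    return dict(index)


print(build_index(documents, "name"))
print(build_index(documents, "email"))
```

Only documents that actually contain the requested field appear in the resulting index, and adding a new field to one document requires no change to the others.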
[Figure 1-1. Couchbase server architecture: a Couchbase node containing the Cluster Manager and the Data Manager]

• The Cluster Manager: The Cluster Manager is responsible for configuring nodes in the cluster, managing the rebalancing of data between nodes, handling replicated data after a failover, monitoring nodes, gathering statistics, and logging. The Cluster Manager maintains and updates the cluster map, which tells clients where to look for data. Lastly, it also exposes the administration API and the web management console. The Cluster Manager component is built with Erlang/OTP, which is particularly suited for creating concurrent, distributed systems.
• The Data Manager: The Data Manager, as the name implies, manages data storage and retrieval. It contains the memory cache layer, the disk persistence mechanism, and the query engine. Couchbase clients use the cluster map provided by the Cluster Manager to discover which node holds the required data and then communicate with the Data Manager on that node to perform database operations.

Data Storage

Couchbase manages data in buckets, which are logical groupings of related resources. You can think of buckets as being similar to databases in Microsoft SQL Server, or to schemas in Oracle. Typically, you would have separate buckets for separate applications.

Couchbase supports two kinds of buckets: Couchbase and memcached. Memcached buckets store data in memory as binary blobs of up to 1 MB in size. Data in memcached buckets is not persisted to disk or replicated across nodes for redundancy. Couchbase buckets, on the other hand, can store data as JSON documents, primitive data types, or binary blobs, each up to 20 MB in size. This data is cached in memory and persisted to disk and can be dynamically rebalanced between nodes in a cluster to distribute the load.
Furthermore, Couchbase buckets can be configured to maintain between one and three replica copies of the data, which provides redundancy in the event of node failure. Because each copy must reside on a different node, replication requires at least one node per replica, plus one for the active instance of data.

Documents in a bucket are further subdivided into virtual buckets (vBuckets) by their key. Each vBucket owns a subset of all the possible keys, and documents are mapped to vBuckets according to a hash of their key. Every vBucket, in turn, belongs to one of the nodes of the cluster. As shown in Figure 1-2, when a client needs to access a document, it first hashes the document key to find out which vBucket owns that key. The client then checks the cluster map to find which node hosts the relevant vBucket. Lastly, the client connects directly to the node that stores the document to perform the get operation.
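The lookup steps just described (hash the key, find the owning vBucket, consult the cluster map for the node) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual client implementation: the vBucket count of 1024, the use of CRC32 modulo the vBucket count, the round-robin cluster map, and the node names are all assumptions made for the example.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count for this sketch


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document key to a vBucket ID by hashing the key."""
    # CRC32 serves here only as a stable, well-distributed hash; the real
    # client libraries may derive the vBucket ID differently.
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def build_cluster_map(nodes: list[str], num_vbuckets: int = NUM_VBUCKETS) -> list[str]:
    """Assign vBuckets to nodes round-robin (a hypothetical stand-in for the
    cluster map maintained by the Cluster Manager)."""
    return [nodes[vb % len(nodes)] for vb in range(num_vbuckets)]


def node_for_key(key: str, cluster_map: list[str]) -> str:
    """Return the node that hosts the vBucket owning `key`."""
    return cluster_map[vbucket_for_key(key, len(cluster_map))]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical node names
cluster_map = build_cluster_map(nodes)

key = "user::1001"
print(key, "-> vBucket", vbucket_for_key(key), "-> node", node_for_key(key, cluster_map))
```

Because the key-to-vBucket mapping depends only on the key, moving a vBucket to another node changes the cluster map but not the hash, which is why clients can locate any document without asking a central coordinator.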
