Table of Contents

Table of Contents

Foreword

Preface
General Information; HBase Version; Building the Examples; Hush: The HBase URL Shortener; Running Hush; Conventions Used in This Book; Using Code Examples; Safari® Books Online; How to Contact Us; Acknowledgments

Chapter 1. Introduction
The Dawn of Big Data; The Problem with Relational Database Systems; Nonrelational Database Systems, Not-Only SQL or NoSQL?; Dimensions; Scalability; Database (De-)Normalization; Building Blocks; Backdrop; Tables, Rows, Columns, and Cells; Auto-Sharding; Storage API; Implementation; Summary; HBase: The Hadoop Database; History; Nomenclature; Summary

Chapter 2. Installation
Quick-Start Guide; Requirements; Hardware; Servers; Networking; Software; Operating system; Filesystem; Java; Hadoop; SSH; Domain Name Service; Synchronized time; File handles and process limits; Datanode handlers; Swappiness; Windows; Filesystems for HBase; Local; HDFS; S3; Other Filesystems; Installation Choices; Apache Binary Release; Building from Source; Run Modes; Standalone Mode; Distributed Mode; Pseudodistributed mode; Fully distributed mode; Specifying region servers; ZooKeeper setup; Using the existing ZooKeeper ensemble; Configuration; hbase-site.xml and hbase-default.xml; hbase-env.sh; regionserver; log4j.properties; Example Configuration; hbase-site.xml; regionservers; hbase-env.sh; Client Configuration; Deployment; Script-Based; Apache Whirr; Puppet and Chef; Operating a Cluster; Running and Confirming Your Installation; Web-based UI Introduction; Shell Introduction; Stopping the Cluster

Chapter 3. Client API: The Basics
General Notes; CRUD Operations; Put Method; Single Puts; The KeyValue class; Client-side write buffer; List of Puts; Atomic compare-and-set; Get Method; Single Gets; The Result class; List of Gets; Related retrieval methods; Delete Method; Single Deletes; List of Deletes; Atomic compare-and-delete; Batch Operations; Row Locks; Scans; Introduction; The ResultScanner Class; Caching Versus Batching; Miscellaneous Features; The HTable Utility Methods; The Bytes Class

Chapter 4. Client API: Advanced Features
Filters; Introduction to Filters; The filter hierarchy; Comparison operators; Comparators; Comparison Filters; RowFilter; FamilyFilter; QualifierFilter; ValueFilter; DependentColumnFilter; Dedicated Filters; SingleColumnValueFilter; SingleColumnValueExcludeFilter; PrefixFilter; PageFilter; KeyOnlyFilter; FirstKeyOnlyFilter; InclusiveStopFilter; TimestampsFilter; ColumnCountGetFilter; ColumnPaginationFilter; ColumnPrefixFilter; RandomRowFilter; Decorating Filters; SkipFilter; WhileMatchFilter; FilterList; Custom Filters; Filters Summary; Counters; Introduction to Counters; Single Counters; Multiple Counters; Coprocessors; Introduction to Coprocessors; The Coprocessor Class; Coprocessor Loading; Loading from the configuration; Loading from the table descriptor; The RegionObserver Class; Handling region life-cycle events; State: pending open; Handling client API events; State: open; State: pending close; The RegionCoprocessorEnvironment class; The ObserverContext class; The BaseRegionObserver class; The MasterObserver Class; The MasterCoprocessorEnvironment class; The BaseMasterObserver class; Endpoints; The CoprocessorProtocol interface; The BaseEndpointCoprocessor class; HTablePool; Connection Handling

Chapter 5. Client API: Administrative Features
Schema Definition; Tables; Table Properties; Column Families; HBaseAdmin; Basic Operations; Table Operations; Schema Operations; Cluster Operations; Cluster Status Information

Chapter 6. Available Clients
Introduction to REST, Thrift, and Avro; Interactive Clients; Native Java; REST; Operation; Supported formats; Plain (text/plain); XML (text/xml); JSON (application/json); Protocol Buffer (application/x-protobuf); Raw binary (application/octet-stream); REST Java client; Thrift; Installation; Operation; Example: PHP; Avro; Installation; Operation; Other Clients; Batch Clients; MapReduce; Native Java; Clojure; Hive; Pig; Cascading; Shell; Basics; Commands; General; Data definition; Data manipulation; Tools; Replication; Scripting; Web-based UI; Master UI; Main page; User Table page; ZooKeeper page; Region Server UI; Main page; Shared Pages

Chapter 7. MapReduce Integration
Framework; MapReduce Introduction; Classes; InputFormat; Mapper; Reducer; OutputFormat; Supporting Classes; MapReduce Locality; Table Splits; MapReduce over HBase; Preparation; Static Provisioning; Dynamic Provisioning; Data Sink; Data Source; Data Source and Sink; Custom Processing

Chapter 8. Architecture
Seek Versus Transfer; B+ Trees; Log-Structured Merge-Trees; Storage; Overview; Write Path; Files; Root-level files; Table-level files; Region-level files; Region splits; Compactions; HFile Format; KeyValue Format; Write-Ahead Log; Overview; HLog Class; HLogKey Class; WALEdit Class; LogSyncer Class; LogRoller Class; Replay; Single log; Log splitting; Edits recovery; Durability; Read Path; Region Lookups; The Region Life Cycle; ZooKeeper; Replication; Life of a Log Edit; Normal processing; Non-Responding slave clusters; Internals; Choosing region servers to replicate to; Keeping track of logs; Reading, filtering, and sending edits; Cleaning logs; Region server failover

Chapter 9. Advanced Usage
Key Design; Concepts; Tall-Narrow Versus Flat-Wide Tables; Partial Key Scans; Pagination; Time Series Data; Time-Ordered Relations; Advanced Schemas; Secondary Indexes; Search Integration; Transactions; Bloom Filters; Versioning; Implicit Versioning; Custom Versioning

Chapter 10. Cluster Monitoring
Introduction; The Metrics Framework; Contexts, Records, and Metrics; Master Metrics; Region Server Metrics; RPC Metrics; JVM Metrics; Info Metrics; Ganglia; Installation; Ganglia-related steps; Ganglia monitoring daemon; Ganglia meta daemon; HBase-related steps; Ganglia web frontend; Usage; JMX; JConsole; JMX Remote API; Nagios

Chapter 11. Performance Tuning
Garbage Collection Tuning; Memstore-Local Allocation Buffer; Compression; Available Codecs; Snappy; LZO; GZIP; Verifying Installation; Compression test tool; Startup check; Enabling Compression; Optimizing Splits and Compactions; Managed Splitting; Region Hotspotting; Presplitting Regions; Load Balancing; Merging Regions; Client API: Best Practices; Configuration; Load Tests; Performance Evaluation; YCSB

Chapter 12. Cluster Administration
Operational Tasks; Node Decommissioning; Rolling Restarts; Adding Servers; Pseudodistributed mode; Adding a local backup master; Adding a local region server; Fully distributed cluster; Adding a backup master; Data Tasks; Import and Export Tools; CopyTable Tool; Bulk Import; Bulk load procedure; Using the importtsv tool; Using the completebulkload Tool; Advanced usage; Replication; Additional Tasks; Coexisting Clusters; Required Ports; Changing Logging Levels; Troubleshooting; HBase Fsck; Analyzing the Logs; Common Issues; Basic setup checklist; File handles; DataNode connections; Compression; Stability issues; Garbage collection/memory tuning; ZooKeeper problems; “Could not obtain block” errors

Appendix A. HBase Configuration Properties

Appendix B. Road Map
HBase 0.92.0; HBase 0.94.0

Appendix C. Upgrade from Previous Releases
Upgrading to HBase 0.90.x; From 0.20.x or 0.89.x; Within 0.90.x; Upgrading to HBase 0.92.0

Appendix D. Distributions
Cloudera’s Distribution Including Apache Hadoop

Appendix E. Hush SQL Schema

Appendix F. HBase Versus Bigtable

Index

Description:

If you're looking for a scalable storage solution to accommodate a virtually endless amount of data, this book shows you how Apache HBase can fulfill your needs. As the open source implementation of Google's Bigtable architecture, HBase scales to billions of rows and millions of columns, while ensuring that write and read performance remain constant. Many IT executives are asking pointed questions about HBase. This book provides meaningful answers, whether you're evaluating this non-relational database or planning to put it into practice right away.

- Discover how tight integration with Hadoop makes scalability with HBase easier
- Distribute large datasets across an inexpensive cluster of commodity servers
- Access HBase with native Java clients, or with gateway servers providing REST, Avro, or Thrift APIs
- Get details on HBase's architecture, including the storage format, write-ahead log, background processes, and more
- Integrate HBase with Hadoop's MapReduce framework for massively parallelized data processing jobs
- Learn how to tune clusters, design schemas, copy tables, import bulk data, decommission nodes, and many other tasks