Table of Contents Page: vii
Foreword Page: xv
Preface Page: xix
General Information Page: xx
HBase Version Page: xx
Building the Examples Page: xxi
Hush: The HBase URL Shortener Page: xxiii
Running Hush Page: xxv
Conventions Used in This Book Page: xxv
Using Code Examples Page: xxvi
Safari® Books Online Page: xxvi
How to Contact Us Page: xxvii
Acknowledgments Page: xxvii

Chapter 1. Introduction Page: 1
The Dawn of Big Data Page: 1
The Problem with Relational Database Systems Page: 5
Nonrelational Database Systems, Not-Only SQL or NoSQL? Page: 8
Dimensions Page: 10
Scalability Page: 12
Database (De-)Normalization Page: 13
Building Blocks Page: 16
Backdrop Page: 16
Tables, Rows, Columns, and Cells Page: 17
Auto-Sharding Page: 21
Storage API Page: 22
Implementation Page: 23
Summary Page: 27
HBase: The Hadoop Database Page: 27
History Page: 27
Nomenclature Page: 29
Summary Page: 29

Chapter 2. Installation Page: 31
Quick-Start Guide Page: 31
Requirements Page: 34
Hardware Page: 34
Servers Page: 35
Networking Page: 39
Software Page: 40
Operating system Page: 40
Filesystem Page: 43
Java Page: 46
Hadoop Page: 46
SSH Page: 48
Domain Name Service Page: 48
Synchronized time Page: 49
File handles and process limits Page: 49
Datanode handlers Page: 51
Swappiness Page: 51
Windows Page: 52
Filesystems for HBase Page: 52
Local Page: 54
HDFS Page: 54
S3 Page: 54
Other Filesystems Page: 55
Installation Choices Page: 55
Apache Binary Release Page: 55
Building from Source Page: 58
Run Modes Page: 58
Standalone Mode Page: 59
Distributed Mode Page: 59
Pseudodistributed mode Page: 59
Fully distributed mode Page: 60
Specifying region servers Page: 60
ZooKeeper setup Page: 60
Using the existing ZooKeeper ensemble Page: 62
Configuration Page: 63
hbase-site.xml and hbase-default.xml Page: 64
hbase-env.sh Page: 65
regionserver Page: 65
log4j.properties Page: 65
Example Configuration Page: 65
hbase-site.xml Page: 66
regionservers Page: 66
hbase-env.sh Page: 66
Client Configuration Page: 67
Deployment Page: 68
Script-Based Page: 68
Apache Whirr Page: 69
Puppet and Chef Page: 70
Operating a Cluster Page: 71
Running and Confirming Your Installation Page: 71
Web-based UI Introduction Page: 71
Shell Introduction Page: 73
Stopping the Cluster Page: 73

Chapter 3. Client API: The Basics Page: 75
General Notes Page: 75
CRUD Operations Page: 76
Put Method Page: 76
Single Puts Page: 77
The KeyValue class Page: 83
Client-side write buffer Page: 86
List of Puts Page: 90
Atomic compare-and-set Page: 93
Get Method Page: 95
Single Gets Page: 95
The Result class Page: 98
List of Gets Page: 100
Related retrieval methods Page: 103
Delete Method Page: 105
Single Deletes Page: 105
List of Deletes Page: 108
Atomic compare-and-delete Page: 112
Batch Operations Page: 114
Row Locks Page: 118
Scans Page: 122
Introduction Page: 122
The ResultScanner Class Page: 124
Caching Versus Batching Page: 127
Miscellaneous Features Page: 133
The HTable Utility Methods Page: 133
The Bytes Class Page: 134

Chapter 4. Client API: Advanced Features Page: 137
Filters Page: 137
Introduction to Filters Page: 137
The filter hierarchy Page: 138
Comparison operators Page: 139
Comparators Page: 139
Comparison Filters Page: 140
RowFilter Page: 141
FamilyFilter Page: 142
QualifierFilter Page: 144
ValueFilter Page: 144
DependentColumnFilter Page: 145
Dedicated Filters Page: 147
SingleColumnValueFilter Page: 147
SingleColumnValueExcludeFilter Page: 148
PrefixFilter Page: 149
PageFilter Page: 149
KeyOnlyFilter Page: 151
FirstKeyOnlyFilter Page: 151
InclusiveStopFilter Page: 151
TimestampsFilter Page: 152
ColumnCountGetFilter Page: 154
ColumnPaginationFilter Page: 154
ColumnPrefixFilter Page: 155
RandomRowFilter Page: 155
Decorating Filters Page: 155
SkipFilter Page: 155
WhileMatchFilter Page: 157
FilterList Page: 159
Custom Filters Page: 160
Filters Summary Page: 167
Counters Page: 168
Introduction to Counters Page: 168
Single Counters Page: 171
Multiple Counters Page: 172
Coprocessors Page: 175
Introduction to Coprocessors Page: 175
The Coprocessor Class Page: 176
Coprocessor Loading Page: 179
Loading from the configuration Page: 180
Loading from the table descriptor Page: 181
The RegionObserver Class Page: 182
Handling region life-cycle events Page: 183
State: pending open Page: 183
Handling client API events Page: 184
State: open Page: 184
State: pending close Page: 184
The RegionCoprocessorEnvironment class Page: 185
The ObserverContext class Page: 186
The BaseRegionObserver class Page: 187
The MasterObserver Class Page: 190
The MasterCoprocessorEnvironment class Page: 191
The BaseMasterObserver class Page: 192
Endpoints Page: 193
The CoprocessorProtocol interface Page: 194
The BaseEndpointCoprocessor class Page: 195
HTablePool Page: 199
Connection Handling Page: 203

Chapter 5. Client API: Administrative Features Page: 207
Schema Definition Page: 207
Tables Page: 207
Table Properties Page: 210
Column Families Page: 212
HBaseAdmin Page: 218
Basic Operations Page: 219
Table Operations Page: 220
Schema Operations Page: 228
Cluster Operations Page: 230
Cluster Status Information Page: 233

Chapter 6. Available Clients Page: 241
Introduction to REST, Thrift, and Avro Page: 241
Interactive Clients Page: 244
Native Java Page: 244
REST Page: 244
Operation Page: 244
Supported formats Page: 246
Plain (text/plain) Page: 246
XML (text/xml) Page: 247
JSON (application/json) Page: 248
Protocol Buffer (application/x-protobuf) Page: 249
Raw binary (application/octet-stream) Page: 249
REST Java client Page: 250
Thrift Page: 251
Installation Page: 251
Operation Page: 252
Example: PHP Page: 253
Avro Page: 255
Installation Page: 255
Operation Page: 255
Other Clients Page: 256
Batch Clients Page: 257
MapReduce Page: 257
Native Java Page: 257
Clojure Page: 258
Hive Page: 258
Pig Page: 263
Cascading Page: 267
Shell Page: 268
Basics Page: 269
Commands Page: 271
General Page: 272
Data definition Page: 273
Data manipulation Page: 273
Tools Page: 274
Replication Page: 274
Scripting Page: 274
Web-based UI Page: 277
Master UI Page: 277
Main page Page: 277
User Table page Page: 279
ZooKeeper page Page: 282
Region Server UI Page: 283
Main page Page: 283
Shared Pages Page: 283

Chapter 7. MapReduce Integration Page: 289
Framework Page: 289
MapReduce Introduction Page: 289
Classes Page: 290
InputFormat Page: 290
Mapper Page: 291
Reducer Page: 292
OutputFormat Page: 292
Supporting Classes Page: 293
MapReduce Locality Page: 293
Table Splits Page: 294
MapReduce over HBase Page: 295
Preparation Page: 295
Static Provisioning Page: 296
Dynamic Provisioning Page: 296
Data Sink Page: 301
Data Source Page: 306
Data Source and Sink Page: 308
Custom Processing Page: 311

Chapter 8. Architecture Page: 315
Seek Versus Transfer Page: 315
B+ Trees Page: 315
Log-Structured Merge-Trees Page: 316
Storage Page: 319
Overview Page: 319
Write Path Page: 320
Files Page: 321
Root-level files Page: 323
Table-level files Page: 324
Region-level files Page: 324
Region splits Page: 326
Compactions Page: 328
HFile Format Page: 329
KeyValue Format Page: 332
Write-Ahead Log Page: 333
Overview Page: 333
HLog Class Page: 335
HLogKey Class Page: 336
WALEdit Class Page: 336
LogSyncer Class Page: 337
LogRoller Class Page: 338
Replay Page: 338
Single log Page: 339
Log splitting Page: 339
Edits recovery Page: 341
Durability Page: 341
Read Path Page: 342
Region Lookups Page: 345
The Region Life Cycle Page: 348
ZooKeeper Page: 348
Replication Page: 351
Life of a Log Edit Page: 352
Normal processing Page: 352
Non-Responding slave clusters Page: 353
Internals Page: 353
Choosing region servers to replicate to Page: 353
Keeping track of logs Page: 353
Reading, filtering, and sending edits Page: 354
Cleaning logs Page: 354
Region server failover Page: 355

Chapter 9. Advanced Usage Page: 357
Key Design Page: 357
Concepts Page: 357
Tall-Narrow Versus Flat-Wide Tables Page: 359
Partial Key Scans Page: 360
Pagination Page: 362
Time Series Data Page: 363
Time-Ordered Relations Page: 367
Advanced Schemas Page: 369
Secondary Indexes Page: 370
Search Integration Page: 373
Transactions Page: 376
Bloom Filters Page: 377
Versioning Page: 381
Implicit Versioning Page: 381
Custom Versioning Page: 384

Chapter 10. Cluster Monitoring Page: 387
Introduction Page: 387
The Metrics Framework Page: 388
Contexts, Records, and Metrics Page: 389
Master Metrics Page: 394
Region Server Metrics Page: 394
RPC Metrics Page: 396
JVM Metrics Page: 397
Info Metrics Page: 399
Ganglia Page: 400
Installation Page: 401
Ganglia-related steps Page: 401
Ganglia monitoring daemon Page: 401
Ganglia meta daemon Page: 403
HBase-related steps Page: 404
Ganglia web frontend Page: 404
Usage Page: 405
JMX Page: 408
JConsole Page: 410
JMX Remote API Page: 413
Nagios Page: 417

Chapter 11. Performance Tuning Page: 419
Garbage Collection Tuning Page: 419
Memstore-Local Allocation Buffer Page: 422
Compression Page: 424
Available Codecs Page: 424
Snappy Page: 425
LZO Page: 425
GZIP Page: 425
Verifying Installation Page: 426
Compression test tool Page: 426
Startup check Page: 427
Enabling Compression Page: 427
Optimizing Splits and Compactions Page: 429
Managed Splitting Page: 429
Region Hotspotting Page: 430
Presplitting Regions Page: 430
Load Balancing Page: 432
Merging Regions Page: 433
Client API: Best Practices Page: 434
Configuration Page: 436
Load Tests Page: 439
Performance Evaluation Page: 439
YCSB Page: 440

Chapter 12. Cluster Administration Page: 445
Operational Tasks Page: 445
Node Decommissioning Page: 445
Rolling Restarts Page: 447
Adding Servers Page: 447
Pseudodistributed mode Page: 448
Adding a local backup master Page: 448
Adding a local region server Page: 449
Fully distributed cluster Page: 450
Adding a backup master Page: 450
Data Tasks Page: 452
Import and Export Tools Page: 452
CopyTable Tool Page: 457
Bulk Import Page: 459
Bulk load procedure Page: 459
Using the importtsv tool Page: 460
Using the completebulkload Tool Page: 461
Advanced usage Page: 461
Replication Page: 462
Additional Tasks Page: 464
Coexisting Clusters Page: 464
Required Ports Page: 466
Changing Logging Levels Page: 466
Troubleshooting Page: 467
HBase Fsck Page: 467
Analyzing the Logs Page: 468
Common Issues Page: 471
Basic setup checklist Page: 471
File handles Page: 471
DataNode connections Page: 471
Compression Page: 471
Stability issues Page: 472
Garbage collection/memory tuning Page: 472
ZooKeeper problems Page: 472
“Could not obtain block” errors Page: 473

Appendix A. HBase Configuration Properties Page: 475

Appendix B. Road Map Page: 489
HBase 0.92.0 Page: 489
HBase 0.94.0 Page: 490

Appendix C. Upgrade from Previous Releases Page: 491
Upgrading to HBase 0.90.x Page: 491
From 0.20.x or 0.89.x Page: 491
Within 0.90.x Page: 492
Upgrading to HBase 0.92.0 Page: 492

Appendix D. Distributions Page: 493
Cloudera’s Distribution Including Apache Hadoop Page: 493

Appendix E. Hush SQL Schema Page: 495

Appendix F. HBase Versus Bigtable Page: 497

Index Page: 501
Description: