Architecting HBase Applications A Guidebook for Successful Development and Design PDF

251 Pages·2016·7.09 MB·English

Preview Architecting HBase Applications A Guidebook for Successful Development and Design

Architecting HBase Applications: A Guidebook for Successful Development and Design
Jean-Marc Spaggiari and Kevin O’Dell
Beijing, Boston, Farnham, Sebastopol, Tokyo

Architecting HBase Applications
by Jean-Marc Spaggiari and Kevin O’Dell

Copyright © 2016 Jean-Marc Spaggiari and Kevin O’Dell. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editor: Marie Beaugureau
Production Editor: Nicholas Adams
Copyeditor: Jasmine Kwityn
Proofreader: Amanda Kersey
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

August 2016: First Edition

Revision History for the First Edition
2016-07-14: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491915813 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Architecting HBase Applications, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk.
If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-91581-3
[LSI]

To my father. I wish you could have seen it...
Jean-Marc Spaggiari

To my mother, who I think about every day; my father, who has always been there for me; and my beautiful wife Melanie and daughter Scotland, for putting up with all my complaining and the extra long hours.
Kevin O’Dell

Table of Contents

Foreword
Preface

Part I. Introduction to HBase

1. What Is HBase?
  Column-Oriented Versus Row-Oriented
  Implementation and Use Cases

2. HBase Principles
  Table Format
  Table Layout
  Table Storage
  Internal Table Operations
  Compaction
  Splits (Auto-Sharding)
  Balancing
  Dependencies
  HBase Roles
  Master Server
  RegionServer
  Thrift Server
  REST Server

3. HBase Ecosystem
  Monitoring Tools
  Cloudera Manager
  Apache Ambari
  Hannibal
  SQL
  Apache Phoenix
  Apache Trafodion
  Splice Machine
  Honorable Mentions (Kylin, Themis, Tephra, Hive, and Impala)
  Frameworks
  OpenTSDB
  Kite
  HappyBase
  AsyncHBase

4. HBase Sizing and Tuning Overview
  Hardware
  Storage
  Networking
  OS Tuning
  Hadoop Tuning
  HBase Tuning
  Different Workload Tuning

5. Environment Setup
  System Requirements
  Operating System
  Virtual Machine
  Resources
  Java
  HBase Standalone Installation
  HBase in a VM
  Local Versus VM
  Local Mode
  Virtual Linux Environment
  QuickStart VM (or Equivalent)
  Troubleshooting
  IP/Name Configuration
  Access to the /tmp Folder
  Environment Variables
  Available Memory
  First Steps
  Basic Operations
  Import Code Examples
  Testing the Examples
  Pseudodistributed and Fully Distributed

Part II. Use Cases

6. Use Case: HBase as a System of Record
  Ingest/Pre-Processing
  Processing/Serving
  User Experience

7. Implementation of an Underlying Storage Engine
  Table Design
  Table Schema
  Table Parameters
  Implementation
  Data conversion
  Generate Test Data
  Create Avro Schema
  Implement MapReduce Transformation
  HFile Validation
  Bulk Loading
  Data Validation
  Table Size
  File Content
  Data Indexing
  Data Retrieval
  Going Further

8. Use Case: Near Real-Time Event Processing
  Ingest/Pre-Processing
  Near Real-Time Event Processing
  Processing/Serving

9. Implementation of Near Real-Time Event Processing
  Application Flow
  Kafka
  Flume
  HBase
  Lily
  Solr
  Implementation
  Data Generation
  Kafka
  Flume
  Serializer
  HBase
  Lily
  Solr
  Testing
  Going Further

10. Use Case: HBase as a Master Data Management Tool
  Ingest
  Processing

11. Implementation of HBase as a Master Data Management Tool
  MapReduce Versus Spark
  Get Spark Interacting with HBase
  Run Spark over an HBase Table
  Calling HBase from Spark
  Implementing Spark with HBase
  Spark and HBase: Puts
  Spark on HBase: Bulk Load
  Spark Over HBase
  Going Further

12. Use Case: Document Store
  Serving
  Ingest
  Clean Up

13. Implementation of Document Store
  MOBs
  Storage
  Usage
  Too Big
  Consistency
  Going Further

Description:
HBase is a remarkable tool for indexing mass volumes of data, but getting started with this distributed database and its ecosystem can be daunting. With this hands-on guide, you'll learn how to architect, design, and deploy your own HBase applications by examining real-world solutions. Along with HBa