Monitoring with Ganglia

Matt Massie, Bernard Li, Brad Nicholes, and Vladimir Vuksan

Beijing • Cambridge • Farnham • Köln • Sebastopol • Tokyo

Monitoring with Ganglia
by Matt Massie, Bernard Li, Brad Nicholes, and Vladimir Vuksan

Copyright © 2013 Matthew Massie, Bernard Li, Brad Nicholes, Vladimir Vuksan. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://my.safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editors: Mike Loukides and Meghan Blanchette
Production Editor: Kara Ebrahim
Copyeditor: Nancy Wolfe Kotary
Proofreader: Kara Ebrahim
Indexer: Ellen Troutman-Zaig
Cover Designer: Karen Montgomery
Interior Designer: David Futato
Illustrator: Kara Ebrahim

November 2012: First Edition.

Revision History for the First Edition:
2012-11-7 First release

See http://oreilly.com/catalog/errata.csp?isbn=9781449329709 for release details.

Nutshell Handbook, the Nutshell Handbook logo, and the O’Reilly logo are registered trademarks of O’Reilly Media, Inc. Monitoring with Ganglia, the image of a Porpita pacifica, and related trade dress are trademarks of O’Reilly Media, Inc.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and O’Reilly Media, Inc., was aware of a trademark claim, the designations have been printed in caps or initial caps.

While every precaution has been taken in the preparation of this book, the publisher and authors assume no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
ISBN: 978-1-449-32970-9
[LSI]
1352302880

Table of Contents

Preface .... ix

1. Introducing Ganglia .... 1
    It’s a Problem of Scale .... 1
    Hosts ARE the Monitoring System .... 2
    Redundancy Breeds Organization .... 3
    Is Ganglia Right for You? .... 4
    gmond: Big Bang in a Few Bytes .... 4
    gmetad: Bringing It All Together .... 7
    gweb: Next-Generation Data Analysis .... 8
    But Wait! That’s Not All! .... 9

2. Installing and Configuring Ganglia .... 11
    Installing Ganglia .... 11
    gmond .... 11
    gmetad .... 14
    gweb .... 16
    Configuring Ganglia .... 20
    gmond .... 20
    gmetad .... 33
    gweb .... 38
    Postinstallation .... 40
    Starting Up the Processes .... 41
    Testing Your Installation .... 41
    Firewalls .... 41

3. Scalability .... 43
    Who Should Be Concerned About Scalability? .... 43
    gmond and Ganglia Cluster Scalability .... 43
    gmetad Storage Planning and Scalability .... 44
    RRD File Structure and Scalability .... 44
    Acute IO Demand During gmetad Startup .... 46
    gmetad IO Demand During Normal Operation .... 46
    Forecasting IO Workload .... 47
    Testing the IO Subsystem .... 48
    Dealing with High IO Demand from gmetad .... 50

4. The Ganglia Web Interface .... 53
    Navigating the Ganglia Web Interface .... 53
    The gweb Main Tab .... 53
    Grid View .... 53
    Cluster View .... 54
    Host View .... 58
    Graphing All Time Periods .... 58
    The gweb Search Tab .... 60
    The gweb Views Tab .... 60
    The gweb Aggregated Graphs Tab .... 63
    Decompose Graphs .... 64
    The gweb Compare Hosts Tab .... 64
    The gweb Events Tab .... 64
    Events API .... 66
    The gweb Automatic Rotation Tab .... 67
    The gweb Mobile Tab .... 67
    Custom Composite Graphs .... 67
    Other Features .... 69
    Authentication and Authorization .... 70
    Configuration .... 70
    Enabling Authentication .... 70
    Access Controls .... 71
    Actions .... 72
    Configuration Examples .... 72

5. Managing and Extending Metrics .... 73
    gmond: Metric Gathering Agent .... 73
    Base Metrics .... 75
    Extended Metrics .... 77
    Extending gmond with Modules .... 78
    C/C++ Modules .... 79
    Mod_Python .... 89
    Spoofing with Modules .... 96
    Extending gmond with gmetric .... 97
    Running gmetric from the Command Line .... 97
    Spoofing with gmetric .... 99
    How to Choose Between C/C++, Python, and gmetric .... 100
    XDR Protocol .... 101
    Packets .... 102
    Implementations .... 103
    Java and gmetric4j .... 103
    Real World: GPU Monitoring with the NVML Module .... 104
    Installation .... 104
    Metrics .... 105
    Configuration .... 105

6. Troubleshooting Ganglia .... 107
    Overview .... 107
    Known Bugs and Other Limitations .... 107
    Useful Resources .... 108
    Release Notes .... 108
    Manpages .... 108
    Wiki .... 108
    IRC .... 108
    Mailing Lists .... 108
    Bug Tracker .... 109
    Monitoring the Monitoring System .... 109
    General Troubleshooting Mechanisms and Tools .... 110
    netcat and telnet .... 110
    Logs .... 114
    Running in Foreground/Debug Mode .... 114
    strace and truss .... 115
    valgrind: Memory Leaks and Memory Corruption .... 116
    iostat: Checking IOPS Demands of gmetad .... 116
    Restarting Daemons .... 117
    gstat .... 117
    Common Deployment Issues .... 119
    Reverse DNS Lookups .... 119
    Time Synchronization .... 119
    Mixing Ganglia Versions Older than 3.1 with Current Versions .... 119
    SELinux and Firewall .... 120
    Typical Problems and Troubleshooting Procedures .... 120
    Web Issues .... 120
    gmetad Issues .... 125
    rrdcached Issues .... 126
    gmond Issues .... 126

7. Ganglia and Nagios .... 129
    Sending Nagios Data to Ganglia .... 130
    Monitoring Ganglia Metrics with Nagios .... 133
    Principle of Operation .... 134
    Check Heartbeat .... 135
    Check a Single Metric on a Specific Host .... 135
    Check Multiple Metrics on a Specific Host .... 136
    Check Multiple Metrics on a Range of Hosts .... 136
    Verify that a Metric Value Is the Same Across a Set of Hosts .... 137
    Displaying Ganglia Data in the Nagios UI .... 138
    Monitoring Ganglia with Nagios .... 139
    Monitoring Processes .... 139
    Monitoring Connectivity .... 140
    Monitoring cron Collection Jobs .... 140
    Collecting rrdcached Metrics .... 140

8. Ganglia and sFlow .... 143
    Architecture .... 145
    Standard sFlow Metrics .... 147
    Server Metrics .... 147
    Hypervisor Metrics .... 149
    Java Virtual Machine Metrics .... 150
    HTTP Metrics .... 151
    memcache Metrics .... 153
    Configuring gmond to Receive sFlow .... 155
    Host sFlow Agent .... 157
    Host sFlow Subagents .... 158
    Custom Metrics Using gmetric .... 160
    Troubleshooting .... 161
    Are the Measurements Arriving at gmond? .... 161
    Are the Measurements Being Sent? .... 165
    Using Ganglia with Other sFlow Tools .... 165

9. Ganglia Case Studies .... 171
    Tagged, Inc. .... 172
    Site Architecture .... 172
    Monitoring Configuration .... 173
    Examples .... 175
    SARA .... 180
    Overview .... 180
    Advantages .... 181
    Customizations .... 182
    Challenges .... 184
    Conclusion .... 186
    Reuters Financial Software .... 186
    Ganglia in the QA Environment .... 186
    Ganglia in a Major Client Project .... 188
    Lumicall (Mobile VoIP on Android) .... 190
    Monitoring Mobile VoIP for the Enterprise .... 191
    Ganglia Monitoring Within Lumicall .... 191
    Implementing gmetric4j Within Lumicall .... 192
    Lumicall: Conclusion .... 194
    Wait, How Many Metrics? Monitoring at Quantcast .... 194
    Reporting, Analysis, and Alerting .... 196
    Ganglia as an Application Platform .... 198
    Best Practices .... 198
    Tools .... 199
    Drawbacks .... 200
    Conclusions .... 201
    Many Tools in the Toolbox: Monitoring at Etsy .... 202
    Monitoring Is Mandatory .... 202
    A Spectrum of Tools .... 202
    Embrace Diversity .... 203
    Conclusion .... 204

A. Advanced Metric Configuration and Debugging .... 205

B. Ganglia and Hadoop/HBase .... 215

Index .... 221