
Graph Databases: New Opportunities for Connected Data (PDF)

238 Pages·2015·10.32 MB·English

Preview: Graph Databases: New Opportunities for Connected Data

Graph Databases: New Opportunities for Connected Data
Second Edition
Ian Robinson, Jim Webber & Emil Eifrem

Discover how graph databases can help you manage and query highly connected data. With this practical book, you’ll learn how to design and implement a graph database that brings the power of graphs to bear on a broad range of problem domains. Whether you want to speed up your response to user queries or build a database that can adapt as your business evolves, this book shows you how to apply the schema-free graph model to real-world problems.

This second edition includes new code samples and diagrams, using the latest Neo4j syntax, as well as information on new functionality. Learn how different organizations are using graph databases to outperform their competitors. With this book’s data modeling, query, and code examples, you’ll quickly be able to implement your own solution.

■ Model data with the Cypher query language and property graph model
■ Learn best practices and common pitfalls when modeling with graphs
■ Plan and implement a graph database solution in test-driven fashion
■ Explore real-world examples to learn how and why organizations use a graph database
■ Understand common patterns and components of graph database architecture
■ Use analytical techniques and algorithms to mine graph database information

“Graph analysis is possibly the single most effective competitive differentiator for organizations pursuing data-driven operations and decisions.”
(Gartner, IT Market Clock for Database Management Systems, 2014)

Ian Robinson works on research and development for future versions of the Neo4j graph database and previously served as Neo’s Director of Customer Success.

Jim Webber, Neo Technology’s Chief Scientist, is a distributed systems specialist working on very large-scale graph data technology.

Emil Eifrem is CEO of Neo Technology and co-founder of the open source Neo4j graph database project.

DATA/DATA SCIENCE
US $39.99 | CAN $45.99
ISBN: 978-1-491-93089-2
Twitter: @oreillymedia | facebook.com/oreilly

Graph Databases
by Ian Robinson, Jim Webber, and Emil Eifrem

Copyright © 2015 Neo Technology, Inc. All rights reserved.
Printed in the United States of America.

Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://safaribooksonline.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected].

Editor: Marie Beaugureau
Production Editor: Kristen Brown
Proofreader: Christina Edwards
Indexer: WordCo Indexing Services
Interior Designer: David Futato
Cover Designer: Ellie Volckhausen
Illustrator: Rebecca Demarest

June 2013: First Edition
June 2015: Second Edition

Revision History for the Second Edition
2015-06-09: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491930892 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Graph Databases, the cover image of a European octopus, and related trade dress are trademarks of O’Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work.
Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-93089-2
[LSI]

Table of Contents

Foreword
Preface
1. Introduction
  What Is a Graph?
  A High-Level View of the Graph Space
  Graph Databases
  Graph Compute Engines
  The Power of Graph Databases
  Performance
  Flexibility
  Agility
  Summary
2. Options for Storing Connected Data
  Relational Databases Lack Relationships
  NOSQL Databases Also Lack Relationships
  Graph Databases Embrace Relationships
  Summary
3. Data Modeling with Graphs
  Models and Goals
  The Labeled Property Graph Model
  Querying Graphs: An Introduction to Cypher
  Cypher Philosophy
  MATCH
  RETURN
  Other Cypher Clauses
  A Comparison of Relational and Graph Modeling
  Relational Modeling in a Systems Management Domain
  Graph Modeling in a Systems Management Domain
  Testing the Model
  Cross-Domain Models
  Creating the Shakespeare Graph
  Beginning a Query
  Declaring Information Patterns to Find
  Constraining Matches
  Processing Results
  Query Chaining
  Common Modeling Pitfalls
  Email Provenance Problem Domain
  A Sensible First Iteration?
  Second Time’s the Charm
  Evolving the Domain
  Identifying Nodes and Relationships
  Avoiding Anti-Patterns
  Summary
4. Building a Graph Database Application
  Data Modeling
  Describe the Model in Terms of the Application’s Needs
  Nodes for Things, Relationships for Structure
  Fine-Grained versus Generic Relationships
  Model Facts as Nodes
  Represent Complex Value Types as Nodes
  Time
  Iterative and Incremental Development
  Application Architecture
  Embedded versus Server
  Clustering
  Load Balancing
  Testing
  Test-Driven Data Model Development
  Performance Testing
  Capacity Planning
  Optimization Criteria
  Performance
  Redundancy
  Load
  Importing and Bulk Loading Data
  Initial Import
  Batch Import
  Summary
5. Graphs in the Real World
  Why Organizations Choose Graph Databases
  Common Use Cases
  Social
  Recommendations
  Geo
  Master Data Management
  Network and Data Center Management
  Authorization and Access Control (Communications)
  Real-World Examples
  Social Recommendations (Professional Social Network)
  Authorization and Access Control
  Geospatial and Logistics
  Summary
6. Graph Database Internals
  Native Graph Processing
  Native Graph Storage
  Programmatic APIs
  Kernel API
  Core API
  Traversal Framework
  Nonfunctional Characteristics
  Transactions
  Recoverability
  Availability
  Scale
  Summary
7. Predictive Analysis with Graph Theory
  Depth- and Breadth-First Search
  Path-Finding with Dijkstra’s Algorithm
  The A* Algorithm
  Graph Theory and Predictive Modeling
  Triadic Closures
  Structural Balance
  Local Bridges
  Summary
A. NOSQL Overview
Index

Foreword

Graphs Are Eating The World, And There’s No Going Back

In the three years since we first wrote Graph Databases, our industry has witnessed a fundamental shift in the way in which it views its data assets. Data, always present in some stratum of innovation, has for several decades delivered only a fraction of its potential, in large part because the technologies at our disposal have forced us to treat it as though it were nothing but isolated islands of middling significance.

Graphs and graph databases change this completely.
As vertical after vertical discovers the transformative power of connected data, the breakaway leaders in these industries are stealing an irreversible march on their competitors. Graphs are everywhere, they’re eating the world, and there’s no going back.

As I wrote in my foreword to the first edition, this change in perspective started almost two decades ago, when a precocious web search startup challenged the dominance of the market leaders (AltaVista, Lycos, Excite, et al.) through its application of a simple algorithm that made sense of the way in which web documents are connected. Today, Google dominates the web search space.

In its wake, other industry leaders have asked themselves: “What if we took the relationships and connections in our data and reimagined our business along those relationships? What would that look like?” The answers to these questions are omnipresent in our online lives today in the form of Facebook, Twitter, and the like.

What was once a specialist and often proprietary means for realizing the opportunities inherent in connected data is now a commodity technology. In the past three years the features, usability, and performance of the world’s leading graph database have matured enormously; awareness and adoption have spread far wider, deeper, and more quickly than we could have hoped; and the inventiveness and irreversible impact of introducing graph databases into formerly discrete-data-oriented domains have invigorated and challenged the markets at every turn.

In 2011, we thought the main verticals to adopt graph databases would be software, financial services, and telecom, and largely we were right. However, what’s been even more amazing has been the adoption of graph databases outside of those top three verticals. We’ve seen industry after industry being eaten by graphs. In each case, the adoption of graph technology has resulted in better products and more remarkable customer experiences.
Companies such as Pitney Bowes, eBay, and Cisco are deploying the graph to solve some of their most mission-critical problems, forcing their competition to catch up or leave the industry. Four of the top ten global retailers today use Neo4j. Behind them, competitors that have failed to adapt are struggling to keep up.

This ability of graph databases to colonize and radically transform an industry is nowhere more apparent than in the emerging Internet of Things (IoT), a domain that might more aptly be called the Internet of Connected Things, because without the connections, there’s no point to it. When you have a lot of connected things, you have a graph-based problem.

In recent years, a major telco equipment provider has entered the IoT space with a product that, embedded inside large telecom networks, sniffs network traffic and builds a model of all the connected devices on the network. If devices in one category are all flashing red at the same time, you can easily determine whether it’s truly because all of them are simultaneously failing or because they’re all connected to a firewall and power supply that have just gone out. That level of real-time, predictive analysis is what you can do when taking a connected view of the IoT.

The speed with which such solutions can be developed and put into production is a result of some significant changes to the underlying graph database technology. In 2013 we introduced Neo4j 2.0, marking a big change in the features, usability, and performance of the product. Besides a wholly new visualization tool, Neo4j 2.0 came with an improved data model whose chief features (labels, optional constraints, and declarative indexes), coupled with numerous improvements to the Cypher query language, make designing and developing a graph database application easier and more intuitive than ever before.

Accompanying this maturation of the technology is an amazing growth in community traction. According to db-engines.com, graph databases have been the fastest-growing database category since 2013. Big data is the hottest sector in the tech industry, and graph databases are at the absolute nexus of that growth. Graphs are indeed eating the world, and there’s no turning back.
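
The Internet of Things example in the foreword rests on a simple piece of graph reasoning. When a group of devices fails at once, look for an upstream node they all depend on. The Python sketch below illustrates that idea on a tiny in-memory dependency graph. It is not Neo4j or Cypher code, and it is not the equipment vendor’s actual method. The device names, the `DEPENDS_ON` map, and the rule that a candidate cause must explain exactly the failing set are assumptions made only for illustration.

```python
# Hypothetical dependency graph: each key depends on the nodes in its list
# (an upstream firewall, power supply, or switch). All names are invented.
DEPENDS_ON = {
    "sensor-1": ["firewall-A", "psu-1"],
    "sensor-2": ["firewall-A", "psu-1"],
    "sensor-3": ["firewall-A", "psu-2"],
    "sensor-4": ["firewall-B", "psu-2"],
    "firewall-A": ["core-switch"],
    "firewall-B": ["core-switch"],
}
SENSORS = ["sensor-1", "sensor-2", "sensor-3", "sensor-4"]


def upstream(node, graph):
    """Every node reachable from `node` by following dependency edges (iterative DFS)."""
    seen, stack = set(), [node]
    while stack:
        current = stack.pop()
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def shared_causes(failing, graph):
    """Upstream nodes that all of the failing devices depend on."""
    reach = [upstream(d, graph) for d in failing]
    return set.intersection(*reach) if reach else set()


def likely_root_causes(failing, devices, graph):
    """Shared upstream nodes whose failure would affect exactly the failing devices.

    An empty result suggests the devices are failing independently
    rather than because of one common upstream component.
    """
    failing_set = set(failing)
    return {
        cause
        for cause in shared_causes(failing, graph)
        if {d for d in devices if cause in upstream(d, graph)} == failing_set
    }


print(likely_root_causes(["sensor-1", "sensor-2", "sensor-3"], SENSORS, DEPENDS_ON))  # {'firewall-A'}
print(likely_root_causes(["sensor-1", "sensor-4"], SENSORS, DEPENDS_ON))  # set()
```

With this sample data, sensors 1 to 3 failing together share two upstream nodes, firewall-A and core-switch. Only firewall-A’s failure would affect exactly those three sensors, so it is the likely cause. Sensors 1 and 4 failing together have no upstream node that explains exactly that pair, which points to independent faults. Requiring an exact match is a deliberately strict rule chosen for this sketch. A real monitoring system would likely need to tolerate partial or delayed observations.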
