http://dx.doi.org/10.1090/cbms/107

Conference Board of the Mathematical Sciences
CBMS Regional Conference Series in Mathematics
Number 107

Complex Graphs and Networks

Fan Chung
Linyuan Lu

Published for the Conference Board of the Mathematical Sciences by the American Mathematical Society, Providence, Rhode Island, with support from the National Science Foundation

NSF-CBMS Regional Research Conference on The Combinatorics of Large Sparse Graphs, held at California State University San Marcos, June 7-12, 2004

Partially supported by the National Science Foundation

2000 Mathematics Subject Classification. Primary 05Cxx, 68R10, 68W20, 90B10, 90C06, 90C35, 94C15.

For additional information and updates on this book, visit www.ams.org/bookpages/cbms-107

Library of Congress Cataloging-in-Publication Data
Chung, Fan R. K., 1949-
Complex graphs and networks / Fan Chung, Linyuan Lu.
p. cm. — (CBMS regional conference series in mathematics ; no. 107)
Includes bibliographical references and index.
ISBN-13: 978-0-8218-3657-6 (alk. paper)
ISBN-10: 0-8218-3657-9 (alk. paper)
1. Graph theory—Congresses. 2. Combinatorial analysis—Congresses. 3. Information networks—Congresses. I. Lu, Linyuan, 1971- II. Title. III. Series: Regional conference series in mathematics ; no. 107.
QA166.C484 2006
511'.5—dc22 2006042898

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy a chapter for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society.
Requests for such permission should be addressed to the Acquisitions Department, American Mathematical Society, 201 Charles Street, Providence, Rhode Island 02904-2294, USA. Requests can also be made by e-mail to [email protected].

© 2006 by the American Mathematical Society. All rights reserved. The American Mathematical Society retains all rights except those granted to the United States Government. Printed in the United States of America.

The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability.

Visit the AMS home page at http://www.ams.org/

Contents

Preface

Chapter 1. Graph Theory in the Information Age
  1.1. Introduction
  1.2. Basic definitions
  1.3. Degree sequences and the power law
  1.4. History of the power law
  1.5. Examples of power law graphs
  1.6. An outline of the book

Chapter 2. Old and New Concentration Inequalities
  2.1. The binomial distribution and its asymptotic behavior
  2.2. General Chernoff inequalities
  2.3. More concentration inequalities
  2.4. A concentration inequality with a large error estimate
  2.5. Martingales and Azuma's inequality
  2.6. General martingale inequalities
  2.7. Supermartingales and Submartingales
  2.8. The decision tree and relaxed concentration inequalities

Chapter 3. A Generative Model — the Preferential Attachment Scheme
  3.1. Basic steps of the preferential attachment scheme
  3.2. Analyzing the preferential attachment model
  3.3. A useful lemma for rigorous proofs
  3.4. The peril of heuristics via an example of balls-and-bins
  3.5. Scale-free networks
  3.6. The sharp concentration of preferential attachment scheme
  3.7. Models for directed graphs

Chapter 4. Duplication Models for Biological Networks
  4.1. Biological networks
  4.2. The duplication model
  4.3. Expected degrees of a random graph in the duplication model
  4.4. The convergence of the expected degrees
  4.5. The generating functions for the expected degrees
  4.6. Two concentration results for the duplication model
  4.7. Power law distribution of generalized duplication models

Chapter 5. Random Graphs with Given Expected Degrees
  5.1. The Erdős-Rényi model
  5.2. The diameter of G_{n,p}
  5.3. A general random graph model
  5.4. Size, volume and higher order volumes
  5.5. Basic properties of G(w)
  5.6. Neighborhood expansion in random graphs
  5.7. A random power law graph model
  5.8. Actual versus expected degree sequence

Chapter 6. The Rise of the Giant Component
  6.1. No giant component if w < 1?
  6.2. Is there a giant component if w > 1?
  6.3. No giant component if w < 1?
  6.4. Existence and uniqueness of the giant component
  6.5. A lemma on neighborhood growth
  6.6. The volume of the giant component
  6.7. Proving the volume estimate of the giant component
  6.8. Lower bounds for the volume of the giant component
  6.9. The complement of the giant component and its size
  6.10. Comparing theoretical results with the collaboration graph

Chapter 7. Average Distance and the Diameter
  7.1. The small world phenomenon
  7.2. Preliminaries on the average distance and diameter
  7.3. A lower bound lemma
  7.4. An upper bound for the average distance and diameter
  7.5. Average distance and diameter of random power law graphs
  7.6. Examples and remarks

Chapter 8. Eigenvalues of the Adjacency Matrix of G(w)
  8.1. The spectral radius of a graph
  8.2. The Perron-Frobenius Theorem and several useful facts
  8.3. Two lower bounds for the spectral radius
  8.4. An eigenvalue upper bound for G(w)
  8.5. Eigenvalue theorems for G(w)
  8.6. Examples and counterexamples
  8.7. The spectrum of the adjacency matrix of power law graphs

Chapter 9. The Semi-Circle Law for G(w)
  9.1. Random matrices and Wigner's semi-circle law
  9.2. Three spectra of a graph
  9.3. The Laplacian of a graph
  9.4. The Laplacian of a random graph in G(w)
  9.5. A bound for random graphs with large minimum degree
  9.6. The semi-circle law for Laplacian eigenvalues of graphs
  9.7. An upper bound on the spectral norm of the Laplacian
  9.8. Implications of Laplacian eigenvalues for G(w)
  9.9. An example of eigenvalues of a random power law graph

Chapter 10. Coupling On-line and Off-line Analyses of Random Graphs
  10.1. On-line versus off-line
  10.2. Comparing random graphs
  10.3. Edge-independent and almost edge-independent random graphs
  10.4. A growth-deletion model for random power law graphs
  10.5. Coupling on-line and off-line random graph models
  10.6. Concentration results for the growth-deletion model
  10.7. The proofs of the main theorems

Chapter 11. The Configuration Model for Power Law Graphs
  11.1. Models for random graphs with given degree sequences
  11.2. The evolution of random power law graphs
  11.3. A criterion for the giant component in the configuration model
  11.4. The sizes of connected components in certain ranges for β
  11.5. The distribution of connected components for β > 4
  11.6. On the size of the second largest component
  11.7. Various properties of a random graph of the configuration model
  11.8. Comparisons with realistic massive graphs

Chapter 12. The Small World Phenomenon in Hybrid Graphs
  12.1. Modeling the small world phenomenon
  12.2. Local graphs with many short paths between local edges
  12.3. The hybrid power law model
  12.4. The diameter of the hybrid model
  12.5. Local graphs and local flows
  12.6. Extracting the local graph
  12.7. Communities and examples

Bibliography

Index

Preface

In many ways, working on graph theory problems over the years has always seemed like fun and games. Recently, through examples of large sparse graphs in realistic networks, research in graph theory has been forging ahead into an exciting new dimension. Graph theory has emerged as a primary tool for detecting numerous hidden structures in various information networks, including Internet graphs, social networks, biological networks, or more generally, any graph representing relations in massive data sets. How will we explain from first principles the universal and ubiquitous coherence in the structure of these realistic but complex networks? In order to analyze these large sparse graphs we will need to use all the tools at our disposal, including combinatorial, probabilistic and spectral methods. Time and again, we have been pushed beyond the limit of the existing techniques and have had to create new and better tools to be able to analyze these networks. The examples of these networks have led us to focus on new, general and powerful ways to look at graph theory. In the other direction, we hope that these new perspectives on graph theory contribute to a sound scientific foundation for our understanding of the discrete networks that permeate this information age.

This book is based on ten lectures given at the CBMS Workshop on the Combinatorics of Large Sparse Graphs in June 2004 at the California State University at San Marcos. Various portions of the twelve chapters here are based on several papers coauthored with many collaborators.
Indeed, to deal with the numerous leads in such an emerging area it is crucial to have partners to sound out the right approaches, to separate what can be rigorously proved and under what conditions from what cannot be proved, to face seemingly overwhelming obstacles and yet still gather enough energy to overcome one more challenge. Special thanks are due to our coauthors, including Bill Aiello, Reid Andersen, David Galas, Greg Dewey, Shirin Handjani, Doug Jungreis, and Van Vu. We are particularly grateful to Ross Richardson and Reid Andersen for many beautiful illustrations in the book and to the students in Math261 spring 2004 at UCSD for taking valuable lecture notes. In the course of writing, we have greatly benefitted from discussions with Alan Frieze, Joe Buhler and Herb Wilf. Most of all, we are indebted to Steve Butler and Ron Graham for their thoughtful readings and invaluable comments without which this book would not have so swiftly converged.

Fan Chung and Lincoln Lu, May 2006

http://dx.doi.org/10.1090/cbms/107/01

CHAPTER 1
Graph Theory in the Information Age

1.1. Introduction

Graph theory has a history dating back more than 250 years (starting with Leonhard Euler and his quest for a walk linking seven bridges in Königsberg [17]). Since then, graph theory, the study of networks in their most basic form as interconnections among objects, has evolved from its recreational roots into a rich and distinct subject. Of particular significance is its vital role in our understanding of the mathematics governing the discrete universe.
Throughout the years, graph theorists have been studying various types of graphs, such as planar graphs (drawn without edges crossing in the plane), interval graphs (arising in scheduling), symmetric graphs (hypercubes, platonic solids and those from group theory), routing networks (from communications) and computational graphs that are used in designing algorithms or simulations.

In 1999, at the dawn of the new Millennium, a most surprising type of graph was uncovered. Indeed, its universal importance has brought graph theory to the heart of a new paradigm of science in this information age. This family of graphs consists of a wide collection arising from diverse arenas but having completely unexpected coherence. Examples include the WWW-graphs, the phone graphs, the email graphs, the so-called "Hollywood" graphs of costars, the "collaboration" graph of coauthors, as well as legions of others from all branches of natural, social and life sciences.

The prevailing characteristics of these graphs are the following:

• Large — The size of the network typically ranges from hundreds of thousands to billions of vertices. Brute force approaches are no longer feasible. Mathematical wizardry is in demand again — how can we use a relatively small number of parameters to capture the shape of the network?

• Sparse — The number of edges is linear, i.e., within a small multiple of the number of vertices. There might be dense graphs (having a quadratic number of edges in terms of vertices) out there but the large graphs that we encounter are mostly sparse.

• The small world phenomenon — This is used to refer to two distinct properties: small distance (two strangers are typically joined by a short chain of mutual acquaintances), and the clustering effect (two people who share a common neighbor are more likely to know each other)