Use R!
Eric D. Kolaczyk
Gábor Csárdi
Statistical Analysis of Network Data with R
Second Edition
Use R!
Series Editors
Robert Gentleman, 23andMe Inc., South San Francisco, USA
Kurt Hornik, Department of Finance, Accounting and Statistics, WU
Wirtschaftsuniversität Wien, Vienna, Austria
Giovanni Parmigiani, Dana-Farber Cancer Institute, Boston, USA
This series of inexpensive and focused books on R will publish shorter books
aimed at practitioners. Books can discuss the use of R in a particular subject area
(e.g., epidemiology, econometrics, psychometrics) or as it relates to statistical topics
(e.g., missing data, longitudinal data). In most cases, books will combine LaTeX
and R so that the code for figures and tables can be put on a website. Authors
should assume a background as supplied by Dalgaard’s Introductory Statistics with
R or other introductory books so that each book does not repeat basic material.
More information about this series at http://www.springer.com/series/6991
Eric D. Kolaczyk, Department of Mathematics and Statistics, Boston University, Boston, MA, USA
Gábor Csárdi, RStudio, Boston, MA, USA
ISSN 2197-5736 (print), ISSN 2197-5744 (electronic)
Use R!
ISBN 978-3-030-44128-9 (print), ISBN 978-3-030-44129-6 (eBook)
https://doi.org/10.1007/978-3-030-44129-6
1st edition: © Springer Science+Business Media New York 2014
2nd edition: © Springer Nature Switzerland AG 2020
Chapter 11 is adapted in part with permission from Chapter 4 of Eric D. Kolaczyk, Topics at the Frontier
of Statistics and Network Analysis: (Re)Visiting the Foundations, SemStat Elements (Cambridge:
Cambridge University Press, 2017), doi:10.1017/9781108290159.
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the
relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
For Josée, without whom this book would not have seen
the light of day (E.D.K.)
To Z, for the diamond and for the gold (G.CS.)
Preface
Networks and network analysis are arguably one of the largest growth areas of the
early twenty-first century in the quantitative sciences. Despite roots in social network
analysis going back to the 1930s, and roots in graph theory going back
centuries, the phenomenal rise and popularity of the modern field of ‘network
science’, as it is sometimes called, is something that could not have been predicted
20 years ago. Networks have permeated everyday life, far beyond the realm of
research and methodology, through now-familiar realities such as the Internet,
social networks, viral marketing, and more.
Measurement and data analysis are integral components of network research. As
a result, there is a critical need for all sorts of statistics for network analysis, both
common and sophisticated, ranging from applications, to methodology, to theory.
As with other areas of statistics, there are both descriptive and inferential statistical
techniques available, aimed at addressing a host of network-related tasks, including
basic visualization and characterization of network structure; sampling, modeling,
and inference of network topology; and modeling and prediction of network-
indexed processes, both static and dynamic.
Software for performing most such network-related analyses is now available in
various languages and environments, across different platforms. Not surprisingly,
the R community has been particularly active in the development of software for
doing statistical analysis of network data. As of this writing there are already dozens
of contributed R packages devoted to some aspect of network analysis. Together,
these packages address tasks ranging from standard manipulation, visualization,
and characterization of network data (e.g., igraph, network, and sna), to modeling
of networks (e.g., igraph, eigenmodel, ergm, and blockmodels), to network
topology inference (e.g., glasso and huge). In addition, there is a great deal of
analysis that can be done using tools and functions from the R base package.
In this book we aim to provide an easily accessible introduction to the statistical
analysis of network data, by way of the R programming language. As a result, this
book is not, on the one hand, a detailed manual for using the various R packages
encountered herein, nor, on the other hand, does it provide exhaustive coverage
of the conceptual and technical foundations of the topic area. Rather, we have
attempted to strike a balance between the two and, in addition, to do so using a
(hopefully!) optimal level of brevity. Accordingly, we envision the book being
used, for example, by (i) statisticians looking to begin engaging in the statistical
analysis of network data, whether at a research level or in conjunction with a new
collaboration, and hoping to use R as a natural segue, (ii) researchers from other
similarly quantitative fields (e.g., computer science, statistical physics, and eco-
nomics) working in the area of complex networks, who seek to get up to speed
relatively quickly on how to do statistical analyses (both familiar and unfamiliar) of
network data in R, and (iii) practitioners in applied areas wishing to get a foothold
on how to do a specific type of analysis relevant to a particular application of
interest.
More generally, the book has been written at a level aimed at graduate students
and researchers in quantitative disciplines engaged in the statistical analysis of
network data, although advanced undergraduates already comfortable with R
should find much of the book fairly accessible as well. Therefore, we anticipate the
book being of interest to readers in statistics, of course, but also in areas such as
computational biology, computer science and machine learning, economics, neu-
roscience, quantitative finance, signal processing, statistical physics, and the
quantitative social sciences.
For the second edition of this book, there are three significant changes. First,
following a package-wide overhaul of the nomenclature used in igraph a few years
ago, all of the many calls to igraph functions throughout this book have been
updated accordingly. Second, we have added a new chapter on the topic of net-
worked experiments, an area in which there has been an explosion of recent
activity, with relevance from the health sciences to politics to marketing. Lastly,
mirroring the substantial amount of research and development on the topic of
stochastic block models over the past 5 years, we have updated our treatment to
incorporate the blockmodels package.
There are a number of people we wish to thank, whose help at various stages of
development and writing is greatly appreciated. Thanks again go to the editorial
team at Springer for their enthusiasm in encouraging us to take on this project
originally and to pursue the current revision. Thanks go as well to the various
students in the course Statistical Analysis of Network Data (MA703) at Boston
University in the Fall semesters of 2013, 2015, and 2019 for their comments and
feedback. Special thanks for this edition are due to Will Dean and Jiawei Li, who
spent the better part of a summer going through every code line in the book for
functionality, nomenclature, and such. We are again grateful as well to Christophe
Ambroise, Alain Barrat, Mark Coates, Suchi Gopal, Emmanuel Lazega, and Petra
Staufer for kindly making available their data.
More broadly, we would like to express our appreciation in general for the
countless hours of effort invested by the developers of the many R packages that we
have made use of throughout the pages of this book. Without their work, the
breadth and scope of our own here would be significantly reduced. And we would
like to thank the many people who have made use of the first edition of the book
and sent feedback, whether in the form of enthusiastic comments or flags for new
software glitches that arose. Finally, yet again, we wish to express our deepest
gratitude to our respective families for their love, patience, and support throughout
the revision of this book.
All code and data used in this book have been made available in the R package
sand, distributed through the CRAN archive.
Boston, MA, USA Eric D. Kolaczyk
London, England Gábor Csárdi
December 2019
Contents
1 Introduction
1.1 Why Networks?
1.2 Types of Network Analysis
1.2.1 Visualizing and Characterizing Networks
1.2.2 Network Modeling and Inference
1.2.3 Network Processes
1.3 Why Use R for Network Analysis?
1.4 About This Book
1.5 About the R Code
References
2 Manipulating Network Data
2.1 Introduction
2.2 Creating Network Graphs
2.2.1 Undirected and Directed Graphs
2.2.2 Representations for Graphs
2.2.3 Operations on Graphs
2.3 Decorating Network Graphs
2.3.1 Vertex, Edge, and Graph Attributes
2.3.2 Using Data Frames
2.4 Talking About Graphs
2.4.1 Basic Graph Concepts
2.4.2 Special Types of Graphs
2.5 Additional Reading
References
3 Visualizing Network Data
3.1 Introduction
3.2 Elements of Graph Visualization
3.3 Graph Layouts