
Statistical Analysis Of Network Data With R PDF

235 Pages·2020·5.397 MB·English

Preview Statistical Analysis Of Network Data With R

Use R!

Eric D. Kolaczyk, Gábor Csárdi
Statistical Analysis of Network Data with R
Second Edition

Use R! Series Editors
Robert Gentleman, 23andMe Inc., South San Francisco, USA
Kurt Hornik, Department of Finance, Accounting and Statistics, WU Wirtschaftsuniversität Wien, Vienna, Austria
Giovanni Parmigiani, Dana-Farber Cancer Institute, Boston, USA

Use R! This series of inexpensive and focused books on R will publish shorter books aimed at practitioners. Books can discuss the use of R in a particular subject area (e.g., epidemiology, econometrics, psychometrics) or as it relates to statistical topics (e.g., missing data, longitudinal data). In most cases, books will combine LaTeX and R so that the code for figures and tables can be put on a website. Authors should assume a background as supplied by Dalgaard’s Introductory Statistics with R or other introductory books so that each book does not repeat basic material.

More information about this series at http://www.springer.com/series/6991

Eric D. Kolaczyk, Department of Mathematics and Statistics, Boston University, Boston, MA, USA
Gábor Csárdi, RStudio, Boston, MA, USA

ISSN 2197-5736
ISSN 2197-5744 (electronic)
Use R!
ISBN 978-3-030-44128-9
ISBN 978-3-030-44129-6 (eBook)
https://doi.org/10.1007/978-3-030-44129-6

1st edition: © Springer Science+Business Media New York 2014
2nd edition: © Springer Nature Switzerland AG 2020

Chapter 11 is adapted in part with permission from Chapter 4 of Eric D. Kolaczyk, Topics at the Frontier of Statistics and Network Analysis: (Re)Visiting the Foundations, SemStat Elements (Cambridge: Cambridge University Press, 2017), doi:10.1017/9781108290159.
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

For Josée, without whom this book would not have seen the light of day. (E.D.K.)
To Z, for the diamond and the gold. (G.CS.)

Preface

Networks and network analysis are arguably one of the largest growth areas of the early twenty-first century in the quantitative sciences. Despite roots in social network analysis going back to the 1930s, and roots in graph theory going back centuries, the phenomenal rise and popularity of the modern field of ‘network science’, as it is sometimes called, is something that could not have been predicted 20 years ago. Networks have permeated everyday life, far beyond the realm of research and methodology, through now-familiar realities such as the Internet, social networks, viral marketing, and more.
Measurement and data analysis are integral components of network research. As a result, there is a critical need for all sorts of statistics for network analysis, both common and sophisticated, ranging from applications, to methodology, to theory. As with other areas of statistics, there are both descriptive and inferential statistical techniques available, aimed at addressing a host of network-related tasks, including basic visualization and characterization of network structure; sampling, modeling, and inference of network topology; and modeling and prediction of network-indexed processes, both static and dynamic.

Software for performing most such network-related analyses is now available in various languages and environments, across different platforms. Not surprisingly, the R community has been particularly active in the development of software for doing statistical analysis of network data. As of this writing there are already dozens of contributed R packages devoted to some aspect of network analysis. Together, these packages address tasks ranging from standard manipulation, visualization, and characterization of network data (e.g., igraph, network, and sna), to modeling of networks (e.g., igraph, eigenmodel, ergm, and blockmodels), to network topology inference (e.g., glasso and huge). In addition, there is a great deal of analysis that can be done using tools and functions from the R base package.

In this book we aim to provide an easily accessible introduction to the statistical analysis of network data, by way of the R programming language. As a result, this book is not, on the one hand, a detailed manual for using the various R packages encountered herein, nor, on the other hand, does it provide exhaustive coverage of the conceptual and technical foundations of the topic area. Rather, we have attempted to strike a balance between the two and, in addition, to do so using a (hopefully!) optimal level of brevity.
Accordingly, we envision the book being used, for example, by (i) statisticians looking to begin engaging in the statistical analysis of network data, whether at a research level or in conjunction with a new collaboration, and hoping to use R as a natural segue, (ii) researchers from other similarly quantitative fields (e.g., computer science, statistical physics, and economics) working in the area of complex networks, who seek to get up to speed relatively quickly on how to do statistical analyses (both familiar and unfamiliar) of network data in R, and (iii) practitioners in applied areas wishing to get a foothold on how to do a specific type of analysis relevant to a particular application of interest.

More generally, the book has been written at a level aimed at graduate students and researchers in quantitative disciplines engaged in the statistical analysis of network data, although advanced undergraduates already comfortable with R should find much of the book fairly accessible as well. Therefore, we anticipate the book being of interest to readers in statistics, of course, but also in areas such as computational biology, computer science and machine learning, economics, neuroscience, quantitative finance, signal processing, statistical physics, and the quantitative social sciences.

For the second edition of this book, there are three significant changes. First, following a package-wide overhaul of the nomenclature used in igraph a few years ago, all of the many calls to igraph functions throughout this book have been updated accordingly. Second, we have added a new chapter on the topic of networked experiments, an area in which there has been an explosion of recent activity, with relevance from the health sciences to politics to marketing. Lastly, mirroring the substantial amount of research and development on the topic of stochastic block models over the past 5 years, we have updated our treatment to incorporate the blockmodels package.
There are a number of people we wish to thank, whose help at various stages of development and writing is greatly appreciated. Thanks again go to the editorial team at Springer for their enthusiasm in encouraging us to take on this project originally and to pursue the current revision. Thanks go as well to the various students in the course Statistical Analysis of Network Data (MA703) at Boston University in the Fall semesters of 2013, 2015, and 2019 for their comments and feedback. Special thanks for this edition are due to Will Dean and Jiawei Li, who spent the better part of a summer going through every code line in the book for functionality, nomenclature, and such. We are again grateful as well to Christophe Ambroise, Alain Barrat, Mark Coates, Suchi Gopal, Emmanuel Lazega, and Petra Staufer for kindly making available their data.

More broadly, we would like to express our appreciation in general for the countless hours of effort invested by the developers of the many R packages that we have made use of throughout the pages of this book. Without their work, the breadth and scope of our own here would be significantly reduced. And we would like to thank the many people who have made use of the first edition of the book and sent feedback, whether in the form of enthusiastic comments or flags for new software glitches that arose. Finally, yet again, we wish to express our deepest gratitude to our respective families for their love, patience, and support throughout the revision of this book.

All code and data used in this book have been made available in the R package sand, distributed through the CRAN archive.

Boston, MA, USA: Eric D. Kolaczyk
London, England: Gábor Csárdi
December 2019

Contents

1 Introduction
  1.1 Why Networks?
  1.2 Types of Network Analysis
    1.2.1 Visualizing and Characterizing Networks
    1.2.2 Network Modeling and Inference
    1.2.3 Network Processes
  1.3 Why Use R for Network Analysis?
  1.4 About This Book
  1.5 About the R Code
  References
2 Manipulating Network Data
  2.1 Introduction
  2.2 Creating Network Graphs
    2.2.1 Undirected and Directed Graphs
    2.2.2 Representations for Graphs
    2.2.3 Operations on Graphs
  2.3 Decorating Network Graphs
    2.3.1 Vertex, Edge, and Graph Attributes
    2.3.2 Using Data Frames
  2.4 Talking About Graphs
    2.4.1 Basic Graph Concepts
    2.4.2 Special Types of Graphs
  2.5 Additional Reading
  References
3 Visualizing Network Data
  3.1 Introduction
  3.2 Elements of Graph Visualization
  3.3 Graph Layouts
