Springer Theses
Recognizing Outstanding Ph.D. Research

Vincent Traag

Algorithms and Dynamical Models for Communities and Reputation in Social Networks

For further volumes: http://www.springer.com/series/8790

Aims and Scope

The series “Springer Theses” brings together a selection of the very best Ph.D. theses from around the world and across the physical sciences. Nominated and endorsed by two recognized specialists, each published volume has been selected for its scientific excellence and the high impact of its contents for the pertinent field of research. For greater accessibility to non-specialists, the published versions include an extended introduction, as well as a foreword by the student’s supervisor explaining the special relevance of the work for the field. As a whole, the series will provide a valuable resource both for newcomers to the research fields described, and for other scientists seeking detailed background information on special questions. Finally, it provides an accredited documentation of the valuable contributions made by today’s younger generation of scientists.

Theses are accepted into the series by invited nomination only and must fulfill all of the following criteria:

• They must be written in good English.
• The topic should fall within the confines of Chemistry, Physics, Earth Sciences, Engineering and related interdisciplinary fields such as Materials, Nanoscience, Chemical Engineering, Complex Systems and Biophysics.
• The work reported in the thesis must represent a significant scientific advance.
• If the thesis includes previously published material, permission to reproduce this must be gained from the respective copyright holder.
• They must have been examined and passed during the 12 months prior to nomination.
• Each thesis should include a foreword by the supervisor outlining the significance of its content.
• The theses should have a clearly defined structure including an introduction accessible to scientists not expert in that particular field.

Doctoral Thesis accepted by the Catholic University of Louvain, Belgium

Author
Dr. Vincent Traag
KITLV
Leiden
The Netherlands

Supervisors
Prof. Paul Van Dooren
Department of Mathematical Engineering, ICTEAM
Université catholique de Louvain
Louvain-la-Neuve
Belgium

Prof. Yurii Nesterov
Center for Operations Research and Econometrics (CORE)
Université catholique de Louvain
Louvain-la-Neuve
Belgium

ISSN 2190-5053
ISSN 2190-5061 (electronic)
ISBN 978-3-319-06390-4
ISBN 978-3-319-06391-1 (eBook)
DOI 10.1007/978-3-319-06391-1
Springer Cham Heidelberg New York Dordrecht London

Library of Congress Control Number: 2014939940

© Springer International Publishing Switzerland 2014

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Supervisors’ Foreword

We are living in a world where the amount of data that is collected and stored is just staggering. Moreover, the information and communication technology required to have access to these data has become quite affordable, so that everybody who wishes can have access to it, as far as it is in the public domain. This has had a tremendous impact not only in science and technology but also in commerce and recreation, where having access to the right bit of information is crucial. An obvious example of such a source of information is the “internet,” by which we mean the World Wide Web and search engines such as Google. But social networks have started to play a big role as well in getting access to data. Networks such as Facebook, LinkedIn, and Twitter have attracted billions of users in a very short time. These networks allow friends or colleagues to connect to each other and retrieve or distribute information that would be hard to find otherwise. But the networks themselves can also be viewed as data that can be analyzed to extract valuable information about the “nodes” of the network, which can be people, but also objects, pictures, texts, and so on.

The structure of such networks plays an important role in the type of information one can extract from them. One prominent feature of many social networks is the clustering of nodes (people in this case).
Friends tend to have many friends in common, thereby creating social groups in which many people know each other (and often have the same taste, behavior or habits). Knowing these social groups yields additional insight into the structure of these networks and can be used for commercial purposes by companies or by providers of certain services. To find these groups, the idea is to look for densely connected subgraphs in the network, which are only loosely connected among each other. These are commonly known as “communities” and the field that deals with finding such communities is known as “community detection.” Several more mathematical criteria have been proposed to characterize these groups more precisely, such as the popular method called “modularity,” introduced by Newman and Girvan. In this book, the author analyzes in depth the problem of community detection and proposes an alternative method, called the Constant Potts Model, and explains that its major advantage is that it has no resolution limit and hence can also detect relatively small communities in large networks. Although the proposed solution does not suffer from the resolution limit, there are still some questions related to scale. The author then introduces the concept of “significance,” which helps to decide whether a partition should be rather coarse or rather fine. Both these developments are important contributions of his work.

Although most methods for community detection focus on networks that have positive links, negative links also appear naturally and may represent animosity or distrust. Incorporating these negative links can be done in a relatively natural manner by insisting on as few negative links as possible within a community. This is illustrated here using a network of international relations and a citation network.
The structure of negative links has been studied by the social sciences before in the context of “social balance” and is based on the adage that “the enemy of an enemy is a friend.” The main observation in that literature was that socially balanced networks can be split into at most two factions, where each faction has only positive links within and negative links between the factions. Besides the important question of detecting such factions in networks, the author also analyzes how social balance may emerge and why it is observed so often. This is done using a new dynamical model that explains the emergence of social balance. In addition, there is a natural connection between negative links and the problem of the evolution of cooperation that one finds in the area of dynamical games. The author uses ideas borrowed from this literature to explain that social balance can lead to cooperation. Finally, the author also looks at how to determine who will cooperate with whom. This is especially pertinent in online markets such as eBay or Amazon, where one wants to make sure one can trust one’s “friends.” The author shows how to use the network consisting of local links (which are positive for “trust” and negative for “distrust”) to calculate a global trust value, which is the “reputation” of the corresponding node.

This book builds a bridge between two distinct areas: (i) community detection in large sparse graphs and (ii) social balance and evolution of cooperation. The author covers quite a wide range of topics in it, since the two distinct areas require different backgrounds. The synthesis of the state of the art in these areas is well balanced and all the important concepts are well described. The book makes important novel contributions in a very competitive area of research.

Louvain-la-Neuve, April 2014
Prof. Paul Van Dooren
Prof. Yurii Nesterov

Preface

The first presentation ever of my research was on Friday the 13th of February 2009 (how scary is that) and was in front of mathematicians in Louvain-la-Neuve (how scary is that).
Having only a Master’s in Sociology in my pocket, I arrived there to apply for a position as a Ph.D. candidate (although, if memory serves me well, that was not entirely clear for everyone). Of course, I was no complete stranger to mathematics, yet not having studied it and still wanting to pursue a Ph.D. in that direction did not quite seem to add up. Fortunately, my advisors Paul Van Dooren and Yurii Nesterov were happy to take me on board. I am grateful to this day that they did so. The leeway they allowed me to pursue my own interest is much appreciated. I have learned a lot from them, and both are impressively (if not intimidatingly) fast when doing mathematics. I was fortunate enough to be funded by the Actions de recherche concertées, Large Graphs and Networks of the Communauté Française de Belgique and the Belgian Network Dynamical Systems, Control, and Optimization (DYSCO), funded by the Interuniversity Attraction Poles Programme, initiated by the Belgian State, Science Policy Office.

My fellow Ph.D. students have also taught me a lot. Not having had the exact same training as most other Ph.D. candidates, I could borrow their expertise in trying to understand something. For some courses I was the designated teaching assistant without ever having taken the course myself, making it somewhat of a challenge. For example, I had to learn integer programming. Before being able to learn integer programming, I had to learn linear programming, which also involved doing the simplex algorithm. If I say I will never forget that, it is probably true, but I would like to never make another simplex tableau again. Around the time I started, there were a few other students coming in from the private sector: Pierre, François-Xavier, and Arnaud, which reassured me that I was not the only one who had tried the private sector and returned to academia. Throughout the years, Arnaud and I collaborated on various projects; I have enjoyed our cooperation very much.
Similarly for Pierre Deville and Adeline Decuyper: it was a pleasure working with you, and good luck organizing NetMob next time around, for which Vincent Blondel was kind enough to invite us last year. Finally, I would like to thank everybody else in the Euler building (too many people to list) for the great atmosphere during coffee breaks and lunchtime. I have enjoyed the conversations in the cafeteria very much, although for the most part I have only listened instead of actually engaging in the discussions.

I would like to thank the other members of the jury, François Glineur, Vincent Blondel, Marco Saerens, and Patrick De Leenheer. Their comments and remarks have greatly improved this thesis. I have had the pleasure to collaborate with Patrick while he was in Belgium in 2012. His help was essential to the progress on the social balance project, for which I am much obliged.

Many friends and family have come to visit in Brussels, and it was always a pleasure having you. Bas, Hans-Hein, and Mathijs, you have always had that fingerspitzengefühl for coming to Brussels. Merijn, despite your busy job, two kids, moving two times, and an entire renovation, you still managed to come to Brussels: so good you could make it. Roel, our discussions on the balcony of the Rue Lebeau were marvellous, as always; I hope to continue many of them in Amsterdam. Many a Sunday morning was spent at the Vossenplein/Place du Jeu de Balle when my family-in-law came over. Fortunately, due to long breakfasts we never arrived that early; you’re always welcome for such long breakfasts. From Brussels, I have very much enjoyed climbing with you, Tom; I hope to see you still after moving. Frank, our lunches were a pleasant distraction from the daily Ph.D. grind. Many friends go unnamed, but not forgotten: I hope to see you all more often when I am back in Amsterdam. Likewise for my parents, my brother and sister, Ernst and Susan, I hope to see you more often, Marco, Carlijn and Niels included of course. I hold you all very dear.
Mom and dad, you have always supported me, both before and during my Ph.D., and I will always be grateful for your care and love.

Finally, somebody who merits a paragraph of their own. During the first two years of my Ph.D., our time together largely loomed in the shadow of the loss of your mother. Although such a loss will always leave a void, together I believe we have overcome. After having been parted by over 200 km of rail for over 3 years, we finally spent the last year together in Brussels. It was bliss to finally live together, and I hope to continue to enjoy your company for many years to come! Lio, you are my true love.

Contents

1 Introduction
   References

Part I Communities in Networks

2 Community Detection
   2.1 Modularity
   2.2 Canonical Community Detection
      2.2.1 Reichardt and Bornholdt
      2.2.2 Arenas, Fernández and Gómez
      2.2.3 Ronhovde and Nussinov
      2.2.4 Constant Potts Model
      2.2.5 Label Propagation
      2.2.6 Random Walker
      2.2.7 Infomap
      2.2.8 Alternative Clustering Methods
   2.3 Algorithms
      2.3.1 Simulated Annealing
      2.3.2 Greedy Improvement
      2.3.3 Louvain Method
      2.3.4 Eigenvector
   2.4 Benchmarks
      2.4.1 Test Networks
      2.4.2 Comparing Partitions
      2.4.3 Results
   References

3 Scale Invariant Community Detection
   3.1 Issues with Modularity
      3.1.1 Resolution Limit
      3.1.2 Non-locality
      3.1.3 Spuriously High Modularity