Archipelago: A Network Security Analysis Tool

Tuva Stang*, Fahimeh Pourbayat*, Mark Burgess*, Geoffrey Canright†, Kenth Engø† and Åsmund Weltzien†

* Oslo University College, Norway
† Telenor Research, Oslo, Norway

July 26, 2003

Abstract

Archipelago is a system analysis and visualization tool which implements several methods of automated resource and security analysis for human-computer networks; this includes physical networks, social networks, knowledge networks and networks of clues in a forensic analysis. Access control, intrusions and social engineering can be discussed in the framework of graphical and information theoretical relationships. Groups of users and shared objects, such as files or conversations, provide communications channels for the spread of both authorized and unauthorized information. We present a Java-based analysis tool that evaluates numerical criteria for the security of such systems and implements algorithms for finding the vulnerable points.

1 Introduction

Network security is a subject that can be discussed from many viewpoints. Many discussions focus entirely upon the technologies that protect individual system transactions, e.g. authentication methods, ciphers and tunnels. Less attention has been given to the matter of security management, where a general theoretical framework has been lacking.

In this work, we explore two theoretical methods to estimate systemic security, as opposed to system component security. We describe a tool (Archipelago) for scanning systems, calculating and visualizing the data and testing the results.

Our paper starts with the assumption that security is a property of an entire system [1] and that covert channels, such as social chatter and personal meetings, are often viable ways to work around so-called strong security mechanisms. File access security is a generic representation of communication flow around a system, and we use it as a way of discussing several other problems. Other issues like social engineering have previously been notoriously difficult to address in quantitative terms, but fit easily into our discussion. We have made some progress in this area by applying graph theoretical techniques to the analysis of systems [2]. In this paper we implement a tool for using these techniques and demonstrate its use in a number of examples.

The paper begins with a brief discussion of the graph-theoretical model of security, and how it is used to represent associations that lead to the possible communication of data. Next we consider how complex graphs can be easily represented in a simplified visual form. The purpose of this is to shed light on the logical structure of the graph, rather than its raw topological structure. We describe a method of eigenvector centrality for ranking nodes according to their importance, and explain how this can be used to organize the graph into a logical structure. Finally, we discuss the problem of how easily information can flow through a system and find criteria for total penetration of information.

2 Graphs

A graph is a set of nodes joined together by edges or arcs. Graph theoretical methods have long been used to discuss issues in computer security [3, 4], typically trust relationships and restricted information flows (privacy). To our knowledge, no one has considered graphical methods as a practical tool for performing a partially automated analysis of real computer system security. Computer systems can form relatively large graphs. The Internet is perhaps the largest graph that has ever been studied, and much research has been directed at analyzing the flow of information through it. Research shows that the Internet [5] and the Web [6] (the latter viewed as a directed graph) each have a power-law degree distribution. Such a distribution is characteristic [7, 8, 9] of a self-organized network, such as a social network, rather than a purely technological one. Increasingly we see technology being deployed in a pattern that mimics social networks, as humans bind together different technologies, such as the Internet, the telephone system and verbal communication.

Social networks have many interesting features, but a special feature is that they do not always have a well defined centre, or point of origin; this makes them highly robust to failure, but also extremely transparent to attack [10]. A question of particular interest to a computer security analyst, or even a system administrator deploying resources, is: can we identify likely points of attack in a general network of associations, and use this information to build analytical tools for securing human-computer systems?

3 Associations

Users relate themselves to one another by file sharing, peer groups, friends, message exchange, etc. Every such connection represents a potential information flow. An analysis of these can be useful in several instances:

- For finding the weakest points of a security infrastructure for preventative measures.
- In forensic analysis of breaches, to trace the impact of radiated damage at a particular point, or to trace back to the possible source.

Communication takes place over many channels, some of which are controlled and others that are covert. A covert channel is a pathway for information that is not subject to security controls.

Our basic model is of a number of users, related by associations that are mediated by human-computer resources. The graphs we discuss in this paper normally represent a single organization or computer system. We do not draw any nodes for outsiders; rather we shall view outsiders as a kind of reservoir of potential danger in which our organization is immersed.

In the simplest case, we can imagine that users have access to a number of files. Overlapping access to files allows information to be passed from user to user: this is a channel for information flow. For example, consider a set of files shared by a set of users (fig. 1).

Figure 1: Users (dark spots) are associated with one another through resources (light spots) that they share access to. Each light spot contains $f_i$ files or sub-channels and defines a group, through its association with $u_i$ links to users. In computer parlance, they form 'groups'.

Here we see two kinds of object (a bipartite graph), connected by links that represent associations. A bipartite form is useful for theoretical discussions, but in a graphical tool it leads to too much 'mess' on screen. Bipartite graphs have been examined before to provide a framework for discussing security [11]. We can eliminate the non-user nodes by simply colouring the links to distinguish them, or keeping their character solely for look-up in a database.

Any channel that binds users together is a potential covert security breach. Since we are estimating the probability of intrusion, all of these must be considered. For example, a file, or set of files, connected to several users clearly forms a system group, in computer parlance. In graph-theory parlance the group is simply a complete subgraph or clique. In reality, there are many levels of association between users that could act as channels for communication:

- Group work association (access).
- Friends, family or other social association.
- Physical location of users.

In a recent security incident at a university in Norway, a cracker gained complete access to systems because all hosts had a common root password. This is another common factor that binds 'users' at the host level, forming a graph that looks like a giant central hub. In a post factum forensic investigation, all of these possible routes of association between possible perpetrators of a crime are potentially important clues linking people together. Even in an a priori analysis such generalized networks might be used to address the likely targets of social engineering.

Each user naturally has a number of file objects that are private. These are represented by a single line from each user to a single object. Since all users have these, they can be taken for granted and removed from the diagram in order to emphasize the role of more special hubs (see fig. 2).

Figure 2: An example of the simplest level at which a graph may be reduced to a skeleton form and how hot-spots are identified. This is essentially a histogram, or 'height above sea-level' for the contour picture.

The resulting contour graph, formed by the Venn diagrams, is the first indication of potential hot-spots in the local graph topology. Later we can replace this with a better measure: the 'centrality' or 'well-connectedness' of each node in the graph.

4 Visualizing graphs in shorthand

The complexity of the basic bipartite graph and the insight so easily revealed from the Venn diagrams beg the question: is there a simpler representation of the graphs that summarizes their structure and which highlights their most important information channels? An important clue is provided by the Venn diagrams; these reveal a convenient level of detail in simple cases.

Let us define a simplification procedure based on this.

Definition 1 (Trivial group) An ellipse that encircles only a single user node is a trivial group. It contains only one user.

Definition 2 (Elementary group) For each file node $i$, obtain the maximal group of users connected to the node and encircle these with a suitable ellipse (as in fig. 2). An ellipse that contains only trivial groups, as subgroups, is an elementary group.

Our aim in simplifying a graph is to organize the graph using the low resolution picture generated by a simplification rule.

Definition 3 (Simplification rule) For each file node $i$, obtain the maximal group of users connected to the node and encircle these with a suitable ellipse or other envelope (as in fig. 2). Draw a super-node for each group, labelled by the total degree of the group (the number of users within it). For each overlapping ellipse, draw an unbroken line between the groups that are connected by an overlap. These are cases where one or more users belong to more than one group, i.e. there is a direct association. For each ellipse that encapsulates more than one elementary group, draw a dashed line.

As a further example, we can take the graph used in ref. [12]. Fig. 3 shows the graph from that reference. Fig. 4 shows the same graph after eliminating the intermediate nodes. Finally, figs. 5 and 6 show this graph in our notation (respectively, the Venn diagram and the elementary-group shorthand).

Figure 3: The example bipartite graph from ref. [12] serves as an example of the shorthand procedure.

Figure 4: A 'one-mode' projection of the graph in fig. 3, as given by ref. [12], is formed by the elimination of the intermediary nodes. Note that bipartite cliques in the original appear here also as cliques.

Figure 5: The Venn diagram for the graph in fig. 3 shows simple and direct associations that resemble the one-mode projection, without details.

Figure 6: The final compressed form of the graph in fig. 3 eliminates all detail but retains the security pertinent facts about the graph.

The shorthand graphs (as in fig. 6) may be useful in allowing one to see more easily when a big group or a small group is likely to be infected by bad information. They also identify the logical structure of the nodes clearly. However, this procedure is complex and work intensive in any large graph. We therefore introduce a more general and powerful method that can be used to perform the same organization. This method identifies coarse logical regions in a graph by identifying nodes that are close to particularly central or important nodes and then finding those nodes that connect them together.

5 Node centrality and the spread of information

In this section, we consider the connected components of networks and propose criteria for deciding which nodes are most likely to infect many other nodes, if they are compromised. We do this by examining the relative connectivity of graphs along multiple pathways.

Definition 4 (Degree of a node) In a non-directed graph, the number of links connecting node $i$ to all other nodes is called the degree $k_i$ of the node.

What are the best connected nodes in a graph? These are certainly nodes that an attacker would like to identify, since they would lead to the greatest possible access, or spread of damage. Similarly, the security auditor would like to identify them and secure them, as far as possible. From the standpoint of security, then, important nodes in a network (files, users, or groups in the shorthand graph) are those that are 'well-connected'. Therefore we seek a precise working definition of 'well-connected', in order to use the idea as a tool for pinpointing nodes of high security risk.

A simple starting definition of well-connected could be 'of high degree': that is, count the neighbours. We want however to embellish this simple definition in a way that looks beyond just nearest neighbours. To do this, we borrow an old idea from both common folklore and social network theory [13]: an important person is not just well endowed with connections, but is well endowed with connections to important persons.

The motivation for this definition is clear from the example in figure 7. It is clear from this figure that a definition of 'well-connected' that is relevant to the diffusion of information (harmful or otherwise) must look beyond first neighbours. In fact, we believe that the circular definition given above (important nodes have many important neighbours) is the best starting point for research on damage diffusion on networks.

Figure 7: Nodes A and B are both connected by five links to the rest of the graph, but node B is clearly more important to security because its neighbours are also well connected.

Now we make this circular definition precise. Let $v_i$ denote a vector for the importance ranking, or connectedness, of each node $i$. Then, the importance of node $i$ is proportional to the sum of the importances of all of $i$'s nearest neighbours:

\[ v_i \propto \sum_{j\,=\,\text{neighbours of } i} v_j . \qquad (1) \]

This may be written as

\[ v_i \propto \sum_j A_{ij} v_j , \qquad (2) \]

where $A$ is the adjacency matrix, whose entries $A_{ij}$ are 1 if $i$ is a neighbour of $j$, and 0 otherwise. Notice that this self-consistent equation is scale invariant; we can multiply $\vec v$ by any constant and the equation remains the same. We can thus rewrite eqn. (2) as

\[ A \vec v = \lambda \vec v , \qquad (3) \]

and, if non-negative solutions exist, they solve the self-consistent sum; i.e. the importance vector is hence an eigenvector of the adjacency matrix $A$. If $A$ is an $N \times N$ matrix, it has $N$ eigenvectors (one for each node in the network), and correspondingly many eigenvalues. The eigenvector of interest is the principal eigenvector, i.e. that with highest eigenvalue, since this is the only one that results from summing all of the possible pathways with a positive sign. The components of the principal eigenvector rank how 'central' a node is in the graph. Note that only ratios $v_i / v_j$ of the components are meaningfully determined. This is because the lengths $v_i v_i$ of the eigenvectors are not determined by the eigenvector equation.

This form of well-connectedness is termed 'eigenvector centrality' [13] in the field of social network analysis, where several other definitions of centrality exist. For the remainder of the paper, we use the terms 'centrality' and 'eigenvector centrality' interchangeably.

We believe that nodes with high eigenvector centrality play an important role in the diffusion of information in a network. However, we know of few studies (see ref. [14]) which test this idea quantitatively. We have proposed this measure of centrality as a diagnostic instrument for identifying the best connected nodes in networks of users and files [2, 15].

When a node has high eigenvector centrality (EVC), it and its neighbourhood have high connectivity. Thus in an important sense EVC scores represent neighbourhoods as much as individual nodes. We then want to use these scores to define clusterings of nodes, with as little arbitrariness as possible. (Note that these clusterings are not the same as user groups, although such groups are unlikely to be split up by our clustering approach.)

To do this, we define as Centres those nodes whose EVC is higher than any of their neighbours' scores (local maxima). Clearly these Centres are important in the flow of information on the network. We also associate a Region (subset of nodes) with each Centre. These Regions are the clusters that we seek. We find that more than one rule may be reasonably defined to assign nodes to Regions; the results differ in detail, but not qualitatively. One simple rule is to use distance (in hops) as the criterion: a node belongs to a given Centre (i.e. to its Region) if it is closest (in number of hops) to that Centre. With this rule, some nodes will belong to multiple Regions, as they are equidistant from two or more Centres. This set of nodes defines the Border set.

The picture we get then is of one or several regions of the graph which are well-connected clusters, as signalled by their including a local maximum of the EVC. The Border then defines the boundaries between these regions. This procedure thus offers a way of coarse-graining a large graph. This procedure is distinct from that used to obtain the shorthand graph; the two types of coarse-graining may be used separately, or in combination.

6 Centrality examples

To illustrate this idea, consider a human-computer system for Internet commerce depicted in fig. 8. This graph is a mixture of human and computer elements: departments and servers. We represent the outside world by a single outgoing or incoming link (node 5).

Figure 8: Unstructured graph of a human-computer system: an organization that deals with Internet orders and dispatches goods by post.

The organization consists of a web server connected to a sales database, that collects orders which are then passed on to the order registration department. These collect money and pass on the orders to order processing, who collect the orders and send them to dispatch for postal delivery to the customers. A marketing department is linked to the web server through the system administrator, and management sits on the edge of the company, liaising with various staff members who run the departments.

Let us find the central resource sinks in this organization, first assuming that all of the arcs are equally weighted, i.e. contribute about the same amount to the average flow through the organization. We construct the adjacency matrix, compute its principal eigenvector and organize the nodes into regions, as described above. The result is shown in fig. 9.

Node 7 is clearly the most central. This is the system administrator. This is perhaps a surprising result for an organization, but it is a common situation where many parts of an organization rely on basic support services to function, but at an unconscious level. This immediately suggests that system administration services are important to the organization and that resources should be given to this basic service. Node 6 is the next highest ranking node; this is the order registration department. Again, this is not particularly obvious from the diagram alone: it does not seem to be any more important than order processing. However, with hindsight, we can see that its importance arises because it has to liaise closely with all other departments.

Using the definitions of regions and bridges from section 5, we can redraw the graph using centrality to organize it.
The 6 OUTSIDE WORLD 5 11 4 2 10 8 14 1 6 MANAGER 7 12 9 3 13 MARKETING ORDER PROCESSING Figure9: Acentrality-organizedgraphshowingthestructureofthegraphcentredaroundtwolocal maximaor‘mostimportant’nodes,thataretheorderregistrationdepartmentandthesystemadmin- istrator.Therearealso4bridgenodesandabridginglinkbetweentheregions. result is shown in fig. 9. The structure aremanynodes(over550!)intheBorder. revealed by graph centrality accurately re- Infigure10wepresentavisualizationof flects the structure of the organization: it this graph, using Centres, Regions, and the is composed largely of two separate en- Borderasawayoforganizingtheplacement terprises: marketing and order processing. ofthenodesusingourArchipelagotool[17]. These departments are bound together by Boththefigure andthenumericalresults certain bridges that include management supportourdescriptionofthisgraphaswell- and staff that liaise with the departments. connected: it has only a small number of Surprisingly,systemadministrationservices Regions, and there are many connections fall atthe centreof the staff/marketingpart (bothBordernodes, andlinks)betweenthe of the organization. Again, this occurs be- Regions. We find these qualitative conclu- cause it is a critical dependency of this re- sionsto hold for other Gnutellagraphsthat gionofthesystem.Finallythewebserveris we have examined. Our criteria for a well- a bridge that connects both departments to connectedgraphareconsonantwithanother theoutsideworld—theoutsidehangingon one,namely,thatthegraphhasapower-law attheperipheryofthesystems. node degree distribution [10]. Power-law To illustrate the ideas further we present graphs are known to be well-connected in data from a large graph, namely, the the sense that they remain connected even Gnutella peer-to-peer file-sharing network, after the random removal of a significant viewed in a snapshot taken November 13, fractionofthenodes. Andinfactthe(self- 2001 [16]. 
In this snapshot the graph organized)Gnutellagraphshowninfig. 10 hastwodisconnectedpieces—onewith992 hasapower-lawnodedegreedistribution. nodes, and one with three nodes. Hence We believe that poorly-connected (but forallpracticalpurposeswe canignorethe still percolating) graphs will be revealed, smallpiece,andanalyzethelargeone. Here by our clustering approach, to have rel- wefindthattheGnutellagraphisverywell- atively many Centres and hence Regions, connected. There are only two Centres, withrelativelyfewnodesandlinksconnect- hence only two natural clusters. These re- ing these Regions. Thus, we believe that gions are roughly the same size (about 200 thecalculationofeigenvectorcentrality,fol- nodeseach). Thismeans,inturn,thatthere lowedby thesimple clusteringanalysis de- 7 Figure 10: Atoplevel, simplifiedrepresentationofGnutellapeertopeerassociations, organized aroundthelargestcentralitymaxima.Thegraphsconsistsoftwofragments,onewith992nodesand oneofmerely3nodesandorganizesthegraohintoRegions. Theupperconnectedfragmentshows tworegionsconnectedbyaringofbridgenodes. scribed here (and in more detail in [15]), Agraphissaidtopercolateifeverynode can give highly useful information about can reach every other by some route. This how well connected a graph is, which re- transitionpointissomewhatartificialforuse gions naturally lie together (and hence al- as a management criterion, since links are lowrapidspreadofdamaginginformation), constantly being made and broken, partic- andwherearetheboundariesbetweensuch ularlyin amobile partially-connectedenvi- easily-infectedregions. Allofthisinforma- ronmentofmodernnetworks.Ratherweare tion should be of utility in analyzinga net- interested in average properties and proba- workfromthepointofviewofsecurity. bilities. Oneofthesimplesttypesofgraphis the hierarchaltree. 
Hierarchical graphs are not 7 Percolation: the spread a good model of user-file associations, but of information in the they are representative of many organiza- tional structures. A very regular hierarchi- graph cal graph in which each node has the same degree (number of neighbours) is known How many links or channels can one add as the Cayley tree. Studies of percolation to a graph, at random, before the system phase transitions in the Cayley model can becomes essentially free of barriers? This givesomeinsightintothecomputersecurity question is known as the percolation prob- problem: at the ’percolation threshold’ es- lemandthebreakdownofbarriersisknown sentiallyallnodesareconnectedina’giant as the formation of a giant cluster in the cluster’—meaning that damage can spread graph. from one node to all others. For link den- 8 sity(probability)belowthisthresholdvalue, completely accurate only in the continuum such widespread damage spreading cannot limitofverylargenumberofnodes . ’ occur. We shall not reproduce here the argu- For small, fixed graphs there is often no ment of ref. [12] to derive the condition probleminexploringthewholegraphstruc- fortheprobableexistenceofagiantcluster, ture and obtaining an exact answer to this butsimplyquoteitforauni-partiterandom question. Themostprecisesmall-graphcri- graphwithdegreedistribution . # $ terion for percolation comes from asking Result1 Thelarge-graphconditionforthe howmanypairsofnodes,outofallpossible existence of a giant cluster(of infinite size) pairs,canreachoneanotherinafinitenum- issimply berofhops. 
Wethusdefinetheratio of (cid:0)(cid:2)(cid:1) connectedpairsofnodestothetotalnumber (cid:5) (cid:29) (6) ofpairsthatcouldbeconnected: (cid:0)(cid:4)(cid:13) (cid:0)%(cid:15)’&(cid:12)(cid:19)(#)$+*-, $ (cid:9) (cid:0) (cid:1) # (cid:5) (cid:10)(cid:12)(cid:9) (cid:11) (cid:1)(cid:14)(cid:13) (cid:11) (cid:1)(cid:16)(cid:15)(cid:18)(cid:17)(cid:20)(cid:19) #(cid:25)(cid:17) (cid:29) (4) pTlhieisdptoroavihduemsaans-icmomplpeutteesrtstyhsattemca,ninboerdaper- (cid:7)(cid:4)(cid:3)(cid:6)(cid:5)(cid:22)(cid:12)(cid:25)(cid:8)(cid:7) (cid:11)(cid:14)(cid:23)(cid:26)(cid:25) (cid:10) ’(cid:21)(cid:13) ’(cid:22)(cid:15)(cid:23)(cid:17)(cid:24)(cid:19) (cid:1) to estimate the possibility of complete fail- Thisissimplythecriterionthatthegraphbe urevia percolatingdamage. If we only de- connected. terminethe , thenwe haveanimmediate Ifwewishtosimplifythisruleforeaseof # $ machine-testablecriterionforthepossibility calculation,wecantake ,where (cid:11) (cid:1)(cid:27)(cid:26)(cid:29)(cid:28) (cid:1)(cid:12)(cid:30) (cid:17) ofasystemwidesecuritybreach. isthenumberoflinksincluster . Then, The problem with the above expression (cid:28) (cid:1) (cid:0) if isthetotalnumberoflinksinthegraph, is clearly that it is derived under the as- (cid:28) criterion(4)becomes sumption of there being a smooth differen- tiable structure to the average properties of (cid:28) (cid:13) (cid:28)!(cid:30) (cid:17)(cid:20)(cid:19) " (cid:29) (5) thegraphs. Forasmallgraphwith nodes (cid:0) (cid:31) # (cid:17) ’ ’(cid:21)(cid:13) ’(cid:22)(cid:15)(cid:18)(cid:17)(cid:20)(cid:19) thecriterionfor agiantclusterbecomesin- Thus we have one ’naive’ small-graph test accurate. Clusters do not grow to infinity, which is very simple, and one ’exact’ cri- they can only grow to size at the most, ’ terion which requires a little more work to hence we must be more precise and use a compute. dimensionful scale rather than infinity as a Theproblemwiththesecriteriaisthatone reference point. 
The correction is not hard does not always have access to perfect in- toidentify;thethresholdpointcanbetaken formation about real organizations. Even tobeasfollows. ifsuchinformationwereavailable,security Result2 The small-graph condition for administratorsarenotsomuchinterestedin widespread percolation in a uni-partite what appears to be an accurate snapshot of graphoforder is: the present, as in what is likely to happen ’ in the near future. Socially motivated net- . (cid:10) (cid:5) "(cid:18)465(cid:12)7 (cid:29) (7) worksarenotusuallyorderly,likehierarchi- (cid:0)0/ (cid:30) (cid:0)(cid:4)(cid:13) (cid:0)1(cid:15)2&3(cid:19)(cid:20)#($ (cid:13) ’!(cid:19) cal trees, buthaveastrongrandomcompo- $ nent. We therefore adapt results from the This can be understood as follows. If a theory of random graphs to obtain a statis- graph contains a giant component, it is of tical estimate for the likelihood of percola- order andthesizeofthenextlargestcom- ’ tion, based on remarkably little knowledge ponentistypically 465(cid:12)7 ; thus, accord- 89(cid:13) ’!(cid:19) ofthesystem. ingtothetheoryofrandomgraphsthemar- To study a random graph, all we need is gin for error in estimating a giant compo- an estimate or knowledge of their degree nent is of order 465(cid:12)7 . In the criterion : ’ distributions. Random graphs, with arbi- above,thecriterionforaclusterthatismuch trarynodedegreedistributions havebeen greaterthanunityisthattherighthandside #(cid:4)$ studied in ref. [12], using the method of isgreaterthanzero. Tothiswenowaddthe generating functionals. This method uses magntitudeoftheuncertaintyinordertore- a continuum approximation, using deriva- duce the likelihood of an incorrect conclu- tivestoevaluateprobabilities,andhenceitis sion. 9 The expression in eqn. (7) is not much more complex than the large-graph crite- rion. Moreover, all of our small-graph cri- teriaretainthiervalidityinthelimitoflarge . 
Henceweexpectthesesmall-graphcri- ’ teriato bethe mostreliablechoicefor test- ing percolation in small systems. This ex- pectation is borne out in the examples be- low. From testing of the various criteria, the exact and statistical estimates are roughly comparable in their ability to detect perco- lation. The statistical tests we have exam- ined are useful when only partial informa- tionaboutagraphisavailable. 8 Archipelago Figure 13: In fact the two graphs in fig. 11 and fig. 12 are not separate. Due to file shar- Our reference implementation of the above ing between staff and students, they are linked. criteriafortestingnodevulnerabilityandin- Whentheselinksaretakeintoaccount,thepic- formation flow, is a Java application pro- ture,evenconsideringonlyfile-sharing,becomes gram, with associated Perl scripts, which somewhatdifferent.Thisshowshowimportantit istounderstandtheboundariesofasystem. we call Archipelago. Thename ofthe pro- gramis basedonthewhimsical association of our model of regions and bridges. An can be entered manually or generated, e.g. archipelagoisavolcanicislandthatusually by a Perl script that scans Unix file group takestheformofacharacteristicarcoftops associations. Archipelago calculates cen- juttingoutofthewaterlevel. Thetopslook tralityandpercolationcriteriaandorganizes separate but are actually bridged justunder the regions into an archipelago of central waterbyavolcanicsaddle. Thisistheform peaks surrounded by their attendant nodes thatarisesnaturallyfromorganizingthevi- (colouredinblack).Nodesandlinksthatact suallayoutofgraphsaccordingtocentrality. asbridgesbetweentheregionsarecoloured red to highlight them, and disconnected fragmentsarecolouredwithdifferentback- groundtintstodistinguishthem(seefigs.11 and12). The Application allows one to zoom in andmovenodesaroundtoincreasetheclar- ityofrepresentation. Onecanalsoaddand removenodesandlinkstoexaminethevul- nerabilityofthenetworktoindividualnode removal,orspuriouslinkaddition. 
A database of information about the nodesis kept by the program, so that regu- larSQLsearchescanbemadetosearchfor Figure 12: AscanofthestaffnetworkatOslo covert links between users, based on com- UniversityCollege.Itiswidelybelievedthatthis mon properties such as same family name, networkismoresecurethanthestudentnetwork, orsameaddress.Theadditionofevenasin- however thisimageshowsotherwise. Sincethe staffaremoretrustingandmoreinterconnected, gle covert link can completely change the thenetworkispotentiallyfarlesssecure. landscapeofagraphandmakeitpercolate, ordeposeunstablecentres. The Archipelago application accepts, as Analyses in the right hand panel of the input,anadjacencymatrixofagraph. This main window(fig. 11) showthe histogram 10
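The percolation tests of section 7, eqns. (4) to (7), can be sketched as below. This is a hedged illustration, not the Archipelago implementation: the class and method names are hypothetical, the input is assumed to be a symmetric 0/1 adjacency matrix with at least two nodes, the logarithm in eqn. (7) is assumed to be the natural logarithm, and the sum over the degree distribution is evaluated as the node average of $k_i (k_i - 2)$, which is the same quantity when $p_k$ is the observed fraction of nodes with degree $k$.

```java
import java.util.ArrayDeque;

/** Illustrative sketch only; class and method names are hypothetical. */
final class PercolationSketch {

    /** Degree k_i of every node of a symmetric 0/1 adjacency matrix. */
    static int[] degrees(int[][] a) {
        int[] k = new int[a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < a.length; j++)
                if (a[i][j] != 0) k[i]++;
        return k;
    }

    /** Exact ratio R_C of eqn. (4): connected pairs divided by all possible pairs. */
    static double connectedPairRatio(int[][] a) {
        int n = a.length;
        boolean[] seen = new boolean[n];
        long connectedPairs = 0;
        for (int s = 0; s < n; s++) {
            if (seen[s]) continue;
            long size = 0; // number of nodes in the cluster containing s
            ArrayDeque<Integer> queue = new ArrayDeque<>();
            queue.add(s);
            seen[s] = true;
            while (!queue.isEmpty()) {
                int u = queue.remove();
                size++;
                for (int w = 0; w < n; w++)
                    if (a[u][w] != 0 && !seen[w]) { seen[w] = true; queue.add(w); }
            }
            connectedPairs += size * (size - 1) / 2;
        }
        return connectedPairs / (n * (n - 1) / 2.0);
    }

    /** Naive ratio R_L of eqn. (5): L(L+1) / (N(N-1)). */
    static double linkRatio(int[][] a) {
        int n = a.length;
        long links = 0;
        for (int k : degrees(a)) links += k;
        links /= 2; // each undirected link is counted once at each end
        return links * (links + 1) / (double) (n * (n - 1));
    }

    /** Sum over k of k(k-2) p_k, computed as the node average of k_i (k_i - 2). */
    static double degreeSum(int[][] a) {
        double sum = 0.0;
        for (int k : degrees(a)) sum += k * (k - 2);
        return sum / a.length;
    }

    /** Mean degree <k>. */
    static double meanDegree(int[][] a) {
        double sum = 0.0;
        for (int k : degrees(a)) sum += k;
        return sum / a.length;
    }

    /** Large-graph criterion, eqn. (6). */
    static boolean largeGraphCriterion(int[][] a) {
        return degreeSum(a) > 0.0;
    }

    /** Small-graph criterion, eqn. (7), with the natural logarithm assumed. */
    static boolean smallGraphCriterion(int[][] a) {
        double mean = meanDegree(a);
        return mean * mean + degreeSum(a) > Math.log(a.length);
    }
}
```

For a four-node path graph, for instance, the exact ratio is 1 (the graph is connected) while the large-graph test (6) fails and the small-graph test (7) passes, which illustrates why the paper treats the criteria as complementary rather than interchangeable.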