Statistical Inference for Network Samples Using Subgraph Counts

P-A. G. Maugis¹, C. E. Priebe², S. C. Olhede¹, and P. J. Wolfe¹
¹University College London – Department of Statistical Science
²Johns Hopkins University – Department of Applied Mathematics and Statistics

arXiv:1701.00505v2 [stat.ME] 13 Jan 2017

Abstract

We consider that a network is an observation, and a collection of observed networks forms a sample. In this setting, we provide methods to test whether all observations in a network sample are drawn from a specified model. We achieve this by deriving, under the null of the graphon model, the joint asymptotic properties of average subgraph counts as the number of observed networks increases but the number of nodes in each network remains finite. In doing so, we do not require that each observed network contains the same number of nodes, or is drawn from the same distribution. Our results yield joint confidence regions for subgraph counts, and therefore methods for testing whether the observations in a network sample are drawn from: a specified distribution, a specified model, or from the same model as another network sample. We present simulation experiments and an illustrative example on a sample of brain networks where we find that highly creative individuals' brains present significantly more short cycles.

Keywords: Statistical Testing, Graphon Model, Subgraph Statistics.
This work was supported in part by DARPA SIMPLEX contract N66001-15-C-4041; by the US Army Research Office under Multidisciplinary University Research Initiative Award 58153-MA-MUR; by the US Office of Naval Research under Award N00014-14-1-0819; by the UK Engineering and Physical Sciences Research Council under Mathematical Sciences Leadership Fellowship EP/I005250/1, Established Career Fellowship EP/K005413/1, Developing Leaders Award EP/L001519/1, and Award EP/N007336/1; by the UK Royal Society under a Wolfson Research Merit Award; and by Marie Curie FP7 Integration Grant PCIG12-GA-2012-334622 and the European Research Council under Grant CoG 2015-682172NETS, both within the Seventh European Union Framework Program. The authors thank the Isaac Newton Institute for Mathematical Sciences, Cambridge, UK, for support and hospitality during the program Theoretical Foundations for Statistical Network Analysis (EPSRC grant no. EP/K032208/1) where a portion of the work on this paper was undertaken. The authors thank Joshua T. Vogelstein and his team for connectome data and expertise.

1 Introduction

We show that subgraph counts are flexible and powerful statistics for inference on collections of networks. Specifically, we use subgraph counts to test the hypotheses that all networks in a sample are generated either from a given distribution, from distributions in a given model, or from the same model as that of another sample.

Our results address the inference problem raised by the following experiment [1]: the networks connecting brain regions of individuals of varied levels of creativity are observed. However, while observations can be assumed to be independent, due to the variability of the brain structure and the instability of the observation technique, they cannot be assumed to be identically distributed; for instance, they need not contain as many nodes and edges. How, while allowing for such variations, can we test for significant differences between individuals with different levels of creativity?
Formally, we consider that a network is an observation (say $G_i$) and a collection of observed networks forms a sample (say $\mathcal{G} = (G_1, \dots, G_N)$). Then, our goal is to infer distributional properties of the $G_i$-s as $N$ grows. This parallels more classical statistical settings, where an observation is a vector (such as $X_i \in \mathbb{R}^k$) and a sample is a matrix: $\mathbf{X} = (X_1, \dots, X_N) \in \mathbb{R}^{k \times N}$. However, our setting strongly differs from the one where only one very large network is observed, and for which many methods already exist (see [2–10], to cite but a few). Surprisingly, no statistical method exists to compare samples of small networks, and currently only tools to compare two large networks are available [11,12].

Here we provide an analog of a multivariate t-test for network samples: methods to test whether a given network sample $\mathcal{G}$ presents averages consistent with either a specific model, or with that of another sample. The averages we use are subgraph counts; e.g., the number of copies in the sample of small shapes such as the edge, triangle, or square of Fig. 1. The choice of subgraph counts as statistics is motivated by their success in comparing large networks [13,14], but also by results in random graph theory and the study of large graphs. In both fields, subgraph counts have proved to be the most powerful tool available to compare networks [15,16], and are known to have properties similar to moments of random variables [17].

Formally, to perform our tests, we first embed network samples into a space defined by subgraph counts. While related testing procedures also use embeddings [11,12], using subgraph counts presents three key advantages. First, if the $G_i$-s are generated by a blockmodel [2] (the most popular random network model to date) and for an appropriate family of subgraphs, the embedding is one-to-one. This result is known as the finite forcibility of blockmodels [17]. Second, very few assumptions on each $G_i$ need to be made as $N$ grows to obtain consistency and asymptotic normality of the image of $\mathcal{G}$ in the embedding space.
This enables us to work under a very flexible null model. Finally, because it relies on physical properties of the $G_i$-s (the number of edges, triangles, squares, and so on), this embedding remains interpretable.

In the remainder of this article, we first introduce subgraph counts and the graphon model. We then present, successively, the case where all the networks in the sample come from the same graphon model (but are not necessarily of the same size), and the case where each observed network may come from a different graphon. In both cases, we prove asymptotic normality of our estimator, and present representative examples showing the practical use of the result. We conclude with an application to connectomes, and a discussion.

Figure 1 (panels a, b, c): Example of subgraph counts. There are 6 copies of the edge in all three graphs. There are 2 copies of the triangle in a) and b), but 0 in c). There are 1, 0 and 3 copies of the square in a), b) and c) respectively.

2 Subgraph Counts in the Graphon Model

We now define our statistics (subgraph counts) and our null model (the graphon model). Subgraph counts are natural statistics to compare networks for two reasons. First, subgraph counts intuitively summarize a network through its fundamental building blocks. This has historically given them purchase to address hard fundamental and empirical problems [13,14,18]. Second, subgraph counts present tractable analytical properties. We will describe and leverage these properties below, in a manner paralleling what is done in related literatures [4,6,18].

A subgraph count is the number of copies of a given graph in another graph (see Fig. 1). Throughout, we call the subgraph, and denote $F$, the graph which is counted, and $G$ the larger graph in which the counting takes place. All graphs will be simple (unweighted, no self loops or multiple edges). Subgraphs are also termed motifs, pattern graphs or shapes depending on the field [13,19–21].
For clarity, we define subgraph counts formally as follows (we write $|F|$ for the number of nodes in $F$, and $F \subset G$ if $F$ is a (not necessarily induced) subgraph of $G$):

Definition 1 (Subgraph count $X_F(G)$). For two graphs $F$ and $G$, we call the count of $F$ in $G$, and write $X_F(G)$, the following quantity:

\[ X_F(G) = \#\{F' \subset G : F' \text{ is a copy of } F\}, \]

where $F'$ is a copy of $F$ if there exists an adjacency preserving bijection between $F$ and $F'$. Furthermore, for a tuple $\mathcal{F} = (F_1, \dots, F_k)$ of subgraphs, we write $X_{\mathcal{F}}(G)$ for the vector $(X_{F_1}(G), \dots, X_{F_k}(G))$.

With this notation, calling $G_a$, $G_b$ and $G_c$ the graphs in Fig. 1, and writing $\square$ for the square, we have that

\[ X_{\square}(G_a) = 1, \quad X_{\square}(G_b) = 0, \quad X_{\square}(G_c) = 3. \]

The power of subgraph counts to study networks stems from their inherent linearity. Indeed, products of subgraph counts are but linear combinations of other subgraph counts. Intuitively, a product of two subgraph counts will involve counting pairs of copies, and therefore can be recovered by counting the number of copies of all subgraphs that can be induced by a pair of copies. More precisely, in the Appendix we show the following:

Lemma 1 (Linearity of subgraph counts). For any two graphs $F$ and $F'$, there are factors $c_H$ and a set $\mathcal{H}_{FF'}$ of subgraphs (the set of subgraphs that can be obtained using one copy of each of $F$ and $F'$ as building blocks) such that for any graph $G$

\[ X_F(G)\, X_{F'}(G) = \sum_{H \in \mathcal{H}_{FF'}} c_H X_H(G). \]

For instance, as these will be used later on, we have that in any graph $G$, writing $\triangle$ for the triangle, $H_1, H_2, H_3$ for the three graphs formed by the union of two distinct triangles (sharing an edge, sharing a single node, or disjoint), and $H'_1, \dots, H'_8$ for the eight graphs formed by the union of two distinct squares,

\[ X_{\triangle}(G)^2 = 2X_{H_1}(G) + 2X_{H_2}(G) + 2X_{H_3}(G) + X_{\triangle}(G), \]

\[ X_{\square}(G)^2 = 2X_{H'_1}(G) + 2X_{H'_2}(G) + 6X_{H'_3}(G) + 2X_{H'_4}(G) + 2X_{H'_5}(G) + 2X_{H'_6}(G) + 6X_{H'_7}(G) + 6X_{H'_8}(G) + X_{\square}(G). \]

This algebraic property of subgraph counts underpins the proofs of [6,16,18], and the subgraph counting algorithms of [20,21], among many other examples. Crucially, as opposed to cases where the model enforces linearity (such as with assumptions of Normality), it is the nature of the statistics (subgraph counts) and the system (graphs) that makes the problem linear.
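To make Definition 1 and Lemma 1 concrete, the following is a minimal sketch, not the counting algorithms of [20,21]. It counts copies of the edge, triangle and square in a small simple graph stored as a dictionary of neighbour sets, and checks the simplest instance of the linearity property, $X_{\text{edge}}(G)^2 = X_{\text{edge}}(G) + 2X_{\text{2-path}}(G) + 2X_{\text{two disjoint edges}}(G)$, which follows from splitting ordered pairs of edges into identical, adjacent and disjoint pairs. All function names are hypothetical and node labels are assumed to be comparable (e.g. integers).

```python
from itertools import combinations


def graph_from_edges(edges):
    """Build a simple undirected graph as {node: set_of_neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_edges(adj):
    # Every edge appears in two neighbour sets.
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def count_triangles(adj):
    # Count each triangle once, through its ordered labelling u < v < w.
    total = 0
    for u in adj:
        for v in adj[u]:
            if v > u:
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


def count_squares(adj):
    # A (not necessarily induced) 4-cycle is fixed by a pair of opposite
    # corners {u, w} and two of their common neighbours; each 4-cycle has
    # two such pairs of opposite corners, hence the final division by 2.
    total = 0
    for u, w in combinations(adj, 2):
        c = len(adj[u] & adj[w])
        total += c * (c - 1) // 2
    return total // 2


def count_two_paths(adj):
    # A path with two edges is a centre node plus two of its neighbours.
    return sum(len(n) * (len(n) - 1) // 2 for n in adj.values())


def count_disjoint_edge_pairs(adj):
    # Direct enumeration of unordered pairs of edges sharing no node.
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    return sum(1 for e, f in combinations(edges, 2) if not set(e) & set(f))


# Illustrative graph (an assumption, not one of the graphs of Fig. 1):
# the complete graph on four nodes.
k4 = graph_from_edges([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)])
lhs = count_edges(k4) ** 2
rhs = (count_edges(k4) + 2 * count_two_paths(k4)
       + 2 * count_disjoint_edge_pairs(k4))
print(count_edges(k4), count_triangles(k4), count_squares(k4), lhs == rhs)
```

The brute-force loops are only meant for graphs of a few dozen nodes; they are chosen for readability rather than speed.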
The linearity of subgraph counts allows us to use as null the very flexible graphon model [17]. This framework subsumes most models used in the statistical literature on networks; e.g., blockmodels [2] and dot-product models [5]. It has the intuitive structure of affixing to each node $i$ a latent feature (here $x_i$) and of connecting nodes $i$ and $j$ (conditionally independently) with a probability determined by the node features (here $f(x_i, x_j)$).

Definition 2 (Graphon $f$ and random graph $G_n(f)$). Fix a symmetric integrable map $f : [0,1]^2 \to [0,1]$, and call it a graphon. We call $G_n(f)$ the random graph distribution over graphs with $n$ nodes such that: to each node is randomly and independently assigned a feature $x_i \in [0,1]$, with $x_i \sim \mathrm{Unif}([0,1])$; and where edges form independently conditionally on $\{x_i\}_{i \in [n]}$ with probability

\[ \mathbb{P}[ij \in G \mid x_i, x_j] = f(x_i, x_j). \]

To recover a blockmodel with $K$ blocks, it suffices to consider a partition of $[0,1]$ in $K$ sets (i.e., $(P_1, \dots, P_K) \in \mathcal{P}_K([0,1])$) and set $f$ as constant over each $P_u \times P_v$. The dot-product model is recovered with a graphon $f$ of finite rank; i.e., $f(x,y) = \sum_{u \le K} \lambda_u f_u(x) f_u(y)$.

In the graphon framework, subgraph counts have direct interpretation as moments of $f$ [17]. Specifically, if $G \sim G_n(f)$, then the moments of $X_F(G)$ are moments of $f$. Therefore, following results similar to the Hausdorff moment problem, subgraph counts are sufficient statistics to distinguish between any two graphons [16,17]. However, there are no guarantees on which subgraphs are needed to distinguish between two graphons. For blockmodels and finite rank models, we know only that a finite number is sufficient (a concept known as finite forcibility, see [17, Chapter 16.7 & Appendix 4] for more details).

Unfortunately, all known results on subgraph counts under the graphon model consider the setting where one very large graph is observed. Here we present the tools to address the problem where a sample of small graphs is observed.
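The sampling mechanism of Definition 2 can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the simulation code behind the paper's experiments: `block_graphon` and `sample_graph` are hypothetical names, and the blockmodel partition is assumed to consist of consecutive intervals of $[0,1]$ delimited by cut points, which is only one of the partitions Definition 2 allows.

```python
import random


def block_graphon(B, cuts):
    """Blockmodel graphon that is constant on P_s x P_t.

    B is a symmetric K x K matrix of connection probabilities and cuts is
    an increasing list of K - 1 cut points, so that the blocks are the
    consecutive intervals of [0, 1] they delimit (an assumption made for
    this sketch).
    """
    def block(x):
        for s, c in enumerate(cuts):
            if x < c:
                return s
        return len(cuts)

    return lambda x, y: B[block(x)][block(y)]


def sample_graph(n, f, rng):
    """Draw one graph from G_n(f), returned as {node: set_of_neighbours}."""
    # Independent latent features x_i ~ Unif([0, 1]).
    x = [rng.random() for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            # Conditionally independent edges with probability f(x_i, x_j).
            if rng.random() < f(x[i], x[j]):
                adj[i].add(j)
                adj[j].add(i)
    return adj


# A sample of networks of varying sizes drawn from one two-block graphon;
# the matrix, cut point and sizes are arbitrary illustrative values.
rng = random.Random(0)
f = block_graphon([[0.30, 0.05], [0.05, 0.20]], cuts=[0.4])
sample = [sample_graph(n, f, rng) for n in (40, 43, 41, 44)]
```

Passing the random generator explicitly keeps the draw reproducible, which is convenient when the same sample has to be compared against several candidate graphons.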
3 The simple case: Samples from one graphon

We now present a central limit theorem as well as practical methods to build confidence regions for the subgraph counts observed in a network sample $\mathcal{G} = (G_1, \dots, G_N)$. In this section, we assume that there is a graphon $f$ such that each $G_i$ is drawn independently from $G_{n_i}(f)$ (where $n_i = |G_i|$).

Fix $F \in \mathcal{F}$ and $G \in \mathcal{G}$. In this setting, $X_F(G)$ is a random variable, and the first parameter to consider is its mean. To compute this mean, let $F_1, \dots, F_m$ be all the copies of $F$ in $K_G$ (the complete graph over the nodes of $G$), so that using the linearity of the expectation, we have that

\[ \mathbb{E} X_F(G) = \mathbb{E} \sum_{j \in [m]} \mathbf{1}_{\{F_j \subset G\}} = \sum_{j \in [m]} \mathbb{E}\, \mathbf{1}_{\{F_j \subset G\}}. \]

Then, direct computations show that $\mathbb{E}\,\mathbf{1}_{\{F_j \subset G\}}$ does not depend on $j$ (see Proposition A.1), and that

\[ \mathbb{E}\,\mathbf{1}_{\{F_j \subset G\}} = \mu_F(f) := \int_{[0,1]^{|F|}} \prod_{uv \in F} f(x_u, x_v) \prod_{u \in F} dx_u, \]

so that $\mathbb{E} X_F(G) = m\,\mu_F(f)$. Observe that $\mu_F(f)$ is a moment of the graphon $f$, as discussed above.

Similar computations for higher moments, aided by Lemma 1, enable us to use the Lindeberg-Feller central limit theorem along with the Cramér-Wold device to obtain the following:

Theorem 1 (Statistical properties of subgraph counts). Fix a tuple of graphs $\mathcal{F}$, a graphon $f$ and a sequence $n = (n_i)_{i \in \mathbb{N}}$ such that $2 \max_{F \in \mathcal{F}} |F| \le \min_{i \in \mathbb{N}} n_i$. Let $\mathcal{G} = (G_i)_{i \in [N]}$ be a graph sample such that for all $i$, $G_i \sim G_{n_i}(f)$. Set $\hat\mu_F(\mathcal{G}) = N^{-1} \sum_{G \in \mathcal{G}} X_F(G) / X_F(K_G)$, $\hat\mu_{\mathcal{F}}(\mathcal{G}) = (\hat\mu_F(\mathcal{G}))_{F \in \mathcal{F}}$, and $\mu_{\mathcal{F}}(f) = (\mu_F(f))_{F \in \mathcal{F}}$. Then, $\hat\mu_{\mathcal{F}}(\mathcal{G})$ is an unbiased, $\sqrt{N}$-consistent and asymptotically normal estimator of $\mu_{\mathcal{F}}(f)$; i.e., $\mathbb{E}\hat\mu_{\mathcal{F}}(\mathcal{G}) = \mu_{\mathcal{F}}(f)$ and there exists $\Sigma_{\mathcal{F}}(n, f)$ such that asymptotically in $N$:

\[ \sqrt{N}\,\big(\hat\mu_{\mathcal{F}}(\mathcal{G}) - \mu_{\mathcal{F}}(f)\big) \to \mathrm{Normal}\big(0, \Sigma_{\mathcal{F}}(n, f)\big). \]

Furthermore, for each $F, F' \in \mathcal{F}$,

\[ \mathrm{cov}\big(\hat\mu_F(\mathcal{G}), \hat\mu_{F'}(\mathcal{G})\big) = \sum_{H \in \mathcal{H}_{FF'} \setminus \{F \sqcup F'\}} \omega_H(n; N)\,\big(\mu_H(f) - \mu_F(f)\,\mu_{F'}(f)\big), \qquad (1) \]

with $F \sqcup F'$ the disjoint union of $F$ and $F'$, and

\[ \omega_H(n; N) = \frac{1}{N} \sum_{i=1}^{N} \frac{c_H\, X_H(K_{n_i})}{X_F(K_{n_i})\, X_{F'}(K_{n_i})}. \]

Figure 2: Testing for a blockmodel using $\mathcal{F} = \{C_2, C_3, C_4\}$ = {edge, triangle, square}. Panels: $C_3$ against $C_2$, $C_4$ against $C_2$, and $C_4$ against $C_3$. The sample $\mathcal{G}$ is such that: $N = 300$, $n_i$ is the $i$-th digit of $\pi$ plus 40, $G_i$ is drawn from a two block graphon $f$ ($G_i \sim G_{n_i}(f)$). The density estimate $\hat\mu_{\mathcal{F}}(\mathcal{G})$ is denoted by a black cross. Overlaid are the expected densities (colored dots) and the confidence ellipse (shaded area) for two alternative graphons $f_a$ and $f_b$. The p-values obtained with the Mahalanobis distance are respectively 0.6 and 7e−15.

Crucial to the following is the covariance matrix $\Sigma_{\mathcal{F}}(n, f)$, which can be obtained by taking the limit in $N$ in (1) for each $F, F' \in \mathcal{F}$, and which will enable the computation of confidence regions. Interestingly, its elicitation is more involved than for the study of large graphs, where only a few terms dominate. We refer to the Appendix for the proof as well as a simulation experiment.

Theorem 1 enables testing against the null that all $G_i$ are drawn from a given graphon. To make this concrete, we consider an example in Fig. 2. There, we observe a graph sample $\mathcal{G} = (G_1, \dots, G_{300})$, and aim to compare it to two graphon models $f_a$ (in red) and $f_b$ (in blue) using Theorem 1; i.e., we assume that for $i \in [N]$, $G_i \sim G_{n_i}(f)$ and consider the null hypothesis $H_0 : f = f_a$ and the alternative $H_1 : f = f_b$. We draw as a black cross $\hat\mu_{\mathcal{F}}(\mathcal{G})$ and as smaller black points the $\mu_{\mathcal{F}}(G_i)$. The sizes of the networks in $\mathcal{G}$, the $n_i$, are non-random but not constant. We achieve this by using the sequence of digits of $\pi$.

First, since we have specified $f_a$ and $f_b$, we can evaluate both $\mu_{\mathcal{F}}(f_a)$ and $\mu_{\mathcal{F}}(f_b)$ and draw them on the figure (as a red and blue dot respectively).
Then, since $n = (n_i)_{i \le N}$ is observed, we can compute $\Sigma_{\mathcal{F}}(n, f_a)$ and $\Sigma_{\mathcal{F}}(n, f_b)$ using Theorem 1, which allows us to compute the confidence ellipse around $\mu_{\mathcal{F}}(f_a)$ and $\mu_{\mathcal{F}}(f_b)$ (in shaded red and blue respectively). Finally, since we know the limit distribution and covariance under the null, we can use $\Sigma_{\mathcal{F}}(n, f_a)$ and $\mu_{\mathcal{F}}(f_a)$ to compute a p-value using Mahalanobis distance.

In the following we consider the case where instead of testing against the null of a single graphon, we test against the null of a graphon class.

4 The general case: Flexible sampling design

Here we expand our results to cases where the observed networks may be generated from different graphons. Indeed, in many settings, the sampling mechanism may distort the structure of the underlying graphon; e.g., although the network connecting brain regions can be satisfactorily modeled by a blockmodel [22], the proportion of nodes of each block may be different in different experimental settings, so that each observation is drawn from a different blockmodel.

In this practically important and conceptually challenging new setting, the proof techniques developed for Theorem 1 yield the following.

Theorem 2. Fix a tuple of graphs $\mathcal{F}$, a sequence of graphons $f = (f_i)_{i \in \mathbb{N}}$ and a sequence of integers $n = (n_i)_{i \in \mathbb{N}}$ such that $2 \max_{F \in \mathcal{F}} |F| \le \min_{i \in \mathbb{N}} n_i$. Let $\mathcal{G} = (G_1, \dots, G_N)$ be a graph sample such that for all $i$, $G_i \sim G_{n_i}(f_i)$. Set $\hat\mu_{\mathcal{F}}(\mathcal{G}) = N^{-1} \sum_{G \in \mathcal{G}} \big(X_F(G) / X_F(K_G)\big)_{F \in \mathcal{F}}$ and $\mu_{\mathcal{F}}(f; N) = N^{-1} \sum_{i=1}^{N} \mu_{\mathcal{F}}(f_i)$. Then, asymptotically in $N$, and for some matrix $\Sigma^*_{\mathcal{F}}(n, f)$, we have that

\[ \sqrt{N}\,\big(\hat\mu_{\mathcal{F}}(\mathcal{G}) - \mu_{\mathcal{F}}(f; N)\big) \to \mathrm{Normal}\big(0, \Sigma^*_{\mathcal{F}}(n, f)\big). \]

Therefore, even in this much more flexible setting, we can recover the barycenter of the $\mu_{\mathcal{F}}(f_i)$. However, the variance now has a more complex structure, and we refer to the Appendix for details.

Following the intuition of our example of brain networks, and to make the usefulness of Theorem 2 concrete, we introduce the flexible stochastic blockmodel (FSBm).
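Before turning to the FSBm, the estimator $\hat\mu_{\mathcal{F}}(\mathcal{G})$ shared by Theorems 1 and 2 can be made concrete with a short sketch. It averages, over the networks of a sample, the count of each subgraph divided by its count in the complete graph on the same nodes, so that networks of different sizes contribute on a common scale. The function names are hypothetical, only the edge and the triangle are handled, and the counts in $K_n$ used here are the standard values $\binom{n}{2}$ and $\binom{n}{3}$; graphs are assumed to be stored as dictionaries of neighbour sets in which isolated nodes still appear as keys.

```python
from math import comb


def count_edges(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def count_triangles(adj):
    # Each triangle is counted once, through its labelling u < v < w.
    return sum(1 for u in adj for v in adj[u] if v > u
               for w in adj[u] & adj[v] if w > v)


# For each shape: (count in G, count in the complete graph on n nodes).
SHAPES = {
    "edge": (count_edges, lambda n: comb(n, 2)),
    "triangle": (count_triangles, lambda n: comb(n, 3)),
}


def density_estimate(sample, shapes=("edge", "triangle")):
    """Average normalised subgraph counts over a sample of graphs.

    Implements mu_hat_F = N^{-1} * sum_G X_F(G) / X_F(K_G) for each F.
    """
    N = len(sample)
    estimate = []
    for name in shapes:
        count, count_in_complete = SHAPES[name]
        total = sum(count(G) / count_in_complete(len(G)) for G in sample)
        estimate.append(total / N)
    return estimate


def complete_graph(n):
    return {i: set(range(n)) - {i} for i in range(n)}


def cycle_graph(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


# Toy sample (an assumption for illustration): one complete graph and one
# cycle, both on six nodes so that n >= 2 * max|F| holds for the triangle.
print(density_estimate([complete_graph(6), cycle_graph(6)]))
```

Extending the sketch to the square only requires a counter for 4-cycles and the corresponding count in $K_n$, namely $3\binom{n}{4}$.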
Definition 3 (FSBm and embedding shape). For a symmetric matrix $B \in [0,1]^{K \times K}$ we call $\mathcal{D}(B)$ the set of all possible graphons with the same block structure as $B$; i.e.,

\[ \mathcal{D}(B) = \{f : \exists (P_1, \dots, P_K) \in \mathcal{P}_K([0,1]) \text{ s.t. } \forall x \in P_s, y \in P_t,\ f(x,y) = B_{st}\}. \]

For a tuple $\mathcal{F}$ of graphs, we call embedding shape the set

\[ \mu_{\mathcal{F}}(B) = \{\mu_{\mathcal{F}}(f) \text{ for } f \in \mathcal{D}(B)\}. \]

For instance, with $B \in [0,1]^{2 \times 2}$ and $\mathcal{F}$ = {edge, triangle}, then:

\[ \mu_{\mathcal{F}}(B) = \Big\{ \big( \pi^2 B_{11} + 2\pi(1-\pi) B_{12} + (1-\pi)^2 B_{22},\ \ \pi^3 B_{11}^3 + 3\pi^2(1-\pi) B_{11} B_{12}^2 + 3\pi(1-\pi)^2 B_{22} B_{12}^2 + (1-\pi)^3 B_{22}^3 \big) : \pi \in [0,1] \Big\}. \]

The most direct way of using the FSBm is to test for all the $f_i$ being equal to any blockmodel instance in a class; i.e., assume that all $G_i$-s are drawn from a graphon $f$ and test for the null $H_0 : f \in \mathcal{D}(B)$. This is achieved by using a composite hypothesis test, and our results allow us to produce confidence regions and p-values using the same tools as before.

Figure 3: Testing for a FSBm class. The sample $\mathcal{G}$ is such that: $N = 200$, $n_i$ is the $i$-th digit of $\pi$ plus 30, $G_i$ is drawn from a graphon $f$ ($G_i \sim G_{n_i}(f)$). With $\mathcal{F} = \{C_2, C_3, C_4\}$, we estimate $\hat\mu_{\mathcal{F}}(\mathcal{G})$, and plot it as a black cross. Then, we draw in solid color the embedding shapes $\mu_{\mathcal{F}}(B_a)$ and $\mu_{\mathcal{F}}(B_b)$. In shaded color we draw the associated confidence regions; p-values can be obtained using the Mahalanobis distance associated with the closest point to $\hat\mu_{\mathcal{F}}(\mathcal{G})$ in $\mu_{\mathcal{F}}(B_a)$ and $\mu_{\mathcal{F}}(B_b)$.

We present such an example in Fig. 3. There, we observe $\mathcal{G} = (G_1, \dots, G_{200})$, and consider two FSBm classes generated from $B_a$ (in red) and $B_b$ (in blue). Then, we assume that all networks in the sample are drawn from a graphon $f$ and test for the null $H_0 : f \in \mathcal{D}(B_a)$ and the alternative $H_1 : f \in \mathcal{D}(B_b)$. We first represent $\hat\mu_{\mathcal{F}}(\mathcal{G})$ as a black cross. Using Definition 3, we plot the embedding shapes $\mu_{\mathcal{F}}(B_a)$ and $\mu_{\mathcal{F}}(B_b)$ in solid red and blue respectively. The confidence regions (in shaded red and blue) are the union of the confidence ellipses at all points in $\mu_{\mathcal{F}}(B_a)$ and $\mu_{\mathcal{F}}(B_b)$.
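The two-block embedding shape of Definition 3 is a curve parameterised by the proportion $\pi$ of nodes in the first block, and it can be traced numerically. The sketch below evaluates the edge and triangle densities given in the example above on a regular grid of $\pi$ values; the function names, the grid resolution and the matrix used in the usage line are assumptions made for illustration.

```python
def two_block_embedding(B, pi):
    """Edge and triangle densities of a two-block model with block sizes
    (pi, 1 - pi) and connection matrix B, following Definition 3."""
    b11, b12, b22 = B[0][0], B[0][1], B[1][1]
    q = 1.0 - pi
    edge = pi ** 2 * b11 + 2 * pi * q * b12 + q ** 2 * b22
    triangle = (pi ** 3 * b11 ** 3
                + 3 * pi ** 2 * q * b11 * b12 ** 2
                + 3 * pi * q ** 2 * b22 * b12 ** 2
                + q ** 3 * b22 ** 3)
    return edge, triangle


def embedding_shape(B, steps=101):
    """Points of the embedding shape on a regular grid of pi in [0, 1]."""
    return [two_block_embedding(B, k / (steps - 1)) for k in range(steps)]


# Illustrative matrix (not one of the B_a, B_b of Fig. 3).
shape = embedding_shape([[0.8, 0.1], [0.1, 0.3]])
print(shape[0], shape[-1])
```

The two endpoints of the curve correspond to a single block: at $\pi = 0$ the point is $(B_{22}, B_{22}^3)$ and at $\pi = 1$ it is $(B_{11}, B_{11}^3)$, which gives a quick sanity check on any implementation.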
A more general use of Theorem 2 is to test for all graphs in a sample being drawn from elements of a FSBm class; i.e., assume that the $G_i$-s are drawn from the $f_i$-s and test for the null $H_0 : \forall i \in [N], f_i \in \mathcal{D}(B)$ for some $B$. As before, we face a composite null, and we may compute the confidence region and the p-value by scanning all possible sequences $f$. This, however, is clearly computationally intractable. Nonetheless, the form of the variance and the structure of the FSBm allow us to propose conservative confidence regions and p-values that can be efficiently computed (we fully describe the method in the Appendix).

We present an example in Fig. 4. There we observe $\mathcal{G} = (G_1, \dots, G_{103})$ and consider two FSBm classes generated by $B_a$ and $B_b$. We first plot $\hat\mu_{\mathcal{F}}(\mathcal{G})$ as a black cross. Then, using Definition 3, we plot the convex hull of the embedding shapes $\mu_{\mathcal{F}}(B_a)$ and $\mu_{\mathcal{F}}(B_b)$ (in solid red and blue respectively) wherein, by Theorem 2, $\mu_{\mathcal{F}}(f; N)$ must lie. Finally, we use a method described in the Appendix to produce the confidence region around each shape (in shaded color).

Figure 4: Testing for a full FSBm class. Panels: $C_3$ against $C_2$, $C_4$ against $C_2$, and $C_4$ against $C_3$. The sample $\mathcal{G}$ is such that: $N = 103$, $n_i$ is the $i$-th digit of $\pi$ plus 50, $G_i$ is drawn from a graphon $f_i$ ($G_i \sim G_{n_i}(f_i)$). With $\mathcal{F} = \{C_2, C_3, C_4\}$, we estimate $\hat\mu_{\mathcal{F}}(\mathcal{G})$, and plot it as a black cross. Then, we draw in solid color the convex hulls of the embedding shapes $\mu_{\mathcal{F}}(B_a)$ and $\mu_{\mathcal{F}}(B_b)$. In shaded color we draw the associated confidence regions; approximate (and conservative) p-values can be obtained by determining the confidence level at which the observation ceases to be in the confidence region.

5 Application: Are creative brains different?

We now consider a sample of brain networks $\mathcal{G} = (G_1, \dots, G_{114})$ [23]. This sample was produced in two steps: first, magnetic resonance images of each of the 113 subjects' brains were taken; then, the networks connecting each subject's brain regions were estimated using these images [1]. All networks in the sample contain 70 nodes. Furthermore, we have available a covariate $C = (c_1, \dots, c_{113})$ measuring the subjects' creativity.

To study this network sample and use the covariate $C$, we introduce a direct extension of our results to compare two network samples:

Corollary 1 (Two-sample test). Fix a tuple of subgraphs $\mathcal{F}$ and two network samples $\mathcal{G}$ and $\mathcal{G}'$ generated respectively by the graphons $f$ and $f'$ and the network size sequences $n$ and $n'$. Then, as both $|\mathcal{G}|$ and $|\mathcal{G}'|$ tend to infinity, and if $\min(n, n') \ge 2 \max_{F \in \mathcal{F}} |F|$, we have that if $f = f'$, then

\[ \frac{\sqrt{|\mathcal{G}|\,|\mathcal{G}'|}}{\sqrt{|\mathcal{G}| + |\mathcal{G}'|}}\,\big(\hat\mu_{\mathcal{F}}(\mathcal{G}) - \hat\mu_{\mathcal{F}}(\mathcal{G}')\big) \to \mathrm{Normal}\big(0, \Sigma_{\mathcal{F}}(n, n', f)\big), \]

and $\Sigma_{\mathcal{F}}(n, n', f)$ may be estimated at rate $\sqrt{|\mathcal{G}| + |\mathcal{G}'|}$ from the samples.

Unfortunately, estimating $\Sigma_{\mathcal{F}}(n, n', f)$ requires counting subgraphs of order $2 \max_{F \in \mathcal{F}} |F|$, making the procedure computationally intensive. This compels us to work with $\mathcal{F} \subset$ {edge, triangle, square}. Furthermore, although our estimator of $\Sigma_{\mathcal{F}}(n, n', f)$ is entrywise Normal, it is not Wishart as in the classical setting. Therefore, we have no guarantee of the estimate being positive definite, and cannot use the Hotelling's T-squared distribution to compute p-values. If the estimate is positive definite, we recommend ignoring the variations in $\Sigma_{\mathcal{F}}(n, n', f)$ and using the $\chi^2_{|\mathcal{F}|}$ distribution. If the estimate fails to be positive definite, we recommend using only the marginals.
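The recommendation above can be sketched as a small decision procedure. This is an illustration under stated assumptions, not the authors' implementation: the covariance estimate is taken as given (estimating it requires the higher-order subgraph counts mentioned above and is not shown), positive definiteness is checked with a hand-written Cholesky factorisation, the joint p-value uses the closed-form $\chi^2$ survival function for integer degrees of freedom, and the marginal fallback uses two-sided Normal tests on each coordinate. All function names are hypothetical.

```python
from math import erfc, exp, pi, sqrt


def cholesky(S):
    """Lower-triangular L with L L^T = S, or None if S is not
    (numerically) positive definite."""
    k = len(S)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return None
                L[i][i] = sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def chi2_sf(x, k):
    """P[chi-square with integer k degrees of freedom > x]."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if k % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, k // 2):
            term *= h / j
            total += term
        return exp(-h) * total
    term, total = sqrt(2.0 * x / pi), 0.0
    for j in range(1, (k - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return erfc(sqrt(h)) + exp(-h) * total


def two_sample_pvalues(mu1, mu2, S, N1, N2):
    """Joint chi-square p-value if S is positive definite, otherwise
    marginal two-sided Normal p-values (nan where S[k][k] <= 0)."""
    scale = sqrt(N1 * N2 / (N1 + N2))
    z = [scale * (a - b) for a, b in zip(mu1, mu2)]
    L = cholesky(S)
    if L is not None:
        # Solve L y = z, so that z^T S^{-1} z = |y|^2.
        y = []
        for i in range(len(z)):
            y.append((z[i] - sum(L[i][m] * y[m] for m in range(i))) / L[i][i])
        return {"joint": chi2_sf(sum(v * v for v in y), len(z))}
    return {"marginal": [erfc(abs(zk) / sqrt(2.0 * S[k][k]))
                         if S[k][k] > 0.0 else float("nan")
                         for k, zk in enumerate(z)]}
```

Returning a dictionary keyed by `"joint"` or `"marginal"` makes it explicit which of the two recommendations was applied, which matters when p-values from different splits are later compared.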
Before analyzing $\mathcal{G}$ using our results, we make the following test: we subsample uniformly at random and without replacement from $\mathcal{G}$, yielding $\mathcal{G}_1$ and $\mathcal{G}_2$ such that $\mathcal{G}_1 \cup \mathcal{G}_2 = \mathcal{G}$, and use Corollary 1 to test for $\mathcal{G}_1$ and $\mathcal{G}_2$ being drawn from the same graphon $f$. Unless $\mathcal{G}$ presents characteristics that cannot be explained by our results, $\mathcal{G}_1$ and $\mathcal{G}_2$ should be indistinguishable, and we expect to see p-values that are uniformly distributed in $[0,1]$.

We perform this experiment 100 times, and obtain a sample of p-values for which we fail to reject the null of a uniform distribution using the Kolmogorov–Smirnov test ($D = 0.09$, p-value $= 0.3$). For this test we use a single subgraph in $\mathcal{F}$ because of the small sample size ($|\mathcal{G}_1| + |\mathcal{G}_2| = 113$) and a very high level of correlation; otherwise the estimated covariance matrix often failed to be positive definite.

We now use $C$ to split $\mathcal{G}$ in two samples. To do so, we choose to build a first subsample $\mathcal{G}_1$ containing the less creative, and a second subsample $\mathcal{G}_2$ containing the more creative. More precisely, for a quantile $q$ and denoting $Q_C$ the empirical quantile function of $C$:

\[ \mathcal{G}_1^q = \{G_i \in \mathcal{G} : c_i \le Q_C(q)\} \quad \& \quad \mathcal{G}_2^q = \{G_i \in \mathcal{G} : c_i > Q_C(1-q)\}. \]

Quantile (q)    p-value (edge)    p-value (triangle)    p-value (square)
0.5             0.126             0.110                 0.115
0.4             0.077             0.051                 0.050
0.3             0.062             0.042                 0.040
0.2             0.014             0.011                 0.012
0.1             0.046             0.047                 0.061

Table 1: Testing for differences between $\mathcal{G}_1^q$ and $\mathcal{G}_2^q$. For each $F \in$ {edge, triangle, square} and $q \in \{0.1, 0.2, \dots, 0.5\}$ we produce the p-value for the null $H_0 : \mu_F(\mathcal{G}_1^q) = \mu_F(\mathcal{G}_2^q)$. The p-values increase with $q$, except for $q = 0.1$, in which case $|\mathcal{G}_1^q|$ and $|\mathcal{G}_2^q|$ are too small for the test to be significant.

Interestingly, for $q = 0.5$ and $q = 0.4$, we fail to reject the null that the networks in $\mathcal{G}_1^q$ and $\mathcal{G}_2^q$ come from the same graphon (see Table 1 for the p-values). However, for $q = 0.3$ we can reject the null of the same graphon at the 5% significance level using the triangle or the square, but not the edge. Thus, we observe that individuals with a very high level of creativity present significantly more triangles and squares than those with a very low level of creativity.
We now aim to understand whether the added triangles and squares arise from a few edges completing partially present shapes or from fully new triangles and squares. To do so, we first observe that if $G \sim G_n(f)$, then $\bar{G} \sim G_n(1-f)$, where $\bar{G}$ is the complement graph of $G$. Therefore, we may use our tests on $\bar{G}$, which can be understood as estimating $\mu_{\mathcal{F}}(1-f)$ instead of $\mu_{\mathcal{F}}(f)$ to compare network samples.

Then, using the $\bar{\mathcal{G}}_i^q = \{\bar{G} : G \in \mathcal{G}_i^q\}$, we can test whether there are significantly more fully absent subgraphs in $\bar{\mathcal{G}}_1^q$ compared to $\bar{\mathcal{G}}_2^q$. There, we find we cannot reject this null; i.e., we cannot reject the null of the networks in $\bar{\mathcal{G}}_1^q$ and $\bar{\mathcal{G}}_2^q$ coming from the same graphon for $q \ge 0.3$. Therefore, we conclude that the added triangles and squares in the highly creative arise from a few edges completing partially present triangles and squares.
