Semidefinite tests for latent causal structures

Aditya Kela,1 Kai von Prillwitz,2 Johan Åberg,1,∗ Rafael Chaves,3 and David Gross1,4
1 Institute for Theoretical Physics, University of Cologne, 50937 Cologne, Germany
2 Institute for Chemistry and Biology of the Marine Environment, University of Oldenburg, 26111 Oldenburg, Germany
3 International Institute of Physics, Federal University of Rio Grande do Norte, 59070-405 Natal, Brazil
4 Centre for Engineered Quantum Systems, School of Physics, The University of Sydney, Sydney, NSW 2006, Australia
(Dated: January 4, 2017)
arXiv:1701.00652v1 [stat.ML] 3 Jan 2017

Testing whether a probability distribution is compatible with a given Bayesian network is a fundamental task in the field of causal inference, where Bayesian networks model causal relations. Here we consider the class of causal structures where all correlations between observed quantities are solely due to the influence from latent variables. We show that each model of this type imposes a certain signature on the observable covariance matrix in terms of a particular decomposition into positive semidefinite components. This signature, and thus the underlying hypothetical latent structure, can be tested in a computationally efficient manner via semidefinite programming. This stands in stark contrast with the algebraic geometric tools required if the full observable probability distribution is taken into account. The semidefinite test is compared with tests based on entropic inequalities.

I. INTRODUCTION

In spite of the prime importance of discovering causal relations in science, the statistical analysis of empirical data has historically shied away from causality. Only relatively recently has a rigorous theory of causality emerged (see, for instance, [1, 2]), showing that empirical data can indeed contain information about causation rather than mere correlation. Since then, causal inference has quickly become influential.
Examples range from applications to the inference of genetic [3] and social networks [4], to a better understanding of the role of causality within quantum physics [5–13].

To formalize causal mechanisms it has become popular to use directed acyclic graphs (DAGs) where nodes denote random variables and directed edges (arrows) account for their causal relations. Central problems within this context include inference or model selection: ‘Given samples from a number of observable variables, which DAG should we associate with them?’, as well as hypothesis testing: ‘Can the observed data be explained in terms of an assumed DAG?’ Here, we concentrate on the latter problem and propose a novel solution based on the covariances that a given causal structure gives rise to. To understand the relevance and applicability of this method it is useful to summarize the difficulties that we typically face when approaching such problems.

The most common method to infer the set of possible DAGs compatible with empirical observations is based on the Markov condition and the faithfulness assumption [1, 2]. Under these conditions, and in the case where all variables composing a given DAG can be assumed to be empirically accessible, the conditional statistical independencies implied by the graph contain all the information required to test for the compatibility of some data with the causal structure. However, for a variety of practical and fundamental reasons, we quite generally face causal discovery in the presence of latent (hidden) variables, that is, variables that may play an important role in the causal model, but nonetheless cannot be accessed empirically. In this case we have to characterize the set of marginal probability distributions that a given DAG can give rise to. Unfortunately, as is widely recognized, generic causal models with latent variables impose highly non-trivial constraints on the possible correlations compatible with them [14–25].
Although the marginal compatibility in principle can be completely characterized in terms of semi-algebraic sets [16], it appears that the resulting tests in practice are computationally intractable beyond a few variables [18, 22].

One possible approach to deal with the apparent intractability is to consider relaxations of the original problem, that is, to design tests that define incomplete lists of constraints (outer approximations) to the set of compatible distributions [17–20, 26–28]. For instance, this approach has previously been considered in [27–30], with tests based on entropic information theoretic inequalities; an idea originally conceived to tackle foundational questions in quantum mechanics [31–37]. Here we consider a relaxation in a similar spirit, but based on covariances rather than entropies.

Beyond dealing with potential computational intractabilities, an additional benefit with a relaxation based on covariances is that it at most involves bipartite marginals, and it seems reasonable to expect that this would be less data-intensive than methods based on the full multivariate distribution of the observables.

FIG. 1. Bipartite DAGs. In this investigation we focus on the class of causal models where all correlations among the observables are due to a collection of independent latent variables. This setting can be described in terms of DAGs that are bipartite, where the latter means that all edges are directed from latent variables ($L_1, L_2, L_3$) to the observables ($O_1, O_2, O_3, O_4, O_5$), and where there are no edges within each of these subsets.

A. Main assumptions and results

We focus on a particular class of latent causal structures, where we assume that there are no direct causal influences between the observables, but only from latent variables to observables (see figure 1). Hence, all correlations among the observables are due to the latent variables.
This setting can be described by the class of DAGs where all edges are directed from latent vertices to observable vertices, with no edges within these two groups (see figure 1). In other words, we consider the case of DAGs that are bipartite, with the coloring ‘observable’ and ‘latent’. Alternatively, this can be described in terms of hypergraphs, where each independent latent cause is associated with a hyperedge consisting of the affected observable vertices (see e.g. [38]).

This class of graphs has previously been considered in the context of marginalization of Bayesian networks [26, 29, 38]. They moreover provide examples of the difficulties that arise when characterizing latent structures [6, 27, 28, 39–41], where standard techniques based on the use of conditional independencies can even yield erroneous results (for a discussion, see e.g. [42]). This type of latent structure furthermore emerges in the context of Bell’s theorem [43], as well as in recent generalizations [6, 23, 24, 39–41, 44, 45], where it can be used to show that quantum correlations between distant observers (thus without direct causal influences between them) are incompatible with our most basic notions of cause and effect.

Irrespective of the nature of the observables (categorical or continuous) we are free to assign vectors to each possible outcome of the observables. Our main result is to show that each bipartite DAG implies a particular decomposition of the resulting covariance matrix into positive semidefinite components. Hence, we can test whether the observed covariance matrix is compatible with a hypothetical bipartite DAG by checking whether it satisfies the corresponding positive semidefinite decomposition, and we will in the following somewhat colloquially refer to this as the ‘semidefinite test’. The semidefinite test can thus be phrased as a semidefinite membership problem, which in turn can be solved via semidefinite programming.
The latter is known to be computationally efficient from a theoretical point of view, and has a good track record concerning algorithms that are efficient also in practice (see discussions in [46]).

B. Structure of the paper

In section II we derive a general decomposition of covariance matrices, which forms the basis of our semidefinite test. In section III we rephrase this general result to fit with the particular structure of observables and latent variables that we employ, and in section IV we derive the main result, namely that every bipartite DAG implies a particular semidefinite decomposition of the observable covariance matrix. Section V focuses on the converse, namely that every covariance matrix that satisfies the decomposition of a given bipartite DAG can be realized by a corresponding causal model. Section VI relates the semidefinite decomposition to previous types of operator inequalities introduced in [47]. To obtain a covariance matrix we may be required to assign vectors to the outcomes of the random variables, and section VII discusses the dependence of the semidefinite test on this assignment. In section VIII we briefly discuss the fact that the compatibility with a given bipartite DAG is not affected if the observables are processed locally, and that the semidefinite test respects this basic property under suitable conditions. Section IX considers a specific class of distributions where it is possible to analytically determine the conditions for a semidefinite decomposition. In section X this class of distributions serves as a testbed for comparisons with the above-mentioned entropic tests. We conclude with a summary and outlook in section XI.

II. SEMIDEFINITE DECOMPOSITION OF COVARIANCE MATRICES

In this section we develop the basic structure that forms the core of the semidefinite test. In essence it is obtained via a repeated application of a law of total variance for covariance matrices.
For a vector-valued random variable $Y$, in a real or complex inner product space $V$, we define the covariance matrix of $Y$ as

$$ \mathrm{Cov}(Y) := E\Big( \big(Y - E(Y)\big)\big(Y - E(Y)\big)^{\dagger} \Big) = E(YY^{\dagger}) - E(Y)E(Y)^{\dagger}, \tag{1} $$

where $E(Y)$ denotes the expectation of $Y$ and $\dagger$ denotes the transposition if the underlying vector space is real, and the Hermitian conjugation if the space is complex. One should note that $E(Y)^{\dagger} = E(Y^{\dagger})$. We also define the cross-correlation for a pair of vector-valued variables $Y', Y$ (not necessarily belonging to the same vector space)

$$ \mathrm{Cov}(Y', Y) := E(Y'Y^{\dagger}) - E(Y')E(Y)^{\dagger}, \tag{2} $$

where $\mathrm{Cov}(Y, Y) = \mathrm{Cov}(Y)$.

For a pair of random variables $X, Y$ we denote the expectation of $Y$ conditioned on $X$ as $E(Y|X)$. Via the conditional expectation we can also define the conditional covariance matrix

$$ \mathrm{Cov}(Y|X) := E\Big( \big(Y - E(Y|X)\big)\big(Y - E(Y|X)\big)^{\dagger} \,\Big|\, X \Big) = E(YY^{\dagger}|X) - E(Y|X)E(Y|X)^{\dagger}. \tag{3} $$

In a similar manner we can also obtain a conditional cross-correlation between two random vectors $Y', Y$

$$ \mathrm{Cov}(Y', Y|X) := E\Big( \big(Y' - E(Y'|X)\big)\big(Y - E(Y|X)\big)^{\dagger} \,\Big|\, X \Big) = E(Y'Y^{\dagger}|X) - E(Y'|X)E(Y|X)^{\dagger}. \tag{4} $$

The starting point for our derivations is the law of total expectation

$$ E(Y) = E\big( E(Y|X) \big), \tag{5} $$

where the ‘outer’ expectation corresponds to the averaging over the random variable $E(Y|X)$. The law of total expectation can be iterated, such that for three random variables $Y, X, Z$, we have a law of total conditional expectation

$$ E(Y|Z) = E\big( E(Y|X,Z) \,\big|\, Z \big), \tag{6} $$

and thus $E(Y) = E\big(E(Y|Z)\big) = E\Big(E\big(E(Y|X,Z)\,\big|\,Z\big)\Big)$.

From the law of total expectation (5) one can obtain a covariance-matrix version of the law of total variance

$$ \mathrm{Cov}(Y) = \mathrm{Cov}\big( E(Y|Z) \big) + E\big( \mathrm{Cov}(Y|Z) \big), \tag{7} $$

which can be confirmed by expanding the two sides of the above equality and applying (5).
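The law of total covariance (7) can be checked numerically on a small example. The following is a minimal sketch, assuming a hypothetical joint distribution of a binary variable $Z$ and a vector $Y$ in $\mathbb{R}^2$; the distribution and the helper names `mean`, `cov` and `add` are illustrative choices and are not taken from the paper. Exact rational arithmetic is used so that the two sides of (7) can be compared without rounding errors.

```python
from fractions import Fraction as F

# Hypothetical joint distribution: entries are (z, y, probability), y in R^2.
joint = [
    (0, (F(1), F(0)), F(1, 4)),
    (0, (F(0), F(2)), F(1, 4)),
    (1, (F(3), F(1)), F(1, 3)),
    (1, (F(1), F(1)), F(1, 6)),
]

def mean(samples):
    """Expectation of a list of (vector, weight) pairs whose weights sum to 1."""
    dim = len(samples[0][0])
    return [sum(w * y[i] for y, w in samples) for i in range(dim)]

def cov(samples):
    """Covariance matrix E(Y Y^T) - E(Y) E(Y)^T of weighted samples, as in (1)."""
    mu = mean(samples)
    dim = len(mu)
    return [[sum(w * y[i] * y[j] for y, w in samples) - mu[i] * mu[j]
             for j in range(dim)] for i in range(dim)]

def add(a, b):
    """Entrywise sum of two square matrices."""
    return [[a[i][j] + b[i][j] for j in range(len(a))] for i in range(len(a))]

# Marginal distribution of Z and the conditional distributions of Y given Z = z.
p_z = {}
for z, _, p in joint:
    p_z[z] = p_z.get(z, 0) + p
cond = {z: [(y, p / p_z[z]) for zz, y, p in joint if zz == z] for z in p_z}

# Left-hand side of (7): Cov(Y).
total = cov([(y, p) for _, y, p in joint])
# First term on the right: Cov(E(Y|Z)), the covariance of the conditional means.
explained = cov([(mean(cond[z]), p_z[z]) for z in p_z])
# Second term on the right: E(Cov(Y|Z)), the averaged conditional covariances.
cond_cov = {z: cov(cond[z]) for z in p_z}
residual = [[sum(p_z[z] * cond_cov[z][i][j] for z in p_z) for j in range(2)]
            for i in range(2)]

print(total == add(explained, residual))
```

Both terms on the right-hand side are positive semidefinite, which is the property that Lemma 1 exploits when the splitting is iterated over several conditioning variables.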
For three random variables $Y, W, Z$ a conditional version of the law of total covariance reads

$$ \mathrm{Cov}(Y|Z) = \mathrm{Cov}\big( E(Y|W,Z) \,\big|\, Z \big) + E\big( \mathrm{Cov}(Y|W,Z) \,\big|\, Z \big), \tag{8} $$

which can be obtained by expanding the right-hand side and applying the law of total conditional expectation (6).

The following lemma is obtained via an iterated application of the law of total covariance (7) and the law of total conditional covariance (8). One may note the similarities with the chain rule for entropies (see e.g. chapter 2 in [48]).

Lemma 1. Let $Y$ be a vector-valued random variable on a finite-dimensional real or complex inner product space $V$, and let $X_1, \ldots, X_N$ be random variables over the same probability space. Assuming that the underlying measure is such that all involved conditional expectations and covariances are well defined, then

$$ \mathrm{Cov}(Y) = R + \sum_{n=1}^{N} C_n, \tag{9} $$

where $R$ and $C_1, \ldots, C_N$ are positive semidefinite operators on the space $V$, defined by

$$ \begin{aligned} C_1 &:= \mathrm{Cov}\big( E(Y|X_1) \big), \\ C_n &:= E\Big( \mathrm{Cov}\big( E(Y|X_1,\ldots,X_n) \,\big|\, X_1,\ldots,X_{n-1} \big) \Big), \quad n = 2,\ldots,N, \\ R &:= E\big( \mathrm{Cov}(Y|X_1,\ldots,X_N) \big). \end{aligned} \tag{10} $$

One may note that the above decomposition is not necessarily unique; we could potentially obtain a new decomposition if the variables in the sequence $X_1, \ldots, X_N$ are permuted.

Proof. The law of total covariance (7) for $Z := X_1$, combined with the law of total conditional covariance (8) for $Z := X_1$, $W := X_2$ yields

$$ \mathrm{Cov}(Y) = \mathrm{Cov}\big( E(Y|X_1) \big) + E\Big( \mathrm{Cov}\big( E(Y|X_2,X_1) \,\big|\, X_1 \big) \Big) + E\big( \mathrm{Cov}(Y|X_2,X_1) \big). \tag{11} $$

Suppose that for some $j \geq 2$ it is true that

$$ \begin{aligned} \mathrm{Cov}(Y) ={}& \mathrm{Cov}\big( E(Y|X_1) \big) \\ &+ \sum_{n=2}^{j} E\Big( \mathrm{Cov}\big( E(Y|X_1,\ldots,X_n) \,\big|\, X_1,\ldots,X_{n-1} \big) \Big) \\ &+ E\big( \mathrm{Cov}(Y|X_1,\ldots,X_j) \big). \end{aligned} \tag{12} $$

The law of total conditional covariance (8), with $W := X_{j+1}$ and $Z := X_1, \ldots, X_j$, gives

$$ \mathrm{Cov}(Y|X_1,\ldots,X_j) = \mathrm{Cov}\big( E(Y|X_1,\ldots,X_j,X_{j+1}) \,\big|\, X_1,\ldots,X_j \big) + E\big( \mathrm{Cov}(Y|X_1,\ldots,X_j,X_{j+1}) \,\big|\, X_1,\ldots,X_j \big). $$

By inserting this expression into the last line of (12) one again obtains (12), but with $j$ replaced by $j+1$. By (11) we can see that (12) is true for $j = 2$. Thus, by induction up to $j = N$, and the identifications in (10), we obtain (9).

Note that $\mathrm{Cov}\big( E(Y|X_1,\ldots,X_{n-1},X_n) \,\big|\, X_1 = x_1,\ldots,X_{n-1} = x_{n-1} \big)$ is a positive semidefinite operator on $V$ for each value of $x_1, \ldots, x_{n-1}$. Hence, by averaging over these variables, and thus implementing the expectation that yields $C_n$, we do still have a positive semidefinite operator on $V$. The same observation applies to $R = E\big( \mathrm{Cov}(Y|X_1,\ldots,X_N) \big)$.

III. OBSERVABLE VS. LATENT VARIABLES, AND FEATURE MAPS

Here we consider the decomposition developed in the previous section for the more specific setting of observable and latent variables.

We consider a collection of observable variables $O_1, \ldots, O_M$. To each of these variables $O_m$ we associate a mapping $Y^{(m)}$, in some contexts referred to as a ‘feature map’ [49], into a finite-dimensional vector space $V_m$. We denote the resulting vector-valued random variables by $Y_m := Y^{(m)}(O_m)$, and for the sake of simplicity we will in the following tend to abuse the terminology and refer to the vectors $Y_m$ themselves as feature maps. We also define the joint random vector $Y := \sum_{m=1}^{M} Y_m$ on $V := \bigoplus_{m=1}^{M} V_m$. (Hence, we can view $Y$ as the concatenation of the vectors $Y_m$.) One should note that while we regard the observable variables $O_m$ as being part of the setup that is ‘given’, the feature maps $Y^{(m)}$ are part of the analysis, and we are free to assign these as we see fit. (Concerning the question of how the test depends on this choice, see section VII.)

Let $P_m$ denote the projector onto the subspace $V_m$ in $V$.
We divide the total covariance matrix $\mathrm{Cov}(Y)$ into the cross-correlations between the separate observable quantities, $\mathrm{Cov}(Y) = [\mathrm{Cov}(Y_m, Y_{m'})]_{m,m'=1}^{M}$. One can note that $\mathrm{Cov}(Y_m, Y_{m'}) = P_m \mathrm{Cov}(Y) P_{m'}$.

FIG. 2. Observables, latent variables, and feature maps. The model consists of a collection of observable variables $O_1, \ldots, O_M$ and a collection of latent variables $L_1, \ldots, L_N$. Via feature maps, each $O_m$ is mapped to a vector $Y_m$ in a vector space $V_m$. On the vector space $V = \bigoplus_{m=1}^{M} V_m$ we define the joint random vector $Y := Y_1 + \cdots + Y_M$.

For a collection of latent variables $L_1, \ldots, L_N$, we make the identifications $X_j := L_j$ in Lemma 1. Similarly as for the covariance matrix we decompose the operators $C_n$ and $R$ into ‘block-matrices’ $C_n = [C_n^{m,m'}]_{m,m'=1}^{M}$ and $R = [R^{m,m'}]_{m,m'=1}^{M}$, with $C_n^{m,m'} := P_m C_n P_{m'}$ and $R^{m,m'} := P_m R P_{m'}$, where we can write

$$ \begin{aligned} C_1^{m,m'} &= \mathrm{Cov}\big( E(Y_m|L_1),\, E(Y_{m'}|L_1) \big), \\ C_n^{m,m'} &= E\Big( \mathrm{Cov}\big( E(Y_m|L_1,\ldots,L_n),\, E(Y_{m'}|L_1,\ldots,L_n) \,\big|\, L_1,\ldots,L_{n-1} \big) \Big), \\ R^{m,m'} &= E\big( \mathrm{Cov}(Y_m,\, Y_{m'}|L_1,\ldots,L_N) \big), \end{aligned} \tag{13} $$

for $2 \leq n \leq N$. In terms of these blocks we can thus reformulate (9) as

$$ \mathrm{Cov}(Y_m, Y_{m'}) = R^{m,m'} + \sum_{n=1}^{N} C_n^{m,m'}. \tag{14} $$

One should keep in mind that $C_n^{m,m'}$ and $R^{m,m'}$ in the general case are matrices (rather than scalar numbers) for each single pair $m, m'$.

IV. DECOMPOSITION OF THE COVARIANCE MATRIX FOR BIPARTITE DAGS

We define a bipartite DAG as a finite DAG $G = (V, E)$ with vertices $V$ and edges $E$, with a bipartition $V = O \cup L$, $O \cap L = \emptyset$ such that all edges in $E$ are directed from the elements in $L$ (the latent variables) to the elements in $O$ (the observables). Since $G$ is finite, we enumerate the elements of $O$ as $O_1, \ldots, O_M$ and the elements of $L$ as $L_1, \ldots, L_N$.
One may note that we generally will overload the notation and let $O_m$ and $L_n$ denote the vertices in the underlying bipartite DAG, as well as the random variables associated with these vertices.

For a vertex $v$ in a directed graph $G$ we let $\mathrm{ch}(v)$ denote the children of $v$, i.e., the set of vertices $v'$ for which there is an edge directed from $v$ to $v'$. We let $\mathrm{pa}(v)$ denote the parents of $v$, i.e., the set of vertices $v'$ for which there is an edge directed from $v'$ to $v$. For bipartite DAGs an element in $L$ can only have children in $O$ (and have no parents), and an element in $O$ can only have parents in $L$ (and no children). As an example, for the bipartite DAG in figure 1 we have $\mathrm{ch}(L_1) = \{O_1, O_2, O_3\}$, $\mathrm{ch}(L_2) = \{O_2, O_5\}$, and $\mathrm{ch}(L_3) = \{O_3, O_5\}$, and $\mathrm{pa}(O_1) = \{L_1\}$, $\mathrm{pa}(O_2) = \{L_1, L_2\}$, $\mathrm{pa}(O_3) = \{L_1, L_3\}$, $\mathrm{pa}(O_4) = \emptyset$, and $\mathrm{pa}(O_5) = \{L_2, L_3\}$.

For a causal model defined by a general DAG $G = (V, E)$ the underlying probability distribution can be described via the Markov condition where each edge represents a direct causal influence, and thus each vertex $v$ can only be directly influenced by its parents $\mathrm{pa}(v)$, resulting in distributions of the form $P = \prod_{v \in V} P\big( v \,\big|\, \mathrm{pa}(v) \big)$. Hence, for a bipartite DAG we get $P = \prod_{m} P\big( O_m \,\big|\, \mathrm{pa}(O_m) \big) \prod_{n} P(L_n)$, and thus all the latent variables are independent, and the observables are independent when conditioned on the latent variables.

As in the previous section, we map the observables $O_1, \ldots, O_M$ to vectors $Y_1, \ldots, Y_M$ in vector spaces $V_1, \ldots, V_M$. For each $n$ we define the projector $P^{(n)}$ in $V$ by

$$ P^{(n)} := \sum_{m \in \mathrm{ch}(L_n)} P_m. \tag{15} $$

Hence, $P^{(n)}$ is the projector onto all subspaces of $V$ that are associated with the children $\mathrm{ch}(L_n)$ of the latent variable $L_n$. (In the above sum we should strictly speaking write $\sum_{m : O_m \in \mathrm{ch}(L_n)}$. However, in order to avoid a too cumbersome notation we will from time to time take the liberty of writing $m \in \mathrm{ch}(L_n)$ rather than $O_m \in \mathrm{ch}(L_n)$, and $n \in \mathrm{pa}(O_m)$ rather than $L_n \in \mathrm{pa}(O_m)$.)

FIG. 3.
Example: Triangular bipartite DAG. The covariance matrix resulting from the observables in a bipartite DAG is subject to a decomposition where each latent variable gives rise to a positive semidefinite component, and where the support of that component is determined by the children of the corresponding latent variable. In the case of the ‘triangular’ scenario of the bipartite DAG to the left, each of the three latent variables has two children. The covariance matrix, schematically depicted to the right, can consequently be decomposed into three positive semidefinite components, each with bipartite supports. This observation yields a method (which we refer to as the ‘semidefinite test’) to falsify a given bipartite DAG as an explanation of an observed covariance matrix.

Proposition 1. For a bipartite DAG with latent variables $L_1, \ldots, L_N$ and observables $O_1, \ldots, O_M$ with assigned feature maps $Y_1, \ldots, Y_M$ into finite-dimensional real or complex inner-product spaces $V_1, \ldots, V_M$, the covariance matrix of $Y = \sum_{m=1}^{M} Y_m$ satisfies

$$ \mathrm{Cov}(Y) = R + \sum_{n=1}^{N} C_n, \quad R \geq 0, \quad C_n \geq 0, \tag{16} $$

where

$$ P^{(n)} C_n P^{(n)} = C_n, \quad R = \sum_{m=1}^{M} P_m R P_m, \tag{17} $$

and where the projectors $P^{(n)}$ are as defined in (15) with respect to the given bipartite DAG, and where $P_m$ is the projector onto $V_m$ in $\bigoplus_{m=1}^{M} V_m$.

One may note that if the span of the supports of $\{P^{(n)}\}_{n=1}^{N}$ covers $V$, then we can distribute the blocks $P_m R P_m$ of $R$ and add them to the different $C_n$ in such a way that the new operators still are positive semidefinite and satisfy the support structure of the original $C_n$'s. The exception is if there is some observable that has no parent (as $O_4$ in figure 1).

Proof. Select an enumeration $L_1, \ldots, L_N$ of the latent variables. By Lemma 1 we know that the covariance matrix $\mathrm{Cov}(Y)$ can be decomposed as in (9) with the positive semidefinite operators $R$ and $C_n$ as defined in (10). In the following we will make use of the block-decomposition $C_n = [C_n^{m,m'}]_{m,m'=1}^{M}$ and $R = [R^{m,m'}]_{m,m'=1}^{M}$ with respect to the subspaces $V_1, \ldots, V_M$ as in (13).
If $L_n \notin \mathrm{pa}(O_m)$ then it means that $Y_m$ is independent of $L_n$ and thus

$$ E(Y_m|L_1,\ldots,L_n) = E(Y_m|L_1,\ldots,L_{n-1}). $$

The analogous statement is true if $L_n \notin \mathrm{pa}(O_{m'})$. By this it follows that

$$ \mathrm{Cov}\big( E(Y_m|L_1,\ldots,L_n),\, E(Y_{m'}|L_1,\ldots,L_n) \,\big|\, L_1,\ldots,L_{n-1} \big) = 0, \quad \text{if } L_n \notin \mathrm{pa}(O_m) \cap \mathrm{pa}(O_{m'}). \tag{18} $$

Note that $L_n \in \mathrm{pa}(O_m) \cap \mathrm{pa}(O_{m'}) \Leftrightarrow O_m, O_{m'} \in \mathrm{ch}(L_n)$. By comparing (18) with (13) we can conclude that $C_n^{m,m'} = 0$ if $O_m \notin \mathrm{ch}(L_n)$ or $O_{m'} \notin \mathrm{ch}(L_n)$. The definition of the projector $P^{(n)}$ in (15) thus yields $P^{(n)} C_n P^{(n)} = C_n$. Moreover, we know from Lemma 1 that $C_n \geq 0$.

By construction, all the observables $O_1, \ldots, O_M$ and thus also $Y_1, \ldots, Y_M$ are independent when conditioned on the latent variables. Hence,

$$ R^{m,m'} = E\big( \mathrm{Cov}(Y_m,\, Y_{m'}|L_1,\ldots,L_N) \big) = \delta_{m,m'}\, E\big( \mathrm{Cov}(Y_m|L_1,\ldots,L_N) \big), $$

and thus $R = \sum_{m=1}^{M} P_m R P_m$.

One may note that although the operators $C_n$ potentially may change if we generate them via a permutation of the sequence of latent variables $L_1, \ldots, L_N$, the resulting projectors $P^{(n)}$ do not change. Hence, the support structure described by (16) and (17) is stable under rearrangements of the sequence.

Deciding whether a given matrix is of the form (16) can be done via semidefinite programming (SDP). We end this section by describing an explicit SDP formulation.

The optimization will be over matrices $Z$ which can be interpreted as the direct sum of candidates for $R$ and the $C_n$'s. More precisely, let

$$ \mathcal{Z} := V_1 \oplus \cdots \oplus V_M \oplus W_1 \oplus \cdots \oplus W_N, \tag{19} $$

$$ W_n := \bigoplus_{m \in \mathrm{ch}(L_n)} V_m. \tag{20} $$

Let $Z$ be a matrix on $\mathcal{Z}$. According to the direct sum decomposition (19), the matrix $Z$ is a block matrix with $(M+N) \times (M+N)$ blocks. We think of the first $M$ diagonal blocks as carrying candidates for $R_m = P_m R P_m$ (which completely defines $R$, according to (17)), while the rear $N$ diagonal blocks correspond to candidate $C_n$'s. Note that the $N$ rear summands in (19) are direct sums themselves. It therefore makes sense to use double indices to refer to spaces inside the $W_n$'s.
Concretely, the SDP includes affine constraints on the blocks $Z_{(M+n,m),(M+n,m')}$. The first part of the indices selects the space $W_n$ in (19). The second part refers to the space $V_m$ within $W_n$ according to (20). We use the convention that $Z_{(M+n,m),(M+n,m')}$ denotes 0 if either $V_m$ or $V_{m'}$ does not occur in $W_n$.

With these definitions, the semidefinite program that verifies whether a covariance matrix $\mathrm{Cov}(Y)$ is of the form (16) reads

$$ \text{maximize} \quad 0 \tag{21} $$

$$ \text{subject to} \quad \delta_{m,m'}\, Z_{(m),(m)} + \sum_{n=1}^{N} Z_{(M+n,m),(M+n,m')} = \mathrm{Cov}(Y)_{m,m'}, \quad (m, m' = 1, \ldots, M) \tag{22} $$

$$ Z \geq 0, \tag{23} $$

where the optimization is over symmetric (Hermitian) matrices $Z$ on $\mathcal{Z}$. Up to a trivial re-expression of the linear functions of $Z$ in terms of trace inner products with suitable matrices $F_i$, the optimization problem above is in the (dual) standard form of an SDP [46, Section 3].

The left-hand side of (22) implicitly defines a linear map $A$ from matrices on $\mathcal{Z}$ to matrices on $V$. Explicitly, $A$ maps off-diagonal blocks to 0 and acts on block-diagonal matrices as

$$ A : R_1 \oplus \cdots \oplus R_M \oplus C_1 \oplus \cdots \oplus C_N \mapsto \sum_{m} R_m + \sum_{n} C_n. $$

The constraints of the SDP can thus be written slightly more transparently as

$$ A(Z) = \mathrm{Cov}(Y), \tag{24} $$

$$ Z \geq 0. \tag{25} $$

In this language, the dual of the above SDP is

$$ \text{minimize} \quad \mathrm{tr}\big( X\, \mathrm{Cov}(Y) \big) \tag{26} $$

$$ \text{subject to} \quad A^{\dagger}(X) \geq 0. \tag{27} $$

Let $X^{\star}$ be the optimizer of (26). If $\mathrm{tr}\big( X^{\star} \mathrm{Cov}(Y) \big) < 0$, then the original SDP is infeasible and therefore $\mathrm{Cov}(Y)$ is not of the form (16). Indeed, by construction, such an $X^{\star}$ has a negative trace inner product with the covariance matrix, but a non-negative trace inner product

$$ \mathrm{tr}\big( A(Z) X \big) = \mathrm{tr}\big( Z A^{\dagger}(X) \big) \geq 0 \quad \forall Z \geq 0 $$

with all matrices $A(Z)$, $Z \geq 0$, that could potentially be feasible for the primal SDP (24). Thus, the dual SDP (26) can be used to find a witness or a dual certificate $X^{\star}$ for the incompatibility of a covariance matrix with a presumed causal structure. The geometry of the involved objects is shown in figure 4.
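The role of a dual certificate can be illustrated on the triangular scenario of figure 3. The following is a minimal sketch, assuming scalar feature maps (every $V_m$ is one-dimensional, so each block is a single number); the helper names and the hand-picked witness `x` are illustrative assumptions and are not the output of an SDP solver. Under this assumption the constraint $A^{\dagger}(X) \geq 0$ reduces to non-negative diagonal entries of $X$ (the blocks dual to $R$) together with positive semidefinite principal submatrices of $X$ on the children of each latent variable (the blocks dual to the $C_n$).

```python
from itertools import combinations

def det(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def submatrix(m, idx):
    """Principal submatrix of m on the index list idx."""
    return [[m[i][j] for j in idx] for i in idx]

def is_psd(m):
    """A symmetric matrix is PSD iff all of its principal minors are >= 0."""
    n = len(m)
    return all(det(submatrix(m, list(idx))) >= 0
               for k in range(1, n + 1)
               for idx in combinations(range(n), k))

def is_witness(x, cov, children):
    """True if x satisfies the dual constraint (27) and tr(x cov) < 0."""
    n = len(x)
    diag_ok = all(x[m][m] >= 0 for m in range(n))            # blocks dual to R
    blocks_ok = all(is_psd(submatrix(x, sorted(ch)))          # blocks dual to C_n
                    for ch in children)
    trace = sum(x[i][j] * cov[j][i] for i in range(n) for j in range(n))
    return diag_ok and blocks_ok and trace < 0

# Triangular DAG: each latent variable has two children (0-based observable indices).
triangle = [{0, 1}, {1, 2}, {0, 2}]
# Hypothetical covariance of three perfectly correlated unit-variance observables.
perfectly_correlated = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
# Hand-picked candidate witness (an assumption of this sketch).
x = [[1, -1, -1], [-1, 1, -1], [-1, -1, 1]]

print(is_witness(x, perfectly_correlated, triangle))
```

Every $2 \times 2$ principal submatrix of `x` is positive semidefinite, while the trace inner product of `x` with the all-ones matrix is $3 - 6 = -3$. Hence, under the stated assumptions, this particular covariance matrix admits no decomposition of the form (16) for the triangular DAG. For larger problems the certificate would be obtained from an SDP solver rather than by hand.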
We will refer to this dual construction in section XI, where we sketch possibilities to base statistical hypothesis tests on such witnesses.

FIG. 4. Dual Certificates. The set of covariance matrices compatible with a certain causal structure in the sense of proposition 1 forms a convex cone $\Gamma$. The cone is the feasible set of the SDP (21). If a given covariance matrix $\mathrm{Cov}(Y)$ is not an element of that cone, then there exists a hyperplane (depicted in red) separating the two convex sets. A normal vector $X^{\star}$ for the separating hyperplane can be found using the dual SDP (26).

V. REALIZING A GIVEN DECOMPOSITION

In the previous section we have shown that the observable covariance matrix associated with a given bipartite DAG always satisfies a particular semidefinite decomposition implied by that DAG. Here we show the converse, in the sense that if we have a positive semidefinite operator that satisfies the decomposition obtained from a particular bipartite DAG, then there exists a causal model associated with that DAG that has the given operator as its observable covariance matrix (see figure 5). The proof is based on the observation that each positive semidefinite operator on a vector space can be interpreted as the covariance of a vector-valued random variable on that space (e.g. as the covariance of a multivariate normal distribution, or of a variable over a finite alphabet, as discussed in section V B). The essential idea is that we assign an independent random variable to each component in the decomposition, and take these as the latent variables, and that the support structure of the components furthermore determines the children of the latent variables.

A. Realization of decompositions

Let $O$ be a finite set, and let $\{\Omega_n\}_{n=1}^{N}$ be a collection of subsets of $O$. The collection $\{\Omega_n\}_{n=1}^{N}$ defines a bipartite DAG with $O$ as observable nodes, and a set of latent nodes $L_1, \ldots, L_N$, with the edges assigned by the identification $\mathrm{ch}(L_n) := \Omega_n$ for $n = 1, \ldots, N$. In the following we denote this bipartite DAG by $B(\{\Omega_n\}_{n=1}^{N})$.

Proposition 2. Let $V_1, \ldots, V_M$ be finite-dimensional real or complex inner-product spaces.
For a number $N$ let $\{\Omega_n\}_{n=1}^{N}$ be a collection of subsets $\Omega_n \subset \{1, \ldots, M\}$. Suppose that $Q$ is a positive semidefinite operator on the space $V = V_1 \oplus \cdots \oplus V_M$, and that it can be written

$$ Q = R + \sum_{n=1}^{N} C_n, \quad P^{(n)} C_n P^{(n)} = C_n, \quad R \geq 0, \quad C_n \geq 0, \tag{28} $$

for

$$ P^{(n)} = \sum_{m \in \Omega_n} P_m, \quad R = \sum_{m=1}^{M} P_m R P_m, \tag{29} $$

with $P_m$ being the projectors onto the subspaces $V_m$. Then there exists a causal model for the bipartite DAG $B(\{\Omega_n\}_{n=1}^{N})$ with vector-valued variables $Y_1, \ldots, Y_M$ in $V_1, \ldots, V_M$ such that $Y = Y_1 + \cdots + Y_M$ satisfies

$$ \mathrm{Cov}(Y) = Q. \tag{30} $$

FIG. 5. A positive semidefinite operator on a set of selected orthogonal subspaces can be regarded as the covariance matrix of a corresponding collection of vector-valued variables. If this operator separates into positive semidefinite components (as schematically depicted to the left), then the support structures of these components define a bipartite DAG (on the right). The components in the decomposition can be interpreted as the covariance matrices of independent vector-valued latent variables. Moreover, the collection of subspaces on which such an operator has support determines the observable children of the corresponding latent variable. Each observable variable can be constructed by adding the components collected from its parents.

Proof. Let us define the set $\Omega := \cup_{n=1}^{N} \Omega_n$ and its complement $\Omega^c := \{1, \ldots, M\} \setminus \Omega$. By construction, $\Omega^c$ is the set of observable nodes in the bipartite DAG $B(\{\Omega_n\}_{n=1}^{N})$ that have no parents (like vertex 4 in figure 1) and thus each element in $\Omega$ has at least one parent. By the definition of $P^{(n)}$ in (29) it follows that $\sum_{m' \in \Omega^c} P_{m'} C_n = C_n \sum_{m' \in \Omega^c} P_{m'} = 0$. In other words, the operators $C_n$ have no support on the subspaces belonging to parentless observable nodes.

Let us now turn to the operator $R$ and its block diagonal decomposition $R = \sum_{m=1}^{M} R_m$ with $R_m := P_m R P_m$. We can write $R = \sum_{m' \in \Omega^c} R_{m'} + \sum_{m \in \Omega} R_m$.
Consequently, $Q$ can be decomposed into one operator $\sum_{m \in \Omega} R_m + \sum_{n=1}^{N} C_n$ on the subspace $\bigoplus_{m \in \Omega} V_m$, and a collection of blocks $\{R_{m'}\}_{m' \in \Omega^c}$ on the corresponding subspaces $V_{m'}$ for $m' \in \Omega^c$.

Since $R_{m'}$ is positive semidefinite, it can be interpreted as the covariance matrix of some random vector $Y_{m'}$ in $V_{m'}$. In the following we assume that we have made such an assignment for all $m' \in \Omega^c$. We also assume that these random vectors are independent.

Each $R_m$ for $m \in \Omega$ has its support inside the support of at least one projector $P^{(n)}$, i.e., inside the allowed support of at least one $C_n$. Hence, we can ‘distribute’ the operators $R_m$ for $m \in \Omega$ by forming new positive semidefinite operators $\tilde{C}_n \geq 0$, with the same support structure as the $C_n$, such that

$$ \sum_{m \in \Omega} R_m + \sum_{n=1}^{N} C_n = \sum_{n=1}^{N} \tilde{C}_n =: \tilde{Q}, \tag{31} $$

where one may note that $Q = \sum_{m' \in \Omega^c} R_{m'} + \tilde{Q}$.

In the following we shall assign observable and latent random variables to the vertices of the bipartite DAG $B(\{\Omega_n\}_{n=1}^{N})$. For each $n \in \{1, \ldots, N\}$ and each $m \in \Omega$, let $\mathcal{L}^n_m$ be a vector space that is isomorphic to $V_m$, and let $\phi^n_m : \mathcal{L}^n_m \to V_m$ be an arbitrary isomorphism. (We assume that these isomorphisms preserve the inner-product structure, such that $\phi^n_m$ maps orthonormal bases of $\mathcal{L}^n_m$ to orthonormal bases of $V_m$.) We regard the spaces in the collection $\{\mathcal{L}^n_m\}_{m \in \Omega,\, n = 1, \ldots, N}$ as being orthogonal to each other. Define $\mathcal{L}^n := \bigoplus_{m \in \Omega} \mathcal{L}^n_m$, and the corresponding isomorphism $\phi^n := \sum_{m \in \Omega} \phi^n_m$. Since each $\tilde{C}_n$ is positive semidefinite, it can be interpreted as the covariance matrix of a vector-valued random variable on $\bigoplus_{m \in \Omega} V_m$. Consequently, we can also find a vector-valued random variable $L_n$ on $\mathcal{L}^n$ such that

$$ \tilde{C}_n = \mathrm{Cov}(\phi^n L_n) = \phi^n\, \mathrm{Cov}(L_n)\, \phi^{n\dagger}. \tag{32} $$

We assume that the random variables $L_1, \ldots, L_N$ are independent of each other, and also independent of $\{Y_{m'}\}_{m' \in \Omega^c}$. The variables $L_1, \ldots, L_N$ serve as the latent variables corresponding to the latent nodes in the bipartite DAG $B(\{\Omega_n\}_{n=1}^{N})$.
In the following we shall construct a collection of vector-valued variables $\{Y_m\}_{m \in \Omega}$ as deterministic functions of the latent variables $L_1, \ldots, L_N$, in such a way that these functions correspond to the arrows in $B(\{\Omega_n\}_{n=1}^{N})$, thus guaranteeing a valid causal model associated with this bipartite DAG.

Let us decompose the vector $L_n$ into its projections $L^n_m$ onto the subspaces $\mathcal{L}^n_m$. For each $m \in \Omega_n = \mathrm{ch}(L_n)$, the vector $L^n_m$ is associated to the observable node $O_m$. (One can imagine it to be transferred to node $O_m$.) Equivalently we can say that each observable node $m \in \Omega$ receives the vector $L^n_m$ from its parent $n \in \mathrm{pa}(O_m)$. On the observable node $m \in \Omega$ we construct a new vector $Y_m$ by adding all the vectors ‘sent to it’ from its parents

$$ Y_m := \sum_{n \in \mathrm{pa}(O_m)} \phi^n_m L^n_m = \sum_{n \in \mathrm{pa}(O_m)} \phi^n_m L_n = \sum_{n=1}^{N} \phi^n_m L_n, \tag{33} $$

where the last equality follows since $P_m C_n P_m = 0$ if $O_m \notin \mathrm{ch}(L_n)$, or equivalently if $L_n \notin \mathrm{pa}(O_m)$, and thus $\phi^n_m L_n = 0$ if $n \notin \mathrm{pa}(O_m)$.

The collection $\{Y_{m'}\}_{m' \in \Omega^c} \cup \{Y_m\}_{m \in \Omega}$ we take as the observable variables, and we define $Y := \sum_{m' \in \Omega^c} Y_{m'} + \sum_{m \in \Omega} Y_m = \sum_{m' \in \Omega^c} Y_{m'} + \sum_{n=1}^{N} \phi^n L_n$.

Due to the fact that all $Y_{m'}$ for $m' \in \Omega^c$ are independent, and also independent of all $L_n$, we get

$$ \begin{aligned} \mathrm{Cov}(Y) &= \sum_{m' \in \Omega^c} \mathrm{Cov}(Y_{m'}) + \mathrm{Cov}\Big( \sum_{n=1}^{N} \phi^n L_n,\, \sum_{n'=1}^{N} \phi^{n'} L_{n'} \Big) \\ &= \sum_{m' \in \Omega^c} R_{m'} + \sum_{n,n'=1}^{N} \phi^n\, \mathrm{Cov}(L_n, L_{n'})\, \phi^{n'\dagger} \\ &\qquad [L_1, \ldots, L_N \text{ are independent}] \\ &= \sum_{m' \in \Omega^c} R_{m'} + \sum_{n=1}^{N} \phi^n\, \mathrm{Cov}(L_n)\, \phi^{n\dagger} \\ &\qquad [\text{By (32)}] \\ &= \sum_{m' \in \Omega^c} R_{m'} + \sum_{n=1}^{N} \tilde{C}_n \\ &\qquad [\text{By (31)}] \\ &= Q. \end{aligned} $$

B. Positive semidefinite operators as covariance matrices of vector-valued random variables over finite alphabets

The material in the previous section presumes the existence of realizations of positive semidefinite operators as the covariance of some vector-valued variable, without making any restriction on the nature of that variable. As mentioned above, each positive semidefinite operator (over a finite-dimensional real or complex vector space) can be regarded as the covariance of a multivariate normal distribution.
However, suppose that we require that the variable can only take a finite number of outcomes. Here we briefly discuss the conditions for such realizations, and provide an explicit construction (in the proof of Lemma 3).

For a (possibly vector-valued) random variable over a finite alphabet, we say that the supported alphabet size is $D$ if there are precisely $D$ outcomes that occur with a non-zero probability.

Lemma 2. If a random variable $Y$ on a finite-dimensional real or complex inner-product space has a supported alphabet size $D$, then $\mathrm{rank}\big( \mathrm{Cov}(Y) \big) \leq D - 1$.

Proof. We first note that $\mathrm{Cov}(Y) = \sum_{j=1}^{D} p_j y_j y_j^{\dagger} - \sum_{j=1}^{D} p_j y_j \sum_{j'=1}^{D} p_{j'} y_{j'}^{\dagger}$. Since $\sum_{j=1}^{D} p_j y_j$ manifestly is a linear combination of $y_1, \ldots, y_D$, it follows that the range of $\sum_{j=1}^{D} p_j y_j \sum_{j'=1}^{D} p_{j'} y_{j'}^{\dagger}$ is a subset of the range of $\sum_{j=1}^{D} p_j y_j y_j^{\dagger}$, and thus $\mathrm{rank}\big( \mathrm{Cov}(Y) \big) \leq \mathrm{rank}\big( \sum_{j=1}^{D} p_j y_j y_j^{\dagger} \big) \leq D$. However, in the following we shall show that the stronger inequality

