
Generalized Hyper Markov Laws for Directed Acyclic Graphs

Emanuel Ben-David (Stanford University) and Bala Rajaratnam (Stanford University)

arXiv:1109.4371v3 [math.ST] 1 Jan 2012

Abstract

Recent theoretical work ([15, 9]) in graphical models has introduced classes of flexible multi-parameter Wishart distributions for high dimensional Bayesian inference. A parallel analysis for DAGs, or Bayesian networks, arguably one of the most widely used classes of graphical models, is however not available. For Gaussian DAG models the parameter of interest is the Cholesky space of lower triangular matrices with fixed zeros corresponding to the missing arrows of a directed acyclic graph G. In this paper we construct a family of DAG Wishart distributions that form a rich conjugate family of priors with multiple shape parameters for Gaussian DAG models, and proceed to undertake a theoretical analysis of this class with the goal of posterior inference. We first prove that our family of DAG Wishart distributions satisfies the strong directed hyper Markov property. Operating on the Cholesky space we derive closed form expressions for normalizing constants, posterior moments, Laplace transforms and posterior modes, and demonstrate the use of the DAG Wishart class in posterior analysis. We then consider submanifolds of the cone of positive definite matrices that correspond to covariance and concentration matrices of Gaussian DAG models. In general these spaces are curved manifolds and thus the DAG Wisharts have no density w.r.t. Lebesgue measure. Hence tools for posterior inference on these spaces are not immediately available. We tackle the problem in three parts, with each part building on the previous one, until a complete solution is available for ALL DAGs. In Part I we note that when G is perfect, the associated covariance and concentration spaces are open cones, and hence we proceed to derive the induced DAG Wishart distribution on these cones. A comprehensive analysis is however only possible for the class of perfect DAGs. In Part II we formally establish that for any non-perfect DAG, covariance and concentration spaces have Lebesgue measure zero in any Euclidean vector space containing them, and hence the DAG Wishart family introduced above does not have a density w.r.t. Lebesgue measure for non-perfect G. We therefore propose a unified approach for all Gaussian DAG models by appealing to the theory of Hausdorff measure. First we derive the functional form of the DAG Wishart density w.r.t. Hausdorff measure. We demonstrate however that even for the simplest of graphs, the Hausdorff density is not amenable to posterior analysis. In Part III we define new spaces that are projections of covariance and concentration DAG spaces onto Euclidean space and that yield natural isomorphisms. We exploit this bijection to derive the densities of DAG Wishart and DAG inverse Wishart distributions w.r.t. Lebesgue measure, and thus avoid recourse to Hausdorff densities. We demonstrate that this third approach is extremely beneficial and is readily amenable to high dimensional posterior analysis. We derive hyper Markov properties and posterior moments for DAG Wishart and inverse Wishart distributions corresponding to arbitrary DAGs, and not just for the class of perfect DAGs.

1 Introduction

Graphical models yield compact representations of the joint probability distribution of a multivariate random vector, and they have therefore proved to be very useful in discovering structure, especially in high-dimensional data.
These models use the nodes of a graph to represent the components of a random vector, and edges between these nodes to capture the relationships between the variables. In general these graphs can have three types of edges: directed, undirected or bi-directed. Undirected graphs are often used to represent association through conditional independences, whereas bi-directed graphs are often used to represent marginal independences. Directed acyclic graphical models (DAGs), sometimes referred to as Bayesian networks (also called recursive Markov models), are often used to represent causal relationships among random variables. Graphical Markov models corresponding to DAGs have useful statistical properties, especially in high dimensional settings. The joint probability density function (pdf) of a DAG model factorizes according to the graph into the product of the conditional pdfs for each variable given its parents, and can thus lead to a substantial reduction in the dimensionality of the parameter space. Directed acyclic graphical models have also found widespread use in the biomedical sciences, social sciences and in computer science. Estimating the covariance or inverse covariance matrix corresponding to such DAGs is therefore an important area of research, especially in high dimensional settings.
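To make the factorization above concrete, here is the joint pdf for a toy three-variable chain DAG (our illustration, not an example from the paper), followed by the general form that reappears in Section 2.3:

```latex
% Toy example (ours): the chain DAG 3 -> 2 -> 1,
% so that pa(1) = {2}, pa(2) = {3}, pa(3) = \emptyset.
p(x_1, x_2, x_3) = p(x_1 \mid x_2)\, p(x_2 \mid x_3)\, p(x_3)
% For a general DAG G = (V, E):
p(x) = \prod_{i \in V} p\left(x_i \mid x_{\mathrm{pa}(i)}\right)
```

In the Gaussian case each factor is a univariate linear regression of x_i on x_pa(i), so a DAG in which every vertex has at most k parents involves on the order of km regression parameters rather than the m(m+1)/2 entries of an unrestricted covariance matrix.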
From a theoretical statistics perspective, DAG models correspond to curved exponential families, and are distinctly different from the standard undirected or concentration graph models, which correspond to natural exponential families. Unlike in the natural exponential family setup, a default prior, as given by the Diaconis-Ylvisaker (DY) framework, is not available for a general DAG (see [9] for a thorough discussion). A connection between DAGs and undirected or concentration graphs can however be used to derive a default prior for a subclass called perfect graphs. Indeed, if the graph is "perfect" (a concept that will be formally defined later), such DAGs are said to be Markov equivalent to decomposable concentration graph models, i.e., they both capture the same set of conditional independences. This connection can be exploited in the sense that inferential tools for perfect DAGs can be borrowed from the decomposable concentration graph setting. More specifically, in their pioneering work, Dawid and Lauritzen [7] developed the DY prior for this class of models when the graph is decomposable. In particular, they introduced the hyper-inverse Wishart as the DY conjugate prior for concentration graph models. This work has been extended by the recent methodological contributions of Letac and Massam [15], who develop a rich family of multi-parameter conjugate priors that subsumes the DY class. Both the hyper inverse Wishart priors and the "Letac-Massam" priors have attractive properties which enable Bayesian inference, with the latter allowing multiple shape parameters and hence suitable in high-dimensional settings. Bayesian procedures corresponding to these Letac-Massam priors have been derived in a decision theoretic framework in the recent work of Rajaratnam et al. [19]. A parallel theory for the class of Gaussian covariance graph models, graphical models which encode marginal independencies, has been recently developed by Khare and Rajaratnam ([9], [10]). With a few exceptions (see [21] for instance), all of the above methodological contributions are for decomposable graph models. Furthermore, the results for undirected or concentration graph models cannot be carried over to DAGs when G is no longer perfect. This is because the Markov equivalence property between DAGs and undirected graphical models breaks down, in the sense that the DAG model and the undirected graphical model capture different sets of conditional independencies when G is not perfect (or, equivalently, non-decomposable).

The literature on graphical models in general, and DAGs in particular, is extensive and thus we do not undertake a literature review of the work in this area. We shall however briefly review work that is directly relevant to this paper when notation and terminology are introduced.

The principal objective of this paper is to develop a framework for flexible high dimensional Bayesian inference for Gaussian DAG models, i.e., for the class of Gaussian distributions that have the directed Markov property. The previously established classes of generalized multi-parameter Wishart distributions developed by Letac and Massam [15] in the concentration graph setting, and by Khare and Rajaratnam [9] in the covariance graph model setting, are not directly applicable to the general DAG setting, though they provide useful insights, as will be demonstrated in this paper.

For Gaussian DAG models the parameter of interest, denoted by Θ_G, is the space of lower triangular matrices with fixed zeros corresponding to the missing arrows of a directed acyclic graph G. We introduce a rich class of generalized multi-parameter DAG Wishart distributions on Θ_G, constructed and studied with the explicit goal of Bayesian inference in high dimensional settings. This family extends the classical Wishart distribution in the sense that the latter becomes a special case of our family of DAG Wishart distributions. A comprehensive analysis of this family of generalized Wishart distributions is possible for arbitrary DAGs when working on Θ_G. Indeed, analytic expressions for posterior moments, Laplace transforms, posterior modes and hyper Markov properties are established. The hyper Markov property in turn enables the explicit computation of expected values and Laplace transforms in the Cholesky parameterization. Unlike their concentration and covariance graph counterparts, we show that sampling from our Wishart distribution for an arbitrary DAG model does not require recourse to MCMC. Once more we note that for concentration graph models, sampling from the posterior can be done in closed form only for decomposable models. For covariance graph models, sampling from the posterior without recourse to MCMC can be done only for homogeneous graphs. We also show that our DAG Wishart distributions can be derived in an equivalent way using the general approach in [7] under the so-called global independence assumption. The latter approach does not however immediately give a means to specify hyper-parameters that will correspond to our DAG Wishart distributions. We also provide a discussion of the fact that our DAG Wishart distributions are in general different from the Letac-Massam priors. However, when the underlying graph is homogeneous, the Letac-Massam W_{P_G} priors are a special case of our distributions. We also note that our DAG Wishart distributions are in general different from the priors introduced in Khare and Rajaratnam [9] for covariance graph models.

After introducing our class of flexible hyper Markov laws, we explicitly tackle the question of deriving tools for Bayesian inference for Gaussian DAG models. In order to do this we then consider submanifolds of the cone of positive definite matrices that correspond to covariance and concentration matrices of Gaussian DAG models.
In general these spaces are curved manifolds and thus the DAG Wisharts have no density w.r.t. Lebesgue measure. Hence tools for posterior inference on these spaces are not immediately available. We proceed to tackle this problem in three parts, with each part building on the previous one, until a complete solution is available for ALL DAGs.

In Part I we derive DAG Wishart densities for perfect DAGs. It was noted above that the spaces of covariance and concentration matrices corresponding to Gaussian DAGs are in general curved sub-manifolds of Euclidean space. When G is perfect, however, these spaces are open cones and the induced DAG Wishart density on these cones can be derived. We then proceed to derive Laplace transforms and expected values in this setting. Computation of expected values of covariance and concentration matrices corresponding to DAG models is not possible with this approach except when G is perfect, as the spaces on which these matrices live are in general curved manifolds. We note that a comprehensive framework for Bayesian inference that goes beyond "perfect" graphs is however critical for practical applications. The induced Wishart and inverse Wishart distributions on concentration and covariance spaces for general DAGs require more sophisticated tools and are the subject of Parts II and III.

In Part II we undertake the endeavor of deriving the induced Wishart and inverse Wishart densities on covariance and concentration spaces corresponding to arbitrary DAGs. We first establish that for any non-perfect DAG, covariance and concentration spaces have Lebesgue measure zero in any Euclidean vector space containing them, and hence the DAG Wishart family π^P_{U_G,α} introduced in our previous work does not have a density w.r.t. Lebesgue measure. We propose to overcome this in two novel ways. First we derive the functional form of the density of π^P_{U_G,α} w.r.t. Hausdorff measure, by developing the appropriate tools which allow us to work on concentration spaces corresponding to DAGs. This approach entails working with curved manifolds and Hausdorff measures on arbitrary metric spaces. We then proceed to demonstrate that even for the simplest of graphs, the Hausdorff density is not amenable to posterior analysis.

In Part III we define new spaces that are projections of covariance and concentration DAG spaces onto Euclidean space and that yield natural isomorphisms. In particular, these new spaces, termed the "incomplete" covariance and concentration spaces, correspond to the functionally independent elements of the covariance and concentration matrices of Gaussian DAG models. Given incomplete matrices from these spaces, it is always possible to "complete" them in polynomial time, so that the completion corresponds to covariance and concentration matrices of Gaussian DAG models. We exploit these bijections to derive the densities of DAG Wishart and DAG inverse Wishart distributions w.r.t. Lebesgue measure, and thus avoid recourse to Hausdorff densities. We demonstrate that the latter approach is novel, extremely beneficial, and readily amenable to high dimensional posterior analysis. We then proceed to establish hyper Markov properties and derive posterior moments for DAG Wishart and inverse Wishart distributions corresponding to arbitrary DAG models, and not just for the class of perfect DAGs. In doing so we succeed in developing a unified framework for all Gaussian DAG models, one that is suitable for both perfect and non-perfect DAGs.
Our approach also allows us to formally demonstrate that the class of inverse DAG Wisharts introduced in this paper naturally contains an important sub-class of inverse Wishart distributions that was introduced by Khare and Rajaratnam [9] in the context of Gaussian covariance graph models.

Table 1 summarizes the properties of the various multi-parameter Wishart distributions that have recently been introduced to the mathematical statistics literature for use in Gaussian graphical models. It is clear from this table that the Wishart distributions introduced in this paper are applicable in all generality, and not just when the graph is perfect or, equivalently, decomposable, and in this sense are very powerful. The ability to specify the induced Wishart distributions and posterior moments for arbitrary graphs is especially useful.

                                            DAG           UG           COVG
                                         ALL  P  H     ND  D  H     ND  D  H
  Conjugacy property                      ✓   ✓  ✓      ✗  ✓  ✓      ✗  ✓  ✓
  Normalizing constant in closed form     ✓   ✓  ✓      ✗  ✓  ✓      ✗  ✗  ✓
  Posterior moments in closed form        ✓   ✓  ✓      ✗  ✓  ✓      ✗  ✗  ✓
  Posterior mode in closed form           ✓   ✓  ✓      ✗  ✓  ✓      ✗  ✗  ✓
  Hyper Markov properties                 ✓   ✓  ✓      ✗  ✓  ✓      ✗  ✗  ✓
  Tractable sampling from the distribution ✓  ✓  ✓      ✓  ✓  ✓      ✗  ✓  ✓

Table 1: Properties of Wishart distributions for the three classes of Gaussian graphical models. Abbreviations. ND: non-decomposable, D/P: decomposable/perfect, H: homogeneous.

This paper is structured as follows. Section 2 introduces the required preliminaries and notation, and Section 3 formally defines Gaussian DAG models and the parameterizations corresponding to Gaussian DAG models. The introduction, preliminaries and parameterizations for DAG models are discussed in some detail to make the paper self-contained and to establish consistent notation. These sections can be skipped by a reader familiar with the subject matter. In Section 5, the class of generalized Wishart distributions for Gaussian DAG models is formally constructed. Conjugacy to the class of Gaussian DAG models and necessary and sufficient conditions for integrability are established. Furthermore, a comparison to conjugate priors in concentration graph and covariance graph models is undertaken. Section 6 establishes hyper Markov properties for our family of priors. In Section 7 we evaluate Laplace transforms, posterior moments and posterior modes for our class of distributions corresponding to the Cholesky parameterization when G is an arbitrary DAG.

Analysis of our DAG Wishart distributions on the corresponding covariance and concentration spaces, with a view to developing tools for high dimensional Bayesian inference using the class of DAG Wishart distributions, is undertaken in three parts. Part I (Section 8) considers the class of perfect DAGs and derives the induced DAG Wishart densities on covariance and concentration spaces, together with the associated posterior quantities. We then proceed to show that the expected values of the covariance and concentration matrices can be computed easily for perfect DAGs. Part II (Section 9) derives the density of our priors w.r.t. Hausdorff measure when G is arbitrary, i.e., no longer perfect. Part III (Sections 10 and 11) defines functionally independent projections of the spaces of concentration and covariance matrices that correspond to arbitrary DAG models, and proceeds to derive the induced measure of our class of Wishart distributions on these spaces. Consequently we proceed to establish hyper Markov properties and derive the expected values of the covariance and concentration matrices for DAG models. We also demonstrate that when G is no longer perfect the class of DAG Wishart distributions does not belong to the class of general exponential families. Section 12 concludes by summarizing the results in the paper.
2 Preliminaries

In this section, we give the necessary notation, background and preliminaries required in subsequent sections.

2.1 Graph theoretic notation and terminology

In this subsection, we introduce some necessary graph theoretic notation and terminology. The notation presented here closely follows that established in [13], [6].

A graph G is a pair of objects (V, E), where V is a finite set representing the vertices (or nodes) of G, and E is a subset of V × V consisting of the edges. An edge (i, j) ∈ E is called directed if (j, i) ∉ E. We write this as i → j and say that i is a parent of j, and that j is a child of i. The set of parents of a vertex j is denoted by pa(j), and the set of children of a vertex i is denoted by ch(i). The family of j, denoted by fa(j), is fa(j) = pa(j) ∪ {j}. Two distinct vertices i and j are said to be adjacent if (i, j) or (j, i) is in E, i.e., if there is any type of edge, directed or undirected, between these two vertices. We write i ∼ j if there is an undirected edge between vertices i and j (note that in enumerating the number of edges of a graph, each undirected edge, though consisting of two pairs, counts only once), and say that i is a neighbor of j, j is a neighbor of i, or that i and j are neighbors. The set of neighbors of i is denoted by ne(i).

More generally, for A ⊂ V we define pa(A), ch(A), ne(A) and bd(A) as the collection of the parents, children, neighbors, and boundary, respectively, of the members of A, but excluding any vertex in A:

  pa(A) = ∪_{i∈A} pa(i) \ A,   ch(A) = ∪_{i∈A} ch(i) \ A,   ne(A) = ∪_{i∈A} ne(i) \ A.

An undirected graph, "UG", is a graph with all of its edges undirected, whereas a directed graph, "DG", is a graph with all of its edges directed. We shall use the symbol G to denote a general graph, and make clear within the context in which it is used whether G is undirected or directed.

We say that the graph G′ = (V′, E′) is a subgraph of G = (V, E), denoted by G′ ⊂ G, if V′ ⊂ V and E′ ⊂ E. In addition, if G′ ⊂ G and E′ = V′ × V′ ∩ E, we say that G′ is an induced subgraph of G. We shall consider only induced subgraphs in what follows. For a subset A ⊂ V, the induced subgraph G_A = (A, A × A ∩ E) is said to be the graph induced by A. A graph G is called complete if every pair of vertices is adjacent. A clique of G is an induced complete subgraph of G that is not a subset of any other induced complete subgraph of G. More simply, a subset A ⊂ V is called a clique if the induced subgraph G_A is a clique of G.

A path of length k ≥ 1 from vertex i to j is a finite sequence of distinct vertices v_0 = i, ..., v_k = j in V and edges (v_0, v_1), ..., (v_{k−1}, v_k) ∈ E. We say that the path is directed if at least one of the edges is directed. We say i leads to j, denoted by i ↦ j, if there is a directed path from i to j. A graph G = (V, E) is called connected if for any pair of distinct vertices i, j ∈ V there exists a path between them. An n-cycle in G is a path of length n with the additional requirement that the end points are identical. A directed n-cycle is defined accordingly. A graph is acyclic if it does not have any cycles. An acyclic directed graph, denoted by DAG (or ADG), is a directed graph with no cycles of length greater than 1.
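The notation above is straightforward to operationalize. The following is a minimal sketch under our own conventions (vertex labels are arbitrary hashable values, E is a set of ordered pairs, and an undirected edge is stored as both orientations); none of these names come from the paper:

```python
# pa, ch, ne, fa as defined above: an edge (i, j) is directed when the
# reverse orientation (j, i) is absent from E.

def pa(j, E):
    """Parents of j: vertices i with a directed edge i -> j."""
    return {i for (i, k) in E if k == j and (j, i) not in E and i != j}

def ch(i, E):
    """Children of i: vertices j with a directed edge i -> j."""
    return {j for (k, j) in E if k == i and (j, i) not in E and j != i}

def ne(i, E):
    """Neighbors of i: vertices joined to i by an undirected edge."""
    return {j for (k, j) in E if k == i and (j, i) in E and j != i}

def fa(j, E):
    """Family of j: fa(j) = pa(j) union {j}."""
    return pa(j, E) | {j}

# Example: the DAG 3 -> 1 <- 2.
E = {(3, 1), (2, 1)}
assert pa(1, E) == {2, 3} and ch(3, E) == {1} and fa(1, E) == {1, 2, 3}
```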
The undirected version of a graph G = (V, E), denoted by G^u = (V, E^u), is the undirected graph obtained by replacing all the directed edges of G by undirected ones. An immorality in a directed graph G is an induced subgraph of the form i → k ← j. Moralizing an immorality entails adding an undirected edge between the pair of parents that have the same child. The moral graph of G, denoted by G^m = (V, E^m), is then the undirected graph obtained by first moralizing each immorality of G and then taking the undirected version of the resulting graph. Naturally there are DAGs which have no immoralities, and this leads to the following definition.

Definition 2.1. A DAG G is said to be "perfect" if it has no immoralities; i.e., the parents of all vertices are adjacent, or equivalently, if the set of parents of each vertex induces a complete subgraph of G.

Given a directed acyclic graph (DAG) G, the set of ancestors of a vertex j, denoted by an(j), is the set of those vertices i such that i ↦ j. Similarly, the set of descendants of a vertex i, denoted by de(i), is the set of those vertices j such that i ↦ j. The set of non-descendants of i is nd(i) = V \ (de(i) ∪ {i}). A set A ⊂ V is called ancestral when A contains the parents of its members. The smallest ancestral set containing the subset B of V is denoted by An(B).

2.2 Decomposable graphs

An undirected graph G is said to be decomposable if no induced subgraph contains a cycle of length greater than or equal to four. The reader is referred to Lauritzen [13] for all the common notions of decomposable graphs that we will use here. One such important notion is that of a perfect order of the cliques. Every decomposable graph admits a perfect order of its cliques. Let (C_1, ..., C_k) be one such perfect order of the cliques of the graph G. The histories for the graph G are given by H_1 = C_1 and

  H_j = C_1 ∪ C_2 ∪ ··· ∪ C_j,  j = 2, 3, ..., k,

and the (minimal vertex) separators of the graph are given by

  S_j = H_{j−1} ∩ C_j,  j = 2, 3, ..., k.

Let R_j = C_j \ H_{j−1} for j = 2, 3, ..., k. Let k′ ≤ k − 1 denote the number of distinct separators and ν(S) denote the multiplicity of S, i.e., the number of j such that S_j = S. Generally, we will denote by C_G the set of cliques of a graph G and by S_G its set of separators.

2.3 Markov properties for directed acyclic graphs

Let V be a finite set of indices and (X_i)_{i∈V} a collection of random variables, where each X_i is a random variable on the probability space 𝒳_i. Let the probability space 𝒳 be defined as the product space 𝒳 = ×_{i∈V} 𝒳_i. Now let G = (V, E) be a DAG. For simplicity, and without loss of generality, we always assume that the given DAG G is connected and that the edge set E contains all the loops (i, i), i ∈ V (for convenience we draw the graphs without their loops). We say that a probability distribution P on 𝒳 has the recursive factorization property w.r.t. G, denoted DF (the directed factorization property), if there are σ-finite measures µ_i on 𝒳_i and nonnegative functions k^i(x_i, x_pa(i)), referred to as kernels, defined on 𝒳_fa(i), such that

  ∫ k^i(y, x_pa(i)) dµ_i(y) = 1,  ∀ i ∈ V,

and P has a density p, w.r.t. the product measure µ = ⊗_{i∈V} µ_i, given by

  p(x) = ∏_{i∈V} k^i(x_i, x_pa(i)).

In this case, each kernel k^i(x_i, x_pa(i)) is in fact a version of p(x_i | x_pa(i)), the conditional distribution of X_i given X_pa(i). An immediate consequence of this definition is the following lemma.

Lemma 2.1. (from [13]) If P admits a recursive factorization w.r.t. the directed graph G, then it also admits a factorization w.r.t. the undirected graph G^m and, consequently, obeys the global Markov property (see [13] for the definition) w.r.t. G^m.

Proof. Note that for each vertex i ∈ V the set fa(i) is a complete subset of G^m. Thus if we define ψ_fa(i)(x_fa(i)) = k^i(x_i, x_pa(i)), then p(x) = ∏_{i∈V} p(x_i | x_pa(i)) = ∏_{i∈V} k^i(x_i, x_pa(i)) = ∏_{i∈V} ψ_fa(i)(x_fa(i)). Therefore, P admits a factorization w.r.t. G^m, and by Proposition 3.8 in [13] it also obeys the global Markov property w.r.t. G^m. ∎
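Definition 2.1 and Lemma 2.1 both revolve around immoralities and the moral graph G^m. The sketch below is our illustration, not the paper's; it assumes a DAG stored as a dict mapping each vertex to its parent set:

```python
# Moralization and the perfectness check of Definition 2.1.
from itertools import combinations

def moralize(parents):
    """Edge set of the moral graph G^m: marry all pairs of parents
    sharing a child, then drop edge directions."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                      # undirected version of p -> child
            edges.add(frozenset({p, child}))
        for u, v in combinations(pa, 2):  # marry co-parents
            edges.add(frozenset({u, v}))
    return edges

def is_perfect(parents):
    """A DAG is perfect iff the parents of every vertex are pairwise
    adjacent, i.e. moralization adds no new edges."""
    adj = {frozenset({p, c}) for c, pa in parents.items() for p in pa}
    return all(frozenset({u, v}) in adj
               for pa in parents.values()
               for u, v in combinations(pa, 2))

# The immorality 3 -> 1 <- 2 is not perfect; adding the arrow 2 -> 3
# (so the co-parents become adjacent) makes the DAG perfect.
assert not is_perfect({1: {2, 3}, 2: set(), 3: set()})
assert is_perfect({1: {2, 3}, 2: {3}, 3: set()})
```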
Another direct implication of the DF property is that if P admits a recursive factorization w.r.t. G, then, for each ancestral set A, the marginal distribution P_A admits a recursive factorization w.r.t. the induced graph G_A. Combining this result with Lemma 2.1 we obtain the following: if P admits a recursive factorization w.r.t. G, then A ⊥ B | S [P] whenever A and B are separated by S in (G_An(A∪B∪S))^m. We call this property the directed global Markov property, DG, and any distribution that satisfies this property is said to be a directed Markov field over G. For DAGs the directed global Markov property plays the same role as the global Markov property does for undirected graphs, in the sense that it provides an optimal rule for recovering the conditional independence relations encoded by the directed graph.

We now introduce another Markov property for DAGs. A distribution P on 𝒳 is said to obey the directed local Markov property (DL) w.r.t. G if for each i ∈ V

  i ⊥ nd(i) | pa(i).

Now for a given DAG G consider the so-called "parent graph" G_par, defined as follows: the parent graph of G is a DAG isomorphic to G and obtained by relabeling the vertex set V as 1, 2, ..., |V|, in such a way that pa(i) ⊂ {i+1, ..., |V|} for each vertex i ∈ V. It is easily shown that for any given DAG it is possible to relabel the vertices so that parents always have a higher number than their respective children, though such an ordering is not unique in general. For a given parent ordering we say that P obeys the parent ordered Markov property (PO) w.r.t. G if for every vertex i we have

  i ⊥ {i+1, ..., |V|} \ pa(i) | pa(i).

It can be shown that if P has a density w.r.t. µ, then P obeys one of the directed Markov properties DF, DG, DL, PO if and only if it obeys all of them, i.e., the four Markov properties for DAGs are equivalent under mild conditions [13].
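Since Section 3 below assumes throughout that G is given in a parent ordering, the following sketch shows one way such an ordering can be constructed, via a children-first depth-first traversal. The function name and graph representation are ours, not the paper's:

```python
# Construct a parent ordering: relabel the vertices 1, ..., |V| so that
# pa(i) is contained in {i+1, ..., |V|}, i.e. every parent receives a
# higher number than each of its children.

def parent_ordering(parents):
    """Return a relabeling dict old_label -> new_label (1-based)."""
    order, placed = [], set()

    def visit(v):                  # children of v are appended before v
        if v in placed:
            return
        placed.add(v)
        for c in [u for u, pa in parents.items() if v in pa]:
            visit(c)
        order.append(v)

    for v in parents:
        visit(v)
    # Each vertex appears after all of its children (a DAG has no
    # cycles, so the recursion terminates), hence numbering along this
    # sequence gives children lower numbers than their parents.
    return {v: k + 1 for k, v in enumerate(order)}

# Example: in a -> b -> c, vertex c gets label 1 and a gets label 3.
labels = parent_ordering({'c': {'b'}, 'b': {'a'}, 'a': set()})
assert labels['c'] < labels['b'] < labels['a']
```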
12...m, x = x...m2 and ǫ = ǫ...m2. Var[Bx] = Var[ǫ] BΣBt = diag(σ2 ,...,σ2 ,σ2 ) =: D ⇒ 1pa(1) m 1pa(m 1) mm | − | − Σ = B 1D(Bt) 1 − − ⇒ Σ 1 = BtD 1B. (3.1) − − ⇒ Thus,ifwedefineL = Bt,thenΣ 1 = LD 1Lt istheso-calledmodifiedCholeskydecompo- − − sition of Σ 1, in terms of the lower triangular matrix L and the diagonal matrix D 1. Now − − consider a DAG denoted by = (V,E). In [25] it has been shown that N (0,Σ) obeys the m G directed Markov property w.r.t. if and only if L = 0 whenever there is no arrow from i ij G to j, i.e., i < pa(j). Equation (3.1) above therefore gives a very convenient description of theGaussianBayesian networkN ( ). Weexplorethismodelinmoredetailbelow. G 10
