Tail dependence of recursive max-linear models with regularly varying noise variables

Nadine Gissibl∗, Claudia Klüppelberg∗, Moritz Otto∗

January 26, 2017

∗Technische Universität München, Center for Mathematical Sciences, Boltzmannstrasse 3, 85748 Garching, Germany; e-mail: [email protected], [email protected], [email protected]

Abstract

We investigate multivariate regularly varying random vectors with discrete spectral measure induced by a directed acyclic graph (DAG). The tail dependence coefficient measures extreme dependence between two vector components, and we investigate how the matrix of tail dependence coefficients can be used to identify the full dependence structure of the random vector on a DAG or even the DAG itself. Furthermore, we estimate the distributional model by the matrix of empirical tail dependence coefficients. Here we assume that we do not know the DAG, but observe only the multivariate data. From these observations we want to infer the causal dependence structure in the data.

AMS 2010 Subject Classifications: primary: 62G32, 60G70, 05C75; secondary: 62-09, 65S05

Keywords: DAG; directed acyclic graph; exponent measure; Markov graph; max-linear distribution; max-stable model; regular variation; structural equation model; extreme value theory; tail dependence coefficient.

1 Max-linear models on directed acyclic graphs

Extreme value (or max-stable) distributions occur naturally as limit models for centered and scaled maxima. Multivariate max-stable models and their domains of attraction have been investigated from a probabilistic and statistical point of view; see e.g. [2, 4, 15, 16].

We investigate multivariate regularly varying random vectors on graphs, which are in the maximum domain of attraction of a Fréchet distribution. The dependence between the components is modelled by the exponent measure, which in our case is induced by a graph. More precisely, the dependence structure of a random vector $X$ is given by a DAG $D = (V,E)$ with node set $V = \{1,\dots,d\}$ and edge set $E = \{(k,i) : k \in \mathrm{pa}(i)\}$, where $\mathrm{pa}(i)$ denotes the parents of node $i$. As is commonly done, we identify the nodes with the components of the vector $X$. The necessary background on graphical models can be found in Edwards [7] and Lauritzen [12] (Lauritzen [11] also provides a very readable summary); for structural equation models we refer to Bollen [3] and Pearl [13].

Throughout we assume as data generating mechanism a recursive max-linear structural equation model, which has representation

$$X_i = \bigvee_{k\in\mathrm{pa}(i)} c^i_k X_k \vee c^i_i Z_i, \qquad i=1,\dots,d, \tag{1.1}$$

where $Z_1,\dots,Z_d$ are iid random variables with absolutely continuous distribution and support $\mathbb{R}_+=(0,\infty)$, and $c^i_k>0$ for all $k\in\mathrm{Pa}(i):=\mathrm{pa}(i)\cup\{i\}$. Extending classical notation slightly, the $c^i_k$ are called the edge weights of $D$. This model is max-linear (cf. Theorem 2.2 of [8]) with representation

$$X_i = \bigvee_{j\in\mathrm{An}(i)} b_{ji} Z_j, \qquad i=1,\dots,d, \tag{1.2}$$

where $\mathrm{An}(i):=\mathrm{an}(i)\cup\{i\}$ and $\mathrm{an}(i)$ are the ancestors of node $i$. We call the vector $X$ a recursive max-linear (ML) model on $D$.

The coefficients $b_{ji}$ are given as maxima of products of the edge weights along paths from $j$ to $i$. More precisely, for a path $p=[j=k_0\to k_1\to\dots\to k_n=i]$ of length $n$ from $j$ to $i$, we define

$$d_{ji}(p) := c^{k_0}_{k_0}\prod_{l=0}^{n-1} c^{k_{l+1}}_{k_l} \qquad\text{and}\qquad b_{ji} := \bigvee_{p\in P_{ji}} d_{ji}(p), \tag{1.3}$$

where $P_{ji}$ denotes all paths from $j$ to $i$; furthermore, we define $b_{ii}:=c^i_i$ and $b_{ji}:=0$ for all $i\in V$ and $j\in V\setminus\mathrm{An}(i)$. For $i\in V$ and $j\in\mathrm{an}(i)$ we call a path $p$ from $j$ to $i$ max-weighted, if $b_{ji}=d_{ji}(p)$.
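The recursion behind (1.2)-(1.3) is a max-times analogue of matrix multiplication and can be carried out in a topological order of the DAG. The following Python sketch (our illustration with an assumed encoding; the paper itself contains no code) computes $B$ from the edge weights $c^i_k$.

```python
import numpy as np

def ml_coefficients(parents, C, d):
    """ML coefficient matrix B of (1.2)-(1.3) from the edge weights c^i_k.

    Nodes 0, ..., d-1 are assumed to be listed in topological order;
    parents[i] lists pa(i), C[i][k] holds c^i_k for k in pa(i), and
    C[i][i] holds c^i_i.
    """
    B = np.zeros((d, d))
    for i in range(d):
        B[i, i] = C[i][i]                 # b_{ii} = c^i_i
        for k in parents[i]:
            # the best path j => i through parent k has weight c^i_k * b_{jk};
            # b_{ji} is the maximum over all parents of i, cf. (1.3)
            B[:, i] = np.maximum(B[:, i], C[i][k] * B[:, k])
    return B

# toy DAG 1 -> 3 <- 2 (0-indexed: 0 -> 2 <- 1), an assumed example
parents = {0: [], 1: [], 2: [0, 1]}
C = {0: {0: 1.0}, 1: {1: 1.0}, 2: {2: 0.3, 0: 0.5, 1: 0.8}}
print(ml_coefficients(parents, C, 3))
```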
We summarize these coefficients in the max-linear (ML) coefficient matrix

$$B = (b_{ij})_{d\times d}. \tag{1.4}$$

Throughout this paper we assume that the distributions of the noise variables are in the maximum domain of attraction of a standard $\alpha$-Fréchet distribution, given for $\alpha>0$ by

$$\Phi_\alpha(x) = \exp\{-x^{-\alpha}\}\,\mathbf{1}_{(0,\infty)}(x).$$

Thus, a generic noise variable $Z$ is regularly varying with index $-\alpha$ for some $\alpha>0$. However, the dependence structure, which is introduced by the DAG, will be max-linear not only in the max-stable limit, but already in the original regularly varying vector (see Remark 2.3 for obvious extensions). By max-linearity, we can write down the full dependence structure of the model explicitly.

In this paper we investigate the tail dependence coefficient, defined for identically distributed random variables $X_i,X_j$ as

$$\chi(i,j) = \chi(j,i) = \lim_{u\uparrow 1} P\big(F_i(X_i)>u \mid F_j(X_j)>u\big).$$

We shall show that in a recursive ML model on a DAG with regularly varying noise variables the tail dependence coefficient is positive if and only if the two nodes $i$ and $j$ have joint ancestors.

The goal of this paper is two-fold. Firstly, we investigate how far the matrix $\chi$ of all tail dependence coefficients can identify the dependence structure of the recursive ML model $X$ on a DAG or even the DAG $D$ itself. Secondly, we estimate $\chi$ with the goal of statistical inference for such graphical dependence structures. We assume that we do not know the DAG, but observe only data of the vector $X$. From these observations we want to infer the DAG, the ML coefficient matrix $B$, and the edge weight matrix $C$; i.e., the causal ML structure of the data.

More precisely, in Section 2 we provide some preliminary results and definitions, both in the context of extreme value theory and graphical models. In particular, we derive the exponent measure, which characterizes the dependence in regularly varying models and which is here discrete, since it is induced by the DAG. Furthermore, we introduce the recursive ML max-weighted model, which will be our leading model in what follows. The tail dependence coefficient is introduced in Section 3, and the link between the dependence structure of $X$ and the positivity of $\chi$ is discussed. The important question of identifiability of the recursive ML model $X$ from its tail dependence coefficients is investigated in Section 4. Moreover, we explain the prominent role of the initial nodes $V_0$ of the DAG $D$ for the identifiability of the model. In Section 5, the previous theory is applied to derive an identifiability result for the important class of recursive ML max-weighted models. Section 6 is devoted to the statistical estimation of the tail dependence coefficient matrix. We consider an empirical estimator of $\chi$ and give precise conditions for asymptotic multivariate normality. The result holds for more general models, but for a recursive ML model we give the asymptotic covariance matrix explicitly in terms of the max-linear coefficient matrix (1.4).

The following notation will be used throughout the paper. For a node $i\in V$ the sets $\mathrm{an}(i)$, $\mathrm{pa}(i)$, and $\mathrm{de}(i)$ denote the ancestors, parents, and descendants of $i$ with respect to $D$. Furthermore, we use the notation $\mathrm{An}(i)=\mathrm{an}(i)\cup\{i\}$, $\mathrm{Pa}(i)=\mathrm{pa}(i)\cup\{i\}$, and $\mathrm{De}(i)=\mathrm{de}(i)\cup\{i\}$. We denote by $j\to i$ an edge and by $[j\Rightarrow i]$ any path from $j$ to $i$. For any DAG $D$ we call $V_0 := \{i\in V : \mathrm{an}(i)=\emptyset\}$ the initial nodes and $V_\infty := \{i\in V : \mathrm{de}(i)=\emptyset\}$ the terminal nodes.

In general, for arbitrary (possibly random) $a_i\ge 0$ we set $\bigvee_{i\in\emptyset} a_i = 0$, $\bigwedge_{i\in\emptyset} a_i = \infty$, $\sum_{i\in\emptyset} a_i = 0$, and $\prod_{i\in\emptyset} a_i = 1$.
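All of these graph-theoretic sets are straightforward to compute from an edge list; the following Python sketch (an illustration with an assumed encoding, not part of the paper) makes the notation concrete.

```python
def relatives(edges, nodes):
    # ancestor sets an(i), descendant sets de(i), initial nodes V_0 and
    # terminal nodes V_infty of a DAG given by edges (k, i) meaning k -> i
    pa = {i: {k for (k, j) in edges if j == i} for i in nodes}

    def an(i, seen=None):
        seen = set() if seen is None else seen
        for k in pa[i] - seen:
            seen.add(k)
            an(k, seen)
        return seen

    ancestors = {i: an(i) for i in nodes}
    descendants = {i: {j for j in nodes if i in ancestors[j]} for i in nodes}
    V0 = {i for i in nodes if not ancestors[i]}
    Vinf = {i for i in nodes if not descendants[i]}
    return ancestors, descendants, V0, Vinf

# assumed toy DAG: 1 -> 3, 2 -> 3, 3 -> 4
print(relatives({(1, 3), (2, 3), (3, 4)}, {1, 2, 3, 4}))
```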
2 Preliminaries

2.1 Distributional properties of X

Throughout this paper we assume the iid noise variables $Z_1,\dots,Z_d$ to be regularly varying. More precisely, a generic noise variable $Z$ has distribution tail $\overline{F}_Z(x)=1-F_Z(x)=L(x)x^{-\alpha}$ for $x>0$ for a slowly varying function $L$ and index of regular variation $\alpha>0$. We abbreviate this as $Z\in\mathcal{R}(-\alpha)$ or $\overline{F}_Z\in\mathcal{R}(-\alpha)$. For details on multivariate regular variation and extreme value theory we refer to Resnick [15, 16].

Since $Z:=(Z_1,\dots,Z_d)$ has independent regularly varying components, its exponent measure is concentrated on the axes. Like every exponent measure it is uniquely defined by its values on the complements of the sets $[0,x]$, denoted by $[0,x]^c$, for $x=(x_1,\dots,x_d)>(0,\dots,0)=\mathbf{0}$. More precisely, for $a_n$ chosen such that $nP(Z>a_n)\to 1$ as $n\to\infty$, we find (taking order relations componentwise)

$$\mu_Z([0,x]^c) := \lim_{n\to\infty} nP\big(a_n^{-1}Z\in[0,x]^c\big) = \lim_{n\to\infty} n\big(1-P(Z\le a_n x)\big) = -\log\prod_{k=1}^d\Phi_\alpha(x_k) = \sum_{k=1}^d x_k^{-\alpha}. \tag{2.1}$$

Proposition 2.1. Let $X$ be a recursive ML model on a DAG. Then the distribution function of $X=(X_1,\dots,X_d)$ is in the maximum domain of attraction of

$$G(x) = \exp\Big\{-\sum_{k=1}^d \bigvee_{i\in\mathrm{De}(k)} \Big(\frac{b_{ki}}{x_i}\Big)^\alpha\Big\}\,\mathbf{1}_{(0,\infty)}(x). \tag{2.2}$$

Hence, $X$ has exponent measure

$$\mu_X([0,x]^c) = \sum_{k=1}^d \bigvee_{i\in\mathrm{De}(k)} \Big(\frac{b_{ki}}{x_i}\Big)^\alpha. \tag{2.3}$$

In particular, the one-dimensional and bivariate marginal distribution functions are given for $i,j=1,\dots,d$ by

$$G_i(x) = \exp\Big\{-x^{-\alpha}\sum_{k\in\mathrm{An}(i)} b_{ki}^\alpha\Big\}\,\mathbf{1}_{(0,\infty)}(x),$$

$$G_{ij}(x_i,x_j) = \exp\Big\{-\sum_{k\in\mathrm{An}(i)}\Big(\frac{b_{ki}}{x_i}\Big)^\alpha - \sum_{k\in\mathrm{An}(j)}\Big(\frac{b_{kj}}{x_j}\Big)^\alpha + \sum_{k\in\mathrm{An}(i)\cap\mathrm{An}(j)}\Big(\frac{b_{ki}}{x_i}\Big)^\alpha \wedge \Big(\frac{b_{kj}}{x_j}\Big)^\alpha\Big\}\,\mathbf{1}_{(0,\infty)^2}(x_i,x_j).$$

Proof. With $X=(X_1,\dots,X_d)$ and $x=(x_1,\dots,x_d)$ as above, we have by (1.2),

$$n\big(1-P(X\le a_n x)\big) = n\Big(1-P\Big(\bigvee_{k=1}^d b_{ki}Z_k \le a_n x_i,\ i=1,\dots,d\Big)\Big) = n\Big(1-P\Big(Z_k \le a_n \bigwedge_{i\in\mathrm{De}(k)}\frac{x_i}{b_{ki}},\ k=1,\dots,d\Big)\Big).$$

By (2.1), we conclude that $n(1-P(X\le a_n x)) \to \sum_{k=1}^d\bigvee_{i\in\mathrm{De}(k)}(b_{ki}/x_i)^\alpha$ as $n\to\infty$. This implies (2.2). The marginal distribution functions are obtained by letting all other arguments of $G$ tend to $\infty$. $\square$

Remark 2.2. The one-dimensional marginal distributions and also the exponent measure depend on the max-linear coefficients. This can cause difficulties when the marginal distributions differ; i.e., when $\sum_{k\in\mathrm{An}(i)} b_{ki}^\alpha \neq \sum_{k\in\mathrm{An}(j)} b_{kj}^\alpha$ for $i,j\in V$. Thus we follow standard practice and calculate the exponent measure after standardization of the marginals to standard unit Fréchet (cf. [15], Section 5.4.1). This requires the generalized inverse of an increasing function $f$, given by $f^\leftarrow(y):=\inf\{x : f(x)\ge y\}$. If $f$ is strictly increasing, then $f^\leftarrow$ coincides with the analytic inverse. Then a standardized version of $G$ is obtained as follows:

$$G_*(x) := G\Big(\Big(\frac{1}{-\log G_1}\Big)^\leftarrow(x_1),\dots,\Big(\frac{1}{-\log G_d}\Big)^\leftarrow(x_d)\Big)\,\mathbf{1}_{(0,\infty)}(x) \tag{2.4}$$

with exponent measure given for $x=(x_1,\dots,x_d)>(0,\dots,0)$ by

$$\mu_*([0,x]^c) := -\log G_*(x) = -\log G\Big(\Big(x_1\sum_{l=1}^d b_{l1}^\alpha\Big)^{1/\alpha},\dots,\Big(x_d\sum_{l=1}^d b_{ld}^\alpha\Big)^{1/\alpha}\Big) = \sum_{k=1}^d \bigvee_{i\in\mathrm{De}(k)} \frac{b_{ki}^\alpha}{\sum_{l=1}^d b_{li}^\alpha}\,x_i^{-1}. \tag{2.5}$$

$\square$
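Both (2.3) and its standardized version (2.5) are elementary functions of the ML coefficient matrix $B$. The following Python sketch (our illustration; the paper contains no code) evaluates them, reading $\mathrm{De}(k)$ off the support of the $k$-th row of $B$.

```python
import numpy as np

def exponent_measure(B, x, alpha):
    # mu_X([0, x]^c) = sum_k max_{i in De(k)} (b_{ki} / x_i)^alpha, cf. (2.3);
    # entries with k outside An(i) are zero, so they never attain the maximum
    return np.sum(np.max((B / x[None, :]) ** alpha, axis=1))

def exponent_measure_std(B, x, alpha):
    # standardized version (2.5): column-normalize b_{ki}^alpha and
    # replace x_i^{-alpha} by x_i^{-1}
    Bstd = B ** alpha / np.sum(B ** alpha, axis=0, keepdims=True)
    return np.sum(np.max(Bstd / x[None, :], axis=1))
```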
Remark 2.3. The dependence structure, which is introduced by the DAG, is max-linear not only in the max-stable limit, but already in the original regularly varying vector $X$. This rather limited dependence structure can be generalized naturally within the framework of regular variation.

The extent of the possible generalization is best understood by considering an equivalent representation of the dependence in a regularly varying vector. According to Resnick [16], Theorem 6.1, $X\in\mathbb{R}^d$ is multivariate regularly varying if and only if there exists a random vector $\Theta\in\mathbb{S}^{d-1}=\{x\in\mathbb{R}^d : \|x\|=1\}$ such that for $t>0$:

$$\frac{P(\|X\|>tx,\ X/\|X\|\in\cdot)}{P(\|X\|>x)} \ \stackrel{w}{\longrightarrow}\ t^{-\alpha}P(\Theta\in\cdot), \qquad x\to\infty, \tag{2.6}$$

where $\stackrel{w}{\longrightarrow}$ denotes weak convergence of measures and $\|\cdot\|$ is any norm in $\mathbb{R}^d$. The probability distribution $S$ of $\Theta$ is called the spectral measure of $X$. From (2.6) we find immediately that the dependence structure of $X$ is arbitrary for moderate values of $\|X\|$; only when the norm of the vector becomes large does the dependence structure become that of $\Theta$. By Proposition 6.4 of [16] the exponent measure and the spectral measure are related by a polar coordinate transform, giving

$$\mu([0,x]^c) = c\int_{\mathbb{S}_+^{d-1}} \bigvee_{i=1}^d \Big(\frac{a_i}{x_i}\Big)^\alpha\, S(da), \qquad x>0,$$

for some constant $c>0$ and $a=(a_1,\dots,a_d)$. If $S$ were a continuous measure, $a=(a_1,\dots,a_d)$ could take any value on the positive unit sphere $\mathbb{S}_+^{d-1}$ in $\mathbb{R}^d$. In our setting, however, the dependence structure of $X$ becomes discrete in the limit, such that each component $a_i$ can only take the values $b_{ki}$ for $k=1,\dots,d$.

When statistical estimation is based on the extreme values of a sample, as in this paper when estimating the tail dependence coefficients, the restriction to the (limiting) discrete dependence provides a sufficient model. Furthermore, the slightly simpler model induced by the DAG for all values of the data allows for more concise notation and makes the new ideas given by the DAG dependence structure more transparent. $\square$

2.2 Structural properties

The following lemma describes a property of the max-linear coefficient matrix $B$. Note that by Proposition 2.1 the condition on the ML coefficients implies that the one-dimensional marginal distributions are standard unit Fréchet.

Lemma 2.4. Let $B=(b_{ij})_{d\times d}$ be the ML coefficient matrix of a recursive ML model on a DAG such that $\sum_{l\in\mathrm{An}(i)} b_{li}^\alpha = 1$ for all $i\in V$. Then for all $i,j\in V$ with $i\neq j$,

$$b_{ji} < b_{jj}.$$

Proof. Note that $b_{jj}>0$ for all $j\in V$. If $b_{ji}=0$, which is by (1.3) the case if $j\in V\setminus\mathrm{An}(i)$, then it obviously holds that $b_{ji}<b_{jj}$. Assume now that $j\in\mathrm{An}(i)$. Using $b_{li}\ge b_{lj}b_{ji}/b_{jj}$ for all $l\in\mathrm{An}(j)$ (cf. Lemma 2.10 of [8]) and the fact that $\sum_{l\in\mathrm{An}(j)} b_{lj}^\alpha = 1$, we obtain

$$1 = \sum_{l\in\mathrm{An}(i)} b_{li}^\alpha = \sum_{l\in\mathrm{An}(j)} b_{li}^\alpha + \sum_{l\in\mathrm{An}(i)\setminus\mathrm{An}(j)} b_{li}^\alpha \ \ge\ \Big(\frac{b_{ji}}{b_{jj}}\Big)^\alpha + \sum_{l\in\mathrm{An}(i)\setminus\mathrm{An}(j)} b_{li}^\alpha.$$

Since $b_{li}>0$ for all $l\in\mathrm{An}(i)\setminus\mathrm{An}(j)$, and this set is non-empty (it contains $i$, because $i\neq j$ and $j\in\mathrm{An}(i)$ exclude $i\in\mathrm{An}(j)$ by acyclicity), we have that $(b_{ji}/b_{jj})^\alpha<1$; equivalently, $b_{ji}<b_{jj}$. $\square$

From (1.3) we know that not all paths may be relevant for obtaining the maximum in the definition of the ML coefficients. The next definition singles out the relevant paths, where the maximum is realised.

Definition 2.5. [Gissibl and Klüppelberg [8], Definition 2.6] For $i\in V$ and $j\in\mathrm{an}(i)$ we call a path $p=[j=k_0\to k_1\to\dots\to k_n=i]$ from $j$ to $i$ max-weighted, if $p$ realises the maximum in (1.3); i.e., if

$$b_{ji} = d_{ji}(p) = c^{k_0}_{k_0}\prod_{l=0}^{n-1} c^{k_{l+1}}_{k_l}. \qquad\square$$

The DAG with the minimal number of edges corresponding to a given reachability matrix $R=\mathrm{sgn}(B)$ is defined as follows.

Definition 2.6. A DAG $D^{tr}=(V,E^{tr})$ is called the transitive reduction of $D$, if

(i) for all $i,j\in V$, $D^{tr}$ has a path from $j$ to $i$ if and only if $D$ has a path from $j$ to $i$, and

(ii) there is no DAG with fewer edges satisfying condition (i). $\square$

As shown in Aho et al. [1], the transitive reduction of a finite DAG $D$ is unique and also a subgraph of $D$.
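Definition 2.6 is the classical notion from Aho et al. [1] and is available in standard graph libraries; a quick Python illustration (the toy DAG is our assumed example):

```python
import networkx as nx

# transitive reduction of the DAG 1 -> 2 -> 3 with the shortcut 1 -> 3
D = nx.DiGraph([(1, 2), (2, 3), (1, 3)])
Dtr = nx.transitive_reduction(D)
print(sorted(Dtr.edges()))   # [(1, 2), (2, 3)]: the redundant edge is removed
```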
We define an analogue of the transitive reduction of a DAG $D$ in the context of recursive ML models. Intuitively, we cannot dispose of those paths which are max-weighted in the sense of Definition 2.5. Thus, for a given ML coefficient matrix $B$ the DAG $D^B$ with the minimal number of edges, such that $X$ is a recursive ML model on a DAG corresponding to $B$, has in general more edges than the transitive reduction.

Definition 2.7. A DAG $D^B$ is called the max-linear reduction of $D$, if for all $i,j\in V$, $D^B$ has an edge $j\to i$ if and only if this edge is the only max-weighted path from $j$ to $i$ in $D$. $\square$

It has been shown in Theorem 3.5 of Gissibl and Klüppelberg [8] that $D^B$ is unique and also a subgraph of $D$. Moreover, the recursive ML model $X$ with ML coefficient matrix $B$ is minimal with respect to $D^B$, which can be characterized as

$$D^B = (V,E^B) := \Big(V,\Big\{(k,i)\in E : b_{ki} > \bigvee_{l\in\mathrm{de}(k)\cap\mathrm{pa}(i)} \frac{b_{kl}b_{li}}{b_{ll}}\Big\}\Big). \tag{2.7}$$

With the next definition we introduce our leading example of a recursive ML model, which will be further investigated in Section 5.

Definition 2.8. Let $X$ be a recursive ML model on a DAG with ML coefficient matrix $B=(b_{ij})_{d\times d}$. Assume that all paths are max-weighted; i.e., for arbitrary nodes $i,j\in V$ and all paths $p=[j=k_0\to k_1\to\dots\to k_n=i]$, independently of the specific path $p$,

$$b_{ji} = c^{k_0}_{k_0} c^{k_1}_{k_0}\cdots c^{k_{n-1}}_{k_{n-2}} c^{k_n}_{k_{n-1}} = d_{ji}(p).$$

Then we call $X$ a recursive ML max-weighted model. Since every path in $D$ is max-weighted, we observe from (1.3) by comparing two existing paths $[j\Rightarrow i]$ and $[j\Rightarrow k_m\Rightarrow i]$ that

$$b_{ji} = \frac{b_{jk_m} b_{k_m i}}{b_{k_m k_m}} \qquad\text{for } m=1,\dots,n-1. \tag{2.8}$$

$\square$

Example 2.9. [Polytree] Let $X$ be a recursive ML model relative to a polytree $D$; i.e., the underlying undirected graph of $D$ has no cycles and is hence a tree (cf. Koller and Friedman [10], Definition 2.2). Since there exists at most one path between every pair of nodes, all paths must be max-weighted. $\square$

The following example shows that for every DAG we can find a recursive ML max-weighted model.

Example 2.10. [Homogeneous model] Let $X$ be a recursive ML structural equation model as in (1.1) defined by

$$X_i := \frac{1}{|\mathrm{An}(i)|^{1/\alpha}}\Big(\bigvee_{k\in\mathrm{pa}(i)} |\mathrm{An}(k)|^{1/\alpha} X_k \vee Z_i\Big), \qquad i=1,\dots,d.$$

Let $p=[j=k_0\to k_1\to\dots\to k_n=i]$ be a path of length $n$ from $j$ to $i$. Then the coefficient $d_{ji}(p)$ from (1.3) is given by $d_{ji}(p)=|\mathrm{An}(i)|^{-1/\alpha}$, such that

$$X_i = \bigvee_{j\in\mathrm{An}(i)} \frac{1}{|\mathrm{An}(i)|^{1/\alpha}}\, Z_j, \qquad i=1,\dots,d;$$

i.e., the ML coefficient matrix is $B=(b_{ji})_{d\times d}=\big(|\mathrm{An}(i)|^{-1/\alpha}\,\mathbf{1}_{\mathrm{An}(i)}(j)\big)_{d\times d}$. $\square$

Lemma 2.11. Let $X$ be a recursive ML max-weighted model with respect to a DAG $D$. Then $D^B=D^{tr}$.

Proof. Since $D^{tr}$ is a subgraph of $D^B$ and minimal, we only have to consider $i\in V$ and $k\in\mathrm{pa}(i)\setminus\mathrm{pa}^{tr}(i)$, where $\mathrm{pa}^{tr}(i)$ denotes the parents of $i$ in $D^{tr}$. Since every path from $k$ to $i$ is max-weighted, we also have a max-weighted path which contains a node of $\mathrm{pa}(i)\cap\mathrm{de}(k)$. Thus we know from Lemma 2.10(a) of [8] that

$$b_{ki} = \bigvee_{l\in\mathrm{pa}(i)\cap\mathrm{de}(k)} \frac{b_{kl}b_{li}}{b_{ll}}.$$

Hence, by (2.7), the edge $k\to i$ does not belong to $E^B$, and $D^B=D^{tr}$ follows. $\square$
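The characterization (2.7) translates directly into code. The sketch below (our illustration; edges encoded as pairs $(k,i)$, $B$ as a NumPy array) keeps an edge exactly when no detour through a common descendant-parent reproduces its ML coefficient; on a recursive ML max-weighted model it returns the edge set of $D^{tr}$, in line with Lemma 2.11.

```python
import numpy as np

def ml_reduction(edges, B):
    # E^B of (2.7): keep k -> i iff b_{ki} exceeds the best detour weight
    # max over l in de(k) ∩ pa(i) of b_{kl} b_{li} / b_{ll};
    # de(k) is read off the support of row k, since b_{ki} > 0 iff k in An(i)
    d = B.shape[0]
    pa = {i: {k for (k, j) in edges if j == i} for i in range(d)}
    de = {k: {i for i in range(d) if i != k and B[k, i] > 0} for k in range(d)}
    return {(k, i) for (k, i) in edges
            if B[k, i] > max((B[k, l] * B[l, i] / B[l, l]
                              for l in de[k] & pa[i]), default=0.0)}
```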
3 Extreme dependence measures

In what follows we assume that $X=(X_1,\dots,X_d)$ is a recursive ML structural equation model relative to a DAG, given by (1.2) with iid noise variables $Z_k\in\mathcal{R}(-\alpha)$ for $k=1,\dots,d$. Denote by $F$ the joint distribution function of $X$ and by $F_i$ the marginal distribution function of $X_i$ for $i=1,\dots,d$.

Various functions and coefficients have been suggested to describe the extreme dependence of a random vector in a different and possibly simpler way than by the full exponent measure (2.3) or its standardised version (2.5). We focus on the following.

Definition 3.1. For nodes $i,j\in V$ define the tail dependence coefficient between $X_i$ and $X_j$ by

$$\chi(i,j) := \lim_{u\uparrow 1} P\big(F_i(X_i)>u \mid F_j(X_j)>u\big). \tag{3.1}$$

We summarize all $\chi(i,j)$ in a matrix $\chi := (\chi(i,j))_{d\times d}$ and call $\chi$ the tail dependence coefficient matrix. $\square$

Such measures have been defined and used in the context of multivariate distributions (see for example Beirlant et al. [2, Section 9.5.1]). Moreover, they are usually restricted to distributions with equal marginals, or applied after transforming different marginals, as we also suggest in Remark 2.2.

In the following theorem we express the tail dependence coefficient and its multivariate extensions explicitly in terms of the ML coefficient matrix $B$.

Theorem 3.2. For $i,j\in V$ define

$$\overline{b}_{ij} := \frac{b_{ij}^\alpha}{\sum_{l=1}^d b_{lj}^\alpha}.$$

Let $i_1,\dots,i_l\in V$ be an arbitrary index set. Then

$$\lim_{u\uparrow 1} P\big(F_{i_1}(X_{i_1})>u,\dots,F_{i_l}(X_{i_l})>u \mid F_{i_1}(X_{i_1})>u\big) = \lim_{u\uparrow 1}\frac{1}{1-u}\, P\big(F_{i_1}(X_{i_1})>u,\dots,F_{i_l}(X_{i_l})>u\big) = \sum_{k\in\mathrm{An}(i_1)\cap\dots\cap\mathrm{An}(i_l)} \bigwedge_{m=1}^l \overline{b}_{k i_m}. \tag{3.2}$$

In particular, the tail dependence coefficient $\chi(i,j)$ between $X_i$ and $X_j$ is given by

$$\chi(i,j) = \sum_{k\in\mathrm{An}(i)\cap\mathrm{An}(j)} \overline{b}_{ki}\wedge\overline{b}_{kj}. \tag{3.3}$$

Proof. For $i,j\in V$, eq. (3.2) is equivalent to (3.3). The general formula (3.2) can be derived from the multivariate distribution function (2.2) by the usual inclusion-exclusion argument. For notational ease we only present a proof of (3.3). By [2], Section 9.5.1, for $i,j\in V$ (3.2) is equal to the tail dependence coefficient $\chi(i,j)$, which is given by

$$\chi(i,j) = 2 - \lim_{u\uparrow 1}\frac{1-F\big(\infty,\dots,\infty,F_i^\leftarrow(u),\infty,\dots,\infty,F_j^\leftarrow(u),\infty,\dots,\infty\big)}{1-u}.$$

Substituting $t:=\frac{1}{1-u}$ and observing that for the generalized inverse $F_i^\leftarrow\big(1-\frac{1}{t}\big) = \big(\frac{1}{1-F_i}\big)^\leftarrow(t)$ holds, we have with $t=(t_1,\dots,t_d)$ for

$$F_*(t) := F\Big(\Big(\frac{1}{1-F_1}\Big)^\leftarrow(t_1),\dots,\Big(\frac{1}{1-F_d}\Big)^\leftarrow(t_d)\Big)$$

that

$$\chi(i,j) = 2 - \lim_{t\to\infty} t\big(1-F_*(t(e_i\wedge e_j))\big),$$

where the vector $e_i$ is $1$ at position $i$ and $\infty$ elsewhere, and the minimum is understood componentwise. By [15], Prop. 5.15(b), $F_*$ is in the maximum domain of attraction of the distribution function $G_*$ as given in Remark 2.2. Using Prop. 5.15(a) and Eq. (5.38) of [15], we conclude that

$$t\big(1-F_*(t(e_i\wedge e_j))\big) \longrightarrow -\log G_*(e_i\wedge e_j) = \mu_*\big([0,e_i\wedge e_j]^c\big), \qquad t\to\infty.$$

This yields

$$\chi(i,j) = 2 - \sum_{k=1}^d \Big(\frac{b_{ki}^\alpha}{\sum_{l=1}^d b_{li}^\alpha} \vee \frac{b_{kj}^\alpha}{\sum_{l=1}^d b_{lj}^\alpha}\Big).$$

Since

$$\sum_{k=1}^d \Big(\frac{b_{ki}^\alpha}{\sum_{l=1}^d b_{li}^\alpha} \vee \frac{b_{kj}^\alpha}{\sum_{l=1}^d b_{lj}^\alpha}\Big) + \sum_{k=1}^d \Big(\frac{b_{ki}^\alpha}{\sum_{l=1}^d b_{li}^\alpha} \wedge \frac{b_{kj}^\alpha}{\sum_{l=1}^d b_{lj}^\alpha}\Big) = 2,$$

and $\overline{b}_{ki}\wedge\overline{b}_{kj}=0$ for $k\in V\setminus(\mathrm{An}(i)\cap\mathrm{An}(j))$ (cf. (1.3)), we finally obtain

$$\chi(i,j) = \sum_{k\in\mathrm{An}(i)\cap\mathrm{An}(j)} \frac{b_{ki}^\alpha}{\sum_{l=1}^d b_{li}^\alpha} \wedge \frac{b_{kj}^\alpha}{\sum_{l=1}^d b_{lj}^\alpha}. \qquad\square$$

Remark 3.3. By Theorem 3.3 of [8], the matrix $\overline{B}=(\overline{b}_{ij})_{d\times d}$ is again the ML coefficient matrix of a recursive ML model. $\square$

Corollary 3.4. The following are equivalent:

(a) $\chi(i,j)=0$;

(b) $\mathrm{An}(i)\cap\mathrm{An}(j)=\emptyset$;

(c) $X_i$ and $X_j$ are independent.

Proof. (a) $\Longleftrightarrow$ (b) is immediate by (3.3). (b) $\Longleftrightarrow$ (c) holds by definition of $X_i$ and $X_j$: since $X_i = \bigvee_{k\in\mathrm{An}(i)} b_{ki}Z_k$ with independent noise variables $Z_k$, $X_i$ and $X_j$ are independent if and only if $\mathrm{An}(i)\cap\mathrm{An}(j)=\emptyset$. $\square$
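Formula (3.3) reduces the computation of the full matrix $\chi$ to a few array operations on $B$; the following Python sketch (our illustration, not part of the paper) vectorizes it.

```python
import numpy as np

def chi_matrix(B, alpha):
    """Tail dependence coefficient matrix of a recursive ML model, cf. (3.3).

    B is the ML coefficient matrix (1.4); summands with k outside
    An(i) ∩ An(j) are zero and drop out of the minimum automatically.
    """
    Bstd = B ** alpha / np.sum(B ** alpha, axis=0, keepdims=True)  # b-bar_{kj}
    # chi(i, j) = sum_k  b-bar_{ki} ∧ b-bar_{kj}
    return np.sum(np.minimum(Bstd[:, :, None], Bstd[:, None, :]), axis=0)
```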
Remark 3.5. In the more general framework of Remark 2.3, parts (b) and (c) of Corollary 3.4 would have to be replaced by:

(b') the limiting distribution can be represented on a DAG such that $\mathrm{An}(i)\cap\mathrm{An}(j)=\emptyset$;

(c') $X_i$ and $X_j$ are asymptotically independent. $\square$

The following result states some useful properties of the tail dependence coefficients.

Proposition 3.6. Let $i\in V$.

(a) For $j\in\mathrm{an}(i)$ we have $0 < \chi(i,j) \le \sum_{k\in\mathrm{An}(j)} \overline{b}_{ki} < 1$.

(b) For $j\in\mathrm{An}(i)$ we have $\chi(i,j) \ge \overline{b}_{ji}/\overline{b}_{jj}$, with equality if and only if all paths from $k\in\mathrm{An}(j)$ to $i$ passing through $j$ are max-weighted.

Let $V_0$ be the set of initial nodes of $D$. Then:

(c) For $j\in\mathrm{An}(i)\cap V_0$ we have $\chi(i,j)=\overline{b}_{ji}$.

(d) For $j\in\mathrm{An}(i)\cap V_0$ we have

$$\bigvee_{k\in\mathrm{De}(j)\cap\mathrm{An}(i)} \chi(j,k)\,\chi(k,i) \ \ge\ \chi(j,i),$$

with equality if for every $k\in\mathrm{De}(j)\cap\mathrm{An}(i)$, every path from $l\in\mathrm{An}(k)$ to $i$ passing through $k$ is max-weighted.

Proof. (a) We obtain from (3.3), since $\overline{b}_{ki}\wedge\overline{b}_{kj}=0$ for $k\notin\mathrm{An}(j)$,

$$\chi(i,j) = \sum_{k=1}^d \overline{b}_{ki}\wedge\overline{b}_{kj} = \sum_{k\in\mathrm{An}(j)} \overline{b}_{ki}\wedge\overline{b}_{kj} \ \le\ \sum_{k\in\mathrm{An}(j)} \overline{b}_{ki} \ <\ \sum_{k\in\mathrm{An}(i)} \overline{b}_{ki} = 1,$$

where the last inequality is due to $\mathrm{An}(j)\subsetneq\mathrm{An}(i)$.

(b) From Lemma 2.10 of [8] we know that $\overline{b}_{ki} \ge \overline{b}_{kj}\overline{b}_{ji}/\overline{b}_{jj}$ for all $k\in\mathrm{An}(j)$. Remark 3.3 and Lemma 2.4 imply $\overline{b}_{ji}/\overline{b}_{jj} < 1$. Hence,

$$\chi(i,j) = \sum_{k\in\mathrm{An}(j)} \overline{b}_{ki}\wedge\overline{b}_{kj} \ \ge\ \sum_{k\in\mathrm{An}(j)} \frac{\overline{b}_{kj}\overline{b}_{ji}}{\overline{b}_{jj}}\wedge\overline{b}_{kj} = \frac{\overline{b}_{ji}}{\overline{b}_{jj}}\sum_{k\in\mathrm{An}(j)} \overline{b}_{kj} = \frac{\overline{b}_{ji}}{\overline{b}_{jj}}.$$

The second statement follows from the fact that a path from $k\in\mathrm{An}(j)$ to $i$ through $j$ is max-weighted if and only if $\overline{b}_{ki} = \overline{b}_{kj}\overline{b}_{ji}/\overline{b}_{jj}$.

(c) Since $\sum_{l\in\mathrm{An}(j)} \overline{b}_{lj} = 1$ implies $\overline{b}_{jj}=1$ for all $j\in V_0$, the statement is a special case of (b).

(d) Since $j\in\mathrm{De}(j)\cap\mathrm{An}(i)$ and $\chi(j,j)=1$, it holds that

$$\bigvee_{k\in\mathrm{De}(j)\cap\mathrm{An}(i)} \chi(j,k)\,\chi(k,i) \ \ge\ \chi(j,j)\,\chi(j,i) = \chi(j,i).$$

Let $k\in\mathrm{De}(j)\cap\mathrm{An}(i)$. If all paths from $l\in\mathrm{An}(k)$ to $i$ through $k$ are max-weighted, we have by parts (b) and (c) that $\chi(j,k)=\overline{b}_{jk}$ and $\chi(k,i)=\overline{b}_{ki}/\overline{b}_{kk}$, implying

$$\bigvee_{k\in\mathrm{De}(j)\cap\mathrm{An}(i)} \chi(j,k)\,\chi(k,i) = \bigvee_{k\in\mathrm{De}(j)\cap\mathrm{An}(i)} \frac{\overline{b}_{jk}\overline{b}_{ki}}{\overline{b}_{kk}} = \overline{b}_{ji} = \chi(j,i),$$

where the equality $\bigvee_{k\in\mathrm{De}(j)\cap\mathrm{An}(i)} \overline{b}_{jk}\overline{b}_{ki}/\overline{b}_{kk} = \overline{b}_{ji}$ is due to Lemma 2.10 of [8]. $\square$

Example 3.7. Consider the following DAGs $D_1$ (left) and $D_2$ (right).

[Figure: the two DAGs $D_1$ (left) and $D_2$ (right), each on the node set $\{1,2,3,4\}$.]

(1) In the second statement of Proposition 3.6(d), the assumption that all paths from $l\in\mathrm{An}(k)$ to $i$ through $k$ are max-weighted cannot be dropped: consider a max-linear model with $\alpha=1$ and $\sum_{k=1}^d b_{ki}=1$ for all $i\in V$ on the left-hand DAG $D_1$ with $c^3_1 < c^4_1 = b_{14}$. Since $\overline{B}=B$ and $c^4_3<1$, we have by Proposition 3.6:

$$\chi(2,3) = c^3_2, \qquad \chi(3,4) > c^4_3, \qquad\text{and}\qquad \chi(2,4) = c^3_2 c^4_3.$$

Hence, $\chi(2,3)\,\chi(3,4) > c^3_2 c^4_3 = \chi(2,4)$. Since the path $[1\to 3\to 4]$ is not max-weighted, equality as in Proposition 3.6(d) does not hold.

(2) $\chi(j,k)\,\chi(k,i) = \chi(j,i)$ does not imply that the path $[j\to k\to i]$ is max-weighted: consider the max-linear model with $\alpha=1$ on the right-hand DAG $D_2$ with weight and ML coefficient matrices

$$C = \begin{pmatrix} 1 & 0 & 0.1 & 0.085 \\ 0 & 1 & 0.8 & 0.5 \\ 0 & 0 & 0.1 & 0.4 \\ 0 & 0 & 0 & 0.375 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 & 0.1 & 0.085 \\ 0 & 1 & 0.8 & 0.5 \\ 0 & 0 & 0.1 & 0.04 \\ 0 & 0 & 0 & 0.375 \end{pmatrix}.$$

Hence, we obtain by (3.3), and since $\sum_{l\in\mathrm{An}(i)} b_{li}=1$ for all $i=1,\dots,4$,

$$\chi(2,3) = 0.8, \qquad \chi(3,4) = 0.625, \qquad\text{and}\qquad \chi(2,4) = 0.5.$$

Hence, $\chi(2,3)\,\chi(3,4) = \chi(2,4)$. However, $b_{23}b_{34}/b_{33} = 0.8\cdot 0.4 = 0.32 < 0.5 = b_{24}$, implying that the path $[2\to 3\to 4]$ is not max-weighted. $\square$
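The numbers in Example 3.7(2) are easy to reproduce; a small Python check (ours; nodes 0-indexed, and $\alpha=1$ so that $\overline{B}=B$):

```python
import numpy as np

B = np.array([[1.0, 0.0, 0.1, 0.085],
              [0.0, 1.0, 0.8, 0.5  ],
              [0.0, 0.0, 0.1, 0.04 ],
              [0.0, 0.0, 0.0, 0.375]])

# chi(i, j) = sum_k min(b_{ki}, b_{kj}), cf. (3.3) with alpha = 1
chi = np.sum(np.minimum(B[:, :, None], B[:, None, :]), axis=0)

# nodes 2, 3, 4 of the paper correspond to indices 1, 2, 3 here
print(chi[1, 2], chi[2, 3], chi[1, 3])               # 0.8  0.625  0.5
print(np.isclose(chi[1, 2] * chi[2, 3], chi[1, 3]))  # chi(2,3)chi(3,4) = chi(2,4)
```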
4 Identifiability from the tail dependence coefficients

The goal of this section is to investigate how far the matrix of tail dependence coefficients $(\chi(i,j))_{i,j\in V}$ of a recursive ML model on a DAG with regularly varying noise variables $Z_1,\dots,Z_d$ determines the ML coefficient matrix $B$ and, hence, the reachability matrix $R=\mathrm{sgn}(B)$.

Definition 4.1. We call two recursive ML models $X_1$ and $X_2$ $\chi$-equivalent, if their tail dependence coefficient matrices agree; i.e., if $\chi_1 = \chi_2$. $\square$

The definition has its analogue in the classical framework of structural equation models with linear functions and Gaussian noise. For instance, it is shown in Heckerman and Geiger [9] that for all graphs in the same Markov equivalence class there exists a structural equation model that leads to the same distribution $X$. Extensions and ramifications can be found in Peters [14].

The following example investigates the relation between the $\chi$-equivalence of two recursive ML models and the Markov equivalence of their underlying DAGs; a direct numerical check of $\chi$-equivalence is sketched below.
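A minimal sketch of such a check (ours; it recomputes the $\chi$ matrix of (3.3) as in the Section 3 sketch and compares the two matrices entrywise):

```python
import numpy as np

def chi_matrix(B, alpha):
    # chi(i, j) = sum_k  b-bar_{ki} ∧ b-bar_{kj}, cf. (3.3)
    Bstd = B ** alpha / np.sum(B ** alpha, axis=0, keepdims=True)
    return np.sum(np.minimum(Bstd[:, :, None], Bstd[:, None, :]), axis=0)

def chi_equivalent(B1, B2, alpha, tol=1e-12):
    # Definition 4.1: the models are chi-equivalent iff chi_1 = chi_2
    return np.allclose(chi_matrix(B1, alpha), chi_matrix(B2, alpha), atol=tol)
```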