Bayesian shrinkage prediction for the regression problem

Kei Kobayashi* and Fumiyasu Komaki†

*[email protected]  †[email protected]

Abstract

We consider Bayesian shrinkage predictions for the Normal regression problem under the frequentist Kullback-Leibler risk function.

Firstly, we consider the multivariate Normal model with an unknown mean and a known covariance. While the unknown mean is fixed, the covariance of future samples can be different from that of the training samples. We show that the Bayesian predictive distribution based on the uniform prior is dominated by that based on a class of priors if the prior distributions for the covariance and future covariance matrices are rotation invariant.

Then, we consider a class of priors for the mean parameters depending on the future covariance matrix. With such a prior, we can construct a Bayesian predictive distribution dominating that based on the uniform prior.

Lastly, applying this result to the prediction of response variables in the Normal linear regression model, we show that there exists a Bayesian predictive distribution dominating that based on the uniform prior. Minimaxity of these Bayesian predictions follows from these results.

Key words: Bayesian prediction, shrinkage estimation, Normal regression, superharmonic function, minimaxity, Kullback-Leibler divergence.

1 Introduction

Suppose that we have observations $y \sim N_d(y;\mu,\Sigma)$. Here $N_d$ is the density function of the $d$-dimensional multivariate Normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$. We consider the prediction of $\tilde y \sim N_d(\tilde y;\mu,\tilde\Sigma)$ using a predictive density $\hat p(\tilde y \mid y)$. We assume that the mean of the distribution of unobserved (future) samples is the same as that of the observed samples. However, the covariance matrices $\Sigma$ and $\tilde\Sigma$ are not necessarily the same or proportional to each other. We call a problem with such a setting the "problem with changeable covariances." As we will show below, the changeable covariance is a natural assumption when we consider linear regression problems.

In the present work, we assume that the mean vector $\mu$ is unknown and the covariance matrix $\Sigma$ is known. We consider both cases where the future covariance $\tilde\Sigma$ is known and unknown. We evaluate predictive densities $\hat p(\tilde y \mid y)$ by the KL loss function

$$D(\tilde p(\tilde y \mid \theta) \,\|\, \hat p(\tilde y \mid y)) := \int \tilde p(\tilde y \mid \theta) \log\frac{\tilde p(\tilde y \mid \theta)}{\hat p(\tilde y \mid y)}\, d\tilde y \qquad (1)$$

and the (frequentist) risk function

$$R_{\mathrm{KL}}(\hat p, \theta) := \int p(y \mid \theta)\, D(\tilde p(\tilde y \mid \theta) \,\|\, \hat p(\tilde y \mid y))\, dy. \qquad (2)$$

We consider the Bayesian predictive density

$$p_\pi(\tilde y \mid y) := \frac{\int \tilde p(\tilde y \mid \theta)\, p(y \mid \theta)\, \pi(\theta)\, d\theta}{\int p(y \mid \theta)\, \pi(\theta)\, d\theta}$$

with prior $\pi(\theta)$. For the Normal model, the Bayesian predictive density with the uniform prior $\pi_I(\mu) = 1$ becomes

$$p_{\pi_I}(\tilde y \mid y; \Sigma, \tilde\Sigma) = \frac{1}{(2\pi)^{d/2}|\Sigma+\tilde\Sigma|^{1/2}} \exp\Bigl(-\frac{(\tilde y - y)^\top (\Sigma+\tilde\Sigma)^{-1} (\tilde y - y)}{2}\Bigr),$$

as we will see in Section 2. We write $p_I(\tilde y \mid y)$ for $p_{\pi_I}(\tilde y \mid y; \Sigma, \tilde\Sigma)$ for short.

When $\tilde\Sigma$ is proportional to $\Sigma$, i.e. $\tilde\Sigma = a\Sigma$ for $a > 0$, the problem is reduced to the one with $\Sigma = v I_d$ and $\tilde\Sigma = \tilde v I_d$ for positive scalar values $v$ and $\tilde v$. This case with "unchangeable covariances" has been well studied. The Bayesian predictive density

$$p_I(\tilde y \mid y; \Sigma, \tilde\Sigma) = \frac{1}{\{2\pi(v+\tilde v)\}^{d/2}} \exp\Bigl(-\frac{\|\tilde y - y\|^2}{2(v+\tilde v)}\Bigr)$$

based on the uniform prior $\pi_I(\mu) = 1$ dominates the plug-in density

$$p(\tilde y \mid \hat\mu) = \frac{1}{(2\pi\tilde v)^{d/2}} \exp\Bigl(-\frac{\|\tilde y - y\|^2}{2\tilde v}\Bigr)$$

with the MLE $\hat\mu = y$.
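This dominance can be checked numerically. The following sketch is not part of the paper; the dimension $d = 5$, the variances $v = 1$ and $\tilde v = 2$, the true mean, and the Monte Carlo sample size are arbitrary illustrative choices. It estimates the KL risk (2) of the plug-in density and of the uniform-prior Bayesian predictive density by averaging the closed-form Gaussian KL divergence over draws of $y$.

```python
import numpy as np

rng = np.random.default_rng(0)

d, v, v_tilde = 5, 1.0, 2.0     # Sigma = v*I_d, Sigma~ = v_tilde*I_d (illustrative values)
mu = rng.normal(size=d)         # arbitrary true mean
n_mc = 20_000                   # Monte Carlo draws of the observation y

def kl_gauss_isotropic(mu_true, var_true, mu_hat, var_hat):
    """KL( N(mu_true, var_true*I_d) || N(mu_hat, var_hat*I_d) ) in closed form."""
    return 0.5 * (d * var_true / var_hat
                  + np.sum((mu_hat - mu_true) ** 2) / var_hat
                  - d + d * np.log(var_hat / var_true))

risk_plugin = risk_bayes = 0.0
for _ in range(n_mc):
    y = mu + np.sqrt(v) * rng.normal(size=d)
    # plug-in density: N(y, v_tilde*I); uniform-prior Bayes predictive: N(y, (v+v_tilde)*I)
    risk_plugin += kl_gauss_isotropic(mu, v_tilde, y, v_tilde)
    risk_bayes  += kl_gauss_isotropic(mu, v_tilde, y, v + v_tilde)

risk_plugin /= n_mc
risk_bayes  /= n_mc

print("plug-in risk:", risk_plugin, " exact:", d * v / (2 * v_tilde))
print("Bayes   risk:", risk_bayes,  " exact:", 0.5 * d * np.log(1 + v / v_tilde))
```

The estimates should be close to the exact risks $dv/(2\tilde v)$ and $(d/2)\log(1+v/\tilde v)$, the latter being uniformly smaller since $\log(1+x) < x$.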
Moreover, by Murray (1977) and Ng (1980), the Bayesian predictive density $p_I(\tilde y \mid y)$ is the best predictive density that is invariant under the translation group. In Liang & Barron (2004) and George et al. (2006), the minimaxity of $p_I$ was proved.

In Komaki (2001), it was proved that the Bayesian predictive density $p_S(\tilde y \mid y)$ with the Stein prior

$$\pi_S(\mu) := \|\mu\|^{-(d-2)} \qquad (3)$$

dominates the Bayesian predictive density $p_I(\tilde y \mid y)$ with the uniform prior $\pi_I(\mu)$. George et al. (2006) generalized the result of Komaki (2001). Define the marginal distribution $m_\pi$ by

$$m_\pi(z; \Sigma) := \int N(z; \mu, \Sigma)\, \pi(\mu)\, d\mu. \qquad (4)$$

As we will see in Theorem 2.4 below, George et al. (2006) proved a sufficient condition on the prior $\pi(\mu)$ or the marginal distribution $m_\pi$ for $p_\pi(\tilde y \mid y)$ to dominate $p_I(\tilde y \mid y)$ when $\Sigma$ is proportional to $\tilde\Sigma$. In the present work, we generalize the results of Komaki (2001) and George et al. (2006) to the corresponding problem with changeable covariances, considering only finite sample cases. Asymptotic properties of Bayesian prediction are studied in Komaki (1996), Corcuera & Giummolé (2000), and Komaki (2006).

2 Prior distributions independent of the future covariance

In this section, we develop and prove our main results concerning properties of $p_\pi(\tilde y \mid y)$ in the problem with changeable covariances. First we give three lemmas generalizing results proved in George et al. (2006) for the problem with "unchangeable" variances. Define the marginal distribution $m_\pi$ by (4).

Lemma 2.1 If $m_\pi(z;\Sigma) < \infty$ for all $z$, then $p_\pi(\tilde y \mid y)$ is a proper probability density. Moreover, the mean of $p_\pi(\tilde y \mid y)$ is equal to the posterior mean $E_\pi[\mu \mid y]$ if it exists.

Let

$$w := (\Sigma^{-1}+\tilde\Sigma^{-1})^{-1}(\Sigma^{-1}y+\tilde\Sigma^{-1}\tilde y) \quad\text{and}\quad \Sigma_w := (\Sigma^{-1}+\tilde\Sigma^{-1})^{-1}. \qquad (5)$$

Expressed in terms of the predictive density based on the uniform prior, the Bayesian predictive density based on a prior $\pi(\mu)$ can be written as follows:

Lemma 2.2
$$p_\pi(\tilde y \mid y) = p_I(\tilde y \mid y)\, \frac{m_\pi(w; \Sigma_w)}{m_\pi(y; \Sigma)}.$$

The following lemma is used for proving minimaxity of $p_\pi(\tilde y \mid y)$.

Lemma 2.3 The Bayesian predictive density $p_I(\tilde y \mid y)$ is minimax under the KL risk function $R_{\mathrm{KL}}(\hat p, \mu)$.

Since the proofs of Lemma 2.1 and Lemma 2.3 are almost the same as those of Lemma 1 and Lemma 3 in George et al. (2006), we omit them. We prove only Lemma 2.2.

Proof of Lemma 2.2.

$$p(y \mid \mu, \Sigma)\, p(\tilde y \mid \mu, \tilde\Sigma)$$
$$= \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\Bigl(-\frac{(y-\mu)^\top\Sigma^{-1}(y-\mu)}{2}\Bigr)\, \frac{1}{(2\pi)^{d/2}|\tilde\Sigma|^{1/2}} \exp\Bigl(-\frac{(\tilde y-\mu)^\top\tilde\Sigma^{-1}(\tilde y-\mu)}{2}\Bigr)$$
$$= \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}(2\pi)^{d/2}|\tilde\Sigma|^{1/2}} \exp\Bigl(-\frac{(w-\mu)^\top\Sigma_w^{-1}(w-\mu)}{2}\Bigr) \exp\Bigl(-\frac{y^\top\Sigma^{-1}y}{2}-\frac{\tilde y^\top\tilde\Sigma^{-1}\tilde y}{2}\Bigr)$$
$$\qquad\times \exp\Bigl(\frac{(\Sigma^{-1}y+\tilde\Sigma^{-1}\tilde y)^\top(\Sigma^{-1}+\tilde\Sigma^{-1})^{-1}(\Sigma^{-1}y+\tilde\Sigma^{-1}\tilde y)}{2}\Bigr)$$
$$= \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}(2\pi)^{d/2}|\tilde\Sigma|^{1/2}} \exp\Bigl(-\frac{(w-\mu)^\top\Sigma_w^{-1}(w-\mu)}{2}\Bigr) \exp\Bigl(-\frac{(y-\tilde y)^\top(\Sigma+\tilde\Sigma)^{-1}(y-\tilde y)}{2}\Bigr). \qquad (6)$$

In the last equation, we use

$$\Sigma^{-1}(\Sigma^{-1}+\tilde\Sigma^{-1})^{-1}\Sigma^{-1} - \Sigma^{-1}
= \Sigma^{-1}(\Sigma^{-1}+\tilde\Sigma^{-1})^{-1}\Sigma^{-1} - \Sigma^{-1}(\Sigma^{-1}+\tilde\Sigma^{-1})^{-1}(\Sigma^{-1}+\tilde\Sigma^{-1})
= -\Sigma^{-1}(\Sigma^{-1}+\tilde\Sigma^{-1})^{-1}\tilde\Sigma^{-1}
= -(\Sigma+\tilde\Sigma)^{-1}.$$

From (6), the predictive density with the uniform prior $\pi_I(\mu) = 1$ is given by

$$p_I(\tilde y \mid y) = \frac{\int p(y \mid \mu,\Sigma)\, p(\tilde y \mid \mu,\tilde\Sigma)\, d\mu}{\int p(y \mid \mu,\Sigma)\, d\mu}$$
$$= \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}(2\pi)^{d/2}|\tilde\Sigma|^{1/2}}\, |\Sigma^{-1}+\tilde\Sigma^{-1}|^{-1/2}(2\pi)^{d/2} \exp\Bigl(-\frac{(y-\tilde y)^\top(\Sigma+\tilde\Sigma)^{-1}(y-\tilde y)}{2}\Bigr)$$
$$= (2\pi)^{-d/2}|\Sigma+\tilde\Sigma|^{-1/2} \exp\Bigl(-\frac{(y-\tilde y)^\top(\Sigma+\tilde\Sigma)^{-1}(y-\tilde y)}{2}\Bigr).$$

Therefore

$$p_\pi(\tilde y \mid y) = \frac{\int p(y \mid \mu,\Sigma)\, p(\tilde y \mid \mu,\tilde\Sigma)\, \pi(\mu)\, d\mu}{\int p(y \mid \mu,\Sigma)\, \pi(\mu)\, d\mu}
= p_I(\tilde y \mid y)\, \frac{\int N(w;\mu,\Sigma_w)\,\pi(\mu)\, d\mu}{\int N(y;\mu,\Sigma)\,\pi(\mu)\, d\mu}
= p_I(\tilde y \mid y)\, \frac{m_\pi(w;\Sigma_w)}{m_\pi(y;\Sigma)}. \qquad \Box$$
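Lemma 2.2 can be verified numerically for a prior whose marginal is available in closed form. The sketch below is not from the paper: it uses a conjugate Gaussian prior $\pi = N_d(0, C)$, and all matrices and sample points are arbitrary illustrative choices. For this prior, $m_\pi(z; S) = N(z; 0, S + C)$ and $p_\pi(\tilde y \mid y)$ is Gaussian, so both sides of the identity $p_\pi(\tilde y \mid y) = p_I(\tilde y \mid y)\, m_\pi(w;\Sigma_w)/m_\pi(y;\Sigma)$ can be evaluated directly.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

rng = np.random.default_rng(1)
d = 4

def random_spd(scale=1.0):
    """A random symmetric positive definite d x d matrix (for illustration only)."""
    A = rng.normal(size=(d, d))
    return scale * (A @ A.T) + 0.5 * np.eye(d)

Sigma, Sigma_t, C = random_spd(), random_spd(), random_spd(2.0)   # Sigma, Sigma~, prior covariance
mu_true = rng.normal(size=d)
y  = rng.multivariate_normal(mu_true, Sigma)                      # observed sample
yt = rng.multivariate_normal(mu_true, Sigma_t)                    # future sample

Si, Sti = np.linalg.inv(Sigma), np.linalg.inv(Sigma_t)
Sigma_w = np.linalg.inv(Si + Sti)                                 # eq. (5)
w = Sigma_w @ (Si @ y + Sti @ yt)

# Closed forms for the Gaussian prior pi = N(0, C):
#   m_pi(z; S)    = N(z; 0, S + C)
#   p_pi(yt | y)  = N(yt; S_post Si y, Sigma~ + S_post),  S_post = (Si + C^{-1})^{-1}
S_post = np.linalg.inv(Si + np.linalg.inv(C))
lhs = mvn.logpdf(yt, S_post @ Si @ y, Sigma_t + S_post)           # log p_pi(yt | y)

log_pI = mvn.logpdf(yt, y, Sigma + Sigma_t)                       # log p_I(yt | y)
rhs = log_pI + mvn.logpdf(w, np.zeros(d), Sigma_w + C) \
             - mvn.logpdf(y, np.zeros(d), Sigma + C)              # Lemma 2.2, on the log scale

print(lhs, rhs)   # the two values agree up to floating-point error
```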
Next, the difference of the risk functions of the two priors is evaluated. Let

$$R_{\mathrm{KL}}(\pi, \mu) := \int p(y \mid \mu,\Sigma)\, D(p(\tilde y \mid \mu,\tilde\Sigma) \,\|\, p_\pi(\tilde y \mid y))\, dy,$$
$$\phi_\pi(\mu, \Sigma) := \int N(z;\mu,\Sigma)\, \log m_\pi(z;\Sigma)\, dz.$$

Then from Lemma 2.2,

$$R_{\mathrm{KL}}(\pi,\mu) - R_{\mathrm{KL}}(\pi_I,\mu) = \int p(y \mid \mu,\Sigma)\, p(\tilde y \mid \mu,\tilde\Sigma)\, \log\frac{p_I(\tilde y \mid y)}{p_\pi(\tilde y \mid y)}\, dy\, d\tilde y$$
$$= \int p(y \mid \mu,\Sigma)\, p(\tilde y \mid \mu,\tilde\Sigma)\, \log\frac{m_\pi(y;\Sigma)}{m_\pi(w;\Sigma_w)}\, dy\, d\tilde y$$
$$= \phi_\pi(\mu,\Sigma) - \phi_\pi(\mu,\Sigma_w). \qquad (7)$$

Now $\Sigma_w = (\Sigma^{-1}+\tilde\Sigma^{-1})^{-1} \prec \Sigma$. In order to prove $R_{\mathrm{KL}}(\pi,\mu) < R_{\mathrm{KL}}(\pi_I,\mu)$, it suffices to prove $\phi_\pi(\mu,\Sigma) < \phi_\pi(\mu,\Sigma_w)$.

Before stating the main results for the problem with changeable covariances, we review some results in a special setting, i.e., unchangeable covariances.

An extended real-valued function $\pi(\mu)$ on an open set $R \subset \mathbb{R}^p$ is said to be superharmonic when it satisfies the following properties:

1. $-\infty < \pi(\mu) \le \infty$ and $\pi(\mu) \not\equiv \infty$ on any component of $R$.
2. $\pi(\mu)$ is lower semi-continuous on $R$.
3. If $G$ is an open subset of $R$ with compact closure $\bar G \subset R$, $w(\mu)$ is a continuous function on $\bar G$, $w(\mu)$ is harmonic on $G$, and $\pi(\mu) \ge w(\mu)$ on $\partial G$, then $\pi(\mu) \ge w(\mu)$ on $G$.

If $\pi(\mu)$ is a $C^2$ function, then $\pi(\mu)$ is superharmonic on $R$ if and only if $\Delta\pi \le 0$ on $R$.

Theorem 2.4 (Komaki (2001) and George et al. (2006)) Assume $d \ge 3$.

(i) If $\pi(\mu)$ is the Stein prior $\pi_S(\mu)$,
$$v_1 > v_2 > 0 \;\Rightarrow\; \phi_\pi(\mu, v_1 I_d) < \phi_\pi(\mu, v_2 I_d) \text{ for all } \mu.$$

(ii) If $\pi(\mu)$ is a superharmonic function and $m_\pi(z; vI_d) < \infty$ for any $z$ and $v$,
$$v_1 > v_2 > 0 \;\Rightarrow\; \phi_\pi(\mu, v_1 I_d) \le \phi_\pi(\mu, v_2 I_d) \text{ for all } \mu.$$
Furthermore, if $m_\pi(z; vI_d)$ is also not constant for all $v_2 \le v \le v_1$, the inequality holds strictly.

(iii) If $\sqrt{m_\pi(z; vI_d)}$ is a superharmonic function for any $v$ and $m_\pi(z; vI_d) < \infty$ for any $z$ and $v$,
$$v_1 > v_2 > 0 \;\Rightarrow\; \phi_\pi(\mu, v_1 I_d) \le \phi_\pi(\mu, v_2 I_d) \text{ for all } \mu.$$
Furthermore, if $m_\pi(z; vI_d)$ is also not constant for any $v_2 \le v \le v_1$, the inequality holds strictly.

We note that (iii) implies (ii) and (ii) implies (i). (i) was proved in Komaki (2001). (ii) and (iii) were proved in George et al. (2006). Theorem 2.5 is a generalization of (ii) of Theorem 2.4 to the problem with changeable covariances.

For each prior $\pi(\mu)$, define a rescaled prior with respect to a positive definite $d \times d$ matrix $\Sigma^*$ by
$$\pi_{\Sigma^*}(\mu) := \pi(\Sigma^{*-1/2}\mu).$$
In particular, we call $\pi_{S;\Sigma^*}(\mu) := \pi_S(\Sigma^{*-1/2}\mu)$ a rescaled Stein prior with respect to $\Sigma^*$.

We consider the Bayesian risk with priors $p(\Sigma)$ and $\tilde p(\tilde\Sigma)$:
$$\mathcal{R}_{\mathrm{KL}}(\pi,\mu) = \int p(\Sigma)\,\tilde p(\tilde\Sigma)\, R_{\mathrm{KL}}(\pi,\mu)\, d\Sigma\, d\tilde\Sigma,$$
where $d\Sigma$ denotes the Lebesgue measure on the vector space of all components of the matrix $\Sigma$. Define

$$\varphi_\pi(\mu) := \int p(\Sigma)\,\tilde p(\tilde\Sigma)\, \phi_\pi(\mu,\Sigma)\, d\Sigma\, d\tilde\Sigma = \int p(\Sigma)\,\tilde p(\tilde\Sigma)\, N(z;\mu,\Sigma)\log m_\pi(z;\Sigma)\, dz\, d\Sigma\, d\tilde\Sigma, \qquad (8)$$
$$\varphi^w_\pi(\mu) := \int p(\Sigma)\,\tilde p(\tilde\Sigma)\, \phi_\pi(\mu,\Sigma_w)\, d\Sigma\, d\tilde\Sigma = \int p(\Sigma)\,\tilde p(\tilde\Sigma)\, N(z;\mu,\Sigma_w)\log m_\pi(z;\Sigma_w)\, dz\, d\Sigma\, d\tilde\Sigma. \qquad (9)$$

Then from (7),
$$\mathcal{R}_{\mathrm{KL}}(\pi,\mu) - \mathcal{R}_{\mathrm{KL}}(\pi_I,\mu) = \varphi_\pi(\mu) - \varphi^w_\pi(\mu). \qquad (10)$$

We consider the case where $p(\Sigma)$, $\tilde p(\tilde\Sigma)$, and $\pi(\mu)$ are rotation invariant. Here, a function $f(\Sigma)$ of a matrix $\Sigma \in \mathbb{R}^{d\times d}$ and a function $g(\mu)$ of a vector $\mu \in \mathbb{R}^d$ are said to be rotation invariant if $f(\Sigma) = f(P\Sigma P^\top)$ and $g(\mu) = g(P\mu)$, respectively, for every orthogonal matrix $P \in \mathbb{R}^{d\times d}$.

Theorem 2.5 Let $d \ge 3$. If $p(\Sigma)$ and $\tilde p(\tilde\Sigma)$ are rotation invariant functions and $\pi$ is a rotation invariant superharmonic prior, then
$$\mathcal{R}_{\mathrm{KL}}(\pi_\Sigma,\mu) \le \mathcal{R}_{\mathrm{KL}}(\pi_I,\mu)$$
for any $\mu$. In particular, the Bayesian predictive distribution $p_\Sigma(\tilde y \mid y)$ based on $\pi_\Sigma$ dominates that based on $\pi_I$ if $\pi$ is also not constant.
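The proof of Theorem 2.5 below relies on the monotonicity stated in Theorem 2.4. As a rough numerical illustration of Theorem 2.4 (i), the following sketch, which is not from the paper, estimates $\phi_\pi(\mu, vI_d)$ for the Stein prior by nested Monte Carlo; $d = 3$, the variances $v_1 > v_2$, the evaluation point $\mu = 0$, and the sample sizes are arbitrary choices, and the estimates are only approximate.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3                                   # Theorem 2.4 requires d >= 3
mu0 = np.zeros(d)                       # evaluate phi at mu = 0, where the gap is largest
n_outer, n_inner = 2000, 2000

def log_m_stein(z, v):
    """Monte Carlo estimate of log m_pi(z; v*I_d) for the Stein prior pi(mu) = ||mu||^{-(d-2)},
    using m_pi(z; v*I_d) = E_{mu ~ N(z, v*I_d)} [ ||mu||^{2-d} ]."""
    mu = z + np.sqrt(v) * rng.normal(size=(n_inner, d))
    return np.log(np.mean(np.linalg.norm(mu, axis=1) ** (2 - d)))

def phi_stein(v):
    """Monte Carlo estimate of phi_pi(mu0, v*I_d) = E_{z ~ N(mu0, v*I_d)} [ log m_pi(z; v*I_d) ]."""
    zs = mu0 + np.sqrt(v) * rng.normal(size=(n_outer, d))
    return np.mean([log_m_stein(z, v) for z in zs])

v1, v2 = 2.0, 1.0                       # v1 > v2
print(phi_stein(v1), "<", phi_stein(v2))   # Theorem 2.4 (i): phi decreases as the variance grows
```

At $\mu = 0$ the scaling $m_\pi(z; vI_d) = v^{(2-d)/2} m_\pi(z/\sqrt v; I_d)$ gives the exact gap $\phi_\pi(0, v_1 I_d) - \phi_\pi(0, v_2 I_d) = \tfrac{2-d}{2}\log(v_1/v_2)$, about $-0.35$ for these values, which the estimates should reproduce within Monte Carlo error.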
Proof. We note that $m_\pi(z;\Sigma) < \infty$ for every $z \in \mathbb{R}^d$ and every positive definite matrix $\Sigma \in \mathbb{R}^{d\times d}$ from Lemma A.1 in the appendix.

First, we prove the invariance of $\varphi_{\pi_\Sigma}(\mu)$ and $\varphi^w_{\pi_\Sigma}(\mu)$ under rotations of $\mu$. Let $P$ be a $d \times d$ orthogonal matrix; then

$$\varphi_{\pi_\Sigma}(P\mu) = \int p(\Sigma)\,\tilde p(\tilde\Sigma)\, N(z;P\mu,\Sigma) \log\int N(z;\mu',\Sigma)\,\pi_\Sigma(\mu')\, d\mu'\, dz\, d\Sigma\, d\tilde\Sigma$$
$$= \int p(\Sigma)\,\tilde p(\tilde\Sigma)\, N(\tilde z;\mu,P^\top\Sigma P) \log\int N(\tilde z;\tilde\mu',P^\top\Sigma P)\,\pi(\Sigma^{-1/2}P\tilde\mu')\, d\tilde\mu'\, d\tilde z\, d\Sigma\, d\tilde\Sigma$$
$$= \int p(P\Sigma P^\top)\,\tilde p(\tilde\Sigma)\, N(\tilde z;\mu,\Sigma) \log\int N(\tilde z;\tilde\mu',\Sigma)\,\pi(\Sigma^{-1/2}\tilde\mu')\, d\tilde\mu'\, d\tilde z\, d\Sigma\, d\tilde\Sigma$$
$$= \varphi_{\pi_\Sigma}(\mu).$$

The proof of the rotation invariance of $\varphi^w_{\pi_\Sigma}(\mu)$ is nearly the same.

We define
$$\mu^* := \mathop{\arg\max}_{\|\mu'\|=\|\mu\|} \frac{\|\Sigma^{-1/2}\mu'\|}{\|\Sigma_w^{-1/2}\mu'\|}$$
and
$$\tau := \frac{\|\Sigma^{-1/2}\mu^*\|}{\|\Sigma_w^{-1/2}\mu^*\|}. \qquad (11)$$

Note that $0 < \tau < 1$, because $\tilde\Sigma$ is positive definite. Moreover,
$$\|\tau\Sigma_w^{-1/2}\tilde\mu'\| = \tau\,\|\Sigma_w^{-1/2}\tilde\mu'\| = \frac{\|\Sigma^{-1/2}\mu^*\|}{\|\Sigma_w^{-1/2}\mu^*\|}\,\|\Sigma_w^{-1/2}\tilde\mu'\| \ge \|\Sigma^{-1/2}\tilde\mu'\|$$
for every $\tilde\mu'$.

From the rotation invariance of $\varphi_{\pi_\Sigma}$,

$$\varphi_{\pi_\Sigma}(\mu) = \varphi_{\pi_\Sigma}(\mu^*)$$
$$= E_{\Sigma,\tilde\Sigma}\Bigl[\int N(z;\mu^*,\Sigma)\log\int N(z;\tilde\mu,\Sigma)\,\pi(\Sigma^{-1/2}\tilde\mu)\, d\tilde\mu\, dz\Bigr]$$
$$= E_{\Sigma,\tilde\Sigma}\Bigl[\int N(\tilde z;\Sigma^{-1/2}\mu^*,I_d)\log\int N(\tilde z;\tilde\mu',I_d)\,\pi(\tilde\mu')\, d\tilde\mu'\, d\tilde z\Bigr]$$
$$= E_{\Sigma,\tilde\Sigma}\Bigl[\int N(\tilde z;\tau\Sigma_w^{-1/2}\mu^*,I_d)\log\int N(\tilde z;\tilde\mu',I_d)\,\pi(\tilde\mu')\, d\tilde\mu'\, d\tilde z\Bigr]$$
$$= E_{\Sigma,\tilde\Sigma}\Bigl[\int N(\tilde z;\Sigma_w^{-1/2}\mu^*,\tau^{-2}I_d)\log\int N(\tilde z;\tilde\mu',\tau^{-2}I_d)\,\pi(\tau\tilde\mu')\, d\tilde\mu'\, d\tilde z\Bigr]$$
$$\le E_{\Sigma,\tilde\Sigma}\Bigl[\int N(\tilde z;\Sigma_w^{-1/2}\mu^*,I_d)\log\int N(\tilde z;\tilde\mu',I_d)\,\pi(\tau\tilde\mu')\, d\tilde\mu'\, d\tilde z\Bigr] \qquad (12)$$
$$= E_{\Sigma,\tilde\Sigma}\Bigl[\int N(\tilde z;\mu^*,\Sigma_w)\log\int N(\tilde z;\tilde\mu',\Sigma_w)\,\pi(\tau\Sigma_w^{-1/2}\tilde\mu')\, d\tilde\mu'\, d\tilde z\Bigr].$$

Here, inequality (12) is given by Theorem 2.4 (ii).

Since every rotation invariant superharmonic function is radially nonincreasing,
$$\pi(\tau\Sigma_w^{-1/2}\tilde\mu') \le \pi(\Sigma^{-1/2}\tilde\mu').$$
From this inequality,
$$E_{\Sigma,\tilde\Sigma}\Bigl[\int N(\tilde z;\mu^*,\Sigma_w)\log\int N(\tilde z;\tilde\mu',\Sigma_w)\,\pi(\tau\Sigma_w^{-1/2}\tilde\mu')\, d\tilde\mu'\, d\tilde z\Bigr]$$
$$\le E_{\Sigma,\tilde\Sigma}\Bigl[\int N(\tilde z;\mu^*,\Sigma_w)\log\int N(\tilde z;\tilde\mu',\Sigma_w)\,\pi(\Sigma^{-1/2}\tilde\mu')\, d\tilde\mu'\, d\tilde z\Bigr]
= \varphi^w_{\pi_\Sigma}(\mu^*) = \varphi^w_{\pi_\Sigma}(\mu). \qquad (13)$$

In particular, if $\pi$ is not constant, inequality (12) holds strictly. Therefore, $p_\Sigma$ dominates $p_I$. $\Box$

From Lemma 2.3, $p_\Sigma$ is proved to be minimax.

Corollary 2.6 Assume $d \ge 3$. Let $p(\Sigma)$ and $\tilde p(\tilde\Sigma)$ be rotation invariant continuous functions. If $\pi$ is a rotation invariant superharmonic prior, the Bayesian predictive density $p_\Sigma(\tilde y \mid y)$ is minimax under $\mathcal{R}_{\mathrm{KL}}$.

Theorem 2.5 and Corollary 2.6 can be generalized to the case with a semi-positive definite future covariance matrix $\tilde\Sigma$. Let $\tilde\Sigma$ be a $d$-dimensional semi-positive definite matrix whose rank is $k > 0$. Then there is a $d \times k$ matrix $L$ satisfying $\tilde\Sigma = LL^\top$. Let $\{a_i\}_{i=1}^{d-k}$ be a set of orthonormal vectors that are orthogonal to each column vector of $L$, i.e. $L^\top a_i = 0$ and $a_i^\top a_j = \delta_{ij}$ for $i,j = 1,\ldots,d-k$. Define the Normal distribution with a semi-positive definite covariance matrix by

$$N_d(y;\mu,\tilde\Sigma) = \frac{1}{(2\pi)^{k/2}|L^\top L|^{1/2}}\exp\Bigl(-\frac{(y-\mu)^\top\tilde\Sigma^\dagger(y-\mu)}{2}\Bigr)\prod_{i=1}^{d-k}\delta(a_i^\top(y-\mu)),$$

where $\tilde\Sigma^\dagger$ is the Moore-Penrose pseudo-inverse of $\tilde\Sigma$.

From results of functional analysis, $N_d(y;\mu,\tilde\Sigma)$ for any semi-positive definite $\tilde\Sigma$ is equivalent to $\lim_{\epsilon\to 0} N_d(y;\mu,\tilde\Sigma+\epsilon I_d)$ as a functional on Schwartz functions of $y$. Using this equivalence and the bounded convergence theorem, equation (7) is valid for a semi-definite future covariance matrix if we define $\Sigma_w := (\Sigma^{-1}+\tilde\Sigma^\dagger)^{-1}$. Because $\tilde\Sigma^\dagger \neq 0$, $\tau$ defined by (11) takes values in $(0,1)$. Therefore, Theorem 2.5 and Corollary 2.6 hold for each semi-definite future covariance matrix $\tilde\Sigma$.
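The degenerate Normal density above can be probed numerically. The sketch below is not from the paper; the dimension, rank, and random matrices are arbitrary illustrative choices. It builds a rank-$k$ covariance $\tilde\Sigma = LL^\top$, draws $y$ from $N_d(\mu,\tilde\Sigma)$, and checks that $a_i^\top(y-\mu) = 0$ on the support while the quadratic form with the Moore-Penrose pseudo-inverse reduces to the exponent of a $k$-dimensional standard Normal.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(3)
d, k = 5, 2                                   # a rank-k future covariance in dimension d

L = rng.normal(size=(d, k))                   # Sigma~ = L L^T has rank k
Sigma_t = L @ L.T
Sigma_t_pinv = np.linalg.pinv(Sigma_t)        # Moore-Penrose pseudo-inverse

mu = rng.normal(size=d)
eps = rng.normal(size=k)
y = mu + L @ eps                              # a draw from N_d(mu, Sigma~): y - mu lies in col(L)

A = null_space(L.T)                           # columns a_1, ..., a_{d-k}: orthonormal, orthogonal to col(L)

print(A.T @ (y - mu))                         # ~ 0: the arguments of the delta factors vanish on the support

quad = (y - mu) @ Sigma_t_pinv @ (y - mu)     # equals eps^T eps on the support
dens = np.exp(-quad / 2) / ((2 * np.pi) ** (k / 2) * np.sqrt(np.linalg.det(L.T @ L)))
print(quad, eps @ eps)                        # the two quadratic forms agree
print(dens)                                   # the smooth factor of N_d(y; mu, Sigma~)
```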
3 Prior distributions depending on the future covariance

In this section, we consider prior distributions depending on the future covariance matrix. Theorem 3.2 below says that every Bayesian prediction with an adequately metrized prior dominates that based on the uniform prior. Although the assumption that priors can depend on the future covariance may seem strange, this assumption is natural when we consider the linear regression problem, as we will see in Section 4.

First, we generalize Theorem 2.4 to the case with non-identity covariances. Let $\mu$ and $z$ be vectors in $\mathbb{R}^d$ and let $\Sigma \in \mathbb{R}^{d\times d}$ be a positive definite matrix. Let $\Sigma_1$ and $\Sigma_2$ be positive definite matrices such that $\Sigma_1 \preceq \Sigma_2$. An orthogonal matrix $U$ and a diagonal matrix $\Lambda$ are given by a diagonalization of $\Sigma_1^{1/2}\Sigma_2^{-1}\Sigma_1^{1/2}$, i.e. $\Sigma_1^{1/2}\Sigma_2^{-1}\Sigma_1^{1/2} = U^\top\Lambda U$. Let $A^* := \Sigma_1^{1/2}U^\top(\Lambda^{-1}-I_d)^{1/2}$.

Proposition 3.1 If $\pi$ is a prior such that $\pi(A^*\mu)$ is a superharmonic function of $\mu$, then
$$\phi_\pi(\mu,\Sigma_1) \ge \phi_\pi(\mu,\Sigma_2) \qquad (14)$$
for any $\mu \in \mathbb{R}^d$. Inequality (14) becomes strict if $\pi$ is not a constant function.

The following theorem is a direct result of Proposition 3.1.

Theorem 3.2 If $\pi(A^*\mu)$ is a superharmonic function of $\mu$, then
$$R_{\mathrm{KL}}(\pi,\mu) \le R_{\mathrm{KL}}(\pi_I,\mu).$$
Furthermore, if $\pi$ is not a constant function, the Bayesian predictive distribution $p_\pi$ dominates the one based on the uniform prior $\pi_I$.

Note that $\pi(A^*\mu)$ can be superharmonic only if $\operatorname{rank}(\Sigma_2-\Sigma_1) \ge 3$.

Proof of Proposition 3.1 and Theorem 3.2. Assume $0 \prec \Sigma_1 \preceq \Sigma_2$ and let $\Sigma_1^{1/2}\Sigma_2^{-1}\Sigma_1^{1/2} = U^\top\Lambda U$ be a diagonalization. Then,

$$\phi_\pi(\mu,\Sigma) = \int \log\Bigl\{\int \pi(\nu)\,\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\Bigl(-\frac{(x-\nu)^\top\Sigma^{-1}(x-\nu)}{2}\Bigr)d\nu\Bigr\}\,\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp\Bigl(-\frac{(x-\mu)^\top\Sigma^{-1}(x-\mu)}{2}\Bigr)dx.$$

Let $\tilde x := U\Sigma_1^{-1/2}x$, $\tilde\mu := U\Sigma_1^{-1/2}\mu$, and $\tilde\nu := U\Sigma_1^{-1/2}\nu$. By $|\Sigma_2|^{-1/2}|\Sigma_1|^{1/2} = |\Lambda|^{1/2}$,

$$\phi_\pi(\mu,\Sigma_2) = \int \log\Bigl\{\int \pi(\Sigma_1^{1/2}U^\top\tilde\nu)\,\frac{1}{(2\pi)^{d/2}|\Lambda|^{1/2}}\exp\Bigl(-\frac{(\tilde x-\tilde\nu)^\top\Lambda^{-1}(\tilde x-\tilde\nu)}{2}\Bigr)d\tilde\nu\Bigr\}\,\frac{1}{(2\pi)^{d/2}|\Lambda|^{1/2}}\exp\Bigl(-\frac{(\tilde x-\tilde\mu)^\top\Lambda^{-1}(\tilde x-\tilde\mu)}{2}\Bigr)d\tilde x$$
$$= \phi_{\pi(\Sigma_1^{1/2}U^\top\cdot)}(\tilde\mu,\Lambda^{-1}), \qquad (15)$$

where $\pi(\Sigma_1^{1/2}U^\top\cdot)$ is a prior distribution whose density function is represented by $\pi(\Sigma_1^{1/2}U^\top\mu)$ with a prior density $\pi(\mu)$. Putting $\Sigma_2 = \Sigma_1$, we get
$$\phi_\pi(\mu,\Sigma_1) = \phi_{\pi(\Sigma_1^{1/2}U^\top\cdot)}(\tilde\mu,I_d), \qquad (16)$$
where $I_d$ is the $d$-dimensional identity matrix.

We denote each diagonal component of $\Lambda$ by $\lambda_i$. Now $0 < \lambda_i \le 1$ for each $i$ since $\Sigma_1 \preceq \Sigma_2$. Let $a_i(t) := 1 + t(\lambda_i^{-1}-1)$ and $A := \operatorname{diag}(a_i)$. Then

$$\phi_\pi(\mu,\Sigma_2) - \phi_\pi(\mu,\Sigma_1) = \phi_{\pi(\Sigma_1^{1/2}U^\top\cdot)}(\tilde\mu,\Lambda^{-1}) - \phi_{\pi(\Sigma_1^{1/2}U^\top\cdot)}(\tilde\mu,I_d)$$
$$= \int_{t=0}^{1}\sum_{i=1}^{d}\frac{\partial a_i(t)}{\partial t}\,\frac{\partial}{\partial a_i}\phi_{\pi(\Sigma_1^{1/2}U^\top\cdot)}(\tilde\mu,A)\Big|_{a_i(t)}\, dt$$
$$= \int_{t=0}^{1}\sum_{i=1}^{d}\frac{\partial\tilde a_i(t)}{\partial t}\,\frac{\partial}{\partial\tilde a_i}\phi_{\pi(A^*\cdot)}(\hat\mu,\tilde A)\Big|_{\tilde a_i(t)}\, dt$$
$$= \int_{t=0}^{1}\sum_{i=1}^{d}\frac{\partial}{\partial\tilde a_i}\phi_{\pi(A^*\cdot)}(\hat\mu,\tilde A)\Big|_{\tilde a_i(t)}\, dt,$$

where $\tilde a_i := (\lambda_i^{-1}-1)^{-1}a_i$ and $\hat\mu := (\Lambda^{-1}-I_d)^{-1/2}\tilde\mu$.

By assumption, $\pi(A^*\cdot)$ for $A^* = \Sigma_1^{1/2}U^\top(\Lambda^{-1}-I_d)^{1/2}$ is superharmonic. Now it is sufficient to prove Lemma 3.3 iii) below. $\Box$

Lemma 3.3
i) $\sum_{i=1}^{d}\frac{\partial}{\partial a_i}N(x;\mu,A) = \frac{1}{2}\Delta N(x;\mu,A)$.
ii) $\int f(x-t)\, d\mu(t)$ is a superharmonic function of $x$ if $f$ is a superharmonic function and $\mu$ is a positive measure on $\mathbb{R}^d$.
iii) $\sum_{i=1}^{d}\frac{\partial}{\partial a_i}\phi_\pi(\mu,A) \le 0$ for any $\mu \in \mathbb{R}^d$, $a_i > 0$, and $A = \operatorname{diag}(a_i)$, for each superharmonic prior $\pi$.

Proof of Lemma 3.3. Part i) follows from direct calculation. For a proof of ii), see Problem 1.7.16 of Lehmann & Casella (1998).

$$\sum_{i=1}^{d}\frac{\partial}{\partial a_i}\phi_\pi(\mu,A) = \sum_{i=1}^{d}\frac{\partial}{\partial a_i}\int\log\Bigl\{\int\pi(\nu)\,N(x;\nu,A)\,d\nu\Bigr\}\,N(x;\mu,A)\, dx$$
$$= \int\frac{\sum_{i=1}^{d}\frac{\partial}{\partial a_i}\int\pi(\nu)\,N(x;\nu,A)\, d\nu}{\int\pi(\nu)\,N(x;\nu,A)\, d\nu}\, N(x;\mu,A)\, dx + \int\log\Bigl\{\int\pi(\nu)\,N(x;\nu,A)\, d\nu\Bigr\}\sum_{i=1}^{d}\frac{\partial}{\partial a_i}N(x;\mu,A)\, dx. \qquad (17)$$

Now,
$$\sum_{i=1}^{d}\frac{\partial}{\partial a_i}\int\pi(\nu)\,N(x;\nu,A)\, d\nu = \frac{1}{2}\Delta\int\pi(\nu)\,N(x;\nu,A)\, d\nu \le 0$$