An Inverse Problem Statistical Methodology Summary

H.T. Banks, M. Davidian, J.R. Samuels, Jr. and Karyn L. Sutton

Center for Research in Scientific Computation and Center for Quantitative Sciences in Biomedicine, North Carolina State University, Raleigh, NC 27695-8205

and

Department of Mathematics & Statistics, Arizona State University, Tempe, AZ 85287-1804

January 12, 2008

Abstract

We discuss statistical and computational aspects of inverse or parameter estimation problems based on Ordinary Least Squares and Generalized Least Squares with appropriate corresponding data noise assumptions of constant variance and nonconstant variance (relative error), respectively. Among the topics included here are mathematical model, statistical model and data assumptions, and some techniques (residual plots, sensitivity analysis, model comparison tests) for verifying these. The ideas are illustrated throughout with the popular logistic growth model of Verhulst and Pearl as well as with a recently developed population level model of pneumococcal disease spread.

Keywords: Inference, least squares inverse problems, parameter estimation, sensitivity and generalized sensitivity functions.

1 Introduction

In this Chapter we discuss mathematical and statistical aspects of inverse or parameter estimation problems. While we briefly discuss maximum likelihood estimators (MLE), our focus here will be on ordinary least squares (OLS) and generalized least squares (GLS) estimation formulations and issues related to use of these techniques in practice.
While we choose a general nonlinear ordinary differential equation mathematical model to discuss concepts and ideas, the discussions are also applicable to partial differential equation models and other deterministic dynamical systems. As we shall explain, the choice of an appropriate statistical model is of critical importance, and we discuss at length the difference between constant variance and nonconstant variance noise in the observation process, the consequences of incorrect choices in this regard, and computational techniques for investigating whether a good decision has been made. In particular, we illustrate the use of residual plots to suggest whether or not a correct statistical model has been specified in an inverse problem formulation. We illustrate these and other techniques with examples including the well known Verhulst-Pearl logistic population model and a specific epidemiological model (a pneumococcal disease dynamics model). We discuss the use of sensitivity equations coupled with the asymptotic theory for sampling distributions and the computation of associated covariances, standard errors and confidence intervals for the estimators of model parameters. We also discuss sensitivity functions (traditional and generalized) and their emerging use in design of experiments for data specific to models and mechanism investigation. Traditional sensitivity involves sensitivity of outputs to parameters, while the recent concept of generalized sensitivity in inverse problems pertains to sensitivity of parameters (to be estimated) to data or observations. That is, generalized sensitivity quantifies the relevance of data measurements for identification of parameters in a typical parameter estimation problem. In a final section we present and illustrate some methods for model comparison.

2 Parameter Estimation: MLE, OLS, and GLS

2.1 The Underlying Mathematical and Statistical Models

We consider inverse or parameter estimation problems in the context of a parameterized (with vector parameter $\vec{\theta}$) dynamical system or mathematical model

$$\frac{d\vec{x}}{dt}(t) = \vec{g}(t, \vec{x}(t), \vec{\theta}) \tag{1}$$

with observation process

$$\vec{y}(t) = C\vec{x}(t; \vec{\theta}). \tag{2}$$

Following usual convention (which agrees with the data usually available from experiments), we assume a discrete form of the observations in which one has $n$ longitudinal observations corresponding to

$$\vec{y}(t_j) = C\vec{x}(t_j; \vec{\theta}), \quad j = 1, \ldots, n. \tag{3}$$

In general the corresponding observations or data $\{\vec{y}_j\}$ will not be exactly $\vec{y}(t_j)$. Because of the nature of the phenomena leading to this discrepancy, we treat this uncertainty pertaining to the observations with a statistical model for the observation process.
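To fix ideas, the following sketch casts the Verhulst-Pearl logistic model mentioned in the Introduction in the form (1)-(3). The code and the parameter values are illustrative assumptions of ours (the Chapter itself prescribes neither); for the scalar logistic state the observation operator $C$ of (2) reduces to the identity.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Logistic model as an instance of (1): dx/dt = g(t, x, theta),
# with theta = (K, r, x0) (carrying capacity, growth rate, initial size).
def g(t, x, K, r):
    return r * x * (1.0 - x / K)

# Hypothetical "true" parameters theta_0 and n = 30 observation times t_1,...,t_n.
K0, r0, x0 = 17.5, 0.7, 0.1
t_obs = np.linspace(0.0, 25.0, 30)

# Solve the ODE; applying the observation operator C of (2) is trivial here
# (the state is scalar), giving the model observations (3).
sol = solve_ivp(g, (t_obs[0], t_obs[-1]), [x0], t_eval=t_obs, args=(K0, r0))
y_model = sol.y[0]          # y(t_j) = C x(t_j; theta), j = 1, ..., n
```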
2.2 Description of Statistical Model

In our discussions here we consider a statistical model of the form

$$\vec{Y}_j = \vec{f}(t_j, \vec{\theta}_0) + \vec{\epsilon}_j, \quad j = 1, \ldots, n, \tag{4}$$

where $\vec{f}(t_j, \vec{\theta}) = C\vec{x}(t_j; \vec{\theta})$, $j = 1, \ldots, n$, corresponds to the solution of the mathematical model (1) at the $j$th covariate for a particular vector of parameters $\vec{\theta} \in \mathbb{R}^p$, with $\vec{x} \in \mathbb{R}^N$, $\vec{f} \in \mathbb{R}^m$, and $C$ an $m \times N$ matrix. The term $\vec{\theta}_0$ represents the "truth" or the parameters that generate the observations $\{\vec{Y}_j\}_{j=1}^n$. (The existence of a truth parameter $\vec{\theta}_0$ is standard in statistical formulations, and this along with the assumption that the means $E[\vec{\epsilon}_j]$ are zero yields implicitly that (1) is a correct description of the process being modeled.) The terms $\vec{\epsilon}_j$ are random variables which can represent measurement error, "system fluctuations" or other phenomena that cause observations not to fall exactly on the points $\vec{f}(t_j, \vec{\theta})$ from the smooth path $\vec{f}(t, \vec{\theta})$. Since these fluctuations are unknown to the modeler, we will assume $\vec{\epsilon}_j$ is generated from a probability distribution (with mean zero throughout our discussions) that reflects the assumptions regarding these phenomena. For instance, in a statistical model for pharmacokinetics of drug in human blood samples, a natural distribution for $\vec{\epsilon} = (\epsilon_1, \ldots, \epsilon_n)^T$ might be a multivariate normal distribution. In other applications the distribution for $\vec{\epsilon}$ might be much more complicated [22].

The purpose of our presentation here is to discuss methodology related to the estimation of the true value of the parameters $\vec{\theta}_0$ from a set $\Theta$ of admissible parameters, and its dependence on what is assumed about the variance $\mathrm{var}(\vec{\epsilon}_j)$ of the error $\vec{\epsilon}_j$. We discuss two inverse problem methodologies that can be used to calculate estimates $\hat{\theta}$ for $\vec{\theta}_0$: the ordinary least-squares (OLS) and generalized least-squares (GLS) formulations, as well as the popular maximum likelihood estimate (MLE) formulation in the case one assumes the distributions of the error process $\{\vec{\epsilon}_j\}$ are known.

2.3 Known error processes: Normally distributed error

In the introduction of the statistical model we initially made no mention of the probability distribution that generates the error $\vec{\epsilon}_j$. In many situations one readily assumes that the errors $\vec{\epsilon}_j$, $j = 1, \ldots, n$, are independent and identically distributed (we make the standing assumption of independence across $j$ throughout our discussions in this Chapter). We discuss a case where one is able to make further assumptions on the error, namely that the distribution is known. In this case, maximum likelihood techniques may be used.

We discuss first one such case for a scalar observation system, i.e., $m = 1$. If, in addition, there is sufficient evidence to suspect the error is generated by a normal distribution, then we may be willing to assume $\epsilon_j \sim \mathcal{N}(0, \sigma_0^2)$, and hence $Y_j \sim \mathcal{N}(f(t_j, \vec{\theta}_0), \sigma_0^2)$. We can then obtain an expression for determining $\vec{\theta}_0$ and $\sigma_0^2$ by seeking the maximum over $(\vec{\theta}, \sigma^2) \in \Theta \times (0, \infty)$ of the likelihood function for $\epsilon_j = Y_j - f(t_j, \vec{\theta})$, which is defined by

$$L(\vec{\theta}, \sigma^2 \mid \vec{Y}) = \prod_{j=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{1}{2\sigma^2}\left[Y_j - f(t_j, \vec{\theta})\right]^2\right\}. \tag{5}$$

The resulting solutions $\theta_{MLE}$ and $\sigma^2_{MLE}$ are the maximum likelihood estimators (MLEs) for $\vec{\theta}_0$ and $\sigma_0^2$, respectively. We point out that these solutions $\theta_{MLE} = \theta^n_{MLE}(\vec{Y})$ and $\sigma^2_{MLE} = \sigma^{2,n}_{MLE}(\vec{Y})$ are random variables by virtue of the fact that $\vec{Y}$ is a random variable.
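As a concrete (and purely illustrative) instance of the statistical model (4), one can generate synthetic scalar observations by perturbing the logistic model values from the earlier sketch with i.i.d. mean-zero normal noise of constant variance; the noise level sigma0 below is an assumption of ours, not a value from the Chapter.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

sigma0 = 0.5                                        # assumed constant noise level
eps = rng.normal(0.0, sigma0, size=y_model.shape)   # eps_j ~ N(0, sigma0^2)
y_data = y_model + eps                              # a realization of Y_j in (4)
```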
The corresponding maximum likelihood estimates are obtained by maximizing (5) with $\vec{Y} = (Y_1, \ldots, Y_n)^T$ replaced by a given realization $\vec{y} = (y_1, \ldots, y_n)^T$ and will be denoted by $\hat{\theta}_{MLE} = \hat{\theta}^n_{MLE}$ and $\hat{\sigma}_{MLE} = \hat{\sigma}^n_{MLE}$, respectively. In our discussions here and below, almost every quantity of interest is dependent on $n$, the size of the set of observations or the sampling size. On occasion we will express this dependence explicitly by use of superscripts or subscripts, especially when we wish to remind the reader of this dependence. However, for notational convenience we will often suppress the notation of explicit dependence on $n$.

Maximizing (5) is equivalent to maximizing the log likelihood

$$\log L(\vec{\theta}, \sigma^2 \mid \vec{Y}) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{j=1}^n \left[Y_j - f(t_j, \vec{\theta})\right]^2. \tag{6}$$

We determine the maximum of (6) by differentiating with respect to $\vec{\theta}$ (with $\sigma^2$ fixed) and with respect to $\sigma^2$ (with $\vec{\theta}$ fixed), setting the resulting equations equal to zero and solving for $\vec{\theta}$ and $\sigma^2$. With $\sigma^2$ fixed we solve $\frac{\partial}{\partial\vec{\theta}}\log L(\vec{\theta}, \sigma^2 \mid \vec{Y}) = 0$, which is equivalent to

$$\sum_{j=1}^n \left[Y_j - f(t_j, \vec{\theta})\right]\nabla f(t_j, \vec{\theta}) = 0, \tag{7}$$

where as usual $\nabla f = \frac{\partial}{\partial\vec{\theta}} f = f_{\vec{\theta}}$. We see that solving (7) is the same as the least squares optimization

$$\theta_{MLE}(\vec{Y}) = \arg\min_{\vec{\theta}\in\Theta} J(\vec{Y}, \vec{\theta}) = \arg\min_{\vec{\theta}\in\Theta} \sum_{j=1}^n \left[Y_j - f(t_j, \vec{\theta})\right]^2. \tag{8}$$

We next fix $\vec{\theta}$ to be $\theta_{MLE}$ and solve $\frac{\partial}{\partial\sigma^2}\log L(\theta_{MLE}, \sigma^2 \mid \vec{Y}) = 0$, which yields

$$\sigma^2_{MLE}(\vec{Y}) = \frac{1}{n} J(\vec{Y}, \theta_{MLE}). \tag{9}$$

Note that we can solve for $\theta_{MLE}$ and $\sigma^2_{MLE}$ separately, a desirable feature, but one that does not arise in more complicated formulations discussed below. The second derivative test (which is omitted here) verifies that the expressions above for $\theta_{MLE}$ and $\sigma^2_{MLE}$ do indeed maximize (6).

If, however, we have a vector of observations for the $j$th covariate $t_j$, then the statistical model is reformulated as

$$\vec{Y}_j = \vec{f}(t_j, \vec{\theta}_0) + \vec{\epsilon}_j \tag{10}$$

where $\vec{f} \in \mathbb{R}^m$ and

$$V_0 = \mathrm{var}(\vec{\epsilon}_j) = \mathrm{diag}(\sigma^2_{0,1}, \ldots, \sigma^2_{0,m}) \tag{11}$$

for $j = 1, \ldots, n$. In this setting we have allowed for the possibility that the observation coordinates $Y^i_j$ may have different constant variances $\sigma^2_{0,i}$, i.e., $\sigma^2_{0,i}$ does not necessarily have to equal $\sigma^2_{0,k}$. If (again) there is sufficient evidence to claim the errors are independent and identically distributed and generated by a normal distribution, then $\vec{\epsilon}_j \sim \mathcal{N}_m(0, V_0)$. We thus can obtain the maximum likelihood estimators $\theta_{MLE}(\{\vec{Y}_j\})$ and $V_{MLE}(\{\vec{Y}_j\})$ for $\vec{\theta}_0$ and $V_0$ by determining the maximum of the log of the likelihood function for $\vec{\epsilon}_j = \vec{Y}_j - \vec{f}(t_j, \vec{\theta})$, given (up to a constant) by

$$\log L(\vec{\theta}, V \mid \{Y^1_j, \ldots, Y^m_j\}) = -\frac{n}{2}\sum_{i=1}^m \log\sigma^2_{0,i} - \frac{1}{2}\sum_{i=1}^m \frac{1}{\sigma^2_{0,i}}\sum_{j=1}^n \left[Y^i_j - f^i(t_j, \vec{\theta})\right]^2$$
$$= -\frac{n}{2}\sum_{i=1}^m \log\sigma^2_{0,i} - \frac{1}{2}\sum_{j=1}^n \left[\vec{Y}_j - \vec{f}(t_j, \vec{\theta})\right]^T V^{-1} \left[\vec{Y}_j - \vec{f}(t_j, \vec{\theta})\right].$$

Using arguments similar to those given for the scalar case, we determine the maximum likelihood estimators for $\vec{\theta}_0$ and $V_0$ to be

$$\theta_{MLE} = \arg\min_{\vec{\theta}\in\Theta} \sum_{j=1}^n \left[\vec{Y}_j - \vec{f}(t_j, \vec{\theta})\right]^T V_{MLE}^{-1} \left[\vec{Y}_j - \vec{f}(t_j, \vec{\theta})\right] \tag{12}$$

$$V_{MLE} = \mathrm{diag}\left(\frac{1}{n}\sum_{j=1}^n \left[\vec{Y}_j - \vec{f}(t_j, \theta_{MLE})\right]\left[\vec{Y}_j - \vec{f}(t_j, \theta_{MLE})\right]^T\right). \tag{13}$$

Unfortunately, this is a coupled system, which requires some care when solving numerically. We will discuss this issue further in Sections 2.4.2 and 2.4.5 below.

2.4 Unspecified Error Distributions and Asymptotic Theory

In Section 2.3 we examined the estimates of $\vec{\theta}_0$ and $V_0$ under the assumption that the error is normally distributed, independent and constant longitudinally.
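In the scalar case, (8) and (9) reduce the MLE computation to one least squares minimization followed by one explicit formula. A minimal sketch, continuing the hypothetical logistic example above (the optimizer choice and starting guess are our own assumptions):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import minimize

def f_model(t, theta):
    """f(t_j, theta): solve the logistic model (1) at the observation times.

    Tight solver tolerances keep the model evaluation smooth enough for
    optimization and for the finite differences used later.
    """
    K, r, x_init = theta
    sol = solve_ivp(lambda s, x: r * x * (1.0 - x / K),
                    (t[0], t[-1]), [x_init], t_eval=t, rtol=1e-8, atol=1e-10)
    return sol.y[0]

def J(theta, t, y):
    """The least squares cost J(y, theta) appearing in (8)."""
    return np.sum((y - f_model(t, theta)) ** 2)

# theta_MLE solves the optimization problem (8) ...
res = minimize(J, np.array([15.0, 0.5, 0.2]), args=(t_obs, y_data),
               method="Nelder-Mead")
theta_mle = res.x
# ... and sigma^2_MLE then follows in closed form from (9).
sigma2_mle = J(theta_mle, t_obs, y_data) / len(t_obs)
```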
But what if it is suspected that the error is not normally distributed, or the error distribution is unknown to the modeler beyond the assumptions on $E[\vec{Y}_j]$ embodied in the model and the assumptions made on $\mathrm{var}(\vec{\epsilon}_j)$ (as in most applications)? How should we proceed in estimating $\vec{\theta}_0$ and $\sigma_0$ (or $V_0$) in these circumstances? In this section we will review two estimation procedures for such situations: ordinary least squares (OLS) and generalized least squares (GLS).

2.4.1 Ordinary Least Squares (OLS)

The statistical model in the scalar case takes the form

$$Y_j = f(t_j, \vec{\theta}_0) + \epsilon_j \tag{14}$$

where the variance $\mathrm{var}(\epsilon_j) = \sigma_0^2$ is assumed constant in longitudinal data (note that the error's distribution is not specified). We also note that the assumption that the observation errors are uncorrelated across $j$ (i.e., time) may be a reasonable one when the observations are taken with sufficient intermittency or when the primary source of error is measurement error. If we define

$$\theta_{OLS}(\vec{Y}) = \theta^n_{OLS}(\vec{Y}) = \arg\min_{\vec{\theta}\in\Theta} \sum_{j=1}^n \left[Y_j - f(t_j, \vec{\theta})\right]^2, \tag{15}$$

then $\theta_{OLS}$ can be viewed as minimizing the distance between the data and model where all observations are treated as of equal importance. We note that minimizing in (15) corresponds to solving for $\vec{\theta}$ in

$$\sum_{j=1}^n \left[Y_j - f(t_j, \vec{\theta})\right]\nabla f(t_j, \vec{\theta}) = 0. \tag{16}$$

We point out that $\theta_{OLS}$ is a random variable ($\epsilon_j = Y_j - f(t_j, \vec{\theta})$ is a random variable); hence if $\{y_j\}_{j=1}^n$ is a realization of the random process $\{Y_j\}_{j=1}^n$, then solving

$$\hat{\theta}_{OLS} = \hat{\theta}^n_{OLS} = \arg\min_{\vec{\theta}\in\Theta} \sum_{j=1}^n \left[y_j - f(t_j, \vec{\theta})\right]^2 \tag{17}$$

provides a realization for $\theta_{OLS}$. (A remark on notation: for a random variable or estimator $\theta$ we will always denote a corresponding realization or estimate with an over hat, e.g., $\hat{\theta}$ is an estimate for $\theta$.)

Noting that

$$\sigma_0^2 = \frac{1}{n} E\left[\sum_{j=1}^n \left[Y_j - f(t_j, \vec{\theta}_0)\right]^2\right] \tag{18}$$

suggests that once we have solved for $\theta_{OLS}$ in (15), we may obtain an estimate $\hat{\sigma}^2_{OLS} = \hat{\sigma}^{2,n}_{OLS}$ for $\sigma_0^2$.

Even though the error's distribution is not specified, we can use asymptotic theory to approximate the mean and variance of the random variable $\theta_{OLS}$ [31]. As will be explained in more detail below, as $n \to \infty$, we have that

$$\theta_{OLS} = \theta^n_{OLS} \sim \mathcal{N}_p(\vec{\theta}_0, \Sigma^n_0) \approx \mathcal{N}_p\left(\vec{\theta}_0, \sigma_0^2\left[\chi^{nT}(\vec{\theta}_0)\chi^n(\vec{\theta}_0)\right]^{-1}\right) \tag{19}$$

where the sensitivity matrix $\chi(\vec{\theta}) = \chi^n(\vec{\theta}) = \{\chi^n_{jk}\}$ is defined as

$$\chi^n_{jk}(\vec{\theta}) = \frac{\partial f(t_j, \vec{\theta})}{\partial\theta_k}, \quad j = 1, \ldots, n, \quad k = 1, \ldots, p,$$

and

$$\Sigma^n_0 \equiv \sigma_0^2 [n\Omega_0]^{-1} \tag{20}$$

with

$$\Omega_0 \equiv \lim_{n\to\infty} \frac{1}{n}\chi^{nT}(\vec{\theta}_0)\chi^n(\vec{\theta}_0), \tag{21}$$

where the limit is assumed to exist; see [31]. However, $\vec{\theta}_0$ and $\sigma_0^2$ are generally unknown, so one usually will instead use the realization $\vec{y} = (y_1, \ldots, y_n)^T$ of the random process $\vec{Y}$ to obtain the estimate

$$\hat{\theta}_{OLS} = \arg\min_{\vec{\theta}\in\Theta} \sum_{j=1}^n \left[y_j - f(t_j, \vec{\theta})\right]^2 \tag{22}$$

and the bias adjusted estimate

$$\hat{\sigma}^2_{OLS} = \frac{1}{n-p}\sum_{j=1}^n \left[y_j - f(t_j, \hat{\theta})\right]^2 \tag{23}$$

to use as an approximation in (19). We note that (23) represents the estimate for $\sigma_0^2$ of (18) with the factor $\frac{1}{n}$ replaced by the factor $\frac{1}{n-p}$ (in the linear case the estimate with $\frac{1}{n}$ can be shown to be biased downward, and the same behavior can be observed in the general nonlinear case; see Chap. 12 of [31] and p. 28 of [22]). We remark that (18) is true even in the general nonlinear case (it does not rely on any asymptotic theories, although it does depend on the assumption of constant variance being correct).

Both $\hat{\theta} = \hat{\theta}_{OLS}$ and $\hat{\sigma}^2 = \hat{\sigma}^2_{OLS}$ will then be used to approximate the covariance matrix

$$\Sigma^n_0 \approx \hat{\Sigma}^n \equiv \hat{\sigma}^2\left[\chi^{nT}(\hat{\theta})\chi^n(\hat{\theta})\right]^{-1}. \tag{24}$$

We can obtain the standard errors $SE(\hat{\theta}_{OLS,k})$ (discussed in more detail in the next section) for the $k$th element of $\hat{\theta}_{OLS}$ by calculating $SE(\hat{\theta}_{OLS,k}) \approx \sqrt{\hat{\Sigma}^n_{kk}}$.
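The pieces of (22)-(24) assemble as follows; this is a sketch continuing the hypothetical logistic example, with the sensitivity matrix $\chi^n$ approximated by forward finite differences (our choice) rather than by the sensitivity equations discussed later in the Chapter:

```python
import numpy as np

def sensitivity_matrix(t, theta, h=1e-5):
    """Forward-difference approximation of chi^n_{jk} = df(t_j, theta)/dtheta_k.

    The step is scaled by each parameter's magnitude; it must stay well above
    the ODE solver tolerance used in f_model for the quotient to be reliable.
    """
    base = f_model(t, theta)
    chi = np.empty((len(t), len(theta)))
    for k in range(len(theta)):
        pert = np.array(theta, dtype=float)
        step = h * max(1.0, abs(pert[k]))
        pert[k] += step
        chi[:, k] = (f_model(t, pert) - base) / step
    return chi

n, p = len(t_obs), 3
theta_hat = theta_mle        # (22) is the same minimization as (8), so reuse it
resid = y_data - f_model(t_obs, theta_hat)
sigma2_hat = np.sum(resid ** 2) / (n - p)            # bias adjusted estimate (23)

chi = sensitivity_matrix(t_obs, theta_hat)
Sigma_hat = sigma2_hat * np.linalg.inv(chi.T @ chi)  # covariance estimate (24)
se = np.sqrt(np.diag(Sigma_hat))                     # SE(theta_hat_k) = sqrt(Sigma_kk)
```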
Also note the similarity between the MLE equations (8) and (9) and the scalar OLS equations (22) and (23). That is, under a normality assumption for the error, the MLE and OLS formulations are equivalent.

If, however, we have a vector of observations for the $j$th covariate $t_j$ and we assume the variance is still constant in longitudinal data, then the statistical model is reformulated as

$$\vec{Y}_j = \vec{f}(t_j, \vec{\theta}_0) + \vec{\epsilon}_j \tag{25}$$

where $\vec{f} \in \mathbb{R}^m$ and

$$V_0 = \mathrm{var}(\vec{\epsilon}_j) = \mathrm{diag}(\sigma^2_{0,1}, \ldots, \sigma^2_{0,m}) \tag{26}$$

for $j = 1, \ldots, n$. Just as in the MLE case, we have allowed for the possibility that the observation coordinates $Y^i_j$ may have different constant variances $\sigma^2_{0,i}$, i.e., $\sigma^2_{0,i}$ does not necessarily have to equal $\sigma^2_{0,k}$. We note that this formulation also can be used to treat the case where $V_0$ is used to simply scale the observations, i.e., $V_0 = \mathrm{diag}(v_1, \ldots, v_m)$ is known. In this case the formulation is simply a vector OLS (sometimes also called a weighted least squares (WLS)). The problem will consist of finding the minimizer

$$\theta_{OLS} = \arg\min_{\vec{\theta}\in\Theta} \sum_{j=1}^n \left[\vec{Y}_j - \vec{f}(t_j, \vec{\theta})\right]^T V_0^{-1} \left[\vec{Y}_j - \vec{f}(t_j, \vec{\theta})\right], \tag{27}$$

where the procedure weights elements of the vector $\vec{Y}_j - \vec{f}(t_j, \vec{\theta})$ according to their variability. (Some authors refer to (27) as a generalized least squares (GLS) procedure, but we will make use of this terminology in a different formulation in subsequent discussions.) Just as in the scalar OLS case, $\theta_{OLS}$ is a random variable (again because $\vec{\epsilon}_j = \vec{Y}_j - \vec{f}(t_j, \vec{\theta})$ is); hence if $\{\vec{y}_j\}_{j=1}^n$ is a realization of the random process $\{\vec{Y}_j\}_{j=1}^n$, then solving

$$\hat{\theta}_{OLS} = \arg\min_{\vec{\theta}\in\Theta} \sum_{j=1}^n \left[\vec{y}_j - \vec{f}(t_j, \vec{\theta})\right]^T V_0^{-1} \left[\vec{y}_j - \vec{f}(t_j, \vec{\theta})\right] \tag{28}$$

provides an estimate (realization) $\hat{\theta} = \hat{\theta}_{OLS}$ for $\theta_{OLS}$. By the definition of variance,

$$V_0 = \mathrm{diag}\left(\frac{1}{n}\sum_{j=1}^n E\left(\left[\vec{Y}_j - \vec{f}(t_j, \vec{\theta}_0)\right]\left[\vec{Y}_j - \vec{f}(t_j, \vec{\theta}_0)\right]^T\right)\right),$$

so an unbiased estimate of $V_0$ for the realization $\{\vec{y}_j\}_{j=1}^n$ is

$$\hat{V} = \mathrm{diag}\left(\frac{1}{n-p}\sum_{j=1}^n \left[\vec{y}_j - \vec{f}(t_j, \hat{\theta})\right]\left[\vec{y}_j - \vec{f}(t_j, \hat{\theta})\right]^T\right). \tag{29}$$

However, the estimate $\hat{\theta}$ requires the (generally unknown) matrix $V_0$, and $\hat{V}$ requires the unknown vector $\vec{\theta}_0$, so we will instead use the following expressions to calculate $\hat{\theta}$ and $\hat{V}$:

$$\vec{\theta}_0 \approx \hat{\theta} = \arg\min_{\vec{\theta}\in\Theta} \sum_{j=1}^n \left[\vec{y}_j - \vec{f}(t_j, \vec{\theta})\right]^T \hat{V}^{-1} \left[\vec{y}_j - \vec{f}(t_j, \vec{\theta})\right] \tag{30}$$

$$V_0 \approx \hat{V} = \mathrm{diag}\left(\frac{1}{n-p}\sum_{j=1}^n \left[\vec{y}_j - \vec{f}(t_j, \hat{\theta})\right]\left[\vec{y}_j - \vec{f}(t_j, \hat{\theta})\right]^T\right). \tag{31}$$

Note that the expressions for $\hat{\theta}$ and $\hat{V}$ constitute a coupled system of equations, which will require greater effort in implementing a numerical scheme.

Just as in the scalar case, we can determine the asymptotic properties of the OLS estimator (27). As $n \to \infty$, $\theta_{OLS}$ has the following asymptotic properties [22, 31]:

$$\theta_{OLS} \sim \mathcal{N}_p(\vec{\theta}_0, \Sigma^n_0), \tag{32}$$

where

$$\Sigma^n_0 \approx \left(\sum_{j=1}^n D_j^T(\vec{\theta}_0)\, V_0^{-1} D_j(\vec{\theta}_0)\right)^{-1}, \tag{33}$$

and the $m \times p$ matrix $D_j(\vec{\theta}) = D_j^n(\vec{\theta})$ is given by

$$D_j(\vec{\theta}) = \begin{pmatrix} \frac{\partial f^1(t_j,\vec{\theta})}{\partial\theta_1} & \frac{\partial f^1(t_j,\vec{\theta})}{\partial\theta_2} & \cdots & \frac{\partial f^1(t_j,\vec{\theta})}{\partial\theta_p} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f^m(t_j,\vec{\theta})}{\partial\theta_1} & \frac{\partial f^m(t_j,\vec{\theta})}{\partial\theta_2} & \cdots & \frac{\partial f^m(t_j,\vec{\theta})}{\partial\theta_p} \end{pmatrix}.$$

Since the true values of the parameters $\vec{\theta}_0$ and $V_0$ are unknown, their estimates $\hat{\theta}$ and $\hat{V}$ will be used to approximate the asymptotic properties of the least squares estimator $\theta_{OLS}$:

$$\theta_{OLS} \sim \mathcal{N}_p(\vec{\theta}_0, \Sigma^n_0) \approx \mathcal{N}_p(\hat{\theta}, \hat{\Sigma}^n) \tag{34}$$

where

$$\Sigma^n_0 \approx \hat{\Sigma}^n = \left(\sum_{j=1}^n D_j^T(\hat{\theta})\, \hat{V}^{-1} D_j(\hat{\theta})\right)^{-1}. \tag{35}$$

The standard errors can then be calculated for the $k$th element of $\hat{\theta}_{OLS}$ by $SE(\hat{\theta}_{OLS,k}) \approx \sqrt{\hat{\Sigma}_{kk}}$. Again, we point out the similarity between the MLE equations (12) and (13) and the OLS equations (30) and (31) for the vector statistical model (25).
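For the vector case, (35) assembles the covariance approximation from the per-observation Jacobians $D_j$. A sketch of that assembly (the helper name and calling convention are our own; the Chapter specifies only the formulas):

```python
import numpy as np

def covariance_estimate(D_list, V_hat):
    """Approximate Sigma^n of (35): the inverse of sum_j D_j^T V^{-1} D_j.

    D_list : the n Jacobians D_j(theta_hat), each an m x p array with
             D[i, k] = df^i(t_j, theta)/dtheta_k
    V_hat  : the m x m diagonal covariance estimate from (31)
    """
    V_inv = np.linalg.inv(V_hat)
    p = D_list[0].shape[1]
    info = np.zeros((p, p))
    for D in D_list:
        info += D.T @ V_inv @ D
    return np.linalg.inv(info)

# Standard errors then follow exactly as in the text:
# se = np.sqrt(np.diag(covariance_estimate(D_list, V_hat)))
```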
2.4.2 Numerical Implementation of the OLS Procedure

In the scalar statistical model (14), the estimates $\hat{\theta}$ and $\hat{\sigma}$ can be solved for separately (this is also true of the vector OLS in the case $V_0 = \sigma_0^2 I_m$, where $I_m$ is the $m \times m$ identity), and thus the numerical implementation is straightforward: first determine $\hat{\theta}_{OLS}$ according to (22) and then calculate $\hat{\sigma}^2_{OLS}$ according to (23). The estimates $\hat{\theta}_{OLS}$ and $\hat{V}$ in the case of the vector statistical model (25), however, require more effort since they are coupled:

$$\hat{\theta} = \arg\min_{\vec{\theta}\in\Theta} \sum_{j=1}^n \left[\vec{y}_j - \vec{f}(t_j, \vec{\theta})\right]^T \hat{V}^{-1} \left[\vec{y}_j - \vec{f}(t_j, \vec{\theta})\right] \tag{36}$$

$$\hat{V} = \mathrm{diag}\left(\frac{1}{n-p}\sum_{j=1}^n \left[\vec{y}_j - \vec{f}(t_j, \hat{\theta})\right]\left[\vec{y}_j - \vec{f}(t_j, \hat{\theta})\right]^T\right). \tag{37}$$

To solve this coupled system the following iterative process will be followed:
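The enumeration of the iterative steps is cut off in this extract. The following is therefore a minimal sketch of one standard fixed-point scheme consistent with (36) and (37), offered as an illustration rather than as the authors' exact procedure; all names, the Nelder-Mead optimizer, and the stopping rule are our own choices.

```python
import numpy as np
from scipy.optimize import minimize

def fit_vector_ols(t, Y, f_vec, theta_init, n_params, tol=1e-6, max_iter=20):
    """Alternate the two updates (36)-(37) until the parameter iterates settle.

    t          : observation times t_1, ..., t_n
    Y          : n x m array of vector observations y_j
    f_vec      : callable, f_vec(t, theta) -> n x m array of model values
    theta_init : starting guess for theta
    """
    n, m = Y.shape
    V = np.eye(m)                           # start with unweighted (V = I) OLS
    theta = np.asarray(theta_init, dtype=float)
    for _ in range(max_iter):
        V_inv = np.linalg.inv(V)
        def cost(th):                       # weighted cost of (36)
            R = Y - f_vec(t, th)
            return np.einsum('ji,ik,jk->', R, V_inv, R)
        theta_new = minimize(cost, theta, method="Nelder-Mead").x
        R = Y - f_vec(t, theta_new)         # residuals at the new estimate
        V = np.diag(np.diag(R.T @ R)) / (n - n_params)   # update (37)
        if np.linalg.norm(theta_new - theta) < tol * (1 + np.linalg.norm(theta)):
            return theta_new, V
        theta = theta_new
    return theta, V
```

Starting from $V = I_m$ makes the first pass an unweighted vector OLS; subsequent passes reweight the residuals by the current per-coordinate variance estimates.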
