Workshop on Mathematical Model Analysis and Evaluation, Sassari, Italy, 2004

Assessment of the Adequacy of Mathematical Models

Luis Orlindo Tedeschi (Email: [email protected])
Department of Animal Science, Cornell University, Ithaca, NY 14850

1. Abstract

Models are mathematical representations of mechanisms that govern natural phenomena that are not fully recognized, controlled, or understood. They are abstractions of reality. Models have become indispensable tools, via decision support systems, for policy makers and researchers, providing ways to express scientific knowledge. The usefulness of a model has to be judged by its suitability for a particular purpose. Nonetheless, model testing is often designed to demonstrate the rightness of a model, and the tests are typically presented as evidence to promote its acceptance and usability. Adequate statistical analysis is an indispensable step during all phases of model development and evaluation because a model without statistics is like a body without a soul. Therefore, in this paper we discuss and compare several techniques for mathematical model comparison. We conclude that identifying and accepting the wrongness of a model is an important step toward the development of more reliable and accurate models. Assessing the adequacy of mathematical models is only possible through the combination of several statistical analyses and proper investigation of the purposes for which the mathematical model was initially conceptualized and developed. Reliance on only a few techniques may be misleading when selecting the appropriate model for a given scenario.

Keywords: Modeling, Simulation, Evaluation, Validation, Adequacy

2. Introduction

Models are mathematical representations of mechanisms that govern natural phenomena that are not fully recognized, controlled, or understood. Mathematical modeling has become an indispensable tool, via computerized decision support systems, for policy makers and researchers to express scientific knowledge, to lead to new discoveries, and/or to challenge old dogmas. The notion that all models are wrong but some are useful (Box, 1979) is bothersome at first sight. However, understanding and accepting the wrongness and weaknesses of a model strengthens the modeling process, making it more resilient and powerful during the development, evaluation, and revision phases. Rather than ignoring the fact that a model may fail, one should incorporate the failures of a model into the learning process and exploit those principles in developing an improved, redesigned model. In systems thinking, the understanding that models are wrong, and humility about the limitations of our knowledge, are essential in creating an environment in which we can learn about the complexity of the systems in which we are embedded (Sterman, 2002). Model testing is often designed to demonstrate the rightness of a model, and the tests are typically presented as evidence to promote its acceptance and usability (Sterman, 2002). If one accepts the concept that all models are wrong, models can be evaluated rather than validated in the strict sense of establishing truthfulness. One should focus on the modeling process instead of model results per se.
The modeling process encompasses several steps, beginning with a clear statement of the objectives of the model, followed by assumptions about the model boundaries, assessment of the appropriateness of the available data, design of the model structure, evaluation of the simulations, and feedback for recommendations and redesign. Two procedures for model development are commonly used: (a.) develop conceptual ideas and interactions, and then parameterize the variables of the model with suitable data; or (b.) analyze experimental data that explain a biological phenomenon, and then combine the results in a methodological manner. The former generates a conceptual (or theoretical) model whereas the latter generates an empirical model. When developing a model, one should make the best decisions possible despite the foreseeable confines of our scientific knowledge and current modeling techniques. Adequate statistical analysis is an indispensable step during all phases of model development simply because a model without (proper) statistics is like a body without a soul. A computer program containing the statistical procedures for model evaluation presented in this review may be downloaded at http://www.cncps.cornell.edu/modeval.

3. Evaluation of Mathematical Models

The evaluation of model adequacy is an essential step of the modeling process because it indicates the level of precision and accuracy of the model predictions. This is an important phase, either to build confidence in the current model or to allow selection of alternative models. Forrester (1961) emphasizes that the validity of a mathematical model has to be judged by its suitability for a particular purpose; that is, it is a valid and sound model if it accomplishes what is expected of it. Shaeffer (1980) developed a methodological approach to evaluate models that consists of six tasks: (a.) model examination, (b.) algorithm examination, (c.) data evaluation, (d.) sensitivity analysis, (e.) validation studies, and (f.) code comparison studies. A comprehensive list of publications regarding model validation was compiled by Hamilton (1991).

3.1. Conflicts in the Modeling Terminology

Terminology and its interpretation are often debated when considering the appropriateness of mathematical models. The concept of validation or verification has been strongly criticized because it is philosophically impossible to prove that all components of models of real systems are true or correct (Oreskes et al., 1994). Since models are abstractions and representations of reality, they can never fully mimic reality under all conditions; in fact, some models may be programmed to predict quantities or relationships that cannot be measured or observed in the real system, and therefore they cannot be validated. Harrison (1991) differentiates verification from validation as follows: verification is designed to ensure that a mathematical model performs as intended, while validation examines the broader question of whether the intended structure is appropriate. Similarly, Sterman (2000) indicates that validation means "having a conclusion correctly derived from premises" while verification implies "establishment of the truth, accuracy, or reality of". Nonetheless, regardless of the definition adopted, none of these terms can support modeling validation or verification.
Hamilton (1991) proposes that validation be used to assess the extent to which a model is rational and fulfills its purposes. It comprises three tasks: (1.) verification (design, programming, and checking of the computer program), (2.) sensitivity analysis (behavior of each component of the model), and (3.) evaluation (comparison of model outcomes with real data). The word calibration is also frequently misused. Calibration means model fine-tuning or fitting; it is the estimation of values of parameters or unmeasured variables using (proper) available information from the real system. Models can substantiate an assumption or hypothesis by offering evidence that strengthens concepts that may already be fully or partly established through other means (Oreskes et al., 1994), such as experimentation or observational studies. Therefore, the terms evaluation or testing are proposed to indicate measurement of model adequacy (or robustness) based on pre-established criteria of model performance acceptance, such as functionality, accuracy, and precision for the intended purpose. Regardless of the objective of a model, a functional model is an abstraction of reality and an approximation of the real system. Therefore, the evaluation phase is used to determine whether the model is an adequate representation of the process it was designed for, rather than to establish the truth of the model in any absolute sense. Mathematical models cannot be proven valid; they can only be shown to be appropriate for their intended purposes under given conditions.

3.2. Errors in the Modeling Process

As shown in Table 1, the acceptance of a model is subject to a selection process that may lead to an erroneous decision. A type I error (rejecting an appropriate model) is likely to occur when biased or incorrect observations are chosen to evaluate the model. For instance, these observations may result from the lack of adequate adjustment for random factors or covariates. A type II error (accepting an inappropriate model) happens when biased or incorrect observations were used during the model development phase; assuming that independent incorrect observations were also used to evaluate the model, the model will reflect them. The occurrence of a type II error can be as bad as or worse than a type I error. Both types of errors can be minimized with appropriate statistical analysis to detect model appropriateness and with the use of unbiased observations during both the development and evaluation phases. Therefore, failure to prove a significant difference between observed and model-predicted values may simply be due to insufficient replication or lack of power of the applied statistical test (Mayer and Butler, 1993) rather than to the correctness of a particular model.

Table 1. Two-way decision process in model evaluation

                          Model Predictions
Decision          Correct                 Wrong
Reject            Type I Error (α)        Correct (1 − β)
Accept            Correct (1 − α)         Type II Error (β)

The thorough evaluation of a mathematical model is a continuous and relentless process. The fact that one or a series of evaluation tests indicated that the mathematical model is performing adequately implies nothing about its future predictive ability. On the contrary, as the redesign (or revision) of the mathematical model progresses, the evaluation phase becomes more crucial.

3.3. The Concepts of Accuracy and Precision
Accuracy measures how closely model-predicted values are to the true values. Precision measures how closely individual model-predicted values are to one another. In other words, accuracy is the ability of the model to predict the right values, and precision is the ability of the model to predict similar values consistently. Figure 1 illustrates the difference between accuracy and precision using the analogy of target practice. Inaccuracy, or bias, is the systematic deviation from the truth; therefore, even though the points in case 1 (Figure 1) are more sparsely distributed than in case 2 (Figure 1), both are equally inaccurate (or biased) because they are not centered on the target's bull's eye. In contrast, imprecision, or uncertainty, refers to the magnitude of the scatter about the mean; therefore, even though the points in cases 3 and 4 (Figure 1) are equally accurate, case 3 has higher imprecision than case 4 because the points in case 4 are more tightly grouped. Similarly, Figure 2 depicts the concepts illustrated in Figure 1 in numerical form.

[Figure 1. Schematization of accuracy versus precision. Case 1 is inaccurate and imprecise, case 2 is inaccurate and precise, case 3 is accurate and imprecise, and case 4 is accurate and precise.]

In Figure 2, the X-axis and Y-axis represent model-predicted and observed values, respectively. The regression estimate of the coefficient of determination (r²) is a good indicator of precision: the higher the r², the higher the precision. The regression estimates of the intercept and the slope are good indicators of accuracy: the closer they are, simultaneously, to zero and unity, respectively, the higher the accuracy. Therefore, precision and accuracy are two independent measures of the prediction error of a model: high precision does not guarantee high accuracy. The question then becomes which measure is better. It could be argued that precision is not as important as accuracy because the true mean can be detected using an imprecise method simply by averaging a large number of data points (case 3 in Figure 1). On the other hand, it could be argued that precision is more important than accuracy because the true mean is irrelevant when detecting differences among model predictions. It is easy to understand why accuracy could be argued to be the most important measure, because it reflects the ability of a model to predict true values. The judgment of model adequacy always has to be made by relative comparisons, and it is subject to the suitability of the model given its objectives. The lack of accuracy shown in case 2 (Figure 2) is also known as translation discrepancy (Gauch et al., 2003): despite the slope being equal to unity, the intercept is different from zero.

[Figure 2. Comparison of accuracy and precision measures. Case 1: y = 0.5x + 0.8333, r² = 0.1071 (inaccurate and imprecise); Case 2: y = x + 1, r² = 1 (inaccurate and precise); Case 3: y = x, r² = 0.5714 (accurate and imprecise); Case 4: y = x, r² = 1 (accurate and precise). The dotted line represents the Y = X line.]
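To make these measures concrete, the minimal sketch below (Python with NumPy; the function name and data are illustrative, not part of the original program) fits observed values on model-predicted values and reports the intercept, slope, and r², so that precision (r²) and accuracy (intercept near zero, slope near unity) can be read separately. The example data reproduce case 2 of Figure 2: a perfectly precise but biased model.

```python
import numpy as np

def precision_accuracy(predicted, observed):
    """Fit observed = a + b * predicted; report precision (r2) and accuracy (a, b)."""
    b, a = np.polyfit(predicted, observed, 1)   # np.polyfit returns [slope, intercept]
    r = np.corrcoef(predicted, observed)[0, 1]  # Pearson correlation coefficient
    return {"intercept": a, "slope": b, "r2": r ** 2}

# Case 2 of Figure 2: y = x + 1, r2 = 1 (precise but inaccurate)
pred = np.array([0.5, 1.0, 1.5, 2.0])
obs = pred + 1.0
print(precision_accuracy(pred, obs))  # r2 = 1 (high precision), intercept = 1 (bias)
```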
4. Techniques for Model Evaluation

4.1. Analysis of Linear Regression

Table 2 shows the mathematical notation used throughout this paper. Note that $X_1,\ldots,X_p$ are the input variables used in the mathematical model.

Table 2. Mathematical notations, acronyms, and variable descriptions

Notation                     Description
$Y_i$                        The ith observed (or measured) value
$\bar{Y}$                    Mean of the observed (or measured) values
$\hat{Y}_i$                  Value predicted by the statistical regression model
$f(X_1,\ldots,X_p)_i$        The ith model-predicted (or simulated) value
$\bar{f}(X_1,\ldots,X_p)$    Mean of the model-predicted (or simulated) values
$X_1,\ldots,X_p$             Input variables for the mathematical model
Deviation                    The difference between observed and model-predicted values ($Y_i - f(X_1,\ldots,X_p)_i$)
Residue (or error)           The difference between observed and regression-predicted values ($Y_i - \hat{Y}_i$)

The model-predicted values ($f(X_1,\ldots,X_p)_i$) are plotted on the X-axis while the observed values ($Y_i$) are plotted on the Y-axis, because the observed values contain natural variability whereas the model-predicted values are deterministic, with no random variation (Harrison, 1990; Mayer and Butler, 1993; Mayer et al., 1994). In this plot format, data points lying below and above the Y = X line indicate over- and under-prediction by the mathematical model, respectively. In contrast, Analla (1998) states that both variables ($Y_i$ and $f(X_1,\ldots,X_p)_i$) are random; hence, it does not matter which variable is regressed on the other. Stochastic models can be re-run many times, decreasing the error of the mean value, which in practice does not invalidate the linear regression technique (Gunst and Mason, 1980). However, as discussed above, unless the mathematical model is stochastic (or probabilistic), the model-predicted values are fixed, deterministic, and associated with no variance; consequently, $Y_i$ should be regressed on $f(X_1,\ldots,X_p)_i$.

Several quantitative approaches involving statistical analysis have been used to evaluate model adequacy. A linear regression between observed and predicted values is commonly used; the hypothesis is that the regression passes through the origin and has a slope of unity (Dent and Blackie, 1979). Nonetheless, using the least-squares method to derive a linear regression of observed on model-predicted values is of limited interest for model evaluation, since the regression-predicted value is useless in evaluating the mathematical model; the r² is therefore irrelevant, because one does not intend to make predictions from the fitted line (Mitchell, 1997). Additionally, necessary assumptions have to be considered when performing a linear regression: (a.) the X-axis values are known without error, which is true if and only if the model is deterministic and the model-predicted values are used on the X-axis; (b.) the Y-axis values are independent, random, and homoscedastic; and (c.) the residuals are independent and identically distributed $\sim N(0, \sigma^2)$. Shapiro-Wilk's W test is frequently used to check the normality assumption (Shapiro and Wilk, 1965). Equation [1] has the general format of a first-order equation representing a linear regression used for model evaluation.

$Y_i = \beta_0 + \beta_1 \times f(X_1,\ldots,X_p)_i + \varepsilon_i$   [1]

where $Y_i$ is the ith observed value, $f(X_1,\ldots,X_p)_i$ is the ith model-predicted value, $\beta_0$ and $\beta_1$ are the regression parameters for the intercept and the slope, respectively, and $\varepsilon_i$ is the ith random error (or residue), which is independent and identically distributed $\sim N(0, \sigma^2)$.
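As an illustration of checking assumption (c.), the residuals of Equation [1] can be tested for normality with Shapiro-Wilk's W; the sketch below is a minimal example (Python with NumPy and SciPy; the simulated data are hypothetical, used only to show the mechanics).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical deterministic model predictions and noisy observations
model_pred = np.linspace(1.0, 10.0, 30)
observed = 0.2 + 0.95 * model_pred + rng.normal(0.0, 0.5, size=30)

# Fit Equation [1]: observed values regressed on model-predicted values
slope, intercept = np.polyfit(model_pred, observed, 1)
residuals = observed - (intercept + slope * model_pred)

# Shapiro-Wilk W test on the residuals (assumption (c.) of the regression)
w_stat, p_value = stats.shapiro(residuals)
print(f"W = {w_stat:.4f}, p = {p_value:.4f}")  # large p: no evidence against normality
```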
The estimators of the regression parameters $\beta_0$ and $\beta_1$ (Equation [1]) are computed using the least-squares method (Neter et al., 1996), in which the sum of squared differences between the regression-predicted values ($\hat{Y}_i$) and the observed values ($Y_i$) is minimized, as shown in Equation [2]. Therefore, the estimators of the regression parameters $\beta_0$ and $\beta_1$ are the values $a$ and $b$ that minimize $Q$ (Equation [2]).

$Q = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 \times f(X_1,\ldots,X_p)_i)^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$   [2]

The values of $a$ and $b$ can be computed algebraically by a sequence of calculations. The means of the observed and model-predicted values are computed as shown in Equations [3] and [4], respectively.

$\bar{Y} = \dfrac{\sum_{i=1}^{n} Y_i}{n}$   [3]

$\bar{f}(X_1,\ldots,X_p) = \dfrac{\sum_{i=1}^{n} f(X_1,\ldots,X_p)_i}{n}$   [4]

The slope ($b$) and the intercept ($a$) of the linear regression are computed as shown in Equations [5] and [6], respectively.

$b = \dfrac{\sum_{i=1}^{n} \left[(Y_i - \bar{Y}) \times (f(X_1,\ldots,X_p)_i - \bar{f}(X_1,\ldots,X_p))\right]}{\sum_{i=1}^{n} (f(X_1,\ldots,X_p)_i - \bar{f}(X_1,\ldots,X_p))^2}$   [5]

$a = \bar{Y} - b \times \bar{f}(X_1,\ldots,X_p)$   [6]

The sample standard deviations of $Y$ and $f(X_1,\ldots,X_p)$ are computed as indicated in Equations [7] and [8], respectively.

$s_Y = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}}$   [7]

$s_{f(X_1,\ldots,X_p)} = \sqrt{\dfrac{\sum_{i=1}^{n} (f(X_1,\ldots,X_p)_i - \bar{f}(X_1,\ldots,X_p))^2}{n-1}}$   [8]

The (co)variance is computed as indicated in Equation [9].

$s_{f(X_1,\ldots,X_p)Y} = \dfrac{1}{n-1} \sum_{i=1}^{n} \left[(Y_i - \bar{Y}) \times (f(X_1,\ldots,X_p)_i - \bar{f}(X_1,\ldots,X_p))\right]$   [9]

Pearson's coefficient of correlation ($r$) is computed as indicated in Equation [10], and the coefficient of determination ($r^2$) is computed as $r \times r$. For the fitted regression, $r^2$ also equals $1 - \sum_{i=1}^{n}(Y_i - \hat{Y}_i)^2 \big/ \sum_{i=1}^{n}(Y_i - \bar{Y})^2$.

$r = \dfrac{s_{f(X_1,\ldots,X_p)Y}}{s_Y \times s_{f(X_1,\ldots,X_p)}}$   [10]

The $r$ coefficient, and its counterpart $r^2$, require careful interpretation. There are several misunderstandings surrounding this statistic, as discussed by Neter et al. (1996): (a.) a high coefficient of correlation does not indicate that useful predictions can be made by a given mathematical model, since it measures precision and not accuracy; (b.) a high coefficient of correlation does not imply that the estimated regression line is a good fit of the model prediction, since the relation can be curvilinear; and (c.) an $r$ near zero does not indicate that observed and model-predicted values are uncorrelated, because they may have a curvilinear relationship.

The unbiased estimator of the variance of the random error ($\sigma^2$) is the mean square error, also called the residual mean square or standard error of the estimate (MSE or $s_{y.x}$), which is computed using Equation [11]. Analla (1998) proposes that the proof of model validity be based on the MSE when comparing several mathematical models.

$MSE = \dfrac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n-2}$   [11]

Note that MSE can also be computed from the estimates of $r$ (Equation [10]) and $s_Y^2$ (Equation [7]), with adjustments for degrees of freedom, as shown in Equation [12].

$MSE = \dfrac{s_Y^2 \times (n-1) \times (1-r^2)}{n-2}$   [12]

The standard deviations of the intercept and slope estimates can be computed as shown in Equations [13] and [14], respectively.

$s_a = \sqrt{MSE \times \left(\dfrac{1}{n} + \dfrac{\bar{f}(X_1,\ldots,X_p)^2}{\sum_{i=1}^{n} (f(X_1,\ldots,X_p)_i - \bar{f}(X_1,\ldots,X_p))^2}\right)}$   [13]

$s_b = \sqrt{\dfrac{MSE}{\sum_{i=1}^{n} (f(X_1,\ldots,X_p)_i - \bar{f}(X_1,\ldots,X_p))^2}}$   [14]

The studentized statistics of the intercept and slope are t-distributed with (n − 2) degrees of freedom. Therefore, the confidence intervals used to check whether the intercept differs from 0 and the slope differs from 1 can be computed as $a \pm t_{(1-\alpha/2,\ n-2)} \times s_a$ and $(b-1) \pm t_{(1-\alpha/2,\ n-2)} \times s_b$, respectively.
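Equations [3] through [14] translate directly into a few array operations; the sketch below is one possible transcription (Python with NumPy and SciPy; the function and variable names are illustrative and are not taken from the downloadable program mentioned above).

```python
import numpy as np
from scipy import stats

def regression_evaluation(f_x, y, alpha=0.05):
    """One transcription of Equations [3]-[14] for observed values (y)
    regressed on model-predicted values (f_x)."""
    n = len(y)
    y_bar, f_bar = y.mean(), f_x.mean()                     # Eq. [3] and [4]
    sxx = np.sum((f_x - f_bar) ** 2)
    b = np.sum((y - y_bar) * (f_x - f_bar)) / sxx           # slope, Eq. [5]
    a = y_bar - b * f_bar                                   # intercept, Eq. [6]
    s_y = np.sqrt(np.sum((y - y_bar) ** 2) / (n - 1))       # Eq. [7]
    s_f = np.sqrt(sxx / (n - 1))                            # Eq. [8]
    cov_fy = np.sum((y - y_bar) * (f_x - f_bar)) / (n - 1)  # Eq. [9]
    r = cov_fy / (s_y * s_f)                                # Eq. [10]
    y_hat = a + b * f_x
    mse = np.sum((y - y_hat) ** 2) / (n - 2)                # Eq. [11]
    s_a = np.sqrt(mse * (1.0 / n + f_bar ** 2 / sxx))       # Eq. [13]
    s_b = np.sqrt(mse / sxx)                                # Eq. [14]
    t_crit = stats.t.ppf(1 - alpha / 2, n - 2)              # t with n - 2 df
    return {
        "a": a, "b": b, "r2": r ** 2, "MSE": mse,
        "CI_intercept": (a - t_crit * s_a, a + t_crit * s_a),  # should contain 0
        "CI_slope": (b - t_crit * s_b, b + t_crit * s_b),      # should contain 1
    }
```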
Unfortunately, the t-test for the slope has an ambiguous interpretation: the more scatter in the data points, the greater the standard error of the slope, the smaller the computed value of the test statistic and, therefore, the harder it is to reject the null hypothesis that the slope is equal to unity. Thus, the test can fail to reject the null hypothesis either because the slope really is not different from unity or because there is much scatter around the line (Harrison, 1990). Alternatively, the confidence interval should be used to investigate the range of the slope (Mitchell, 1997).

Dent and Blackie (1979) suggested a more relevant null hypothesis that simultaneously tests whether the intercept and slope coefficients differ from zero and unity, respectively. Equation [15] shows the appropriate statistic, an F-test with 2 and (n − 2) degrees of freedom. Mayer et al. (1994) showed the derivation of Equation [16], which is similar to Equation [15] but without the term (n − 2) in the numerator and the term n in the denominator. This adjustment may lead to different acceptance thresholds for Equations [15] and [16] based on the probability of the F-test.

$F_1 = \dfrac{(n-2) \times \left(n \times a^2 + 2 \times n \times \bar{f}(X_1,\ldots,X_p) \times a \times (b-1) + \sum_{i=1}^{n} \left(f(X_1,\ldots,X_p)_i^2 \times (b-1)^2\right)\right)}{2 \times n \times MSE}$   [15]

$F_2 = \dfrac{n \times a^2 + 2 \times n \times \bar{f}(X_1,\ldots,X_p) \times a \times (b-1) + \sum_{i=1}^{n} \left(f(X_1,\ldots,X_p)_i^2 \times (b-1)^2\right)}{2 \times MSE}$   [16]

where $n$ is the sample size, $a$ is the intercept of the linear regression, $b$ is the slope of the linear regression, $\bar{f}(X_1,\ldots,X_p)$ is the mean of the model-predicted values, and $f(X_1,\ldots,X_p)_i$ is the ith model-predicted value.

The F-test calculation presented in Equation [15] is valid only for deterministic models; in fact, the slope should not be expected to be unity when using stochastic (or probabilistic) models. Nonetheless, as shown by Analla (1998), the greater the MSE (Equation [11]), the more difficult it is to reject H₀ that the intercept and slope are simultaneously equal to zero and unity, respectively; a larger confidence interval is expected, which increases the tolerance of the test. Additionally, the assumption of independently distributed errors may cause problems, since many evaluation datasets are time series or autocorrelated in some other way (Harrison, 1990), theoretically resulting in expected values of the slope and the intercept of the linear regression less than one and greater than zero, respectively. Therefore, the simultaneous F-test may not perform acceptably (Mayer et al., 1994). The remedy for autocorrelation is to average subsequent pairs (or triplets) of data within the time series; unfortunately, there is a loss of power of the F-test as the number of observations decreases (Mayer et al., 1994).

Because of disagreement about the wrongness and rightness of regressing observed on model-predicted values, Kleijnen et al. (1998) proposed a sequence of analyses consisting of the regression of the difference between model-predicted and observed values on their associated sums to evaluate model adequacy. Under stringent assumptions, a model is considered valid if and only if the observed and the model-predicted values (a.) have identical means, (b.) have identical variances, and (c.) are positively correlated.
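A direct transcription of Equations [15] and [16] might look as follows (Python with NumPy and SciPy; `a`, `b`, and `mse` are the estimates produced by the regression sketch above, and the function name is illustrative).

```python
import numpy as np
from scipy import stats

def simultaneous_f_test(f_x, a, b, mse):
    """F statistics of Equations [15] and [16] for H0: intercept = 0 and
    slope = 1 simultaneously, with 2 and (n - 2) degrees of freedom."""
    n = len(f_x)
    f_bar = f_x.mean()
    core = (n * a ** 2
            + 2 * n * f_bar * a * (b - 1)
            + np.sum(f_x ** 2) * (b - 1) ** 2)
    f1 = (n - 2) * core / (2 * n * mse)   # Eq. [15] (Dent and Blackie, 1979)
    f2 = core / (2 * mse)                 # Eq. [16] (Mayer et al., 1994)
    return {"F1": f1, "p1": stats.f.sf(f1, 2, n - 2),
            "F2": f2, "p2": stats.f.sf(f2, 2, n - 2)}
```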
There are several concerns about the appropriateness of linear regression for model evaluation (Harrison, 1990; Mitchell, 1997): the assumptions are rarely satisfied; the null-hypothesis tests give ambiguous results depending on the scatter of the data; and regression lacks sensitivity, because distinguishing lines between random clouds of data points is not what is needed at this stage of model development. Moreover, regression analysis tests whether the slope is equal to unity, which can reject a reasonably good agreement when the residual errors are small.

4.1.1. Uncertainties in the Y- and X-Variates

When the mathematical model is stochastic (or probabilistic), or when an inherent variability or uncertainty in the predicted values is assumed, the least-squares method is not appropriate for estimating the parameters of the linear regression. Mayer et al. (1994) indicated that if the X-variate contains some error, the rejection rate of valid models may increase to 27% or 47%, depending on whether the errors are uncorrelated or correlated, respectively. A solution to this inadequacy is the use of Monte Carlo techniques to provide goodness-of-fit tests regardless of within-model peculiarities (Shaeffer, 1980; Waller et al., 2003). The Monte Carlo test verifies whether the observed data appear consistent with the model, in contrast to whether the model appears consistent with the observed data. Another solution is the use of least-squares regression weighted by the standard deviation of the data points. The $\chi^2$ merit function is represented in Equation [17] (Press et al., 1992). Unfortunately, an iterative process is required to minimize Equation [17] with respect to $a$ and $b$, since the $b$ parameter occurs in the denominator, which makes the resulting equation for the slope, $\partial\chi^2/\partial b = 0$, nonlinear. Lybanon (1984) has provided an algorithm to minimize Equation [17]; additional comments on the procedure have also been provided (Jefferys, 1980, 1981, 1988a, b; Reed, 1989, 1990, 1992; Squire, 1990).

$\chi^2 = \sum_{i=1}^{n} \dfrac{(Y_i - a - b \times f(X_1,\ldots,X_p)_i)^2}{\sigma^2_{Y_i} + b^2 \times \sigma^2_{f(X_1,\ldots,X_p)_i}}$   [17]

where $\sigma^2_{Y_i}$ and $\sigma^2_{f(X_1,\ldots,X_p)_i}$ are the variances of the ith data point for the observed and model-predicted values, respectively.
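In place of Lybanon's algorithm, Equation [17] can also be minimized numerically with a general-purpose optimizer; the sketch below shows one such approach (Python with NumPy and SciPy; the per-point variance arrays are assumed to be known, and the function name is illustrative).

```python
import numpy as np
from scipy.optimize import minimize

def chi_square_fit(f_x, y, var_y, var_fx):
    """Minimize the chi-square merit function of Equation [17] over (a, b);
    var_y and var_fx hold the per-point variances of the observed and
    model-predicted values, respectively."""
    def chi2(params):
        a, b = params
        resid = y - a - b * f_x
        return np.sum(resid ** 2 / (var_y + b ** 2 * var_fx))

    # Ordinary least-squares estimates serve as the starting point
    b0, a0 = np.polyfit(f_x, y, 1)
    result = minimize(chi2, x0=[a0, b0], method="Nelder-Mead")
    a_hat, b_hat = result.x
    return a_hat, b_hat, result.fun  # intercept, slope, minimized chi-square
```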
4.2. Analysis of Fitting Errors

4.2.1. Analysis of Deviations

This is an empirical evaluation method proposed by Mitchell (1997) and Mitchell and Sheehy (1997). In this case, the deviations (model-predicted minus observed values) are plotted against the observed values (or the model-predicted values, as in Mitchell, 1997), and the percentage of points lying within an acceptable range (envelope) is used as the criterion of model adequacy. The acceptable range is based on the purpose of the model; usually, the limits of the 95% confidence interval of the observed values are used as a guideline. Ultimately, the differences between observed and model-predicted values provide adequate information on how far the model fails to simulate the system (Mitchell and Sheehy, 1997).

4.2.2. Analysis of Residuals: Extreme Points

The analysis of residuals (observed minus regression-predicted values) is an important process in identifying data points that cause departures from the assumptions of the linear regression. Such data points are known as outliers or extreme points, and the study of their behavior is exceptionally important in assessing the aptness of the model and/or the measures necessary to improve its adequacy. Therefore, the analysis of residuals also plays an important role in model development and in the redesign phases.

The diagnostics for residues have been comprehensively discussed by Neter et al. (1996, Ch. 3 and 9). Departures from linearity of the regression and from constancy of the error variance are assessed by the plot of residuals versus the predictor variable or the plot of residuals versus the fitted values.

Leverage value. The detection of outliers or extreme data points is not a simple or easy task. Nonetheless, it is very important to study the outlying cases carefully and decide whether they should be retained or removed from the dataset. There are several effective statistics that assist in the detection of these misbehaving data points. Most of these statistics use the leverage value ($h_{ii}$), which is computed from the diagonal elements of the hat matrix. Equation [18] shows the special case of the calculation of the leverage values for the linear regression.

$h_{ii} = (c_1 + c_2 \times f(X_1,\ldots,X_p)_i) + (c_2 + c_3 \times f(X_1,\ldots,X_p)_i) \times f(X_1,\ldots,X_p)_i$   [18]

where the coefficients $c_1$, $c_2$, and $c_3$ are computed as:

$c_1 = \dfrac{1}{n} + \dfrac{\bar{f}(X_1,\ldots,X_p)^2}{(n-1) \times s^2_{f(X_1,\ldots,X_p)}}; \quad c_2 = \dfrac{-\bar{f}(X_1,\ldots,X_p)}{(n-1) \times s^2_{f(X_1,\ldots,X_p)}}; \quad c_3 = \dfrac{1}{(n-1) \times s^2_{f(X_1,\ldots,X_p)}}$

The leverage value has special properties that are helpful in identifying outliers. The sum of the $h_{ii}$ is always equal to the number of parameters of the linear regression, which in this case is two. Additionally, the leverage value is a measure of the distance between the $f(X_1,\ldots,X_p)_i$ value and the mean of all model-predicted values ($\bar{f}(X_1,\ldots,X_p)$). Therefore, the larger the leverage value, the farther the ith data point is from the center of all $f(X_1,\ldots,X_p)_i$ values; that is why it is called the leverage of the ith data point. As a rule of thumb, a data point may be considered an outlier when its leverage value is greater than $2 \times p/n$ (for linear regression p = 2; thus, greater than 4/n).

Studentized and semi-studentized residues. The studentized residue is another statistic that accounts for the magnitude of the standard error of each residue (Neter et al., 1996). It is computed as shown in Equation [19]. Similarly, the semi-studentized residue can be computed based only on the MSE, as shown in Equation [20].

$\text{Studentized Residue}_i = \dfrac{Y_i - \hat{Y}_i}{\sqrt{MSE \times (1 - h_{ii})}}$   [19]

$\text{Semi-Studentized Residue}_i = \dfrac{Y_i - \hat{Y}_i}{\sqrt{MSE}}$   [20]

PRESS statistic. The PRESS statistic is the measurement of the ith residue when the fitted regression is based on all of the cases except the ith one. The rationale is that if $Y_i$ is far outlying, the least-squares regression function fitted on all cases, including the ith data point, may be pulled toward $Y_i$, yielding a fitted value $\hat{Y}_i$ near $Y_i$. On the other hand, if the ith data point is excluded before the regression function is fitted, the least-squares fitted value $\hat{Y}_{i(i)}$ is not influenced by the outlying $Y_i$ data point, and the residual for the ith data point will tend to be larger and, therefore, more likely to expose the outlying observation. The difference between the actual observed value $Y_i$ and the estimated expected value $\hat{Y}_{i(i)}$ is computed as shown in Equation [21].

$PRESS_i = Y_i - \hat{Y}_{i(i)} = \dfrac{Y_i - \hat{Y}_i}{1 - h_{ii}}$   [21]
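These diagnostics reduce to a few array operations; one possible transcription is sketched below (Python with NumPy; it uses the algebraically equivalent expanded form $h_{ii} = 1/n + (f_i - \bar{f})^2 / \sum (f_i - \bar{f})^2$ of Equation [18], and the function name is illustrative).

```python
import numpy as np

def residual_diagnostics(f_x, y):
    """Leverage (Eq. [18]), studentized and semi-studentized residues
    (Eq. [19] and [20]), and PRESS statistic (Eq. [21])."""
    n = len(y)
    f_bar = f_x.mean()
    sxx = np.sum((f_x - f_bar) ** 2)
    b = np.sum((y - y.mean()) * (f_x - f_bar)) / sxx
    a = y.mean() - b * f_bar
    resid = y - (a + b * f_x)
    mse = np.sum(resid ** 2) / (n - 2)

    # Leverage: equivalent expanded form of Equation [18]; the h_ii sum to p = 2
    h = 1.0 / n + (f_x - f_bar) ** 2 / sxx
    studentized = resid / np.sqrt(mse * (1 - h))       # Eq. [19]
    semi_studentized = resid / np.sqrt(mse)            # Eq. [20]
    press = resid / (1 - h)                            # Eq. [21]

    flagged = np.where(h > 2 * 2 / n)[0]               # rule of thumb: h_ii > 2p/n
    return {"leverage": h, "studentized": studentized,
            "semi_studentized": semi_studentized,
            "PRESS": press, "flagged": flagged}
```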