Statistical Science
2011, Vol. 26, No. 1, 47–48
DOI: 10.1214/11-STS345A
Main article DOI: 10.1214/10-STS345
© Institute of Mathematical Statistics, 2011
arXiv:1201.1356v1 [stat.ME] 6 Jan 2012

Discussion of “Feature Matching in Time Series Modeling” by Y. Xia and H. Tong

Bruce E. Hansen

Bruce E. Hansen is Professor, Department of Economics, University of Wisconsin, 1180 Observatory Drive, Madison, Wisconsin 53706, USA e-mail: [email protected].

This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in Statistical Science, 2011, Vol. 26, No. 1, 47–48. This reprint differs from the original in pagination and typographic detail.

1. INTRODUCTION

Xia and Tong have written a provocative and stimulating paper. Among the many topics raised in their paper, I would like in particular to endorse several of their postulates:

1. All models are wrong.
2. Observations are not error-free.
3. Estimation needs to account for the above two issues.

As described in the paper, suppose that we observe a process $\{y_t : t = 1, \ldots\}$ for which we have a model $\{x_t(\theta) : t = 1, \ldots\}$ which depends upon an unknown parameter $\theta$. Let $F_x(\theta)$ denote the joint distribution of the $x_t(\theta)$ process and $F_y$ the joint distribution of the observables. When we say that the model is wrong, we mean that there is no $\theta$ such that $F_x(\theta) = F_y$. If we think of the distribution $F_y$ as a member of a large space of potential joint distributions, then the set of joint distributions $F_x(\theta)$ constitutes a low-dimensional subspace of this larger space.

While there is no true $\theta$, we can define the pseudo-true $\theta$ as the value which makes $F_x(\theta)$ as close as possible to $F_y$. This requires specifying a distance metric between the joint distributions

$$d(\theta) = d(F_x(\theta), F_y)$$

and then we can define the best-fitting model $F_x(\theta)$ by selecting $\theta$ to minimize $d(\theta)$. The relevant question is then: what is the appropriate distance metric?

2. CATCH-ALL ESTIMATION

Xia and Tong recommend what they call a “catch-all” approach, where the distance metric is a weighted sum of squared $k$-step forecast residuals. They show that in some situations this criterion allows consistent estimation of the parameters of the true latent process. Their Theorem C requires that the latent process is deterministic, but the result might hold more broadly.

This can be illustrated in a very simple example of a latent AR(1) with additive measurement error. Suppose that the latent process is

$$x_t = \theta x_{t-1} + \varepsilon_t$$

and the observed process is

$$y_t = x_t + \eta_t,$$

where $\varepsilon_t$ and $\eta_t$ are independent white noise. In this case, it is well known that $y_t$ has an ARMA(1,1) representation

$$y_t = \theta y_{t-1} + u_t - \alpha u_{t-1}, \tag{1}$$

where $u_t$ is white noise and $0 \le \alpha < 1$.

Xia and Tong propose estimation based on $k$-step forecast errors. The $k$-step forecast equation for the observables is

$$y_{t-1+k} = \theta^k y_{t-1} + e_t(k), \tag{2}$$

where

$$e_t(k) = \sum_{j=0}^{k-1} \theta^j \left(u_{t+k-j-1} - \alpha u_{t+k-j-2}\right).$$

Xia and Tong’s estimator is based on a weighted average of squared forecast errors. For simplicity, suppose all the weight is on the $k$th forecast error. The estimator is

$$\hat\theta_{\{k\}} = \operatorname*{argmin}_{\theta} \sum_{t=1}^{T} \left(y_{t-1+k} - \theta^k y_{t-1}\right)^2$$

which has the explicit solution

$$\hat\theta_{\{k\}} = \left(\frac{\sum_{t=1}^{T} y_{t-1} y_{t-1+k}}{\sum_{t=1}^{T} y_{t-1}^2}\right)^{1/k}.$$

We calculate that as $T \to \infty$

$$\hat\theta_{\{k\}} \to_p \theta_{\{k\}} = \theta (1 - c)^{1/k},$$

where $c = \alpha\sigma_u^2 / (\theta\sigma_y^2)$, $\sigma_u^2 = E u_t^2$ and $\sigma_y^2 = E y_t^2$. Thus for any $k$, $\hat\theta_{\{k\}}$ is inconsistent as an estimator of $\theta$. But as $k$ gets large the discrepancy gets smaller, as $(1 - c)^{1/k} \to 1$ since $c < 1$. Thus as $k \to \infty$

$$\theta_{\{k\}} \to \theta. \tag{3}$$

This derivation assumed that the estimator is based on the $k$th forecast error, but it extends to the case of a weighted average.

The convergence (3) is an extension of Xia and Tong’s Theorem C. It shows that estimation by minimizing the squared $k$-step forecast residual is consistent for the parameter of the latent AR(1), as $k$ is made large.

One trouble with this approach is that the estimator is quite inefficient. We can calculate that

$$T \operatorname{var}(\hat\theta_{\{k\}}) \simeq \left(\frac{1}{k\theta^k}\right)^2 \to \infty$$

as $k \to \infty$. This means that the variance of the Xia–Tong estimator is increasing in $k$ (and unbounded). This is especially troubling since the parameters of (1) can be estimated by standard ARMA methods. The implication is that while the catch-all approach has some useful robustness properties, there is no reason to expect the estimator to be efficient.

3. MEASUREMENT ERROR AND NONPARAMETRIC IDENTIFICATION

Xia and Tong emphasize that measurement error is empirically relevant and time series methods should take it seriously. While I agree, we also need to acknowledge that measurement error raises many troubling problems. Of primary importance, I believe, is the vexing issue of nonparametric identification: whether the parameters of interest are uniquely determined by the distribution of the observables. As is known from the random sampling context, measurement error complicates identification. In general, additional information or structure is required to identify the parameters of an unobserved latent process. It is not sufficient to simply introduce a new estimator.

We can see this quite simply by examining the spectral density. Suppose as above that $x_t$ is the process of interest and the observed process is $y_t = x_t + \eta_t$ where $\eta_t$ is i.i.d. measurement error with variance $\sigma_\eta^2$. Letting $f_x(\lambda)$ and $f_y(\lambda)$ be the spectral densities of $x_t$ and $y_t$, we know that

$$f_y(\lambda) = f_x(\lambda) + \sigma_\eta^2.$$

The distribution of the observables $y_t$ identifies $f_y(\lambda)$, but $f_x(\lambda)$ is not identified from knowledge of $f_y(\lambda)$ alone. Under the realistic assumption that $\sigma_\eta^2$ is unknown, $f_x(\lambda)$ can only be identified by knowledge of the structure of $x_t$ [e.g., by knowing that $x_t$ is an AR(1) as in the example of the previous section]. But if we acknowledge that our models for $x_t$ are misspecified, we should view the true $f_x(\lambda)$ as nonparametric and hence without structure. It follows that the spectral density $f_x(\lambda)$ is not nonparametrically identified, and thus neither is the autocorrelation structure of $x_t$.

Nevertheless, some features are identified. While the spectral density is not point identified, it is interval identified. Let $\underline{f} = \min_\lambda f_y(\lambda)$. Observe that

$$f_y(\lambda) - \underline{f} \le f_x(\lambda) \le f_y(\lambda).$$

The two bounds are identified from $f_y(\lambda)$, so the spectral density $f_x(\lambda)$ of $x_t$ can be bounded within this interval. The width of the interval is $\underline{f} = \min_\lambda f_x(\lambda) + \sigma_\eta^2$, which is thus an upper bound for the measurement error variance $\sigma_\eta^2$.

What is particularly interesting is that while the level of $f_x(\lambda)$ is not identified, many of its most important features are identified, specifically, the peaks and troughs. What this means is that while full knowledge of the $x_t$ process is not possible, important features can be identified from the distribution of the observables $y_t$. Knowledge of which features are identified in the presence of measurement error and/or misspecification helps focus attention on what can be learned about unobserved processes from observational data.

ACKNOWLEDGMENTS

Research supported by the National Science Foundation.
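
The probability limit of the k-step estimator in the latent AR(1) example can be checked numerically. The following is a minimal sketch and not part of the original discussion: it simulates a latent AR(1) observed with additive measurement error and evaluates the explicit k-step solution. The function names, the Gaussian innovations, and the burn-in length are illustrative assumptions. The helper `k_step_limit` rewrites the limit θ(1 − c)^{1/k} using 1 − c = σ_x²/σ_y², with σ_x² = σ_ε²/(1 − θ²) and σ_y² = σ_x² + σ_η², which follows from matching the first autocovariance of the ARMA(1,1) representation with that of the latent process plus noise.

```python
import random


def simulate_noisy_ar1(theta, sigma_eps, sigma_eta, n, seed=0, burn_in=500):
    """Simulate y_t = x_t + eta_t with latent x_t = theta * x_{t-1} + eps_t.

    Gaussian innovations and the burn-in length are assumptions made for
    this illustration only.
    """
    rng = random.Random(seed)
    x = 0.0
    y = []
    for t in range(n + burn_in):
        x = theta * x + rng.gauss(0.0, sigma_eps)
        if t >= burn_in:
            # Only the noisy observation is recorded; x itself stays latent.
            y.append(x + rng.gauss(0.0, sigma_eta))
    return y


def k_step_estimate(y, k):
    """Explicit k-step estimator: (sum y_t y_{t+k} / sum y_t^2) ** (1/k)."""
    pairs = range(len(y) - k)
    num = sum(y[t] * y[t + k] for t in pairs)
    den = sum(y[t] ** 2 for t in pairs)
    ratio = num / den
    if ratio <= 0.0:
        # A non-positive ratio has no usable real k-th root in this sketch.
        raise ValueError("sample ratio is not positive")
    return ratio ** (1.0 / k)


def k_step_limit(theta, sigma_eps, sigma_eta, k):
    """Large-sample limit theta * (1 - c) ** (1/k), with 1 - c = var_x / var_y."""
    var_x = sigma_eps ** 2 / (1.0 - theta ** 2)
    signal_share = var_x / (var_x + sigma_eta ** 2)
    return theta * signal_share ** (1.0 / k)
```

With θ = 0.8 and unit variances, for instance, `k_step_limit` stays below 0.8 for every k and moves toward it as k grows, in line with the convergence (3); `k_step_estimate` applied to a long simulated series should track those limits.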
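
The interval identification argument for the spectral density can also be sketched in a few lines. This is an illustration under stated assumptions, not the author's code: the AR(1) form of the latent spectral density, its normalization (chosen so that white noise with variance σ² has flat density σ², matching the relation f_y = f_x + σ_η² used in the text), the frequency grid, and all function names are hypothetical choices. Given values of f_y on a grid, `spectral_bounds` returns the lower bound f_y − min f_y, the upper bound f_y, and the minimum itself, which bounds the measurement error variance from above.

```python
import math


def ar1_spectral_density(theta, sigma_eps, lam):
    """Spectral density of an AR(1) at frequency lam.

    Normalized without the 1/(2*pi) factor (an assumption), so that adding
    white noise of variance s2 shifts the density up by exactly s2.
    """
    return sigma_eps ** 2 / (1.0 - 2.0 * theta * math.cos(lam) + theta ** 2)


def spectral_bounds(f_y):
    """Return (lower, upper, f_min) for the latent density given f_y on a grid.

    lower[i] = f_y[i] - min(f_y) and upper[i] = f_y[i]; f_min = min(f_y) is
    the interval width and an upper bound for the noise variance.
    """
    f_min = min(f_y)
    lower = [v - f_min for v in f_y]
    upper = list(f_y)
    return lower, upper, f_min


def argmax(values):
    """Index of the largest entry, used to locate a spectral peak."""
    return max(range(len(values)), key=lambda i: values[i])
```

Because f_y differs from f_x only by a constant, `argmax` applied to f_y and to f_x returns the same grid point, which is the sense in which peaks and troughs are identified even though the level is not.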