Asymptotic equivalence and sufficiency for volatility estimation under microstructure noise

Markus Reiß
Institute of Mathematics
Humboldt-Universität zu Berlin
[email protected]

January 26, 2010

arXiv:1001.3006v1 [math.ST], 18 Jan 2010

Abstract

The basic model for high-frequency data in finance is considered, where an efficient price process is observed under microstructure noise. It is shown that this nonparametric model is in Le Cam's sense asymptotically equivalent to a Gaussian shift experiment in terms of the square root of the volatility function $\sigma$. As an application, simple rate-optimal estimators of the volatility and efficient estimators of the integrated volatility are constructed.

Key words and Phrases: High-frequency data, integrated volatility, spot volatility estimation, Le Cam deficiency, equivalence of experiments, Gaussian shift.

AMS subject classification: 62G20, 62B15, 62M10, 91B84

1 Introduction

In recent years volatility estimation from high-frequency data has attracted a lot of attention in financial econometrics and statistics. Due to empirical evidence that the observed transaction prices of assets cannot follow a semi-martingale model, a prominent approach is to model the observations as the superposition of the true (or efficient) price process with some measurement error, conceived as microstructure noise. The main features are already present in the basic model of observing

$$Y_i = X_{i/n} + \varepsilon_i,\quad i=1,\dots,n, \tag{1.1}$$

with an efficient price process $X_t=\int_0^t\sigma(s)\,dB_s$, $B$ a standard Brownian motion, and $\varepsilon_i\sim N(0,\delta^2)$, all independent. The aim is to perform statistical inference on the volatility function $\sigma:[0,1]\to\mathbb{R}^+$, e.g. estimating the so-called integrated volatility $\int_0^1\sigma^2(t)\,dt$ over the trading day.
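To make model (1.1) concrete, here is a minimal simulation sketch in Python (standard library only). It assumes a deterministic volatility function, chosen hypothetically, and approximates $X$ on the grid $i/n$ by independent Gaussian increments with variance $\sigma^2((i-1)/n)/n$ (a left-point rule, not an exact stochastic integral). It also computes the realized variance $\sum_i (Y_i-Y_{i-1})^2$, whose expectation is approximately $\int_0^1\sigma^2(t)\,dt+2(n-1)\delta^2$. For large $n$ the noise term dominates, which is one way to see why naive estimators of the integrated volatility break down under microstructure noise.

```python
import math
import random


def simulate_basic_model(n, sigma, delta, seed=0):
    """Draw Y_i = X_{i/n} + eps_i, i = 1, ..., n, as in model (1.1).

    X_t = int_0^t sigma(s) dB_s is approximated on the grid i/n by independent
    Gaussian increments with variance sigma((i-1)/n)**2 / n (left-point rule);
    the noise eps_i ~ N(0, delta**2) is i.i.d. and independent of X.
    """
    rng = random.Random(seed)
    x = 0.0
    y = []
    for i in range(1, n + 1):
        x += sigma((i - 1) / n) * math.sqrt(1.0 / n) * rng.gauss(0.0, 1.0)
        y.append(x + rng.gauss(0.0, delta))
    return y


def realized_variance(y):
    """Sum of squared increments of the observed (noisy) series."""
    return sum((b - a) ** 2 for a, b in zip(y, y[1:]))


if __name__ == "__main__":
    n, delta = 10_000, 0.01
    # Hypothetical volatility; int_0^1 sigma(t)^2 dt = 1 + 0.5**2 / 2 = 1.125.
    sigma = lambda t: 1.0 + 0.5 * math.sin(2.0 * math.pi * t)
    y = simulate_basic_model(n, sigma, delta)
    print("realized variance:", realized_variance(y))
    print("integrated volatility:", 1.125)
    print("1.125 + 2(n-1)delta^2:", 1.125 + 2 * (n - 1) * delta ** 2)
```

With these hypothetical values the noise contribution $2(n-1)\delta^2\approx 2$ exceeds the integrated volatility itself.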
The mathematical foundation for the parametric formulation of this model has been laid by Gloter and Jacod (2001a), who prove the interesting result that the model is locally asymptotically normal (LAN) as $n\to\infty$, but with the unusual rate $n^{-1/4}$, while without microstructure noise the rate is $n^{-1/2}$. Starting with Zhang, Mykland, and Aït-Sahalia (2005), the nonparametric model has come into the focus of research. Mainly three different, but closely related approaches have since been proposed to estimate the integrated volatility: multi-scale estimators (Zhang 2006), realized kernels or autocovariances (Barndorff-Nielsen, Hansen, Lunde, and Shephard 2008) and preaveraging (Jacod, Li, Mykland, Podolskij, and Vetter 2009). Under various degrees of generality, in particular also for stochastic volatility, all authors provide central limit theorems with convergence rate $n^{-1/4}$ and an asymptotic variance involving the so-called quarticity $\int_0^1\sigma^4(t)\,dt$. Recently, the problem of estimating the spot volatility $\sigma^2(t)$ itself has also found some interest (Munk and Schmidt-Hieber 2009).

The aim of the present paper is to provide a thorough mathematical understanding of the basic model, to explain why statistical inference is not so canonical and to propose a simple estimator of the integrated volatility which is efficient. To this end we employ Le Cam's concept of asymptotic equivalence between experiments. In fact, our main theoretical result in Theorem 6.2 states under some regularity conditions that observing $(Y_i)$ in (1.1) is, for $n\to\infty$, asymptotically equivalent to observing the Gaussian shift experiment

$$dY_t=\sqrt{2\sigma(t)}\,dt+\delta^{1/2}n^{-1/4}\,dW_t,\quad t\in[0,1],$$

with Gaussian white noise $dW$. Not only is the large noise level $\delta^{1/2}n^{-1/4}$ apparent, but also a non-linear $\sqrt{\sigma(t)}$-form of the signal, from which optimal asymptotic variance results can be derived.
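A minimal sketch of the limiting Gaussian shift experiment, assuming an Euler-type discretisation on $m$ equal cells with the drift $\sqrt{2\sigma(t)}$ evaluated at cell midpoints (our choice for illustration, not a construction from the paper). Increments of $Y$ over a block of width $b$, divided by $b$, estimate $\sqrt{2\sigma}$ on that block with standard deviation $\delta^{1/2}n^{-1/4}b^{-1/2}$. The volatility function, the parameter values and the function names are hypothetical.

```python
import math
import random


def simulate_gaussian_shift(sigma, delta, n, m, seed=0):
    """Increments of dY_t = sqrt(2 sigma(t)) dt + delta^{1/2} n^{-1/4} dW_t on m equal cells.

    The drift is evaluated at cell midpoints (an Euler-type discretisation).
    """
    rng = random.Random(seed)
    dt = 1.0 / m
    level = math.sqrt(delta) * n ** -0.25  # noise level delta^{1/2} n^{-1/4}
    return [
        math.sqrt(2.0 * sigma((i + 0.5) * dt)) * dt
        + level * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        for i in range(m)
    ]


def block_signal(dy, blocks):
    """Estimate sqrt(2 sigma) on `blocks` equal blocks by (Y_end - Y_start) / width.

    Assumes len(dy) is a multiple of `blocks`, so each block has width 1/blocks.
    """
    per = len(dy) // blocks
    return [sum(dy[b * per:(b + 1) * per]) * blocks for b in range(blocks)]
```

With $n=10^8$ and $\delta=1$ the noise level is $10^{-2}$, whereas $n^{-1/2}$ would be $10^{-4}$; this shows how much slower the $n^{-1/4}$ rate is.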
Note that a similar form of a Gaussian shift was found to be asymptotically equivalent to nonparametric density estimation (Nussbaum 1996). Key ingredients of our asymptotic equivalence proof are the results by Grama and Nussbaum (2002) on asymptotic equivalence for generalized nonparametric regression, but ideas from Carter (2006) and Reiß (2008) also play a role. Moreover, fine bounds on Hellinger distances for Gaussian measures with different covariance operators turn out to be essential.

Roughly speaking, asymptotic equivalence means that any statistical inference procedure can be transferred from one experiment to the other such that the asymptotic risk remains the same, at least for bounded loss functions. Technically, two sequences of experiments $E^n$ and $G^n$, defined on possibly different sample spaces, but with the same parameter set, are asymptotically equivalent if the Le Cam distance $\Delta(E^n,G^n)$ tends to zero. For $E_i=(\mathcal X_i,\mathcal F_i,(P^i_\vartheta)_{\vartheta\in\Theta})$, $i=1,2$, by definition, $\Delta(E_1,E_2)=\max(\delta(E_1,E_2),\delta(E_2,E_1))$ holds in terms of the deficiency $\delta(E_1,E_2)=\inf_M\sup_{\vartheta\in\Theta}\|MP^1_\vartheta-P^2_\vartheta\|_{TV}$, where the infimum is taken over all randomisations or Markov kernels $M$ from $(\mathcal X_1,\mathcal F_1)$ to $(\mathcal X_2,\mathcal F_2)$, see e.g. Le Cam and Yang (2000) for details. In particular, $\delta(E_1,E_2)=0$ means that $E_1$ is more informative than $E_2$ in the sense that any observation in $E_2$ can be obtained from $E_1$, possibly using additional randomisations. Here, we shall always explicitly construct the transformations and randomisations and we shall then only use that $\Delta(E_1,E_2)\le\sup_{\vartheta\in\Theta}\|P^1_\vartheta-P^2_\vartheta\|_{TV}$ holds when both experiments are defined on the same sample space.

The asymptotic equivalence is deduced stepwise. In Section 2 the regression-type model (1.1) is shown to be asymptotically equivalent to a corresponding white noise model with signal $X$.
Then in Section 3, a very simple construction yields a Gaussian shift model with signal $\log(\sigma^2(\cdot)+c)$, $c>0$ some constant, which is asymptotically less informative, but only by a constant factor in the Fisher information. Inspired by this construction, we present a generalisation in Section 4 where the information loss can be made arbitrarily small (but not zero), before applying nonparametric local asymptotic theory in Section 5 to derive asymptotic equivalence with our final Gaussian shift model for shrinking local neighbourhoods of the parameters. Section 6 yields the global result, which is based on an asymptotic sufficiency result for simple independent statistics.

Extensions and restrictions are discussed in Section 7 before we use the theoretical insight to construct in Section 8 a rate-optimal estimator of the spot volatility and an efficient estimator of the integrated volatility by a locally constant approximation. Remarkably, the asymptotic variance is found to depend on the third moment $\int_0^1\sigma^3(t)\,dt$, and for non-constant $\sigma^2(\cdot)$ our estimator outperforms previous approaches applied to the basic model. Constructions needed for the proof are presented and discussed alongside the mathematical results, deferring more technical parts to the Appendix, which in Section 9.1 also contains a summary of results on white noise models, the Hellinger distance and Hilbert-Schmidt norm estimates.

2 The regression and white noise model

In the main part we shall work in the white noise setting, which is more intuitive to handle than the regression setting, which in turn is the observation model in practice. Let us define both models formally. For that we introduce the Hölder ball

$$C^\alpha(R):=\{f\in C^\alpha([0,1])\mid\|f\|_{C^\alpha}\le R\}\quad\text{with}\quad\|f\|_{C^\alpha}=\|f\|_\infty+\sup_{x\ne y}\frac{|f(x)-f(y)|}{|x-y|^\alpha}.$$

2.1 Definition. Let $E_0=E_0(n,\delta,\alpha,R,\underline\sigma^2)$ with $n\in\mathbb N$, $\delta>0$, $\alpha\in(0,1)$, $R>0$, $\underline\sigma^2>0$ be the statistical experiment generated by observing (1.1).
The volatility $\sigma^2$ belongs to the class

$$S(\alpha,R,\underline\sigma^2):=\Big\{\sigma^2\in C^\alpha(R)\;\Big|\;\min_{t\in[0,1]}\sigma^2(t)\ge\underline\sigma^2\Big\}.$$

Let $E_1=E_1(\varepsilon,\alpha,R,\underline\sigma^2)$ with $\varepsilon>0$, $\alpha\in(0,1)$, $R>0$, $\underline\sigma^2>0$ be the statistical experiment generated by observing

$$dY_t=X_t\,dt+\varepsilon\,dW_t,\quad t\in[0,1],$$

with $X_t=\int_0^t\sigma(s)\,dB_s$ as above, independent standard Brownian motions $W$ and $B$, and $\sigma^2\in S(\alpha,R,\underline\sigma^2)$.

From Brown and Low (1996) it is well known that the white noise and the Gaussian regression model are asymptotically equivalent for noise level $\varepsilon=\delta/\sqrt n\to0$ as $n\to\infty$, provided the signal is $\beta$-Hölder continuous for $\beta>1/2$. Since Brownian motion, and thus also our price process $X$, is only Hölder continuous of order $\beta<1/2$ (whatever $\alpha$ is), it is not clear whether asymptotic equivalence can hold for the experiments $E_0$ and $E_1$. Yet, this is true. Subsequently, we employ the notation $A_n\lesssim B_n$ if $A_n=O(B_n)$, and $A_n\sim B_n$ if $A_n\lesssim B_n$ as well as $B_n\lesssim A_n$, and obtain:

2.2 Theorem. For any $\alpha>0$, $\underline\sigma^2>0$ and $\delta,R>0$ the experiments $E_0$ and $E_1$ with $\varepsilon=\delta/\sqrt n$ are asymptotically equivalent; more precisely:

$$\Delta\big(E_0(n,\delta,\alpha,R,\underline\sigma^2),\,E_1(\delta/\sqrt n,\alpha,R,\underline\sigma^2)\big)\lesssim R\,\delta^{-2}n^{-\alpha}.$$

Interestingly, the asymptotic equivalence holds for any positive Hölder regularity $\alpha>0$. In particular, the volatility $\sigma^2$ could itself be a continuous semi-martingale, but such that $X$ conditionally on $\sigma^2$ remains Gaussian. As the proof in Section 9.2 of the appendix reveals, we construct the equivalence by rate-optimal approximations of the anti-derivative of $\sigma^2$, which lies in $C^{1+\alpha}$. Similar techniques have been used by Carter (2006) and Reiß (2008), but here we have to cope with the random signal, for which we need to bound the Hilbert-Schmidt norm of the respective covariance operators. Note further that the asymptotic equivalence even holds when the level $\delta$ of the microstructure noise tends to zero, provided $\delta^2n^\alpha\to\infty$ remains valid.

3 Less informative Gaussian shift experiments

From now on we shall work with the white noise observation experiment $E_1$, where the main structures are more clearly visible.
In this section we shall find simple Gaussian shift models which are asymptotically not more informative than $E_1$, but already permit rate-optimal estimation results. The whole idea is easy to grasp once we can replace the volatility $\sigma^2$ by a piecewise constant approximation on small blocks of size $h$. That this is no loss of generality is shown by the subsequent asymptotic equivalence result, proved in Section 9.3 of the appendix.

3.1 Definition. Let $E_2=E_2(\varepsilon,h,\alpha,R,\underline\sigma^2)$ be the statistical experiment generated by observing

$$dY_t=X^h_t\,dt+\varepsilon\,dW_t,\quad t\in[0,1],$$

with $X^h_t=\int_0^t\sigma(\lfloor s\rfloor_h)\,dB_s$, $\lfloor s\rfloor_h:=\lfloor s/h\rfloor h$ for $h>0$ and $h^{-1}\in\mathbb N$, and independent standard Brownian motions $W$ and $B$. The volatility $\sigma^2$ belongs to the class $S(\alpha,R,\underline\sigma^2)$.

3.2 Proposition. Assume $\alpha>1/2$ and $\underline\sigma^2>0$. Then for $\varepsilon\to0$, $h^\alpha=o(\varepsilon^{1/2})$ the experiments $E_1$ and $E_2$ are asymptotically equivalent; more precisely:

$$\Delta\big(E_1(\varepsilon,\alpha,R,\underline\sigma^2),\,E_2(\varepsilon,h,\alpha,R,\underline\sigma^2)\big)\lesssim R\,\underline\sigma^{-3/2}h^\alpha\varepsilon^{-1/2}.$$

In the sequel we always assume $h^\alpha=o(\varepsilon^{1/2})$ to hold, such that we can work equivalently with $E_2$. Recall that observing $Y$ in a white noise model is equivalent to observing $(\int e_m\,dY)_{m\ge1}$ for an orthonormal basis $(e_m)_{m\ge1}$ of $L^2([0,1])$, cf. also Subsection 9.1 below. Our first step is thus to find an orthonormal system (not a basis) which extracts as much local information on $\sigma^2$ as possible. For any $\varphi\in L^2([0,1])$ with $\|\varphi\|_{L^2}=1$ we have by partial integration

$$\begin{aligned}\int_0^1\varphi(t)\,dY_t&=\int_0^1\varphi(t)X^h_t\,dt+\varepsilon\int_0^1\varphi(t)\,dW_t\\&=\Phi(1)X^h_1-\Phi(0)X^h_0-\int_0^1\Phi(t)\sigma(\lfloor t\rfloor_h)\,dB_t+\varepsilon\int_0^1\varphi(t)\,dW_t\\&=\Big(\int_0^1\Phi^2(t)\sigma^2(\lfloor t\rfloor_h)\,dt+\varepsilon^2\Big)^{1/2}\zeta_\varphi,\end{aligned}\tag{3.1}$$

where $\Phi(t)=-\int_t^1\varphi(s)\,ds$ is the antiderivative of $\varphi$ with $\Phi(1)=0$ and $\zeta_\varphi\sim N(0,1)$. The boundary terms vanish because $\Phi(1)=0$ and $X^h_0=0$, and the last identity holds in distribution since $B$ and $W$ are independent. To ensure that $\Phi$ has support only in some interval $[kh,(k+1)h]$, we require $\varphi$ to have support in $[kh,(k+1)h]$ and to satisfy $\int\varphi(t)\,dt=0$.
The function $\varphi_k$ with $\operatorname{supp}(\varphi_k)=[kh,(k+1)h]$, $\|\varphi_k\|_{L^2}=1$, $\int\varphi_k(t)\,dt=0$ that maximizes the information load $\int\Phi_k^2(t)\,dt$ for $\sigma^2(kh)$ is given by (use Lagrange theory)

$$\varphi_k(t)=\sqrt2\,h^{-1/2}\cos\big(\pi(t-kh)/h\big)\mathbf 1_{[kh,(k+1)h]}(t),\quad t\in[0,1].\tag{3.2}$$

The $L^2$-orthonormal system $(\varphi_k)$ for $k=0,1,\dots,h^{-1}-1$ is now used to construct Gaussian shift observations. In $E_2$ we obtain from (3.1) the observations

$$y_k:=\int\varphi_k(t)\,dY_t=\big(h^2\pi^{-2}\sigma^2(kh)+\varepsilon^2\big)^{1/2}\zeta_k,\quad k=0,\dots,h^{-1}-1,\tag{3.3}$$

with independent standard normal random variables $(\zeta_k)_{k=0,\dots,h^{-1}-1}$. Since the sign of $y_k$ is independent of $|y_k|$ and its law does not depend on $\sigma^2$, observing $(y_k)$ is equivalent to observing

$$z_k:=\log\big(y_k^2h^{-2}\pi^2\big)-E[\log(\zeta_k^2)]=\log\big(\sigma^2(kh)+\varepsilon^2h^{-2}\pi^2\big)+\eta_k\tag{3.4}$$

for $k=0,\dots,h^{-1}-1$ with $\eta_k:=\log(\zeta_k^2)-E[\log(\zeta_k^2)]$.

We have found a nonparametric regression model with regression function $\log(\sigma^2(\cdot)+\varepsilon^2h^{-2}\pi^2)$ and $h^{-1}$ equidistant observations corrupted by non-Gaussian, but centered noise $(\eta_k)$ of variance $\pi^2/2$ (the variance of $\log\chi^2_1$). To ensure that the regression function does not change under the asymptotics $\varepsilon\to0$, we specify the block size $h=h(\varepsilon)=h_0\varepsilon$ with some fixed constant $h_0>0$.

It is not surprising that the nonparametric regression experiment in (3.4) is equivalent to a corresponding Gaussian shift experiment. Indeed, this follows readily from results by Grama and Nussbaum (2002), who in their Section 4.2 derive asymptotic equivalence already for our Gaussian scale model (3.3). Note, however, that their Fisher information should be $I(\vartheta)=\frac12\vartheta^{-2}$, and we thus have asymptotic equivalence of (3.3) with the Gaussian regression model

$$w_k=\tfrac{1}{\sqrt2}\log\big(\sigma^2(kh)+h_0^{-2}\pi^2\big)+\gamma_k,\quad k=0,\dots,h^{-1}-1,$$

where $\gamma_k\sim N(0,1)$ i.i.d. Since by the classical result of Brown and Low (1996) the Gaussian regression is equivalent to the corresponding white noise experiment (note that $\log(\sigma^2(\cdot)+h_0^{-2}\pi^2)$ is also $\alpha$-Hölder continuous), we have already derived an important and far-reaching result.

3.3 Theorem.
For $\alpha>1/2$ and $\underline\sigma^2>0$ the high-frequency experiment $E_1(\varepsilon,\alpha,R,\underline\sigma^2)$ is asymptotically more informative than the Gaussian shift experiment $G_1(\varepsilon,\alpha,R,\underline\sigma^2,h_0)$ of observing

$$dZ_t=\tfrac{1}{\sqrt2}\log\big(\sigma^2(t)+h_0^{-2}\pi^2\big)\,dt+h_0^{1/2}\varepsilon^{1/2}\,dW_t,\quad t\in[0,1].$$

Here $h_0>0$ is an arbitrary constant and $\sigma^2\in S(\alpha,R,\underline\sigma^2)$.

3.4 Remark. Moving the constants from the diffusion to the drift part, the experiment $G_1$ is equivalent to observing

$$d\tilde Z_t=(2h_0)^{-1/2}\log\big(\sigma^2(t)+h_0^{-2}\pi^2\big)\,dt+\varepsilon^{1/2}\,dW_t,\quad t\in[0,1].\tag{3.5}$$

The Gaussian shift experiment is nonlinear in $\sigma^2$, which is to be expected. Writing $\varepsilon=\delta/\sqrt n$ gives us the noise level $\delta^{1/2}n^{-1/4}$, which appears in all previous work on the model $E_0$.

To quantify the amount of information we have lost, let us study the LAN property of the constant parametric case $\sigma^2(t)=\sigma_0^2>0$ in $G_1$. We consider the local alternatives $\sigma_\varepsilon^2=\sigma_0^2+\varepsilon^{1/2}$, for which we obtain the Fisher information $I_{h_0}=(2h_0)^{-1}h_0^4/(\pi^2+h_0^2\sigma_0^2)^2$. Maximizing over $h_0$ yields $h_0=\sqrt3\,\pi\sigma_0^{-1}$, and the Fisher information is at most equal to

$$\sup_{h_0>0}I_{h_0}=\sigma_0^{-3}\,3^{3/2}/(32\pi)\approx0.0517\,\sigma_0^{-3}.$$

By the LAN result of Gloter and Jacod (2001a) for $E_0$, the best value is $I(\sigma_0)=\frac18\sigma_0^{-3}$, which is clearly larger. Note, however, that the relative (normalized) efficiency is already $\sqrt{3^{3/2}/(32\pi)}\big/\sqrt{1/8}\approx0.64$, which means that we attain about 64% of the precision when working with $G_1$ instead of $E_1$ or $E_0$.

4 A close sequence of simple models

In order to decrease the information loss in $G_1$, we now take into account higher frequencies in each block $[kh,(k+1)h]$. In a frequency-location notation $(j,k)$ we consider for $k=0,1,\dots,h^{-1}-1$, $j\ge1$

$$\varphi_{jk}(t)=\sqrt2\,h^{-1/2}\cos\big(j\pi(t-kh)/h\big)\mathbf 1_{[kh,(k+1)h]}(t),\quad t\in[0,1].\tag{4.1}$$

This gives the corresponding antiderivatives

$$\Phi_{jk}(t)=\frac{\sqrt2\,h^{1/2}}{\pi j}\sin\big(j\pi(t-kh)/h\big)\mathbf 1_{[kh,(k+1)h]}(t),\quad t\in[0,1].$$

Not only are the $(\varphi_{jk})$ and $(\Phi_{jk})$ localized on each block, but each of the two families of functions is also orthogonal in $L^2([0,1])$.
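The properties of the cosine system (3.2)/(4.1) used in the derivation can be checked numerically. The sketch below (Python standard library, composite midpoint rule; the function names are hypothetical) evaluates $\varphi_{jk}$ and $\Phi_{jk}$ on one block. It lets one verify $\int\varphi_{jk}^2=1$, $\int\varphi_{jk}=0$, orthogonality across frequencies, and $\int\Phi_{jk}^2=h^2/(\pi^2j^2)$, which is exactly the factor multiplying $\sigma^2(kh)$ in (3.3) for $j=1$ and in (4.2) in general.

```python
import math


def phi(j, k, h, t):
    """phi_jk(t) = sqrt(2) h^{-1/2} cos(j pi (t - kh)/h) on [kh, (k+1)h], zero elsewhere, cf. (4.1)."""
    if k * h <= t <= (k + 1) * h:
        return math.sqrt(2.0 / h) * math.cos(j * math.pi * (t - k * h) / h)
    return 0.0


def Phi(j, k, h, t):
    """Antiderivative Phi_jk(t) = sqrt(2h)/(pi j) sin(j pi (t - kh)/h) on the block, zero elsewhere."""
    if k * h <= t <= (k + 1) * h:
        return math.sqrt(2.0 * h) / (math.pi * j) * math.sin(j * math.pi * (t - k * h) / h)
    return 0.0


def integrate(f, a, b, m=20_000):
    """Composite midpoint rule for int_a^b f(t) dt with m cells."""
    w = (b - a) / m
    return w * sum(f(a + (i + 0.5) * w) for i in range(m))


if __name__ == "__main__":
    h, k = 0.1, 3
    a, b = k * h, (k + 1) * h
    for j in (1, 2, 3):
        load = integrate(lambda t: Phi(j, k, h, t) ** 2, a, b)
        print(j, load, h * h / (math.pi ** 2 * j ** 2))
```

For comparison, the zero-mean Haar function $h^{-1/2}(\mathbf 1_{\text{first half}}-\mathbf 1_{\text{second half}})$ on the same block has a tent-shaped antiderivative with $\int\Phi^2=h^2/12$, which is smaller than $h^2/\pi^2$, in line with the optimality of (3.2).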
Working again on the piecewise constant experiment $E_2$, we extract the observations

$$y_{jk}:=\int_0^1\varphi_{jk}(t)\,dY_t=\big(h^2\pi^{-2}j^{-2}\sigma^2(kh)+\varepsilon^2\big)^{1/2}\zeta_{jk},\quad j\ge1,\ k=0,\dots,h^{-1}-1,\tag{4.2}$$

with $\zeta_{jk}\sim N(0,1)$ independent over all $(j,k)$. The same transformation as before leads, for each $j\ge1$, to the regression model for $k=0,\dots,h^{-1}-1$

$$z_{jk}:=\log(y_{jk}^2)-\log(h^2\pi^{-2}j^{-2})-E[\log(\zeta_{jk}^2)]=\log\big(\sigma^2(kh)+\varepsilon^2h^{-2}\pi^2j^2\big)+\eta_{jk}.\tag{4.3}$$

Applying the asymptotic equivalence result by Grama and Nussbaum (2002) for each independent level $j$ separately, we immediately generalize Theorem 3.3.

4.1 Theorem. For $\alpha>1/2$ and $\underline\sigma^2>0$ the high-frequency experiment $E_1(\varepsilon,\alpha,R,\underline\sigma^2)$ is asymptotically more informative than the combined experiment $G_2(\varepsilon,\alpha,R,\underline\sigma^2,h_0,J)$ of independent Gaussian shifts

$$dZ^j_t=\tfrac{1}{\sqrt2}\log\big(\sigma^2(t)+h_0^{-2}\pi^2j^2\big)\,dt+h_0^{1/2}\varepsilon^{1/2}\,dW^j_t,\quad t\in[0,1],\ j=1,\dots,J,$$

with independent Brownian motions $(W^j)_{j=1,\dots,J}$ and $\sigma^2\in S(\alpha,R,\underline\sigma^2)$. The constants $h_0>0$ and $J\in\mathbb N$ are arbitrary, but fixed.

4.2 Remark. Let us again study the LAN property of the constant parametric case $\sigma^2(t)=\sigma_0^2>0$ for the local alternatives $\sigma_\varepsilon^2=\sigma_0^2+\varepsilon^{1/2}$. We obtain the Fisher information

$$I_{h_0,J}=\sum_{j=1}^J(2h_0)^{-1}h_0^4\big(\pi^2j^2+h_0^2\sigma_0^2\big)^{-2}=\sum_{j=1}^J\frac{h_0^{-1}}{2\big(\pi^2(jh_0^{-1})^2+\sigma_0^2\big)^2}.$$

In the limit $J\to\infty$ and $h_0\to\infty$ we obtain by Riemann sum approximation

$$\lim_{h_0\to\infty}\lim_{J\to\infty}I_{h_0,J}=\int_0^\infty\frac{dx}{2(\pi^2x^2+\sigma_0^2)^2}=\frac{1}{8\sigma_0^3}.$$

This is exactly the optimal Fisher information obtained by Gloter and Jacod (2001a) in this case. Note, however, that it is not at all obvious that we may let $J,h_0\to\infty$ in the asymptotic equivalence result. Moreover, in our theory the restriction $h^\alpha=o(\varepsilon^{1/2})$ is necessary, which translates into $h_0=o(\varepsilon^{(1-2\alpha)/(2\alpha)})$. Still, the positive aspect is that we can come as close as we wish to an asymptotically almost equivalent, but much simpler model.
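The Fisher-information arithmetic in Remarks 3.4 and 4.2 is easy to reproduce numerically. The following minimal sketch (Python standard library; the function names are hypothetical) evaluates $I_{h_0,J}$. With it one can confirm the maximiser $h_0=\sqrt3\,\pi/\sigma_0$ for $J=1$ with value $3^{3/2}/(32\pi)\,\sigma_0^{-3}$, the relative efficiency of about 0.64, and the Riemann-sum convergence to $1/(8\sigma_0^3)$ for large $h_0$ and $J$.

```python
import math


def fisher_info(h0, sigma0, J=1):
    """I_{h0,J} = sum_{j=1}^J (2 h0)^{-1} h0^4 (pi^2 j^2 + h0^2 sigma0^2)^{-2}, cf. Remarks 3.4 and 4.2."""
    return sum(
        h0 ** 3 / (2.0 * (math.pi ** 2 * j ** 2 + h0 ** 2 * sigma0 ** 2) ** 2)
        for j in range(1, J + 1)
    )


def efficient_info(sigma0):
    """Optimal Fisher information 1 / (8 sigma0^3) for constant volatility (Gloter and Jacod)."""
    return 1.0 / (8.0 * sigma0 ** 3)


if __name__ == "__main__":
    s0 = 1.0
    h_star = math.sqrt(3.0) * math.pi / s0  # maximiser for J = 1
    best = fisher_info(h_star, s0)
    print("sup over h0 (J=1):", best, "=", 3 ** 1.5 / (32 * math.pi))
    print("relative efficiency:", math.sqrt(best / efficient_info(s0)))
    print("h0=200, J=200000:", fisher_info(200.0, s0, J=200_000), "vs", efficient_info(s0))
```

For fixed $h_0$ the Riemann sum starts at $j=1$ and so misses half of the $x=0$ term; the gap to $1/(8\sigma_0^3)$ is therefore about $1/(4h_0\sigma_0^4)$. For example, $h_0=200$, $\sigma_0=1$ gives roughly 0.1237 instead of 0.125.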
5 Localisation

We know from standard regression theory (Stone 1982) that in the experiment $G_1$ we can estimate $\sigma^2\in C^\alpha$ in sup-norm with rate $(\varepsilon\log(\varepsilon^{-1}))^{\alpha/(2\alpha+1)}$, using that the log-function is a $C^\infty$-diffeomorphism for arguments bounded away from zero and infinity. Since $E_1$ is, for $\alpha>1/2$, asymptotically more informative than $G_1$, we can therefore localize $\sigma^2$ in a neighbourhood of some $\sigma_0^2$. Using the local coordinate $s^2$ in $\sigma^2=\sigma_0^2+v_\varepsilon s^2$ for $v_\varepsilon\to0$, we define a localized experiment, cf. Nussbaum (1996).
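The localisation step relies on a consistent pilot estimate of $\sigma^2$ obtained through the log-regression (3.4). Below is a minimal sketch built on two assumptions. First, the block coefficients $y_k$ are drawn directly from the Gaussian scale model (3.3), i.e. we work in $E_2$ and skip the stochastic integrals. Second, plain non-overlapping window averages of $z_k$ stand in for the kernel estimators of standard regression theory, and the result is then inverted through $\sigma^2=\exp(\cdot)-h_0^{-2}\pi^2$. Window size, volatility function and all names are hypothetical, and this is not the estimator constructed in Section 8. The centering constant uses $E[\log\zeta^2]=-\gamma-\log2\approx-1.2704$, with $\gamma$ the Euler-Mascheroni constant.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329
MEAN_LOG_CHI2_1 = -EULER_GAMMA - math.log(2.0)  # E[log zeta^2] for zeta ~ N(0, 1)


def block_coefficients(sigma2, eps, h0, seed=0):
    """Draw y_k from the Gaussian scale model (3.3) with block size h = h0 * eps.

    Returns the list (y_k) and the block size h.
    """
    rng = random.Random(seed)
    h = h0 * eps
    n_blocks = round(1.0 / h)
    y = [
        math.sqrt(h * h / math.pi ** 2 * sigma2(k * h) + eps * eps) * rng.gauss(0.0, 1.0)
        for k in range(n_blocks)
    ]
    return y, h


def log_regression(y, h):
    """z_k = log(y_k^2 h^{-2} pi^2) - E[log zeta^2], cf. (3.4)."""
    return [math.log(yk * yk * math.pi ** 2 / (h * h)) - MEAN_LOG_CHI2_1 for yk in y]


def pilot_sigma2(z, h0, window):
    """Average z over non-overlapping windows, then invert log(sigma^2 + h0^{-2} pi^2)."""
    c = math.pi ** 2 / h0 ** 2
    return [
        math.exp(sum(z[k:k + window]) / window) - c
        for k in range(0, len(z) - window + 1, window)
    ]
```

Each window average has standard deviation about $(\pi^2/(2\cdot\text{window}))^{1/2}$ on the log scale, so the window trades this variance against the smoothing bias, as in any nonparametric regression.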

