Robust Estimation of Change-Point Location

Carina Gerstenberger*

arXiv:1701.02271v1 [math.ST] 9 Jan 2017

We introduce a robust estimator of the location parameter for the change-point in the mean based on the Wilcoxon statistic and establish its consistency for $L_1$ near epoch dependent processes. It is shown that the consistency rate depends on the magnitude of change. A simulation study is performed to evaluate finite sample properties of the Wilcoxon-type estimator in standard cases, as well as under heavy-tailed distributions and disturbances by outliers, and to compare it with a CUSUM-type estimator. It shows that the Wilcoxon-type estimator is equivalent to the CUSUM-type estimator in standard cases, but outperforms the CUSUM-type estimator in presence of heavy tails or outliers in the data.

KEYWORDS: Wilcoxon statistic; change-point estimator; near epoch dependence

1 Introduction

In many applications it cannot be assumed that observed data have a constant mean over time. Therefore, extensive research has been done on testing for change-points in the mean, see e.g. Giraitis et al. (1996), Csörgő and Horváth (1997), Ling (2007), and others. A number of papers deal with the problem of estimating the change-point location. Bai (1994) estimates the unknown location point for the break in the mean of a linear process by the method of least squares. Antoch et al. (1995) and Csörgő and Horváth (1997) established the consistency rates for CUSUM-type estimators for independent data, while Csörgő and Horváth (1997) considered weakly dependent variables. Horváth and Kokoszka (1997) established consistency of CUSUM-type estimators of the location of a change-point for strongly dependent variables. Kokoszka and Leipus (1998, 2000) discussed CUSUM-type estimators for dependent observations and ARCH models. In spite of numerous studies on testing for changes and estimating change-points, however, just a few procedures are robust against outliers in the data.
In a recent work Dehling et al. (2015) address the robustness problem of testing for change-points by introducing a Wilcoxon-type test which is applicable under short-range dependence (see also Dehling et al. (2013) for the long-range dependence case).

* Fakultät für Mathematik, Ruhr-Universität Bochum, 44780 Bochum, Germany. Date: January 9, 2017.

In this paper we suggest a robust Wilcoxon-type estimator for the change-point location based on the idea of Dehling et al. (2015) and applicable for $L_1$ near epoch dependent processes. The Wilcoxon change-point test statistic is defined as

$$W_n(k) = \sum_{i=1}^{k} \sum_{j=k+1}^{n} \left( 1_{\{X_i \le X_j\}} - 1/2 \right) \tag{1}$$

and counts how often an observation of the second part of the sample, $X_{k+1},\dots,X_n$, exceeds an observation of the first part, $X_1,\dots,X_k$. Assuming a change in mean happens at the time $k^*$, the absolute value of $W_n(k^*)$ is expected to be large. Hence, the Wilcoxon-type estimator for the location of the change-point,

$$\hat{k} = \min\Big\{ k : \max_{1 \le l < n} \big| W_n(l) \big| = \big| W_n(k) \big| \Big\}, \tag{2}$$

can be defined as the smallest $k$ for which the absolute value of the Wilcoxon test statistic $W_n(k)$ attains its maximum. Since the Wilcoxon test statistic is a rank-type statistic, outliers in the observed data cannot affect the test statistic significantly. In contrast, the CUSUM-type test statistic

$$C_n(k) = \frac{1}{k} \sum_{i=1}^{k} X_i - \frac{1}{n} \sum_{i=1}^{n} X_i,$$

which compares the sample mean of the first $k$ observations with the sample mean over all observations, can be significantly disturbed by a single outlier.

The outline of the paper is as follows. In Section 2 we discuss the consistency and the rates of the estimator $\hat{k}$ in (2). Section 3 contains the simulation study. Section 4 provides useful properties of the Wilcoxon test statistic and the proof of the main result. Sections 5 and 6 contain some auxiliary results.
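As a minimal illustration of definitions (1) and (2), the following Python sketch computes $W_n(k)$ for $k = 1,\dots,n-1$ by direct summation and returns the smallest maximiser of $|W_n(k)|$. The function names `wilcoxon_stat` and `wilcoxon_changepoint` and the toy data are hypothetical, NumPy is assumed, and the direct evaluation is chosen for transparency rather than speed.

```python
import numpy as np


def wilcoxon_stat(x):
    """Return [W_n(1), ..., W_n(n-1)] from definition (1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # a[i, j] = 1{x_i <= x_j} - 1/2
    a = (x[:, None] <= x[None, :]).astype(float) - 0.5
    # W_n(k) sums a[i, j] over i <= k < j (1-based),
    # i.e. rows 0..k-1 and columns k..n-1 in 0-based indexing
    return np.array([a[:k, k:].sum() for k in range(1, n)])


def wilcoxon_changepoint(x):
    """Return k_hat from (2): the smallest k maximising |W_n(k)|."""
    w = np.abs(wilcoxon_stat(x))
    # np.argmax returns the first maximiser; +1 converts to 1-based k
    return int(np.argmax(w)) + 1


# Toy data (made up for illustration): level shift after the 4th observation
x = [0.0, 1.0, 0.5, 1.5, 10.0, 11.0, 10.5, 11.5]
k_hat = wilcoxon_changepoint(x)
```

For the toy series every observation after the shift exceeds every observation before it, so $|W_n(k)|$ is largest at $k = 4$, where all $16$ pairs contribute $1/2$.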
2 Definitions, assumptions and main results

Assume the random variables $X_1,\dots,X_n$ follow the change-point model

$$X_i = \begin{cases} Y_i + \mu, & 1 \le i \le k^*, \\ Y_i + \mu + \Delta_n, & k^* < i \le n, \end{cases} \tag{3}$$

where the process $(Y_j)$ is a stationary zero mean short-range dependent process, $k^*$ denotes the location of the unknown change-point and $\mu$ and $\mu + \Delta_n$ are the unknown means. We assume that $Y_1$ has a continuous distribution function $F$ with bounded second derivative and that the distribution functions of $Y_1 - Y_k$, $k \ge 1$, satisfy

$$P(x \le Y_1 - Y_k \le y) \le C |y - x|, \tag{4}$$

for all $0 \le x \le y \le 1$, where $C$ does not depend on $k$ and $x, y$. We allow the magnitude of the change $\Delta_n$ to vary with the sample size $n$.

Assumption 2.1.
a) The change-point $k^* = [n\theta]$, $0 < \theta < 1$, is proportional to the sample size $n$.
b) The magnitude of change $\Delta_n$ depends on the sample size $n$, and is such that

$$\Delta_n \to 0, \quad n\Delta_n^2 \to \infty, \quad n \to \infty. \tag{5}$$

Next we specify the assumptions on the underlying process $(Y_j)$. The following definition introduces the concept of an absolutely regular process, which is also known as $\beta$-mixing.

Definition 2.1. A stationary process $(Z_j)_{j \in \mathbb{Z}}$ is called absolutely regular if

$$\beta_k = \sup_{n \ge 1} E \sup_{A \in \mathcal{F}_1^n} \big| P\big(A \mid \mathcal{F}_{n+k}^{\infty}\big) - P(A) \big| \to 0$$

as $k \to \infty$, where $\mathcal{F}_a^b$ is the $\sigma$-field generated by the random variables $Z_a,\dots,Z_b$.

The coefficients $\beta_k$ are called mixing coefficients. For further information about mixing conditions see Bradley (2002). The concept of absolute regularity covers a wide range of processes. However, important processes like linear processes or AR processes might not be absolutely regular. To overcome this restriction, in this paper we discuss functionals of absolutely regular processes, i.e. instead of focusing on the absolutely regular process $(Z_j)$ itself, we consider the process $(Y_j)$ with $Y_j = f(Z_j, Z_{j-1}, Z_{j-2}, \dots)$, where $f : \mathbb{R}^{\mathbb{Z}} \to \mathbb{R}$ is a measurable function. The following near epoch dependence condition ensures that $Y_j$ mainly depends on the near past of $(Z_j)$.

Definition 2.2.
We say that a stationary process $(Y_j)$ is $L_1$ near epoch dependent ($L_1$ NED) on a stationary process $(Z_j)$ with approximation constants $a_k$, $k \ge 0$, if the conditional expectations $E(Y_1 \mid \mathcal{G}_{-k}^{k})$, where $\mathcal{G}_{-k}^{k}$ is the $\sigma$-field generated by $Z_{-k},\dots,Z_k$, have the property

$$E\big| Y_1 - E(Y_1 \mid \mathcal{G}_{-k}^{k}) \big| \le a_k, \quad k = 0, 1, 2, \dots$$

and $a_k \to 0$, $k \to \infty$.

Note that $L_1$ NED is a special case of the more general $L_r$ near epoch dependence, where the approximation constants are defined using the $L_r$ norm: $E\big| Y_1 - E(Y_1 \mid \mathcal{G}_{-k}^{k}) \big|^r \le a_k$, $r \ge 1$. $L_r$ NED processes are also called $r$-approximating functionals. In the testing problems considered in this paper we allow for heavy-tailed distributions. Hence, we deal with $L_1$ near epoch dependence, which assumes existence of only the first moment $E|Y_1|$. The concept of near epoch dependence is applicable e.g. to GARCH(1,1) processes, see Hansen (1991), and linear processes, see Example 2.1 below. Borovkova et al. (2001) provide additional examples and information about properties of $L_r$ near epoch dependent processes.

Example 2.1. Let $(Y_j)$ be a linear process, i.e. $Y_t = \sum_{j=0}^{\infty} \psi_j Z_{t-j}$, where $(Z_j)$ is a white-noise process and the coefficients $\psi_j$, $j \ge 0$, are absolutely summable. Since $(Z_j)$ is stationary and $Z_{t-j}$ is $\mathcal{G}_{-k}^{k}$ measurable for $|t-j| \le k$, we get

$$\begin{aligned} E\big| Y_t - E(Y_t \mid \mathcal{G}_{-k}^{k}) \big| &\le \sum_{j=k+1}^{\infty} |\psi_j| \, E\big| Z_{t-j} - E(Z_{t-j} \mid \mathcal{G}_{-k}^{k}) \big| \\ &\le 2 \sum_{j=k+1}^{\infty} |\psi_j| \, E|Z_{t-j}| = 2 E|Z_1| \sum_{j=k+1}^{\infty} |\psi_j|. \end{aligned}$$

Thus, the linear process $(Y_j)$ is $L_1$ NED on $(Z_j)$ with approximation constants $a_k = 2 E|Z_1| \sum_{j=k+1}^{\infty} |\psi_j|$.

We will assume that the process $(Y_j)$ in (3) is $L_1$ near epoch dependent on some absolutely regular process $(Z_j)$. In addition, we impose the following condition on the decay of the mixing coefficients $\beta_k$ and approximation constants $a_k$:

$$\sum_{k=1}^{\infty} k^2 \big( \beta_k + \sqrt{a_k} \big) < \infty. \tag{6}$$

The next theorem states the rates of consistency of the Wilcoxon-type change-point estimator $\hat{k}$ given in (2) and of the estimator $\hat{\theta} = \hat{k}/n$ of the true location parameter $\theta$ for the change-point $k^* = [n\theta]$.

Theorem 2.1. Let $X_1,\dots,X_n$ follow the change-point model (3) and let Assumption 2.1 be satisfied. Assume that $(Y_j)$ is a stationary zero mean $L_1$ near epoch dependent process on some absolutely regular process $(Z_j)$ and that (6) holds. Then,

$$\big| \hat{k} - k^* \big| = O_P\Big( \frac{1}{\Delta_n^2} \Big), \tag{7}$$

and

$$\big| \hat{\theta} - \theta \big| = O_P\Big( \frac{1}{n\Delta_n^2} \Big). \tag{8}$$

The rate of consistency of $\hat{\theta}$ in (8) is given by $n\Delta_n^2$. The assumption $n\Delta_n^2 \to \infty$ in (5) implies $\hat{k} - k^* = o_P(k^*)$ and yields consistency of the estimator: $\hat{\theta} \to_p \theta$. In particular, for $\Delta_n \ge n^{-1/2+\epsilon}$, $\epsilon > 0$, the rate of consistency in (8) is $n^{2\epsilon}$: $\big| \hat{\theta} - \theta \big| = O_P\big( n^{-2\epsilon} \big)$. The same consistency rate $n^{2\epsilon}$ for the CUSUM-type change-point location estimator $\tilde{\theta}_C = \tilde{k}_C/n$, given by

$$\tilde{k}_C = \min\Big\{ k : \max_{1 \le i \le n} \Big| \sum_{j=1}^{i} X_j - \frac{i}{n} \sum_{j=1}^{n} X_j \Big| = \Big| \sum_{j=1}^{k} X_j - \frac{k}{n} \sum_{j=1}^{n} X_j \Big| \Big\}, \tag{9}$$

was established by Antoch et al. (1995) for independent data and by Csörgő and Horváth (1997) for weakly dependent data.

3 Simulation results

In this simulation study we compare the finite sample properties of the Wilcoxon-type change-point estimator $\hat{k}$, given in (2), with the CUSUM-type estimator $\tilde{k}_C$, given in (9). We refer to the Wilcoxon-type change-point estimator by W and to the CUSUM-type estimator by C.

We generate the sample of random variables $X_1,\dots,X_n$ using the model

$$X_i = \begin{cases} Y_i + \mu, & 1 \le i \le k^*, \\ Y_i + \mu + \Delta, & k^* < i \le n, \end{cases} \tag{10}$$

where $Y_i = \rho Y_{i-1} + \epsilon_i$ is an AR(1) process. In our simulations we consider $\rho = 0.4$, which yields a moderate positive autocorrelation in $X_i$. The innovations $\epsilon_i$ are generated from a standard normal distribution and from a Student's t-distribution with 1 degree of freedom. We consider the time of change $k^* = [n\theta]$, $\theta = 0.25, 0.5, 0.75$, the magnitude of change $\Delta = 0.5, 1, 2$ and the sample sizes $n = 50, 100, 200, 500$. All simulation results are based on 10,000 replications. Note that we report estimation results not for $\hat{k}$ and $\tilde{k}_C$, but for $\hat{\theta} = \hat{k}/n$ and $\tilde{\theta}_C = \tilde{k}_C/n$.

Figure 1 contains the histograms based on the sample of 10,000 values of the Wilcoxon-type estimator $\hat{\theta}$ and the CUSUM-type estimator $\tilde{\theta}_C$, for the model (10) with $\Delta = 1$, $\theta = 0.5$, $n = 50$ and independent standard normal innovations $\epsilon_i$. Both estimation methods give very similar histograms.

Table 1 reports the sample mean and the sample standard deviation based on 10,000 values of $\hat{\theta}$ and $\tilde{\theta}_C$ for other choices of the parameters $\Delta$ and $\theta$. It shows that the performance of both estimators improves when the sample size $n$ and the magnitude of change $\Delta$ increase, and when the change happens in the middle of the sample. In general, the Wilcoxon-type estimator performs as well as the CUSUM-type estimator in all experiments.

Figure 2 shows the histograms based on 10,000 values of $\hat{\theta}$ and $\tilde{\theta}_C$, for the model (10) with $t_1$-distributed heavy-tailed iid innovations $\epsilon_i$, $\Delta = 1$, $\theta = 0.5$ and $n = 500$. For heavy-tailed innovations $\epsilon_i$, both estimators deviate from the true value of the parameter $\theta$ more strongly than under normal innovations. Nevertheless, the Wilcoxon-type estimator seems to outperform the CUSUM-type estimator.

Figure 3 shows the histograms based on 10,000 values of $\hat{\theta}$ and $\tilde{\theta}_C$ when the data $X_1,\dots,X_n$ are generated by (10) with $\Delta = 1$, $\theta = 0.5$, $n = 200$ and $\epsilon_i \sim NIID(0,1)$ and contain outliers. The outliers are introduced by multiplying the observations $X_{[0.2n]}$, $X_{[0.3n]}$, $X_{[0.6n]}$ and $X_{[0.8n]}$ by the constant $M = 50$. The histogram shows that the Wilcoxon-type estimator is barely affected by the outliers, whereas the CUSUM-type estimator suffers large distortions.

Table 2 reports the sample mean and the sample standard deviation based on 10,000 values of $\hat{\theta}$ and $\tilde{\theta}_C$ for $\Delta = 1$ and $\theta = 0.5$ for sample sizes $n = 50, 100, 200, 500$ in the case of the normal, normal with outliers and $t_1$-distributed innovations. Figures 1, 2 and 3 present results for $n = 50$, $500$ and $200$, respectively.

In general, we conclude that the Wilcoxon-type change-point location estimator performs as well as the CUSUM-type change-point estimator in standard situations, but outperforms the CUSUM-type estimator in presence of heavy tails and outliers.

(Histogram panels, vertical axis: Frequency. (a) CUSUM $\tilde{\theta}_C$; (b) Wilcoxon $\hat{\theta}$.)
Figure 1: Histogram based on 10,000 values for the Wilcoxon-type estimator $\hat{\theta}$ and the CUSUM-type estimator $\tilde{\theta}_C$. $X_i$ follows the model (10) with $\Delta = 1$, $\theta = 0.5$, $n = 50$ and normal innovations $\epsilon_i \sim NIID(0,1)$.
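The simulation design described in this section could be reproduced along the following lines. This is only a sketch under stated assumptions: the function names, the choice $\mu = 0$, the burn-in of 100 steps for the AR(1) recursion and the seed are not taken from the paper, and a small number of replications is used instead of 10,000 to keep the run short. The CUSUM-type estimator follows (9).

```python
import numpy as np


def generate_sample(n, theta, delta, rho=0.4, innovations="normal",
                    outliers=False, rng=None, burn_in=100):
    """Draw X_1..X_n from model (10); mu = 0 and burn_in are assumptions."""
    rng = np.random.default_rng() if rng is None else rng
    m = n + burn_in
    if innovations == "normal":
        eps = rng.standard_normal(m)
    else:  # Student's t-distribution with 1 degree of freedom
        eps = rng.standard_t(1, size=m)
    y = np.zeros(m)
    for i in range(1, m):
        y[i] = rho * y[i - 1] + eps[i]   # Y_i = rho * Y_{i-1} + eps_i
    x = y[burn_in:].copy()
    k_star = int(np.floor(n * theta))    # k* = [n * theta]
    x[k_star:] += delta                  # shift observations k*+1, ..., n
    if outliers:
        # multiply X_[0.2n], X_[0.3n], X_[0.6n], X_[0.8n] by M = 50
        pos = [(q * n) // 10 - 1 for q in (2, 3, 6, 8)]  # 0-based positions
        x[pos] *= 50.0
    return x


def cusum_changepoint(x):
    """Return k_tilde_C from (9): smallest k maximising |S_k - (k/n) S_n|."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(1, n + 1)
    stat = np.abs(np.cumsum(x) - k / n * x.sum())
    return int(np.argmax(stat)) + 1      # first maximiser, 1-based


def monte_carlo(estimator, reps, n, theta, delta, seed=0, **kwargs):
    """Sample mean and standard deviation of the estimates k / n."""
    rng = np.random.default_rng(seed)
    est = np.array([
        estimator(generate_sample(n, theta, delta, rng=rng, **kwargs)) / n
        for _ in range(reps)
    ])
    return est.mean(), est.std(ddof=1)


mean_c, sd_c = monte_carlo(cusum_changepoint, reps=200, n=200,
                           theta=0.5, delta=2.0)
```

Any function that maps a sample to an estimated change-point index, for instance an implementation of the Wilcoxon-type estimator (2), can be passed as `estimator`; the keyword arguments `innovations="t1"` and `outliers=True` switch to the heavy-tailed and contaminated designs.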
| $\Delta$ | $\theta$ | | C ($n=50$) | W ($n=50$) | C ($n=100$) | W ($n=100$) | C ($n=200$) | W ($n=200$) | C ($n=500$) | W ($n=500$) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | 0.25 | mean | 0.46 | 0.46 | 0.43 | 0.44 | 0.40 | 0.40 | 0.34 | 0.34 |
| | | sd | 0.21 | 0.21 | 0.20 | 0.20 | 0.18 | 0.18 | 0.13 | 0.13 |
| | 0.50 | mean | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| | | sd | 0.18 | 0.18 | 0.16 | 0.16 | 0.13 | 0.13 | 0.08 | 0.08 |
| | 0.75 | mean | 0.54 | 0.54 | 0.57 | 0.56 | 0.61 | 0.61 | 0.66 | 0.66 |
| | | sd | 0.20 | 0.20 | 0.20 | 0.20 | 0.18 | 0.18 | 0.13 | 0.13 |
| 1 | 0.25 | mean | 0.39 | 0.39 | 0.35 | 0.35 | 0.31 | 0.31 | 0.28 | 0.28 |
| | | sd | 0.18 | 0.18 | 0.14 | 0.14 | 0.10 | 0.10 | 0.05 | 0.06 |
| | 0.50 | mean | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| | | sd | 0.12 | 0.12 | 0.09 | 0.09 | 0.05 | 0.05 | 0.02 | 0.02 |
| | 0.75 | mean | 0.61 | 0.60 | 0.65 | 0.65 | 0.69 | 0.69 | 0.72 | 0.72 |
| | | sd | 0.17 | 0.17 | 0.15 | 0.15 | 0.10 | 0.10 | 0.05 | 0.06 |
| 2 | 0.25 | mean | 0.30 | 0.31 | 0.28 | 0.29 | 0.27 | 0.28 | 0.26 | 0.26 |
| | | sd | 0.10 | 0.10 | 0.06 | 0.07 | 0.04 | 0.04 | 0.02 | 0.02 |
| | 0.50 | mean | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| | | sd | 0.05 | 0.05 | 0.03 | 0.03 | 0.02 | 0.01 | 0.01 | 0.01 |
| | 0.75 | mean | 0.69 | 0.68 | 0.72 | 0.71 | 0.73 | 0.73 | 0.74 | 0.74 |
| | | sd | 0.09 | 0.10 | 0.06 | 0.07 | 0.04 | 0.04 | 0.02 | 0.02 |

Table 1: Sample mean and the sample standard deviation based on 10,000 values of $\hat{\theta}$ and $\tilde{\theta}_C$. $X_i$ follows the model (10) with normal innovations $\epsilon_i \sim NIID(0,1)$.

(Histogram panels, vertical axis: Frequency. (a) CUSUM $\tilde{\theta}_C$; (b) Wilcoxon $\hat{\theta}$.)
Figure 2: Histogram of the CUSUM-type estimator $\tilde{\theta}_C$ and the Wilcoxon-type estimator $\hat{\theta}$ based on 10,000 values of $\tilde{\theta}_C$ and $\hat{\theta}$ for the model (10) with iid $t_1$-distributed innovations, $\Delta = 1$, $\theta = 0.5$ and $n = 500$.

(Histogram panels, vertical axis: Frequency. (a) CUSUM $\tilde{\theta}_C$; (b) Wilcoxon $\hat{\theta}$.)
Figure 3: Histogram based on 10,000 values of $\tilde{\theta}_C$ and $\hat{\theta}$ for the model (10) with normal innovations $\epsilon_i \sim NIID(0,1)$, $\Delta = 1$, $\theta = 0.5$, $n = 200$ and outliers.
| Innovations | | C ($n=50$) | W ($n=50$) | C ($n=100$) | W ($n=100$) | C ($n=200$) | W ($n=200$) | C ($n=500$) | W ($n=500$) |
|---|---|---|---|---|---|---|---|---|---|
| normal | mean | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 |
| | sd | 0.12 | 0.12 | 0.09 | 0.09 | 0.05 | 0.05 | 0.02 | 0.02 |
| $t_1$ | mean | 0.52 | 0.50 | 0.51 | 0.50 | 0.51 | 0.50 | 0.50 | 0.50 |
| | sd | 0.23 | 0.20 | 0.24 | 0.19 | 0.24 | 0.17 | 0.25 | 0.14 |
| normal with outliers | mean | 0.50 | 0.49 | 0.50 | 0.50 | 0.50 | 0.50 | 0.51 | 0.50 |
| | sd | 0.17 | 0.13 | 0.16 | 0.09 | 0.15 | 0.06 | 0.09 | 0.02 |

Table 2: Sample mean and the sample standard deviation of $\hat{\theta}$ and $\tilde{\theta}_C$ based on 10,000 replications for the normal, normal with outliers and $t_1$-distributed innovations, $\Delta = 1$ and $\theta = 0.5$.

4 Useful properties of the Wilcoxon test statistic and proof of Theorem 2.1

This section presents some useful properties of the Wilcoxon test statistic and the proof of Theorem 2.1.

Throughout the paper, without loss of generality, we assume that $\mu = 0$ and $\Delta_n > 0$. We let $C$ denote a generic non-negative constant, which may vary from one occurrence to the next. The notation $a_n \sim b_n$ means that two sequences $a_n$ and $b_n$ of real numbers have the property $a_n/b_n \to c$, as $n \to \infty$, where $c \ne 0$ is a constant. $\|g\|_\infty = \sup_x |g(x)|$ stands for the supremum norm of the function $g$. By $\xrightarrow{d}$ we denote convergence in distribution, by $\to_p$ convergence in probability and by $\overset{d}{=}$ equality in distribution.

4.1 U-statistics and Hoeffding decomposition

The Wilcoxon test statistic $W_n(k)$ in (1) under the change-point model (3) can be decomposed into two terms

$$\begin{aligned} W_n(k) &= \sum_{i=1}^{k} \sum_{j=k+1}^{n} \left( 1_{\{X_i \le X_j\}} - 1/2 \right) \\ &= \begin{cases} \sum_{i=1}^{k} \sum_{j=k+1}^{n} \left( 1_{\{Y_i \le Y_j\}} - 1/2 \right) + \sum_{i=1}^{k} \sum_{j=k^*+1}^{n} 1_{\{Y_j < Y_i \le Y_j + \Delta_n\}}, & 1 \le k \le k^*, \\ \sum_{i=1}^{k} \sum_{j=k+1}^{n} \left( 1_{\{Y_i \le Y_j\}} - 1/2 \right) + \sum_{i=1}^{k^*} \sum_{j=k+1}^{n} 1_{\{Y_j < Y_i \le Y_j + \Delta_n\}}, & k^* < k \le n, \end{cases} \\ &= \begin{cases} U_n(k) + U_n(k,k^*), & 1 \le k \le k^*, \\ U_n(k) + U_n(k^*,k), & k^* < k \le n, \end{cases} \end{aligned} \tag{11}$$

where

$$U_n(k) = \sum_{i=1}^{k} \sum_{j=k+1}^{n} \left( 1_{\{Y_i \le Y_j\}} - 1/2 \right), \quad 1 \le k \le n, \tag{12}$$

$$U_n(k,k^*) = \sum_{i=1}^{k} \sum_{j=k^*+1}^{n} 1_{\{Y_j < Y_i \le Y_j + \Delta_n\}}, \quad 1 \le k \le k^*, \tag{13}$$

$$U_n(k^*,k) = \sum_{i=1}^{k^*} \sum_{j=k+1}^{n} 1_{\{Y_j < Y_i \le Y_j + \Delta_n\}}, \quad k^* < k \le n. \tag{14}$$

The first term $U_n(k)$ depends only on the underlying process $(Y_j)$, while the terms $U_n(k,k^*)$ and $U_n(k^*,k)$ depend in addition on the change-point time $k^*$ and the magnitude $\Delta_n$ of the change in the mean.

The term $U_n(k)$ can be written as a second order U-statistic

$$U_n(k) = \sum_{i=1}^{k} \sum_{j=k+1}^{n} \left( h(Y_i, Y_j) - \Theta \right), \quad 1 \le k \le n,$$

with the kernel function $h(x,y) = 1_{\{x \le y\}}$ and the constant $\Theta = E h(Y_1', Y_2') = 1/2$, where $Y_1'$ and $Y_2'$ are independent copies of $Y_1$.

We apply to $U_n(k)$ the decomposition of U-statistics established by Hoeffding (1948). It allows us to write the kernel function as the sum

$$h(x,y) = \Theta + h_1(x) + h_2(y) + g(x,y), \tag{15}$$

where

$$h_1(x) = E h(x, Y_2') - \Theta = 1/2 - F(x), \quad h_2(y) = E h(Y_1', y) - \Theta = F(y) - 1/2,$$

$$g(x,y) = h(x,y) - h_1(x) - h_2(y) - \Theta.$$

By definition of $h_1$ and $h_2$, $E h_1(Y_1) = 0$ and $E h_2(Y_1) = 0$. Hence, $E g(x, Y_1) = E g(Y_1, y) = 0$, i.e. $g(x,y)$ is a degenerate kernel.

The term $U_n(k,k^*)$ in (13) (and $U_n(k^*,k)$ in (14)) can be written as a U-statistic

$$U_n(k,k^*) = \sum_{i=1}^{k} \sum_{j=k^*+1}^{n} h_n(Y_i, Y_j), \quad 1 \le k \le k^*,$$

with the kernel $h_n(x,y) = h(x, y + \Delta_n) - h(x,y) = 1_{\{y < x \le y + \Delta_n\}}$. The Hoeffding decomposition allows us to write the kernel as

$$h_n(x,y) = \Theta_{\Delta_n} + h_{1,n}(x) + h_{2,n}(y) + g_n(x,y), \tag{16}$$

with $\Theta_{\Delta_n} = E 1_{\{Y_2' < Y_1' \le Y_2' + \Delta_n\}}$,

$$h_{1,n}(x) = E h_n(x, Y_2') - \Theta_{\Delta_n} = F(x) - F(x - \Delta_n) - \Theta_{\Delta_n},$$

$$h_{2,n}(y) = E h_n(Y_1', y) - \Theta_{\Delta_n} = F(y + \Delta_n) - F(y) - \Theta_{\Delta_n},$$

$$g_n(x,y) = h_n(x,y) - h_{1,n}(x) - h_{2,n}(y) - \Theta_{\Delta_n}.$$

By assumption the distribution function $F$ of $Y_1$ has a bounded probability density $f$ and a bounded second derivative. This allows us to specify the asymptotic behaviour of $\Theta_{\Delta_n}$, as $n \to \infty$,

$$\Theta_{\Delta_n} = E 1_{\{Y_2' < Y_1' \le Y_2' + \Delta_n\}} = P\big( Y_2' < Y_1' \le Y_2' + \Delta_n \big) = \int_{\mathbb{R}} \big( F(y + \Delta_n) - F(y) \big) \, dF(y) = \Delta_n \Big( \int_{\mathbb{R}} f^2(y) \, dy + o(1) \Big). \tag{17}$$
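As a quick numerical sanity check of (17), consider the special case, assumed here purely for illustration, in which $Y_1'$ and $Y_2'$ are independent standard normal. Then $Y_1' - Y_2' \sim N(0,2)$, so $\Theta_{\Delta} = P(0 < Y_1' - Y_2' \le \Delta) = \tfrac{1}{2}\,\mathrm{erf}(\Delta/2)$ in closed form, while $\int_{\mathbb{R}} f^2(y)\,dy = 1/(2\sqrt{\pi})$. The sketch below (function names are hypothetical) compares both sides as $\Delta$ shrinks.

```python
import math


def theta_exact(delta):
    """P(Y2' < Y1' <= Y2' + delta) for independent standard normal Y1', Y2'."""
    # Y1' - Y2' ~ N(0, 2), hence P(0 < Y1' - Y2' <= delta) = erf(delta / 2) / 2
    return 0.5 * math.erf(delta / 2.0)


def theta_first_order(delta):
    """Leading term delta * int f(y)^2 dy, with int f^2 = 1 / (2 sqrt(pi))."""
    return delta / (2.0 * math.sqrt(math.pi))


deltas = (1.0, 0.5, 0.1, 0.01)
ratios = [theta_exact(d) / theta_first_order(d) for d in deltas]
for d, r in zip(deltas, ratios):
    print(f"delta={d:<5} exact={theta_exact(d):.6f} ratio={r:.6f}")
```

The ratio of the exact value to the leading term approaches 1 as $\Delta \to 0$, in line with the $o(1)$ term in (17).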
Note that $E h_{1,n}(Y_1) = 0$ and $E h_{2,n}(Y_1) = 0$. Therefore, $g_n(x,y)$ is a degenerate kernel, i.e. $E g_n(x, Y_1) = E g_n(Y_1, y) = 0$. Furthermore, $\|h_{1,n}\|_\infty \to 0$, as $n \to \infty$, since

$$|h_{1,n}(x)| \le |F(x) - F(x - \Delta_n) - \Theta_{\Delta_n}| \le C\Delta_n + \Theta_{\Delta_n} \le C\Delta_n, \tag{18}$$

where $C > 0$ is a constant and $\Delta_n \to 0$, as $n \to \infty$.

4.2 1-continuity property of kernel functions $h$ and $h_n$

Asymptotic properties of near epoch dependent processes $(Y_j)$ introduced in Section 2 are well investigated in the literature, see e.g. Borovkova et al. (2001). In the context of change-point estimation we are interested in asymptotic properties of the variables $h(Y_i, Y_j)$, where $h(x,y) = 1_{\{x \le y\}}$ is the Wilcoxon kernel, and also in properties of the terms $h_1(Y_j)$ and $h_{1,n}(Y_j)$ of the Hoeffding decomposition of the kernels in (15) and (16). We will need to show that the variables $(h(Y_i, Y_j))$, $(h_1(Y_j))$ and $(h_{1,n}(Y_j))$ retain some properties of $(Y_j)$. To derive them, we will use the fact that the kernels $h$ in (15) and $h_n$ in (16) satisfy the 1-continuity condition introduced by Borovkova et al. (2001).

Definition 4.1.
We say that the kernel $h(x,y)$ is 1-continuous with respect to a distribution of a stationary process $(Y_j)$ if there exists a function $\phi(\epsilon) \ge 0$, $\epsilon \ge 0$, such that $\phi(\epsilon) \to 0$, $\epsilon \to 0$, and for all $\epsilon > 0$ and $k \ge 1$

$$E\Big( \big| h(Y_1, Y_k) - h(Y_1', Y_k) \big| \, 1_{\{|Y_1 - Y_1'| \le \epsilon\}} \Big) \le \phi(\epsilon), \tag{19}$$

$$E\Big( \big| h(Y_k, Y_1) - h(Y_k, Y_1') \big| \, 1_{\{|Y_1 - Y_1'| \le \epsilon\}} \Big) \le \phi(\epsilon),$$

and

$$E\Big( \big| h(Y_1, Y_2') - h(Y_1', Y_2') \big| \, 1_{\{|Y_1 - Y_1'| \le \epsilon\}} \Big) \le \phi(\epsilon), \tag{20}$$

$$E\Big( \big| h(Y_2', Y_1) - h(Y_2', Y_1') \big| \, 1_{\{|Y_1 - Y_1'| \le \epsilon\}} \Big) \le \phi(\epsilon),$$

where $Y_2'$ is an independent copy of $Y_1$ and $Y_1'$ is any random variable that has the same distribution as $Y_1$.

For a univariate function $g(x)$ we define the 1-continuity property as follows.

Definition 4.2. The function $g(x)$ is 1-continuous with respect to a distribution of a stationary process $(Y_j)$ if there exists a function $\phi(\epsilon) \ge 0$, $\epsilon \ge 0$, such that $\phi(\epsilon) \to 0$, $\epsilon \to 0$, and for all $\epsilon > 0$

$$E\Big( \big| g(Y_1) - g(Y_1') \big| \, 1_{\{|Y_1 - Y_1'| \le \epsilon\}} \Big) \le \phi(\epsilon), \tag{21}$$

where $Y_1'$ is any random variable that has the same distribution as $Y_1$.

Corollary 4.1 below establishes the 1-continuity of the functions $h(x,y) = 1_{\{x \le y\}}$ and $h_n(x,y) = 1_{\{y < x \le y + \Delta_n\}}$, $n \ge 1$. For $h_n$, $n \ge 1$, we assume that (19) and (20) hold with the same $\phi(\epsilon)$ for all $n \ge 1$. We start the proof by showing the 1-continuity of the more general kernel function $h(x,y;t) = 1_{\{x - y \le t\}}$.

Lemma 4.1. Let $(Y_j)$ be a stationary process, let $Y_1$ have a distribution function $F$ which has bounded first and second derivatives, and let $Y_1 - Y_k$, $k \ge 1$, satisfy (4). Then the function $h(x,y;t) = 1_{\{x - y \le t\}}$ is 1-continuous with respect to the distribution function of $(Y_j)$.