
A Novel Approach for Fast Detection of Multiple Change Points in Linear Models



Xiaoping Shi (a), Yuehua Wu (a) and Baisuo Jin (b)

(a) Department of Mathematics and Statistics, York University, Toronto, Ontario, Canada; (b) Department of Statistics and Finance, University of Science and Technology of China, Hefei, Anhui, China

arXiv:1101.4185v1 [stat.ME] 21 Jan 2011

Abstract

A change point problem occurs in many statistical applications. If there exist change points in a model, it is harmful to carry out a statistical analysis without any consideration of the existence of the change points, and the results derived from such an analysis may be misleading. There is a rich literature on change point detection. Although many methods have been proposed for detecting multiple change points, using these methods to find multiple change points in a large sample does not seem feasible. In this article, a connection between multiple change point detection and variable selection through a proper segmentation of the data sequence is established, and a novel approach is proposed to tackle the multiple change point detection problem via the following two key steps: (1) apply the recent advances in consistent variable selection methods such as SCAD, adaptive LASSO and MCP to detect change points; (2) employ a refinement procedure to improve the accuracy of change point estimation. Five algorithms are hence proposed, which can detect change points in much less time and with more accuracy than those in the literature. In addition, an optimal segmentation algorithm based on the residual sum of squares is given. Our simulation study shows that the proposed algorithms are computationally efficient with improved change point estimation accuracy. The new approach is readily generalized to detect multiple change points in other models such as generalized linear models and nonparametric models.
KEY WORDS: Adaptive LASSO; Asymptotic normality; Least squares; Linear model; MCP; Multiple change point detection algorithm; SCAD; Variable selection.

1. Introduction

The most popular statistical model used in practice is the linear model, which has been extensively studied in the literature. This model is simple and can be used to approximate a nonlinear function locally. However, there may be change points in a linear model such that the regression parameters change at these points. Thus if there do exist change points in a linear model, the linear model is actually a segmented linear model. A change point problem occurs in many statistical applications in areas including medical and health sciences, life science, meteorology, engineering, financial econometrics and risk management. Detecting all change points is of great importance in statistical applications. If there exists a change point, it is harmful to carry out a statistical analysis without any consideration of the existence of this change point, and the results derived from such an analysis may be misleading. There is a rich literature on change point detection; see, e.g., Csörgő and Horváth (1997) and Chen and Gupta (2000).

Compared with the detection of one change point, locating all change points is a very challenging problem. Although it has been studied in the literature (see Davis, Lee, and Rodriguez-Yam (2006), Pan and Chen (2006), Kim, Yu and Feuer (2009), and Loschi, Pontel and Cruz (2010) among others), a powerful and efficient method still needs to be explored. Thus this paper is mainly concerned with the multiple change point detection problem in linear regression.
Consider a linear model with $K_0 \le K_U < \infty$ multiple change points located at $a^{(0)}_{1,n}, \ldots, a^{(0)}_{K_0,n}$:

$$y_{i,n} = \sum_{j=1}^{q} x_{i,j,n}\beta_{j,0} + \sum_{\ell=1}^{K_0}\sum_{j=1}^{q} x_{i,j,n}\,\delta^{(\ell)}_{j,0}\, I(a^{(0)}_{\ell,n} < i \le n) + \varepsilon_{i,n} = x_{i,n}^T\Big[\beta_0 + \sum_{\ell=1}^{K_0}\delta_{\ell,0}\, I(a^{(0)}_{\ell,n} < i \le n)\Big] + \varepsilon_{i,n}, \quad i = 1,\ldots,n, \tag{1}$$

where $\{x_{i,n} = (x_{i,1,n},\ldots,x_{i,q,n})^T\}$ is a sequence of $q$-dimensional predictors, $\beta_0 = (\beta_{1,0},\ldots,\beta_{q,0})^T \ne 0$ is an unknown $q$-dimensional vector of regression coefficients, $K_0$ is the unknown number of change points, $a^{(0)}_{1,n}, \ldots, a^{(0)}_{K_0,n}$ are the unknown change point locations (or change points), $\delta_{\ell,0}$, $1 \le \ell \le K_0$, denote the unknown amounts of change in the regression coefficient vectors at the change points, and $\varepsilon_{1,n},\ldots,\varepsilon_{n,n}$ are random errors. In this paper, we assume that $K_U$ is an upper bound of $K_0$. Set $a^{(0)}_{K_0+1,n} = n$. If there is no change point, $K_0 = 0$ and the model (1) becomes

$$y_{i,n} = \sum_{j=1}^{q} x_{i,j,n}\beta_{j,0} + \varepsilon_{i,n}, \quad i = 1,\ldots,n.$$

Otherwise, $K_0 \ge 1$, and we assume that

$$0 < a^{(0)}_{\ell,n}/n \to \tau_\ell < 1, \quad \text{for } 1 \le \ell \le K_0. \tag{2}$$

If $K_0 \ge 2$, we assume that

$$\min_{1 \le \ell \le K_0 - 1} (\tau_{\ell+1} - \tau_\ell) > 0, \tag{3}$$

where this minimum spacing is unknown. The problem studied in this paper is to estimate $K_0$ and $a^{(0)}_{1,n}, \ldots, a^{(0)}_{K_0,n}$, or in other words to detect multiple change points. If there is no confusion, the superscript "(0)", the subscript "0", and the subscript $n$ will be suppressed.

For detecting multiple change points, it may be convenient to consider the following linear model with probable multiple change points located at $1 < a_{1,n} < \cdots < a_{K,n} < n$:

$$y_i = x_i^T\Big[\beta + \sum_{\ell=1}^{K}\delta_\ell\, I(a_{\ell,n} < i \le n)\Big] + \varepsilon_i, \quad i = 1,\ldots,n, \tag{4}$$

where $\beta, \delta_1, \ldots, \delta_K$ are unknown $q$-dimensional parameter vectors.
We can instead test the following null hypothesis:

$H_0$: There is no change point, i.e., for any $1 < a_{1,n} < \cdots < a_{K,n} < n$, $\delta_\ell = (\delta^{(\ell)}_1, \ldots, \delta^{(\ell)}_q)^T = 0$ for any $\ell \in \{1,\ldots,K\}$, where $1 \le K \le K_U$,

versus the alternative hypothesis:

$H_1$: There exist $1 \le K \le K_U$ change points, i.e., there exist $1 < a_{1,n} < \cdots < a_{K,n} < n$ such that $\delta_\ell = (\delta^{(\ell)}_1, \ldots, \delta^{(\ell)}_q)^T \ne 0$ for any $\ell \in \{1,\ldots,K\}$.

Many classical methods have been given in the literature for detecting change points, including the popular model selection based change point detection method and the well known cumulative sum (CUSUM) method. However, the amounts of computing time required by these two typical change point detection methods are respectively $O(2^n)$ and $O(n^2)$. When $n$ is very large, using these methods to find multiple change points does not seem feasible.

If the set of all true change points in the model (4) is a subset of $\{a_{\ell,n}, 1 \le \ell \le K\}$, it is easy to see that $a_{j,n}$ is a change point if and only if $\delta_j \ne 0$. We rewrite (4) as follows:

$$y_n = X_n\tilde\beta + \varepsilon_n, \tag{5}$$

where $y_n = (y_1, y_2, \cdots, y_n)^T$, $\tilde\beta = (\beta^T, \delta_1^T, \ldots, \delta_K^T)^T$, $\varepsilon_n = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)^T$, and

$$X_n = \begin{pmatrix} X_{(0,1)} & 0_{(0,1)} & 0_{(0,1)} & \cdots & 0_{(0,1)} \\ X_{(1,2)} & X_{(1,2)} & 0_{(1,2)} & \cdots & 0_{(1,2)} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ X_{(K,K+1)} & X_{(K,K+1)} & X_{(K,K+1)} & \cdots & X_{(K,K+1)} \end{pmatrix}_{n \times (K+1)q}$$

with $0_{(j-1,j)}$ a zero matrix of dimension $(a_{j,n} - a_{j-1,n}) \times q$, $a_{0,n} = 0$, and

$$X_{(j-1,j)} = \begin{pmatrix} x_{a_{j-1,n}+1,1} & \cdots & x_{a_{j-1,n}+1,q} \\ \vdots & \cdots & \vdots \\ x_{a_{j,n},1} & \cdots & x_{a_{j,n},q} \end{pmatrix}_{(a_{j,n} - a_{j-1,n}) \times q} \quad \text{for } j = 1,\ldots,K+1.$$

Thus detecting all the true change points and removing the pseudo change points in (4) can be considered as a variable selection problem for the linear regression model (5), and we may tackle the problem by employing variable selection methods. This leads us to explore the possibility of first properly segmenting the data sequence and then applying variable selection methods and/or other methods for detecting probable multiple change points.
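The rewriting of (4) as the regression (5) can be checked on a toy case. The sketch below is only an illustration: the function names `build_design` and `matvec`, the data, and the candidate locations are all hypothetical and not taken from the paper. It stacks the blocks of $X_n$ row by row and confirms that $X_n\tilde\beta$ reproduces the noise-free part of (4).

```python
def build_design(x, change_points):
    """Stack the blocks of X_n in (5).

    x: list of n predictor rows, each of length q.
    change_points: increasing candidate locations a_1 < ... < a_K (1-based).
    Returns an n x (K+1)q matrix as a list of rows.
    """
    q = len(x[0])
    rows = []
    for i, x_i in enumerate(x, start=1):      # i is the 1-based index used in (4)
        row = list(x_i)                        # block that multiplies beta
        for a in change_points:
            # block that multiplies delta_l: x_i if a_l < i <= n, zeros otherwise
            row.extend(x_i if a < i else [0] * q)
        rows.append(row)
    return rows


def matvec(A, v):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


# Hypothetical toy data: n = 6, q = 2, candidate locations a_1 = 2 and a_2 = 4.
x = [[1, i] for i in range(1, 7)]
change_points = [2, 4]
beta = [1, 2]
deltas = [[3, 0], [0, -1]]

X_n = build_design(x, change_points)
beta_tilde = beta + deltas[0] + deltas[1]      # (beta^T, delta_1^T, delta_2^T)^T
y_from_5 = matvec(X_n, beta_tilde)             # noise-free right-hand side of (5)

# Direct evaluation of the noise-free part of (4) for comparison.
y_from_4 = []
for i, x_i in enumerate(x, start=1):
    coef = list(beta)
    for a, d in zip(change_points, deltas):
        if a < i:
            coef = [c + dj for c, dj in zip(coef, d)]
    y_from_4.append(sum(xj * cj for xj, cj in zip(x_i, coef)))
```

A candidate location whose $\delta_\ell$ is zero contributes nothing to the fit, which is why dropping pseudo change points amounts to dropping column blocks of $X_n$.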
The paper is arranged as follows. The segmentation of the data sequence and multiple change point estimation are discussed in Section 2. Five algorithms for detecting probable multiple change points are proposed in Section 3. Simulation studies and practical recommendations are given in Section 4. Two real data examples are provided in Section 5.

Throughout the rest of the paper, $1_q = (1,\ldots,1)^T$ is the $q$-dimensional vector of ones, $I_q$ is the $q \times q$ identity matrix, an indicator function is written as $I(\cdot)$, the transpose of a matrix $A$ is denoted by $A^T$, and $\lfloor c \rfloor$ is the integer part of a real number $c$. For a vector $a$, $a^T$ is its transpose, $a^{(j)}$ is its $j$th component, and $|a|$, $\|a\|$ and $\|a\|_\infty$ are respectively its $L_1$-norm, $L_2$-norm (Euclidean norm) and $L_\infty$-norm. If $\mathcal{A}$ is a set, its complement and its size are denoted by $\bar{\mathcal{A}}$ and $|\mathcal{A}|$, respectively. In addition, the notations "$\to_p$" and "$\to_d$" denote convergence in probability and convergence in distribution, respectively. Furthermore, the $(1-\alpha)$th quantile of the chi-square distribution with $\ell$ degrees of freedom is denoted by $\chi^2_{\alpha,\ell}$.

2. Segmentation and Change Point Estimation

For a multiple change point detection problem, the multiple change point locations are unknown, and in practice their approximate locations within a permissible range are the main concern, which inspires us to partition the data sequence to search for change points. We thus divide the data sequence into $p_n + 1$ segments. Let $m = m_n = \lfloor n/(p_n+1) \rfloor$. The segmentation is such that the first segment has length $n - p_n m$, with $0 < m \le n - p_n m \le c_0 m$ for some $c_0 > 1$, and each of the remaining $p_n$ segments has length $m$. Without loss of generality, we assume that $p_n \to \infty$ as $n \to \infty$.
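The segmentation rule just described can be sketched in a few lines. The helper `segment_bounds` and the values $n = 103$, $p_n = 9$ are hypothetical choices for illustration; the index formulas follow the segment definition above, with the first segment absorbing the remainder of the division.

```python
def segment_bounds(n, p):
    """Return m and the 1-based inclusive (first, last) index of each of the p + 1 segments."""
    if p < 1 or n < p + 1:
        raise ValueError("need 1 <= p and p + 1 <= n")
    m = n // (p + 1)                       # m = floor(n / (p_n + 1))
    bounds = [(1, n - p * m)]              # first segment has length n - p_n * m >= m
    for j in range(2, p + 2):              # segments 2, ..., p_n + 1, each of length m
        first = n - (p - j + 2) * m + 1
        last = n - (p - j + 1) * m
        bounds.append((first, last))
    return m, bounds


m, bounds = segment_bounds(n=103, p=9)
lengths = [last - first + 1 for first, last in bounds]
```

In this example the first segment is slightly longer than the others, consistent with the requirement $m \le n - p_n m$.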
The partition of the data sequence yields the following segmented regression model:

$$y_i = x_i^T\Big[\beta + \sum_{\ell=1}^{p_n} d_\ell\, I(n - (p_n - \ell + 1)m < i \le n) + \sum_{\ell=1}^{p_n}\omega_\ell(i)\, I(n - (p_n - \ell + 1)m < i \le n - (p_n - \ell)m)\Big] + \varepsilon_i, \quad i = 1,\ldots,n, \tag{6}$$

where the two sets $\{d_1,\ldots,d_{p_n}\}$ and $\{0, \delta_1, \ldots, \delta_{K_0}\}$ are equal, and $\{\omega_\ell\}$ are defined as follows: if there is a change point located in $\{n - (p_n - \ell + 1)m + 1, \ldots, n - (p_n - \ell)m - 1\}$, say $a_{k,n}$, then

$$\omega_\ell(i) = \begin{cases} -\delta_k, & n - (p_n - \ell + 1)m < i \le a_{k,n} < n - (p_n - \ell)m, \\ 0, & \text{elsewhere}; \end{cases}$$

otherwise, $\omega_\ell(i) = 0$, $i = 1,\ldots,n$.

The model (6) can be written as

$$y_n = \tilde X_n\theta_n + \sum_{\ell=1}^{p_n} X_\omega\vec\omega_\ell + \varepsilon_n, \tag{7}$$

where $y_n$ and $\varepsilon_n$ are defined in Section 1, $\theta_n = (\theta_1,\ldots,\theta_{q(p_n+1)})^T = (\beta^T, d_1^T, \ldots, d_{p_n}^T)^T$, $d_r = (d_{r1},\ldots,d_{rq})^T$, $r = 1,\ldots,p_n$,

$$\tilde X_n = \begin{pmatrix} X_{(1)} & 0_{m\times q} & 0_{m\times q} & \cdots & 0_{m\times q} \\ X_{(2)} & X_{(2)} & 0_{m\times q} & \cdots & 0_{m\times q} \\ \vdots & \vdots & \ddots & \ddots & \vdots \\ X_{(p_n+1)} & X_{(p_n+1)} & X_{(p_n+1)} & \cdots & X_{(p_n+1)} \end{pmatrix}_{n \times (p_n+1)q} = (X_n^{(1)}, \ldots, X_n^{(p_n+1)}) \tag{8}$$

(in the first block row the zero blocks have $n - p_n m$ rows, matching $X_{(1)}$), with $X_n^{(j)} = (0_{q\times m}, \ldots, 0_{q\times m}, X_{(j)}^T, \ldots, X_{(p_n+1)}^T)^T$,

$$X_{(1)} = \begin{pmatrix} x_{1,1} & \cdots & x_{1,q} \\ \vdots & \cdots & \vdots \\ x_{n-p_nm,1} & \cdots & x_{n-p_nm,q} \end{pmatrix}_{(n - p_n m)\times q},$$

$$X_{(j)} = \begin{pmatrix} x_{n-(p_n-j+2)m+1,1} & \cdots & x_{n-(p_n-j+2)m+1,q} \\ \vdots & \cdots & \vdots \\ x_{n-(p_n-j+1)m,1} & \cdots & x_{n-(p_n-j+1)m,q} \end{pmatrix}_{m\times q}, \quad \text{for } j = 2,\ldots,p_n+1,$$

$X_\omega = \mathrm{diag}(x_1^T,\ldots,x_n^T)$, and $\vec\omega_\ell = (\omega_\ell^T(1),\ldots,\omega_\ell^T(n))^T$. It is easy to see that $X_\omega\sum_{\ell=1}^{p_n}\vec\omega_\ell$ is an $n$-dimensional vector and all its elements excluding at most $K_0(m-1)$ of them are zeros.

It is noted that in Harchaoui and Levy-Leduc (2008), the mean-shift model is considered and the length of each of their segments is only 1.

Consider a special case in which each true change point is at an end of a segment. Then an end of a segment is a true change point if and only if the corresponding $d_r \ne 0$. Thus locating all the true change points in (1) is equivalent to carrying out variable selection.
Since $p_n \to \infty$, we may take advantage of the recent advances in consistent variable selection methods for a linear regression model such as (7) with a large number of regression coefficients, which include the SCAD (Fan and Li (2001)), the adaptive LASSO (Zhou (2006)), and the MCP (Zhang (2010)) among others.

Let us examine the relationship between the models (1) and (7). It can be seen that under the null hypothesis $H_0$, $\beta = \beta_0$ and $d_r = 0$, $r \in \{1,\cdots,p_n\}$. We now assume that $H_1$ holds. Thus, there exist $r_k$, $k = 1,\cdots,K_0$, such that $a_{k,n} \in \{n - p_n m + (r_k - 1)m, \ldots, n - p_n m + r_k m - 1\}$. Since $K_0$ is finite with an upper bound $K_U$, in view of (2) and (3), it follows that

$$\beta = \beta_0, \quad d_{r_k - 1} = 0, \quad d_{r_k} = \delta_k \ne 0, \quad \text{and} \quad d_{r_k + 1} = 0 \tag{9}$$

for large $n$. Thus in order to detect all the change points $\{a_{1,n},\ldots,a_{K_0,n}\}$, we may estimate $\{d_i\}$ in advance.

The following assumptions are made for investigating the asymptotic properties of the estimates of $\{d_i\}$:

Assumption C1. $\sum_{i=s}^{t} x_i x_i^T/(t - s) \to W > 0$ as $t - s \to \infty$.

It is noted that Assumption C1 is a common assumption made in change point analysis for a mean shift model. Under Assumption C1, it can be shown that $X_{(1)}^T X_{(1)}/(n - p_n m) \to W > 0$, and $X_{(i)}^T X_{(i)}/m \to W > 0$ for $i \in \{2,\ldots,p_n+1\}$.

Remark 1. Assumption C1 is similar to Condition (b) in Zhou (2006). If we only consider the consistency of change point estimators, Assumption C1 can be relaxed to the following weaker one: for $b_1, b_2 > 0$, $b_1 I_q \le \sum_{i=s}^{t} x_i x_i^T/(t - s) \le b_2 I_q$ when $t - s$ is large enough.

Assumption C2. $\{\varepsilon_i, i = 1, 2, \ldots\}$ is a sequence of independently and identically distributed (i.i.d.) random variables with mean 0 and variance $\sigma^2$.

Remark 2. This assumption can be replaced by a weaker assumption of the strong mixing condition in (2.1) in Kuelbs and Philipp (1980), which adapts to the autoregressive models in Davis, Huang and Yao (1995) and Wang, Li and Tsai (2007). Let $\{\varepsilon_i, i = 1, 2, \ldots\}$
be a weak sense stationary sequence of random variables with mean 0 and $(2+\delta)$th moments for $0 < \delta \le 1$ that are uniformly bounded by some positive constant. Suppose that $\{\varepsilon_i, i = 1, 2, \ldots\}$ satisfies the strong mixing condition $|P(AB) - P(A)P(B)| \le \rho(n) \downarrow 0$ for all $n, s \ge 1$, all $A \in \mathcal{M}_1^s$ and $B \in \mathcal{M}_{s+n}^\infty$, where $\mathcal{M}_a^b$ is the $\sigma$-field generated by the random vectors $\varepsilon_a, \varepsilon_{a+1}, \cdots, \varepsilon_b$, and $\rho(n) \ll n^{-(1+t)(1+2/\delta)}$ for some $t > 0$. Then Theorem 4 and Lemma 3.4 in Kuelbs and Philipp (1980) warrant the same results as given in Theorems 1-3 below.

For simple presentation below, we assume that each of $\{X_{(r)}\}$ is of full rank in this paper. If an $X_{(r)}$ is not of full rank, the Moore-Penrose inverse can be used instead of the matrix inverse.

2.1. Estimate $\{d_i\}$ by least squares

By the least squares method, we estimate $d_r$, $r = 1,\ldots,p_n$, as follows:

$$\hat d_r = \big(X_{(r+1)}^T X_{(r+1)}\big)^{-1} X_{(r+1)}^T y^{(r+1)} - \big(X_{(r)}^T X_{(r)}\big)^{-1} X_{(r)}^T y^{(r)}, \quad r = 1,\ldots,p_n, \tag{10}$$

where $y^{(1)} = (y_1,\ldots,y_{n-p_nm})^T$, and $y^{(r)} = (y_{n-(p_n-r+2)m+1},\ldots,y_{n-(p_n-r+1)m})^T$, $r = 2,\ldots,p_n+1$. It is easy to see that

$$\hat d_r + \hat d_{r+1} = \big(X_{(r+2)}^T X_{(r+2)}\big)^{-1} X_{(r+2)}^T y^{(r+2)} - \big(X_{(r)}^T X_{(r)}\big)^{-1} X_{(r)}^T y^{(r)}.$$

It is obvious that under $H_0$, for any $\ell \in \{1,\ldots,p_n\}$ and any $i \in \{n - p_n m + 1,\ldots,n\}$, $\omega_\ell(i) = 0$ and $d_\ell = 0$. We have the following theorem.

Theorem 1. Assume that $m \to \infty$ as $n \to \infty$. If $H_0$ holds, under the assumptions C1-C2, it follows that

$$\sqrt{m}\,\hat d_i \to_d N\big(0, 2\sigma^2 W^{-1}\big), \quad i = 1,\ldots,p_n.$$

We now assume that $H_1$ holds. In view of (9), it follows that $d_{r_k} + d_{r_k+1} = \delta_k$. By the definition of $\{\omega_\ell(i)\}$, we have

$$\sum_{\ell=1}^{p_n}\omega_\ell(i)\, I(n - (p_n - \ell + 1)m < i \le n - (p_n - \ell)m) = \begin{cases} -\delta_k, & \text{if } \exists\, r_k \text{ such that } n - (p_n - r_k + 1)m < a_{k,n} < n - (p_n - r_k)m, \\ 0, & \text{otherwise.} \end{cases} \tag{11}$$
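Before the derivation continues, the least-squares estimator (10) can be illustrated numerically. The sketch below is not the authors' code: the helper names (`solve`, `ols`, `segment_differences`), the data, and the change location are hypothetical. The data are noise-free and the single change sits exactly at a segment end, so the output is deterministic: the difference of the two segment-wise fits is $\delta$ at that boundary and zero elsewhere, up to rounding.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [list(A[i]) + [b[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        s = M[r][k] - sum(M[r][j] * z[j] for j in range(r + 1, k))
        z[r] = s / M[r][r]
    return z


def ols(X, y):
    """Least-squares fit (X^T X)^{-1} X^T y, assuming X has full column rank."""
    q = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(q)] for a in range(q)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(q)]
    return solve(XtX, Xty)


def segment_differences(X, y, p):
    """d_hat_r of (10): fit on segment r + 1 minus fit on segment r, r = 1, ..., p."""
    n = len(y)
    m = n // (p + 1)
    cuts = [0, n - p * m] + [n - (p - j) * m for j in range(1, p + 1)]
    fits = [ols(X[s:e], y[s:e]) for s, e in zip(cuts[:-1], cuts[1:])]
    return [[b - a for a, b in zip(fits[r], fits[r + 1])] for r in range(p)]


# Hypothetical noise-free data: n = 60, q = 2, one change after observation 30.
n, p = 60, 5
X = [[1.0, float(i % 10)] for i in range(n)]
beta, delta = [1.0, 2.0], [3.0, -1.0]
y = []
for i, x_i in enumerate(X, start=1):
    coef = [b + d for b, d in zip(beta, delta)] if i > 30 else beta
    y.append(sum(xj * cj for xj, cj in zip(x_i, coef)))

d_hat = segment_differences(X, y, p)   # d_hat[2] corresponds to r = 3
```

With i.i.d. errors added, Theorem 1 indicates that the entries of $\sqrt{m}\,\hat d_r$ away from a change point would fluctuate around zero with covariance $2\sigma^2 W^{-1}$ rather than vanish exactly.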
It can also be verified that

$$\sum_{\ell=1}^{p_n} d_\ell\, I(n - (p_n - \ell + 1)m < i \le n) = \begin{cases} \displaystyle\sum_{\ell=1}^{r_k - 1} d_\ell, & \text{if } n - (p_n - r_k + 2)m < i \le n - (p_n - r_k + 1)m, \\[2ex] \displaystyle\sum_{\ell=1}^{r_k + 1} d_\ell, & \text{if } n - (p_n - r_k)m < i \le n - (p_n - r_k - 1)m. \end{cases} \tag{12}$$

Thus, we have the following theorem:

Theorem 2. If Assumptions C1-C2 hold, then under $H_1$,

$$\sqrt{m}\,\big(\hat d_{r_k} + \hat d_{r_k+1} - \delta_k\big) \to_d N\big(0, 2\sigma^2 W^{-1}\big), \quad k = 1,\ldots,K_0.$$

The proofs of Theorems 1-2 follow from the least squares theory. The details are omitted.

2.2. Estimate $\{d_i\}$ by recent advances in consistent variable selection methods

2.2.1. Estimate $\{d_i\}$ by the adaptive LASSO

The adaptive LASSO, extending the LASSO in Tibshirani (1996), was proposed in Zhou (2006) and possesses oracle properties for a fixed number of regression coefficients.

In light of Zhou (2006), the adaptive LASSO type estimator of $\theta_n$ for the model (7) is defined by

$$\breve\theta_n = \arg\min_{\theta_n}\Big\{\|y - \tilde X_n\theta_n\|^2 + \lambda_n\sum_{r=1}^{p_n}\frac{1}{|\tilde d_r|^{\nu}}\,|d_r|\Big\}, \tag{13}$$

where $\nu > 0$, $\lambda_n$ is a thresholding parameter and $\{\tilde d_r, r = 1,\cdots,p_n\}$ are initial estimators satisfying certain conditions.

Remark 3. The adaptive LASSO estimate of $\theta_n$ may also be defined by

$$\check\theta_n = \arg\min_{\theta_n}\Big\{\|y - \tilde X_n\theta_n\|^2 + \lambda_n\sum_{r=1}^{p_n}\sum_{i=1}^{q}\frac{1}{|\tilde d_{ri}|^{\nu}}\,|d_{ri}| + \gamma_n\sum_{i=1}^{q}\frac{1}{|\tilde\beta_{0i}|^{\mu}}\,|\beta_{0i}|\Big\}, \tag{14}$$

where $\mu > 0$, and $\lambda_n$ and $\gamma_n$ are thresholding parameters satisfying certain conditions. The difference between (13) and (14) is that variable selection, in addition to multiple change point detection, is also considered in (14). Due to the similarity in the techniques for finding the asymptotic behavior of both $\breve\theta_n$ and $\check\theta_n$, we only consider $\breve\theta_n$ in this paper for simple presentation.

Since the dimension of $\theta_n$ increases with $n$ in (7), the asymptotic results in Zhou (2006) are not applicable here.
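To make the criterion (13) concrete, here is a minimal sketch for the mean-shift special case $q = 1$, $x_i = 1$. Several choices in it are assumptions made for illustration and are not specified in the text above: the initial estimators $\tilde d_r$ are taken to be the least-squares differences of adjacent segment means as in (10), a small constant `eps` guards the weights against division by zero, the solver is plain coordinate descent warm-started at the initial estimate, and the data, $\lambda_n = 0.5$ and $\nu = 1$ are hypothetical. The deterministic sine term stands in for noise so that the example is reproducible.

```python
import math


def soft(z, t):
    """Soft-thresholding operator."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def adaptive_lasso(X, y, weights, lam, theta0, sweeps=200):
    """Coordinate descent for ||y - X theta||^2 + lam * sum_j weights[j] * |theta_j|."""
    n, k = len(X), len(X[0])
    theta = list(theta0)
    resid = [y[i] - sum(X[i][j] * theta[j] for j in range(k)) for i in range(n)]
    col_ss = [sum(X[i][j] ** 2 for i in range(n)) for j in range(k)]
    for _ in range(sweeps):
        for j in range(k):
            # inner product of column j with the partial residual that excludes theta_j
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_ss[j] * theta[j]
            new = soft(rho, lam * weights[j] / 2.0) / col_ss[j]
            step = new - theta[j]
            if step != 0.0:
                resid = [resid[i] - X[i][j] * step for i in range(n)]
                theta[j] = new
    return theta


# Hypothetical mean-shift example (q = 1, x_i = 1): n = 120, p_n = 11, m = 10,
# with a single jump of size 5 after observation 60.
n, p, m = 120, 11, 10
y = [(5.0 if i >= 60 else 0.0) + 0.4 * math.sin(1.7 * i) for i in range(n)]
starts = [0] + [n - (p - l + 1) * m for l in range(1, p + 1)]   # 0-based segment starts
X = [[1.0 if i >= s else 0.0 for s in starts] for i in range(n)]  # design (8) for q = 1

# Assumed initial estimators: differences of adjacent segment means, as in (10).
means = [sum(y[s:s + m]) / m for s in starts]
d_init = [b - a for a, b in zip(means[:-1], means[1:])]
nu, eps = 1.0, 1e-8                                   # eps avoids division by zero
weights = [0.0] + [1.0 / (abs(d) + eps) ** nu for d in d_init]   # beta is not penalized

theta = adaptive_lasso(X, y, weights, lam=0.5, theta0=[means[0]] + d_init)
d_est = theta[1:]                                     # d_est[r - 1] estimates d_r
```

The weights $1/|\tilde d_r|^{\nu}$ penalize heavily those segment ends where the initial estimate is near zero and only lightly the end where it is large, so in this example the dominant entry of `d_est` is the one for $d_6$, the segment end at observation 60.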
In the following we will investigate the limiting behavior of those $d_i$'s associated with change points under the condition that $K_0 \ge 1$, i.e., there exists at least one change point in the model (1). As stated before, the subscript $n$ may be suppressed for convenience if there is no confusion.

Before we proceed, we define some notation as follows: Let $\mathcal{B} = \{\kappa_1, \kappa_2, \ldots, \kappa_\iota\} \subset \{2,\ldots,p_n+1\}$ such that $\kappa_1 < \ldots < \kappa_\iota$. Denote $\theta_{\mathcal{B}} = (d_{\kappa_1}^T, \cdots, d_{\kappa_\iota}^T)^T$ and $X_{\mathcal{B}} = (X_n^{(\kappa_1)}, \ldots, X_n^{(\kappa_\iota)})$, where $\{X_n^{(i)}\}$ are given in (8).

Recall that for each $\delta_k$ in (1), there exists $r_k$ such that $d_{r_k} = \delta_k$, or equivalently there exists a change point within $\{n - (p_n - r_k + 1)m, \ldots, n - (p_n - r_k)m - 1\}$ for $k = 1,\ldots,K_0$. Define

$$\mathcal{A}_c = \{i : d_{i-1} = 0,\ d_i \ne 0,\ d_{i+1} = 0\}, \quad \mathcal{A}_1 = \{i : d_{i-1} \ne 0,\ d_i = 0,\ d_{i+1} = 0\},$$

$$\mathcal{A}_2 = \{i : d_{i-1} = 0,\ d_i = 0,\ d_{i+1} \ne 0\}, \quad \mathcal{A}_3 = \{i : d_{i-1} = 0,\ d_i = 0,\ d_{i+1} = 0\}.$$

It is easy to see that for large $n$, $\bar{\mathcal{A}}_c = \mathcal{A}_1 \cup \mathcal{A}_2 \cup \mathcal{A}_3$.
