Estimation of block sparsity in compressive sensing

Zhiyong Zhou*, Jun Yu

Department of Mathematics and Mathematical Statistics, Umeå University, Umeå, 901 87, Sweden

January 5, 2017

*Corresponding author, [email protected].

Abstract: In this paper, we consider a soft measure of block sparsity, $k_\alpha(x) = (\|x\|_{2,\alpha}/\|x\|_{2,1})^{\frac{\alpha}{1-\alpha}}$, $\alpha \in [0,\infty]$, and propose a procedure to estimate it by using multivariate isotropic symmetric $\alpha$-stable random projections without sparsity or block sparsity assumptions. The limiting distribution of the estimator is given. Some simulations are conducted to illustrate our theoretical results.

Keywords: Block sparsity; Multivariate isotropic symmetric $\alpha$-stable distribution; Compressive sensing; Characteristic function.

1 Introduction

Since its introduction a few years ago [4, 5, 6, 9], Compressive Sensing (CS) has attracted considerable interest (see the monographs [12, 15] for a comprehensive view). Formally, one considers the standard CS model,

$$y = Ax + \varepsilon, \qquad (1.1)$$

where $y \in \mathbb{R}^{m\times 1}$ is the vector of measurements, $A \in \mathbb{R}^{m\times N}$ is the measurement matrix, $x \in \mathbb{R}^N$ is the unknown signal, $\varepsilon$ is the measurement error, and $m \ll N$. The goal of CS is to recover the unknown signal $x$ by using only the underdetermined measurements $y$ and the matrix $A$. Under the assumption that the signal is sparse, that is, $x$ has only a few nonzero entries, and that the measurement matrix $A$ is properly chosen, $x$ can be recovered from $y$ by many algorithms, such as Basis Pursuit (BP), i.e. the $\ell_1$-minimization approach, Orthogonal Matching Pursuit (OMP) [28], Compressive Sampling Matching Pursuit (CoSaMP) [23], and the Iterative Hard Thresholding algorithm [2]. Specifically, when the sparsity level of the signal $x$ is $s = \|x\|_0 = \mathrm{card}\{j : x_j \neq 0\}$, if $m \ge Cs\ln(N/s)$ for some universal constant $C$ and $A$ is a subgaussian random matrix, then accurate or robust recovery can be guaranteed with high probability.

The sparsity level parameter $s$ plays a fundamental role in CS, as the number of measurements, the properties of the measurement matrix $A$, and even some recovery algorithms all involve it. However, the sparsity level of the signal is usually unknown in practice. To fill the gap between theory and practice, very recently [16, 17] proposed a numerically stable measure of sparsity, $s_\alpha(x) = \left(\frac{\|x\|_\alpha}{\|x\|_1}\right)^{\frac{\alpha}{1-\alpha}}$, which is a ratio of norms. By random linear projection using i.i.d. univariate symmetric $\alpha$-stable random variables, the author constructed an estimating equation for the parameter with the characteristic function method and obtained the asymptotic normality of the estimators.

As a natural extension of sparsity, in which the nonzero entries may be spread arbitrarily throughout the signal, we can consider sparse signals that exhibit additional structure, with the nonzero entries occurring in clusters. Such signals are referred to as block-sparse [10, 11, 13]. The block sparsity model appears in many practical scenarios, such as when dealing with multi-band signals [22] or in measurements of gene expression levels [25]. Moreover, the block sparsity model can be used to treat the problems of multiple measurement vectors (MMV) [7, 8, 13, 21] and of sampling signals that lie in a union of subspaces [3, 13, 22].
To make explicit use of the block structure and achieve better sparse recovery performance, the corresponding extended versions of sparse representation algorithms have been developed, such as the mixed $\ell_2/\ell_1$-norm recovery algorithm [11, 13, 27], the group lasso [29] or adaptive group lasso [18], iterative reweighted $\ell_2/\ell_1$ recovery algorithms [30], the block version of the OMP algorithm [11], and the extensions of the CoSaMP algorithm and of Iterative Hard Thresholding to the model-based setting [1], which includes block sparsity as a special case. It was shown in [13] that if the measurement matrix $A$ has small block-restricted isometry constants, which generalize the conventional RIP notion, then the mixed $\ell_2/\ell_1$-norm recovery algorithm is guaranteed to recover any block-sparse signal, irrespective of the locations of the nonzero blocks. Furthermore, recovery is robust in the presence of noise and modeling errors (i.e., when the vector is not exactly block-sparse). [1] showed that the block versions of CoSaMP and Iterative Hard Thresholding exhibit provable recovery guarantees and robustness properties. In addition, when the block-coherence of $A$ is small, robust recovery by the mixed $\ell_2/\ell_1$-norm method and by the block version of the OMP algorithm is guaranteed in [11].

The block sparsity level plays the same central role in the recovery of block-sparse signals as the sparsity level does in the recovery of sparse signals. In practice, the block sparsity level of a signal is also unknown, so obtaining an estimator for it is important from both the theoretical and the practical point of view. In this paper, as an extension of the sparsity estimation in [16, 17], we consider a stable measure of block sparsity and obtain its estimator by using multivariate isotropic symmetric $\alpha$-stable random projections. When the block size is 1, our estimation procedure reduces to the case considered in [17]. To estimate the conventional sparsity, the author of [17] used random projections with i.i.d. univariate symmetric $\alpha$-stable entries, whereas to estimate the block sparsity we need multivariate isotropic symmetric $\alpha$-stable random projections, in which the components of the multivariate random vectors are dependent, except in the multivariate normal case. With minor modifications, the limiting distributions of the estimators can be obtained in a way similar to the results presented in [17].

The remainder of the paper is organized as follows. In Section 2, we introduce the definition of block sparsity and the soft measure of block sparsity. In Section 3, we present the estimation procedure for the block sparsity and obtain the limiting results for the estimators. In Section 4, we conduct some simulations to illustrate the theoretical results. Section 5 is devoted to the conclusion. Finally, the proofs are postponed to the Appendix.

Throughout the paper, we denote vectors by boldface lower-case letters, e.g., $x$, and matrices by upper-case letters, e.g., $A$. Vectors are columns by default. $x^T$ is the transpose of the vector $x$. For any vector $x \in \mathbb{R}^N$, we denote the $\ell_p$-norm $\|x\|_p = \big(\sum_{j=1}^{N}|x_j|^p\big)^{1/p}$ for $p > 0$. $I(\cdot)$ is the indicator function. $E$ is the expectation. $\lfloor\cdot\rfloor$ is the floor function, which returns the largest integer not exceeding its argument. $\mathrm{Re}(\cdot)$ is the real part function. $i$ is the imaginary unit. $\langle\cdot,\cdot\rangle$ is the inner product of two vectors. $\stackrel{p}{\to}$ indicates convergence in probability, while $\stackrel{d}{\to}$ indicates convergence in distribution.
2 Block Sparsity Measures

In this section, we first introduce some basic concepts for block sparsity and propose a new measure of block sparsity. With $N = \sum_{j=1}^{p} d_j$, we define the $j$-th subblock $x[j]$ of a length-$N$ vector $x$ over $\mathcal{I} = \{d_1,\cdots,d_p\}$. The $j$-th subblock is of length $d_j$, and the blocks are formed sequentially, so that

$$x = (\underbrace{x_1\cdots x_{d_1}}_{x^T[1]}\ \underbrace{x_{d_1+1}\cdots x_{d_1+d_2}}_{x^T[2]}\ \cdots\ \underbrace{x_{N-d_p+1}\cdots x_N}_{x^T[p]})^T. \qquad (2.1)$$

Without loss of generality, we assume that $d_1 = d_2 = \cdots = d_p = d$, so that $N = pd$. A vector $x \in \mathbb{R}^N$ is called block $k$-sparse over $\mathcal{I} = \{d,\cdots,d\}$ if $x[j]$ is nonzero for at most $k$ indices $j$. In other words, by denoting

$$\|x\|_{2,0} = \sum_{j=1}^{p} I(\|x[j]\|_2 > 0),$$

a block $k$-sparse vector $x$ can be defined by $\|x\|_{2,0} \le k$.

Despite the important theoretical role of the parameter $\|x\|_{2,0}$, it has the severe practical drawback of being sensitive to small entries of $x$. To overcome this drawback, it is desirable to replace the mixed $\ell_2/\ell_0$ norm with a soft version. Specifically, we generalize the entropy-based sparsity measure to a block sparsity measure. Any non-zero signal $x$ given in (2.1) induces a distribution $\pi(x) \in \mathbb{R}^p$ on the set of block indices $\{1,\cdots,p\}$, assigning mass $\pi_j(x) = \|x[j]\|_2/\|x\|_{2,1}$ to index $j \in \{1,\cdots,p\}$, where $\|x\|_{2,1} = \sum_{j=1}^{p}\|x[j]\|_2$. The entropy-based block sparsity is then

$$k_\alpha(x) = \begin{cases}\exp(H_\alpha(\pi(x))) & \text{if } x \neq 0,\\ 0 & \text{if } x = 0,\end{cases} \qquad (2.2)$$

where $H_\alpha$ is the Rényi entropy of order $\alpha \in [0,\infty]$. When $\alpha \notin \{0,1,\infty\}$, the Rényi entropy is given explicitly by $H_\alpha(\pi(x)) = \frac{1}{1-\alpha}\ln\big(\sum_{j=1}^{p}\pi_j(x)^\alpha\big)$, and the cases $\alpha \in \{0,1,\infty\}$ are defined by evaluating limits, with $H_1$ being the ordinary Shannon entropy. Then, for $x \neq 0$ and $\alpha \notin \{0,1,\infty\}$, the measure of block sparsity can be written conveniently in terms of the mixed $\ell_2/\ell_\alpha$ norm as

$$k_\alpha(x) = \left(\frac{\|x\|_{2,\alpha}}{\|x\|_{2,1}}\right)^{\frac{\alpha}{1-\alpha}},$$

where the mixed $\ell_2/\ell_\alpha$ norm is $\|x\|_{2,\alpha} = \big(\sum_{j=1}^{p}\|x[j]\|_2^\alpha\big)^{1/\alpha}$ for $\alpha > 0$. The cases $\alpha \in \{0,1,\infty\}$ are evaluated as limits: $k_0(x) = \lim_{\alpha\to 0}k_\alpha(x) = \|x\|_{2,0}$, $k_1(x) = \lim_{\alpha\to 1}k_\alpha(x) = \exp(H_1(\pi(x)))$, and $k_\infty(x) = \lim_{\alpha\to\infty}k_\alpha(x) = \frac{\|x\|_{2,1}}{\|x\|_{2,\infty}}$, where $\|x\|_{2,\infty} = \max_{1\le j\le p}\|x[j]\|_2$. When the block size $d$ equals 1, our block sparsity measure $k_\alpha(x)$ reduces to the conventional sparsity measure $s_\alpha(x) = \left(\frac{\|x\|_\alpha}{\|x\|_1}\right)^{\frac{\alpha}{1-\alpha}}$ given by [17]. The quantity $k_\alpha(x)$ has important properties similar to those of $s_\alpha(x)$, such as continuity, a range equal to $[0,p]$, scale-invariance, and monotone non-increase in $\alpha$. It is a sensible measure of block sparsity for non-idealized signals; a small numerical illustration is given below.

Before presenting the estimation procedure for $\|x\|_{2,\alpha}^\alpha$ and $k_\alpha(x)$ with $\alpha \in (0,2]$, we give a block sparse signal recovery result stated in terms of $k_2(x)$ for the mixed $\ell_2/\ell_1$-norm optimization algorithm.
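Before turning to those results, the following minimal Python sketch (our own illustration, not code from the paper) makes the definitions concrete: it computes the block norms, the mixed norm $\|x\|_{2,\alpha}$, and the measure $k_\alpha(x)$ for equal block size $d$; the function names are ours.

```python
import numpy as np

def block_norms(x, d):
    """Euclidean norms ||x[j]||_2 of the length-d blocks of x (assumes len(x) = p * d)."""
    return np.linalg.norm(np.reshape(x, (-1, d)), axis=1)

def k_alpha(x, d, alpha):
    """Soft block sparsity k_alpha(x) = (||x||_{2,alpha} / ||x||_{2,1})^(alpha / (1 - alpha))."""
    b = block_norms(x, d)
    if not b.any():
        return 0.0                                     # k_alpha(0) = 0 by convention (2.2)
    if alpha == 0:
        return float(np.count_nonzero(b))              # ||x||_{2,0}
    if alpha == 1:                                     # exp(Shannon entropy of pi(x))
        pi = b / b.sum()
        pi = pi[pi > 0]
        return float(np.exp(-np.sum(pi * np.log(pi))))
    if np.isinf(alpha):
        return float(b.sum() / b.max())                # ||x||_{2,1} / ||x||_{2,inf}
    ratio = (b ** alpha).sum() ** (1.0 / alpha) / b.sum()
    return float(ratio ** (alpha / (1.0 - alpha)))

# Example: the test signal of Section 4 with N = 20 and block size d = 5
x = np.concatenate([np.full(10, 1 / np.sqrt(10)), np.zeros(10)])
print(k_alpha(x, d=5, alpha=2))   # -> 2.0, the exact block sparsity level for this block size
```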
To recover the block sparse signal in the CS model (1.1), we use the following mixed $\ell_2/\ell_1$-norm optimization algorithm proposed in [11, 13]:

$$\hat{x} = \arg\min_{e\in\mathbb{R}^N}\|e\|_{2,1}, \quad \text{subject to } \|y - Ae\|_2 \le \epsilon, \qquad (2.3)$$

where $\epsilon \ge 0$ is an upper bound on the noise level $\|\varepsilon\|_2$. Then, we have the following result concerning robust recovery of block sparse signals.

Lemma 1. [13] Let $y = Ax + \varepsilon$ be noisy measurements of a vector $x$ and fix a number $k \in \{1,\cdots,p\}$. Let $x^k$ denote the best block $k$-sparse approximation of $x$, such that $x^k$ is block $k$-sparse and minimizes $\|x - f\|_{2,1}$ over all block $k$-sparse vectors $f$, and let $\hat{x}$ be a solution to (2.3), with a random Gaussian matrix $A$ of size $m\times N$ with entries $A_{ij} \sim N(0,\frac{1}{m})$ and block sparse signals over $\mathcal{I} = \{d_1 = d,\cdots,d_p = d\}$, where $N = pd$ for some integer $p$. Then, there are constants $c_0, c_1, c_2, c_3 > 0$ such that the following statement is true. If $m \ge c_0 k\ln(eN/kd)$, then with probability at least $1 - 2\exp(-c_1 m)$, we have

$$\frac{\|\hat{x} - x\|_2}{\|x\|_2} \ \le\ c_2\,\frac{\|x - x^k\|_{2,1}}{\sqrt{k}\,\|x\|_2} + c_3\,\frac{\epsilon}{\|x\|_2}. \qquad (2.4)$$

Remark 1. Note that the first term in (2.4) is a result of the fact that $x$ is not exactly block $k$-sparse, while the second term quantifies the recovery error due to the measurement noise. When the block size is $d = 1$, this Lemma reduces to the conventional CS result for sparse signals. Explicit use of block sparsity reduces the required number of measurements from $O(kd\ln(eN/kd))$ to $O(k\ln(eN/kd))$, i.e., by a factor of $d$.

Next, we present an upper bound on the relative $\ell_2$-error as an explicit function of $m$ and the newly proposed block sparsity measure $k_2(x)$. Its proof is left to the Appendix.

Lemma 2. Let $y = Ax + \varepsilon$ be noisy measurements of a vector $x$, and let $\hat{x}$ be a solution to (2.3), with a random Gaussian matrix $A$ of size $m\times N$ with entries $A_{ij} \sim N(0,\frac{1}{m})$ and block sparse signals over $\mathcal{I} = \{d_1 = d,\cdots,d_p = d\}$, where $N = pd$ for some integer $p$. Then, there are constants $\kappa_0,\kappa_1,\kappa_2,\kappa_3 > 0$ such that the following statement is true. If $m$ and $N$ satisfy $\kappa_0\ln(\kappa_0\frac{eN}{m}) \le m \le N$, then with probability at least $1 - 2\exp(-\kappa_1 m)$, we have

$$\frac{\|\hat{x} - x\|_2}{\|x\|_2}\ \le\ \kappa_2\sqrt{\frac{k_2(x)\,d\ln(\frac{eN}{m})}{m}} + \kappa_3\,\frac{\epsilon}{\|x\|_2}. \qquad (2.5)$$

3 Estimation Method for $\|x\|_{2,\alpha}^\alpha$ and $k_\alpha(x)$

The core idea for obtaining estimators of $\|x\|_{2,\alpha}^\alpha$ and $k_\alpha(x)$ with $\alpha \in (0,2]$ is random projection. In contrast with the conventional sparsity estimation using univariate symmetric $\alpha$-stable random variables [17, 31], we use multivariate isotropic symmetric $\alpha$-stable random vectors [24, 26] for the block sparsity estimation. Specifically, we first give the definition of the multivariate centered isotropic symmetric $\alpha$-stable distribution.

Definition 1. For $d \ge 1$, a $d$-dimensional random vector $v$ has a centered isotropic symmetric $\alpha$-stable distribution if there are constants $\gamma > 0$ and $\alpha \in (0,2]$ such that its characteristic function has the form

$$E[\exp(iu^T v)] = \exp(-\gamma^\alpha\|u\|_2^\alpha), \quad \text{for all } u\in\mathbb{R}^d. \qquad (3.1)$$

We denote the distribution by $v \sim S(d,\alpha,\gamma)$, and $\gamma$ is referred to as the scale parameter.
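Such vectors can be simulated through the sub-Gaussian representation $v = \sqrt{A}\,g$ with $g \sim N(0, 2\gamma^2 I_d)$ and $A$ a positive $(\alpha/2)$-stable variable with Laplace transform $\exp(-s^{\alpha/2})$. The sketch below is our own illustration (not from the paper), using Kanter's method for the positive stable draw.

```python
import numpy as np

def positive_stable(beta, size, rng):
    """Kanter's method: positive beta-stable A with E[exp(-s*A)] = exp(-s**beta), 0 < beta < 1."""
    u = rng.uniform(0.0, np.pi, size)
    w = rng.exponential(1.0, size)
    return (np.sin(beta * u) / np.sin(u) ** (1.0 / beta)) * \
           (np.sin((1.0 - beta) * u) / w) ** ((1.0 - beta) / beta)

def isotropic_stable(n, d, alpha, gamma, rng=None):
    """n i.i.d. draws from S(d, alpha, gamma), i.e. E[exp(i u.v)] = exp(-gamma**alpha * ||u||**alpha).

    Sub-Gaussian representation: v = sqrt(A) * g with g ~ N(0, 2 * gamma**2 * I_d) and A positive
    (alpha/2)-stable; for alpha = 2 the coordinates are simply independent N(0, 2 * gamma**2)."""
    rng = np.random.default_rng() if rng is None else rng
    g = rng.normal(0.0, np.sqrt(2.0) * gamma, size=(n, d))
    if alpha == 2:
        return g
    a = positive_stable(alpha / 2.0, n, rng)
    return np.sqrt(a)[:, None] * g

# Example: 5 draws from the isotropic Cauchy distribution S(3, 1, 1.0)
print(isotropic_stable(5, 3, alpha=1.0, gamma=1.0, rng=np.random.default_rng(0)))
```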
Remark 2. The most well-known examples of multivariate isotropic symmetric stable distributions are the case $\alpha = 2$ (the multivariate independent Gaussian distribution), in which the components of the random vector are independent, and the case $\alpha = 1$ (the multivariate spherically symmetric Cauchy distribution [26]), in which, unlike the Gaussian case, the components are uncorrelated but dependent. The multivariate centered isotropic symmetric $\alpha$-stable random vector is a direct extension of the univariate symmetric $\alpha$-stable random variable, which is the special case with dimension parameter $d = 1$. By random projection using i.i.d. multivariate centered isotropic symmetric $\alpha$-stable random vectors, we can obtain the estimators for $\|x\|_{2,\alpha}^\alpha$ and $k_\alpha(x)$ with $\alpha \in (0,2]$, as presented in what follows.

We estimate $\|x\|_{2,\alpha}^\alpha$ by using the random linear projection measurements

$$y_i = \langle a_i, x\rangle + \sigma\varepsilon_i, \quad i = 1,2,\cdots,n, \qquad (3.2)$$

where the $a_i \in \mathbb{R}^N$ are i.i.d. random vectors with $a_i = (a_{i1}^T,\cdots,a_{ip}^T)^T$ and $a_{ij}$, $j \in \{1,\cdots,p\}$, i.i.d. drawn from $S(d,\alpha,\gamma)$. The noise terms $\varepsilon_i$ are i.i.d. from a distribution $F_0$ whose characteristic function we denote by $\varphi_0$, and the sets $\{\varepsilon_1,\cdots,\varepsilon_n\}$ and $\{a_1,\cdots,a_n\}$ are independent. The $\varepsilon_i$, $i = 1,2,\cdots,n$, are assumed to be symmetric about 0, with $0 < E|\varepsilon_1| < \infty$, but they may have infinite variance. The assumption of symmetry is only for convenience; how to drop it is explained in Section III-B2.e of [17]. A minor technical condition we place on $F_0$ is that the roots of its characteristic function $\varphi_0$ are isolated (i.e. have no limit points). This condition is satisfied by many families of distributions, such as Gaussian, Student's t, Laplace, uniform on $[a,b]$, and stable laws. For simplicity, we assume that the noise scale parameter $\sigma > 0$ and the distribution $F_0$ are known.

Since our work involves different choices of $\alpha$, we will write $\gamma_\alpha$ instead of $\gamma$. The link to the norm $\|x\|_{2,\alpha}^\alpha$ hinges on the following basic lemma.

Lemma 3. Let $x = (x[1]^T,\cdots,x[p]^T)^T \in \mathbb{R}^N$ be fixed, and suppose $a_1 = (a_{11}^T,\cdots,a_{1p}^T)^T$ with $a_{1j}$, $j \in \{1,\cdots,p\}$, i.i.d. drawn from $S(d,\alpha,\gamma_\alpha)$ with $\alpha \in (0,2]$ and $\gamma_\alpha > 0$. Then, the random variable $\langle a_1, x\rangle$ has the distribution $S(1,\alpha,\gamma_\alpha\|x\|_{2,\alpha})$.

Remark 3. When $x = (x[1]^T,\cdots,x[p]^T)^T \in \mathbb{R}^N$ has different block lengths $\{d_1,d_2,\cdots,d_p\}$, we need to choose the projection random vector $a_1 = (a_{11}^T,\cdots,a_{1p}^T)^T$ with $a_{1j}$, $j \in \{1,\cdots,p\}$, independently drawn from $S(d_j,\alpha,\gamma_\alpha)$. In that case, the conclusion of the Lemma and all the results that follow still hold without any modification. This Lemma is a direct extension of Lemma 1 in [17] from i.i.d. univariate symmetric $\alpha$-stable projections to i.i.d. multivariate isotropic symmetric $\alpha$-stable projections.

Using this result, if we generate a set of i.i.d. measurement random vectors $\{a_1,\cdots,a_n\}$ as above and let $\tilde{y}_i = \langle a_i, x\rangle$, then $\{\tilde{y}_1,\cdots,\tilde{y}_n\}$ is an i.i.d. sample from the distribution $S(1,\alpha,\gamma_\alpha\|x\|_{2,\alpha})$. Hence, in the special case of random linear measurements without noise, estimating the norm $\|x\|_{2,\alpha}^\alpha$ reduces to estimating the scale parameter of a univariate stable distribution from an i.i.d. sample.
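As a quick numerical check of Lemma 3 (our own illustration, not from the paper), consider the isotropic Cauchy case $\alpha = 1$, where a draw from $S(d,1,\gamma)$ can be written as $\gamma g/|z|$ with $g \sim N(0,I_d)$ and $z \sim N(0,1)$: the real part of the empirical characteristic function of the noiseless projections $\langle a_i, x\rangle$ should match $\exp(-\gamma_1\|x\|_{2,1}|t|)$.

```python
import numpy as np

rng = np.random.default_rng(1)
p, d, gamma, n = 8, 5, 1.0, 100_000
x = np.zeros(p * d)
x[:2 * d] = rng.normal(size=2 * d)                       # two nonzero blocks, the rest are zero
norm_21 = np.linalg.norm(x.reshape(p, d), axis=1).sum()  # ||x||_{2,1}

# n i.i.d. projection vectors; each of the p length-d blocks is isotropic Cauchy S(d, 1, gamma),
# drawn as gamma * g / |z| with g ~ N(0, I_d) and z ~ N(0, 1).
g = rng.normal(size=(n, p, d))
z = np.abs(rng.normal(size=(n, p, 1)))
a = (gamma * g / z).reshape(n, p * d)

y_tilde = a @ x                                          # noiseless projections <a_i, x>
t = 0.5 / norm_21                                        # evaluate the characteristic function here
empirical = np.mean(np.cos(t * y_tilde))                 # Re of the empirical characteristic function
theoretical = np.exp(-gamma * norm_21 * abs(t))          # Lemma 3: <a_1, x> ~ S(1, 1, gamma*||x||_{2,1})
print(empirical, theoretical)                            # both should be close to exp(-0.5) ~ 0.61
```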
Next, we present the estimation procedure based on characteristic functions [17, 19, 20]. We use two separate sets of measurements to estimate $\|x\|_{2,1}$ and $\|x\|_{2,\alpha}^\alpha$; the respective sample sizes are denoted by $n_1$ and $n_\alpha$. To unify the discussion, we describe just the procedure to obtain $\widehat{\|x\|_{2,\alpha}^\alpha}$ for any $\alpha \in (0,2]$, since $\alpha = 1$ is a special case. The two estimators are combined to obtain the estimator of $k_\alpha(x)$ as

$$\hat{k}_\alpha(x) = \frac{\big(\widehat{\|x\|_{2,\alpha}^\alpha}\big)^{\frac{1}{1-\alpha}}}{\big(\widehat{\|x\|_{2,1}}\big)^{\frac{\alpha}{1-\alpha}}}. \qquad (3.3)$$

In fact, the characteristic function of $y_i$ has the form

$$\Psi(t) = E[\exp(ity_i)] = \exp(-\gamma_\alpha^\alpha\|x\|_{2,\alpha}^\alpha|t|^\alpha)\cdot\varphi_0(\sigma t), \qquad (3.4)$$

where $t \in \mathbb{R}$. Then, we have $\|x\|_{2,\alpha}^\alpha = -\frac{1}{\gamma_\alpha^\alpha|t|^\alpha}\log\big|\mathrm{Re}\big(\frac{\Psi(t)}{\varphi_0(\sigma t)}\big)\big|$. By using the empirical characteristic function

$$\hat{\Psi}_{n_\alpha}(t) = \frac{1}{n_\alpha}\sum_{i=1}^{n_\alpha} e^{ity_i}$$

to estimate $\Psi(t)$, we obtain the estimator of $\|x\|_{2,\alpha}^\alpha$ given by

$$\widehat{\|x\|_{2,\alpha}^\alpha} =: \hat{v}_\alpha(t) = -\frac{1}{\gamma_\alpha^\alpha|t|^\alpha}\log\left|\mathrm{Re}\left(\frac{\hat{\Psi}_{n_\alpha}(t)}{\varphi_0(\sigma t)}\right)\right|, \qquad (3.5)$$

valid when $t \neq 0$ and $\varphi_0(\sigma t) \neq 0$.

Then, similarly to Theorem 2 in [17], we have a uniform CLT for $\hat{v}_\alpha(t)$. We introduce the noise-to-signal ratio constant

$$\rho_\alpha = \frac{\sigma}{\gamma_\alpha\|x\|_{2,\alpha}}.$$

Theorem 1. Let $\alpha \in (0,2]$. Let $\hat{t}$ be any function of $\{y_1,\cdots,y_{n_\alpha}\}$ that satisfies

$$\gamma_\alpha\hat{t}\,\|x\|_{2,\alpha}\ \stackrel{p}{\longrightarrow}\ c_\alpha, \qquad (3.6)$$

as $(n_\alpha,N)\to\infty$, for some finite constant $c_\alpha \neq 0$ with $\varphi_0(\rho_\alpha c_\alpha) \neq 0$. Then, we have

$$\sqrt{n_\alpha}\left(\frac{\hat{v}_\alpha(\hat{t})}{\|x\|_{2,\alpha}^\alpha} - 1\right)\ \stackrel{d}{\longrightarrow}\ N(0,\theta_\alpha(c_\alpha,\rho_\alpha)) \qquad (3.7)$$

as $(n_\alpha,N)\to\infty$, where the limiting variance $\theta_\alpha(c_\alpha,\rho_\alpha)$ is strictly positive and defined according to the formula

$$\theta_\alpha(c_\alpha,\rho_\alpha) = \frac{1}{|c_\alpha|^{2\alpha}}\left(\frac{\exp(2|c_\alpha|^\alpha)}{2\varphi_0(\rho_\alpha|c_\alpha|)^2} + \frac{\varphi_0(2\rho_\alpha|c_\alpha|)}{2\varphi_0(\rho_\alpha|c_\alpha|)^2}\exp\big((2-2^\alpha)|c_\alpha|^\alpha\big) - 1\right).$$

For simplicity, we use the pilot value $\hat{t}_{\mathrm{pilot}}$ instead of the optimal $\hat{t}_{\mathrm{opt}}$ in [17], since it is simple to implement and still gives a reasonably good estimator. To describe the pilot value, let $\eta_0 > 0$ be any number such that $\varphi_0(\eta) > \frac{1}{2}$ for all $\eta \in [0,\eta_0]$ (such a number exists for any characteristic function). Also, define the median absolute deviation statistic $\hat{m}_\alpha = \mathrm{median}\{|y_1|,\cdots,|y_{n_\alpha}|\}$. Then we define $\hat{t}_{\mathrm{pilot}} = \min\{\frac{1}{\hat{m}_\alpha},\frac{\eta_0}{\sigma}\}$. We thereby obtain the consistent estimator $\hat{c}_\alpha = \gamma_\alpha\hat{t}_{\mathrm{pilot}}[\hat{v}_\alpha(\hat{t}_{\mathrm{pilot}})]^{1/\alpha}$ of the constant $c_\alpha = \min\big(\frac{1}{\mathrm{median}(|S_1+\rho_\alpha\varepsilon_1|)},\frac{\eta_0}{\rho_\alpha}\big)$, where the random variable $S_1 \sim S(1,\alpha,1)$ (see Proposition 3 in [17]), and the consistent estimator of $\rho_\alpha$ is $\hat{\rho}_\alpha = \frac{\sigma}{\gamma_\alpha[\hat{v}_\alpha(\hat{t}_{\mathrm{pilot}})]^{1/\alpha}}$. Therefore, the consistent estimator of the limiting variance $\theta_\alpha(c_\alpha,\rho_\alpha)$ is $\theta_\alpha(\hat{c}_\alpha,\hat{\rho}_\alpha)$. Thus, we immediately have the following corollary, which yields confidence intervals for $\|x\|_{2,\alpha}^\alpha$.

Corollary 1. Under the conditions of Theorem 1, as $(n_\alpha,N)\to\infty$, we have

$$\sqrt{\frac{n_\alpha}{\theta_\alpha(\hat{c}_\alpha,\hat{\rho}_\alpha)}}\left(\frac{\hat{v}_\alpha(\hat{t}_{\mathrm{pilot}})}{\|x\|_{2,\alpha}^\alpha} - 1\right)\ \stackrel{d}{\longrightarrow}\ N(0,1). \qquad (3.8)$$

As a consequence, the asymptotic $1-\beta$ confidence interval for $\|x\|_{2,\alpha}^\alpha$ is

$$\left[\left(1 - \sqrt{\tfrac{\theta_\alpha(\hat{c}_\alpha,\hat{\rho}_\alpha)}{n_\alpha}}\,z_{1-\beta/2}\right)\hat{v}_\alpha(\hat{t}_{\mathrm{pilot}}),\ \left(1 + \sqrt{\tfrac{\theta_\alpha(\hat{c}_\alpha,\hat{\rho}_\alpha)}{n_\alpha}}\,z_{1-\beta/2}\right)\hat{v}_\alpha(\hat{t}_{\mathrm{pilot}})\right], \qquad (3.9)$$

where $z_{1-\beta/2}$ is the $1-\beta/2$ quantile of the standard normal distribution.
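A minimal sketch of the estimator (3.5) evaluated at this pilot value (our own illustration, assuming standard Gaussian noise so that $\varphi_0(t) = e^{-t^2/2}$ and $\eta_0 = 1$ is admissible); combining one call with $\alpha = 2$ and one with $\alpha = 1$ through (3.3) gives $\hat{k}_2(x) = \hat{v}_1(\hat{t}_{\mathrm{pilot}})^2/\hat{v}_2(\hat{t}_{\mathrm{pilot}})$.

```python
import numpy as np

def phi0(t):
    """Characteristic function of the standard Gaussian noise: phi_0(t) = exp(-t^2 / 2)."""
    return np.exp(-0.5 * t ** 2)

def v_hat(y, t, alpha, gamma, sigma):
    """Estimator (3.5): -log|Re(Psi_hat(t) / phi_0(sigma t))| / (gamma^alpha |t|^alpha)."""
    psi_hat = np.mean(np.exp(1j * t * y))               # empirical characteristic function at t
    return -np.log(np.abs(np.real(psi_hat / phi0(sigma * t)))) / (gamma ** alpha * abs(t) ** alpha)

def v_hat_pilot(y, alpha, gamma, sigma, eta0=1.0):
    """Evaluate (3.5) at the pilot value t_pilot = min(1 / median|y_i|, eta0 / sigma)."""
    t_pilot = min(1.0 / np.median(np.abs(y)), eta0 / sigma)
    return v_hat(y, t_pilot, alpha, gamma, sigma), t_pilot

# Usage with two measurement sets y1 (Cauchy projections) and y2 (Gaussian projections):
#   v1, _ = v_hat_pilot(y1, alpha=1, gamma=gamma1, sigma=sigma)   # estimates ||x||_{2,1}
#   v2, _ = v_hat_pilot(y2, alpha=2, gamma=gamma2, sigma=sigma)   # estimates ||x||_{2,2}^2
#   k2_hat = v1 ** 2 / v2                                         # (3.3) with alpha = 2
```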
Then we can obtain a CLT and a confidence interval for $\hat{k}_\alpha(x)$ by combining the estimators $\hat{v}_\alpha$ and $\hat{v}_1$ with their respective pilot values $\hat{t}_{\mathrm{pilot}}$. Before we present the main result, for each $\alpha \in (0,2]\setminus\{1\}$ we assume that there is a constant $\bar{\pi}_\alpha \in (0,1)$ such that, as $(n_1,n_\alpha,N)\to\infty$,

$$\pi_\alpha := \frac{n_\alpha}{n_\alpha + n_1} = \bar{\pi}_\alpha + o(n_\alpha^{-1/2}).$$

Theorem 2. Let $\alpha \in (0,2]\setminus\{1\}$, and let the conditions of Theorem 1 hold. Then, as $(n_1,n_\alpha,N)\to\infty$,

$$\sqrt{\frac{n_1+n_\alpha}{\hat{w}_\alpha}}\left(\frac{\hat{k}_\alpha(x)}{k_\alpha(x)} - 1\right)\ \stackrel{d}{\longrightarrow}\ N(0,1), \qquad (3.10)$$

where $\hat{w}_\alpha = \frac{\theta_\alpha(\hat{c}_\alpha,\hat{\rho}_\alpha)}{\pi_\alpha}\big(\frac{1}{1-\alpha}\big)^2 + \frac{\theta_1(\hat{c}_1,\hat{\rho}_1)}{1-\pi_\alpha}\big(\frac{\alpha}{1-\alpha}\big)^2$. Consequently, the asymptotic $1-\beta$ confidence interval for $k_\alpha(x)$ is

$$\left[\left(1 - \sqrt{\tfrac{\hat{w}_\alpha}{n_1+n_\alpha}}\,z_{1-\beta/2}\right)\hat{k}_\alpha(x),\ \left(1 + \sqrt{\tfrac{\hat{w}_\alpha}{n_1+n_\alpha}}\,z_{1-\beta/2}\right)\hat{k}_\alpha(x)\right], \qquad (3.11)$$

where $z_{1-\beta/2}$ is the $1-\beta/2$ quantile of the standard normal distribution.

Next, we examine how well $k_\alpha(x)$ approximates $\|x\|_{2,0}$ when $\alpha$ is close to 0. To state the theorem, we define the block dynamic range of a non-zero signal $x \in \mathbb{R}^N$ given in (2.1) as

$$\mathrm{BDNR}(x) = \frac{\|x\|_{2,\infty}}{|x|_{2,\min}}, \qquad (3.12)$$

where $|x|_{2,\min}$ is the smallest $\ell_2$ norm of the non-zero blocks of $x$, i.e. $|x|_{2,\min} = \min\{\|x[j]\|_2 : x[j] \neq 0,\ j = 1,\cdots,p\}$. The following result involves no randomness and is applicable to any estimator $\tilde{k}_\alpha(x)$.

Theorem 3. Let $\alpha \in (0,1)$, let $x \in \mathbb{R}^N$ be a non-zero signal given in (2.1), and let $\tilde{k}_\alpha(x)$ be any real number. Then, we have

$$\left|\frac{\tilde{k}_\alpha(x)}{\|x\|_{2,0}} - 1\right|\ \le\ \left|\frac{\tilde{k}_\alpha(x)}{k_\alpha(x)} - 1\right| + \frac{\alpha}{1-\alpha}\Big(\ln(\mathrm{BDNR}(x)) + \alpha\ln(\|x\|_{2,0})\Big). \qquad (3.13)$$

Remark 4. The theorem is a direct extension of Proposition 5 in [17], which corresponds to the special case of block size $d = 1$. When $\tilde{k}_\alpha(x)$ is chosen to be the proposed estimator $\hat{k}_\alpha(x)$, the first term in (3.13) is already controlled by Theorem 2. As pointed out in [17], the second term is an approximation error that improves for smaller choices of $\alpha$. When the $\ell_2$ norms of the signal blocks are similar, the quantity $\ln(\mathrm{BDNR}(x))$ will not be too large. In this case, the bound behaves well and estimating $\|x\|_{2,0}$ is of interest. On the other hand, if the $\ell_2$ norms of the signal blocks are very different, that is, $\ln(\mathrm{BDNR}(x))$ is large, then $\|x\|_{2,0}$ may not be the best measure of block sparsity to estimate.

4 Simulation

In this section, we conduct some simulations to illustrate our theoretical results. We focus on $\alpha = 2$, that is, we use $\hat{k}_2(x)$ to estimate the block sparsity measure $k_2(x)$. Estimating $k_2(x)$ requires a set of $n_1$ measurements obtained with multivariate isotropic symmetric Cauchy projections and a set of $n_2$ measurements obtained with multivariate isotropic symmetric normal projections. We generated the samples $y_1 \in \mathbb{R}^{n_1}$ and $y_2 \in \mathbb{R}^{n_2}$ according to

$$y_1 = A_1 x + \sigma\varepsilon_1 \quad\text{and}\quad y_2 = A_2 x + \sigma\varepsilon_2, \qquad (4.1)$$

where $A_1 = (a_1,\cdots,a_{n_1}) \in \mathbb{R}^{n_1\times N}$, the $a_i \in \mathbb{R}^N$ being i.i.d. random vectors with $a_i = (a_{i1}^T,\cdots,a_{ip}^T)^T$ and $a_{ij}$, $j \in \{1,\cdots,p\}$, i.i.d. drawn from $S(d,1,\gamma_1)$; we let $\gamma_1 = 1$. Similarly, $A_2 = (b_1,\cdots,b_{n_2}) \in \mathbb{R}^{n_2\times N}$, the $b_i \in \mathbb{R}^N$ being i.i.d. random vectors with $b_i = (b_{i1}^T,\cdots,b_{ip}^T)^T$ and $b_{ij}$, $j \in \{1,\cdots,p\}$, i.i.d. drawn from $S(d,2,\gamma_2)$; we let $\gamma_2 = \sqrt{2}$. The noise terms $\varepsilon_1$ and $\varepsilon_2$ are generated with i.i.d. entries from a standard normal distribution.
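A minimal sketch of this data-generating step (our own illustration; the parameter values are placeholders, and the specific test signal used in the paper is described next): the isotropic Cauchy blocks are drawn as $\gamma_1 g/|z|$ with $g \sim N(0,I_d)$, $z \sim N(0,1)$, while for $\alpha = 2$ the blocks simply have i.i.d. $N(0,2\gamma_2^2)$ entries. The resulting $y_1$ and $y_2$ would then be passed to the estimators $\hat{v}_1$ and $\hat{v}_2$ sketched in Section 3 and combined through (3.3).

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, sigma = 1000, 5, 0.1                  # signal length, block size, noise scale
p = N // d
gamma1, gamma2 = 1.0, np.sqrt(2.0)          # scale parameters of the two projection distributions
n1, n2 = 200, 200                           # sample sizes for the Cauchy and Gaussian projections

x = np.zeros(N)                             # a block sparse test signal (cf. the next paragraph)
x[:10] = 1.0 / np.sqrt(10.0)

# A1: each length-d block of each row is isotropic Cauchy S(d, 1, gamma1).
g = rng.normal(size=(n1, p, d))
z = np.abs(rng.normal(size=(n1, p, 1)))
A1 = (gamma1 * g / z).reshape(n1, N)

# A2: for alpha = 2, S(d, 2, gamma2) has independent N(0, 2 * gamma2**2) entries.
A2 = rng.normal(0.0, np.sqrt(2.0) * gamma2, size=(n2, N))

# Measurements (4.1).
y1 = A1 @ x + sigma * rng.standard_normal(n1)
y2 = A2 @ x + sigma * rng.standard_normal(n2)
```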
We considered a sequence of pairs of sample sizes $(n_1,n_2) = (50,50),(100,100),(200,200),\cdots,(500,500)$. Each experiment was replicated 500 times, giving 500 realizations of $\hat{k}_2(x)$ for each $(n_1,n_2)$. We then averaged the quantity $\big|\frac{\hat{k}_2(x)}{k_2(x)} - 1\big|$ as an approximation of $E\big|\frac{\hat{k}_2(x)}{k_2(x)} - 1\big|$.

We let our signal $x$ be a very simple block sparse vector, namely

$$x = \Big(\tfrac{1}{\sqrt{10}}\mathbf{1}_{10}^T,\ \mathbf{0}_{N-10}^T\Big)^T,$$

where $\mathbf{1}_q$ is a vector of length $q$ with all entries equal to one and $\mathbf{0}_q$ is the zero vector of length $q$. Then it is obvious that $\|x\|_{2,2} = 1$, while $\|x\|_{2,1}$ and $k_2(x)$ depend on the block size $d$ that we choose. We set $\eta_0 = 1$. The simulation is conducted under several choices of the parameters $N$, $d$ and $\sigma$, with each parameter corresponding to a separate plot in Figure 1. The signal dimension $N$ is set to 1000, except in the top left plot, where $N = 20, 100, 500, 1000$. We set $d = 5$ in all cases, except in the top right plot, where $d = 1, 2, 5, 10$, corresponding to the true values $k_2(x) = 10, 5, 2, 1$, which are also the exact block sparsity levels of our signal $x$ for those block sizes. In turn, $\sigma = 0.1$ in all cases, except in the bottom plot, where $\sigma = 0, 0.1, 0.3, 0.5$. In all three plots, the theoretical curves are computed in the following way. From Theorem 2, we have $\big|\frac{\hat{k}_2(x)}{k_2(x)} - 1\big| \approx \frac{\sqrt{\omega_2}}{\sqrt{n_1+n_2}}|Z|$, where $Z$ is a standard Gaussian random variable and we set $\omega_2 = \frac{\theta_2(c_2,\rho_2)}{\pi_2} + \frac{4\theta_1(c_1,\rho_1)}{1-\pi_2}$. Since $E|Z| = \sqrt{2/\pi}$, the theoretical curves are simply $\frac{\sqrt{\omega_2}\sqrt{2/\pi}}{\sqrt{n_1+n_2}}$, as a function of $n_1+n_2$. Note that $\omega_2$ depends on $\sigma$ and $d$, which is why there is only one theoretical curve in the top left plot for the error dependence on $N$.

From Figure 1, we can see that the black theoretical curves agree well with the colored empirical ones. In addition, the averaged relative error has no observable dependence on $N$ or $d$ (when $\sigma$ is fixed), as expected from Theorem 2. The dependence on $\sigma$ is mild, except in the case $\sigma = 0.5$, which is a bit large.

Next, a simulation study is conducted to illustrate the asymptotic normality of our estimators in Corollary 1 and Theorem 2. We use 1000 replications for these experiments, that is, we obtain 1000 samples of the standardized statistics

$$\mathrm{res}_1 = \sqrt{\tfrac{n_1}{\theta_1(\hat{c}_1,\hat{\rho}_1)}}\Big(\tfrac{\hat{v}_1(\hat{t}_{\mathrm{pilot}})}{\|x\|_{2,1}} - 1\Big),\qquad \mathrm{res}_2 = \sqrt{\tfrac{n_2}{\theta_2(\hat{c}_2,\hat{\rho}_2)}}\Big(\tfrac{\hat{v}_2(\hat{t}_{\mathrm{pilot}})}{\|x\|_{2,2}^2} - 1\Big),\qquad \mathrm{res} = \sqrt{\tfrac{n_1+n_2}{\hat{w}_2}}\Big(\tfrac{\hat{k}_2(x)}{k_2(x)} - 1\Big).$$

We consider four cases, with $(n_1,n_2) = (200,200), (400,400)$ and the noise being either standard normal or $t(2)$, which has infinite variance. In all cases, we set $N = 1000$, $d = 5$, and $\sigma = 0.1$. Figure 2 shows that the density curves of the standardized statistics are all very close to the standard normal density curve, which verifies our theoretical results; these results hold even when the noise distribution is heavy-tailed. Comparing the four plots, we see that the normal approximation improves as the sample size $n_1+n_2$ increases and as the noise variance decreases.

5 Conclusion

In this paper, we proposed a random projection method using multivariate centered isotropic symmetric $\alpha$-stable random vectors to estimate the block sparsity without sparsity or block sparsity assumptions. The asymptotic properties of the estimators were obtained. Some simulation experiments illustrated our theoretical results.

Appendix A Proofs

Our main theoretical results, Theorem 1 and Theorem 2, follow directly from Theorem 2 and Corollary 1 in [17], since in both estimation procedures the noiseless measurements have a univariate symmetric stable distribution after the random projection, only with different scale parameters: $\gamma_\alpha\|x\|_\alpha$ for sparsity estimation and $\gamma_\alpha\|x\|_{2,\alpha}$ for block sparsity estimation. Therefore, the asymptotic results for the scale parameter estimators obtained with the characteristic function method are almost the same. In addition, Theorem 3 follows from Proposition 5 in [17] with some minor modifications. To avoid repetition, those details are omitted, and we only present the proofs of Lemma 2 and Lemma 3.

Proof of Lemma 2. Let $c_0$ be as in Lemma 1, and let $\kappa_0 \ge 1$ be any number such that

$$\frac{2\ln(\kappa_0)+2}{d\kappa_0} + \frac{2}{\kappa_0}\ \le\ \frac{1}{c_0}. \qquad (A.1)$$

Define the positive number $t = \frac{m/\kappa_0}{d\ln(\frac{eN}{m})}$, and choose $k = \lfloor t\rfloor$ in Lemma 1. Note that when $m \le N$, this choice of $k$ is clearly at most $p$, and hence lies in $\{1,\cdots,p\}$. Then we have

$$k\ln\Big(\frac{eN}{kd}\Big)\ \le\ (t+1)\ln\Big(\frac{eN}{td}\Big) = \left(\frac{m/\kappa_0}{d\ln(\frac{eN}{m})}+1\right)\cdot\ln\left(\kappa_0\frac{eN}{m}\cdot\ln\Big(\frac{eN}{m}\Big)\right) \le \left(\frac{m/\kappa_0}{d\ln(\frac{eN}{m})}+1\right)\cdot\ln\left[\Big(\kappa_0\frac{eN}{m}\Big)^{2}\right]$$

$$= \frac{2m/\kappa_0}{d\ln(\frac{eN}{m})}\left(\ln(\kappa_0)+\ln\Big(\frac{eN}{m}\Big)\right) + 2\ln\Big(\kappa_0\frac{eN}{m}\Big) \le \left(\frac{2\ln(\kappa_0)+2}{d\kappa_0}+\frac{2}{\kappa_0}\right)m\ \le\ \frac{m}{c_0},$$

by using our assumption $N \ge m \ge \kappa_0\ln(\kappa_0\frac{eN}{m})$. Hence, our choice of $\kappa_0$ ensures $m \ge c_0 k\ln(eN/kd)$. To finish the proof, let $\kappa_1 = c_1$ be as in Lemma 1, so that the bound (2.4) holds with probability at least $1 - 2\exp(-c_1 m)$.

Figure 1: The averaged relative error $|\frac{\hat{k}_2(x)}{k_2(x)} - 1|$ depending on $N$, $d$ and $\sigma$.
