On robust width property for Lasso and Dantzig selector models

Hui Zhang*
College of Science, National University of Defense Technology, Changsha, Hunan, China, 410073

arXiv:1501.03643v2 [cs.IT] 3 Apr 2016

Abstract

Recently, Cahill and Mixon completely characterized the sensing operators in many compressed sensing instances with a robust width property. The proposed property allows uniformly stable and robust reconstruction of certain solutions from an underdetermined linear system via convex optimization. However, their theory does not cover the Lasso and Dantzig selector models, both of which are popular alternatives in the statistics community. In this letter, we show that the robust width property can be applied to these two models as well. Our results solve an open problem left by Cahill and Mixon.

Keywords: robust width property; compressed sensing; Lasso model; Dantzig selector model

1. Introduction

One of the main assignments of compressed sensing is to understand when it is possible to recover structured solutions to underdetermined systems of linear equations [1]. During the past decade, many reconstruction guarantees have been developed; well-known concepts include the restricted isometry property, the null space property, the coherence property, dual certificates, and more (the interested reader may refer to [2, 3, 4]). However, none of them has been proved necessary for uniformly stable and robust reconstruction. Recently, Cahill and Mixon [5] introduced a new notion, the robust width property, which completely characterizes the sensing operators in many compressed sensing instances. They restricted their attention to the following constrained optimization problem:

$$\min_x \|x\|_\sharp \quad \text{subject to} \quad \|\Phi x - y\|_2 \le \epsilon, \qquad (Q_\epsilon)$$

so their theory does not cover the Lasso and Dantzig selector models, both of which are popular alternatives in the statistics community.
Here, $\|\cdot\|_\sharp$ is some norm used to promote certain structured solutions, the operator $\Phi$ and the data $y$ are given, and $\epsilon$ measures the error. In this letter, we extend their results to two other, arguably more popular, optimization problems of the Lasso/Basis Pursuit and Dantzig selector types. Our results completely solve an open problem left by Cahill and Mixon and hence show that the notion of robust width is indeed a ubiquitous property.

*Corresponding author. Email address: [email protected] (Hui Zhang).
Preprint submitted to Elsevier, April 5, 2016.

In the following, we recall some notation from the paper [5]. Let $x^\natural$ be some unknown member of a finite-dimensional Hilbert space $\mathcal{H}$, and let $\Phi : \mathcal{H} \to \mathbb{F}^M$ denote some known linear operator, where $\mathbb{F}$ is either $\mathbb{R}$ or $\mathbb{C}$. The subset $\mathcal{A} \subseteq \mathcal{H}$ is a particular subset consisting of some type of structured members, and $B_\sharp$ is the unit $\sharp$-ball.

2. Robust width

The robust width property was formally proposed in [5]. We write down the definition and its equivalent form as follows.

Definition 1 ([5]). We say a linear operator $\Phi : \mathcal{H} \to \mathbb{F}^M$ satisfies the $(\rho,\alpha)$-robust width property over $B_\sharp$ if

$$\|x\|_2 \le \rho \|x\|_\sharp$$

for every $x \in \mathcal{H}$ such that $\|\Phi x\|_2 < \alpha \|x\|_2$; or, equivalently, if

$$\|\Phi x\|_2 \ge \alpha \|x\|_2$$

for every $x \in \mathcal{H}$ such that $\|x\|_2 > \rho \|x\|_\sharp$.

Here, we would like to point out that the definition above is not completely new. In fact, when restricted to the case of $\ell_1$-minimization, it reduces to the $\ell_1$-constrained minimal singular value property, which was originally defined in [6].

Definition 2. For any $k \in \{1, 2, \dots, N\}$ and matrix $\Phi \in \mathbb{R}^{M \times N}$, define the $\ell_1$-constrained minimal singular value of $\Phi$ by

$$r_k(\Phi) = \min_{x \ne 0,\, x \in S_k} \frac{\|\Phi x\|_2}{\|x\|_2},$$

where $S_k = \{x \in \mathbb{R}^N : \|x\|_1 \le \sqrt{k}\, \|x\|_2\}$. If $r_k(\Phi) > 0$, then we say $\Phi$ satisfies the $\ell_1$-constrained minimal singular value property with $r_k(\Phi)$.

The work [7] exploited the geometrical aspect of the $\ell_1$-constrained minimal singular value property.
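To make Definition 2 concrete, the following sketch (an illustration of ours, not part of the paper; it assumes NumPy, and the helper name `cmsv_upper_bound` is hypothetical) estimates an upper bound on $r_k(\Phi)$ by Monte Carlo sampling. Every $k$-sparse vector $x$ satisfies $\|x\|_1 \le \sqrt{k}\,\|x\|_2$ by the Cauchy--Schwarz inequality, so random $k$-sparse directions are feasible points of the minimization in Definition 2, and the smallest sampled ratio upper-bounds $r_k(\Phi)$.

```python
import numpy as np

def cmsv_upper_bound(Phi, k, trials=2000, seed=None):
    """Monte Carlo upper bound on r_k(Phi) = min_{x in S_k, x != 0}
    ||Phi x||_2 / ||x||_2.  Every k-sparse x obeys ||x||_1 <= sqrt(k)*||x||_2,
    so random k-sparse directions lie in S_k, and the smallest sampled
    ratio is an upper bound on the minimum r_k(Phi)."""
    rng = np.random.default_rng(seed)
    M, N = Phi.shape
    best = np.inf
    for _ in range(trials):
        x = np.zeros(N)
        support = rng.choice(N, size=k, replace=False)  # random k-sparse support
        x[support] = rng.standard_normal(k)
        best = min(best, np.linalg.norm(Phi @ x) / np.linalg.norm(x))
    return best

# Example: a column-normalized Gaussian sensing matrix
rng = np.random.default_rng(0)
Phi = rng.standard_normal((40, 100)) / np.sqrt(40)
ub = cmsv_upper_bound(Phi, k=3, seed=1)
print(ub)
```

A persistently positive bound over many trials is consistent with, though of course does not certify, $r_k(\Phi) > 0$; certifying a positive lower bound is a harder, nonconvex problem.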
3. Main results

We first introduce the definition of a compressed sensing space.

Definition 3 ([5]). A compressed sensing space $(\mathcal{H}, \mathcal{A}, \|\cdot\|_\sharp)$ with bound $L$ consists of a finite-dimensional Hilbert space $\mathcal{H}$, a subset $\mathcal{A} \subseteq \mathcal{H}$, and a norm $\|\cdot\|_\sharp$ on $\mathcal{H}$ with the following properties:

(i) $0 \in \mathcal{A}$.
(ii) For every $a \in \mathcal{A}$ and $v \in \mathcal{H}$, there exists a decomposition $v = z_1 + z_2$ such that

$$\|a + z_1\|_\sharp = \|a\|_\sharp + \|z_1\|_\sharp, \qquad \|z_2\|_\sharp \le L \|v\|_2.$$

The subdifferential $\partial f(x)$ of a convex function $f$ at $x$ is the set-valued operator [8] given by

$$\partial f(x) = \{u \in \mathcal{H} : f(y) \ge f(x) + \langle u, y - x\rangle, \ \forall y \in \mathcal{H}\}.$$

The following lemma will be useful in establishing our main results.

Lemma 1. Let $\|\cdot\|_\diamond$ be the dual norm of $\|\cdot\|_\sharp$ on $\mathcal{H}$. If $u \in \partial\|x\|_\sharp$, then $\|u\|_\diamond \le 1$. If $x \ne 0$ and $u \in \partial\|x\|_\sharp$, then $\|u\|_\diamond = 1$.

Proof. From the convexity of $\|\cdot\|_\sharp$ and the subdifferential definition, for any $u \in \partial\|x\|_\sharp$ and $v \in \mathcal{H}$ it holds that

$$\|v\|_\sharp \ge \|x\|_\sharp + \langle u, v - x\rangle.$$

Setting $v = 0$ and $v = 2x$ gives $\langle u, x\rangle \ge \|x\|_\sharp$ and $\langle u, x\rangle \le \|x\|_\sharp$, respectively. This implies $\langle u, x\rangle = \|x\|_\sharp$ and hence $\langle u, v\rangle \le \|v\|_\sharp$. Similarly, replacing $v$ by $-v \in \mathcal{H}$, we get $-\langle u, v\rangle \le \|v\|_\sharp$. Thus $|\langle u, v\rangle| \le \|v\|_\sharp$. Therefore,

$$\|u\|_\diamond = \sup_{\|v\|_\sharp \le 1} |\langle u, v\rangle| \le \sup_{\|v\|_\sharp \le 1} \|v\|_\sharp \le 1.$$

When $x \ne 0$, by the generalized Cauchy--Schwarz inequality we get $\|x\|_\sharp = \langle u, x\rangle \le \|x\|_\sharp \|u\|_\diamond$ and hence $\|u\|_\diamond \ge 1$. So it must hold that $\|u\|_\diamond = 1$. $\square$

Now, we state the characterization of uniformly stable and robust reconstruction via the Lasso/Basis Pursuit type model in terms of the $(\rho,\alpha)$-robust width property.

Theorem 1. For any CS space $(\mathcal{H}, \mathcal{A}, \|\cdot\|_\sharp)$ with bound $L$ and any linear operator $\Phi : \mathcal{H} \to \mathbb{F}^M$, the following are equivalent up to constants:

(a) $\Phi$ satisfies the $(\rho,\alpha)$-robust width property over $B_\sharp$.
(b) For every $x^\natural \in \mathcal{H}$, $\kappa \in (0,1)$, $\lambda > 0$ and $\omega \in \mathbb{F}^M$ satisfying $\|\Phi^T \omega\|_\diamond \le \kappa\lambda$, any solution $x^*$ to the unconstrained optimization model

$$\min_x \ \frac{1}{2}\|\Phi x - (\Phi x^\natural + \omega)\|_2^2 + \lambda \|x\|_\sharp \qquad (P_\lambda)$$

satisfies $\|x^* - x^\natural\|_2 \le C_0 \|x^\natural - a\|_\sharp + C_1 \lambda$ for every $a \in \mathcal{A}$.
In particular, (a) implies (b) with

$$C_0 = \left(\frac{1-\kappa}{2\rho} - L\right)^{-1}, \qquad C_1 = \frac{1+\kappa}{\alpha^2 \rho},$$

provided $\rho < \frac{1-\kappa}{2L}$. Also, (b) implies (a) with

$$\rho = 2C_0, \qquad \alpha = \frac{\kappa}{2\tau C_1},$$

where $\tau = \sup_{\|x\|_\sharp \le 1} \|\Phi x\|_2$.

Proof. Let $z = x^* - x^\natural$. We divide the proof of (a) $\Rightarrow$ (b) into four steps, which are partially inspired by [9] and [5].

Step 1: Prove the first relationship:

$$\|x^*\|_\sharp - \kappa\|z\|_\sharp \le \|x^\natural\|_\sharp. \qquad (1)$$

Since $x^*$ is a minimizer of $(P_\lambda)$, we have

$$\frac{1}{2}\|\Phi x^* - (\Phi x^\natural + \omega)\|_2^2 + \lambda\|x^*\|_\sharp \le \frac{1}{2}\|\Phi x^\natural - (\Phi x^\natural + \omega)\|_2^2 + \lambda\|x^\natural\|_\sharp.$$

Hence,

$$\frac{1}{2}\|(\Phi x^* - \Phi x^\natural) - \omega\|_2^2 + \lambda\|x^*\|_\sharp \le \frac{1}{2}\|\omega\|_2^2 + \lambda\|x^\natural\|_\sharp.$$

Rearranging terms gives

$$\lambda\|x^*\|_\sharp \le -\frac{1}{2}\|\Phi(x^* - x^\natural)\|_2^2 + \langle \Phi(x^* - x^\natural), \omega\rangle + \lambda\|x^\natural\|_\sharp \le \langle x^* - x^\natural, \Phi^T\omega\rangle + \lambda\|x^\natural\|_\sharp.$$

By the generalized Cauchy--Schwarz inequality and the condition $\|\Phi^T\omega\|_\diamond \le \kappa\lambda$, we obtain

$$\langle x^* - x^\natural, \Phi^T\omega\rangle \le \|x^* - x^\natural\|_\sharp \|\Phi^T\omega\|_\diamond \le \kappa\lambda\|x^* - x^\natural\|_\sharp.$$

Thus $\lambda\|x^*\|_\sharp \le \kappa\lambda\|x^* - x^\natural\|_\sharp + \lambda\|x^\natural\|_\sharp$, from which the first relationship follows.

Step 2: Prove the second relationship:

$$\|z\|_\sharp \le \frac{2}{1-\kappa}\|x^\natural - a\|_\sharp + \frac{2L}{1-\kappa}\|z\|_2. \qquad (2)$$

Pick $a \in \mathcal{A}$ and decompose $z = x^* - x^\natural = z_1 + z_2$ according to property (ii) in Definition 3, so that $\|a + z_1\|_\sharp = \|a\|_\sharp + \|z_1\|_\sharp$ and $\|z_2\|_\sharp \le L\|z\|_2$. In light of (1), we derive

$$\begin{aligned}
\|a\|_\sharp + \|x^\natural - a\|_\sharp &\ge \|x^\natural\|_\sharp \\
&\ge \|x^*\|_\sharp - \kappa\|z\|_\sharp \\
&= \|x^\natural + (x^* - x^\natural)\|_\sharp - \kappa\|x^* - x^\natural\|_\sharp \\
&= \|a + (x^\natural - a) + z_1 + z_2\|_\sharp - \kappa\|z_1 + z_2\|_\sharp \\
&\ge \|a + z_1\|_\sharp - \|x^\natural - a\|_\sharp - (1+\kappa)\|z_2\|_\sharp - \kappa\|z_1\|_\sharp \\
&= \|a\|_\sharp + \|z_1\|_\sharp - \|x^\natural - a\|_\sharp - (1+\kappa)\|z_2\|_\sharp - \kappa\|z_1\|_\sharp \\
&= \|a\|_\sharp + (1-\kappa)\|z_1\|_\sharp - \|x^\natural - a\|_\sharp - (1+\kappa)\|z_2\|_\sharp.
\end{aligned}$$

Rearranging terms gives

$$\|z_1\|_\sharp \le \frac{2}{1-\kappa}\|x^\natural - a\|_\sharp + \frac{1+\kappa}{1-\kappa}\|z_2\|_\sharp,$$

which implies

$$\|z\|_\sharp \le \|z_1\|_\sharp + \|z_2\|_\sharp \le \frac{2}{1-\kappa}\|x^\natural - a\|_\sharp + \frac{2}{1-\kappa}\|z_2\|_\sharp.$$

Thus, the second relationship follows by invoking $\|z_2\|_\sharp \le L\|z\|_2$.

Step 3: Derive the upper bound:

$$\|\Phi z\|_2^2 \le (1+\kappa)\lambda\|z\|_\sharp. \qquad (3)$$

The optimality condition of $(P_\lambda)$ reads

$$\Phi^T(\Phi x^\natural + \omega - \Phi x^*) \in \lambda \cdot \partial\|x^*\|_\sharp.$$
By Lemma 1, we get $\|\Phi^T(\Phi x^\natural + \omega - \Phi x^*)\|_\diamond \le \lambda$. Thus,

$$\begin{aligned}
\|\Phi^T\Phi z\|_\diamond = \|\Phi^T\Phi(x^* - x^\natural)\|_\diamond &\le \|\Phi^T(\Phi x^* - \Phi x^\natural - \omega)\|_\diamond + \|\Phi^T\omega\|_\diamond \\
&\le \lambda + \kappa\lambda = (1+\kappa)\lambda.
\end{aligned}$$

Therefore,

$$\|\Phi z\|_2^2 = \langle z, \Phi^T\Phi z\rangle \le \|z\|_\sharp \cdot \|\Phi^T\Phi z\|_\diamond \le (1+\kappa)\lambda\|z\|_\sharp,$$

where the first inequality follows from the generalized Cauchy--Schwarz inequality.

Step 4: Finish the proof. Assume $\|z\|_2 > C_0 \cdot \|x^\natural - a\|_\sharp$, since otherwise we are done. In light of (2), we obtain

$$\|z\|_\sharp < \left[\frac{2}{C_0(1-\kappa)} + \frac{2L}{1-\kappa}\right]\|z\|_2 = \rho^{-1}\|z\|_2,$$

i.e., $\|z\|_2 > \rho\|z\|_\sharp$. By the $(\rho,\alpha)$-robust width property of $\Phi$, we have $\|\Phi z\|_2 \ge \alpha\|z\|_2$. Utilizing the upper bound on $\|\Phi z\|_2^2$ from Step 3, we derive

$$\alpha^2\|z\|_2^2 \le \|\Phi z\|_2^2 \le (1+\kappa)\lambda\|z\|_\sharp < \frac{(1+\kappa)\lambda}{\rho}\|z\|_2.$$

Thus,

$$\|z\|_2 \le \frac{(1+\kappa)\lambda}{\alpha^2\rho} = C_1\lambda \le C_0\|x^\natural - a\|_\sharp + C_1\lambda.$$

This completes the proof of (a) $\Rightarrow$ (b).

The proof of (b) $\Rightarrow$ (a). Pick $x^\natural$ such that $\|\Phi x^\natural\|_2 < \alpha\|x^\natural\|_2$. By the expression of $\tau = \sup_{\|x\|_\sharp \le 1}\|\Phi x\|_2$ and the Cauchy--Schwarz inequality, we derive

$$\begin{aligned}
\tau \cdot \alpha\|x^\natural\|_2 > \tau \cdot \|\Phi x^\natural\|_2 &= \sup_{\|x\|_\sharp \le 1}\|\Phi x\|_2 \cdot \|\Phi x^\natural\|_2 \\
&\ge \sup_{\|x\|_\sharp \le 1}\langle \Phi x, \Phi x^\natural\rangle = \sup_{\|x\|_\sharp \le 1}\langle x, \Phi^T\Phi x^\natural\rangle \\
&= \|\Phi^T\Phi x^\natural\|_\diamond.
\end{aligned}$$

Let $\lambda = \kappa^{-1}\tau\alpha\|x^\natural\|_2$ and $\omega = -\Phi x^\natural$. Then we have

$$\kappa\lambda = \tau\alpha\|x^\natural\|_2 \ge \|\Phi^T\Phi x^\natural\|_\diamond = \|\Phi^T\omega\|_\diamond,$$

which implies that this choice of $\lambda$ and $\omega$ satisfies the condition $\|\Phi^T\omega\|_\diamond \le \kappa\lambda$. With $\omega = -\Phi x^\natural$, the data vector $\Phi x^\natural + \omega$ vanishes, and hence $x^* = 0$ is a minimizer of $(P_\lambda)$. Thus, taking $a = 0 \in \mathcal{A}$,

$$\|x^\natural\|_2 = \|x^* - x^\natural\|_2 \le C_0\|x^\natural\|_\sharp + C_1\lambda = C_0\|x^\natural\|_\sharp + C_1\kappa^{-1}\tau\alpha\|x^\natural\|_2.$$

Take $\alpha = \frac{\kappa}{2\tau C_1}$ and $\rho = 2C_0$, and rearrange terms to give

$$\|x^\natural\|_2 \le \frac{C_0}{1 - C_1\kappa^{-1}\tau\alpha}\|x^\natural\|_\sharp = \rho\|x^\natural\|_\sharp.$$

So the $(\rho,\alpha)$-robust width property of $\Phi$ holds. $\square$

Remark 1. In the paper [5], to obtain a corresponding result for $(Q_\epsilon)$, it suffices for $\|\cdot\|_\sharp$ to satisfy:

(i) $\|x\|_\sharp \ge 0$ for every $x \in \mathcal{H}$, and
(ii) $\|x + y\|_\sharp \le \|x\|_\sharp + \|y\|_\sharp$ for every $x, y \in \mathcal{H}$.

In contrast, Theorem 1 not only requires (i) and (ii) above, but also utilizes the convexity of $\|\cdot\|_\sharp$ and its dual norm.
The additional requirement of convexity excludes nonconvex choices of $\|\cdot\|_\sharp$. For example, the case of

$$\|x\|_\sharp = \|x\|_p^p := \sum_{i=1}^N |x_i|^p, \qquad 0 < p < 1,$$

is not covered by Theorem 1.

With very similar arguments, we can show the following theorem, which characterizes uniformly stable and robust reconstruction via the Dantzig selector type model in terms of the $(\rho,\alpha)$-robust width property.

Theorem 2. For any CS space $(\mathcal{H}, \mathcal{A}, \|\cdot\|_\sharp)$ with bound $L$ and any linear operator $\Phi : \mathcal{H} \to \mathbb{F}^M$, the following are equivalent up to constants:

(a) $\Phi$ satisfies the $(\rho,\alpha)$-robust width property over $B_\sharp$.
(b) For every $x^\natural \in \mathcal{H}$, $\lambda > 0$ and $\omega \in \mathbb{F}^M$ satisfying $\|\Phi^T\omega\|_\diamond \le \lambda$, any solution $x^*$ to the following optimization model

$$\min_x \ \|x\|_\sharp \quad \text{subject to} \quad \|\Phi^T(\Phi x - (\Phi x^\natural + \omega))\|_\diamond \le \lambda \qquad (R_\lambda)$$

satisfies $\|x^* - x^\natural\|_2 \le C_0\|x^\natural - a\|_\sharp + C_1\lambda$ for every $a \in \mathcal{A}$.

In particular, (a) implies (b) with

$$C_0 = \left(\frac{1}{2\rho} - L\right)^{-1}, \qquad C_1 = \frac{2}{\alpha^2\rho},$$

provided $\rho < \frac{1}{2L}$. Also, (b) implies (a) with

$$\rho = 2C_0, \qquad \alpha = \frac{1}{2\tau C_1},$$

where $\tau = \sup_{\|x\|_\sharp \le 1}\|\Phi x\|_2$.

Proof. The proof follows the pattern used for Theorem 1. Let $z = x^* - x^\natural$.

Step 1: Since $x^*$ is a minimizer of $(R_\lambda)$ and $x^\natural$ is feasible, it holds that $\|x^*\|_\sharp \le \|x^\natural\|_\sharp$. Now repeat the argument of Step 2 in the proof of Theorem 1 (with $\kappa = 0$) to get

$$\|z\|_\sharp \le 2\|x^\natural - a\|_\sharp + 2L\|z\|_2.$$

Step 2: Prove the upper bound:

$$\|\Phi z\|_2^2 \le 2\lambda\|z\|_\sharp.$$

This follows from

$$\|\Phi^T\Phi z\|_\diamond \le \|\Phi^T(\Phi x^* - (\Phi x^\natural + \omega))\|_\diamond + \|\Phi^T\omega\|_\diamond \le 2\lambda$$

and

$$\|\Phi z\|_2^2 = \langle z, \Phi^T\Phi z\rangle \le \|z\|_\sharp \cdot \|\Phi^T\Phi z\|_\diamond.$$

The remaining part of the proof of (a) $\Rightarrow$ (b) follows by repeating the argument of Step 4 in the proof of Theorem 1.

The proof of (b) $\Rightarrow$ (a). Pick $x^\natural$ such that $\|\Phi x^\natural\|_2 < \alpha\|x^\natural\|_2$. Let $\lambda = \tau\alpha\|x^\natural\|_2$ and $\omega = -\Phi x^\natural$. We have shown in the proof of Theorem 1 that this choice of $\lambda$ and $\omega$ satisfies the condition $\|\Phi^T\omega\|_\diamond \le \lambda$, and hence $x^* = 0$ is the unique minimizer of $(R_\lambda)$. The remaining part of the proof of (b) $\Rightarrow$ (a) follows by repeating the corresponding part in the proof of Theorem 1. $\square$
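As a small numerical illustration of the two models (ours, not part of the paper), take $\mathcal{H} = \mathbb{R}^N$ with $\|\cdot\|_\sharp = \|\cdot\|_1$, so that the dual norm $\|\cdot\|_\diamond$ is $\|\cdot\|_\infty$. The sketch below assumes NumPy; the solver is plain ISTA, a standard proximal-gradient method (the function names are ours). It solves the Lasso model $(P_\lambda)$ on synthetic data and then checks the fact used in Step 3 of Theorem 1: at a minimizer, $\|\Phi^T(\Phi x^\natural + \omega - \Phi x^*)\|_\infty \le \lambda$, which also says $x^*$ is feasible for the Dantzig model $(R_\lambda)$.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(Phi, y, lam, iters=20000, tol=1e-12):
    """Solve min_x 0.5*||Phi x - y||_2^2 + lam*||x||_1, i.e. model (P_lam)
    with ||.||_sharp = ||.||_1, by proximal gradient descent (ISTA)."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        x_new = soft_threshold(x - step * (Phi.T @ (Phi @ x - y)), lam * step)
        if np.linalg.norm(x_new - x) <= tol:  # stop at (numerical) fixed point
            x = x_new
            break
        x = x_new
    return x

# Synthetic instance: sparse ground truth x_nat, small perturbation omega.
rng = np.random.default_rng(0)
M, N = 60, 120
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_nat = np.zeros(N)
x_nat[:5] = rng.standard_normal(5)
omega = 0.01 * rng.standard_normal(M)
lam = 0.1
y = Phi @ x_nat + omega          # data y = Phi x_nat + omega, as in Theorem 1
x_star = lasso_ista(Phi, y, lam)

# Optimality condition of (P_lam) in the l1 case (dual norm = l_inf):
# ||Phi^T(y - Phi x*)||_inf <= lam, so x* is also feasible for (R_lam).
residual = np.linalg.norm(Phi.T @ (y - Phi @ x_star), np.inf)
print(residual, lam)
```

The printed residual should not exceed $\lambda$ (up to solver tolerance), matching the subdifferential bound from Lemma 1; the connection between the two models here is only feasibility, not equality of their solution sets.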
Note that the convexity of $\|\cdot\|_\sharp$ is not involved in the proof of Theorem 2.

Acknowledgements

The author would like to thank Dr. Jameson Cahill for his communication and the anonymous reviewers for their valuable comments, which have led to great improvements in this manuscript. The work is supported by the National Science Foundation of China (Nos. 11501569 and 61571008).

References

[1] E. J. Candès, Mathematics of sparsity (2014) 1–27.
[2] S. Foucart, H. Rauhut, A Mathematical Introduction to Compressive Sensing, Applied and Numerical Harmonic Analysis, Birkhäuser, 2013.
[3] H. Zhang, W. T. Yin, L. Z. Cheng, Necessary and sufficient conditions of solution uniqueness in $\ell_1$ minimization, Journal of Optimization Theory and Applications 164 (2015) 109–122.
[4] H. Zhang, M. Yan, W. T. Yin, One condition for solution uniqueness and robustness of both $\ell_1$-synthesis and $\ell_1$-analysis minimizations, arXiv:1304.5038v1 (2013).
[5] J. Cahill, D. G. Mixon, Robust width: A characterization of uniformly stable and robust compressed sensing, arXiv:1408.4409v1 [cs.IT] (2014).
[6] G. Tang, A. Nehorai, Performance analysis of sparse recovery based on constrained minimal singular values, IEEE Transactions on Signal Processing 59 (2011) 5734–5745.
[7] H. Zhang, L. Z. Cheng, On the constrained minimal singular values for sparse recovery, IEEE Signal Processing Letters 19 (2012) 499–502.
[8] H. Bauschke, P. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer-Verlag, New York, 2011.
[9] E. J. Candès, Y. Plan, Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements, IEEE Transactions on Information Theory 57 (2011) 2342–2359.
