Analysis of weighted ℓ1-minimization for model-based compressed sensing

Sidhant Misra and Pablo A. Parrilo

Laboratory of Information and Decision Systems, Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, 02139, [email protected], [email protected]

Abstract

Model-based compressed sensing refers to compressed sensing with extra structure about the underlying sparse signal known a priori. Recent work has demonstrated that, both for deterministic and probabilistic models imposed on the signal, this extra information can be successfully exploited to enhance recovery performance. In particular, weighted ℓ1-minimization with a suitable choice of weights has been shown to improve performance, at least for a simple class of probabilistic models. In this paper, we consider a more general and natural class of probabilistic models where the underlying probabilities associated with the indices of the sparse signal have a continuously varying nature. We prove that when the measurements are obtained using a matrix with i.i.d. Gaussian entries, weighted ℓ1-minimization with weights that have a similar continuously varying nature successfully recovers the sparse signal from its measurements with overwhelming probability. It is known that standard ℓ1-minimization with uniform weights can recover sparse signals up to a known sparsity level, or expected sparsity level in the case of a probabilistic signal model. With a suitable choice of weights based on our signal model, we show that weighted ℓ1-minimization can recover signals beyond the sparsity level achievable by standard ℓ1-minimization.

I. INTRODUCTION

Compressed sensing has emerged as a modern alternative to traditional sampling for compressible signals. Previously, the most common way to view recovery of signals from samples was based on the Nyquist criterion, according to which a band-limited signal has to be sampled at a rate at least twice its bandwidth to allow exact recovery. In this case the band-limitedness of the signal is the extra known information that allows us to reconstruct the signal from its samples. In compressed sensing, the additional structure considered is that the signal is sparse with respect to a certain known basis. As opposed to sampling at the Nyquist rate and subsequently compressing, we now obtain linear measurements of the signal; the compression and measurement steps are combined by taking a much smaller number of linear measurements than would in general be required to reconstruct the signal.

After fixing the basis with respect to which the signal is sparse, the process of obtaining the measurements can be written as y = Ax, where y ∈ R^m is the vector of measurements, x ∈ R^n is the signal, and A ∈ R^{m×n} represents the m linear functionals acting on the signal x. We call A the measurement matrix. The signal x is considered to have at most k non-zero components and we are typically interested in the scenario where k is much smaller than n. Compressed sensing revolves around the fact that for sparse signals, the number of linear measurements needed to reconstruct the signal can be significantly smaller than the ambient dimension of the signal itself. The reconstruction problem can be formulated as finding the sparsest solution x satisfying the constraints imposed by the linear measurements y. This can be represented by

    min ||x||_0
    subject to y = Ax.

This problem is inherently combinatorial in nature and is in general NP-hard.
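As a concrete illustration of this measurement model (not part of the paper's analysis), the following Python snippet generates a k-sparse signal and its Gaussian measurements y = Ax; all dimensions and parameter values are arbitrary choices made here for illustration.

```python
import numpy as np

# Minimal sketch of the measurement model y = Ax with illustrative sizes.
rng = np.random.default_rng(0)
n, m, k = 200, 80, 10            # ambient dimension, number of measurements, sparsity

# A k-sparse signal: k randomly placed non-zero components.
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

# Measurement matrix with i.i.d. zero-mean Gaussian entries, as assumed later in the paper.
A = rng.standard_normal((m, n))
y = A @ x                        # m linear measurements, with m much smaller than n
```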
The problem is combinatorial because, for a given support size k, one would need to search through all (n choose k) possible supports of the signal. Seminal work by Candès and Tao in [1] and by Donoho in [2] shows that under certain conditions on the measurement matrix A, ℓ1-norm minimization, which can be recast as a linear program, recovers the signal from its measurements. Additionally, a random matrix with i.i.d. zero-mean Gaussian entries satisfies the required condition with overwhelming probability. Linear programming has polynomial time complexity, so the above result tells us that for a large class of measurement matrices A we can solve an otherwise NP-hard combinatorial problem in polynomial time. Subsequently, iterative methods based on a greedy approach were formulated which recover sparse signals from their measurements by obtaining an increasingly accurate approximation to the actual signal in each iteration. Examples include CoSaMP [3] and IHT [4]. The compressed sensing framework has also been generalized to the problem of recovering low-rank matrices from compressed linear measurements represented by linear matrix equations [5], [6].

Most of the earlier literature on compressed sensing focused on the case where the only constraints on the signal x are those imposed by its measurements y. On the other hand, it is natural to consider the case where, besides sparsity, there is additional information about the structure of the underlying signal. This is true in several applications, examples of which include encoding natural images (JPEG), MRI, and DNA microarrays [7], [8]. This leads us to model-based compressed sensing, where the aim is to devise recovery methods specific to the signal model at hand. Furthermore, one would also want to quantify the possible benefits over the standard method (e.g., a smaller number of required measurements for the same sparsity level). This has been explored in some recent papers. The authors in [9] analyzed a deterministic signal model, where the support of the underlying signal is constrained to belong to a given known set M. This defines a subset of the set of all k-sparse signals, which is now the set of allowable signals, and results in an additional constraint on the original reconstruction problem:

    min ||x||_0
    subject to y = Ax,  x ∈ M.

It was shown that an intuitive modification of the CoSaMP or IHT method succeeds in suitably exploiting the information about the model. The key property defined in [1], known as the Restricted Isometry Property, was adapted in [9] to a model-based setting. With this, it was shown that results similar to [1] can be obtained for model-based signal recovery.

As opposed to this, a probabilistic model, i.e., a Bayesian setting, was considered in [10]. Under this model there are certain known probabilities associated with the components of the signal x. Specifically, p_i, i = 1,2,...,n with 0 ≤ p_i ≤ 1 are such that

    P(x_i is non-zero) = p_i,   i = 1,2,...,n.

The deterministic version of the same model, called the "nonuniform sparse model", was considered in [11]. For this setting, the use of weighted ℓ1-minimization was suggested, given by

    min ||x||_{w,1}
    subject to y = Ax,

where ||x||_{w,1} = Σ_{i=1}^{n} w_i |x_i| denotes the weighted ℓ1 norm of x and the quantities w_i, i = 1,2,...,n are positive scalars. Similar to [2], [12], ideas based on high-dimensional polytope geometry were used to provide sufficient conditions under which weighted ℓ1-minimization recovers the sparse signal.
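Both standard and weighted ℓ1-minimization can be recast as linear programs by splitting x into its positive and negative parts. The sketch below is an illustration added here (the function name and the use of SciPy's linprog are not from the paper): it solves min Σ_i w_i |x_i| subject to Ax = y, and setting w to the all-ones vector recovers standard ℓ1-minimization.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1_min(A, y, w):
    """Solve min sum_i w_i*|x_i| subject to Ax = y via the LP reformulation
    x = u - v with u, v >= 0 (a sketch; any LP or convex solver could be used)."""
    m, n = A.shape
    c = np.concatenate([w, w])                 # objective: w'(u + v)
    A_eq = np.hstack([A, -A])                  # A (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:n], res.x[n:]
    return u - v

# Example usage with the y, A generated in the previous sketch; uniform weights give standard l1.
# x_hat = weighted_l1_min(A, y, np.ones(A.shape[1]))
```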
Weighted ℓ1-minimization was introduced earlier in [13], where it was used to analyze the robustness of ℓ1-minimization in recovering sparse signals from noisy measurements. The specific model considered in [10] can be described as follows. Consider a partition of the indices 1 to n into two disjoint sets T_1 and T_2. Let p_i = P_1 for i ∈ T_1 and p_i = P_2 for i ∈ T_2. As a natural choice, the weights in the weighted ℓ1-minimization are taken as w_i = W_1 for i ∈ T_1 and w_i = W_2 for i ∈ T_2. The main result of [10] is that under certain conditions on W_2/W_1, P_2/P_1 and A, weighted ℓ1-minimization can recover a signal drawn from the above model with overwhelming probability.

In this paper, we extend this approach to a more general class of signal models. In our model, the probabilities p_i described above are assumed to vary in a "continuous" manner across the indices. The specific way in which the probabilities vary is governed by a known shape function p(.) which is continuous, non-negative and non-increasing. For a signal drawn from this model, we propose the use of weighted ℓ1-minimization to reconstruct it from its compressed linear measurements. In addition, to match the signal model, we propose that the weights be chosen according to a continuous, non-negative and non-decreasing shape function f(.). We prove that under certain conditions on p(.), f(.) and the measurement matrix A, we can reconstruct the signal perfectly with overwhelming probability. We also suggest a way to select the shape function f(.) for the weights based on p(.). Our results are also applicable when the functions p(.) and f(.) are piecewise continuous, and the results in [10] can be obtained from ours as a special case.

The rest of the paper is organized as follows. In Section II, we introduce the basic notation, formulate the exact questions that we set out to answer in this paper, and state our main theorem. In Section III we focus on how weighted ℓ1-minimization behaves by restricting our attention to a special class of signals that is particularly suitable for the analysis. We later show that the methods generalize to other simple classes, which is all we need to establish the main result of this paper. In Section IV we describe the key features of a typical signal drawn from our model. We also prove a suitable large deviation result for the probability that a signal drawn from our model lies outside this typical set. In Section V we provide numerical computations to demonstrate the results we derive, as well as simulations to reinforce them. We then conclude the paper with possibly interesting questions that may be explored in the future.

II. PROBLEM FORMULATION

A. Notation and parameters

We denote scalars by lower case letters (e.g., c), vectors by bold lower case letters (e.g., x), and matrices by bold upper case letters (e.g., A). The probability of an event E is denoted by P(E). The ith standard unit vector in R^n is denoted by e_i = (0,0,...,1,...,0)^T, where the "1" is located in the ith position. The underlying sparse signal is represented by x ∈ R^n and the measurement matrix by A ∈ R^{m×n}. The vector of observations is denoted by y and is obtained through linear measurements of x given by y = Ax. Typically we would need n linear measurements to be able to recover an arbitrary signal. The scalar α = m/n determines how many measurements we have as a fraction of n. We call this the compression ratio.
B. Model of the Sparse Signal

Let p : [0,1] → [0,1] be a continuous, monotonically non-increasing function. We call p(.) the probability shape function; the reason for this name will become clear from the description below. The support of the signal x is decided by the outcome of n independent Bernoulli random variables. In particular, if E_i denotes the event that i ∈ Supp(x), then the events E_i, i = 1,...,n are independent and P(E_i) = p(i/n). Although we assume throughout the paper that p(.) is continuous, our result generalizes to piecewise continuous functions as well. Let us denote by k the cardinality of Supp(x). Note that under the above model k is a random variable that can take any value from 0 to n. The expected value of k is given by Σ_{i=1}^{n} p(i/n). We denote by δ the expected fractional sparsity, given by δ = (1/n) E[k] = (1/n) Σ_{i=1}^{n} p(i/n).

As is standard in the compressed sensing literature, we assume that the entries of the measurement matrix A are i.i.d. Gaussian random variables with mean zero and variance one. The measurements are obtained as y = Ax.

C. Weighted ℓ1-minimization

When the fractional sparsity δ is much smaller than one, a signal sampled from the model described above is sparse. Hence, it is possible to recover the signal from its measurements y by ℓ1-minimization, which is formulated as

    min ||x||_1
    subject to Ax = y.

However, this does not exploit the extra information available from the knowledge of the priors. Instead, we use weighted ℓ1-minimization to recover the sparse signal, which is captured by the following optimization problem:

    min ||x||_{w,1}                                      (1)
    subject to Ax = y,

where w ∈ R^n is a vector of positive weights and ||x||_{w,1} = Σ_{i=1}^{n} w_i |x_i| refers to the weighted ℓ1 norm of x for a given weight vector w. The weight vector w plays a central role in determining whether (1) successfully recovers the sparse signal x. Intuitively, w should be chosen in a certain way depending on p(.) so as to obtain the best performance (although at this point we have not precisely defined the meaning of this). Keeping in mind the structure of p_i, i = 1,...,n, we suggest using weights w_i, i = 1,...,n which have a similar structure. Formally, let f : [0,1] → R^+ be a non-negative, non-decreasing, continuous function. Then we choose the weights as w_i = f(i/n). We call f(.) the weight shape function.

D. Problem statement

In this paper we try to answer the following two questions:
• Given the problem parameters (the size of the matrix defined by m and n) and the functions p(.) and f(.), does weighted ℓ1-minimization in (1) recover the underlying sparse signal x with high probability?
• Given a probability shape function p(.) and a family W of weight shape functions, how should one choose a function f(.) ∈ W with the best performance guarantees?

We give an answer in the affirmative to the first question, given that the functions p(.) and f(.) satisfy certain specified conditions. This is contained in the main result of this paper, which is the following.

Theorem 1. Let the probability shape function p(.) and the weight shape function f(.) be given. Let E be the event that weighted ℓ1-minimization described in (1) fails to recover the correct sparse vector x. There exists a quantity ψ̄_tot(p,f), which can be computed explicitly as described in Section B-4, such that whenever ψ̄_tot(p,f) < 0 the probability of failure P(E) of weighted ℓ1-minimization decays exponentially with respect to n. More precisely, if for some ε > 0 we have ψ̄_tot(p,f) ≤ −ε, then there exists a constant c(ε) > 0 such that for large enough n the probability of failure satisfies P(E) ≤ e^{−n c(ε)}.
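For concreteness, a signal can be drawn from this probabilistic model and the corresponding weights formed as follows. The particular shape functions p(.) and f(.) below are illustrative choices, not ones analyzed in the paper; they merely satisfy the stated continuity and monotonicity requirements.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Illustrative shape functions: p is continuous and non-increasing on [0, 1],
# f is continuous and non-decreasing on [0, 1].
p = lambda t: 0.2 * (1.0 - 0.8 * t)     # probability shape function p(.)
f = lambda t: 1.0 + 2.0 * t             # weight shape function f(.)

t = np.arange(1, n + 1) / n
# Support drawn from independent Bernoulli variables with P(i in Supp(x)) = p(i/n).
support = rng.random(n) < p(t)
x = np.zeros(n)
x[support] = rng.standard_normal(support.sum())

w = f(t)                                # weights w_i = f(i/n) for the weighted l1 norm
delta = p(t).mean()                     # expected fractional sparsity (1/n) * sum_i p(i/n)
```

Combined with the linear-programming sketch given earlier, this allows the recovery question of Theorem 1 to be probed empirically for moderate n by drawing A with m = αn Gaussian rows and checking whether (1) returns x.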
To answer the second question, we first define the measure of performance. For a given value of α = m/n, let δ̄ be the maximum value of δ for which weighted ℓ1-minimization can be guaranteed to succeed with high probability. For a given family of weight functions W, the weight function f(.) that has the highest δ̄ is said to have the best performance. In this paper we will describe a method to choose a weight function in W and demonstrate the method in Section V for a linearly parameterized family.

III. ANALYSIS OF WEIGHTED ℓ1-MINIMIZATION

In this section, we analyze the performance of weighted ℓ1-minimization with weights specified by f(.) on a certain special class of sparse signals x. As we will see in Section IV, the analysis easily generalizes to signals drawn from our signal model.

Recall that the failure event E from Theorem 1 is defined as the event that weighted ℓ1-minimization fails to recover the correct sparse vector x. We call the probability P(E) of this event the probability of failure. Without loss of generality we can assume that ||x||_{w,1} = 1, that is, x lies on a (k−1)-dimensional face of the weighted cross-polytope

    P = { x ∈ R^n : ||x||_{w,1} ≤ 1 },

which is the weighted ℓ1-ball in n dimensions. The specific face of P on which x lies is determined by the support of x. The probability of failure P(E) can be written as

    P(E) = Σ_{F ∈ ℱ} P(x ∈ F) P(E | x ∈ F),

where ℱ is the set of all faces of the polytope P. Then, as shown in [10], the event {E | x ∈ F} is precisely the event that there exists a non-zero u ∈ null(A) such that ||x + u||_{w,1} ≤ ||x||_{w,1}, conditioned on {x ∈ F}. Also, since A has i.i.d. Gaussian entries, sampling from the nullspace of A is equivalent to sampling a subspace uniformly from the Grassmann manifold Gr_{(n−m)}(n). So, conditioned on {x ∈ F}, the event {E | x ∈ F} is the same as the event that a uniformly chosen (n−m)-dimensional subspace shifted to x intersects the polytope P at a point other than x. The probability of this event is also called the complementary Grassmann angle for the face F with respect to the polytope P under the Grassmann manifold Gr_{(n−m)}(n). Based on work by Santaló [14] and McMullen [15], the complementary Grassmann angle can be expressed explicitly as a sum of products of internal and external angles:

    P(E | x ∈ F) = 2 Σ_{s ≥ 0} Σ_{G ∈ J(m+1+2s)} β(F,G) γ(G,P),     (2)

where β(F,G) and γ(G,P) are the internal and external angles and J(r) is the set of all r-dimensional faces of P. The definitions of internal and external angles can be found in [10]; we include them here for completeness.
• The internal angle β(F,G) is the fraction of the volume of the unit ball covered by the cone obtained by observing the face G from any point in the face F. The quantity β(F,G) is defined to be zero if F is not contained in G and is defined to be one if F = G.
• The external angle γ(G,P) is defined to be the fraction of the volume of the unit ball covered by the cone formed by the outward normals to the hyperplanes supporting P at the face G. If G = P then γ(G,P) is defined to be one.

In this section we describe a method to obtain upper bounds on P(E | x ∈ F) by finding upper bounds on the internal and external angles described above. We first analyze P(E | x ∈ F) for the "simplest" class of faces. We denote by F_0^k the face whose vertices are given by (1/w_1)e_1, (1/w_2)e_2, ..., (1/w_k)e_k. Thus the vertices are defined by the first k indices, and we call this face the "leading" (k−1)-dimensional face of P. We will spend much of this section developing bounds on P(E | x ∈ F_0^k) for such "leading" faces.
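The conditional failure probability P(E | x ∈ F_0^k) characterized by the Grassmann-angle machinery can also be estimated empirically for a fixed point x on a leading face by redrawing the Gaussian matrix A and re-solving the weighted ℓ1 program. The sketch below is an added illustration: the sizes, weight shape function, and success tolerance are arbitrary, and the LP reformulation repeats the one given earlier so that the snippet is self-contained.

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1_min(A, y, w):
    # Same LP reformulation as in the earlier sketch, repeated for self-containment.
    m, n = A.shape
    res = linprog(np.concatenate([w, w]), A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * n), method="highs")
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(2)
n, m, k, trials = 60, 30, 8, 100

f = lambda t: 1.0 + 2.0 * t
w = f(np.arange(1, n + 1) / n)

# A point in the relative interior of the leading face F_0^k: positive entries on the
# first k indices, scaled so that ||x||_{w,1} = 1.
x = np.zeros(n)
x[:k] = rng.random(k) + 0.1
x /= np.dot(w, np.abs(x))

# Monte Carlo estimate of P(E | x in F_0^k) over the randomness of A.
failures = 0
for _ in range(trials):
    A = rng.standard_normal((m, n))
    x_hat = weighted_l1_min(A, A @ x, w)
    failures += not np.allclose(x_hat, x, atol=1e-6)
print("estimated P(E | x in F_0^k):", failures / trials)
```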
In Section IV, we then describe the typical set of our signal model and show that for the purpose of bounding P(E) it is sufficient to consider a certain special class of faces. The bounds we develop in this section for "leading" faces can be easily generalized to faces belonging to this special class.

A. Angle Exponents for leading faces

We define the family of leading faces ℱ_1 as the set of all faces whose vertices are given by (1/w_1)e_1, (1/w_2)e_2, ..., (1/w_k)e_k for some k. In this section we establish the following result, which bounds the internal and external angles related to a leading face of P. We will then use this result in the next section to provide an upper bound on P(E | x ∈ F) for F ∈ ℱ_1.

Theorem 2. Let k = δn and l = τn with τ > δ. Let G_0^l be the face whose vertices are given by (1/w_1)e_1, (1/w_2)e_2, ..., (1/w_l)e_l and let F_0^k ⊂ G_0^l be the face whose vertices are given by (1/w_1)e_1, (1/w_2)e_2, ..., (1/w_k)e_k. Then there exist quantities ψ̄_ext(τ) and ψ̄_int(δ,τ), which can be computed explicitly as described in Section III-C2 and Section III-C3 respectively, such that for any ε > 0 there exist integers n_0(ε) and n_1(ε) satisfying
• n^{−1} log(β(F_0^k, G_0^l)) < ψ̄_int(δ,τ) + ε, for all n > n_0(ε),
• n^{−1} log(γ(G_0^l, P)) < ψ̄_ext(τ) + ε, for all n > n_1(ε).

The quantities ψ̄_int(δ,τ) and ψ̄_ext(τ) are called the optimized internal angle exponent and the optimized external angle exponent, respectively.

Whenever any of the angle exponents described above is negative, the corresponding angle decays at an exponential rate with respect to the ambient dimension n. We will prove Theorem 2 over the following two subsections by finding optimized internal and external angle exponents that satisfy the requirements of the theorem.

1) Internal Angle Exponent: In this subsection we find the optimized internal angle exponent ψ̄_int(δ,τ) that satisfies the conditions described in Theorem 2. We begin by stating the following result from [10], which provides the expression for the internal angle of a face F with respect to a face G in terms of the weights associated with their vertices. Subsequently, we find an asymptotic upper bound on the exponent of the internal angle using that expression. In what follows, we denote by HN(0,σ²) the distribution of a half-normal random variable obtained by taking the absolute value of an N(0,σ²) distributed random variable.

Lemma 1 ([10]). Define σ_{i,j} = Σ_{p=i}^{j} w_p². Let Y_0 ∼ N(0,1) be a normal random variable and let Y_p ∼ HN(0, w_{p+k}²/(2σ_{1,k})) for p = 1,2,...,l−k be independent half-normal random variables that are independent of Y_0. Define the random variable Z := Y_0 − Σ_{p=1}^{l−k} Y_p. Then,

    β(F_0^k, G_0^l) = (√π / 2^{l−k}) √(σ_{1,l}/σ_{1,k}) p_Z(0),

where p_Z(.) is the density function of Z.

We now proceed to derive an upper bound on the quantity p_Z(0). Much of the analysis is along the lines of [2]. Let the random variable S be defined as S = Σ_{p=1}^{l−k} Y_p, so that Z = Y_0 − S. Using the convolution integral, the density function of Z can be written as

    p_Z(0) = ∫_{−∞}^{∞} p_{Y_0}(−v) p_S(v) dv = 2 ∫_{0}^{∞} v p_{Y_0}(v) F_S(v) dv,

where F_S(v) is the cumulative distribution function of S. Let µ_S be the mean of the random variable S. Then,

    p_Z(0) = 2 ∫_{0}^{µ_S} v p_{Y_0}(v) F_S(v) dv + 2 ∫_{µ_S}^{∞} v p_{Y_0}(v) F_S(v) dv =: I + II.

As in [2], the second term satisfies II < ∫_{µ_S}^{∞} 2 v p_{Y_0}(v) dv = e^{−µ_S²}. As we will see later in this section, µ_S² ∼ cn for some c > 0, and hence II decays exponentially in n.
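As a numerical sanity check of Lemma 1 (not part of the original derivation), p_Z(0) can be estimated by simulating Z directly. In the sketch below, the weight shape function, the dimensions, and the use of a Gaussian kernel density estimate are illustrative choices made here.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Monte Carlo estimate of the internal angle beta(F_0^k, G_0^l) via Lemma 1.
rng = np.random.default_rng(3)
n, k, l = 100, 10, 25
f = lambda t: 1.0 + 2.0 * t
w = f(np.arange(1, n + 1) / n)

sigma_1k = np.sum(w[:k] ** 2)
sigma_1l = np.sum(w[:l] ** 2)

N = 200_000
Y0 = rng.standard_normal(N)                              # Y_0 ~ N(0, 1)
scales = w[k:l] / np.sqrt(2.0 * sigma_1k)                # std. dev. of each half-normal
Yp = np.abs(rng.standard_normal((N, l - k))) * scales    # Y_p ~ HN(0, w_{p+k}^2 / (2 sigma_{1,k}))
Z = Y0 - Yp.sum(axis=1)

pZ0 = gaussian_kde(Z)(0.0)[0]                            # kernel density estimate of p_Z(0)
beta = np.sqrt(np.pi) / 2 ** (l - k) * np.sqrt(sigma_1l / sigma_1k) * pZ0
print("estimated internal angle beta(F_0^k, G_0^l):", beta)
```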
Since we are interested in computing the asymptotic exponent of p_Z(0), we can ignore II in this computation. To bound I, we use F_S(v) ≤ exp(−λ_S^*(v)), where λ_S^*(v) denotes the rate function (the convex conjugate of the cumulant generating function λ_S(.)) of the random variable S. So we get

    I ≤ (2/√π) ∫_{0}^{µ_S} v e^{−v² − λ_S^*(v)} dv.     (3)

For ease of notation we define the following quantities:

    s_{k+1,l} := Σ_{p=k+1}^{l} w_p,
    λ(s) := s²/2 + log(2Φ(s)),
    λ_0(s) := (1/s_{k+1,l}) Σ_{p=1}^{l−k} λ(w_{p+k} s),
    λ_0^*(y) := max_s { s y − λ_0(s) }.

Here λ(s) is the cumulant generating function of the standard half-normal random variable and Φ(x) is the standard normal distribution function. Using the above definitions, we can express the relation between λ_S^*(y) and λ_0^*(y) as

    λ_0^*(y) = (1/s_{k+1,l}) λ_S^*( (s_{k+1,l}/√σ_{1,k}) y ).

We compute

    µ_S = Σ_{p=1}^{l−k} E[Y_p] = √(2/π) s_{k+1,l}/√σ_{1,k}.

Changing variables in (3) by substituting v = (s_{k+1,l}/√σ_{1,k}) y, we get

    I ≤ (2/√π) (s_{k+1,l}²/(2σ_{1,k})) ∫_{0}^{√(2/π)} y exp[ −(s_{k+1,l}²/(2σ_{1,k})) y² − s_{k+1,l} λ_0^*(y) ] dy.

Now, as w_i = f(i/n), we have

    s_{k+1,l} = Σ_{i=k+1}^{l} f(i/n) = n( ∫_δ^τ f(x) dx + o(1) ).

Define c_0(δ,τ) := ∫_δ^τ f(x) dx. This gives us s_{k+1,l} = n(c_0(δ,τ) + o(1)). Similarly,

    σ_{1,k} = Σ_{i=1}^{k} w_i² = Σ_{i=1}^{k} f²(i/n) = n( ∫_0^δ f²(x) dx + o(1) ) = n(c_1(δ) + o(1)),

where c_1(δ) is defined as c_1(δ) := ∫_0^δ f²(x) dx. This gives us

    (s_{k+1,l}²/(2σ_{1,k})) y² + s_{k+1,l} λ_0^*(y) = n( (c_0²/(2c_1)) y² + c_0 λ_0^*(y) + o(1) ) = n( η(y) + o(1) ),

where η(y) is defined as η(y) := (c_0²/(2c_1)) y² + c_0 λ_0^*(y). Using Laplace's method to bound I as in [2], we get

    I ≤ R_n e^{−n η(y*)},     (4)

where n^{−1} log(R_n) = o(1) and y* is the minimizer of the convex function η(y). Let

    λ_0^*(y*) = max_s { s y* − λ_0(s) } = s* y* − λ_0(s*).

Then the maximizing s* satisfies λ_0'(s*) = y*. From convex duality, we have (λ_0^*)'(y*) = s*. The minimizing y* of η(y) satisfies

    (c_0²/c_1) y* + c_0 (λ_0^*)'(y*) = 0
    ⟹ (c_0²/c_1) y* + c_0 s* = 0.     (5)

This gives

    λ_0'(s*) = −(c_1/c_0) s*.     (6)

First we approximate λ_0(s) as follows:

    λ_0(s) = (1/s_{k+1,l}) Σ_{p=1}^{l−k} λ(w_{p+k} s) = (1/(n c_0)) Σ_{p=k+1}^{l} λ( f(p/n) s ) + o(1) = (1/c_0) ∫_δ^τ λ(s f(x)) dx + o(1).

From the above we obtain

    λ_0'(s) = (d/ds) λ_0(s) = (1/c_0) ∫_δ^τ f(x) λ'(s f(x)) dx + o(1).

Combining this with equation (6), we can determine s* up to an o(1) error by solving the equation

    ∫_δ^τ f(x) λ'(s* f(x)) dx + c_1 s* = 0.     (7)

We define the internal angle exponent as

    ψ_int(δ,τ,y) := −(τ−δ) log 2 − η(y)     (8)
                  = −(τ−δ) log 2 − ( (c_0²/(2c_1)) y² + c_0 λ_0^*(y) ),     (9)

and the optimized internal angle exponent as

    ψ̄_int(δ,τ) := max_y ψ_int(δ,τ,y) = −(τ−δ) log 2 − η(y*)     (10)
                 = −(τ−δ) log 2 − ( (c_0²/(2c_1)) y*² + c_0 λ_0^*(y*) ),     (11)

where y* is determined through s* from equation (5) and s* is determined by solving equation (7). From inequality (4) and Lemma 1 we see that the function ψ̄_int(δ,τ) satisfies the conditions described in the first part of Theorem 2.

2) External Angle Exponent: In this subsection we find the optimized external angle exponent ψ̄_ext(τ) that satisfies the statement of Theorem 2. We proceed similarly to the previous subsection and begin by stating the following lemma from [10], which provides the expression for the external angle of a face G with respect to the polytope P in terms of the weights associated with its vertices. We then find an upper bound on the asymptotic exponent of the external angle.

Lemma 2 ([10]). The external angle γ(G_0^l, P) is given by

    γ(G_0^l, P) = π^{−(n−l+1)/2} 2^{n−l} ∫_{0}^{∞} e^{−x²} ( Π_{p=l+1}^{n} ∫_{0}^{w_p x/√σ_{1,l}} e^{−y_p²} dy_p ) dx.
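For moderate n, the expression in Lemma 2 can be evaluated directly by numerical quadrature, which provides a check on the asymptotic exponent derived next. The sketch below is an added illustration (weights and sizes are arbitrary); it writes the inner integrals in terms of the error function erf, anticipating the rewriting used in the next paragraph.

```python
import numpy as np
from scipy.special import erf
from scipy.integrate import quad

# Direct numerical evaluation of the external angle gamma(G_0^l, P) from Lemma 2.
n, l = 40, 10
f = lambda t: 1.0 + 2.0 * t
w = f(np.arange(1, n + 1) / n)
sigma_1l = np.sum(w[:l] ** 2)

def integrand(x):
    # e^{-x^2} * prod_{p=l+1}^{n} int_0^{w_p x / sqrt(sigma_{1,l})} e^{-y^2} dy,
    # with each inner integral equal to (sqrt(pi)/2) * erf(w_p x / sqrt(sigma_{1,l})).
    inner = (np.sqrt(np.pi) / 2.0) * erf(w[l:] * x / np.sqrt(sigma_1l))
    return np.exp(-x * x) * np.prod(inner)

val, _ = quad(integrand, 0.0, np.inf)
gamma = np.pi ** (-(n - l + 1) / 2.0) * 2 ** (n - l) * val
print("external angle gamma(G_0^l, P):", gamma)
```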
To simplify the expression for the external angle, we define the standard error function as

    erf(x) = (2/√π) ∫_{0}^{x} e^{−t²} dt

and rewrite the external angle as

    γ(G_0^l, P) = √(σ_{1,l}/π) ∫_{0}^{∞} e^{−σ_{1,l} x²} Π_{i=l+1}^{n} erf(w_i x) dx.

Similarly to the method used in the previous subsection,

    σ_{1,l} = Σ_{i=1}^{l} w_i² = n( ∫_0^τ f²(x) dx + o(1) ) = n( c_2(τ) + o(1) ),

where c_2(τ) is defined as c_2(τ) := ∫_0^τ f²(x) dx. Substituting this, we have

    γ(G_0^l, P) = √(σ_{1,l}/π) ∫_{0}^{∞} exp[ −n( c_2 x² − (1/n) Σ_{i=l+1}^{n} log(erf(w_i x)) ) ] dx
                = √(σ_{1,l}/π) ∫_{0}^{∞} exp[ −n ζ(x) ] dx,

where ζ(x) is defined as ζ(x) := c_2 x² − (1/n) Σ_{i=l+1}^{n} log(erf(w_i x)). Again using Laplace's method, we get

    γ(G_0^l, P) ≤ R_n exp[ −n ζ(x*) ],     (12)

where x* is the minimizer of ζ(x) and n^{−1} log(R_n) = o(1). The minimizing x* satisfies 2 c_2 x* = G_0'(x*), where

    G_0(x) := (1/n) Σ_{i=l+1}^{n} log(erf(w_i x)).

We first approximate G_0(x) as follows:

    G_0(x) = (1/n) Σ_{i=l+1}^{n} log(erf(w_i x)) = (1/n) Σ_{i=l+1}^{n} log(erf(f(i/n) x)) = ∫_τ^1 log(erf(x f(y))) dy + o(1).

So the minimizing x* can be computed up to an error of o(1) by solving the equation

    2 c_2 x* = ∫_τ^1 ( f(y) erf'(x* f(y)) / erf(x* f(y)) ) dy.     (13)

We define the external angle exponent as

    ψ_ext(τ,x) := −ζ(x)     (14)

and the optimized external angle exponent as

    ψ̄_ext(τ) := max_x ψ_ext(τ,x) = −ζ(x*) = −( c_2 x*² − ∫_τ^1 log(erf(x* f(y))) dy ),     (15)

where x* can be obtained by solving equation (13). From (12) it is clear that this function satisfies the conditions in the second part of Theorem 2. This completes the proof of both parts of the theorem.

B. Recovery Threshold for the family of leading faces

In this section we use the bounds on the asymptotic exponents of the internal and external angles from Theorem 2 to find an upper bound on P(E | x ∈ F) for F ∈ ℱ_1. The main result of this section is the following theorem.

Theorem 3. Let F = F_0^k ∈ ℱ_1 be the leading face with k = δn. There exists a function ψ̄_tot(δ), which we call the total exponent of F_0^k, such that given ε > 0, there exists n(ε) such that for all n > n(ε),

    (1/n) log( P(E | x ∈ F_0^k) ) < ψ̄_tot(δ) + ε.

The function ψ̄_tot(δ) can be computed explicitly as described in (16).

Using the decomposition (2) and the fact that β(F,G) is non-zero only if F ⊆ G, we get

    P(E | x ∈ F) ≤ 2 Σ_{l > m} Σ_{G ⊇ F, G ∈ J(l)} β(F,G) γ(G,P).

Recall that J(r) is the set of all r-dimensional faces of P. To proceed, we will need the following useful lemma (the proof can be found in the appendix).