Sequential Sensing with Model Mismatch

Ruiyang Song, Yao Xie, and Sebastian Pokutta

Ruiyang Song ([email protected]) is with the Department of Electronic Engineering, Tsinghua University, Beijing, China. Yao Xie ([email protected]) and Sebastian Pokutta ([email protected]) are with the H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA. This work is partially supported by NSF grant CMMI-1300144 and CCF-1442635.

Abstract—We characterize the performance of sequential information guided sensing, Info-Greedy Sensing [1], when there is a mismatch between the true signal model and the assumed model, which may be a sample estimate. In particular, we consider a setup where the signal is low-rank Gaussian and the measurements are taken in the directions of eigenvectors of the covariance matrix Σ in a decreasing order of eigenvalues. We establish a set of performance bounds when a mismatched covariance matrix Σ̂ is used, in terms of the gap of signal posterior entropy, as well as the additional amount of power required to achieve the same signal recovery precision. Based on this, we further study how to choose an initialization for Info-Greedy Sensing using the sample covariance matrix, or using an efficient covariance sketching scheme.

Keywords—compressed sensing, information theory, sequential methods, high-dimensional statistics, sketching algorithms

I. INTRODUCTION

Sequential compressed sensing is a promising new information acquisition and recovery technique for processing big data that arise in applications such as compressive imaging [2]–[4], power network monitoring [5], and large scale sensor networks [6]. The sequential nature of these problems arises either because the measurements are taken one after another, or because the data are obtained in a streaming fashion and must be processed in one pass.

To harvest the benefits of adaptivity in sequential compressed sensing, various algorithms have been developed (see [1] for a review).
We may classify these algorithms as (1) being agnostic about the signal distribution and, hence, using random measurements [7]–[13]; (2) exploiting additional structure of the signal (such as graphical structure [14] and tree-sparse structure [15], [16]) to design measurements; and (3) exploiting the distributional information of the signal in choosing the measurements, possibly through maximizing mutual information: the seminal Bayesian compressive sensing work [17], Gaussian mixture models (GMM) [18], [19], and our earlier work [1], which presents a general framework for information guided sensing referred to as Info-Greedy Sensing.

In this paper we consider the setup of Info-Greedy Sensing [1], as it provides certain optimality guarantees. Info-Greedy Sensing designs each subsequent measurement to maximize the mutual information conditioned on previous measurements. Conditional mutual information is a natural metric here, as it captures exclusively the useful new information between the signal and the result of the measurement, disregarding noise and what has already been learned from previous measurements. It was shown in [1] that Info-Greedy Sensing for a Gaussian signal is equivalent to choosing the sequential measurement vectors a_1, a_2, ... as the orthonormal eigenvectors of Σ in a decreasing order of eigenvalues.

In practice, we do not know the signal covariance matrix Σ and have to use a sample covariance matrix Σ̂ as an estimate. As a consequence, the measurement vectors are calculated from Σ̂ and deviate from the optimal directions. Since we almost always have to use some estimate of the signal covariance, it is important to quantify the performance of sensing algorithms under model mismatch.

In this paper, we characterize the performance of Info-Greedy Sensing for Gaussian signals [1] when the true signal covariance matrix is replaced with a proxy, which may be an estimate from direct samples or from a covariance sketching scheme. We establish a set of theoretical results including (1) relating the error in the covariance matrix ‖Σ − Σ̂‖ to the entropy of the signal posterior distribution after each sequential measurement, and thus characterizing the gap between this entropy and the entropy when the correct covariance matrix is used; (2) establishing an upper bound on the amount of additional power required to achieve the same precision of the recovered signal when an estimated covariance matrix is used; (3) if initializing Info-Greedy Sensing via a sample covariance matrix, finding the minimum number of samples required so that such an initialization achieves good performance; and (4) presenting a covariance sketching scheme to initialize Info-Greedy Sensing and finding the conditions under which such an initialization is sufficient. We also present a numerical example demonstrating the good performance of Info-Greedy Sensing compared to a batch method (where measurements are not adaptive) when there is mismatch.

Our notations are standard. Denote [n] ≜ {1, 2, ..., n}; ‖X‖ is the spectral norm of a matrix X, ‖X‖_F denotes the Frobenius norm of a matrix X, and ‖X‖_* represents the nuclear norm of a matrix X; ‖x‖ is the ℓ₂ norm of a vector x, and ‖x‖₁ is the ℓ₁ norm of a vector x; let χ²_n be the quantile function of the chi-squared distribution with n degrees of freedom; let E[x] and Var[x] denote the mean and the variance of a random variable x; X ⪰ 0 means that the matrix X is positive semi-definite.

II. PROBLEM SETUP

A typical sequential compressed sensing setup is as follows. Let x ∈ R^n be an unknown n-dimensional signal. We make K measurements of x sequentially,

$$y_k = a_k^\intercal x + w_k, \quad k = 1, \ldots, K,$$

and the power of the kth measurement is ‖a_k‖² = β_k. The goal is to recover x using the measurements {y_k}_{k=1}^K. Consider a Gaussian signal x ∼ N(0, Σ) with known zero mean and covariance matrix Σ (without loss of generality the signal has zero mean). Assume the rank of Σ is s and the signal can be low-rank, s ≪ n. Info-Greedy Sensing [1] chooses each measurement to maximize the conditional mutual information per unit power,

$$a_k \leftarrow \arg\max_a \; I[x;\, a^\intercal x + w \mid y_j, a_j, j < k]\,/\,\beta_k.$$

Algorithm 1 Info-Greedy Sensing for Gaussian signals
Require: assumed signal mean θ and covariance matrix Γ, noise variance σ², recovery accuracy ε, confidence level p
1: repeat
2:   λ ← ‖Γ‖  {largest eigenvalue}
3:   β ← (χ²_n(p)/ε² − 1/λ)σ²
4:   u ← normalized eigenvector of Γ for eigenvalue λ
5:   form measurement: a ← √β · u
6:   measure: y = a⊺x + w
7:   update mean: θ ← θ + Γa(y − a⊺θ)/(βλ + σ²)
8:   update covariance: Γ ← Γ − Γaa⊺Γ/(βλ + σ²)
9: until ‖Γ‖ ≤ ε²/χ²_n(p)  {all eigenvalues become small}
10: return the posterior mean θ as the signal estimate x̂
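For concreteness, the following is a minimal NumPy/SciPy sketch of Algorithm 1. The function name, the `measure` callback, and the numerical stopping tolerance are implementation choices and not part of the paper.

```python
import numpy as np
from scipy.stats import chi2


def info_greedy_sensing(measure, Gamma, sigma2, eps, p, max_steps=None):
    """Sketch of Algorithm 1 for a (possibly low-rank) Gaussian signal.

    measure(a) should return a noisy measurement y = a^T x + w of the unknown
    signal x; Gamma is the assumed covariance (e.g. a sample estimate of Sigma),
    sigma2 the noise variance, eps the target precision, p the confidence level.
    """
    n = Gamma.shape[0]
    theta = np.zeros(n)                        # assumed (zero) prior mean
    Gamma = Gamma.copy()
    thresh = eps**2 / chi2.ppf(p, df=n)        # stopping level for the eigenvalues
    steps = max_steps if max_steps is not None else n
    for _ in range(steps):
        lam_all, U = np.linalg.eigh(Gamma)
        lam, u = lam_all[-1], U[:, -1]         # largest eigenpair of current Gamma
        if lam <= thresh * (1 + 1e-9):         # all eigenvalues small enough
            break
        beta = (chi2.ppf(p, df=n) / eps**2 - 1.0 / lam) * sigma2   # power allocation
        a = np.sqrt(beta) * u                  # measurement vector, ||a||^2 = beta
        y = measure(a)                         # y = a^T x + w
        denom = beta * lam + sigma2            # a^T Gamma a + sigma^2
        theta = theta + Gamma @ a * (y - a @ theta) / denom        # posterior mean
        Gamma = Gamma - Gamma @ np.outer(a, a) @ Gamma / denom     # posterior covariance
    return theta                               # posterior mean as the estimate x-hat
```

In a simulation with a known true signal x, one would pass, for example, `measure = lambda a: a @ x + rng.normal(scale=np.sqrt(sigma2))`.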
Fig. 1: Parameter update in the algorithm (Σ̂_{k−1} → Σ̂_k, the algorithm parameter) and for the true distribution (Σ_{k−1} → Σ_k, the true parameter), both driven by the measurement a_k.

The goal is to use a minimum number of measurements (or minimum total power) so that the estimated signal reaches precision ε: ‖x̂ − x‖ < ε with high probability.

In [1], we devised a solution to this problem and established that Info-Greedy Sensing for a low-rank Gaussian signal measures in the directions of the eigenvectors of Σ in a decreasing order of eigenvalues, with a power allocation that depends on the noise variance, the signal recovery precision ε, and the confidence level p, as given in Algorithm 1.

Ideally, if we knew the true signal covariance, we would use its eigenvectors to form the measurements. In practice, however, we have to use an estimate of the covariance matrix, which usually contains errors. To establish performance bounds when there is a mismatch between the assumed and the true covariance matrix, we adopt as a metric the posterior entropy of the signal conditioned on previous measurement outcomes. The entropy of a Gaussian signal x ∼ N(µ, Σ) is given by

$$H[x] = \tfrac{1}{2}\ln\big((2\pi e)^n \det(\Sigma)\big).$$

Hence, the conditional mutual information is essentially the log of the determinant of the conditional covariance matrix, or equivalently the log of the volume of the ellipsoid defined by the covariance matrix. To accommodate the scenario where the covariance matrix is low-rank, we consider a modified definition of conditional entropy: the log of the volume of the ellipsoid on the low-dimensional space. Let Σ_k be the underlying true signal covariance conditioned on the previous k measurements, and denote by Σ̂_k the observed covariance matrix, which is also the output of the sequential algorithm. Assume the rank of Σ is s. Then the metric we use to track the progress of the algorithm is

$$H[x \mid y_j, a_j, j \le k] = \ln\big((2\pi e)^{s/2}\,\mathrm{Vol}(\Sigma_k)\big),$$

where Vol(Σ_k) is the volume of the ellipsoid defined by the covariance matrix Σ_k, which equals the product of its non-zero eigenvalues.

III. PERFORMANCE BOUNDS

We analyze the performance of Info-Greedy Sensing when the assumed covariance matrix Σ̂, which differs from the true signal covariance matrix Σ, is used for measurement design, i.e., Σ̂ is used to initialize Algorithm 1. Let the eigenpairs of Σ, with the eigenvalues (which can be zero) ranked from largest to smallest, be (λ₁, u₁), (λ₂, u₂), ..., (λ_n, u_n), and let the eigenpairs of Σ̂, with the eigenvalues (which can be zero) ranked from largest to smallest, be (λ̂₁, û₁), (λ̂₂, û₂), ..., (λ̂_n, û_n). Let the updated covariance matrix in Algorithm 1, starting from Σ̂ and after k measurements using {a_i}_{i=1}^k, be Σ̂_k, and let the true conditional covariance matrix of the signal after these measurements be Σ_k. The evolution of the covariance matrices in Algorithm 1 is illustrated in Fig. 1. With this notation, since each measurement is taken in the direction of the dominating eigenvector of the updated covariance matrix, (λ̂_k, û_k) is the largest eigenpair of Σ̂_{k−1}, and (λ_k, u_k) is the largest eigenpair of Σ_{k−1}. Furthermore, denote the difference between the assumed and the true conditional covariance matrices after k measurements by

$$E_k = \widehat{\Sigma}_k - \Sigma_k,$$

and let δ_k = ‖E_k‖. If the eigenvalues of E_k are e₁ ≥ e₂ ≥ ··· ≥ e_n, then δ_k = max{|e₁|, |e_n|}.
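As a quick reference for these quantities, here is a small NumPy helper, a sketch under the stated low-rank model; the eigenvalue tolerance used to decide which eigenvalues count as non-zero is an assumption.

```python
import numpy as np


def lowrank_entropy(Sigma_k, tol=1e-10):
    """H[x | y_j, a_j, j <= k] = ln((2*pi*e)^{s/2} * Vol(Sigma_k)),
    where Vol(Sigma_k) is the product of the non-zero eigenvalues."""
    eigs = np.linalg.eigvalsh(Sigma_k)
    nz = eigs[eigs > tol]                 # eigenvalues below tol are treated as zero
    s = len(nz)
    return 0.5 * s * np.log(2 * np.pi * np.e) + np.sum(np.log(nz))


def mismatch_level(Sigma_hat_k, Sigma_k):
    """delta_k = ||E_k|| with E_k = Sigma_hat_k - Sigma_k (spectral norm)."""
    E_k = Sigma_hat_k - Sigma_k
    return np.linalg.norm(E_k, 2)         # = max{|e_1|, |e_n|} for symmetric E_k
```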
A. Deterministic error

The following theorem shows that when the error ‖Σ̂ − Σ‖ is sufficiently small, the performance of Info-Greedy Sensing does not degrade much. Note, however, that if the power allocations β_k are calculated from the eigenvalues of the assumed covariance matrix Σ̂, then after K = s iterations we do not necessarily reach the desired precision ε with probability p.

Theorem 1. Assume the power allocations β_k = (χ²_n(p)/ε² − 1/λ̂_k)σ² are calculated using the eigenvalues λ̂_k of Σ̂, the noise variance σ², the recovery accuracy ε, and the confidence level p in Algorithm 1. Suppose rank(Σ) = s and the total number of measurements is K. For some constant 0 < ζ < 1, if the error satisfies

$$\|\Sigma - \widehat{\Sigma}\| \le \frac{\zeta}{4^{K+1}}\,\frac{\varepsilon^2}{\chi_n^2(p)},$$

then

$$H[x \mid y_j, a_j, j \le k] \;\le\; \frac{s}{2}\Big\{\ln\big[2\pi e\,\mathrm{tr}(\Sigma)\big] - \sum_{j=1}^{k}\ln(1/f_j)\Big\}, \tag{1}$$

where

$$f_k = 1 - \frac{1-\zeta}{s}\,\frac{\beta_k\hat{\lambda}_k}{\beta_k\hat{\lambda}_k + \sigma^2} \in (0,1), \quad k = 1, \ldots, K.$$

In the proof of Theorem 1, we use the trace of the underlying true conditional covariance matrix tr(Σ_k) as a potential function; it serves as a surrogate for the product of eigenvalues that determines the entropy, and the trace of the observed covariance matrix tr(Σ̂_k) is much easier to compute. Note that for an assumed covariance matrix Σ, after measuring in the direction of a unit-norm eigenvector u with eigenvalue λ using power β, the updated matrix takes the form

$$\Sigma - \Sigma\sqrt{\beta}u\big(\sqrt{\beta}u^\intercal\Sigma\sqrt{\beta}u + \sigma^2\big)^{-1}\sqrt{\beta}u^\intercal\Sigma \;=\; \frac{\lambda\sigma^2}{\beta\lambda + \sigma^2}uu^\intercal + \Sigma_{\perp a}, \tag{2}$$

where Σ_{⊥a} is the component of Σ in the orthogonal complement of a. Thus, the only change in the eigendecomposition of Σ is the update of the eigenvalue associated with a, from λ to λσ²/(βλ + σ²). Based on the update (2), after one measurement the trace of the covariance matrix that the algorithm keeps track of becomes

$$\mathrm{tr}(\widehat{\Sigma}_k) = \mathrm{tr}(\widehat{\Sigma}_{k-1}) - \frac{\beta_k\hat{\lambda}_k^2}{\beta_k\hat{\lambda}_k + \sigma^2}.$$

Remark 1. The upper bound on the posterior signal entropy in (1) shows that the amount of uncertainty reduction due to the kth measurement is roughly (s/2) ln(1/f_k).
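The rank-one update (2) and the resulting trace recursion can be checked numerically. The snippet below uses arbitrary test values for n, σ², and β, and verifies that only the measured eigenvalue changes, from λ to λσ²/(βλ + σ²), and that the trace drops by βλ²/(βλ + σ²).

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, beta = 6, 0.5, 2.0            # arbitrary test values

# random PSD covariance and its top eigenpair
A = rng.standard_normal((n, n))
Sigma = A @ A.T
lam_all, U = np.linalg.eigh(Sigma)
lam, u = lam_all[-1], U[:, -1]

# update (2) after measuring a = sqrt(beta) * u
a = np.sqrt(beta) * u
Sigma_new = Sigma - Sigma @ np.outer(a, a) @ Sigma / (a @ Sigma @ a + sigma2)

# the eigenvalue in direction u shrinks to lam*sigma2/(beta*lam + sigma2) ...
print(np.isclose(u @ Sigma_new @ u, lam * sigma2 / (beta * lam + sigma2)))
# ... and the trace drops by beta*lam^2/(beta*lam + sigma2)
print(np.isclose(np.trace(Sigma) - np.trace(Sigma_new),
                 beta * lam**2 / (beta * lam + sigma2)))
```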
Remark 2. Using the inequality ln(1 − x) ≤ −x for x ∈ (0, 1) in (1), we have

$$H[x \mid y_j, a_j, j \le k] \le \frac{s}{2}\ln\big[2\pi e\,\mathrm{tr}(\Sigma)\big] - \frac{1-\zeta}{2}\sum_{j=1}^{k}\frac{\beta_j\hat{\lambda}_j}{\beta_j\hat{\lambda}_j+\sigma^2} = \frac{s}{2}\ln\big[2\pi e\,\mathrm{tr}(\Sigma)\big] - \frac{k(1-\zeta)}{2} + \frac{(1-\zeta)\varepsilon^2}{2\chi_n^2(p)}\sum_{j=1}^{k}\frac{1}{\hat{\lambda}_j}.$$

On the other hand, if the true covariance matrix is used, the posterior entropy of the signal is given by

$$H_{\mathrm{ideal}}[x \mid y_j, a_j, j \le k] = \frac{1}{2}\ln\Big[(2\pi e)^s\prod_{i=1}^{s}\lambda_i\Big] - \frac{\chi_n^2(p)}{2\varepsilon^2}\sum_{j=1}^{k}\lambda_j, \tag{3}$$

where β̃_j = (χ²_n(p)/ε² − 1/λ_j)σ² is the corresponding ideal power allocation. Hence, we have

$$H[x \mid y_j, a_j, j \le k] \le H_{\mathrm{ideal}}[x \mid y_j, a_j, j \le k] + \frac{s}{2}\ln\frac{\mathrm{tr}(\Sigma)}{\sqrt[s]{\prod_{i=1}^{s}\lambda_i}} - \frac{1}{2}\sum_{j=1}^{k}\Big[\frac{\chi_n^2(p)}{\varepsilon^2}\lambda_j + (1-\zeta)\Big(1 - \frac{\varepsilon^2}{\chi_n^2(p)\hat{\lambda}_j}\Big)\Big]. \tag{4}$$

This upper bound has a nice interpretation: it characterizes the amount of uncertainty reduction with each measurement. For example, when the number of measurements required with the assumed covariance matrix equals that required with the true covariance matrix, we have λ_j ≥ ε²/χ²_n(p) and λ̂_j ≥ ε²/χ²_n(p). Hence, the third term in (4) is upper bounded by −k/2, which means that the amount of reduction in entropy is roughly 1/2 nat per measurement.

Remark 3. Consider the special case where the errors occur only in the eigenvalues of the matrix but not in the eigenspace U, i.e.,

$$\widehat{\Sigma} - \Sigma = U\,\mathrm{diag}\{e_1, \cdots, e_s\}\,U^\intercal$$

and max_{1≤i≤s} |e_i| = δ₀; then the upper bound (4) simplifies further. Suppose only the first K (K ≤ s) largest eigenvalues of Σ̂ exceed the stopping criterion ε²/χ²_n(p) required by the precision, i.e., the algorithm takes K steps in total. Then

$$H[x \mid y_j, a_j, j \le K] \le H_{\mathrm{ideal}}[x \mid y_j, a_j, j \le K] + K\ln\Big(1 + \frac{\chi_n^2(p)}{\varepsilon^2}\delta_K\Big) + \sum_{j=K+1}^{s}\ln\Big(1 + \frac{\delta_0 + \delta_K}{\lambda_j}\Big).$$

This characterizes the gap between the signal posterior entropy using the correct versus the incorrect covariance matrix after all measurements have been used.

If we allow more total power and use a different power allocation scheme than the one prescribed in Algorithm 1, we are able to reach the desired precision ε. The following theorem establishes an upper bound on the extra total power needed to reach the same precision ε, compared with the total power P_ideal required when the correct covariance matrix is used.

Theorem 2. Given the recovery precision ε, confidence level p, and rank of the true covariance matrix rank(Σ) = s, assume K ≤ s eigenvalues of Σ are larger than ε²/χ²_n(p). If

$$\|\widehat{\Sigma} - \Sigma\| \le \frac{1}{4^{s+1}}\,\frac{\varepsilon^2}{\chi_n^2(p)},$$

then to reach precision ε at confidence level p, the total power P_mismatch required by Algorithm 1 when using Σ̂ is upper bounded by

$$P_{\mathrm{mismatch}} < P_{\mathrm{ideal}} + \Big[\frac{20}{51}s + \frac{1}{272}K\Big]\frac{\chi_n^2(p)}{\varepsilon^2}\sigma^2.$$

Remark 4. In the special case when K = s eigenvalues of Σ are larger than ε²/χ²_n(p), under the condition of Theorem 2 we have the simpler upper bound

$$P_{\mathrm{mismatch}} < P_{\mathrm{ideal}} + \frac{323}{816}\,\frac{\chi_n^2(p)}{\varepsilon^2}\sigma^2 s.$$

Note that the additional power required is only linear in s, which is quite small; the remaining factors are independent of the input matrix.

Also note that when there is a mismatch in the assumed covariance matrix, better performance can be achieved by making many low-power measurements rather than a single full-power measurement, because the assumed covariance matrix is updated in between.

B. Initialization with sample covariance matrix

In practice, we usually use a sample covariance matrix for Σ̂. When the samples are Gaussian distributed, the sample covariance matrix follows a Wishart distribution. By bounding the tail probability of the Wishart distribution, we establish a lower bound on the number of samples needed to form the sample covariance matrix so that the conditions required by Theorem 1 are met with high probability and, hence, Algorithm 1 performs well with the assumed matrix Σ̂.

Corollary 1. Suppose the sample covariance matrix

$$\widehat{\Sigma} = \frac{1}{L}\sum_{i=1}^{L}\tilde{x}_i\tilde{x}_i^\intercal$$

is obtained from training samples x̃₁, ..., x̃_L drawn i.i.d. from N(0, Σ). For a target error level δ₀ > 0, if

$$L \ge 4n^{1/2}\,\mathrm{tr}(\Sigma)\Big(\frac{\|\Sigma\|}{\delta_0^2} + \frac{4}{\delta_0}\Big),$$

then ‖Σ̂ − Σ‖ ≤ δ₀ with probability exceeding 1 − 2n exp(−√n).
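A minimal sketch of the sample-covariance initialization in Corollary 1 follows; the dimension, rank, and target error δ₀ below are illustrative values, not from the paper. It draws L i.i.d. training samples with L chosen according to the corollary, forms Σ̂, and reports the spectral error.

```python
import numpy as np

rng = np.random.default_rng(1)
n, s, delta0 = 40, 3, 1.0                       # illustrative sizes and target error

Q = rng.standard_normal((n, s))
Sigma = Q @ Q.T / s                             # a rank-s "true" covariance

# sample size suggested by Corollary 1:
# L >= 4 * sqrt(n) * tr(Sigma) * (||Sigma|| / delta0^2 + 4 / delta0)
L = int(np.ceil(4 * np.sqrt(n) * np.trace(Sigma)
                * (np.linalg.norm(Sigma, 2) / delta0**2 + 4 / delta0)))

Z = rng.standard_normal((L, s))
X = Z @ Q.T / np.sqrt(s)                        # rows are i.i.d. samples from N(0, Sigma)
Sigma_hat = X.T @ X / L                         # sample covariance (known zero mean)

print(L, np.linalg.norm(Sigma_hat - Sigma, 2))  # spectral error, <= delta0 w.h.p.
```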
C. Initialization with covariance sketching

We may also use a covariance sketching scheme to form an estimate of the covariance matrix to initialize the algorithm, as illustrated in Fig. 2. Covariance sketching is based on sketches γ_i, i = 1, ..., M, of samples x̃_j, j = 1, ..., N, drawn from the signal distribution. The sketches are formed by linearly projecting the samples onto random sketching vectors b_i, i = 1, ..., M, and then computing the average energy over L repetitions. The sketching can be shown to be a linear operator B applied to the original covariance matrix Σ, as demonstrated in Appendix A. We may then recover the original covariance matrix from the sketches γ by solving the convex program

$$\widehat{\Sigma} = \arg\min_{X}\; \mathrm{tr}(X) \quad \text{subject to} \quad X \succeq 0,\;\; \|\gamma - \mathcal{B}(X)\|_1 \le \tau, \tag{5}$$

where τ is a user parameter that specifies the noise level.

Fig. 2: Diagram of covariance sketching in our setting. Each branch forms quadratic sketches [⟨b_i, ·⟩ + w_ij]², j = 1, ..., N, i = 1, ..., M, of the samples x̃₁, ..., x̃_N; the circle aggregates the quadratic sketches from the branches and computes the average.

We further establish conditions on the covariance sketching under which such an initialization for Info-Greedy Sensing is sufficient.

Theorem 3. Assume the covariance sketching setup above. Then with probability exceeding 1 − 2/n − 2/√n − 2n exp(−√n) − exp(−c₀c₁ns), the solution to (5) satisfies

$$\|\widehat{\Sigma} - \Sigma\| \le \delta_0,$$

for some δ₀ > 0, as long as for some constant c > 0 the parameters M, N, L, and τ are chosen such that

$$M \triangleq cns > c_0 ns,$$

$$N \ge 4n^{1/2}\,\mathrm{tr}(\Sigma)\Big(\frac{36c^2n^4s^2\|\Sigma\|}{\tau^2} + \frac{24cn^2s}{\tau}\Big),$$

$$L \ge \max\Big\{\frac{cs}{4n\|\Sigma\|}\sigma^2,\; \frac{1}{\sqrt{2\,\mathrm{tr}(\Sigma)\|\Sigma\|csn^3}}\sigma^2,\; \frac{6cns}{\tau}\sigma^2\Big\},$$

$$\tau = cns\,\delta_0/C_2.$$

Here c₀, c₁, C₁, and C₂ are absolute constants.
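Below is a minimal sketch of the recovery step (5). Here cvxpy is used as a generic convex-programming dependency, an implementation choice rather than something the paper prescribes; the sketches γ, the sketching vectors b_i (rows of B), and the noise-level parameter τ are assumed to be given.

```python
import numpy as np
import cvxpy as cp


def recover_covariance(gamma, B, tau):
    """Solve (5): minimize tr(X) s.t. X >= 0, ||gamma - B(X)||_1 <= tau,
    where [B(X)]_i = b_i^T X b_i and the b_i are the rows of B."""
    M, n = B.shape
    X = cp.Variable((n, n), PSD=True)
    BX = cp.sum(cp.multiply(B @ X, B), axis=1)      # vector with entries b_i^T X b_i
    problem = cp.Problem(cp.Minimize(cp.trace(X)),
                         [cp.norm(gamma - BX, 1) <= tau])
    problem.solve()
    return X.value
```

The PSD constraint makes (5) a semidefinite program, so any SDP-capable solver available to cvxpy can be used.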
IV. NUMERICAL EXAMPLE

When the assumed covariance matrix for the signal x equals its true covariance matrix, Info-Greedy Sensing is identical to the batch method [19] (the batch method measures using the largest eigenvectors of the signal covariance matrix). However, when there is a mismatch between the two, Info-Greedy Sensing outperforms the batch method due to its adaptivity, as shown by the example in Fig. 3. Info-Greedy Sensing also outperforms the sensing algorithm in which the a_i are chosen to be random Gaussian vectors with the same power allocation, as it uses prior knowledge (albeit imprecise) about the signal distribution.

Fig. 3: Recovery error ‖x − x̂‖₂ (log scale) over 1000 ordered trials for the random, batch, and Info-Greedy methods, when sensing a low-rank Gaussian signal of dimension n = 500 with about 5% of the eigenvalues non-zero, under a mismatch between the assumed and true covariance matrices: Σ_{x,assumed} = Σ_{x,true} + ee⊺, where e ∼ N(0, I), using 20 measurements (ε = 0.1). The batch method measures using the largest eigenvectors of Σ_{x,assumed}, while Info-Greedy Sensing updates Σ_{x,assumed} within the algorithm. Info-Greedy Sensing is more robust to mismatch than the batch method.

V. DISCUSSION

In high-dimensional problems, a commonly used low-dimensional signal model for x assumes the signal lies in a subspace plus Gaussian noise, which corresponds to the case considered in this paper where the signal covariance is low-rank. A more general model is the Gaussian mixture model (GMM), which can be viewed as a model for a signal lying in a union of multiple subspaces plus Gaussian noise, and which has been widely used in image and video analysis, among others. Our analysis for a low-rank Gaussian signal can be easily extended to an analysis of a low-rank Gaussian mixture model. Such results for GMMs are quite general and can be used for an arbitrary signal distribution. In fact, parameterizing via low-rank GMMs is a popular way to approximate complex densities for high-dimensional data. Hence, we may be able to couple the results for Info-Greedy Sensing of GMMs with the recently developed methods of scalable multi-scale density estimation based on empirical Bayes [20] to create powerful tools for information guided sensing for general signal models. We may also be able to obtain performance guarantees using multiplicative weight update techniques together with the error bounds in [20].

ACKNOWLEDGEMENT

This work is partially supported by an NSF CAREER Award CMMI-1452463 and an NSF grant CCF-1442635. Ruiyang Song was visiting the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology while working on this paper.

REFERENCES

[1] G. Braun, S. Pokutta, and Y. Xie, "Info-greedy sequential adaptive compressed sensing," to appear in IEEE J. Sel. Top. Sig. Proc., 2014.
[2] A. Ashok, P. Baheti, and M. A. Neifeld, "Compressive imaging system design using task-specific information," Applied Optics, vol. 47, no. 25, pp. 4457–4471, 2008.
[3] J. Ke, A. Ashok, and M. Neifeld, "Object reconstruction from adaptive compressive measurements in feature-specific imaging," Applied Optics, vol. 49, no. 34, pp. 27–39, 2010.
[4] A. Ashok and M. A. Neifeld, "Compressive imaging: hybrid measurement basis design," J. Opt. Soc. Am. A, vol. 28, no. 6, pp. 1041–1050, 2011.
[5] W. Boonsong and W. Ismail, "Wireless monitoring of household electrical power meter using embedded RFID with wireless sensor network platform," Int. J. Distributed Sensor Networks, vol. 2014, Article ID 876914, 10 pages, 2014.
[6] B. Zhang, X. Cheng, N. Zhang, Y. Cui, Y. Li, and Q. Liang, "Sparse target counting and localization in sensor networks based on compressive sensing," in IEEE Int. Conf. Computer Communications (INFOCOM), pp. 2255–2258, 2014.
[7] J. Haupt, R. Nowak, and R. Castro, "Adaptive sensing for sparse signal recovery," in IEEE 13th Digital Signal Processing Workshop and 5th IEEE Signal Processing Education Workshop (DSP/SPE), pp. 702–707, 2009.
[8] M. A. Davenport and E. Arias-Castro, "Compressive binary search," arXiv:1202.0937v2, 2012.
[9] A. Tajer and H. V. Poor, "Quick search for rare events," arXiv:1210.4061v1, 2012.
[10] D. Malioutov, S. Sanghavi, and A. Willsky, "Sequential compressed sensing," IEEE J. Sel. Topics Sig. Proc., vol. 4, pp. 435–444, April 2010.
[11] J. Haupt, R. Baraniuk, R. Castro, and R. Nowak, "Sequentially designed compressed sensing," in Proc. IEEE/SP Workshop on Statistical Signal Processing, 2012.
[12] S. Jain, A. Soni, and J. Haupt, "Compressive measurement designs for estimating structured signals in structured clutter: A Bayesian experimental design approach," arXiv:1311.5599v1, 2013.
[13] M. L. Malloy and R. Nowak, "Near-optimal adaptive compressed sensing," arXiv:1306.6239v1, 2013.
[14] A. Krishnamurthy, J. Sharpnack, and A. Singh, "Recovering graph-structured activations using adaptive compressive measurements," in Annual Asilomar Conference on Signals, Systems, and Computers, Sept. 2013.
[15] T. Ervin and R. Castro, "Adaptive sensing for estimation of structure sparse signals," arXiv:1311.7118, 2013.
[16] S. Akshay and J. Haupt, "On the fundamental limits of recovering tree sparse vectors from noisy linear measurements," IEEE Trans. Info. Theory, vol. 60, no. 1, pp. 133–149, 2014.
[17] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Trans. Sig. Proc., vol. 56, no. 6, pp. 2346–2356, 2008.
[18] J. M. Duarte-Carvajalino, G. Yu, L. Carin, and G. Sapiro, "Task-driven adaptive statistical compressive sensing of Gaussian mixture models," IEEE Trans. Sig. Proc., vol. 61, no. 3, pp. 585–600, 2013.
[19] W. Carson, M. Chen, R. Calderbank, and L. Carin, "Communication inspired projection design with application to compressive sensing," SIAM J. Imaging Sciences, 2012.
[20] Y. Wang, A. Canale, and D. Dunson, "Scalable multiscale density estimation," arXiv:1410.7692, 2014.
[21] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory. Academic Press, Inc., 1990.
[22] Y. Chen, Y. Chi, and A. J. Goldsmith, "Exact and stable covariance estimation from quadratic sampling via convex programming," IEEE Trans. Info. Theory, in revision, 2013.
[23] S. Zhu, "A short note on the tail bound of Wishart distribution," arXiv:1212.5860, 2012.
APPENDIX A
COVARIANCE SKETCHING

We consider the following setup for covariance sketching. Suppose we are able to form measurements of the form y = a⊺x + w, as in the Info-Greedy Sensing algorithm. Suppose there are N copies of the Gaussian signal we would like to sketch, x̃₁, ..., x̃_N, drawn i.i.d. from N(0, Σ), and we sketch using M random vectors b₁, ..., b_M. For each fixed sketching vector b_i and each fixed copy x̃_j of the signal, we acquire L noisy realizations of the projection,

$$y_{ijl} = b_i^\intercal\tilde{x}_j + w_{ijl}, \quad l = 1, \ldots, L.$$

We choose the random sketching vectors b_i to be i.i.d. Gaussian with zero mean and identity covariance matrix. We then average y_{ijl} over the realizations l = 1, ..., L to form the ith sketch y_{ij} of a single copy x̃_j:

$$y_{ij} = b_i^\intercal\tilde{x}_j + \underbrace{\frac{1}{L}\sum_{l=1}^{L}w_{ijl}}_{w_{ij}}.$$

The averaging is introduced to suppress measurement noise and can be viewed as a generalization of sketching from just one sample; the averaged noise w_ij = (1/L)Σ_{l=1}^L w_ijl is distributed as N(0, σ²/L). We then use the average energy of the sketches as our data γ_i, i = 1, ..., M, for covariance recovery:

$$\gamma_i \triangleq \frac{1}{N}\sum_{j=1}^{N}y_{ij}^2.$$

Note that γ_i can be expanded as

$$\gamma_i = \mathrm{tr}(\widehat{\Sigma}_N b_i b_i^\intercal) + \frac{2}{N}\sum_{j=1}^{N}w_{ij}b_i^\intercal\tilde{x}_j + \frac{1}{N}\sum_{j=1}^{N}w_{ij}^2, \tag{6}$$

where

$$\widehat{\Sigma}_N = \frac{1}{N}\sum_{j=1}^{N}\tilde{x}_j\tilde{x}_j^\intercal$$

is the maximum likelihood estimate of Σ (and is also unbiased). We can write (6) in vector-matrix notation as follows. Let γ = [γ₁, ..., γ_M]⊺ and define a linear operator B: R^{n×n} → R^M such that [B(X)]_i = tr(X b_i b_i⊺). Then (6) is a linear measurement of the true covariance matrix Σ,

$$\gamma = \mathcal{B}(\Sigma) + \eta,$$

where η ∈ R^M collects all the error terms and corresponds to the noise in the covariance sketching measurements, with ith entry

$$\eta_i = b_i^\intercal\big(\widehat{\Sigma}_N - \Sigma\big)b_i + \frac{2}{N}\sum_{j=1}^{N}w_{ij}b_i^\intercal\tilde{x}_j + \frac{1}{N}\sum_{j=1}^{N}w_{ij}^2.$$

The ℓ₁ norm of the error term can be further bounded as

$$\|\eta\|_1 = \sum_{i=1}^{M}|\eta_i| \le \|\widehat{\Sigma}_N - \Sigma\|\,b + 2\sum_{i=1}^{M}|z_i| + w,$$

where

$$b = \sum_{i=1}^{M}\|b_i\|^2, \quad \mathbb{E}[b] = Mn, \quad \mathrm{Var}[b] = 2Mn,$$

$$z_i = \frac{1}{N}\sum_{j=1}^{N}w_{ij}b_i^\intercal\tilde{x}_j, \quad \mathbb{E}[z_i] = 0, \quad \mathrm{Var}[z_i] = \frac{\sigma^2\,\mathrm{tr}(\Sigma)}{NL},$$

$$w = \frac{1}{N}\sum_{i=1}^{M}\sum_{j=1}^{N}w_{ij}^2, \quad \mathbb{E}[w] = \frac{M\sigma^2}{L}, \quad \mathrm{Var}[w] = \frac{2M\sigma^4}{NL^2}.$$

We may then recover the true covariance matrix from the sketches γ by solving the convex optimization problem (5).
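For completeness, the following sketch mirrors the data-acquisition side described above: each sample is projected onto the random vectors b_i, the L noisy realizations are averaged to suppress noise, and the squared, averaged sketches are averaged over the N samples to form γ_i. The array shapes and the single three-dimensional noise array are implementation assumptions chosen for clarity; the resulting γ and B can be fed to the recovery program (5).

```python
import numpy as np


def form_sketches(samples, B, sigma2, L, rng):
    """Form covariance sketches gamma_i as in Appendix A.

    samples: (N, n) array whose rows are the signal copies x_1, ..., x_N
    B:       (M, n) array whose rows are the random sketching vectors b_1, ..., b_M
    Each projection b_i^T x_j is observed L times with N(0, sigma2) noise,
    the L realizations are averaged, and gamma_i is the average energy over j.
    """
    N, n = samples.shape
    M = B.shape[0]
    proj = samples @ B.T                                    # (N, M) entries b_i^T x_j
    noise = rng.standard_normal((N, M, L)) * np.sqrt(sigma2)
    y = proj[:, :, None] + noise                            # y_{ijl}
    y_bar = y.mean(axis=2)                                  # average over L realizations
    gamma = (y_bar ** 2).mean(axis=0)                       # average energy over N samples
    return gamma                                            # length-M vector of sketches
```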
APPENDIX B
BACKGROUND

Lemma 1 ([21]). Let Σ, Σ̂ ∈ R^{p×p} be symmetric with eigenvalues λ₁ ≥ ··· ≥ λ_p and λ̂₁ ≥ ··· ≥ λ̂_p, respectively, and let E = Σ̂ − Σ have eigenvalues e₁ ≥ ··· ≥ e_p. Then for each i ∈ {1, ..., p},

$$\hat{\lambda}_i \in [\lambda_i + e_p,\; \lambda_i + e_1].$$

Lemma 2 ([22]). Let A: R^{n×n} → R^m denote the linear operator A(X) = {a_i⊺ X a_i}_{i=1}^m for X ∈ R^{n×n}. Suppose the measurement is contaminated by noise η ∈ R^m, i.e., Y = A(Σ) + η, and assume ‖η‖₁ ≤ ε₁. Then with probability exceeding 1 − exp(−c₁m), the solution Σ̂ to the trace minimization (5) satisfies

$$\|\widehat{\Sigma} - \Sigma\|_F \le C_1\frac{\|\Sigma - \Sigma_r\|_*}{\sqrt{r}} + C_2\frac{\varepsilon_1}{m}$$

for all Σ ∈ R^{n×n}, provided that m > c₀nr, where c₀, c₁, C₁, and C₂ are absolute constants and Σ_r is the best rank-r approximation of Σ. When Σ is exactly rank-r,

$$\|\widehat{\Sigma} - \Sigma\|_F \le C_2\frac{\varepsilon_1}{m}.$$

Lemma 3 ([23]). If X ∈ R^{n×n} ∼ W_n(N, Σ), then for t > 0,

$$\mathbb{P}\Big\{\Big\|\tfrac{1}{N}X - \Sigma\Big\| \ge \Big(\sqrt{\tfrac{2t(\theta+1)}{N}} + \tfrac{2t\theta}{N}\Big)\|\Sigma\|\Big\} \le 2n\exp(-t),$$

where θ = tr(Σ)/‖Σ‖.

APPENDIX C
PROOFS

Lemma 4. Suppose the power of the measurement in the kth step is β_k. If δ_{k−1} ≤ 3σ²/(4β_k), then δ_k ≤ 4δ_{k−1}.

Proof: Let Â_k = a_k a_k⊺, so that ‖Â_k‖ = β_k. Subtracting the covariance updates for Σ_k and Σ̂_k gives

$$E_k = E_{k-1} + \frac{\Sigma_{k-1}a_ka_k^\intercal\Sigma_{k-1}}{a_k^\intercal\Sigma_{k-1}a_k + \sigma^2} - \frac{\hat{\lambda}_k a_ka_k^\intercal\widehat{\Sigma}_{k-1}}{\beta_k\hat{\lambda}_k + \sigma^2}.$$

Using Σ̂_{k−1}Â_k = λ̂_kÂ_k, a_k⊺Σ_{k−1}a_k = β_kλ̂_k − a_k⊺E_{k−1}a_k, and ‖Â_kΣ̂_{k−1}‖ = β_kλ̂_k, and bounding a_k⊺E_{k−1}a_k ≤ β_kδ_{k−1}, ‖Â_kE_{k−1}‖ ≤ β_kδ_{k−1}, and ‖E_{k−1}Â_kE_{k−1}‖ ≤ β_kδ_{k−1}², we obtain

$$\delta_k \le \delta_{k-1} + \frac{\beta_k^2\hat{\lambda}_k^2\,\delta_{k-1}}{(\beta_k\hat{\lambda}_k+\sigma^2)(\beta_k\hat{\lambda}_k+\sigma^2-\beta_k\delta_{k-1})} + \frac{\beta_k\big[2\hat{\lambda}_k\delta_{k-1}+\delta_{k-1}^2\big]}{\beta_k\hat{\lambda}_k+\sigma^2-\beta_k\delta_{k-1}} \le \Big(1 + \frac{3\beta_k\hat{\lambda}_k}{\beta_k\hat{\lambda}_k+\sigma^2-\beta_k\delta_{k-1}}\Big)\delta_{k-1} + \frac{\beta_k}{\beta_k\hat{\lambda}_k+\sigma^2-\beta_k\delta_{k-1}}\delta_{k-1}^2.$$

Since δ_{k−1} ≤ 3σ²/(4β_k), this yields δ_k ≤ 4δ_{k−1}. ∎

Lemma 5. Consider a positive semi-definite matrix X ∈ R^{n×n} and h ∈ R^n. If

$$Y = X - \frac{1}{h^\intercal Xh + \sigma^2}Xhh^\intercal X,$$

then rank(X) = rank(Y).

Proof: Clearly, Yx = 0 for all x ∈ ker(X), i.e., ker(X) ⊂ ker(Y). Apply a decomposition X = Q⊺Q of the positive semi-definite matrix X. For x ∈ ker(Y), let b = Qh and z = Qx. If b = 0, then Y = X; otherwise, when b ≠ 0, we have

$$0 = x^\intercal Yx = z^\intercal z - \frac{z^\intercal bb^\intercal z}{b^\intercal b + \sigma^2}.$$

Thus,

$$z^\intercal z = \frac{z^\intercal bb^\intercal z}{b^\intercal b + \sigma^2} \le \frac{b^\intercal b}{b^\intercal b + \sigma^2}\,z^\intercal z.$$

Therefore z = 0, i.e., x ∈ ker(X), so ker(Y) ⊂ ker(X). This shows ker(X) = ker(Y), which leads to rank(X) = rank(Y). ∎

Lemma 6. If δ_{k−1} ≤ λ̂_k/2, the true conditional covariance matrix Σ_k of the signal x conditioned on the measurements y₁, ..., y_k is related to the previous iteration as follows:

$$\mathrm{tr}(\Sigma_k) \le \mathrm{tr}(\Sigma_{k-1}) - \frac{\beta_k\hat{\lambda}_k^2}{\beta_k\hat{\lambda}_k + \sigma^2} + \frac{3\beta_k\hat{\lambda}_k\,\delta_{k-1}}{\beta_k\hat{\lambda}_k + \sigma^2 - \beta_k\delta_{k-1}}.$$

Proof: Let Â_k = a_k a_k⊺. Expanding both covariance updates and using Σ̂_{k−1}Â_k = λ̂_kÂ_k gives

$$E_k = E_{k-1} + \frac{a_k^\intercal E_{k-1}a_k}{(\beta_k\hat{\lambda}_k+\sigma^2)(\beta_k\hat{\lambda}_k+\sigma^2-a_k^\intercal E_{k-1}a_k)}\hat{\lambda}_k^2\widehat{A}_k - \frac{\hat{\lambda}_k\big(\widehat{A}_kE_{k-1}+E_{k-1}\widehat{A}_k\big)}{\beta_k\hat{\lambda}_k+\sigma^2-a_k^\intercal E_{k-1}a_k} + \frac{E_{k-1}\widehat{A}_kE_{k-1}}{\beta_k\hat{\lambda}_k+\sigma^2-a_k^\intercal E_{k-1}a_k}.$$

Note that rank(Â_k) = 1, so rank(Â_kE_{k−1}) ≤ 1 and it has at most one nonzero eigenvalue; hence

$$|\mathrm{tr}(\widehat{A}_kE_{k-1})| = |\mathrm{tr}(E_{k-1}\widehat{A}_k)| = \|\widehat{A}_kE_{k-1}\| \le \|\widehat{A}_k\|\,\|E_{k-1}\| = \beta_k\delta_{k-1}.$$

Since E_{k−1} is symmetric and Â_k is positive semi-definite, tr(E_{k−1}Â_kE_{k−1}) ≥ 0. Taking traces therefore gives

$$\mathrm{tr}(E_k) = \mathrm{tr}(\widehat{\Sigma}_k) - \mathrm{tr}(\Sigma_k) \ge \mathrm{tr}(E_{k-1}) - \frac{\beta_k\hat{\lambda}_k\big(3\beta_k\hat{\lambda}_k + 2\sigma^2\big)\delta_{k-1}}{(\beta_k\hat{\lambda}_k+\sigma^2)(\beta_k\hat{\lambda}_k+\sigma^2-\beta_k\delta_{k-1})}.$$

Therefore,

$$\mathrm{tr}(\Sigma_k) \le \mathrm{tr}(\Sigma_{k-1}) - \frac{\beta_k\hat{\lambda}_k^2}{\beta_k\hat{\lambda}_k+\sigma^2} + \frac{3\beta_k\hat{\lambda}_k\,\delta_{k-1}}{\beta_k\hat{\lambda}_k+\sigma^2-\beta_k\delta_{k-1}}. \qquad\blacksquare$$

Lemma 7. Denote θ = tr(Σ)/‖Σ‖ ≥ 1 and rank(Σ) = s, and let M = cns. If

$$N \ge 4n^{1/2}\,\mathrm{tr}(\Sigma)\Big(\frac{36c^2n^4s^2\|\Sigma\|}{\tau^2} + \frac{24cn^2s}{\tau}\Big)$$

and

$$L \ge \max\Big\{\frac{cs}{4n\|\Sigma\|}\sigma^2,\; \frac{1}{\sqrt{2\,\mathrm{tr}(\Sigma)\|\Sigma\|csn^3}}\sigma^2,\; \frac{6cns}{\tau}\sigma^2\Big\},$$

then with probability exceeding 1 − 2/n − 2/√n − 2n exp(−√n) we have ‖η‖₁ ≤ τ.

Proof: From Chebyshev's inequality,

$$\mathbb{P}\Big\{|z_i| < \frac{\tau}{6M}\Big\} \ge 1 - \frac{36M^2\sigma^2\,\mathrm{tr}(\Sigma)}{NL\tau^2}, \qquad \mathbb{P}\Big\{w < \frac{M\sigma^2}{L} + \frac{\tau}{6}\Big\} \ge 1 - \frac{72\sigma^4M}{NL^2\tau^2}, \qquad \mathbb{P}\big\{b < (M+\sqrt{M})n\big\} \ge 1 - \frac{2}{n}.$$

Let δ_Σ = τ/[3n(M + √M)]. When

$$N \ge 4n^{1/2}\,\mathrm{tr}(\Sigma)\Big(\frac{36n^2M^2\|\Sigma\|}{\tau^2} + \frac{24nM}{\tau}\Big),$$

Lemma 3 gives

$$\mathbb{P}\big\{\|\widehat{\Sigma}_N - \Sigma\| \le \delta_\Sigma\big\} \ge \mathbb{P}\Big\{\|\widehat{\Sigma}_N - \Sigma\| \le \Big(\sqrt{\frac{2n^{1/2}(\theta+1)}{N}} + \frac{2\theta n^{1/2}}{N}\Big)\|\Sigma\|\Big\} > 1 - 2n\exp(-\sqrt{n}),$$

and when

$$L \ge \max\Big\{\frac{cs}{4n\|\Sigma\|}\sigma^2,\; \frac{1}{\sqrt{2\,\mathrm{tr}(\Sigma)\|\Sigma\|csn^3}}\sigma^2,\; \frac{6cns}{\tau}\sigma^2\Big\},$$

we have

$$\mathbb{P}\Big\{|z_i| < \frac{\tau}{6M}\Big\} \ge 1 - \frac{1}{M\sqrt{n}}, \qquad \mathbb{P}\Big\{w < \frac{\tau}{3}\Big\} \ge 1 - \frac{1}{\sqrt{n}}, \qquad \mathbb{P}\big\{|b| < (M+\sqrt{M})n\big\} \ge 1 - \frac{2}{n}.$$

Therefore, ‖η‖₁ ≤ τ holds with probability at least 1 − 2/n − 2/√n − 2n exp(−√n). ∎
Proof of Theorem 1: Recall that λ̂_k ≥ ε²/χ²_n(p) for k = 1, ..., K. With Lemma 4, we can show that for some constant 0 < ζ < 1, if δ₀ ≤ ζε²/(4^{K+1}χ²_n(p)), then for the first K measurements

$$\delta_k \le \frac{1}{4^{K-k}}\,\frac{\zeta\varepsilon^2}{4\chi_n^2(p)} \le \frac{1}{4^{K-k}}\,\frac{3\sigma^2}{4\beta_k}, \quad k = 1, \ldots, K.$$

Applying Lemma 6, we have

$$\mathrm{tr}(\Sigma_k) \le \mathrm{tr}(\Sigma_{k-1}) - (1-\zeta)\frac{\beta_k\hat{\lambda}_k^2}{\beta_k\hat{\lambda}_k+\sigma^2} \le \mathrm{tr}(\Sigma_{k-1}) - \frac{1-\zeta}{s}\,\frac{\beta_k\hat{\lambda}_k}{\beta_k\hat{\lambda}_k+\sigma^2}\,\mathrm{tr}(\Sigma_{k-1}).$$

Recalling that f_k = 1 − (1−ζ)/s · β_kλ̂_k/(β_kλ̂_k + σ²), this gives tr(Σ_k) ≤ f_k tr(Σ_{k−1}), and subsequently

$$\mathrm{tr}(\Sigma_k) \le \Big(\prod_{j=1}^{k}f_j\Big)\,\mathrm{tr}(\Sigma_0).$$

Lemma 5 shows that the rank of the covariance matrix is not changed by the sequential updates: rank(Σ₁) = ··· = rank(Σ_k) = s. Hence, we may decompose Σ_k = QQ⊺ with Q ∈ R^{n×s} a full-rank matrix; then Vol(Σ_k) = det(Q⊺Q), and since tr(Q⊺Q) = tr(QQ⊺),

$$\mathrm{Vol}^2(\Sigma_k) = \det(Q^\intercal Q) \overset{(1)}{\le} \prod_{j=1}^{s}(Q^\intercal Q)_{jj} \overset{(2)}{\le} \Big(\frac{\mathrm{tr}(Q^\intercal Q)}{s}\Big)^s = \Big(\frac{\mathrm{tr}(\Sigma_k)}{s}\Big)^s,$$

where (1) follows from Hadamard's inequality and (2) from the arithmetic–geometric mean inequality. Finally, we can bound the conditional entropy of the signal as

$$H[x \mid y_j, a_j, j\le k] = \ln\big((2\pi e)^{s/2}\mathrm{Vol}(\Sigma_k)\big) \le \frac{s}{2}\ln\Big\{2\pi e\Big(\prod_{j=1}^{k}f_j\Big)\mathrm{tr}(\Sigma_0)\Big\}. \qquad\blacksquare$$

Proof of Corollary 1: Let θ = tr(Σ)/‖Σ‖ ≥ 1. For a constant δ₀ > 0, when

$$L \ge 4n^{1/2}\,\mathrm{tr}(\Sigma)\Big(\frac{\|\Sigma\|}{\delta_0^2} + \frac{4}{\delta_0}\Big),$$

Lemma 3 gives

$$\mathbb{P}\big\{\|\widehat{\Sigma} - \Sigma\| \le \delta_0\big\} \ge \mathbb{P}\Big\{\|\widehat{\Sigma} - \Sigma\| \le \Big(\sqrt{\frac{2n^{1/2}(\theta+1)}{L}} + \frac{2\theta n^{1/2}}{L}\Big)\|\Sigma\|\Big\} > 1 - 2n\exp(-\sqrt{n}). \qquad\blacksquare$$

Proof of Theorem 2: Recall that rank(Σ) = s and λ_{s+1}(Σ) = ··· = λ_n(Σ) = 0. Notice that in each iteration, the eigenvalue of Σ̂_k in the direction of a_k, which corresponds to the largest eigenvalue of Σ̂_{k−1}, is eliminated below the threshold. Therefore, as long as the sequential algorithm continues, the largest eigenvalue of Σ̂_k is exactly the (k+1)th largest eigenvalue of Σ̂. Now that δ₀ ≤ ε²/(4^{s+1}χ²_n(p)), Lemma 1 and Lemma 4 give

$$|\hat{\lambda}_k - \lambda_k(\Sigma)| \le \delta_0, \quad k = 1, \ldots, s, \qquad |\hat{\lambda}_j| \le \delta_0 \le \frac{\varepsilon^2}{\chi_n^2(p)} - \delta_s, \quad j = s+1, \ldots, n.$$

Notice that in the ideal case with no perturbation, each measurement decreases the eigenvalue of a particular direction to ε²/χ²_n(p). Suppose in the ideal scenario the algorithm stops after K ≤ s steps. Hence

$$\lambda_1(\Sigma) \ge \cdots \ge \lambda_K(\Sigma) > \frac{\varepsilon^2}{\chi_n^2(p)}, \qquad \lambda_{K+1}(\Sigma), \ldots, \lambda_s(\Sigma) \le \frac{\varepsilon^2}{\chi_n^2(p)},$$

and the power needed in the ideal case is

$$P_{\mathrm{ideal}} = \sum_{k=1}^{K}\Big(\frac{\chi_n^2(p)}{\varepsilon^2} - \frac{1}{\lambda_k(\Sigma)}\Big)\sigma^2.$$

In the mismatched case, for the first K measurement steps, 1 ≤ k ≤ K, we choose the power

$$\beta_k = \sigma^2\Big(\frac{1}{\frac{\varepsilon^2}{\chi_n^2(p)} - \delta_s} - \frac{1}{\hat{\lambda}_k}\Big),$$

so that

$$\frac{\sigma^2\hat{\lambda}_{k-1}}{\beta_{k-1}\hat{\lambda}_{k-1} + \sigma^2} = \frac{\varepsilon^2}{\chi_n^2(p)} - \delta_s.$$

For the steps K+1 ≤ k ≤ s,

$$\beta_k = \max\Big\{0,\; \sigma^2\Big(\frac{1}{\frac{\varepsilon^2}{\chi_n^2(p)}-\delta_s} - \frac{1}{\hat{\lambda}_k}\Big)\Big\} \le \sigma^2\Big(\frac{1}{\frac{\varepsilon^2}{\chi_n^2(p)}-\delta_s} - \frac{1}{\frac{\varepsilon^2}{\chi_n^2(p)}+\delta_0}\Big) \le \sigma^2\frac{(4^s+1)\delta_0}{\big(\frac{\varepsilon^2}{\chi_n^2(p)}+\delta_0\big)\big(\frac{\varepsilon^2}{\chi_n^2(p)}-4^s\delta_0\big)} \le \frac{20}{51}\frac{\chi_n^2(p)}{\varepsilon^2}\sigma^2.$$

With Lemma 1, all eigenvalues of Σ_K are no greater than ε²/χ²_n(p) − δ_s + λ₁(E_s) ≤ ε²/χ²_n(p), and the total power satisfies

$$P_{\mathrm{mismatch}} = \sum_{k=1}^{s}\beta_k \le \sigma^2\Big\{\sum_{k=1}^{K}\Big(\frac{1}{\frac{\varepsilon^2}{\chi_n^2(p)}-\delta_s} - \frac{1}{\hat{\lambda}_k}\Big) + \frac{20(s-K)}{51}\frac{\chi_n^2(p)}{\varepsilon^2}\Big\}.$$

To achieve precision ε at confidence level p, the extra power needed is therefore upper bounded as

$$P_{\mathrm{mismatch}} - P_{\mathrm{ideal}} \le \sigma^2\Big\{\sum_{k=1}^{K}\Big(\frac{1}{3}\frac{\chi_n^2(p)}{\varepsilon^2} + \frac{\delta_0}{\lambda_k^2}\Big) + \frac{20(s-K)}{51}\frac{\chi_n^2(p)}{\varepsilon^2}\Big\} \le \sigma^2\Big\{\frac{1}{4^{s+1}}\frac{\varepsilon^2}{\chi_n^2(p)}\sum_{k=1}^{K}\frac{1}{\lambda_k^2} + \frac{20s-3K}{51}\frac{\chi_n^2(p)}{\varepsilon^2}\Big\} < \Big(\frac{20}{51}s - \Big(\frac{1}{17}-\frac{1}{4^{s+1}}\Big)K\Big)\frac{\chi_n^2(p)}{\varepsilon^2}\sigma^2 \le \Big(\frac{20}{51}s + \frac{1}{272}K\Big)\frac{\chi_n^2(p)}{\varepsilon^2}\sigma^2. \qquad\blacksquare$$

Proof of Theorem 3: Let θ = tr(Σ)/‖Σ‖ ≥ 1. With Lemma 7, let τ = Mδ₀/C₂; the choice of M, N, and L ensures that ‖η‖₁ ≤ Mδ₀/C₂ with probability at least 1 − 2/n − 2/√n − 2n exp(−√n). By applying Lemma 2 and noting that the rank of Σ is exactly s, we have ‖Σ̂ − Σ‖_F ≤ δ₀.
Therefore, with probability exceeding 1 − 2/n − 2/√n − 2n exp(−√n) − exp(−c₀c₁ns),

$$\|\widehat{\Sigma} - \Sigma\| \le \|\widehat{\Sigma} - \Sigma\|_F \le \delta_0. \qquad\blacksquare$$
