Adaptive-Rate Compressive Sensing Using Side Information

Garrett Warnell, Sourabh Bhattacharya, Rama Chellappa, and Tamer Başar

G. Warnell and R. Chellappa are with the University of Maryland College Park, College Park, MD. S. Bhattacharya is with Iowa State University, Ames, IA. T. Başar is with the University of Illinois Urbana-Champaign, Urbana, IL.

arXiv:1401.0583v1 [cs.CV] 3 Jan 2014

Abstract: We provide two novel adaptive-rate compressive sensing (CS) strategies for sparse, time-varying signals using side information. Our first method utilizes extra cross-validation measurements, and the second one exploits extra low-resolution measurements. Unlike the majority of current CS techniques, we do not assume that we know an upper bound on the number of significant coefficients that comprise the images in the video sequence. Instead, we use the side information to predict the number of significant coefficients in the signal at the next time instant. For each image in the video sequence, our techniques specify a fixed number of spatially-multiplexed CS measurements to acquire, and adjust this quantity from image to image. Our strategies are developed in the specific context of background subtraction for surveillance video, and we experimentally validate the proposed methods on real video sequences.

Index Terms: Compressive sensing, cross validation, opportunistic sensing, background subtraction

I. INTRODUCTION

Visual surveillance is a task that often involves collecting a large amount of data in search of information contained in relatively small segments of video. For example, a surveillance system tasked with intruder detection will often spend most of its time collecting observations of a scene in which no intruders are present. Without any such foreground objects, the corresponding surveillance video is useless: it is only the portions of video that depict these unexpected objects in the environment that are useful for surveillance. However, because it is unknown when such objects will appear, many systems gather the same amount of data regardless of scene content. This static approach to sensing is wasteful in that resources are spent collecting unimportant data. However, it is not immediately clear how to efficiently acquire useful data since the periods of scene activity are unknown in advance. If this information were available a priori, a better scheme would be to collect data only during times when foreground objects are present.

In any attempt to do so, the system must make some sort of real-time decision regarding scene activity. However, such a decision can be made only if real-time data to that effect is available. We shall refer to such data as side information. Broadly, this information can come from two sources: a secondary modality and/or the primary video sensor itself. In this paper, we develop two adaptive sensing schemes that exploit side information that comes from an example of each. Our first strategy employs a single video sensor to continuously make compressive observations that are simultaneously used to infer both the foreground and the scene activity. The second adaptive method we present determines scene activity using observations that come from a secondary visual sensor. Both methods utilize a compressive sensing (CS) [1] [2] [3] [4] [5] camera as the primary modality. While many such sensors are beginning to emerge [6], our methods are specifically developed for a fast variant of a spatially multiplexing camera such as the single-pixel camera [7] [8].

In this paper, we consider the following basic scenario: a CS camera is tasked with observing a region for the purpose of obtaining foreground video. Since the foreground often occupies only a relatively small number of pixels, Cevher et al. [9] have shown that a small number of compressive measurements provided by this camera are sufficient to ensure that the foreground can be accurately inferred. However, the solution provided in that work implicitly relies on an assumption that is pervasive in the CS literature: that an upper bound on the sparsity (number of significant components) of the signal(s) under observation is known. Such an assumption enables the use of a static measurement process for each image in the video sequence. However, foreground video is a dynamic entity: changes in the number and appearance of foreground objects can cause large changes in sparsity with respect to time. Underestimating this quantity will lead to the use of a CS system that will provide too few measurements for an accurate reconstruction. Overestimating signal sparsity, on the other hand, will require the collection of more measurements than necessary to achieve such a reconstruction. For example, consider Figure 1. The true foreground's (Figure 1(a)) reconstruction is poor when too few compressive measurements are collected (Figure 1(b)), but looks virtually the same whether or not an optimal or greater-than-optimal number of measurements are acquired (Figures 1(c) and 1(d), respectively). Therefore, dependent on the number of measurements acquired at each time instant, the static CS approach is insufficient at worst and wasteful at best.

We provide in this paper novel, adaptive-rate CS strategies that seek to address this problem. The approaches we present utilize two different forms of side information: cross-validation measurements and low-resolution measurements. In each case, we use the extra information in order to predict the number of foreground pixels (sparsity) in the next frame.

A. Related Work

Adapting the standard CS framework to a dynamic, time-varying signal is something that has been studied from various perspectives by several researchers.

[Figure 1: image panels (a), (b), (c), (d)]
Fig. 1. Foreground reconstruction with varying measurement rates. (a) is the true foreground, (b) is the foreground reconstruction when too few measurements are used, (c) is the reconstruction when an optimal number of measurements are used, and (d) is the reconstruction when more than the optimal number of measurements are used.

Wakin et al. [10], Park and Wakin [11], Sankaranarayanan et al. [12], and Reddy et al. [13] have each proposed video-specific versions of CS. Each one leverages video-specific signal dynamics such as temporal correlation and optical flow. For measurement models that provide streaming CS measurements, Sankaranarayanan et al. [14], Asif and Romberg [15], and Angelosante et al. [16] have proposed adaptive CS decoding procedures that are faster and more accurate than those that do not explicitly model the video dynamics.

Vaswani et al. [17] [18] [19], Cossalter et al. [20], and Stankovic et al. [21] [22] propose modifications to the CS decoding step that leverage extra signal support information in order to provide more accurate reconstructions from a fixed number of measurements. More generally, Scarlett et al. [23] provide generic information-theoretic bounds for any support-adaptive decoding procedure. Malioutov et al. [24] and Boufounos et al. [25] propose decoders with adaptive stopping criteria: sequential signal estimates are made until either a consistency or cross-validation criterion is met.

Several researchers have also considered adaptive encoding techniques. These techniques primarily focus on finding and using the "best" compressive measurement vectors at each instant of time. Ashok et al. [26] propose an offline procedure in order to design entire measurement matrices optimized for a specific task. Similarly, Duarte-Carvajalino et al. [27] compute class-specific optimal measurements offline, but decide which class to use using an online procedure with a fixed number of measurements. Purely online procedures include those developed by Averbuch et al. [28], Ji et al. [29], Chou et al. [30], and Haupt et al. [31]: the next-best measurement vectors are computed by optimizing criterion functions that seek to minimize quantities such as posterior entropy and expected reconstruction error. Some of these methods use a fixed measurement rate, while others propose stopping criteria similar to several of the adaptive decoding procedures.

Some of the above methods exhibit an adaptive measurement rate in that they stop collecting measurements when certain criteria are met. However, due to the dynamic nature of video signals, it may not be possible to evaluate these criteria (as they often involve CS decoding) and collect a new measurement before the signal has significantly changed. Recent adaptive-rate work by Yuan et al. [32] and Schaeffer et al. [33] sidesteps this problem by using a static spatial measurement rate and considering how to adaptively select the temporal compression rate through batch analysis. In contrast, we propose here techniques that specify a fixed number of spatially-multiplexed measurements to acquire before sensing the signal at a given time instant and modify this quantity between each acquisition without assuming that the signal remains static between acquisitions. That is, we consider a system in which the decoding procedure is fixed and we are able to change the encoding procedure, which is fundamentally different from the previously-discussed work on adaptive decoding procedures (e.g., that of Vaswani et al. [17] [18] [19]).

B. Organization

This paper is organized as follows. In Section II, we provide a brief overview of CS. Sections III and IV contain a precise formulation of and context for our rate-adaptive CS algorithms. Our measurement acquisition technique is described in Section V. The proposed adaptive rate CS techniques are discussed in Sections VI and VII, and they are experimentally validated in Section VIII. Finally, we provide a summary and future research directions in Section IX.

II. COMPRESSIVE SENSING

Compressive sensing is a relatively new theory in sensing which asserts that a certain class of discrete signals can be adequately sensed by capturing far fewer measurements than the dimension of the ambient space in which they reside. By "adequately sensed," it is meant that the signal of interest can be accurately inferred using the measurements acquired by the sensor.

In this paper, we use CS in the context of imaging. Consider a grayscale image $F \in \mathbb{R}^{N \times N}$, vectorized in column-major order as $f \in \mathbb{R}^{N^2}$. A traditional camera uses an $N \times N$ array of photodetectors in order to produce $N^2$ measurements of $F$: each detector records a single value that defines the corresponding component of $f$. If we are instead able to gather measurements of a fundamentally different type, CS theory suggests that we may be able to determine $f$ from far fewer than $N^2$ of them. Specifically, these compressive measurements record linear combinations of pixel values, i.e., $\xi = \Phi f$, where $\Phi \in \mathbb{C}^{M \times N^2}$ is referred to as a measurement matrix and $M \ll N^2$.

CS theory presents three general conditions under which the above claim is valid. First, $f$ should be sparse or compressible. In general, a vector is said to be sparse if very few of its components are nonzero; more precisely, vectors having no more than $s$ nonzero components are said to be $s$-sparse. A vector is said to be compressible if it is well-approximated by a sparse signal, i.e., it has a small number of components with a large magnitude and many with much smaller magnitudes.

Second, the measurement matrix (encoder) should exhibit the restricted isometry property (RIP) of a certain order and constant. Specifically, $\Phi$ exhibits the RIP of order $s$ with constant $\delta_s$ if the following inequality holds for all $s$-sparse $f$:

$$(1-\delta_s) \le \frac{\|\Phi f\|_2^2}{\|f\|_2^2} \le (1+\delta_s). \tag{1}$$

While we will discuss proposed construction methods for a $\Phi$ that exhibits the RIP for specified $s$ and $\delta_s$ in Section V, they generally involve selecting $M$ such that it exceeds a lower bound that grows with increasing $s$ and decreasing $\delta_s$.

Finally, an appropriate decoding procedure, $\hat{f} = \Delta(\xi, \Phi)$, should be used. While many successful decoding schemes have been discussed in the literature, we shall focus here on one in particular:

$$\Delta(\xi,\Phi) = \arg\min_{z \in \mathbb{R}^{N^2}} \|z\|_1 \quad \text{subject to} \quad \Phi z = \xi, \tag{2}$$

where the $\ell_1$ norm is given explicitly by $\|z\|_1 = \sum_i |z(i)|$.

With these three conditions in mind, CS theory provides us with the following result: for an $s$-sparse $f$ measured with a $\Phi$ that exhibits the RIP of order $2s$ with $\delta_{2s} \le \sqrt{2}-1$, $\Delta(\xi,\Phi)$ will exactly recover $f$ [34]. If $f$ is compressible, a similar result that bounds the reconstruction error is available. Thus, by modifying the sensor and decoder to implement $\Phi$ and $\Delta$, respectively, $f$ can be adequately sensed using only $M \ll N^2$ measurements.

Sensors based on the above theory are still just beginning to emerge [6]. One of the most notable is the single-pixel camera [7], where measurements specified by each row of $\Phi$ are sequentially computed in the optical domain via a digital micromirror device and a single photodiode. Throughout the remainder of this paper, we shall assume that such a device is the primary sensor.

III. PROBLEM STATEMENT

We assume that we possess a CS camera that is capable of acquiring a variable number of compressive measurements at discrete instants of time. We denote the measurement matrix at time $t$ by $\Phi_t \in \mathbb{R}^{M_t \times N^2}$, and we construct it via a process that depends only on our choice for $M_t$ (see Section V). The value used for $M_t$ will be determined by the adaptive sensing strategy prior to time $t$. The images we observe will be of size $N \times N$, and $X_t \in \mathbb{R}^{N \times N}$ will denote the specific image at time $t$. Vectorizing $X_t$ using column-major order as $x_t \in \mathbb{R}^{N^2}$ allows us to write the compressive measurement process at time $t$ as $y_t = \Phi_t x_t$.

We will present two adaptive sensing strategies that will each exploit a different type of side information. The first strategy uses a small set of cross-validation measurements, $\chi_t \in \mathbb{R}^r$, obtained from a static linear measurement operator $\Psi \in \mathbb{C}^{r \times N^2}$, i.e., $\chi_t = \Psi x_t$. $\Psi$ here is referred to as a cross-validation matrix. The second strategy we present relies on a set of low-resolution measurements, $Z_t \in \mathbb{R}^{L \times L}$, that we obtain via a secondary sensor that collects lower-resolution measurements of $X_t$. Such multi-camera systems are not uncommon in the surveillance literature (see, e.g., [35] [36]).

Having established the above notation, the problem we address in this paper is that of how to use the observations $y_t$, $\chi_t$, and $Z_t$ to select a minimal value for $M_{t+1}$ such that $\Phi_{t+1}$ gathers enough information to ensure accurate reconstruction of the foreground (dynamic) component of the high-resolution $X_t$.

IV. COMPRESSIVE SENSING FOR BACKGROUND SUBTRACTION

We present our work in the context of the problem of background subtraction for video sequences. Broadly, background subtraction is the process of decomposing an image into foreground and background components, where the foreground usually represents the objects of interest in the environment under observation. For our purposes, we shall adopt the following model for images $x_t$:

$$x_t = f_t + b, \tag{3}$$

where $b$ is an unknown but deterministic static component of each image in the video sequence and $f_t$ is a random variable. At time $t$, we estimate the locations of foreground pixels by computing the set of indices $F_t = \{i : |f_t(i)| \ge \tau\}$, for some pre-defined threshold $\tau$. We further assume that the components of $f_t$ that correspond to $F_t$ are bounded in magnitude, i.e., $|f_t(i)| \le 1$ for all $i \in F_t$.

Throughout this work, we shall assume that the components of $f_t$ are distributed as follows:

$$f_t(i) \sim \begin{cases} \mathcal{U}\{[-1,-\tau] \cup [\tau,1]\}, & i \in F_t \\ \mathcal{N}(0,\sigma_b^2), & i \notin F_t \end{cases} \tag{4}$$

where each component is assumed to be independent of the others. We have approximated the intensity distribution of those pixels not in $F_t$ as a zero-mean Gaussian under the assumption that $\sigma_b^2$ is much smaller than $\tau$.

Following the work of Cevher et al. [9], we seek to perform background subtraction in the compressive domain. Often, it is the case that the foreground occupies only a very small portion of the image plane, i.e., $|F_t| \ll N^2$. Given the foreground model (4), this implies that $f_t$ is compressible in the spatial domain. Therefore, if $b$ is known, we can use it, (3), and compressive image measurements $y_t = \Phi_t x_t$ to generate the following estimate of $f_t$:

$$\hat{f}_t = \Delta(\xi_t, \Phi_t), \tag{5}$$

where $\xi_t = y_t - \beta_t$ and $\beta_t = \Phi_t b$.

As we will discuss in Section V, we construct $\Phi_t$ by taking a subset of rows from a fixed $N^2 \times N^2$ matrix, $\Phi$, and rescaling the result. We can therefore calculate $\beta_t$ from $\beta = \Phi b$ by similarly dropping components and rescaling. Noting (4), a maximum-likelihood estimate of $\beta$ can be found by computing the mean of compressive measurements of a background-only video sequence, i.e.,

$$\beta = \frac{1}{J} \sum_{j=1}^{J} y_j, \tag{6}$$

where $y_j = \Phi x_j$ and $|F_j| = 0$ for all $j$ in the summation. These measurements can be obtained in advance by using the full sensing matrix, $\Phi$, to observe the scene when it is known that there is no foreground component.

V. SENSING MATRIX DESIGN

In this section, we will discuss our method for constructing adaptive rate measurement matrices for the purpose of recovering sparse signals from a minimal number of measurements.

A. Theoretical Guarantees

In Section II, we presented a theoretical result from CS literature that states that $\Delta$ will exactly recover an $s$-sparse $f$ from $\xi$ if $\Phi$ exhibits the RIP of order $2s$ with $\delta_{2s} \le \sqrt{2}-1$. One of the most prevalent methods discussed in the literature for constructing such matrices involves drawing each matrix entry from a Gaussian distribution with parameters that depend on the number of rows that the matrix possesses. For $\Phi \in \mathbb{R}^{M \times N^2}$, this technique defines entries $\phi_{ij}$ as independent realizations of a Gaussian random variable with zero mean and variance $1/M$, i.e.,

$$\phi_{ij} \sim \mathcal{N}(0, 1/M). \tag{7}$$

Baraniuk et al. [37] provide the following theoretical result for this construction technique: for a given $\delta \in (0,1)$ and positive integers $M$ and $s$, $\Phi \in \mathbb{R}^{M \times N^2}$ constructed according to (7) exhibits the RIP of order $s$ with $\delta_s = \delta$ with probability exceeding

$$1 - 2e^{-c_0(\delta/2)M + s\left(\log(eN^2/s) + \log(12/\delta)\right)}, \tag{8}$$

where $c_0(x) = x^2/4 - x^3/6$.

The scenarios discussed in this paper require us to find the minimum $M$ that will ensure the constructed matrix can successfully recover $s$-sparse signals. Therefore, we now consider the case where $\delta$, $s$, and $N^2$ are fixed. If we impose a lower bound, $\tau_g$, on the probability of success given by (8), rearranging terms reveals that the theory requires

$$M \ge \frac{s\left[1 + \log\left(\frac{N^2}{s}\right) + \log\left(\frac{12}{\delta}\right)\right] + \log\left(\frac{2}{1-\tau_g}\right)}{\frac{\delta^2}{16}\left(1 - \frac{\delta}{3}\right)}. \tag{9}$$

For practical measurement matrices, we are only interested in the case where $N^2 \ge M$ (i.e., matrices for which compression actually occurs). Combining this requirement with (9) yields the following lower bound for $N^2/s$:

$$\frac{N^2}{s} \ge \frac{\log\left(\frac{N^2}{s}\right) + \frac{1}{s}\log\left(\frac{2}{1-\tau_g}\right)}{\frac{\delta^2}{16}\left(1 - \frac{\delta}{3}\right)} + \frac{1 + \log\left(\frac{12}{\delta}\right)}{\frac{\delta^2}{16}\left(1 - \frac{\delta}{3}\right)}. \tag{10}$$

For $s$-sparse signals, the reconstruction guarantee that accompanies $\Delta$ requires that $\Phi$ exhibits the RIP of order $2s$ with $\delta_{2s} \le \sqrt{2}-1$. Using only the second term of the lower bound in (10) and noting that the first term is always positive, we see that requiring such a $\delta_{2s}$ means that $s/N^2$ can be no greater than approximately 0.0011.

In our system, $s/N^2$ represents the fraction of foreground pixels in the image, and it is unreasonable to expect that this quantity will never exceed 0.11%. Therefore, if we wish to use CS for compression (i.e., with a measurement matrix that has fewer rows than columns), we must design and use matrices without the guarantee provided by the above result. However, that result is merely sufficient: in the next part, we will experimentally show that similarly-constructed matrices with far fewer rows are indeed still able to provide measurements that enable accurate sparse signal reconstruction.

B. Practical Sensing Matrix Design Based on Phase Diagrams

Given a candidate sensing matrix construction technique, Donoho and Tanner [38] discuss an associated phase diagram: a numerical representation of how useful the generated matrices are for CS. Specifically, the ratios $M/N^2$ (signal undersampling) and $s/M$ (signal sparsity) are considered. A phase diagram is a function defined over the phase space $(M/N^2, s/M) \in [0,1]^2$. We discretize this space and perform multiple sense-and-reconstruct experiments at each grid point in order to approximate the phase diagram there: the value of $M/N^2$ provides the information necessary for matrix construction, and $s/M$ provides the information necessary to generate random sparse signals. We make the approximation using the percentage of trials that result in successful signal recovery, which we define as a normalized $\ell_2$ reconstruction error of $10^{-3}$ or less.

Even though we cannot use the theoretical guarantee discussed earlier in this section, the first matrix construction technique we use is based on randomly-generated matrices that rely on independent realizations of a Gaussian random variable. Specifically, we use the following construction technique: we generate $\Phi \in \mathbb{R}^{N^2 \times N^2}$ by drawing each entry according to (7). Then, for a given value of $M_t$, we form the corresponding $M_t \times N^2$ matrix $\Phi_t$ via

$$\Phi_t = \sqrt{\frac{N^2}{M_t}}\, \Phi_{1:M_t}, \tag{11}$$

where $\Phi_{1:M_t}$ denotes the submatrix of $\Phi$ corresponding to the first $M_t$ rows. The scaling factor ensures that the relationship between the variance and the number of rows defined in (7) is preserved.
5 We also analyze a second matrix construction technique B. Adaptive-Rate Compressive Sensing via Cross Validation based on the discrete Fourier transform (DFT). Specifically, Let s denote the true value of the foreground sparsity we generate Φ ∈ CN2×N2 by randomly permuting the rows t at time t, i.e., s = |F |. The method we present here t t of the DFT matrix and form Φ according to (11). t relies on an estimate of this quantity, which we denote as In this paper, we will make predictions regarding the spar- sˆ. Before sensing begins at time t, we assume f to be sˆ- t t t sityofthesignalsweareabouttoobserve.Givenaprediction sparse, and select the corresponding minimal M (and thus t s , we will seek the minimum M such that (11) generates a t t Φ ) according to the phase diagram technique described in t sensing matrix capable of providing enough measurements to SectionV.WethenuseΦ andΨtocollecty andχ .Using ensureaccuratereconstructionofs -sparsesignals.Inorderto t t t t thetechniquedescribedinSectionIV,wecanfindξ andform determine the mapping from st to Mt, we use the associated theforegroundestimateˆf(sˆt).Inasimilarfashion,wtecanalso phase diagram. We construct this diagram (see Figure 2) t find γ by subtracting a precalculated set of cross-validation during a one-time, offline analysis. Then, given s and a t t measurements of the static signal component, ζ = Ψb, from minimumprobabilityofreconstructionsuccessτ ∈(0,1),we d χ . Finally, we select sˆ based on the result of a multiple use the phase diagram as a lookup table to find the smallest hytpothesis test that usest+γ1 and ˆf(sˆt). value of M that yields at least a τ success rate for s -sparse t t t d t We formulate the multiple hypothesis test by first assuming signals. that we are able to observe e (f )2. We define the null sˆt t 2 hypothesis, H , as the scenario under which sˆ exceeds s . VI. 
METHODI:CROSSVALIDATION If this hypothe0sis is true, then f(sˆt) (i.e., the optimt al sˆ-sparste t t In this section, we describe a rate-adaptive CS method that approximation to f ) captures all s foreground pixels and t t utilizes a set of linear cross-validation measurements χ = t (sˆt −st) background pixels while neglecting the remaining Ψx .AnearlierversionofthisworkwaspresentedbyWarnell t (N −sˆ) background pixels. Using (4), it can be shown that t et al. [39]. e (f )2 isarandomvariablewithmean,µ ,andvariance,σ2, sˆt t 2 0 0 given by A. Compressive Sensing with Cross Validation µ =(N −sˆ)σ2 Let ξ ∈ CMt be a set of compressive measurements of a 0 t b t sparse signal ft ∈ RN2 obtained using Φt, i.e., ξt = Φtft. σ02 =2(N −sˆt)σb4 . (15) In this section, we will use ˆf(s) to denote the s-sparse point t We also define a set of hypotheses that are possible when estimate of this signal obtained using ∆(ξ ,Φ )(s), where ∆ t t H is not true. Let H ,k ∈ {sˆ + 1,...,N} describe the is defined as in (2) and ·(s) denotes a truncation operation sce0nario under which sk=k. Undter H , f(sˆt) cannot capture that sets all but the s largest-magnitude components of the t k t all k foreground pixels: it neglects the smallest (k −sˆ) of vector-valued argument to zero. t them and the (N−k) background pixels. Using (4), it can be Ward [40] bounds the error of the above estimate using shown that the mean, µ , and variance, σ2, of e (f )2 under a cross-validation technique that is based on the Johnson- k k sˆt t 2 these hypotheses are given by Lindenstrauss lemma [41]. At the same time ξ is collected, t we use a static cross-validation matrix Ψ∈Cr×N2 to collect 1 cross-validation measurements γ =Ψf . We construct Ψ by µ =(N −k)σ2+ (k−sˆ)(τ2+τ +1) t t k b 3 t drawingeachofitsentriesfromani.i.d.Bernoullidistribution σ2 =1(cid:2)(k−sˆ)2−(k−sˆ)(cid:3)(τ2+τ +1)2 with zero mean and variance 1/r. 
Such a construction leads k 9 t t to the following statement: for given accuracy and confidence 1 + (k−sˆ)(τ4+τ3+τ2+τ +1) parameters(cid:15)andρ(respectively),r ≥8(cid:15)−2log 1 rowssuffice 5 t to ensure that 2ρ +(cid:2)(N −k)2+2(N −k)(cid:3)σb4 2 (cid:107)f −ˆf(s)(cid:107)2 + (N −k)(k−sˆ)(τ2+τ +1)σ2−µ2 . (16) (1−(cid:15))2 ≤ t t 2 ≤(1+(cid:15))2 (12) 3 t b k (cid:107)γ −Ψˆf(s)(cid:107)2 t t 2 with probability exceeding 1−ρ. The hypothesis test can be succintly written as Lete (f ) denotetheoptimals-sparseapproximationerror s t p H :s <sˆ measured with respect to the (cid:96) norm, i.e., 0 t t p H :s =k (17) k t e (f ) =argmin(cid:107)f −z(cid:107) , (13) s t p t p (cid:107)z(cid:107)0≤s fork ∈{sˆt+1,...,N}.Letqk denotetheprobabilitydensity twhheefraectthteha(cid:96)pt-ˆfn(osr)misiss-gsipvaernseb,yth(cid:107)ex(cid:107)uppp=er((cid:80)boiu|nxd(ii)n|p()112/)p.cUansinbge fku∈nc{ti0o,nsˆfto+r e1sˆ,t.(.f.t),22Nu}n.dWerethweilalsesvuamlupattieonextphlaitciHtakssisumtrputeiofnosr t regardingtheformofq inSectionVIII.Theoptimaldecision extended to e (f )2 as follows: k s t 2 rule for (17) under the minimum probability of error criterion e (f )2 ≤(cid:107)f −ˆf(s)(cid:107)2 ≤(1+(cid:15))2(cid:107)γ −Ψˆf (cid:107)2 . (14) with an equal prior for each hypothesis is given by sˆt t 2 t t 2 t t 2 That is, the observable CV error can be used to upper bound k∗ = argmax q (cid:0)e (f )2(cid:1) . (18) k sˆt t 2 the unobservable optimal s-sparse approximation error. k∈{0,sˆ+1,...,N} 6 (a)Gaussian (b)Fourier Fig.2. PhasediagramsforGaussianandFouriermeasurementensembles.Colorcorrespondstoprobabilityofsuccessfulreconstruction(here,normalized(cid:96)2 errorbelow10−3). Assumingthatthesparsityofftisaslowly-varyingquantity, VII. METHODII:LOW-RESOLUTIONTRACKING we choose to set sˆ equal to what we believe s to be. 
If t+1 t In this section, we propose an adaptive method that utilizes k∗ = 0, it is our belief that sˆ > s , and we expect that the t t a much richer form of side information than the random error in ˆf(sˆt) to be very small. Therefore, we find the set of t projections of the previous section: low-resolution images, afonrdegsreotusˆnd en=tr|iFeˆs|f.oFrotrhiasnysigonthael,rFvˆatlu=e{oif k:∗,|fˆwt(sˆet)s(eit)|sˆ≥τ=}, Zt, that have been captured using a traditional (i.e., non- t+1 t t+1 compressive) camera. k∗. Unfortunately, it is impossible to directly observe e (f )2. sˆt t 2 A. Low-Resolution Measurements However, we can upper bound this quantity using the cross- validation measurements as specified in (14). Therefore, we We assume that the low- and high-resolution images, Z ∈ t propose the following modification to (18): RL×L and X ∈ RN×N(L < N), repectively, are related t (cid:16) (cid:17) by a simple downsampling operation. Let t = (cid:2)tx ty(cid:3)T k∗ = argmax q (1+(cid:15))2(cid:107)γ −Ψˆf(sˆt)(cid:107)2 . (19) Z Z Z k t t 2 denote the coordinates of a pixel in the image plane of k∈{0,sˆ+1,...,N} the low-resolution camera. If we use t = (cid:2)tx ty (cid:3)T to Observing that µk and σk2 are increasing functions of k, denotethecorrespondingcoordinateinthXeimageXplaneXofthe it is apparent that (19) will potentially yield a value of k∗ compressivecamera,theeffectofthedownsamplingoperation greater than that which would have been selected by (18). on coordinates is given by This will result in a higher-than-necessary measurement rate at time t+1, but it will not negatively impact the quality of t =(cid:20)D 0 −D2−1(cid:21)(cid:20)tZ(cid:21) , (20) ˆft(+sˆt1+1). X 0 D −D2−1 1 We term the strategy we have outlined above adaptive- where we assume the dowsampling factor, D = N/L, to be rate compressive sensing via cross validation (ARCS-CV) and an integer. Using (20), each pixel in Z maps to the center t summarize the procedure in Algorithm 1. 
of a unique $D \times D$ block of pixels in $X_t$. The effect of the downsampling operation on image intensity is given by averaging the intensities within this block, i.e.,

$$Z_t(t_Z) = \frac{1}{D^2} \sum_{t_X \in B(t_Z)} X_t(t_X) ,$$

where the coordinates of the pixels in the block are given explicitly as

$$B(t_Z) = \{(t_Z^x - 1)D + 1, \ldots, t_Z^x D\} \times \{(t_Z^y - 1)D + 1, \ldots, t_Z^y D\} .$$

Algorithm 1 ARCS-CV for Background Subtraction
Require: $\Phi, \Psi, \hat{s}_t, \beta, \zeta, \sigma_b^2, \tau$
  Select $M_t$ using $\hat{s}_t$ and the phase diagram lookup table
  Form $\Phi_t$ and $\beta_t$
  Obtain image measurements $y_t$, $\chi_t$
  Compute foreground-only measurements $\xi_t$, $\gamma_t$
  Estimate foreground: $\hat{f}_t^{(\hat{s}_t)} = \Delta(\xi_t, \Phi_t)$
  Compute $k^*$ using (19)
  if $k^* = 0$ then
    $\hat{s}_{t+1} = |\hat{F}_t|$
  else
    $\hat{s}_{t+1} = k^*$
  end if

B. Object Tracking and Foreground Sparsity

Given $Z_t$, we assume that we are able to track the foreground objects. Specifically, we assume that at each time index, we are able to estimate a zero-skew affine warp parameter $p_t = [\, p_t(1) \; \cdots \; p_t(4) \,]^T$ that maps coordinates in an object template image, $T$, to their corresponding location in $Z_t$. Using $t_T$ to denote a pixel coordinate in $T$, $p_t$ specifies the corresponding coordinate in $Z_t$ via

$$t_Z = \begin{bmatrix} p_t(1) & 0 & p_t(3) \\ 0 & p_t(2) & p_t(4) \end{bmatrix} \begin{bmatrix} t_T \\ 1 \end{bmatrix} . \tag{21}$$

We further assume that the time-evolution of $p_t$ is governed by a known Markov dynamical system, i.e.,

$$p_t = u_t(p_{t-1}, \eta_t) , \tag{22}$$

for known $u_t$ and i.i.d. system noise $\eta_t$.

Let $\{t_i : i \in \mathbb{Z}/4\mathbb{Z}\}$ be the set of corner coordinates of $T$ in any order that traces its outline. Then, given $p_t$, we can calculate the position of the tracked object's bounding box in $F_t$ using (21) and (20). We shall assume that the area of this bounding box specifies the number of foreground components in $f_t$, i.e., $s_t$. If this area is not integer-valued, we simply round up. Using the well-known formula for the area of a polygon from its corner coordinates, $s_t$ can be written as $s_t = h(p_t)$, where

$$h(p_t) = \left\lceil \left| \frac{D^2 \left[ p_t(1) p_t(4) - p_t(2) p_t(3) \right]}{2} \sum_{i \in \mathbb{Z}/4\mathbb{Z}} T(i) \right| \right\rceil , \tag{23}$$

and $T(i) = t_i^x t_{i+1}^y - t_i^y t_{i+1}^x$. Above, $\lceil \cdot \rceil$ represents the ceiling function.

From (23), it is clear that the distribution of the random variable $s_t$ is a function of the distribution of $p_t$. For the remainder of this section, we will use $q_t(s_t) = p(s_t | p_t)$ to denote the corresponding probability mass function.

Figure 3 illustrates the relationship between a typical high- and low-resolution image pair and shows an example bounding box found by a tracker using the low-resolution image.

Fig. 3. Illustration of the downsampling and low-resolution tracking process utilized by ARCS-LRT for a sample image from the PETS_2009 dataset. (a) corresponds to the high-resolution image for which we seek to perform compressive foreground reconstruction. (b) corresponds to the low-resolution image obtained by the secondary, non-compressive camera. The bounding box around the woman corresponds to the output of a tracking algorithm.

C. Sparsity Estimation

We now turn our attention to selecting a value to use for $s_t$, $\hat{s}_t$, on the basis of the previous image's track, $p_{t-1}$. Once a value has been selected, we use the method presented in Section V to select a minimal $M_t$ and the corresponding $\Phi_t$. We then use $\Phi_t$ to collect compressive measurements of $X_t$ and calculate $\xi_t$. Using this procedure, the $\Delta$-generated estimate $\hat{f}_t$ will obey

$$\| f_t - \hat{f}_t \|_2 \le \frac{C_0 \, e_{\hat{s}_t}(f_t)_1}{\sqrt{\hat{s}_t}} , \tag{24}$$

where $e_{\hat{s}_t}(\cdot)_1$ represents the optimal $\hat{s}_t$-sparse $\ell_1$ estimation error [34]. The value of the constant in (24) is given explicitly by

$$C_0 = \frac{2 - (2 - \sqrt{2}) \delta_{2\hat{s}_t}}{1 - (1 - \sqrt{2}) \delta_{2\hat{s}_t}} .$$

One criterion we will consider when selecting $\hat{s}_t$ is the expected value of the $\ell_2$ reconstruction error, i.e., we would like $\hat{s}_t$ to minimize $E\{ \| f_t - \hat{f}_t \|_2 \}$. However, since the non-linearity of $\Delta$ makes determining the statistics of that quantity very difficult, we instead look to minimize the right-hand side of (24). It is easy to see that this quantity can be minimized by selecting $\hat{s}_t$ as high as possible, which would provide no compression. Therefore, inspired by results from the model-order selection literature [42] [43] [44], we penalize larger values of $\hat{s}_t$ and instead propose to select $\hat{s}_t$ by solving

$$\hat{s}_t = \arg\min_{\hat{s}} \; E\left\{ \frac{C_0 \, e_{\hat{s}}(f_t)_1}{\sqrt{\hat{s}}} \right\} + \lambda \hat{s} , \tag{25}$$

where $\lambda$ is an importance factor that specifies the tradeoff between low reconstruction error and a small sparsity estimate. Using the law of total expectation, the foreground model (4), and techniques similar to those used in Section VI, we can rewrite (25) as

$$\hat{s}_t = \arg\min_{\hat{s}} \; \frac{C_0}{\sqrt{\hat{s}}} \left[ J_0(\hat{s}) + J_1(\hat{s}) \right] + \lambda \hat{s} , \tag{26}$$

where

$$J_0 = \sum_{k=1}^{\hat{s}} \sqrt{2/\pi} \, (N - \hat{s}) \, \sigma_b \, q_t(k) ,$$

$$J_1 = \sum_{k=\hat{s}+1}^{N} \left[ (k - \hat{s})(1 + \tau)/2 + \sqrt{2/\pi} \, (N - k) \, \sigma_b \right] q_t(k) .$$

We term the strategy outlined above adaptive-rate compressive sensing via low-resolution tracking (ARCS-LRT) and summarize the procedure in Algorithm 2.

Algorithm 2 ARCS-LRT for Background Subtraction
Require: $\Phi, \hat{s}_t, \beta, \sigma_b^2, \tau, \lambda$
  Select $M_t$ using $\hat{s}_t$ and the phase diagram lookup table
  Form $\Phi_t$ and $\beta_t$
  Obtain image measurements $y_t$, $z_t$
  Compute foreground-only measurements $\xi_t$
  Estimate foreground: $\hat{f}_t = \Delta(\xi_t, \Phi_t)$
  Compute low-resolution object track $p_t$
  Compute $q_{t+1}$ via (22) and (23)
  Compute $\hat{s}_{t+1}$ by solving (26)

TABLE I
PARAMETER VALUES USED IN EXPERIMENTS

                 $\sigma_b^2$   $\tau$   $\Sigma$                     $\lambda$
convoy2          25452          0.1      diag([1.0 1.0 3.0 3.0])      0.045
marker_cam       25452          0.1      diag([1.0 1.0 3.0 3.0])      1.5
PETS2009_S2L1    25452          0.1      diag([1.0 1.0 3.0 3.0])      0.15

VIII. EXPERIMENTS

We tested the proposed algorithms on real video sequences captured using traditional cameras. The compressive, cross-validation, and low-resolution measurements were simulated via software. The SPGL1 [45] [46] software package was used to implement the decoding procedure (2). Three video sequences were used: convoy2, marker_cam, and PETS2009_S2L1. convoy2 is a video of vehicles driving past a stationary camera.
The vehicles comprise the foreground, and the foreground sparsity varies as a result of these vehicles sequentially entering and exiting the camera's field of view. marker_cam is a video sequence we captured using a surveillance camera mounted to the side of our building at the University of Maryland, College Park. The sequence begins with a single pedestrian walking in a parking lot, with a second pedestrian joining him halfway through the sequence. The two pedestrians comprise the foreground, and the foreground sparsity varies due to the entrance of the second pedestrian and the variation in each pedestrian's appearance as he moves relative to the camera. The PETS2009_S2L1 video sequence is a segment taken from the PETS 2009 benchmark data [47]. This sequence consists of four pedestrians entering and exiting the camera's field of view. Similar to marker_cam, the foreground sparsity changes as a function of the number and appearance of pedestrians. Example images from each dataset are shown in Figure 4.

Fig. 4. Example images from the marker_cam, PETS2009_S2L1, and convoy2 (columns one, two, and three, respectively) video sequences. The first row contains the background images, the second row contains an image with both foreground and background components, and the third row contains the corresponding foreground component.

A. Practical Considerations

Implementation of the ARCS methods presented in Sections VI and VII requires certain practical choices. In this part, we describe the choices we made that generated the results presented later in this section. Specific choices for parameter values for each video sequence are given in Table I.

1) Foreground Model: The foreground model specified in (4) is parameterized by $\sigma_b^2$ and $\tau$. The value that should be used for $\sigma_b^2$ will depend on the quality of the estimate of $b$ (or, more accurately, $\beta$ in our system): the better (3) describes images in the video sequence, the smaller $\sigma_b^2$ can be. Since $\tau$ represents the foreground-background intensity threshold, its value depends on the value selected for $\sigma_b^2$: $\tau$ should be set high enough to ensure that $\mathcal{N}(\tau; 0, \sigma_b^2)$ is sufficiently low, but low enough to ensure that it does not neglect intensities belonging to the foreground.

2) ARCS-CV: The ARCS-CV algorithm developed in Section VI relies on the hypothesis test specified in (17). While we are able to calculate the first- and second-order moments of $s_t$ under the various hypotheses, the maximum-likelihood decision rule (19) requires the entire probability density functions, $q_k$, for each. In our implementation, we approximate these densities by a normal distribution with mean and covariance specified by (15) and (16) under $H_0$ and $H_k$, respectively. That is, we make the approximation $q_k \approx \mathcal{N}(\mu_k, \sigma_k^2)$. As a consequence of this approximation, we observed that (19) sometimes yielded a nonzero $k^*$ for sufficiently small cross-validation error upper bounds. However, when this upper bound is low, it is clear that we should select $H_0$. Therefore, we explicitly impose a selection of $H_0$ for cross-validation error upper bounds that are less than $\mu_0$ by using

$$k^{**} = \begin{cases} 0 , & (1 + \epsilon)^2 \| \gamma_t - \Psi \hat{f}_t^{(\hat{s}_t)} \|_2^2 < \mu_0 \\ k^* , & (1 + \epsilon)^2 \| \gamma_t - \Psi \hat{f}_t^{(\hat{s}_t)} \|_2^2 \ge \mu_0 \end{cases} \tag{27}$$

in place of (19) in Algorithm 1, where $k^*$ represents the value obtained from (19).

3) ARCS-LRT: The ARCS-LRT method of Section VII requires low-resolution object tracks in order to reason about the sparsity of the high-resolution foreground. In order to focus on the performance of the adaptive algorithm, we first determined these tracks manually, i.e., by hand-marking bounding boxes around each low-resolution foreground image. We only did this for images in which the object was fully visible. We shall also consider automatically-obtained tracks later in this section.

We used $u_t(p_{t-1}, \eta_t) = p_{t-1} + \eta_t$ to define the system dynamics in (22) with $\eta_t \sim \mathcal{N}(0, \Sigma)$ i.i.d. for each $t$, where the value of $\Sigma$ should vary with the expected type of object motion.
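To make the random-walk track model concrete, the following is a minimal Python sketch, not the authors' implementation. It draws next-step warp parameters under $p_t = p_{t-1} + \eta_t$ and estimates the mean and variance of the resulting sparsity by plain Monte Carlo sampling, which is a stand-in chosen here for simplicity. The function names, the parameter order `[scale_x, scale_y, shift_x, shift_y]`, the diagonal form of $\Sigma$, and the assumption that areas scale by $D^2$ between the low- and high-resolution images are all illustrative assumptions.

```python
import math
import random


def warp(p, point):
    """Zero-skew affine warp in the spirit of (21): per-axis scale, then shift.

    Assumed (hypothetical) parameter order: p = [scale_x, scale_y, shift_x, shift_y].
    """
    x, y = point
    return (p[0] * x + p[2], p[1] * y + p[3])


def polygon_area(corners):
    """Shoelace formula; corners must be listed in an order that traces the outline."""
    total = 0.0
    n = len(corners)
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        total += x0 * y1 - y0 * x1
    return abs(total) / 2.0


def sparsity_from_track(p, template_corners, D):
    """Bounding-box area expressed in high-resolution pixels, rounded up."""
    low_res_box = [warp(p, c) for c in template_corners]
    # Assumption: one low-resolution pixel covers a D x D high-resolution block,
    # so areas scale by D**2 when moving to the high-resolution image.
    return math.ceil(D * D * polygon_area(low_res_box))


def sample_next_sparsity(p_prev, sigma_diag, template_corners, D, n_samples, rng):
    """Monte Carlo mean and variance of the sparsity under the random-walk model.

    sigma_diag holds the diagonal of Sigma (variances), e.g. [1.0, 1.0, 3.0, 3.0].
    """
    draws = []
    for _ in range(n_samples):
        p_next = [pi + rng.gauss(0.0, math.sqrt(v)) for pi, v in zip(p_prev, sigma_diag)]
        draws.append(sparsity_from_track(p_next, template_corners, D))
    mean = sum(draws) / n_samples
    var = sum((d - mean) ** 2 for d in draws) / n_samples
    return mean, var


# Example call with made-up track values and a unit-square template.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
mean, var = sample_next_sparsity(
    [4.0, 5.0, 10.0, 20.0], [1.0, 1.0, 3.0, 3.0], unit_square, 2, 1000, random.Random(0)
)
```

Sampling is easy to reason about but needs many draws; a deterministic moment approximation would be cheaper per image.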
Given this selection for $u_t$, $p(p_t | p_{t-1}) = \mathcal{N}(p_t; p_{t-1}, \Sigma)$ represents our belief about the next track given the current one. Due to the complexity of $h$ in (23), it is difficult to obtain an exact form for $p(s_t | p_{t-1})$. Therefore, we used the unscented transformation [48] to obtain the first- and second-order moments, $\mu_{t+1}$ and $\sigma_{t+1}^2$, respectively. We then approximated $p(s_t | p_{t-1})$ using the pdf for a discrete approximation to the normal distribution with the computed mean and covariance.

The sparsity estimator (26) requires values for both $C_0$ and $\lambda$. Since our phase diagram lookup table returns an $M_t$ for which $\Delta$ recovers $\hat{s}_t$-sparse signals, we selected $\delta = 1/4 < \sqrt{2} - 1$. We then selected $\lambda$ empirically: for each video sequence, we tried many values and kept one that provided a good balance between low reconstruction error and a low sparsity estimate.

Finally, we must compute a solution to (26). To do so, we used MATLAB's fminbnd function, which is based on golden section search and parabolic interpolation [49].

B. Comparative Results

In order to provide some context in which to interpret the results from our ARCS methods, we present them alongside those from the best-case sensing strategy: oracle CS. Oracle CS uses the true value of $s_t$ as its sparsity estimate, which is impossible to obtain in practice. We compare the average measurement rates and foreground reconstruction errors for the three methods (oracle, ARCS-CV, and ARCS-LRT) in Table II, and show the more detailed dynamic behavior in Figure 5. Note that the measurement values reported for the ARCS algorithms include the necessary overhead for the side information (i.e., the cross-validation and low-resolution measurements).

We first observe that the ARCS-LRT algorithm uses a significantly larger measurement rate than either of the others. This is due to the necessary overhead for the low-resolution side information. In our experiments, we used $L = N/2$, i.e., $M_t$ is at least 25% of $N^2$. A smaller $L$ could be selected at the risk of poorer low-resolution tracking. The ARCS-CV algorithm performs much better in terms of measurement rate since the side-information overhead is relatively small (for all datasets, $r$ is less than 2% of $N^2$).

It can also be seen that the ARCS-LRT sparsity estimate lags behind the true foreground sparsity for those images in which an object is entering or exiting the camera's field of view but is not fully visible. The phenomenon is especially visible in the third column (convoy2) of Figure 5. It is due to the fact that we have manually imposed the condition that the object cannot be tracked unless it is fully visible. This leads to the large spikes in foreground reconstruction error. However, when the object becomes fully visible, the low-resolution tracks provide the algorithm with enough information to monitor the high-resolution signal sparsity and the effect disappears.

C. Steady-State Behavior

We analyzed the behavior of our ARCS methods when the signal under observation is static (i.e., $f_t = f$ for all $t$). To do so, we created a synthetic data sequence by repeating a single image in the convoy2 data set for which $s = 1233$. Figure 6 shows the behavior of each algorithm when the initial sparsity estimate, $\hat{s}_1$, is wrong. For each method, we ran two experiments. For the first one, we initialized the sparsity estimate using a value that was too low ($\hat{s}_1 = 0$). For the second one, we initialized with a value that was too high ($\hat{s}_1 = 2500$). Note that both methods are able to successfully adapt to the true value of $s$, and the ARCS-LRT method adapts very quickly (requiring only a single image) due to the immediate availability of the low-resolution track.

Fig. 6. Steady-state behavior for both ARCS algorithms using a video sequence constructed by repeating a single image selected from the convoy2 dataset. For each algorithm, two experimental paths are shown: one generated by initializing the sparsity estimate such that it is too small ($\hat{s}_1 \ll s$), and the other generated by initializing the sparsity estimate such that it is too large ($\hat{s}_1 \gg s$).

TABLE II
EXPERIMENTAL COMPARISON OF ADAPTIVE COMPRESSIVE SENSING MEASUREMENT STRATEGIES (ORACLE, ARCS-CV, ARCS-LRT)

                 Average # of Measurements ($\bar{M}/N^2$)    Average Reconstruction Error ($\ell_2$)
                 Oracle    ARCS-CV   ARCS-LRT             Oracle    ARCS-CV   ARCS-LRT
marker_cam       0.0598    0.0939    0.3356               1.4388    1.7802    1.8229
PETS2009_S2L1    0.1209    0.1530    0.4238               1.2811    1.6181    1.4911
convoy2          0.0997    0.1251    0.3627               1.6573    2.0296    2.6137

Fig. 5. Performance of adaptive CS strategies for the marker_cam (column one), PETS2009_S2L1 (column two), and convoy2 (column three) video sequences. In the first row, $\hat{s}_t$ is used to denote the sparsity estimate used by each strategy. In row two, $M_t$ is used to denote the total number of measurements that must be acquired. The $\ell_2$ reconstruction error is plotted in row three.

D. ARCS-LRT and Automatic Tracking

We also investigated the effect of using low-resolution tracks obtained via an automatic method. To do so, we implemented a simple blob tracker in MATLAB for the convoy2 sequence and used the resulting tracks in the ARCS-LRT framework. A comparison of algorithm performance between using automatic tracks and our manually-marked tracks is shown in Figure 7. Given the negligible effect of the blob tracker on the behavior of ARCS-LRT, we would not expect more sophisticated automatic tracking techniques to negatively affect performance.

IX. SUMMARY AND FUTURE WORK

We have described two techniques for using side information to adjust the measurement rate of a dynamic compressive sensing system. These techniques were developed in the specific context of using this system for video background subtraction. The first technique involves collecting side information in the form of a small number of extra cross-validation measurements and using an error bound to infer underlying signal sparsity. The second method uses side information from a secondary, low-resolution, traditional camera in order to infer the sparsity of the high-resolution images. In either case, we used a pre-computed phase diagram as a lookup table to map sparsity estimates to minimal compressive measurement rates. We validated these techniques on real video sequences using practical approximations for theoretical quantities.

This work provides a framework that allows for numerous extensions:
• It may be possible to achieve better measurement rates by modifying the decoder. For example, using techniques like those developed by Vaswani et al. [19], the phase diagrams we use could be updated.
• In addition to modifying the number of rows, the content
