1 Fast Integral Image Estimation at 1% measurement rate Kuldeep Kulkarni, Pavan Turaga Abstract—We propose a framework called ReFInE to directly obtain integral image estimates from a very small number of spatiallymultiplexedmeasurementsofthescenewithoutiterativereconstructionofanyauxiliaryimage,anddemonstratetheir practicalutilityinvisualobjecttracking.Specifically,wedesignmeasurementmatriceswhicharetailoredtofacilitateextremely fastestimationoftheintegralimage,byusingasingle-shotlinearoperationonthemeasuredvector.Leveragingapriormodel 6 for the images, we formulate a nuclear norm minimization problem with second order conic constraints to jointly obtain the 1 measurementmatrixandthelinearoperator.Throughqualitativeandquantitativeexperiments,weshowthathighqualityintegral 0 image estimates can beobtained usingour framework atvery lowmeasurement rates. Further,on astandard datasetof 50 2 videos, we present object tracking results which are comparable to the state-of-the-art methods, even at an extremely low measurementrateof1%. n a IndexTerms—Spatial-multiplexingcameras,Nuclear-normminimization,Lowsensingrates,Tracking,IntegralImages J (cid:70) 7 2 ] V 1 INTRODUCTION multiplexed measurements, where s is the sparsity of the C signal. However, a significant research shows that high- . In this paper, we study the problem of obtaining inte- s quality reconstruction is computationally intensive [5], [6], cgralimageestimatesfromemergingflexibleprogrammable [7], [8]. Hence, despite the promise of CS-based SMCs [ imaging devices. These novel imaging devices, often con- [2],[3],thecomputationalbottleneckofnon-linear,iterative 1sidered under the broad umbrella of spatial-multiplexing reconstruction has withheld their wide-spread adoption in vcameras (SMCs)[1], [2], [3] provide a number of benefits applicationswhichrequirefastinferenceofobjects,actions, 8in reducing the amount of sensing for portable and re- 5 and scenes. This has led to researchers exploring the source constrained acquisition. The imaging architectures 2 option of tackling inference problems directly from these in these cameras employ spatial light modulators like 7 pseudo-random multiplexed measurements [4], [9], [10], 0digital micromirror arrays to optically compute projections [11], [12], [13], [14] (more on these later in the section .of the scene. Mathematically, the projections are given 1 in related work). However, the ‘universal’ nature of such 0by y = φx, where x ∈ Rn is the image, y ∈ Rm, measurements has made it challenging to devise new or 6knownasthemeasurementvector,denotesthesetofsensed adopt existing computer vision algorithms to solve the 1projections and φ ∈ Rm×n is called measurement matrix inference problem at hand. : vdefined by the set of multiplexing patterns. The nature The need to acquire as less data as possible combined iof the acquisition framework enables us to deploy SMCs X with the limitations of pseudo-random multiplexers props in resource constrained settings, wherein one can employ r us to outline the following goal. The goal of this paper is m << n number of photon detectors to sense otherwise a to propose a novel sensing framework for SMCs such that high-resolution imagery [2], [4] and obtain a very small acquired measurements satisfy the following properties. 1) numberofmeasurements.Later,areconstructionalgorithm Themeasurementsarenotrandominnaturebutaretailored is used to recover the image x. However, reconstructing foraparticularapplication.2)Thenumberofmeasurements x from y when m < n is an ill-posed problem. Re- is 2 orders less than the number of pixels, so that SMCs searchers in the past have attempted to provide solutions based on our framework can be employed in resource by carefully designing the measurement matrix φ in the constrained applications. 3) A simple linear operation on hope of easier recovery of x from y. Recent compressive the measurement vector y yields a ‘proxy’ representation sensing(CS)theoryprovidesonepossiblesolutiontotackle (e.g integral images, gradient images) from which the the above mentioned ill-posed problem. According to CS required features are extracted for the application in hand, theory, a signal can be recovered perfectly from a small thus avoiding the computationally expensive iterative and number of m = O(s log(n)) such pseudo-random (PR) s non-linear reconstruction. In this paper, we focus on one such ‘proxy’ representa- • K. Kulkarni and P. Turaga are with the School of Arts, Media tion, integral images. Integral images are extremely attrac- and Engineering and School of Electrical, Computer and Energy tiverepresentationsinceHaar-likefeaturesandbox-filtered Engineering, Arizona State University. Email: [email protected], [email protected]. image outputs can be computed from integral images with a small and fixed number of floating point operations in constant time [15]. These advantages have led to their 2 widespread use inreal time applications likeface detection global dictionaries for entire images is not possible, and [15], pedestrian detection [16], object tracking [17], [18], hence the framework is not scalable. Goldstein et al.[23] [19], [20] and object segmentation [21]. designed measurement matrices called ‘STOne’ Transform Instead of setting a fixed number of measurements, we whichfacilitatefastlowresolution‘previews’justbydirect formulateanoptimizationproblemtominimizethenumber reconstruction, and the hope is that ‘previews’ are of high of measurements while incorporating the other two (1 and enough quality so that conventional methods for inference 3) properties of measurements (as mentioned above) in the tasks can be applied. Assuming a multi-resolutional signal constraints. Minimizing the number of measurements is model for natural images, Chang et al.[24] proposed an al- akin to minimizing the rank of the measurement matrix. gorithm to obtain measurements which have the maximum In more concrete terms, the problem is posed to jointly mutual information with natural images. minimize the rank of the measurement matrix, φ and b) Inference problems from CS videos: A LDS (Lin- learn the linear operator, L which when applied on the ear Dynamical System) based approach was proposed by measurement vector yields the approximate integral image, Sankaranarayanan et al.[4] to model CS videos and re- with the probabilistic constraint that the error between the cover the LDS parameters directly from PR measurements. approximate integral image and the exact integral image is However, the method is sensitive to spatial and view withinallowablelimitswithhighprobability.Bycontrolling transforms. Calderbank et al.[12] theoretically proved that theallowableerrorlimit,wecanobtainmeasurementmatrix one can learn classifiers directly from PR measurements, of the desired rank. Incorporating a wavelet domain prior andthatwithhighprobabilitytheperformanceofthelinear model for natural images combined with a relaxation (ex- kernel support vector machine (SVM) classifier operating plained in section 2) allows us to convert the probabilistic on the CS measurements is similar to that of the the constraint into a series of second conic constraints. Rank best linear threshold classifier operating on the original minimization is a NP-hard problem. Relaxing the objective data. A reconstruction-free framework was proposed by functiontonuclearnormallowstouseoff-the-shelfconvex Thirumalai et al.[9] to compute optical flow based on optimization tools to solve the problem and obtain the correlation estimation between two images, directly from measurement matrix and the linear operator. PR measurements. Davenport et al.[25] proposed a mea- Contributions: 1) The main contribution of the paper is surementdomainbasedcorrelationfilterapproachfortarget a novel framework to recover estimates of integral images classification. Here, the trained filters are first projected fromasmallnumberofspatiallymultiplexedmeasurements onto PR patterns to obtain ‘smashed filters’, and then the without iterative reconstruction of any auxillary image. We PR measurements of the test examples are correlated with dub the framework ReFInE (Reconstruction-Free Integral these smashed filters. Recently, Kulkarni et al.[13] and Image Estimation). 2) Leveraging the MGGD (multivari- Lohit et al.[14] extended the ‘smashed filter’ approach ate generalized Gaussian distribution) prior model for the to action recognition and face recognition respectively, vector of detailed wavelet coefficients of natural images, and demonstrated the feasibility and scalability of tackling we propose a nuclear norm minimization formulation to difficult inference tasks in computer vision directly from obtain a new specialized measurement matrix. We term the PR measurements. measurements acquired with such a measurement matrix, as ReFInE measurements. 3) On a large dataset of 4952 2 BACKGROUND images, we present qualitative and quantitative results to show that high quality estimates of integral images and In this section, we provide a brief background on the box-filtered outputs can be recovered from ReFInE mea- probability model for natural images, which we rely on surements in real-time. 4) We show object tracking results, in the paper, and introduce notations required to set up the which are comparable to state-of-the-art methods, on a optimization problem to derive measurement matrix and challenging dataset of 50 videos to demonstrate the utility above referred linear operator. of the box-filtered output estimates in tackling inference Probability Model of natural images: There is a rich problems from SMCs at 1% measurement rate. body of literature which deals with statistical modeling of Related Work: The related previous works in literature naturalimages.Werefertosomeworkswhicharerelatedto follow one of the two themes. Some attempt to tackle theprobabilitymodelweuseinthepaper.Manysuccessful inferenceproblemsdirectlyfromPRmeasurementswithout probability models for wavelet coefficients fall under the optimizing for the measurement matrix for the inference broad umbrella of Gaussian scale mixtures (GSM) [26], taskathand,someothersattempttooptimizemeasurement [27].Typicallythecoefficientspaceispartitionedintoover- matrix for a particular signal model so as to minimize lapping blocks, and each block is modeled independently reconstruction error. as a GSM, which captures the local dependencies. This a) Design of measurement matrix: A closely related implicitlygivesrisetoaglobalmodelofthewaveletcoeffi- work can be found in [22], wherein a framework is pro- cients.Buildingonthisframework,Lyuetal.[28]proposed posed to jointly optimize for a measurement matrix and an a field of Gaussian scale mixtures (FoGSM) to explicitly overcomplete sparsifying dictionary for small patches of model the subbands of wavelet coefficients, while treating images. Results suggest that better reconstruction results each subband independently. However, incorporating such can be obtained using this strategy. However, learning a general model for wavelet coefficient vector makes it 3 very difficult to compute the distribution for even simple approximate integral image, Iˆ from the measured vector functions like a linear function of the wavelet coefficient y = φx, just by applying a linear operator L on y, so vector. Therefore, it is vital to assume a prior model which that Iˆ = Ly. For reasons which will be apparent soon, can lead to tractable computation of the distribution. It is we assume L = H(φd)T, where φd ∈ Rm×n such that well-known that marginal distributions of detailed wavelet rank(φd) = rank(φ). We call φd as the dual of φ. Thus coefficients follow generalized Gaussian distribution [29]. by construction L∈Rn×m. The value at location i in the We extend this notion to multi-dimensions and model approximate integral image is given by Iˆ = hT(φd)Tφx. i i the vector of detailed wavelet coefficients by multivariate The distortion in integral image at location i is given by generalized Gaussian distribution (MGGD). d =Iˆ−I =hT((φd)Tφx−x).NotingQ=(φd)Tφ,and i i i i To put it formally, let UT ∈ Rn×n be the orthogonal n×nidentitymatrixbyI,thedistortionscanbecompactly matrixrepresenting the log (n) levelwavelet transform,so written as d = hT(Q−I)x. We call d = [d ,...,d ] as 2 i i 1 n that x=Uw, where w ∈Rn is the corresponding wavelet distortion vector. coefficient vector. Without loss of generality, we assume √ that all entries in the first row of√UT are 1/ n so that the 3 OPTIMIZATION PROBLEM first entry in w corresponds to n times the mean of all Our aim is to search for a measurement matrix φ of entries in x. Also we denote the rest n−1 rows in UT by √ minimum rank such that distortions, d are within allow- UT . Now we can write w = [ nx¯,w ], where x¯ is the i 2:n d able limits for all i, jointly, with high probability. Q by mean of x and w is the vector of detailed coefficients. d construction is the product of two matrices of identical As explained above, the probability distribution of w d ranks, φ and (φd)T. Hence, we have the relation, rank(Q) (MGGD) is given by = rank(φ). Inspired by the phase-lifting technique used in f(w)=K|Σ |−0.5exp(−(wTΣ−1w )β), (1) [31] and [32], instead of minimizing the rank of φ, we wd d wd d minimize the rank of Q. Now, we can formally state the whereΣ ,commonlyknownasthescattermatrix,isequal wd optimization problem as follows. torank(Σ )Γ((2+n−2)/2β)/Γ((2+n)/2β)timesthe wd covariancematrixofw ,β ∈(0,1],andK isanormalizing d minimize rank(Q) constant. For β =1, we obtain the probability distribution Q (3) for the well-known multivariate Gaussian distribution. In s.t P(|d |≤δ ,..,|d |≤δ ..,|d |≤δ )≥1−(cid:15), 1 1 i i n n the following we briefly provide a background regarding where δ ≥ 0 denotes the allowable limit of distortion the multivariate generalized Gaussian distribution. i at location i of integral image, and 0 < (cid:15) < 1. Once A linear transformation of the multivariate generalized Q∗ is found, we show later in the section that the SVD Gaussian random vector is also a multivariate generalized decomposition of Q∗ allows us to write Q∗ as a product Gaussian random vector. of two matrices of identical ranks, thus yielding both the Proposition 1: [30] Let u be the a n×1 multivariate generalized Gaussian random vector with mean µ ∈ Rn, measurement matrix, φ∗ and the desired linear operator, u and scatter matrix, Σ ∈Rn×n. Let A be a l×n full rank L∗. The constraint in (3) is a probabilistic one. Hence to u compute it, one needs to assume a statistical prior model matrix. Then the l × 1 random vector, v = Au has the for x. Using the model in 1, and its properties given in multivariate generalized Gaussian distribution with mean µ = Aµ ∈ Rl, and scatter matrix, Σ = AΣ AT ∈ proposition(1)and(2),wearriveatasolvableoptimization v u v u Rl×l. problem. Computation of probabilistic constraint in (3): Sub- If v is a univariate generalized Gaussian random variable, stituting for x, we can write the distortion at location i as thentheprobabilityofv fallingintherangeof[δ−µ ,δ+ v d = hT(Q − I)Uw. We let all the entries in the first µv], for δ ≥0, can be found in terms of lower incomplete i i √ row of φ and φd to be equal to 1/ n, so that one of gamma function. √ the m measurements is exactly equal to nx¯. Further, Proposition 2: If v is a univariate generalized Gaussian random variable with mean µ ∈ R and scatter matrix, we denote the rest m − 1 rows of the two matrices by v Σv ∈ R, then the probability of v falling in the range of φ2:m and φd2:m. Now, if we restrict φ2:m and φd2:m to be respectively equal to CUT and DUT for some C, D [−δ+µv,δ+µv], for δ ≥0, is given by, in Rm−1×n−1, then from b2a:nsic linear a2lg:nebra we can show P(|v−µv|≤δ)=2γ21β,Σδv(cid:118)(cid:117)(cid:117)(cid:116)ΓΓ((2231ββ))2β, (2) Itrhtanaitks(dPeia)=sy+ht1oTi,(wsPehe−ertehIa)OUt QT2i:snwt=hde, Pwmha+etrriexn1POw,i=than(adφlld2r:aimtns)kTe(Qφnt2)r:ime=s. equal to unity (see Appendix A for details). Hence we where γ(.,.) is the lower incomplete gamma function, and can replace the objective function in (3) by rank(P). Rank Γ(.) is the ordinary gamma function. minimizationisanon-convexproblem.Hencewerelaxthe Preliminaries: Let H ∈ Rn×n be the block Toeplitz objective function to nuclear norm, as is done typically. To matrix representing the integral operation so that the inte- compute the constraint in (3), one needs to first compute gral image, I = Hx ∈ Rn, and hT for i = 1,..,n be the the joint probability of d = [d ,..,d ], and then compute i 1 n rows of H. Hence, I equals hTx. We wish to recover the a n dimensional definite integral. Now that d ’s are linear i i i 4 combinations of w , it follows from proposition 1, that d since nuclear norm is non-smooth. The nuclear norm is d also has a multivariate generalized Gaussian distribution. smoothened by the addition of a square of Forbenius norm However,noclosedformforthedefiniteintegralisknown. of the matrix, and is replaced by τ(cid:107)P(cid:107) +1(cid:107)P(cid:107)2, where ∗ 2 F Hence, we relax the constraint by decoupling it into n τ > 0. The optimization problem with the smoothened independent constraints, each enforcing the constraint that objective function is given in 8. thedistortionataspecificlocationistobewithinallowable 1 limits with high probability, independent of the distortions (P2) minimize τ(cid:107)P(cid:107) + (cid:107)P(cid:107)2 P ∗ 2 F atotherlocations.Theoptimizationwithrelaxedconstraints (cid:20) (cid:21) (8) b −A (P) is thus given by s.t i i ∈K , i=1,..,n. ∆ i i Recently, many algorithms [33], [34] have been developed minimize (cid:107)P(cid:107) P ∗ (4) to tackle nuclear norm minimization problem of this form s.t P(|di|≤δi)≥1−(cid:15), i=1,..,n. in the context of matrix completion. We use SVT (singular value thresholding) algorithm [34] to solve (P2). Now, d = hT(P−I)UT w , is a linear combination of i i 2:n d SVT iteration to solve (P2): Here, we first briefly the entries of w . From the proposition 2, d has a one- d i describe the SVT algorithm for smoothened nuclear norm dimensional generalized Gaussian distribution with zero (cid:13) (cid:13) minimization with general convex constraints, and later we mean and scatter parameter, (cid:13)Σ1/2UT (P−I)Th (cid:13), and (cid:13) wd 2:n i(cid:13) show how we adapt the same to our problem P2 which the probability in equation 4 can be explicitly written as hasnsecond-orderconstraints.Letthesmoothenednuclear follows. norm with general convex constraints, be given as below. P(|di|≤δi) 1 (cid:118) 2β minimize τ(cid:107)P(cid:107) + (cid:107)P(cid:107)2 =2γ21β,(cid:13)(cid:13)(cid:13)Σ1w/d2UT2:nPThiδi−Σ1w/d2UT2:nhi(cid:13)(cid:13)(cid:13)2(cid:117)(cid:117)(cid:116)ΓΓ((2231ββ)) . P s.t fi(∗P)≤20, Fi=1,..,n, (9) (5) The optimization problem now can be rewritten as where fi(P) ≤ 0, i = 1,..,n denote the n convex constraints. Let F(P) = [f (P),..,f (P)]. The SVT 1 n minimize(cid:107)P(cid:107) s.t algorithm for the 9 with the modified objective function ∗ P (cid:118) 2βis given as below. 2γ21β,(cid:13)(cid:13)(cid:13)Σ1w/d2UT2:nPThiδi−Σ1w/d2UT2:nhi(cid:13)(cid:13)(cid:13) (cid:117)(cid:117)(cid:116)ΓΓ((2231ββ)) zPkk ==arPgP(cid:0)mzink−1τ+(cid:107)Pηk(cid:107)f∗(+Pk21)(cid:107)(cid:1)P,(cid:107)2Fi+=(cid:104)1z,k.−.,1n,F(P)(cid:105) (cid:41) 2 i i i i ≥1−(cid:15) . (10) where zk is a short form for [z k,..,z k], and P (q) de- 1 n i We compactly write Σ1/2UT PTh as A (P), notestheprojectionofqontotheconvexsetdefinedbythe Σ1w/d2UT2:nhi as bi and w(γd−1(2:1nδ,i1−(cid:15)))i21β (cid:114)ΓΓ((2231ββ))i as ycoiknsdtreaninottefsi(thPe)fi≤rs0t.nLeeltezmiken=ts[oyfikz,iskkia],nsdosthkiadtethneotveesctthoer ∆ . Plugging above in equation (5),2βand2rearranging terms, lastelementofz k.Letyk beashortformfor[y k,..,y k], i i 1 n we have and sk is a short form for [sk,..,sk]. To obtain a explicit 1 n form of the update equations in 10 for our problem, P1, minimize (cid:107)P(cid:107) P ∗ (6) let us consider the first equation of the same. F(P) for s.t (cid:107)A (P)−b (cid:107) ≤∆ i=1,..,n. P1isgivenby[b −A (P),∆ ,..,b −A (P),∆ ]T.We i i 2 i 1 1 1 n n n substitute for F(P), and after removal of the terms not We can rewrite the problem in conic form as below. involving P, we have 1 (P1) minimize (cid:107)P(cid:107) s.t Pk =arg min τ(cid:107)P(cid:107) + (cid:107)P(cid:107)2 +(cid:104)yk−1,b−A(P)(cid:105). P ∗ ∗ 2 F (cid:20) b −A (P) (cid:21) (7) P (11) i i ∈K , i=1,..,n, ∆ i Equation 11 can be rewritten as below. i where Ki is a second order cone Ki = {(xi,ti) ∈ Pk =arg min τ(cid:107)P(cid:107) + 1(cid:13)(cid:13)P−A∗(yk−1)(cid:13)(cid:13)2 Rn+1 : (cid:107)xi(cid:107) ≤ ti}. Let b = [b1,..,bn], and P ∗ 2 F A(P)=[A1(P),..,An(P)]. Let A∗ denote the adjoint of −1(cid:13)(cid:13)A∗(yk−1)(cid:13)(cid:13)2 +(cid:104)P,A∗(yk−1)(cid:105) the linear operator A. It is easy to recognize that the 2 F optimizationaboveisaconvexproblem,sincenuclearnorm +(cid:104)yk−1,b(cid:105)−(cid:104)yk−1,A(P)(cid:105). is convex and the constraints enforce finite bounds on the Removing the terms not involving P and noting that norms of affine functions of the decision variable, P and (cid:104)P,A∗(yk−1)(cid:105)=(cid:104)yk−1,A(P)(cid:105), we have the following. hence are also convex. Even though the constraints are second-order cone constraints, the standard second order Pk =arg min τ(cid:107)P(cid:107) + 1(cid:13)(cid:13)P−A∗(yk−1)(cid:13)(cid:13)2 . (12) conic programming methods cannot be used to solve (P1) ∗ 2 F P 5 Before we write down the solution to equation 12, we first the corresponding measurements, y = φ∗x as ReFInE define D , the singular value shrinkage operator. Consider measurements. The desired linear operator L∗ is given τ the SVD of a matrix X, given by X=WΣVT. Then for by HW Σ1/2. By construction, the approximate integral M M τ ≥ 0, the singular value shrinkage operator, Dτ is given image Iˆis given by Iˆ=L∗y =(HWMΣ1M/2)y. by Dτ(X)=WDτ(Σ)VT,Dτ(Σ)=diag({(σi−τ)+}), Computational Complexity: Since the length of the where t+ = max(0,t). The solution to equation 12 is given image x is n, the number of entries in P is n2. Hence, by Pk = Dτ(A∗(yk−1)). Now, it remains to calculate the dimension of the optimization problem P2 is n2. This A∗(yk−1). We achieve it according to the following. Con- means that if we are to optimize for a measurement matrix sider (cid:104)A∗(yk−1),P(cid:105). tooperateonanimageofsize,256×256,thedimensionof (cid:104)P,A∗(yk−1)(cid:105) the optimization problem would be 232! Optimizing over a large number of variables is computationally expensive, if n =(cid:104)A(P),yk−1(cid:105)=(cid:88)(cid:104)A (P),y k−1(cid:105) not impractical. Hence we propose the following subopti- i i mal solution to obtain the measurement matrix. We divide i=1 n n the image into non-overlapping blocks of fixed size, and (cid:88) (cid:88) = (cid:104)P,A∗(y k−1)(cid:105)=(cid:104)P, A∗(y k−1)(cid:105) sense each block using a measurement matrix, optimized i i i i i=1 i=1 for this fixed size, independently of other blocks. Let the Hence, we have A∗(yk−1) = (cid:80)n A∗(y k−1). Thus, the image, x be divided into B blocks, x1,x2,..,xB, each of i=1 i i a fixed size, f ×f, and φ ∈Rm×f2 be the measurement first equation of SVT iteration for our problem is given by f matrix optimized for images of size f × f, and (φd)T (cid:32) n (cid:33) f (cid:88) be the corresponding dual matrix. Then the ‘ReFInE’ Pk =D A∗(y k−1) . (13) τ i i measurements, y are given by the following, i=1 Using basic linear algebra, it can be shown that y1 φf 0 ··· 0 x1 Ade∗ita(yilsik).−1) = U2:n(Σw1/d2)Tyik−1hTi (see Appendix B for y =y...2=0... φ...f ·.·.·. 0... x...2. (16) WenowprovidetheprojectionontotheconvexconeK . i y 0 0 ··· φ x B f B The projection operator, P as derived in [35] is given as Ki follows. Oncethemeasurements,y areobtained,theintegralimage, Iˆis given by (x,t), (cid:107)x(cid:107)≤t, (φd)T 0 ··· 0 y f 1 PKi :(x,t)(cid:55)→ ((cid:107)20x(cid:107),(cid:107)x0+(cid:107))t,(x,(cid:107)x(cid:107)), −(cid:107)tx(cid:107)≤≤−t(cid:107)≤x(cid:107)(cid:107).x(cid:107),(14) Iˆ=H 0... (φdf...)T ·.·.·. 0... y...2. (17) To solve (P2), starting with (cid:20) yi0 (cid:21) = 0 for all i = 0 0 ··· (φdf)T yB s0 i 1,..n, the kth SVT iteration is given by (15). 4 EXPERIMENTS (cid:18) n (cid:19) Pk =Dτ (cid:80)U2:n(Σ1w/d2)Tyik−1hTi Before we can conduct experiments to evaluate our i=1 framework, we first need to estimate the parameters of (cid:20) yskiik (cid:21)=PKi(cid:18)(cid:20) yskiik−−11 (cid:21)+ηk(cid:20) b−i∆−iAi(Pk) (cid:21)(cid:19), tphreobparboibliatybimlitoydemloadnedloipntim1.izEinstgimmaetainsugrepmareanmtemteartsricoefstfhoer (15) anyarbitrarylargesizedimageblocksisnotpracticalsince where, A∗ are the adjoints of linear operators A . For the i i theformertaskrequiresanenormousamountofimagedata iterations(15)toconverge,weneedtochoosethestepsizes, and the latter requires prohibitive amount of memory and ηk ≤ 2 , where (cid:107)A(cid:107) is the spectral norm of the linear (cid:107)A(cid:107)2 2 computational resources. Hence, we fix the block size to transforma2tion A [34]. be 32 × 32 images. The scatter matrix Σ is a scalar Once the solution P∗ is found, using the relation noted multiple of the covariance matrix of w . Hewndce it suffices d earlier in the section, we have Q∗ = P∗ + 1O. Having to compute the covariance matrix. To this end, we first n obtainedQ∗,thetasknowistoexpressitintoaproductof downsampleallthe5011trainingimagesinPASCALVOC twomatricesofidenticalranks.Thisisdonealmosttrivially 2007 dataset [36] to a size of 32×32, so that n = 1024 as follows. Noting the singular value decomposition of Q∗ and then obtain the level 7 Daubechies wavelet coefficient as Q∗ = WMΣMVMT , where ΣM = diag{λ1,..,λM} vectors. We compute the sample covariance matrix of thus denote the diagonal matrix with the non-zero singular obtained wavelet coefficient data. For various values of β, values arranged along its diagonal, and WM and VMT are we evaluate the χ2 distance between the histograms of the the matrices whose columns are the left and right singular individual wavelet coefficients and their respective theo- vectors respectively. We can choose φ∗ = Σ1/2VT , so retical marginal distributions with the variances computed M M that ((φd)∗)T = W Σ1/2 and rank(φ∗) = rank((φd)∗) = above. We found for β = 0.68, the distance computed M M rank(Q∗). We, henceforth refer to φ∗ as ReFInE φ, and above is minimum. 6 Computing measurement matrix: To obtain a mea- of our framework in recovering good quality box-filtered surement matrix, we need to input a desired distortion outputestimates,weconductthefollowingexperiment.For vector δ to the optimization problem in (P2). The desired box filters of sizes 3×3, 5×5 and 7×7, we compute the distortion vector is computed according to the following. estimates of filtered outputs for the four images using their We first perform principal component analysis (PCA) on respective recovered integral image estimates. RSNR v/s the downsampled 5011 training images in the PASCAL measurement rate plots for different filter sizes are shown VOC 2007 dataset [36]. We use only the top 10 PCA in figure 3. It is evident that even for a remarkably low components as φ to ‘sense’ these images. We obtain the measurement rate of 1% , we obtain high RSNR box- desired distortion vector by first assuming φd = φ and filtered outputs. For a fixed measurement rate, expectedly calculating distortions, |dj| at each location for all training the RSNR increases with the size of the filter. This shows i images, j =1,..,5011. Now, the entry in location i of the the structures which are more global in nature are captured desired δ is given by the minimum value α, so that 95% better.Thisisparticularlytrueinthecaseof‘Plane’image. of the values, |dj|,j = 1,..,5011 are less than α. We use The high RSNR for this image hints at the absence of fine i (cid:15)=0.95 and solve (P2) to obtain P∗, and hence also Q∗. structures and homogeneous background. Further, for the The rank of ReFInE φ is simply the rank of Q∗. ‘Two Men’ Image, we also compare the heat maps of the Estimation of integral images: We show that good exact box-filtered outputs with the estimated ones. We fix quality estimates of integral images can be obtained using themeasurementrateto1%.Forfiltersizes3×3and7×7, our framework. To this end, we first construct ReFInE the exact box-filtered outputs are computed and compared measurement matrices of various ranks, M. We achieve with the box-filtered output estimates obtained using our this by considering the SVD of Q∗ obtained above. For a framework, and RG-CoSamP as well. The heat map visu- particular value of M, the ReFInE φ is calculated accord- alizations of the outputs are shown in figure 4. It is clear ing to φ = Σ1/2VT , where Σ = diag{λ ,..,λ } is that greater quality box-filtered output estimates can be M M M M 1 M a diagonal matrix with M largest singular values arranged recovered using our framework and the recovered outputs alongthediagonalandVT denotethecorrespondingrows retaintheinformationregardingmedium-sizedstructuresin M in VT. Its dual, φd, is calculated similarly. For each the images, while in the case of RG-CoSamP, the output is particularmeasurementrate,determinedbythevalueofM, all noisy and does not give us any information. theintegralimageestimatesarerecoveredfromM ReFInE measurements for all the 4952 test images in the PASCAL 5 TRACKING APPLICATION VOC 2007 dataset [36]. Similarly integral image estimates In this section, we show the utility of the framework are recovered from random Gaussian measurements by in practical applications. In particular, we show tracking first performing non-linear iterative reconstruction using results on 50 challenging videos used in benchmark com- the CoSamP algorithm [8] and then applying the integral parisonoftrackingalgorithms[37].Weemphasizethatour operationonthereconstructedimages.Thispipelineisused aim is not to obtain state-of-the-art tracking results but to asbaselinetocompareintegralestimates,andhenceforthis show that integral image estimates can be used to obtain referred to as ‘RG-CoSamP’. We then measure the recov- (cid:18) (cid:19) (cid:107)Iˆ(cid:107) robust tracking results at low measurement rates. To this eredsignal-to-noiseratio(RSNR)via20log F . 10 (cid:107)Iˆ−I(cid:107) end, we conduct two sets of tracking experiments, one F The average RSNR for recovered integral image estimates withoriginalresolutionvideosandonewithhighdefinition as well as the time taken to obtain integral images are videos. tabulated in the table 1. Our framework outperforms RG- Tracking with original resolution videos: We conduct CoSamPintermsofbothrecoverysignal-to-noiseratioand tracking experiments on original resolution videos at three time taken to estimate the integral image, at all measure- different measurement rates, viz 1.28%, 1%, and 0.49%. ment rates. This shows that ReFInE φ, the measurement In each case, we use the measurement matrix obtained for matrices designed by our framework, facilitate faster and block size of 32×32, and obtain ReFInE measurements moreaccuraterecoveryofintegralimageestimatesthanthe for each frame using the φ∗ obtained as above. Once, universalmatrices.Theaveragetimetakentoobtainintegral the measurements are obtained, our framework recovers image estimates in our framework is about 0.003s, which integral image estimates from these measurements in real amounts to a real-time speed of 300 FPS. Further, we time. The estimates are subsequently fed into the best randomly select four images (‘Two Men’, ‘Plane’, ‘Train’ performing Haar-feature based tracking algorithm, Struck and ‘Room’) from the test set (shown in figure 1(a), 1(b), [38] to obtain the tracking results. Henceforth, we term 1(c), 1(d)) and present qualitative and quantitative results this tracking pipeline as ReFInE+Struck. To evaluate our for individual images. Image-wise RSNR v/s measurement tracking results, we use the standard criterion of precision rate plots are shown in figure 2. It is very clear that for all curve as suggested by [37]. To obtain the precision curve, the four images, our framework clearly outperforms RG- we plot the precision scores against a range of thresholds. CoSamP in terms of RSNR, at all measurement rates. A frame contributes to the precision score for a particular Estimation of box-filtered outputs: It is well known threshold, α if the distance between the ground truth that box-filtered outputs of any size can efficiently com- location of the target and estimated location by the tracker puted using integral images [15]. To show the capability is less than the threshold, α. Precision score for given 7 Method ReFInE RG-CoSamP ReFInE RG-CoSamP ReFInE RG-CoSamP M (measurementratio) 20(0.005) 20(0.005) 40(0.01) 40(0.01) 60(0.015) 60(0.015) Timeins 0.0034 0.38 0.0036 0.58 0.0031 0.97 RSNRindB 38.95 -16.76 38.96 -11.22 38.96 -10.9 TABLE1 ComparisonofaverageRSNRandtimeforrecoveredintegralimageestimatesobtainedusingourmethodwith RG-CoSamP.OurframeworkoutperformsRG-CoSamPintermsofbothrecoverysignal-to-noiseratioandtime takentoestimatetheintegralimage,atallmeasurementrates. (a) (b) (c) (d) Fig.1. (Fourimages(L-R:‘TwoMen’,‘Plane’,‘Train’and‘Room’)arerandomlychosenforpresentingqualitative andquantitativeresults. ’Two Men’ image ’Plane’ image 40 70 35 60 ReFInE ReFInE 30 RG−CoSamP 50 RG−CoSamP dB25 dB n n 40 R i20 R i SN SN30 R15 R 20 10 5 10 00 0.01 0.02 0.03 0.04 00 0.01 0.02 0.03 0.04 Measurement rate = m/n Measurement rate = m/n (a) (b) ’Train’ image ’Room’ image 40 35 35 30 ReFInE ReFInE 30 RG−CoSamP 25 RG−CoSamP B25 B n d n d20 R i20 R i N N15 RS15 RS 10 10 5 5 0 0 0 0.01 0.02 0.03 0.04 0 0.01 0.02 0.03 0.04 Measurement rate = m/n Measurement rate = m/n (c) (d) Fig. 2. The figure shows variation of image-wise RSNR for recovered integral image estimates for the four images.Itisveryclearthatforallthefourimages,ourframeworkoutperforms‘RG-CoSamP’intermsofRSNR, atallmeasurementrates. 8 ’Two Men’ Image ’Plane’ Image 15 3 × 3 37 3 × 3 14 5 × 5 36 5 × 5 7 × 7 7 × 7 13 35 B B n d12 n d34 R i R i SN11 SN33 R R 10 32 9 31 8 30 0 0.01 0.02 0.03 0.04 0 0.01 0.02 0.03 0.04 Measurement rate Measurement rate (a) (b) ’Train’ Image ’Room’ Image 14 3 × 3 11 3 × 3 13 5 × 5 5 × 5 7 × 7 10 7 × 7 12 B B n d11 n d 9 R i R i N10 N S S 8 R R 9 7 8 7 6 0 0.01 0.02 0.03 0.04 0 0.01 0.02 0.03 0.04 Measurement rate Measurement rate (c) (d) Fig.3. ThefigureshowsthevariationofRSNRfortherecoveredbox-filteredoutputsusingReFInEwithmeasurementrate. Itisevidentthatevenfor1%measurementrate,weobtainhighRSNRbox-filteredoutputs.Forafixedmeasurementrate,the RSNRincreaseswiththesizeofthefilter.Thisshowsthestructuresglobalinnaturearecapturedbetter.Thisisparticularly trueinthecaseof‘Plane’image.ThehighRSNRforthisimagehintsattheabsenceoffinestructuresandhomogeneous background. 10 10 10 20 20 20 30 30 30 40 40 40 50 50 50 60 60 60 10 20 30 40 50 60 10 20 30 40 50 60 10 20 30 40 50 60 (a) 3×3 10 10 10 20 20 20 30 30 30 40 40 40 50 50 50 60 60 60 10 20 30 40 50 60 10 20 30 40 50 60 10 20 30 40 50 60 (b) 7×7 Fig. 4. Heat maps for box-filtered outputs for the ‘Two men’ image. Left to right: Exact output, ReFInE (m/n = 0.01), and RG-CosamP (m/n = 0.01). It is clear that greater quality box-filtered output estimates can be recovered from ReFInE measurementsandtherecoveredoutputsretaintheinformationregardingmedium-sizedstructuresintheimages,whilein caseofRG-CoSamP,theoutputisallnoisyanddoesnotgiveusanyinformation. 9 threshold is calculated as the percentage of frames which All sequences 80 contribute to the precision score. When precision scores are required to be compared with other trackers at one 70 particular threshold, generally threshold is chosen to be 60 equal to 20 pixels [37]. aawtwitPthhirdeetehcthiresariensoeganmedcioeufffrfelvoroeerc:naotTttihmhoeenerapetsrrrueraocrceriksmtiehorernsne,tscOhuraorrtvlaedecsssl,eafaroSnertdrpoualuocrrkettferc[ado3mm8ae]gp,waaiaornnersddkt Precision percentage345000 variousothertrackers,TLD[20],MIL[19],CSK[39],and 20 OOuurrss aatt MMRR == 11.%28 +% S +tr uSctkruck FCT [40] in figure 5. It is to be noted all the trackers Ours at MR = 0.49% + Struck Oracle Struck used for comparison utilize full-blown images for tracking 10 TCLSDK MIL and hence operate at 100% measurement rate. As can be FCT 0 seen clearly, ‘ReFInE+Struck’ at 1.28% performs better 0 5 10 15 20 25 30 35 40 45 50 Location error threshold than other trackers, MIL, CSK, TLD, and FCT and only (a) a few percentage points worse than Oracle Struck for all thresholds. In particular, the mean precision over all Fig. 5. ‘ReFInE+Struck’ at a measurement rate of 1.28% performs better than other trackers, MIL, CSK, TLD, and 50 sequences in the dataset [37] for the threshold of 20 FCT and only a few percentage points worse than Oracle pixels is obtained for ‘ReFInE+Struck’ at three different Struckforallthresholds.Evenatameasurementrateof1%, measurement rates and is compared with other trackers in ‘ReFInE+Struck’ performs nearly as well as TLD and CSK table2.Weobtainaprecisionof59.26%atameasurement trackerswhichoperateat100%measurementrate. of 1.28%, which is only a few percentage points less than precision of 65.5% using Oracle Struck and 60.8% using Tracker MeanPrecision MeanFPS ReFInEatMR=1.28%+Struck 59.26 19.61 TLD.Evenatanextremelylowmeasurementrateof0.49%, ReFInEatMR=1%+Struck 52.47 19.61 we obtain mean precision of 45.78% which is competitive ReFInEatMR=0.49%+Struck 45.78 19.62 whencomparedtoothertrackers,MIL,andFCTwhichop- OracleStruck[38] 65.5 20 TLD[20] 60.8 28 erate at 100% measurement rate. This clearly suggests that CSK[39] 54.11 362 the small number of well-tailored measurements obtained MIL[19] 47.5 38 using our framework retain enough information regarding FCT[40] 42.37 34.92 integralimagesandhencealsotheHaar-likefeatureswhich TABLE2 playacriticalroleinachievingtrackingwithhighprecision. Meanprecisionpercentagefor20pixelserrorandmean Frame rate: Even though, our framework uses Struck framespersecondforvariousstate-of-the-arttrackersare tracker, the frame rates at which ‘ReFInE+Struck’ operates comparedwithourframeworkatdifferentmeasurement arepotentiallylessthantheframeratethatcanbeobtained rates.Theprecisionpercentagesforourframeworkare with Oracle Struck, and can even be different at different stableevenatextremelylowmeasurementrates,and measurement rates. This is due to the fact that once the comparefavorablywithothertrackerswhichoperateat measurements are obtained for a particular frame, we first 100%measurementrate,i.eutilizeallthepixelsinthe have to obtain an intermediate reconstructed frame before frames. applying the integral operation. However, in the case of Oracle Struck, the integral operation is applied directly on the measured frame. The frame rate for ‘Our+Struck’ at different measurement rates are compared with the frame than TLD, CSK, MIL and FCT, whereas in the case of the rates for other trackers in table 2. However, as can be ‘Background Clutter’ and ‘Scale Variation’ attributes, TLD seen,thepreprocessingoperationtoobtaintheintermediate performs slightly better than ‘Our+Struck’ at measurement reconstructed frame barely affects the speed of tracking rate of 1.28%. since the preprocessing step, being multiplication of small- Figure 7 shows the corresponding plots for attributes, sized matrices can be efficiently at nearly 1000 frames per ‘Deformation’, ‘Fast Motion’, ‘Motion Blur’, and ‘Low second. Resolution’. In the cases of ‘Deformation’, ‘Fast Motion’ Experiments with sequence attributes: Each video and ‘Motion Blur’, ‘ReFInE+Struck’ at measurement rate sequence in the benchmark dataset is annotated with a of 1.28% performs better than TLD, CSK, MIL and FCT, set of attributes, indicating the various challenges the whereas in the case of ‘Low Resolution’, TLD performs video sequence offers in tracking. We plot precision per- better than ‘ReFInE+Struck’. centages against the location error threshold for each of Figure 8 shows the corresponding plots for attributes, these 10 different kinds of attributes. Figure 6 shows the ‘In the Plane rotation’, ‘Out of View’, and ‘Out of Plane corresponding plots for attributes, ‘Illumination Variation’, rotation’. ‘Background Clutter’, ‘Occlusion’, and ‘Scale Variation’. Tracking with high resolution videos: Tracking using In the case of ‘Illumination Variation’ and ‘Occlusion’ high-resolution videos can potentially lead to improvement ‘Our+Struck’atmeasurementrateof1.28%performsbetter in performance due to availability of fine-grained informa- 10 Illumination Variation (25 sequences) Background Clutter (21 sequences) 70 80 60 70 60 50 Precision percentage3400 Precision percentage345000 20 OOuurrss aatt MMRR == 11.%28 +% S +tr uSctkruck 20 OOuurrss aatt MMRR == 11.%28 +% S +tr uSctkruck Ours at MR = 0.49% + Struck Ours at MR = 0.49% + Struck Oracle Struck Oracle Struck 10 TCLSDK 10 TCLSDK MIL MIL FCT FCT 0 0 0 5 10 15 20 25 30 35 40 45 50 0 5 10 15 20 25 30 35 40 45 50 Location error threshold Location error threshold (a) (b) Occlusion (29 sequences) Scale Variation (28 sequences) 80 80 70 70 60 60 Precision percentage345000 Precision percentage345000 20 OOuurrss aatt MMRR == 11.%28 +% S +tr uSctkruck 20 OOuurrss aatt MMRR == 11.%28 +% S +tr uSctkruck Ours at MR = 0.49% + Struck Ours at MR = 0.49% + Struck Oracle Struck Oracle Struck 10 TCLSDK 10 TCLSDK MIL MIL FCT FCT 0 0 0 5 10 15 20 25 30 35 40 45 50 0 5 10 15 20 25 30 35 40 45 50 Location error threshold Location error threshold (c) (d) Fig. 6. Precisionplotsforfourdifferentattributes.Inthecaseof‘IlluminationVariation’and‘Occlusion’‘ReFInE+Struck’at measurementrateof1.28%performsbetterthanTLD,CSK,MILandFCT,whereasinthecaseofthe‘BackgroundClutter’ and‘ScaleVariation’attributes,TLDperformsslightlybetterthan‘ReFInE+Struck’atmeasurementrateof1.28%. tion regarding the scene. However, in many applications, the measurement rate wrt original resolution, which we the deployment of high-resolution sensors is severely lim- call effective measurement rate (EMR), is given by the ited by the lack of storage capacity. In such scenarios, it ratio of the number of ReFInE measurements per frame will be interesting to see if the small number of ReFInE to the number of pixels in a frame of original resolution measurements of high-resolution videos can yield better video. Here, the tracking algorithm, Struck which we used tracking performance than the full-blown low-resolution for original resolution videos does not scale well in terms videos.Toconducttrackingexperimentsonhighresolution of computational complexity. For higher resolution videos, videos, we first employ a deep convolutional network where the search space is much larger, we found that based image super resolution (SR) algorithm, SRCNN, Struck is too slow for real-time application. Instead, we [41] to obtain high resolution frames of the 50 videos use a faster Haar feature based tracking algorithm, FCT considered earlier in the section. The aspect ratio for all [40] algorithm. Henceforth, we dub this tracking pipeline frames is maintained, and the upscaling factor for image as SR+ReFInE+FCT. Once tracking results are obtained super resolution is calculated such that the resolution of for the high resolution videos are obtained, we normalize the longer dimension in the higher resolution frame is at the coordinates so as to obtain the tracking outputs with least 1000 pixels. We found that upscale factors varies respect to original resolution videos. The precision score is between2and8forvariousvideosinthedataset.Oncethe calculated as before. The mean precision percentage for 20 high resolution videos are obtained, we proceed to obtain pixels error and mean frames per second for ‘SR + ReFIne ReFInE measurements as before. We conduct tracking + FCT’ at various measurement rates are given in table experimentsatfourdifferentmeasurementrate(1%,0.49%, 3 and are compared for the same for ‘Oracle FCT’ which 0.29%, 0.2%). Note that these different measurement rates operatesforfull-blownoriginalresolutionvideos.Itisclear are with respect to (wrt) the high-resolution frames, and that we obtain a significant boost in tracking accuracy