Motion Segmentation via Global and Local Sparse Subspace Optimization

Michael Ying Yang(1), Hanno Ackermann(2), Weiyao Lin(3), Sitong Feng(2) and Bodo Rosenhahn(2)

(1) University of Twente, [email protected]; (2) Leibniz University Hannover; (3) Shanghai Jiao Tong University

Abstract— In this paper, we propose a new framework for segmenting feature-based moving objects under the affine subspace model. Since feature trajectories in practice are high-dimensional and contain a lot of noise, we first apply sparse PCA to represent the original trajectories with a low-dimensional global subspace, which consists of the orthogonal sparse principal vectors. Subsequently, the local subspace separation is achieved by automatically searching for a sparse representation of the nearest neighbors of each projected data point. In order to refine the local subspace estimation and deal with the missing data problem, we propose an error estimation that encourages projected data spanning the same local subspace to be clustered together. In the end, the segmentation of the different motions is obtained by spectral clustering on an affinity matrix constructed from both the error estimation and the sparse neighbor optimization. We test our method extensively and compare it with state-of-the-art methods on the Hopkins 155 dataset and the Freiburg-Berkeley Motion Segmentation dataset. The results show that our method is comparable with other motion segmentation methods and in many cases exceeds them in terms of precision and computation time.

I. INTRODUCTION

In the past years, dynamic scene understanding has been receiving increasing attention, especially for moving cameras or multiple moving objects. Motion segmentation, as a part of video segmentation, is essential for studying dynamic scenes and for many other computer vision applications [1]. In particular, motion segmentation aims to decompose a video into different regions according to the different moving objects tracked throughout the video. If features are extracted for all the moving objects in the video, segmenting the different motions is equivalent to segmenting the extracted feature trajectories into different clusters. One example of feature-based motion segmentation is presented in Figure 1.

Generally, motion segmentation algorithms are classified into two categories [3]: affinity-based methods and subspace-based methods. Affinity-based methods focus on computing the correspondences of each pair of trajectories, whereas subspace-based approaches use multiple subspaces to model the multiple moving objects in the video, and the segmentation of the different motions is accomplished through subspace clustering. Recently, some affinity-based methods [3], [4] have been proposed that cluster trajectories with an unlimited number of missing entries. However, their computation times are so high that an optimized platform is required to reduce them. The subspace-based methods [5], [6], in contrast, have been developed to reconstruct the missing trajectories with a sparse representation. Their drawback is that they are sensitive to real videos containing a large number of missing trajectories. Most of the existing subspace-based methods still lack robustness in handling missing features. Thus, there is an intense demand for a new subspace-based algorithm that can not only segment multiple kinds of motions but also handle the missing and corrupted trajectories of real videos.

A. Contributions

We propose a new framework with subspace models for segmenting different types of moving objects from a video under an affine camera. We cast motion segmentation as a two-stage subspace estimation: global and local. Sparse PCA [7] is adopted for optimizing the global subspace in order to defend against noise and outliers. Meanwhile, for each data point we seek a sparse representation of its nearest neighbors in the global subspace that span the same local subspace. In order to solve the missing data problem and refine the local subspace estimation, we propose an error estimation and build an affinity graph on which spectral clustering yields the clusters. To the best of our knowledge, our framework is the first to simultaneously optimize the global and the local subspaces with sparse representation.

The remaining sections are organized as follows. Related work is discussed in Section II. The basic subspace models for motion segmentation are introduced in Section III. The proposed approach is described in detail in Section IV. The experimental results are presented in Section V. Finally, the paper is concluded in Section VI.

Fig. 1. Example results of motion segmentation on the real traffic video cars9.avi from the Hopkins 155 dataset [2].

II. RELATED WORK

During the last decades, both subspace-based techniques [5], [6] and affinity-based methods [3], [4] have received increasing interest for segmenting different types of motions in real videos.

Affinity-based methods. [4] uses the distances between each pair of feature trajectories to build an affinity matrix based on a translational motion model. This method can segment motions with an unlimited number of missing or incomplete trajectories, which means it is robust to videos with occlusions or a moving camera. Another affinity-based approach is Multi-scale Clustering for Motion Segmentation (MSMC) [3]. Based on the conventional image classification technique of split and merge, it uses the correspondences of features between two frames to segment the different motions despite many missing data. A general problem of affinity-based methods is that they are highly time-consuming; they have to be implemented on an optimized platform in order to save computation time.
Subspace-based methods. The existing works based on subspace models can be divided into four main categories: algebraic, iterative, sparse representation, and subspace estimation.

Algebraic approaches, such as Generalized Principal Component Analysis (GPCA) [8], use polynomial fitting and differentiation to obtain the clusters. GPCA can segment rigid and non-rigid motions effectively, but as the number of moving objects in the video increases, its computation cost grows and its precision decreases at the same time. The general procedure of an iterative method has two main steps: find an initial solution, then refine the clustering result to fit each subspace model. RANdom SAmple Consensus (RANSAC) [9] randomly selects a number of points from the original dataset to fit the model. RANSAC is robust to outliers and noise, but it requires a good initial parameter selection. Specifically, it computes the residual of each point with respect to the model and compares it with a threshold; if the residual is below the threshold, the point is considered an inlier, and vice versa. Sparse Subspace Clustering (SSC) [5] is one of the most popular methods based on sparse representation. SSC exploits the fact that each point can be linearly represented by a sparse combination of the other data points. SSC has one of the best accuracies among the subspace-based methods and can deal with missing data. Its limitation is that it requires a lot of computation time. Another popular algorithm based on sparse representation is Agglomerative Lossy Compression (ALC) [6], which uses compressive sensing on the subspace model to segment videos with missing or corrupted trajectories. However, the implementation of ALC, a greedy algorithm, cannot guarantee finding the global maximum; moreover, ALC is highly time-consuming because its parameter must be tuned.

Our work combines the subspace estimation and sparse representation methods. Subspace estimation algorithms, such as Local Subspace Affinity (LSA) [10], first project the original data onto a global subspace. The projected global subspace is then separated into multiple local subspaces through K-nearest neighbors (KNN). After calculating the affinities of the estimated local subspaces with principal angles, the final clusters are obtained through spectral clustering. The issue is that the KNN policy may overestimate the local subspaces due to noise and an improper selection of the number K, which is determined by the rank of the local subspace. LSA uses model selection (MS) [11] to estimate the ranks of the global and local subspaces, but MS is quite sensitive to the noise level.

III. MULTI-BODY MOTION SEGMENTATION WITH SUBSPACE MODELS

In this section, we introduce the motion structure under the affine camera model. Subsequently, we show that under the affine model the segmentation of different motions is equivalent to separating multiple low-dimensional affine subspaces in a high-dimensional space.

A. Affine Camera Model

Most of the popular algorithms assume an affine camera model, which is an orthographic camera model with a simple mathematical form. It gives us a tractable representation of the motion structure in dynamic scenes. Under the affine camera, the general procedure for motion segmentation starts from translating the 3-D coordinates of each moving object to its 2-D locations in each frame. Assume that {x_fp}, p = 1,...,P, f = 1,...,F, with x_fp ∈ R^2, represents one tracked 2-D feature point p of one moving object at frame f; its corresponding 3-D world coordinate is {X_p}, p = 1,...,P, with X_p ∈ R^3. The pose of the moving object at frame f is represented by (R_f, T_f) ∈ SO(3) × R^3, where R_f and T_f are the rotation and translation, respectively. Thus, each 2-D point x_fp can be described by Equation 1,

x_fp = [R_f T_f] X_p = A_f X_p   (1)

where

A_f = [1 0 0; 0 1 0] [R_f T_f] ∈ R^{2×4}

is the affine transformation matrix at frame f and X_p is expressed in homogeneous coordinates.
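To make Equation 1 concrete, here is a minimal numerical sketch of the orthographic projection. The pose (R_f, T_f) and the point X_p carry toy values chosen for illustration, not values from the paper:

```python
import numpy as np

def rotation_z(theta):
    """3-D rotation about the z-axis (an arbitrary example pose)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Hypothetical pose of the moving object at frame f.
R_f = rotation_z(0.3)              # rotation
T_f = np.array([1.0, 2.0, 3.0])    # translation

# Orthographic selection matrix [1 0 0; 0 1 0] drops the depth coordinate.
Pi = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]])

# Affine transformation matrix A_f in R^{2x4}, as in Equation 1.
A_f = Pi @ np.hstack([R_f, T_f[:, None]])

# One 3-D world point X_p in homogeneous coordinates [X; Y; Z; 1].
X_p = np.array([0.5, -1.0, 2.0, 1.0])

x_fp = A_f @ X_p                   # 2-D feature position at frame f
```

The 2-D point is simply the first two coordinates of the rigidly transformed 3-D point, which is what makes the affine model tractable.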
B. Subspace Models for Motion Segmentation under an Affine View

The general input for subspace-based motion segmentation under an affine camera can be formulated as a trajectory matrix containing the 2-D positions of all feature trajectories tracked throughout all frames. Given the 2-D locations {x_fp} ∈ R^2 of the tracked features on a rigid moving object, the corresponding trajectory matrix is

W_{2F×P} = [x_11 ... x_1P; ... ; x_F1 ... x_FP]   (2)

Under the affine model, the trajectory matrix W_{2F×P} can be further factored as

W_{2F×P} = [A_1; ...; A_F]_{2F×4} [X_1 ... X_P; 1 ... 1]_{4×P}   (3)

which we can rewrite as

W_{2F×P} = M_{2F×4} S^T_{P×4}   (4)

where M is called the motion matrix and S the structure matrix. From Equation 4 we obtain that, under the affine view, the rank of the trajectory matrix W_{2F×P} of a rigid motion is at most 4. Hence, once the trajectory matrix is obtained, the first step is to reduce its dimensionality to a low-dimensional representation, which is called the global subspace transformation. Each projected trajectory in the global subspace then lives in a local subspace, and the task of multi-body motion segmentation is to separate these underlying local subspaces from the global subspace; that is, segmenting the different motions amounts to segmenting different subspaces.

IV. PROPOSED FRAMEWORK

Our proposed framework extends LSA [10] with sparse optimization for both the global and the local parts. As shown in Figure 2, given a general trajectory matrix, we first transform it into a global subspace with sparse PCA [7], which is robust to noise and outliers. Furthermore, instead of the KNN estimation we use sparse neighbors to automatically find the projected data points that span the same subspace. To correct the overestimation and encourage the projected data from the same subspace to be grouped together, we propose an error function and build the affinity matrix for spectral clustering.

Fig. 2. Overview of the proposed framework: (a) global subspace transformation (high-dimensional input to low-dimensional representation); (b) local subspace estimation by sparse nearest-neighbor search with normalization; (c) error estimation; (d) affinity matrix construction for spectral clustering. Input: video sequences; output: segmented video sequences.

A. Global Subspace Transformation

Since the trajectory matrix of a rigid motion has rank at most 4, one usually chooses the projected dimension m = 4n or m = 5, where n is the number of motions in the video. Assume the trajectory matrix is W_{2F×P}, where F is the number of frames and P the number of extracted trajectories. The traditional way to project W_{2F×P} is Principal Component Analysis (PCA) [12], which can be written as

z* = max_{z^T z ≤ 1} z^T Σ z   (5)

where Σ = W^T W is the covariance matrix of W and the solutions z* are the principal components. Usually, PCA is computed by performing a singular value decomposition (SVD) of W. The solutions z* are fully observed, meaning they are constructed from all input variables. However, if the principal components z* are built from only a few of the original variables but still represent the original data matrix well, it becomes easier to separate the underlying local subspaces in the transformed global subspace. The sparse PCA technique, which seeks a low-dimensional sparse representation of the original high-dimensional data matrix, has been shown to be robust to noise and outliers in terms of dimensionality reduction and feature selection [13], [14]. In contrast to PCA, sparse PCA produces sparse principal components that achieve the dimensionality reduction with a small number of input variables while still capturing the main structure and the significant information of the original data matrix.
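The rank bound of Equation 4 and the plain PCA projection of Equation 5 (computed via SVD) can be checked numerically. The following is only a sketch with random synthetic data and toy sizes, not the paper's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
F, P, m = 10, 30, 5                  # frames, trajectories, projected dimension

# Synthetic rigid motion: W = M S^T as in Equations 3-4.
M = rng.standard_normal((2 * F, 4))                              # motion matrix, 2F x 4
S_T = np.vstack([rng.standard_normal((3, P)), np.ones((1, P))])  # structure, 4 x P
W = M @ S_T                                                      # trajectory matrix, 2F x P

# A rigid motion's trajectory matrix has rank at most 4 (Equation 4).
assert np.linalg.matrix_rank(W) <= 4

# Global projection with plain PCA: keep the m leading right singular
# vectors of W, giving an m x P representation of the P trajectories.
U, sing, Vt = np.linalg.svd(W, full_matrices=False)
W_proj = Vt[:m, :]
```

For several independent motions, each motion contributes such a rank-≤4 block, which is why the projected trajectories fall into separate low-dimensional local subspaces.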
To retain the orthogonality of the projected vectors in the global subspace, we apply the generalized power method for sparse PCA [15] to transform the global subspace. Given the trajectory matrix W = [w_1,...,w_F]^T, where w_f ∈ R^{2×P}, f = 1,...,F, contains all P tracked 2-D feature points of frame f, a direct single-unit form, Equation 6, extracts one sparse principal component z* ∈ R^P [7], [15]:

z*(γ) = max_{y ∈ B^P} max_{z ∈ B^{2F}} (y^T W z)^2 − γ ||z||_0   (6)

where y denotes an initial fixed data point on the unit Euclidean sphere B^P = {y ∈ R^P | y^T y ≤ 1}, and γ > 0 is the sparsity-controlling parameter. If the projection dimension is m, 1 < m < 2F, i.e., more than one sparse principal component needs to be extracted, then, in order to enforce orthogonality of the projected principal vectors, [15] extends Equation 6 to a block form with a trace function Tr(·):

Z*(γ) = max_{Y ∈ S^P_m} max_{Z ∈ [S^{2F}]^m} Tr(Diag(Y^T W Z N)^2) − Σ_{j=1}^m γ_j ||z_j||_0   (7)

where γ = [γ_1,...,γ_m]^T is a positive m-dimensional vector of sparsity-controlling parameters, the parameter matrix N = Diag(μ_1, μ_2,...,μ_m), whose distinct positive diagonal elements enforce the loading vectors Z* to be more orthogonal, and S^P_m = {Y ∈ R^{P×m} | Y^T Y = I_m} represents the Stiefel manifold (the Stiefel manifold V_k(R^n) is the set of all orthonormal k-frames in R^n). Equation 7 then decouples completely in the columns of Z*(γ):

Z*(γ) = max_{Y ∈ S^P_m} Σ_{j=1}^m max_{z_j ∈ S^{2F}} (μ_j y_j^T W z_j)^2 − γ_j ||z_j||_0   (8)

The objective function in Equation 8 is not convex, but the solution Z*(γ) can be obtained by solving the convex problem in Equation 9,

Y*(γ) = max_{Y ∈ S^P_m} Σ_{j=1}^m Σ_{i=1}^F [(μ_j w_i^T y_j)^2 − γ_j]_+   (9)

under the constraint that every γ_j > μ_j^2 max_i ||w_i||_2^2. In [15], a gradient scheme was proposed to solve the convex problem in Equation 9 efficiently. The sparsity pattern I of the solution Z* is then defined by Y* under the following criterion:

I = { active, if (μ_j w_i^T y_j*)^2 > γ_j ; 0, otherwise }   (10)

As a result, the sought sparse loading vectors Z* ∈ S^P_m are obtained by iteratively solving Equation 9. After normalization, the projected global subspace W̃_{m×P} = normalize(Z*)^T is achieved, in which multiple orthogonal underlying local subspaces are embedded.

B. Local Subspace Estimation

In order to cluster the different subspaces according to the different moving bodies, the first step is to find the multiple underlying local subspaces in the global subspace. Generally, the estimation of the different local subspaces can be addressed as the extraction of different data sets, each containing only the projected trajectories from the same subspace. One of the most traditional ways is local sampling [10], which uses KNN. Specifically, the underlying local subspace spanned by each projected data point is found by collecting that point and its K nearest neighbors, determined by distances [10], [16]. However, local sampling cannot ensure that all extracted K nearest neighbors truly span one and the same subspace; this is an overestimation, especially for videos containing many degenerate or dependent motions, or missing data. Moreover, [17] has shown that the selection of the number K, which depends on the rank estimation, is quite sensitive. In this paper, to avoid searching for nearest neighbors only and to solve the overestimation problem, we adopt a sparse nearest-neighbor optimization to automatically find the set of projected data points that span the same local subspace.

The assumption behind sparse nearest neighbors is derived from SMCE [18], which can robustly cluster data points from the same manifold. Given a random data point x_i drawn from a manifold M_l with dimension d_l, under the SMCE assumption we can find a relative set of points N_i = {x_j}, j ≠ i, from M_l, represented with only a small number of non-zero coefficients, that passes through x_i. This assumption can be defined mathematically by Equation 11,

||[x_1 − x_i, ..., x_P − x_i] c_i||_2 ≤ ε,  s.t. 1^T c_i = 1   (11)

where c_i contains only a few non-zero entries, which give the indices of the data points that are the sparse neighbors of x_i on the same manifold, 1^T c_i = 1 is the affine constraint, and P is the number of points on the entire manifold.

We apply this sparse neighbor estimation to find the underlying local subspaces in our transformed global subspace. As shown in Figure 3, with a 6-nearest-neighbor estimation, four triangles are selected to span the same local subspace as the observed data point α_i, because they are nearer to α_i than the other circles. The sparse neighbor estimation, in contrast, looks for only a small number of data points close to α_i; in this way most of the intersection area between the different local subspaces can be eliminated.
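The overestimation near a subspace intersection can be illustrated with a tiny toy configuration (assumed data, not from the paper): two 1-D subspaces, the x-axis and the y-axis, meet at the origin, and the nearest neighbors of a point close to the intersection come from both subspaces:

```python
import numpy as np

# Points on two 1-D subspaces of R^2 (hypothetical coordinates).
line_x = np.array([[0.2, 0.0], [0.5, 0.0], [1.0, 0.0], [2.0, 0.0]])
line_y = np.array([[0.0, 0.3], [0.0, 0.6], [0.0, 1.5], [0.0, 3.0]])
pts = np.vstack([line_x, line_y])      # indices 0-3: x-axis, 4-7: y-axis

query = np.array([0.1, 0.0])           # an x-axis point near the intersection
dists = np.linalg.norm(pts - query, axis=1)
knn = np.argsort(dists)[:3]            # its 3 nearest neighbors

# The KNN neighborhood mixes both subspaces: index 4 lies on the y-axis,
# so a local subspace fitted to these neighbors is overestimated.
assert any(i >= 4 for i in knn)
```

A sparse selection that penalizes distance, as in the weighted optimization below, keeps only a few close points from the correct subspace and suppresses such cross-subspace neighbors.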
In particular, we constraint the searching area of matrixwithsparsesolutionC wecanformulateasparse P×P the sparse neighbors for each projected trajectory from the weightmatrixΩ withvectorω ,whichisbuiltbyω = P×P i ii global subspace with calculating the normalized subspace 0,ω = cij/Xij ,j (cid:54)= i. The achieved weight matrix inclusion (NSI) distances [19] between them. NSI can give Ω ij con(cid:80)tati(cid:54)=niscoitn/lXytai fewnon-zeroentriesincolumn,which P×P us a robust measurement between the orthogonal projected givetheindicesofalltheestimatedsparseneighborsandthe vectors based on their geometrically consistent, which is distances between them. Hence, we can collect each data α i formulated as anditsestimatedsparseneighborsN intoonelocalsubspace i NSI = tr{αiTαjαjTαi} (12) S(cid:98)i according to the non-zero elements of ωi. ij min(dim(α ),dim(α )) i j C. Error Estimation where the input is the projected trajectory matrix W(cid:102)m×P = Although the sparse neighbors optimization can help us [α ,...,α ], and α and α ,i,j = 1,...,P represent two to avoid the intersection between different local subspaces, 1 P i j differentprojecteddata.ThereasonofusingNSIdistancesto it turned into quite sensitive and can’t ensure to carry all constraintthesparseneighborssearchingareaisthegeomet- the information about the underlying local subspaces under ric property of the projected global subspace. Nevertheless the missing data situation. The local subspace estimation the data vectors which are very far away from α definitely after the sparse neighbors searching can be illustrated with i can not span the same local subspace with α . Moreover, Figure 4. In Figure 4 the estimated local subspaces are not i in addition to save computation times, the selection for the completely spanned by each observed data and its corre- searching area with NSI distances is more flexible, which sponding sparse neighborhood. 
Obviously, there are some has a wide range of values, than tuning the fixed parameter neighbors have been estimated to span two different local K for nearest neighbors. subspaces, which can be called the overlapping estimation. Furthermore,alltheNSIdistancesarestackedintoavector Moreover, the obtained local subspaces with some over- X = [NSI ,...,NSI ]T, the assumption from SMCE lapping problems cannot carry the enough dissimilarity or i i1 iP in Equation 11 can be solved with a weighted sparse L similarity information between two local subspaces, which 1 optimization under affine constraint, which is formulated as can be used to build an affinity matrix that can separate the following different subspaces with spectral clustering. min(cid:107)Qici(cid:107)1 For purpose of estimating these overlapping and making (13) s.t (cid:107)X c (cid:107) ≤(cid:15),1Tc =1 a strong connection between the data points from the same i i 2 i localsubspace,weproposethefollowingerrorfunctionwith where Q is a diagonal weight matrix and defined as Q = i i Equation 14 exp(Xi/σ) ∈ (0,1],σ > 0. The effect of the positive- dexefip(n(cid:80)itte(cid:54)=miXaittr)i/xσQi isencouragingtheselectionoftheclosest eit =(cid:107)(I−β(cid:98)iβ(cid:98)i+)αt(cid:107)22,t=1,...,P (14) points for the projected data α with a small weight, which i meansalowerpenalty,butthepointsthatarefarawayto αi where β(cid:98)i ∈Rm×mi is the basis of estimated local subspace will have a larger weight, which favours the zero entries in S(cid:98)i,mi = rank(S(cid:98)i), which can be achieved through the + solution ci. We can use the same strategy as SMCE to solve SVD of S(cid:98)i, and β(cid:98)i is the Moore-Penrose inverse of β(cid:98)i, the the optimization problem in Equation 13 with Alternating I ∈ Rm×m is an identity matrix. Actually the geometrical direction method of multipliers (ADMM) [20]. 
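A minimal sketch of the residual in Equation 14, with toy sizes and synthetic data (the basis is obtained via SVD exactly as described above; all numbers are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
m, P = 5, 12                       # projected dimension, number of trajectories

# A synthetic local subspace S_i: six points spanned by a rank-2 basis.
basis_true = rng.standard_normal((m, 2))
S_i = basis_true @ rng.standard_normal((2, 6))

# Basis beta_i of S_i via SVD, keeping m_i = rank(S_i) leading left vectors.
U, sing, _ = np.linalg.svd(S_i, full_matrices=False)
m_i = int(np.sum(sing > 1e-10))
beta = U[:, :m_i]

# Residual e_it of every projected point alpha_t w.r.t. this subspace:
# project onto the orthogonal complement I - beta beta^+ and square the norm.
alphas = np.hstack([S_i, rng.standard_normal((m, P - 6))])
proj = np.eye(m) - beta @ np.linalg.pinv(beta)
errors = np.linalg.norm(proj @ alphas, axis=0) ** 2
```

Points that truly lie in the subspace get a near-zero error, while generic points incur a clearly positive residual, which is exactly the dissimilarity signal fed into the affinity matrix.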
Fig. 4. Geometrical illustration of incorrect local subspace estimation with sparse neighbors. S1, S2, S3, S4 are four estimated local subspaces spanned by the observed data points α1, α2, α3, α4, respectively.

As a consequence, after computing for each estimated local subspace Ŝ_i its corresponding error vector e_i = [e_i1,...,e_iP], we can build an error matrix e_{P×P} = [e_1,...,e_P], which encodes the strong connections between the projected data points that span the same local subspace.

In the end, we construct our affinity graph G = (V, E) by combining the estimated error matrix e_{P×P} and the sparse weight matrix Ω_{P×P}; the nodes V represent all projected data points and the edges E denote the distances between them. In our affinity graph, the connection between two nodes α_i and α_j is determined by both e_ij and ω_ij. Therefore, the constructed affinity graph contains only a few connected components, which correspond to the data points spanning the same subspace, whereas there is no connection between data points living in different subspaces. More formally, the adjacency matrix of the affinity graph is formulated as

A[i] = |ω_i| + |e_i|,
A = [A[1] 0 ... 0; 0 A[2] ... 0; ... ; 0 0 ... A[P]] Γ   (15)

where Γ ∈ R^{P×P} is an arbitrary permutation matrix. Subsequently, we perform normalized spectral clustering [21] on the symmetric matrix A and obtain the final clusters with different labels, each cluster corresponding to one moving object.

V. EXPERIMENTAL RESULTS

Our proposed framework is evaluated on both the Hopkins 155 dataset [2] and the Freiburg-Berkeley Motion Segmentation dataset [4], in comparison with state-of-the-art subspace clustering and affinity-based motion segmentation algorithms.

Implementation details. Most popular subspace-based motion segmentation methods [5], [10], [6], [3], [4] assume that the number of motions is already known. For the Hopkins 155 dataset, we give the exact number of clusters according to the number of motions, while for the Berkeley dataset we set the number of clusters to 7 for all test sequences. In this work, the constrained search area for the sparse neighbors was first varied over the range [10, 20, 30, 50, 100]; the tuned search area performs equally well from 20 to 50, so we set the number to 20, according to the alternative number of sparse neighbors. In our experiments, we applied both PCA and sparse PCA to evaluate the performance of our framework in estimating the multiple local subspaces from a general global subspace with dimension m = 5. The sparsity-controlling parameter for sparse PCA is set to γ = 0.01, and the distinct parameter vector [μ_1,...,μ_m] is set to [1/1, 1/2,...,1/m].

A. The Hopkins 155 Dataset

The Hopkins 155 dataset [2] contains three different kinds of sequences: checkerboard, traffic, and articulated. For each of them, the tracked feature trajectories are provided in the ground truth and the missing features have been removed, which means the trajectories in the Hopkins 155 dataset are fully observed and there is no missing data. We computed the average and median misclassification errors to compare our method with the state-of-the-art methods SSC [5], LSA [10], ALC [6], and MSMC [3], as shown in Tables I, II, and III. Table IV reports the run times of our method compared with the two sparse-optimization-based methods ALC and SSC.

TABLE I: MEAN AND MEDIAN OF THE MISCLASSIFICATION (%) ON THE HOPKINS 155 DATASET WITH 2 MOTIONS.

Method                          | ALC   | SSC  | MSMC | LSA  | Our_pca | Our_spca
Articulated, 11 seq.: mean      | 10.70 | 0.62 | 2.38 | 4.10 | 2.67    | 0.55
Articulated, 11 seq.: median    | 0.95  | 0.00 | 0.00 | 0.00 | 0.00    | 0.00
Traffic, 31 seq.: mean          | 1.59  | 0.02 | 0.06 | 5.43 | 0.20    | 0.48
Traffic, 31 seq.: median        | 1.17  | 0.00 | 0.00 | 1.48 | 0.00    | 0.00
Checkerboard, 78 seq.: mean     | 1.55  | 1.12 | 3.62 | 2.57 | 1.69    | 0.56
Checkerboard, 78 seq.: median   | 0.29  | 0.00 | 0.00 | 0.27 | 0.00    | 0.00
All 120 seq.: mean              | 2.40  | 0.82 | 2.62 | 3.45 | 1.52    | 0.53
All 120 seq.: median            | 0.43  | 0.00 | 0.00 | 0.59 | 0.00    | 0.00

Obviously, as Tables I and II show, the overall error rate of our method with the sparse PCA projection is the lowest for both 2 and 3 motions. Generally, the PCA projection has a lower accuracy than the sparse PCA projection for the articulated and checkerboard sequences. However, on the traffic videos the PCA projection reaches a better result than the sparse PCA projection, which suggests that PCA is more robust for representing the trajectory matrix of rigid motions, while the sparse PCA projection can better represent the trajectory matrix of independent or non-rigid motions.

TABLE II: MEAN AND MEDIAN OF THE MISCLASSIFICATION (%) ON THE HOPKINS 155 DATASET WITH 3 MOTIONS.

Method                          | ALC   | SSC  | MSMC | LSA   | Our_pca | Our_spca
Articulated, 2 seq.: mean       | 21.08 | 1.91 | 1.42 | 7.25  | 3.72    | 3.19
Articulated, 2 seq.: median     | 21.08 | 1.91 | 1.42 | 7.25  | 3.72    | 3.19
Traffic, 7 seq.: mean           | 7.75  | 0.58 | 0.16 | 25.07 | 0.19    | 0.72
Traffic, 7 seq.: median         | 0.49  | 0.00 | 0.00 | 5.47  | 0.00    | 0.19
Checkerboard, 26 seq.: mean     | 5.20  | 2.97 | 8.30 | 5.80  | 5.01    | 1.22
Checkerboard, 26 seq.: median   | 0.67  | 0.27 | 0.93 | 1.77  | 0.78    | 0.55
All 35 seq.: mean               | 6.69  | 2.45 | 3.29 | 9.73  | 2.97    | 1.94
All 35 seq.: median             | 0.67  | 0.20 | 0.78 | 2.33  | 1.50    | 1.30

TABLE IV: COMPUTATION TIME (S) ON ALL THE HOPKINS 155 DATASET.

Method       | ALC   | SSC   | Our_PCA | Our_SPCA
Run-time [s] | 88831 | 14500 | 1066    | 1394
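The misclassification error reported in Tables I-IV is the fraction of wrongly labeled trajectories, taken over the best matching between predicted and ground-truth cluster labels. The benchmark's official evaluation code [2] may differ in detail; this is only an illustrative sketch with hypothetical labels:

```python
import numpy as np
from itertools import permutations

gt   = np.array([0, 0, 0, 1, 1, 1, 2, 2])     # ground-truth motion labels (toy)
pred = np.array([1, 1, 1, 0, 0, 2, 2, 2])     # hypothetical clustering output

def misclassification(pred, gt, n_clusters=3):
    """Fraction of mislabeled points, minimized over label permutations."""
    best = 1.0
    for perm in permutations(range(n_clusters)):
        mapped = np.array([perm[c] for c in pred])
        best = min(best, float(np.mean(mapped != gt)))
    return best

err = misclassification(pred, gt)   # 1 of 8 trajectories wrong -> 0.125
```

Brute-force permutation matching is fine for the small motion counts here (2 or 3 clusters); for many clusters one would use an assignment solver instead.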
TABLE III: MEAN AND MEDIAN OF THE MISCLASSIFICATION (%) ON ALL THE HOPKINS 155 DATASET.

Method                    | ALC  | SSC  | MSMC | LSA  | Our_pca | Our_spca
All 155 seq.: mean        | 3.56 | 1.24 | 2.96 | 4.94 | 1.98    | 0.70
All 155 seq.: median      | 0.50 | 0.00 | 0.90 | 0.75 | 0.00    |

We also notice that MSMC performs best on the traffic sequences with 3 motions; our method with the PCA projection is only slightly worse than MSMC and inferior to SSC, which is one of the most accurate subspace-based algorithms. However, due to the nature of MSMC, which computes affinities between every pair of trajectories, it is highly time-consuming. The checkerboard data forms the most significant part of the entire Hopkins dataset and contains particularly many feature points and many intersection problems between the different motions. Specifically, the most accurate results on the checkerboard sequences are obtained by our proposed framework with the sparse PCA projection, for both two and three motions, which shows that our method is the most accurate at clustering different intersecting motions. Table III shows that our method achieves the lowest misclassification error over all sequences of the Hopkins dataset in comparison with all other algorithms. Although our method with the sparse PCA or PCA projection loses a bit of precision on the traffic sequences, it saves a lot of computation time compared with SSC and ALC, as shown in Table IV. We also evaluate our method with the sparse PCA projection against LSA [10], SSC [5], MSMC [3], GPCA [8], and RANSAC [9] in Figures 5 and 6 on the Hopkins 155 dataset. Note that MSMC has not been evaluated on the checkerboard sequences.

B. Freiburg-Berkeley Motion Segmentation Dataset

In this subsection, our method is evaluated on the Freiburg-Berkeley Motion Segmentation dataset [4] to test its performance on real video sequences with occlusion and moving-camera problems. This dataset contains 59 sequences, and all feature trajectories are tracked densely. The missing trajectories have not been removed, and there is no pre-processing to correct erroneously tracked trajectories. The evaluation measures are precision (%) and recall (%). Our method is compared with Ochs [4], which is based on the affinities of the trajectories between pairs of frames, with SSC [5], and with ALC [6]. The results on all training and test sequences of the Berkeley dataset are shown in Table V.

TABLE V: RESULTS ON THE ENTIRE FREIBURG-BERKELEY MOTION SEGMENTATION DATASET [4].

Method    | Ochs  | ALC   | SSC   | Our_pca | Our_spca
Precision | 82.36 | 55.78 | 64.55 | 72.12   | 70.77
Recall    | 61.66 | 37.43 | 33.45 | 66.52   | 65.42

In general, as shown in Table V, the PCA projection performs better on this dataset than sparse PCA, which cannot deal with a data matrix containing many zero entries. More specifically, our method with the PCA projection obtains the highest recall of all compared methods, which indicates that our assigned clusters cover most parts of the different ground-truth regions. However, compared with the affinity-based method of Ochs [4], our method lacks precision: it can detect the boundaries of the different regions but cannot completely segment the moving objects from the background. Figure 7 shows examples of our results with the PCA projection. Among all of these examples, our method produces high-quality segmentations of the primary foreground moving objects, in accordance with the high recall value. However, there are some incorrect segmentations as well; for example, the features on an object cannot be distinguished exactly, especially in the last few frames. These incomplete segmentations explain the lower precision value in Table V. Compared with the subspace-based motion segmentation algorithms SSC and ALC, which first need to apply sparse reconstruction to the incomplete trajectories, our method relies only on the error estimation and the sparse neighbor technique, yet shows superior performance in precision and recall.
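The precision and recall values above can be illustrated with a simplified per-trajectory sketch on a binary foreground/background labeling. The dataset's official region-based metric [4] is more involved, so this is only illustrative, with assumed toy labels:

```python
import numpy as np

gt   = np.array([1, 1, 1, 1, 0, 0, 0, 0])   # ground-truth foreground mask (toy)
pred = np.array([1, 1, 1, 0, 1, 0, 0, 0])   # hypothetical segmentation output

tp = np.sum((pred == 1) & (gt == 1))        # correctly labeled foreground
fp = np.sum((pred == 1) & (gt == 0))        # background labeled as foreground
fn = np.sum((pred == 0) & (gt == 1))        # missed foreground

precision = tp / (tp + fp)                  # 3 / 4 = 0.75
recall    = tp / (tp + fn)                  # 3 / 4 = 0.75
```

High recall with lower precision, as in Table V, corresponds to clusters that cover most of the ground-truth regions while also leaking into the background.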
Our method In this subsection, our method has been evaluated on the can not exactly extract the moving objects from the back- Freiburg-Berkeley Motion Segmentation dataset [4] to test ground for the video that has the really long observed the performance on the real video sequences with occlusion frames. Moreover our method can not segment the video and moving camera problems. This dataset contains 59 accurately when the camera is also moving, due to the (a) (b) (c) (d) (e) (f) Fig.5. ComparisonofOurapproachwithgroundtruthandtheotherapproachesonthe1RT2RCvideo:5(a):GroudTruth;5(b):GPCA,error:44.98%; 5(c):LSA,error:1.94%;5(d):RANSAC,error:33.66%;5(e):SSC,0%;5(f):Our,0%onthe1RT2TCsequencefromtheHopkins155dataset. moving foreground usually has the short feature trajectories subspaceforeachdatapointthatspanasamelocalsubspace. that are very difficult to handle. Moreover, we propose an error estimation to refine the local subspace estimation for the missing data. The advantage VI. CONCLUSIONS of the proposed method is that we can apply two sparse In this paper, we have proposed a subspace-based frame- optimizations and a simple error estimation to handle the work for segmenting multiple moving objects from a video incorrect local subspace estimation under the missing trajec- sequence with integrating global and local sparse subspace tories. The limitation of our work is the number of motions optimization methods. The sparse PCA performs a data should be known firstly and only a constrained number of projection from a high-dimensional subspace to a global missing data can be handled accurately. The experiments subspace with sparse orthogonal principal vectors. 
To avoid on the Hopkins and Berkeley dataset show our method improperly choosing K-nearest neighbors and defend in- are comparable with state-of-the-art methods in terms of tersection between different local subspaces, we seek a accuracy,andsometimesexceedsthemonbothprecisionand sparse representation for the nearest neighbors in the global computation time. (a) (b) (c) (d) (e) (f) Fig.6. ComparisonofOurapproachwithgroundtruthandtheotherapproachesonthe1RT2RCvideo:6(a):GroudTruth;6(b):GPCA,error:19.34%; 6(c):LSA,error:46.23%;6(d)MSMC,error:46.23%;6(e)SSC,0%;6(f):Our,0%. ACKNOWLEDGEMENTS [4] P.Ochs,J.Malik,andT.Brox,“Segmentationofmovingobjectsby longtermvideoanalysis,”PAMI,vol.36,no.6,pp.1187–1200,2014. The work is funded by DFG (German Research Founda- [5] E. Elhamifar and R. Vidal, “Sparse subspace clustering,” in CVPR, tion) YA 351/2-1 and the ERC-Starting Grant (DYNAMIC 2009,pp.2790–2797. MINVIP). The authors gratefully acknowledge the support. [6] Y. Ma, H. Derksen, W. Hong, and J. Wright, “Segmentation of multivariatemixeddatavialossydatacodingandcompression,”PAMI, REFERENCES vol.29,no.9,pp.1546–1562,2007. [7] H. Zou, T. Hastie, and R. Tibshirani, “Sparse principal component [1] M.Y.YangandB.Rosenhahn,“Videosegmentationwithjointobject analysis,” Journal of computational and graphical statistics, vol. 15, and trajectory labeling,” in IEEE Winter Conference on Applications no.2,pp.265–286,2006. ofComputerVision,2014,pp.831–838. [8] R. Vidal, Y. Ma, and S. Sastry, “Generalized principal component [2] R.TronandR.Vidal,“Abenchmarkforthecomparisonof3-dmotion analysis(gpca),”PAMI,vol.27,no.12,pp.1945–1959,2005. segmentationalgorithms,”inCVPR,2007,pp.1–8. [9] M. A. Fischler and R. C. 
Bolles, “Random sample consensus: a [3] R.Dragon,B.Rosenhahn,andJ.Ostermann,“Multi-scaleclusteringof paradigm for model fitting with applications to image analysis and frame-to-framecorrespondencesformotionsegmentation,”inECCV, automatedcartography,”CommunicationsoftheACM,vol.24,no.6, 2012,pp.445–458. pp.381–395,1981. (a) (b) (c) Fig. 7. Our segmentation results on Freiburg-Berkeley Motion Segmentation Dataset in comparison with the groundtruth segmentations from [4]. 7(a):bear01,7(b):marple4,7(c):cars8.
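As an illustration of the pipeline summarized in the conclusions (global projection by sparse PCA, a sparse representation over the remaining points to identify neighbors on the same local subspace, and spectral clustering of the resulting affinity matrix), the following is a minimal sketch. It is our own simplification, not the authors' implementation: it uses scikit-learn's SparsePCA and SpectralClustering, replaces the paper's sparse-neighbor optimization with a plain lasso self-representation, and omits the error-estimation step for missing data; all parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.decomposition import SparsePCA
from sklearn.linear_model import Lasso


def motion_segmentation(X, n_motions, n_components=4, lam=0.01):
    """Cluster the N trajectories (columns of the 2F x N matrix X) into
    n_motions groups.  Simplified sketch of the global/local pipeline."""
    # Global step: project each trajectory onto sparse principal vectors.
    spca = SparsePCA(n_components=n_components, random_state=0)
    Y = spca.fit_transform(X.T)                    # (N, n_components)

    # Local step: express every projected point as a sparse combination
    # of the other points; the nonzero coefficients play the role of the
    # sparse neighbors lying on the same local subspace.
    N = Y.shape[0]
    C = np.zeros((N, N))
    for i in range(N):
        others = np.delete(np.arange(N), i)
        lasso = Lasso(alpha=lam, max_iter=5000)
        lasso.fit(Y[others].T, Y[i])               # y_i ~ sum_j c_j y_j
        C[i, others] = lasso.coef_

    # Symmetric affinity from the sparse coefficients, then spectral
    # clustering for the final segmentation.
    W = np.abs(C) + np.abs(C).T
    return SpectralClustering(n_clusters=n_motions,
                              affinity="precomputed",
                              random_state=0).fit_predict(W)
```

On synthetic trajectories drawn from two distinct low-dimensional subspaces, the two groups of columns should end up in different clusters; on real data, the omitted error-estimation step would additionally handle points with missing or corrupted entries.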

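The precision and recall values reported for the Freiburg-Berkeley dataset measure how well the predicted clusters cover the ground-truth regions. As a simplified per-pixel illustration of the two measures (the benchmark itself first matches predicted regions to ground-truth regions via an optimal assignment, which this toy function omits), assuming binary foreground masks:

```python
import numpy as np


def precision_recall(pred_mask, gt_mask):
    """Per-pixel precision and recall of a predicted foreground mask
    against a ground-truth mask (both boolean arrays of equal shape)."""
    tp = np.logical_and(pred_mask, gt_mask).sum()   # foreground found
    fp = np.logical_and(pred_mask, ~gt_mask).sum()  # background labeled fg
    fn = np.logical_and(~pred_mask, gt_mask).sum()  # foreground missed
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall
```

A high recall with lower precision, as in our PCA-projection results, corresponds to masks that cover most of the ground-truth foreground but also leak into the background.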