A Closed-Form Solution to Tensor Voting: Theory and Applications

Tai-Pang Wu, Sai-Kit Yeung, Jiaya Jia, Chi-Keung Tang, Gérard Medioni

Abstract—We prove a closed-form solution to tensor voting (CFTV): given a point set in any dimensions, our closed-form solution provides an exact, continuous and efficient algorithm for computing a structure-aware tensor that simultaneously achieves salient structure detection and outlier attenuation. Using CFTV, we prove the convergence of tensor voting on a Markov random field (MRF), thus termed MRFTV, where the structure-aware tensor at each input site reaches a stationary state upon convergence in structure propagation. We then embed the structure-aware tensor into expectation maximization (EM) for optimizing a single linear structure to achieve efficient and robust parameter estimation. Specifically, our EMTV algorithm optimizes both the tensor and fitting parameters and does not require the random sampling consensus typically used in existing robust statistical techniques. We performed quantitative evaluation of its accuracy and robustness, showing that EMTV performs better than the original TV and other state-of-the-art techniques in fundamental matrix estimation for multiview stereo matching. The extensions of CFTV and EMTV for extracting multiple and nonlinear structures are underway. An addendum is included in this arXiv version.

Index Terms—Tensor voting, closed-form solution, structure inference, parameter estimation, multiview stereo.

• T.-P. Wu is with the Enterprise and Consumer Electronics Group, Hong Kong Applied Science and Technology Research Institute (ASTRI), Hong Kong. E-mail: [email protected].
• S.-K. Yeung and C.-K. Tang are with the Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong. E-mail: {saikit,cktang}@cse.ust.hk.
• J. Jia is with the Department of Computer Science and Engineering, Chinese University of Hong Kong, Shatin, Hong Kong. E-mail: [email protected].
• G. Medioni is with the Department of Computer Science, University of Southern California, USA. E-mail: [email protected].
1 INTRODUCTION

THIS paper reinvents tensor voting [19] for robust computer vision, by proving a closed-form solution to computing an exact structure-aware tensor after data communication in a feature space of any dimensions, where the goal is salient structure inference from noisy and corrupted data.

To infer structures from noisy data corrupted by outliers, input points in tensor voting communicate among themselves subject to proximity and continuity constraints. Consequently, each point is aware of its structure saliency via a structure-aware tensor. Structure refers to surfaces, curves, or junctions if the feature space is three dimensional, where a structure-aware tensor can be visualized as an ellipsoid: if a point belongs to a smooth surface, the resulting ellipsoid after data communication resembles a stick pointing along the surface normal; if a point lies on a curve, the tensor resembles a plate, where the curve tangent is perpendicular to the plate tensor; if it is a point junction where surfaces intersect, the tensor will be like a ball. An outlier is characterized by the set of inconsistent votes it receives after data communication.

We develop in this paper a closed-form solution to tensor voting (CFTV), which is applicable to the special as well as the general theory of tensor voting. This paper focuses on the special theory, where the above data communication is data driven without using constraints other than proximity and continuity. The special theory, sometimes coined the "first voting pass," is applied to process raw input data to detect structures and outliers. In addition to structure detection and outlier attenuation, in the general theory of tensor voting, tensor votes are propagated along preferred directions to achieve data communication when such directions are available, typically after the first pass, such that useful tensor votes are reinforced whereas irrelevant ones are suppressed.

Expressing tensor voting in a single and compact equation, or a closed-form solution, offers many advantages: not only can an exact and efficient solution be achieved with less implementation effort for salient structure detection and outlier attenuation, but formal and useful mathematical operations such as differential calculus can be applied, which is otherwise impossible with the original tensor voting procedure. Notably, we can prove the convergence of tensor voting on Markov random fields (MRFTV), where the structure-aware tensor at each input site achieves a stationary state upon convergence.

Using CFTV, we contribute a mathematical derivation based on expectation maximization (EM) that applies the exact tensor solution for extracting the most salient linear structure, even when the input data is highly corrupted. Our algorithm is called EMTV; it optimizes both the tensor and the fitting parameters upon convergence and does not require the random sampling consensus typical of existing robust statistical techniques. The extension to extract salient multiple and nonlinear structures is underway.

While the mathematical derivation may seem involved, our main results for CFTV, MRFTV and EMTV, that is, Eqns (11), (12), (18), (19), (26), and (30),
have rigorous mathematical foundations, are applicable to any dimensions, produce more robust and accurate results as demonstrated in our qualitative and quantitative evaluation using challenging synthetic and real data, but on the other hand are easier to implement. The source codes accompanying this paper are available in the supplemental material.

Fig. 1. Inlier/outlier and tensor inverse illustration. (a) The normal votes received at a surface point cast by points in x's neighborhood. Three salient outliers are present. (b) For a non-surface point, there is no preference to any normals. (c) The structure-aware tensor induced by the normal observations in (a), which is represented by a d-D ellipsoid, where d ≥ 2. The orange (dashed) curve represents the variance produced along all possible directions. (d) The structure-aware tensor obtained after collecting the received votes in (b). (e) and (f) correspond to the inverse of (c) and (d), respectively.

2 RELATED WORK

While this paper is mainly concerned with tensor voting, we provide a concise review on robust estimation and expectation maximization.

Robust estimators. Robust techniques are widely used and an excellent review of the theoretical foundations of robust methods in the context of computer vision can be found in [20].

The Hough transform [13] is a robust voting-based technique operating in a parameter space capable of extracting multiple models from noisy data. Statistical Hough transform [6] can be used for high dimensional spaces with sparse observations. Mean shift [5] has been widely used since its introduction to computer vision for robust feature space analysis. The adaptive mean shift [11] with variable bandwidth in high dimensions was introduced in texture classification and has since been applied to other vision tasks. Another popular robust method in computer vision is in the class of random sampling consensus (RANSAC) procedures [7] which have spawned a lot of follow-up work (e.g., optimal randomized RANSAC [4]).

Like RANSAC [7], robust estimators including the LMedS [22] and the M-estimator [14] adopted a statistical approach. The LMedS, RANSAC and the Hough transform can be expressed as M-estimators with auxiliary scale [20]. The choice of scales and parameters related to the noise level are major issues. Existing works on robust scale estimation use random sampling [26], or operate on different assumptions (e.g., more than 50% of the data should be inliers [23]; inliers have a Gaussian distribution [15]). Among them, the Adaptive Scale Sample Consensus (ASSC) estimator [28] has shown the best performance, where the estimation process requires no free parameter as input. Rather than using a Gaussian distribution to model inliers, the authors of [28] proposed to use a two-step scale estimator (TSSE) to refine the model scale: first, a non-Gaussian distribution is used to model inliers where local peaks of density are found by mean shift [5]; second, the scale parameter is estimated by a median scale estimator with the estimated peaks and valleys. On the other hand, the projection-based M-estimator (pbM) [3], an improvement made on the M-estimator, uses a Parzen window for scale estimation, so the scale parameter is automatically found by searching for the normal direction (projection direction) that maximizes the sharpest peak of the density. This does not require an input scale from the user. While these recent methods can tolerate more outliers, most of them still rely on or are based on RANSAC, and a number of random sampling trials is required to achieve the desired robustness.

To reject outliers, a multi-pass method using L∞-norms was proposed to successively detect outliers which are characterized by maximum errors [24].

Expectation Maximization. EM has been used in handling missing data and identifying outliers in robust computer vision [8], and its convergence properties were studied [18]. In essence, EM consists of two steps [2], [18]:

1) E-Step. Computing an expected value for the complete data set using incomplete data and the current estimates of the parameters.
2) M-Step. Maximizing the complete data log-likelihood using the expected value computed in the E-step.

EM is a powerful inference algorithm, but it is also well-known from [8] that: 1) initialization is an issue because EM can get stuck in poor local minima, and 2) treatment of data points with small expected weights requires great care. They should not be regarded as negligible, as their aggregate effect can be quite significant.
In this paper we initialize EMTV using structure-aware tensors obtained by CFTV. As we will demonstrate, such initialization not only allows the EMTV algorithm to converge quickly (typically within 20 iterations) but also produces accurate and robust solutions in parameter estimation and outlier rejection.

Fig. 2. (a) The normal vote v_i received at x_i using an arc of the osculating circle between x_i and x_j, assuming the normal voter at x_j is n_j, where n_j, r_ij, and v_i are unit vectors in this illustration. (b) Plot of Eqn. (1) in 2D.

Fig. 3. Illustration of Eqn. (5). Normal vote v_i = v_θ(x_i, x_j) received at x_i using the arc of the osculating circle between x_i and x_j, considering one of the normal voters at x_j is n_θj. Here, n_θj and r are unit vectors.

3 DATA COMMUNICATION

In the tensor voting framework, a data point, or voter, communicates with another data point, or vote receiver, subject to proximity and continuity constraints, resulting in a tensor vote cast from the voter to the vote receiver (Fig. 1). In the following, we use n to denote a unit voting stick tensor and v to denote a received stick tensor vote. Stick tensor votes may not be unit vectors when they are multiplied by the vote strength. These stick tensors are the building elements of a structure-aware tensor vote.

Here, we first define a decay function η to encode the proximity and smoothness constraints (Eqns 1 and 2). While similar in effect to the decay function used in the original tensor voting, and also to the one used in [9] where a vote attenuation function is defined to decouple proximity and curvature terms, our modified function, which also differs from that in [21], enables a closed-form solution for tensor voting without resorting to precomputed discrete voting fields.

Refer to Fig. 2. Consider two points x_i ∈ R^d and x_j ∈ R^d (where d > 1 is the dimension) that are connected by some smooth structure in the feature/solution space. Suppose that the unit normal n_j at x_j is known. We want to generate at x_i a normal (vote) v_i so that we can calculate K_i ∈ R^d × R^d, where K_i is the structure-aware tensor at x_i in the presence of n_j at x_j. In tensor voting, a structure-aware tensor is a second-order symmetric tensor which can be visualized as an ellipsoid.

While many possibilities exist, the unit direction v_i can be derived by fitting an arc of the osculating circle between the two points. Such an arc keeps the curvature constant along the hypothesized connection, thus encoding the smoothness constraint. K_i is then given by v_i v_i^T multiplied by η(x_i, x_j, n_j), defined as

  η(x_i, x_j, n_j) = c_ij (1 − (r_ij^T n_j)^2)   (1)

where

  c_ij = exp( −||x_i − x_j||^2 / σ_d )   (2)

is an exponential function using Euclidean distance for attenuating the strength based on proximity. σ_d is the size of the local neighborhood (or the scale parameter, the only free parameter in tensor voting).

In Eqn. (1), r_ij ∈ R^d is a unit vector at x_j pointing to x_i, and 1 − (r_ij^T n_j)^2 is a squared-sine function [Footnote 1] for attenuating the contribution according to curvature. Similar to the original tensor voting framework, Eqn. (1) favors nearby neighbors that produce small-curvature connections, thus encoding the smoothness constraint. A plot of the 2D version of Eqn. (1) is shown in Fig. 2(b), where x_j is located at the center of the image and n_j is aligned with the blue line. The higher the intensity, the higher the value Eqn. (1) produces at a given pixel location.

[Footnote 1: sin^2 ρ = 1 − cos^2 ρ, where cos^2 ρ = (r_ij^T n_j)^2 and ρ is the angle between r_ij and n_j.]
Next, consider the general case where the normal n_j at x_j is unavailable. Here, let K_j at x_j be any second-order symmetric tensor, which is typically initialized as an identity matrix if no normal information is available. To compute K_i given K_j, we consider equivalently the set of all possible unit normals {n_θj} associated with the corresponding lengths {τ_θj} which make up K_j at x_j, where {n_θj} and {τ_θj} are respectively indexed by all possible directions θ. Each τ_θj n_θj postulates a normal vote v_θ(x_i, x_j) at x_i under the same smoothness constraint prescribed by the corresponding arc of the osculating circle, as illustrated in Fig. 3.

Let S_ij be the second-order symmetric tensor vote obtained at x_i due to this complete set of normals at x_j defined above. We have

  S_ij = ∫_{N_θj ∈ ν} v_θ(x_i, x_j) v_θ(x_i, x_j)^T η(x_i, x_j, n_θj) dN_θj   (3)

where

  N_θj = n_θj n_θj^T   (4)

and ν is the space containing all possible N_θj. For example, if ν is 2D, the complete set of unit normals n_θ describes a unit circle. If ν is 3D, the complete set of unit normals n_θ describes a unit sphere [Footnote 2].

[Footnote 2: The domain of integration ν represents the space of stick tensors given by n_θj. Note that d > 1; alternatively, it can be understood by expressing N_θj using polar coordinates, so that in N dimensions θ = (φ_1, φ_2, ..., φ_{n−1}). It follows naturally that we do not use θ to define the integration domain, because rather than simply writing ∫_{N_θj ∈ ν} ... dN_θj, it would have been ∫_{φ_1} ∫_{φ_2} ... ∫_{φ_{n−1}} ... dφ_{n−1} dφ_{n−2} ... dφ_1, making the derivation of the proof of Theorem 1 more complicated.]

In a typical tensor voting implementation, Eqn. (3) is precomputed as discrete voting fields (e.g., plate and ball voting fields in 3D tensor voting [19]): the integration is implemented by rotating and summing the contributions using matrix addition. Although precomputed once, such discrete approximations involve uniform and dense sampling of tensor votes n_θ n_θ^T in higher dimensions, where the number of dimensions depends on the problem. In the following section, we will prove a closed-form solution to Eqn. (3), which provides an efficient and exact solution to computing K_i without resorting to discrete and dense sampling.
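For concreteness, the single-normal vote of this section can be written down directly from Eqns (1) and (2) together with the osculating-arc construction (made explicit later in Eqn (5)). The following is a minimal NumPy sketch under our own function and variable names — the paper's reference implementation is in C++ — assuming a unit vote length τ = 1:

```python
import numpy as np

def stick_vote(x_i, x_j, n_j, sigma_d):
    """Second-order vote cast by a voter x_j with known unit normal n_j
    and received at x_i, i.e. eta(x_i, x_j, n_j) * v_i v_i^T."""
    diff = x_i - x_j
    c_ij = np.exp(-np.dot(diff, diff) / sigma_d)   # proximity decay, Eqn (2)
    r = diff / np.linalg.norm(diff)                # unit vector from x_j to x_i
    eta = c_ij * (1.0 - np.dot(r, n_j) ** 2)       # decay function, Eqn (1)
    v = n_j - 2.0 * r * np.dot(r, n_j)             # osculating-arc normal (cf. Eqn (5))
    return eta * np.outer(v, v)                    # stick tensor vote
```

Summing such votes over a neighborhood gives K_i when every voter carries a known normal; the closed-form solution of the next section removes that requirement.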
4 CLOSED-FORM SOLUTION

Theorem 1: (Closed-Form Solution to Tensor Voting) The tensor vote at x_i induced by K_j located at x_j is given by the following closed-form solution:

  S_ij = c_ij R_ij K_j R'_ij

where r_ij is a unit vector pointing from x_j to x_i, R_ij = I − 2 r_ij r_ij^T, R'_ij = (I − (1/2) r_ij r_ij^T) R_ij, I is an identity matrix, c_ij = exp( −||x_i − x_j||^2 / σ_d ), and K_j is the second-order symmetric tensor at x_j.

Proof: For simplicity of notation, set r = r_ij, n_θ = n_θj and N_θ = N_θj. Now, using the above-mentioned osculating arc connection, v_θ(x_i, x_j) can be expressed as

  v_θ(x_i, x_j) = (n_θ − 2 r (r^T n_θ)) τ_θ   (5)

Recall that n_θ is the unit normal at x_j with direction θ, and that τ_θ is the length associated with the normal. This vector subtraction is shown in Fig. 3, where the roles of v_θ, n_θ, r, and τ_θ are illustrated.

Letting

  R = (I − 2 r r^T),   (6)

where I is an identity matrix, we can rewrite Eqn. (3) into the following form:

  S_ij = c_ij ∫_{N_θ ∈ ν} τ_θ^2 R n_θ n_θ^T R^T (1 − (n_θ^T r)^2) dN_θ.   (7)

Following the derivation:

  S_ij = c_ij ∫_{N_θ ∈ ν} τ_θ^2 R n_θ (1 − (n_θ^T r)^2) n_θ^T R^T dN_θ
       = c_ij R ( ∫_{N_θ ∈ ν} τ_θ^2 n_θ (1 − n_θ^T r r^T n_θ) n_θ^T dN_θ ) R^T
       = c_ij R ( ∫_{N_θ ∈ ν} τ_θ^2 N_θ − τ_θ^2 N_θ r r^T N_θ dN_θ ) R^T
       = c_ij R ( K_j − ∫_{N_θ ∈ ν} τ_θ^2 N_θ r r^T N_θ dN_θ ) R^T   (8)

The integration can be solved by integration by parts. Let f(θ) = τ_θ^2 N_θ, f'(θ) = τ_θ^2 I, g(θ) = (1/2) r r^T N_θ^2 and g'(θ) = r r^T N_θ, and note that N_θ^q = N_θ for all q ∈ Z+ [Footnote 3], and that K_j, in its most general form, can be expressed as a generic tensor ∫_{N_θ ∈ ν} τ_θ^2 N_θ dN_θ. Then we have

  ∫_{N_θ ∈ ν} τ_θ^2 N_θ r r^T N_θ dN_θ
    = [ f(θ) g(θ) ]_{N_θ ∈ ν} − ∫_{N_θ ∈ ν} f'(θ) g(θ) dN_θ
    = [ (1/2) τ_θ^2 N_θ r r^T N_θ^2 ]_{N_θ ∈ ν} − (1/2) ∫_{N_θ ∈ ν} τ_θ^2 r r^T N_θ dN_θ
    = (1/2) ∫_{N_θ ∈ ν} ( d/dN_θ [τ_θ^2 N_θ] r r^T N_θ^2 + τ_θ^2 N_θ d/dN_θ [r r^T N_θ^2] ) dN_θ − (1/2) r r^T K_j.

The explanation for the last equality above is given in Footnote 4.

[Footnote 3: The derivation is as follows: N_θ^q = n_θ n_θ^T n_θ n_θ^T ... n_θ n_θ^T = n_θ · 1 · 1 ... 1 · n_θ^T = n_θ n_θ^T = N_θ.]

[Footnote 4: Here, we rewrite the first term by the product rule for derivatives and the second term by the fundamental theorem of calculus, and then express part of the second term by a generic tensor. We obtain: (1/2) ∫_{N_θ ∈ ν} d/dN_θ [τ_θ^2 N_θ r r^T N_θ^2] dN_θ − (1/2) r r^T K_j.]

Finally, we apply the fact that N_θ^q = N_θ (for all q ∈ Z+) to convert d/dN_θ [r r^T N_θ^2] into d/dN_θ [r r^T N_θ]. We obtain

  (1/2) ∫_{N_θ ∈ ν} ( d/dN_θ [τ_θ^2 N_θ] r r^T N_θ^2 + τ_θ^2 N_θ d/dN_θ [r r^T N_θ] ) dN_θ − (1/2) r r^T K_j
    = (1/2) ( r r^T K_j + K_j r r^T − r r^T K_j )
    = (1/2) K_j r r^T.   (9)

By substituting Eqn. (9) back into Eqn. (8), we obtain the result as follows:

  S_ij = c_ij R K_j ( I − (1/2) r r^T ) R^T.   (10)

Replacing r by r_ij, so that R_ij = I − 2 r_ij r_ij^T, and letting R'_ij = (I − (1/2) r_ij r_ij^T) R_ij, we obtain

  S_ij = c_ij R_ij K_j R'_ij.   (11)

A structure-aware tensor K_i = Σ_j S_ij can thus be assigned at each site x_i. This tensor sum considers both the geometric proximity and smoothness constraints in the presence of neighbors x_j under the chosen scale of analysis. Note also that Eqn. (11) is an exact equivalent of Eqn. (3), or (7), that is, the first principle. Since the first principle produces a positive semi-definite matrix, Eqn. (11) still produces a positive semi-definite matrix.

In tensor voting, eigen-decomposition is applied to a structure-aware tensor. In three dimensions, the eigensystem has eigenvalues λ_1 ≥ λ_2 ≥ λ_3 ≥ 0 with corresponding eigenvectors ê_1, ê_2, and ê_3. λ_1 − λ_2 denotes surface saliency with normal direction indicated by ê_1; λ_2 − λ_3 denotes curve saliency with tangent direction indicated by ê_3; junction saliency is indicated by λ_3.
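As an illustration of how directly Eqn (11) translates into code, the NumPy sketch below (our own helper names, not the authors' release) computes one vote S_ij, accumulates K_i = Σ_j S_ij over a neighbor list, and reads off the 3-D saliency values described above:

```python
import numpy as np

def tensor_vote(x_i, x_j, K_j, sigma_d):
    """Closed-form tensor vote S_ij = c_ij R_ij K_j R'_ij of Eqn (11)."""
    diff = x_i - x_j
    c_ij = np.exp(-np.dot(diff, diff) / sigma_d)        # Eqn (2)
    r = (diff / np.linalg.norm(diff)).reshape(-1, 1)    # unit column vector r_ij
    I = np.eye(len(diff))
    R = I - 2.0 * (r @ r.T)                             # R_ij
    Rp = (I - 0.5 * (r @ r.T)) @ R                      # R'_ij
    return c_ij * R @ K_j @ Rp

def structure_aware_tensor(i, points, tensors, neighbors, sigma_d):
    """K_i = sum of S_ij over the neighbors of x_i."""
    K_i = np.zeros_like(tensors[i])
    for j in neighbors:
        if j != i:
            K_i += tensor_vote(points[i], points[j], tensors[j], sigma_d)
    return K_i

def saliencies_3d(K):
    """Surface, curve and junction saliencies from the eigensystem (3-D case)."""
    lam, E = np.linalg.eigh(K)            # ascending eigenvalues
    lam, E = lam[::-1], E[:, ::-1]        # reorder so that lam1 >= lam2 >= lam3
    return {"surface": lam[0] - lam[1], "normal": E[:, 0],
            "curve":   lam[1] - lam[2], "tangent": E[:, 2],
            "junction": lam[2]}
```

Setting every tensors[j] to the identity reproduces the ball voting field discussed in Section 4.1 below when x_i is swept over a grid around x_j.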
While it may be difficult to observe any geometric intuition directly from this closed-form solution, the geometric meaning of the closed-form solution has been described by Eqn. (3) (or (7), the first principle), since Eqn. (11) is equivalent to Eqn. (3). Note that our solution is different from, for instance, [21], where the N-D formulation is approached from a more geometric point of view.

As will be shown in the next section on EMTV, the inverse of K_j is used. A perfect stick tensor, which can be equivalently represented as a rank-1 matrix, does not have an inverse. Similar in spirit to how a Gaussian function can be interpreted as an impulse function associated with a spread representing uncertainty, a similar statistical approach is adopted here in characterizing our tensor inverse. Specifically, the uncertainty is incorporated using a ball tensor: εI is added to K_j, where ε is a small positive constant (0.001) and I an identity matrix. Fig. 1 shows a tensor and its inverse for some selected cases. The following corollary regarding the inverse of S_ij is useful:

Corollary 1: Let R''_ij = R_ij (I + r_ij r_ij^T) and also note that R_ij^{−1} = R_ij; the corresponding inverse of S_ij is:

  S'_ij = c_ij^{−1} R''_ij K_j^{−1} R_ij   (12)

Proof: This corollary can simply be proved by applying the inverse to Eqn (11).

Note that the initial K_j can either be derived when an input direction is available, or simply assigned as an identity matrix otherwise.

Fig. 4. (a) 3D ball voting field. A slice generated using the closed-form solution Eqn (11), which has similar tensor orientations (but different tensor strengths) as the ball voting field in [19]. (b) 3D plate voting field. Left: a cut of the voting field (direction of ê_3 normal to the page). Right: a cut of the same voting field, showing the (λ_2 − λ_3) ê_3 component (i.e., the component parallel to the tangent direction). The field is generated using Eqn (11), showing similar tensor orientations as the plate voting field in [19]. (c) 3D stick voting field. A slice after zeroing out votes lying in the 45-degree zone as done in [19]. The stick tensor orientations shown in the figure are identical to those in the 3D stick voting field in [19]. (d) Vote computation using the closed-form solution in one single step by Eqn (11).

Fig. 5. Close-ups of ball voting fields generated using the original tensor voting framework (left) and CFTV (right).

4.1 Examples

Using Eqn (11), given any input K_j at site j which is a second-order symmetric tensor, the output tensor S_ij can be computed directly. Note that R_ij is a d × d matrix of the same dimensionality d as K_j. To verify our closed-form solution, we perform the following to compare with the voting fields used by the original tensor voting framework (Fig. 4):

(a) Set K to be an identity (ball tensor) in Eqn (11) and compute all votes S in a neighborhood. This procedure generates the ball voting field, Fig. 4(a).
(b) Set K to be a plate tensor diag(1, 1, 0) in Eqn (11) and compute all votes S in a neighborhood. This procedure generates the plate voting field, Fig. 4(b).
(c) Set K to be a stick tensor diag(1, 0, 0) in Eqn (11) and compute all votes S in a neighborhood. This procedure generates the stick voting field, Fig. 4(c).
(d) Set K to be any generic second-order tensor in Eqn (11) to compute a tensor vote S at a given site. We do not need a voting field, or the somewhat complex procedure described in [21]. In one single step using the closed-form solution Eqn (11), we obtain S as shown in Fig. 4(d).

Note that the stick voting field generation is the same as the closed-form solution given by the arc of an osculating circle. On the other hand, since the closed-form solution does not remove votes lying beyond the 45-degree zone as done in the original framework, it is useful to compare the ball voting field generated using CFTV and the original framework. Fig. 5 shows close-ups of the ball voting fields generated using the original framework and CFTV. As anticipated, the tensor orientations are almost the same (with a maximum angular deviation of 4.531°), while the tensor strengths differ due to the use of different decay functions. The new computation results in perfect vote orientations, which are radial; the angular discrepancies are due to the discrete approximations in the original solution.

While the above illustrates the usage of Eqn (11) in three dimensions, the equation applies to any dimension d. All of the S's returned by Eqn (11) are second-order symmetric tensors and can be decomposed using eigen-decomposition. The implementation of Eqn (11) is a matter of a few lines of C++ code.

Our "voting without voting fields" method is uniform for any input tensor K_j that is a second-order symmetric tensor, in the closed form expressed by Eqn (11); formal mathematical operations can be applied to this compact equation, which is otherwise difficult with the algorithmic procedure described in previous tensor voting papers. Notably, using the closed-form solution, we are now able to prove mathematically the convergence of tensor voting in the next section.
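The ε-regularized inverse vote of Corollary 1 is equally short to implement. The sketch below (again our own naming; ε = 0.001 as in the text) can also be used to reproduce Fig. 4-style slices by sweeping x_i over a grid while holding x_j and K_j fixed:

```python
import numpy as np

def tensor_vote_inverse(x_i, x_j, K_j, sigma_d, eps=1e-3):
    """Inverse vote S'_ij of Eqn (12); K_j is regularised with eps * I so that
    rank-deficient (e.g. perfect stick) tensors remain invertible."""
    diff = x_i - x_j
    c_ij = np.exp(-np.dot(diff, diff) / sigma_d)
    r = (diff / np.linalg.norm(diff)).reshape(-1, 1)
    I = np.eye(len(diff))
    R = I - 2.0 * (r @ r.T)                  # R_ij, which is its own inverse
    Rpp = R @ (I + r @ r.T)                  # R''_ij = R_ij (I + r_ij r_ij^T)
    K_inv = np.linalg.inv(K_j + eps * I)     # regularised inverse of K_j
    return (1.0 / c_ij) * Rpp @ K_inv @ R
```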
4.2 Time Complexity

Akin to the original tensor voting formalism, each site (input or non-input) communicates with every other site on a Markov random field (MRF) in the broad sense, where the number of edges depends on the scale of analysis, parameterized by σ_d in Eqn (2). In our implementation, we use an efficient data structure such as an ANN tree [1] to access a constant number of neighbors x_j of each x_i. It should be noted that under a large scale of analysis, where the number of neighbors is sufficiently large, a similar number of neighbors is accessed in our implementation and in the original tensor voting implementation.

The speed of accessing nearest neighbors can be greatly increased (polylogarithmic) by using ANN, thus making the computation of a structure-aware tensor efficient. Note that the running time for this implementation of the closed-form solution is O(d^3), while the running time for the original tensor voting (TV) is O(u^{d−1}), where d is the dimension of the space and u is the number of sampling directions for a given dimension. Because of this, a typical TV implementation precomputes and stores the dense tensor fields. For example, when d = 3 and u = 180 for high accuracy, our method requires 27 operation units, while a typical TV implementation requires 32400 operation units. Given 1980 points and the same number of neighbors, the time to compute a structure-aware tensor using our method is about 0.0001 second; it takes about 0.1 second for a typical TV implementation to output the corresponding tensor. The measurement was performed on a computer with a Core Duo 2GHz CPU and 2GB RAM.

Note that the asymptotic running time for the improved TV in [21] is O(dγ^2), since it applies the Gram–Schmidt process to perform component decomposition, where γ is the number of linearly independent tensors in the set. In most cases, γ = d. So the running time of our method is comparable to [21]. However, their approach does not have a precise mathematical solution.
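The neighbor queries above can be reproduced with any k-d tree library. The paper uses the ANN library [1]; purely as an illustration, SciPy's cKDTree offers the same polylogarithmic query behavior:

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.random.rand(1980, 3)      # example 3-D point set
tree = cKDTree(points)                # built once

k = 30                                # neighborhood size under the chosen scale
dists, idx = tree.query(points, k=k)  # k nearest neighbors of every point
```

Each row of idx (excluding the point itself, which is returned at distance 0) can serve as the neighbor list when accumulating a structure-aware tensor.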
Fig. 6. Convergence of MRFTV. From left to right: input points, result after 2 passes, result after convergence (10 iterations). Visually, this simple example distinguishes tensor voting from smoothing, as the sharp orientation discontinuity is preserved upon convergence.

5 MRFTV

We have proved CFTV for the special theory of tensor voting, or the "first voting pass" for structure inference. Conventionally, tensor voting was done in two passes, where the second pass was used for structure propagation in the preferred direction after disabling the ball component in the structure-aware tensor. What happens if more tensor voting passes are applied? This has never been answered properly.

In this section we provide a convergence proof for tensor voting based on CFTV: the structure-aware tensor obtained at each site achieves a stationary state upon convergence. Our convergence proof makes use of Markov random fields (MRF), and the scheme is thus termed MRFTV. It should be noted that the original tensor voting formulation is also constructed on an MRF according to the broad definition, since random variables (that is, the tensors after voting) are defined on the nodes of an undirected graph in which each node is connected to all neighbors within a fixed distance. On the other hand, without CFTV it was previously difficult to write down an objective function and to prove convergence. One caveat to note in the following is that we do not disable the ball component in each iteration; this will be addressed in the future in developing the general theory of tensor voting in structure propagation. As we will demonstrate, MRFTV does not smooth out important features (Fig. 6) and still possesses high outlier rejection ability (Fig. 13).

Recall that in an MRF, a Markov network is a graph consisting of two types of nodes — a set of hidden variables E and a set of observed variables O — where the edges of the graph are described by the following posterior probability P(E|O) in the standard Bayesian framework:

  P(E|O) ∝ P(O|E) P(E)   (13)

By letting E = {K_i | i = 1, 2, ..., N} and O = {K̃_i | i = 1, 2, ..., N}, where N is the total number of points and K̃_i is the known tensor at x_i, and supposing that inliers follow a Gaussian distribution, we obtain the likelihood P(O|E) and the prior P(E) as follows:

  P(O|E) ∝ Π_i p(K̃_i | K_i) = Π_i exp( −||K_i − K̃_i||_F^2 / σ_h )   (14)
  P(E) ∝ Π_i Π_{j ∈ N(i)} p(S_ij | K_i)   (15)
       = Π_i Π_{j ∈ N(i)} exp( −||K_i − S_ij||_F^2 / σ_s )   (16)

where ||·||_F is the Frobenius norm, K̃_i is the known tensor at x_i, N(i) is the set of neighbors of x_i, and σ_h and σ_s are two constants.
Note that we use the Frobenius norm to encode tensor orientation consistency as well as to reflect the necessary vote saliency information, including distance and continuity attenuation. For example, suppose we have a unit stick tensor at x_i and a stick vote (received at x_i) which is parallel to it but with magnitude equal to 0.8. In another scenario, x_i receives from a voter farther away a stick vote with the same orientation but magnitude equal to 0.2. The Frobenius norm reflects the difference in saliency despite the perfect orientation consistency in both cases. Notwithstanding, it is arguable that the Frobenius norm may not be the perfect way to encode the orientation consistency constraint in the pertinent equations, but the current form works acceptably well in our experiments in practice.

By taking the logarithm of Eqn (13), we obtain the following energy function:

  E(E) = Σ_i ||K_i − K̃_i||_F^2 + g Σ_i Σ_{j ∈ N(i)} ||K_i − S_ij||_F^2   (17)

where g = σ_h / σ_s. Theoretically, this quadratic energy function can be directly solved by singular value decomposition (SVD). Since N can be large, making direct SVD impractical, we adopt an iterative approach: by taking the partial derivative of Eqn (17) with respect to K_i, the following update rule is obtained:

  K*_i = ( K̃_i + 2g Σ_{j ∈ N(i)} S_ij ) ( I + g Σ_{j ∈ N(i)} ( I + c_ij^2 R'^2_ij ) )^{−1}   (18)

which is a Gauss–Seidel solution. When successive over-relaxation (SOR) is employed, the update rule becomes:

  K_i^{(m+1)} = (1 − q) K_i^{(m)} + q K*_i   (19)

where 1 < q < 2 is the SOR weight and m is the iteration number. After each iteration, we normalize K_i such that the eigenvalues of the corresponding eigensystem are within the range (0, 1].

The above proof of convergence of MRFTV shows that the structure-aware tensors achieve stationary states after a finite number of Gauss–Seidel iterations in the above formulation. It also dispels a common pitfall that tensor voting is similar in effect to smoothing. Using the same scale of analysis (that is, σ_d in (2)) and the same σ_h, σ_s in each iteration, tensor saliency and orientation will both converge. We observe that the converged tensor orientation is in fact similar to that obtained after two voting passes using the original framework, where the orientations at curve junctions are not smoothed out. See Fig. 6 for an example where the sharp orientation discontinuity is not smoothed out when tensor voting converges. Here, λ_1 of each structure-aware tensor is not normalized to 1, for visualizing its structure saliency after convergence. Table 1 summarizes the quantitative comparison with the ground-truth orientation.

TABLE 1
Comparison with ground truth for the example in Fig. 6.
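A direct transcription of Eqns (18) and (19) gives the iteration below (a sketch under our own naming; neighbor_ids[i] is the pruned neighborhood of x_i, g = σ_h/σ_s, and the eigenvalue normalization mentioned above is applied after every SOR step):

```python
import numpy as np

def mrftv(points, K_tilde, neighbor_ids, sigma_d, g=1.0, q=1.5, n_iter=10):
    """Gauss-Seidel / SOR iteration of MRFTV, Eqns (18)-(19)."""
    d = points.shape[1]
    I = np.eye(d)
    K = [k.copy() for k in K_tilde]                   # current estimates
    for _ in range(n_iter):
        for i in range(len(points)):
            A = K_tilde[i].copy()                     # numerator of Eqn (18)
            B = I.copy()                              # term inverted in Eqn (18)
            for j in neighbor_ids[i]:
                diff = points[i] - points[j]
                c = np.exp(-np.dot(diff, diff) / sigma_d)
                r = (diff / np.linalg.norm(diff)).reshape(-1, 1)
                R = I - 2.0 * (r @ r.T)
                Rp = (I - 0.5 * (r @ r.T)) @ R
                S = c * R @ K[j] @ Rp                 # S_ij from the current K_j
                A += 2.0 * g * S
                B += g * (I + (c ** 2) * (Rp @ Rp))
            K_star = A @ np.linalg.inv(B)             # Eqn (18)
            K_new = (1.0 - q) * K[i] + q * K_star     # SOR step, Eqn (19)
            K_new = 0.5 * (K_new + K_new.T)           # keep the tensor symmetric
            lam_max = max(np.linalg.eigvalsh(K_new).max(), 1e-12)
            K[i] = K_new / lam_max                    # eigenvalues mapped into (0, 1]
    return K
```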
6 for wrong.Inpractice,weminimizethefollowingFrobenius an example where sharp orientation discontinuity is not norm as in Eqns (14)–(16): smoothed out when tensor voting converges. Here, λ 1 ||K−1−S(cid:48) || . (22) of each structure-aware tensor is not normalized to 1 i ij F for visualizing its structure saliency after convergence. In the spirit of MRF, S(cid:48) encodes the tensor information ij Table 1 summarizes the quantitative comparison with withinx ’sneighborhood,thusanaturalchoicefordefin- i the ground-truth orientation. ing the term for measuring neighborhood orientation 6 EMTV 5. Notethat,insomecases,theunderlyingmodelisrepresentedin Previously, while tensor voting was capable of rejecting tEhqins.fo(2rm0).xFTiohr −exzaimwplhee,reexwpaencdanxrTieh-ar−raznigeinittoinatxoit+hebfyoirm+1gzivie=nb0y, outliers, it fell short of producing accurate parameter whichcanwrittenintheformof(20). SUBMITTEDTOIEEETRANSACTIONSONPATTERNANALYSISANDMACHINEINTELLIGENCE 8 consistency. This is also useful, as we will see, to the to either category (inlier/outlier). For generality in the M-step of EMTV which makes the MRF assumption. following we will include α in the derivation. The above three constraints will interact with each Definew =p(r |o ,Λ(cid:48))tobetheprobabilityofo being i i i i other in the proposed EM algorithm. an inlier. Then 6.2 ObjectiveFunction p(o ,r =1|Λ(cid:48)) w = p(r =1|o ,Λ(cid:48))= i i Define O = {oi = xi|i = 1,··· ,N} to be the set of i i i p(oi|Λ(cid:48)) observations. Our goal is to optimize h and K−i 1 given αβ exp(−||xTih||2)exp(−||hTK−i1h||) O. Mathematically, we solve the objective function: = 2σ2 2σ12 (26) αβ exp(−||xTih||2)exp(−||hTK−i1h||)+ 1−α Λ∗ =argmaxP(O,R|Λ) (23) 2σ2 2σ12 C Λ where β = 1 is the normalization term. where P(O,R|Λ) is the complete-data likelihood to be 2σσ1π maximized, R={r } is a set of hidden states indicating i 6.4 Maximization(M-Step) if observation o is an outlier (r = 0) or inlier (r = 1), i i i and Λ={{K−i 1},h,α,σ,σ1,σ2} is a set of parameters to In the M-Step,we maximize Eqn. (24) using wi obtained beestimated.α,σ,σ andσ areparametersimposedby fromtheE-Step.Sinceneighborhoodinformationiscon- 1 2 some distributions, which will be explained shortly by sidered, we model P(O,R|Λ) as a MRF: using an equation to be introduced6. Our EM algorithm (cid:89) (cid:89) estimates an optimal Λ∗ by finding the value of the P(O,R|Λ)= p(ri|rj,Λ)p(oi|ri,Λ) (27) complete-data log likelihood with respect to R given O i j∈G(i) and the current estimated parameters Λ(cid:48): where G(i) is the set of neighbors of i. In theory, G(i) (cid:88) containsalltheinputpointsexcepti,sincec inEqn.(2) Q(Λ,Λ(cid:48))= logP(O,R|Λ)P(R|O,Λ(cid:48)) (24) ij is always non-zero (because of the long tail of the R∈ψ Gaussian distribution). In practice, we can prune away whereψ isaspacecontainingallpossibleconfigurations the points in G(i) where the values of c are negligible. ij ofRofsizeN.AlthoughEMdoesnotguaranteeaglobal This can greatly reduce the size of the neighborhood. optimal solution theoretically, because CFTV provides Again, using ANN tree [1], the speed of searching for goodinitialization,wewilldemonstrateempiricallythat nearest neighbors can be greatly increased. reasonable results can be obtained. Let us examine the two terms in Eqn. (27). p(o |r ,Λ) i i has been defined in Eqn. (25). We define p(r |r ,Λ) here. i j 6.3 Expectation(E-Step) Using the third condition mentioned in Eqn. 
6.3 Expectation (E-Step)

In this section, the marginal distribution p(r_i | o_i, Λ') is defined so that we can maximize the parameters in the next step (M-Step) given the current parameters.

If r_i = 1, the observation o_i is an inlier and therefore minimizes the first two conditions (Eqns 20 and 21) in Section 6.1, that is, the data and orientation constraints. In both cases, we assume that inliers follow a Gaussian distribution, which explains the use of K_i^{−1} instead of K_i [Footnote 7]. We model p(o_i | r_i, Λ') as

  p(o_i | r_i, Λ') ∝ exp( −||x_i^T h||^2 / (2σ^2) ) exp( −||h^T K_i^{−1} h|| / (2σ_1^2) ),  if r_i = 1;
                     1/C,                                                                  if r_i = 0.   (25)

We assume that outliers follow a uniform distribution, where C is a constant that models the distribution. Let C_m be the maximum dimension of the bounding box of the input. In practice, C_m ≤ C ≤ 2C_m produces similar results.

[Footnote 7: Although a linear structure is being optimized here, the inliers together may describe a structure that does not necessarily follow any particular model. Each inlier may not exactly lie on this structure, where the misalignment follows the Gaussian distribution.]

Since we have no prior information on whether a point in the input is an inlier or an outlier, we may assume that the mixture probability of the observations p(r_i = 1) = p(r_i = 0) equals a constant α = 0.5, so that we have no bias toward either category (inlier/outlier). For generality, in the following we will include α in the derivation.

Define w_i = p(r_i | o_i, Λ') to be the probability of o_i being an inlier. Then

  w_i = p(r_i = 1 | o_i, Λ') = p(o_i, r_i = 1 | Λ') / p(o_i | Λ')
      = αβ exp( −||x_i^T h||^2 / (2σ^2) ) exp( −||h^T K_i^{−1} h|| / (2σ_1^2) )
        / [ αβ exp( −||x_i^T h||^2 / (2σ^2) ) exp( −||h^T K_i^{−1} h|| / (2σ_1^2) ) + (1 − α)/C ]   (26)

where β = 1 / (2σσ_1π) is the normalization term.
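The E-step is a per-point evaluation of Eqn (26). A minimal sketch, assuming X holds the N × d observations, h the current unit plane estimate, and K_inv the current K_i^{−1} estimates (names are ours):

```python
import numpy as np

def e_step_weights(X, h, K_inv, alpha, sigma, sigma1, C):
    """Inlier probabilities w_i of Eqn (26)."""
    beta = 1.0 / (2.0 * sigma * sigma1 * np.pi)        # normalization term
    w = np.empty(len(X))
    for i, x in enumerate(X):
        data   = np.exp(-(x @ h) ** 2 / (2.0 * sigma ** 2))            # Eqn (20)
        orient = np.exp(-abs(h @ K_inv[i] @ h) / (2.0 * sigma1 ** 2))  # Eqn (21)
        inlier = alpha * beta * data * orient
        w[i] = inlier / (inlier + (1.0 - alpha) / C)
    return w
```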
6.4 Maximization (M-Step)

In the M-Step, we maximize Eqn. (24) using the w_i obtained from the E-Step. Since neighborhood information is considered, we model P(O, R | Λ) as an MRF:

  P(O, R | Λ) = Π_i Π_{j ∈ G(i)} p(r_i | r_j, Λ) p(o_i | r_i, Λ)   (27)

where G(i) is the set of neighbors of i. In theory, G(i) contains all the input points except i, since c_ij in Eqn. (2) is always non-zero (because of the long tail of the Gaussian distribution). In practice, we can prune away the points in G(i) where the values of c_ij are negligible. This can greatly reduce the size of the neighborhood. Again, using an ANN tree [1], the speed of searching for nearest neighbors can be greatly increased.

Let us examine the two terms in Eqn. (27). p(o_i | r_i, Λ) has been defined in Eqn. (25). We define p(r_i | r_j, Λ) here. Using the third condition, Eqn. (22), we have:

  p(r_i | r_j, Λ) = exp( −||K_i^{−1} − S'_ij||_F^2 / (2σ_2^2) )   (28)

We are now ready to expand Eqn. (24). Since r_i can only assume two values (0 or 1), we can rewrite Q(Λ, Λ') in Eqn. (24) into the following form:

  Σ_{t ∈ {0,1}} log( Π_i Π_{j ∈ G(i)} p(r_i = t | r_j, Λ) p(o_i | r_i = t, Λ) ) P(R | O, Λ')

After expansion,

  Q(Λ, Λ') = Σ_i log( α (1 / (σ√(2π))) exp( −||x_i^T h||^2 / (2σ^2) ) ) w_i
           + Σ_i log( (1 / (σ_1√(2π))) exp( −||h^T K_i^{−1} h|| / (2σ_1^2) ) ) w_i
           + Σ_i Σ_{j ∈ G(i)} log( exp( −||K_i^{−1} − S'_ij||_F^2 / (2σ_2^2) ) ) w_i w_j
           + Σ_i log( (1 − α)/C ) (1 − w_i)   (29)

To maximize Eqn. (29), we set the first derivative of Q with respect to K_i^{−1}, h, α, σ, σ_1 and σ_2 to zero, respectively, to obtain the following set of update rules:

  α = (1/N) Σ_i w_i
  K_i^{−1} = ( 1 / Σ_{j ∈ G(i)} w_j ) ( Σ_{j ∈ G(i)} S'_ij w_j − (σ_2^2 / (2σ_1^2)) h h^T w_i )
  min ||M h||  subject to  ||h|| = 1
  σ^2 = Σ_i ||x_i^T h||^2 w_i / Σ_i w_i
  σ_1^2 = Σ_i ||h^T K_i^{−1} h|| w_i / Σ_i w_i
  σ_2^2 = Σ_i Σ_{j ∈ G(i)} ||K_i^{−1} − S'_ij||_F^2 w_i w_j / Σ_i w_i   (30)

where M = Σ_i x_i x_i^T w_i + (σ^2 / σ_1^2) Σ_i K_i^{−1} w_i and G(i) is the set of neighbors of i. Eqn. (30) constitutes the set of update rules for the M-step.

In each iteration, after the update rules have been executed, we project K_i^{−1} onto the feasible solution space by normalization, that is, the eigenvalues of the corresponding eigensystem are within the range (0, 1]. Also, S'_ij will be updated with the newly estimated K_i^{−1}.
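Among the update rules in Eqn (30), the plane and scale updates are the least mechanical to transcribe. The sketch below (our own function names, given the current σ and σ_1) solves min ||Mh|| subject to ||h|| = 1 by taking the eigenvector of the symmetric matrix M with the smallest eigenvalue, and then refreshes σ and σ_1; the α, K_i^{−1} and σ_2 rules follow the remaining lines of Eqn (30) in the same way:

```python
import numpy as np

def update_plane(X, w, K_inv, sigma, sigma1):
    """Plane update of Eqn (30): argmin ||M h|| subject to ||h|| = 1."""
    N, d = X.shape
    M = np.zeros((d, d))
    for i in range(N):
        M += w[i] * np.outer(X[i], X[i])
        M += (sigma ** 2 / sigma1 ** 2) * w[i] * K_inv[i]
    lam, V = np.linalg.eigh(0.5 * (M + M.T))   # ascending eigenvalues
    return V[:, 0]                             # unit minimiser of ||M h||

def update_scales(X, h, w, K_inv):
    """sigma and sigma_1 updates of Eqn (30)."""
    wsum = np.sum(w)
    sigma_sq  = sum(w[i] * (X[i] @ h) ** 2 for i in range(len(X))) / wsum
    sigma1_sq = sum(w[i] * abs(h @ K_inv[i] @ h) for i in range(len(X))) / wsum
    return np.sqrt(sigma_sq), np.sqrt(sigma1_sq)
```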
6.5 Implementation and Initialization

In summary, Eqns (12), (26) and (30) are all the equations needed to implement EMTV, and the implementation is therefore straightforward.

Noting that initialization is important to an EM algorithm, to initialize EMTV we set σ_1 to be a very large value, K_i = I and w_i = 1 for all i. S'_ij is initialized to be the inverse of S_ij, computed using the closed-form solution presented in the previous section. These initialization values mean that at the beginning we have no preference for the surface orientation, so all the input points are initially considered inliers. With such initialization, we execute the K_i^{−1} and h rules in Eqn. (30) in sequence. Note that when the K_i^{−1} rule is executed, the term involving h is ignored because of the large σ_1; we can thus obtain K_i^{−1} for the h rule. After that, we can start executing the algorithm from the E-step. This initialization procedure is used in all the experiments in the following sections. Fig. 7 shows the result after the first EMTV iteration on an example; note in particular that even though the initialization is at times not close to the solution, our EMTV algorithm can still converge to the desired ground-truth solution.

7 EXPERIMENTAL RESULTS

First, quantitative comparisons are performed to evaluate EMTV against well-known algorithms: RANSAC [7], ASSC [28], and TV [19]. In addition, we also provide the result using the least-squares method as a baseline comparison. Second, we apply our method to real data with synthetic outliers and/or noise where the ground truth is available, and perform comparison. Third, more experiments on multiview stereo matching on real images are performed.

As we will show, EMTV performed the best on highly corrupted data, because it is designed to seek one linear structure of known type (as opposed to multiple, potentially nonlinear structures of unknown type). The use of orientation constraints, in addition to position constraints, makes EMTV superior to the random sampling methods as well.

Outlier/inlier (OI) ratio. We will use the outlier/inlier (OI) ratio to characterize the outlier level, which is related to the outlier percentage Z ∈ [0, 1]:

  Z = R / (R + 1)   (31)

where R is the OI ratio. Fig. 7 shows a plot of Z = R/(R+1), indicating that it is much more difficult for a given method to handle the same percentage increase in outliers as the value of Z increases. Note the rapid increase in the number of outliers as Z increases from 50% to 99%. That is, it is more difficult for a given method to tolerate an addition of, say, 20% outliers when Z is increased from 70% to 90% than from 50% to 70%. Thus the OI ratio gives more insight in studying an algorithm's performance on severely corrupted data.

7.1 Robustness

We generate a set of 2D synthetic data to evaluate the performance on line fitting, by randomly sampling 44 points from a line within the range [−1, −1] × [1, 1], where the locations of the points are contaminated by Gaussian noise of 0.1 standard deviation. Random outliers were added to the data with different OI ratios.

The data set is then partitioned into two:
• SET1: OI ratio ∈ [0.1, 1] with step size 0.1,
• SET2: OI ratio ∈ [1, 100] with step size 1.

In other words, the partition is made at 50% outliers. Note from the plot in Fig. 7 that the number of outliers increases rapidly after 50% outliers. Sample data sets with different OI ratios are shown at the top of Fig. 7. Outliers were added within a bounding circle of radius 2. In particular, the bottom of Fig. 7 shows the result of the first EMTV iteration upon initialization using CFTV.

The input scale, which is used in RANSAC, TV and EMTV, was estimated automatically by the TSSE proposed in [28]. Note that in principle these scales are not the same, because TSSE estimates the scales of residuals in the normal space; therefore, the scale estimated by TSSE and used in TV and EMTV is only an approximation. As we will demonstrate below, even with such rough approximations EMTV still performs very well, showing that it is not sensitive to scale inaccuracy, a nice property of tensor voting which will be shown in an experiment to be detailed shortly. Note that ASSC [28] does not require any input scale.

Fig. 7. The top-left subfigure shows the plot of R/(R+1). The four 2D data sets shown here have OI ratios [1, 20, 45, 80] respectively, which correspond to outlier percentages [50%, 95%, 98%, 99%]. Our EMTV can tolerate OI ratios ≤ 51 in this example. The original input, the estimated line after the first EMTV iteration using CFTV to initialize the algorithm, and the line parameters after the first EMTV iteration and final EMTV convergence are shown. The ground-truth parameter is [−0.71, 0.71].

Fig. 8. Error plots for SET1 (OI ratio = [0.1, 1], up to 50% outliers) and SET2 (OI ratio = [1, 100], ≥ 50% outliers). Left: for SET1, all the tested methods except least squares demonstrated reliable results. EMTV is deterministic and converges quickly, capable of correcting the Gaussian noise inherent in the inliers and rejecting spurious outliers, resulting in the almost-zero error curve. Right: for SET2, EMTV still has an almost-zero error curve up to an OI ratio of 51 (≈ 98.1% outliers). We ran 100 trials of RANSAC and ASSC and averaged the results. The maximum and minimum errors of RANSAC and ASSC are shown below each error plot.

SET1 – Refer to the left of Fig. 8, which shows the error produced by the various methods tested on SET1. The error is measured by the angle between the estimated line and the ground truth. Except for the least-squares method, we observe that all the tested methods (RANSAC, ASSC, TV and EMTV) performed very well with OI ratios ≤ 1. For RANSAC and ASSC, all the detected inliers were finally used in parameter estimation. Note that the errors measured for RANSAC and ASSC were the average errors over 100 executions [Footnote 8]; Fig. 8 also shows the maximum and minimum errors of the two methods after running 100 trials. EMTV does not have such maximum and minimum error plots because it is deterministic.

[Footnote 8: We executed the algorithm 100 times. In each execution, iterative random sampling was done where the desired probability of choosing at least one sample free from outliers was set to 0.99 (default value).]

Observe that the errors produced by our method are almost zero on SET1. EMTV is deterministic and converges quickly, capable of correcting the Gaussian noise inherent in the inliers and rejecting spurious outliers, resulting in the almost-zero error curve. RANSAC and ASSC have errors < 0.6°, which is still very acceptable.

SET2 – Refer to the right of Fig. 8, which shows the result for SET2, from which we can distinguish the performance of the methods. TV breaks down at OI ratios ≥ 20. After that, the performance of TV is