IJCV manuscript No. (will be inserted by the editor) A General Preprocessing Method for Improved Performance of Epipolar Geometry Estimation Algorithms Maria Kushnir · Ilan Shimshoni 5 1 Received:date/Accepted:date 0 2 n Abstract In this paper a deterministic preprocessing testing performed on almost 900 image pairs from six a algorithmispresented,whoseoutputcanbegivenasin- publicly available datasets. J put to most state-of-the-art epipolar geometry estima- Keywords Fundamental matrix · epipolar geometry 7 tion algorithms, improving their results considerably. 2 estimation · local features · SIFT They are now able to succeed on hard cases for which ] theyfailedbefore.Thealgorithmconsistsofthreesteps, V whose scope changes from local to global. In the local 1 Introduction C stepitextractsfromapairofimageslocalfeatures(e.g. s. SIFT). Similar features from each image are clustered Epipolar geometry estimation from image pairs with c and the clusters are matched yielding a large number partialsceneoverlapisabasicproblemincomputervi- [ of putative matches. In the second step pairs of spa- sion.Itisusedasacomponentofmanyimportantappli- 1 tiallyclosefeatures(called2keypoints)arematchedand cationssuchasvisionbasedrobotnavigation,structure v ranked by a classifier. The 2keypoint matches with the from motion (SfM) and other multiple view geometry 6 highestranksareselected.Intheglobalstep,fromeach applications. 1 7 two 2keypoint matches a fundamental matrix is com- This problem has attracted considerable interest in 6 puted. As quite a few of the matrices are generated the computer vision community, interest which contin- 0 from correct matches they are used to rank the puta- ues till this day. Most of the successful algorithms are . 1 tive matches found in the first step. For each match based on an initial step, in which local features are 0 the number of fundamental matrices, for which it ap- detected in both images. For each detected feature a 5 proximately satisfies the epipolar constraint, is calcu- local descriptor is computed. These features are then 1 : lated. This set of matches is combined with the puta- matched based on their local descriptors. For each pu- v tive matches generated by standard methods and their tative match a prior probability or score is estimated. i X probabilities to be correct are estimated by a classifier. These putative matches and scores are given as in- r Thesearethengivenasinputtostate-of-the-artepipo- put to the algorithm (Chum et al, 2003; Chum and a lar geometry estimation algorithms such as BEEM, Matas, 2005; Brahmachari and Sarkar, 2013b; Goshen BLOGS and USAC yielding much better results than and Shimshoni, 2008; Raguram et al, 2013a; Tordoff the original algorithms. This was shown in extensive and Murray, 2002). Even though successful algorithms have been proposed to address this problem, it still remains an active field of research. This is because MariaKushnir thereareseveralreasonswhyinputgiventothesealgo- Department of Information Systems, University of Haifa, rithmsmaybechallenging.Aspointedoutforexample 31905Haifa,Israel. by Lowe (2004) and tested extensively by Mikolajczyk E-mail:[email protected] etal(2005),astheanglebetweentheviewingdirections IlanShimshoni increases, the appearance of local descriptors changes, Department of Information Systems, University of Haifa, 31905Haifa,Israel. making them hard to match. Thus, wide baseline im- E-mail:[email protected] ages are hard inputs for the algorithms. Urban scenes 2 MariaKushnir,IlanShimshoni are also challenging for such algorithms. In such scenes match from the first step we calculate the number of featuressuchasforexamplewindowsarerepeatedsev- fundamental matrices it supports. Finally we combine eral times. In such cases it is hard for the local match- putative matches generated by standard methods with ingalgorithmtomatchthewindowinthefirstimageto those found by our method, and estimate their proba- its corresponding window in the second image. In both bilities to be correct, using a simple classifier. these types of cases the percentage of correct matches The correlation between these probabilities and the (inliers) from the set of putative matches is low. When ground truth inlier-outlier labels is much higher. As a the probabilities are taken into account, the problem is result, when we submit the putative matches and the that the percentage of correct matches with high prior computedprobabilitiesasinputtoalgorithmsfromthe probabilitiesislow.Inthesecases,evenstate-of-the-art guided RANSAC family, much better results are ob- algorithms tend to fail. tainedonchallengingdatasets.Forexample,theperfor- For that reason, in this paper, instead of trying to mance of all three algorithms when run on the Open2 proposeanewepipolargeometryestimationalgorithm, dataset (Goldman et al, 2014) increased considerably. we present a preprocessing step which is given as in- Thenumberofimagepairstheysucceededonincreased put two images and returns a set of putative matches by between 62% and 239% relative to the original per- withtheirassociatedprobabilities.Ourmethodwasex- formance of those algorithms. This demonstrates the tensively tested on almost 900 image pairs from dif- fact that our algorithm improves the quality of the in- ferent datasets: ZuBuD dataset (Shao et al, 2003), put significantly, resulting in better results of the basic BLOGS dataset (Brahmachari and Sarkar, 2013a), algorithm.Similarimprovementwasobtainedwhenthe USAC dataset (Raguram et al, 2013b) and Open1, algorithm was run as a preprocessing step of USAC on Open2 and Urban datasets (Goldman et al, 2014). Our the Urban dataset (Goldman et al, 2014). resultsaremuchbetterthanthoseobtainedbythestan- The paper continues as follows. In Section 2, we dard initial steps of state-of-the-art algorithms. Con- review related work concentrating mainly on how the sequently, when our output is given as input to them qualityoftheinputaffectstheperformanceofthealgo- (BEEM(GoshenandShimshoni,2008),BLOGS(Brah- rithm. Section 3 presents the overview of our method, machariandSarkar,2013b)andUSAC(Ragurametal, while the details are given in the next section. Experi- 2013a) in this paper) they outperform the same algo- mental results are presented in Section 5. We conclude rithms operating on their regular input. Our output is in Section 6. general and can be incorporated within many other al- gorithms such as Chum et al (2003); Chum and Matas 2 Related work (2005); Tordoff and Murray (2002). The algorithm starts with standard techniques of In reviewing related work we will concentrate on how detecting local features and extracting putative corre- the quality of the input effects the algorithm’s perfor- spondences from them. Using this input we propose a mance and not on the various components of the algo- new concept consisting of three steps, running from lo- rithms. cal to global. In the local step features are clustered We will first consider PROSAC (Chum and Matas, together in each image. Clusters with similar features 2005) and USAC (Raguram et al, 2013a). The algo- from the first image are matched to clusters of features rithm is given as input a set of putative matches or- fromthesecondimageandviceversa.Theresultofthis deredbyascoreorpriorprobability.Underthisgeneral step is a large set of putative matches, most of which framework the set of putative matches can be ordered areincorrect.Inthesecondstepwematchpairsofspa- for example using the distance ratio d method intro- r tially close features (2keypoints) in the first image, to duced by Lowe (2004). The models are generated in an corresponding pairs of features (found in the first step) order consistent with the order of the matches used to from the second image. For each 2keypoint match a generate them. Once a model is generated it is verified short descriptor is generated, characterizing the qual- using the Statistical Probability Ratio Test (SPRT). ity of the match. Using a classifier we trained on data TheputativematchesaretesteduntiltheSPRTreaches from several image pairs, for each 2keypoint match the a decision on whether the model is correct or not. probability of being correct is estimated. The highest Thus,whenthebeginningofthelist(matcheswithhigh K 2keypoint matches are chosen. Here already the scores) is contaminated by a large number of outliers, 2kp percentageofcorrect2keypointmatchesismuchhigher the number of required iterations increases consider- then was recovered in the first step. In the global step, ably. When the number of iterations of the algorithm the2keypointmatchesareusedtogeneratealargenum- is limited, this also increases the probability of failure. berofpossiblefundamentalmatrices.Foreachputative On the other hand, a list consisting of a large number AGeneralPreprocessingMethodforImprovedPerformance of EpipolarGeometryEstimation Algorithms 3 of matches does not effect the running time, since the each match is accompanied by a prior probability (or SPRT process usually reaches a decision quite early in score) that the match is correct. The higher the qual- the verification procedure. ity of this set, the more probable that algorithms from In algorithms from the Guided RANSAC fam- the guided RANSAC family (Chum and Matas, 2005; ily (Tordoff and Murray, 2002; Goshen and Shimshoni, Ragurametal,2013a;BrahmachariandSarkar,2013b; 2008)thesubsetofmatchesusedformodelgenerationis Goshen and Shimshoni, 2008) will succeed to estimate chosen according to their probability. Thus, the perfor- the epipolar geometry. In this section we will present mance of the algorithm is similar to that of PROSAC. an overview of the algorithm. The details will be given When the list contains a large number of outliers with in the next section. high probabilities, the chances of the algorithm to fail Thealgorithm,giveninpseudo-codeinAlgorithm1, are high. is described as follows: A similar behavior occurs in BLOGS (Brahmachari and Sarkar, 2013b). There also, in the global search Algorithm 1AGeneralPreprocessingMethodforIm- step, a model is computed from a minimal subset of proved Performance of Epipolar Geometry Estimation matches according to a score. Thus, the probability of 1: Input: imagesI andI findingamodelconsistingonlyofinliersdependsonthe 1 2 2: ExtractSIFT features fromI andI 1 2 qualityofthepriorscores.SpecificallyinBLOGS,anew 3: Findstandardputativecorrespondences{X }andasso- L methodforputativematchrankingwasintroduced,and ciatetothem distanceratios{dr} 4: Findstandardputativecorrespondences{X }andasso- their scores are referred to as similarity weights {t }. B k ciatetothem similarityweights {t } k Thus,forallthesealgorithms,ifwecanassignmore 5: ClusterSIFTfeaturesfromeachimagebasedondescrip- accurateprobabilitiestotheputativematches,thealgo- torsimilarityyieldingclustersoffeatures rithms performance should improve considerably. This 6: Estimate relativerollangle αexp 7: forα∈[α ,0◦] do is exactly the goal of the algorithm we suggest here. exp 8: Match clusters from the two images, yielding cluster We would also like to review two other algorithms pairs whichaddresstheproblemofmatchingimagescontain- 9: Generate putative correspondences {X} from the ing scenes with repeated structures. In that case the membersofthe matchedclusters 10: Generateall2keypoints:apairoffeaturesfromamain initialstageofthealgorithmsmentionedabovewillfail feature point and another feature point which is close to match a feature belonging to repeated structures to toitin theimage its correct match in the second image, since it will not 11: Match 2keypoints from the first image to the 2key- be able to choose the correct candidate. Thus, this fea- pointsfromthesecond image 12: Use a classifier to assign probabilities to 2keypoint turewillbediscarded.InGeneralizedRANSAC(Zhang matches and Kosecka, 2006), all possible matches of the feature 13: Selectthetop K of2keypointmatches 2kp tosimilar(normalizedcrosscorrelationaboveacertain 14: Estimate a candidate fundamental matrix from each threshold) features in the second image are generated two matched 2keypoints, yielding K2kp(K2kp −1)/2 matrices but are given low probabilities. On the list of puta- 15: For each putative match from {X} count how many tivematchesguidedRANSACisrun.Thus,inthecase candidatefundamentalmatrices(sfm)supportit when there are not enough non-repeating inliers in the 16: Assigneachputativematchfrom{X}(cid:83){X }(cid:83){X } L B listwithhighprobabilities,thealgorithmmightfail.In aprobabilitythatitiscorrect 17: Usetheseputativematchesandtheirassociatedprob- our previous work (Kushnir and Shimshoni, 2014), a abilities as input to one of the algorithms from the special algorithm was developed to deal with buildings guidedRANSACfamilytoyieldafundamentalmatrix withrepeatedfeatures.Therealso,allpossiblematches anditssupport ofthefeaturetosimilarfeaturesinthesecondimageare 18: endfor 19: Choosefromthetwofundamentalmatrices,theonewith generated. The algorithm assumes that in both images maximalsupport a planar facade is visible. The algorithm tends to fail 20: return Thefundamentalmatrixandthelistofitsinliers when this assumption is not satisfied. In this work we proposeamethodwhichcansuccessfullydealwithgen- eral scenes, including the case of repeated structures. The algorithm is given as input two images I and 1 I . The first step of the algorithm (described in Sec- 2 tion 4.1) is to detect features (SIFT in our case) in 3 Algorithm outline each image. Those features are used to generate three groups of putative correspondences. The goal of the algorithm is to generate a set of pu- FollowingthestandardmethodintroducedbyLowe tative feature matches between the two images, where (2004), we find putative correspondences {X } and as- L 4 MariaKushnir,IlanShimshoni sociatetothemdistanceratios{d }.Thedistanceratio r is used to assign a prior probability for the correctness of the match (described in Section 4.1). Following the scheme introduced by Brahmachari and Sarkar (2013b), we find putative correspondences {X } and associate to them similarity weights {t }. B k To resolve problems which occur in image pairs which are hard to match, such as scenes which in- clude repeating elements, we cluster detected features (described in Section 4.2). Thus, repeated features or features with very similar descriptors are clustered to- gether. Non-repeating features will belong to clusters of size one. Then each cluster from the first image is matched to the most similar cluster in the second im- age and vice versa. The result of this step is a large number of putative correspondences {X} (described in Section 4.3). A vast majority of them however, are in- correct. An example of how clustering similar features can help in the case of a scene which includes repeating elements, is shown in Figure 1. For the feature point markedbyaRedcircleintheupperimageLowe(2004) Fig. 1 An example of why using the standard putative finds no match and Brahmachari and Sarkar (2013b) matchesasis,issometimesinsufficient,whileclusteringmight find the feature point marked by a Red circle in the help.ImagesGEO00038andGEO00029weretakenfromthe bottom image, which is incorrect. When using cluster- Urban dataset. The feature point marked by a Red circle in ing,aswesuggest,thefeaturepointintheupperimage theupperimageisnotmatchedatallbyLoweormistakenly matchedbyBLOGStoafeaturepointmarkedbyaRedcir- belongstoaclusterofsizeone,whileinthebottomim- cle in the bottom image. When using clustering, for feature age two feature points are clustered together, marked point marked by the Red circle in the upper image we gen- by Green points. Only in the case of clustering, a cor- erate two putative matches marked by Green points in the rect match is generated, namely the correct match is a bottom image,oneofwhichiscorrect. member of {X}, but not of {X } or {X }. L B Inaddition,inordertoovercometheproblemoffea- are also close to each other and belong to matching tures looking similar to rotated features (such as win- clusters. An illustration of a 2keypoint match is shown dow corners), for the clustering step only, the orienta- in Figure 2. tionofalltheSIFTfeaturesintheimageisfixedinone The decision to work with 2keypoints is a compro- specific direction. Thus, for example different corners misebetweentwocontradictorypreferences:ontheone of a window will not be clustered together. In order to hand any combination of features contains more infor- determine this direction, we propose to find a rough mation than a single keypoint, which can be used to relative roll angle α, between the two images, from the detectinliersmoreaccurately.Ingeneral,thelargerthe differencesofSIFTorientationsin{XL}and{XB}and number of features in the combination, the higher the denote it αexp. Using this approximation and the fact, probability that the matched combination is correct. that many images are taken with zero roll angle as a Ontheotherhand,sincetheprobabilityforfeaturede- prior, all the following steps of our algorithm are re- tectionislow,theprobabilityfordetectingalargenum- peated twice, once for α=αexp and again for α=0◦. ber of features in a combination is even lower. Thus, Inordertoovercometheproblemthatthemajority relying on the minimal subset of features is preferable of the matches in {X} are incorrect we estimate their due to the difficulties in detecting large combinations probabilities to be inliers. This is done in two steps: of features. In the first step, which is described in Section 4.4, From the set of 2keypoint matches we would like to local information is used. We create a pair of features choose a subset, which have a high prior probability to from a main feature point and another feature point becorrectlymatched.Inordertoaccomplishthis,each which is close to it in the image. This pair of features 2keypoint match is characterized by a short descriptor. is called a 2keypoint. These two features are matched The descriptor consists of measures of geometric sim- to corresponding features in the second image which ilarity between the two 2keypoints and a count of the AGeneralPreprocessingMethodforImprovedPerformance of EpipolarGeometryEstimation Algorithms 5 (a) (b) (c) (d) Fig. 2 An example of the advantage in using 2keypoints. Images (a) FLH00010 and (b) FLH00016 taken from the Open2 dataset. (c) Zoom-in of (a). (d) Zoom-in of (b). A correct 2keypoint match generated by our method and ranked in the fifth place.Whensingle keypointmatchesareusedbothBlueandRedmatchesarerankedmuchlower. number of possible matches between each 2keypoint in fundamentalmatrices(sfm)supportit.Thisnumberis one of the images to 2keypoints in the other image. As astrongindicationoftheprobabilitythatthisputative theinterdependenciesbetweenthesecharacteristicsare match is inlier. complex, a classifier is trained to learn the probability Finallywecombinethethreegroupsofputativecor- that the 2keypoint is correctly matched. At test time respondences{X},{X }and{X },inordertoachieve L B each 2keypoint match is assigned a probability and the a set of putative feature matches between the two im- top K (100 in our application) 2keypoint matches ages,whereeachmatchisaccompaniedbyapriorprob- 2kp are selected. ability that the match is correct (described in Sec- An example of how the generation and ranking of tion 4.6). For that purpose we construct a keypoint 2keypointmatchescanhelpindealingwithlowranking match descriptor, denoted kpmd, and train a classifier matches, is shown in Figure 2. In that example if only on it. The descriptor consists of the local measures of single keypoints are used, standard techniques (Lowe, similarity, namely {dr} and {tk} and the global mea- 2004) and (Brahmachari and Sarkar, 2013b) would or- sure {sfm}. At test time each putative feature match der the Blue match in the 55th and 16th places respec- is assigned a probability. tively, whereas the Red match would be placed in a As was already mentioned, all the previous steps of 161th place or would not be generated at all. On the our algorithm are repeated twice, once for α = αexp other hand, when 2keypoint matches are used, the cor- andagainforα=0◦.Inordertoproceedwerunoneof rect 2keypoint is ranked in the fifth place. thealgorithmsfromtheguidedRANSACfamilytwice, once for each set of putative matches, and choose the In the second step, described in Section 4.5, global one with maximal support. information is used. Up until now the analysis we per- The output of the entire process is a fundamental formed has been local in nature. We first matched sin- matrix along with its inlier set. gle features and then pairs of close features. In order to be able to assign more accurate probabilities to the matches, the epipolar geometry constraint, which is 4 Algorithm details global in nature, comes into play. In order to generate rough estimations of the fundamental matrix we bor- We will now delve into the details of the various com- row an idea from the BEEM algorithm (Goshen and ponents of the algorithm. Shimshoni, 2008), where it is estimated from only two matches.Inourcasefromeachtwomatched2keypoints a candidate fundamental matrix is estimated, yielding 4.1 Extraction of putative matches with their distance K2kp(K2kp−1)/2 matrices. As a result of the ranking ratios and similarity weights of the 2keypoints, quite a few of them are generated from inlier matches. The problem is that even in this The algorithm is given as input two images. As a case they are quite inaccurate. Each of them is sup- first step we apply feature detection on both im- ported (i.e., the Sampson distance computed from the ages. In general, any feature detector which returns matrix and the match is below a certain threshold) by the location, scale, and orientation can be used (e.g.: a subset of the inliers and quite a few outliers. Instead MSER (Matas et al, 2002), BRISK (Leutenegger et al, of returning the matrix with the largest support, we 2011), ORB (Rublee et al, 2011), SURF (Bay et al, exploit these matrices in a different way. For each pu- 2006), SIFT (Lowe, 2004)). In our case we use the im- tative match from {X} we count how many candidate plementationofSIFTbyVedaldiandFulkerson(2008). 6 MariaKushnir,IlanShimshoni FollowingthestandardmethodintroducedbyLowe 4.2 Feature clustering (2004), we find putative correspondences {X } based L on descriptor similarity. The best candidate match for Using the former putative matches as is, is sometimes each keypoint in the first image, is found by identify- insufficient due to the following two problems which ingitsnearestneighborinthesecondimage.Thenear- occur in challenging image pairs. In scenes which in- est neighbor is defined as the keypoint with maximal cluderepeatingelements(suchasforexamplewindows normalized cross-correlation from the given descriptor of buildings), the matching process is unable to match vector. The probability that a match is correct can be therepeatedfeaturescorrectly.Inimagepairswithwide determined by taking a distance ratio baselines, the descriptors of the matching features are quite dissimilar and will receive quite low matching cos−1(m ) scores. We will now deal with the first problem. The dr = cos−1(mk), second problem will be addressed in Section 4.4. k2 Ineachimage,thefeaturesrecoveredfromitinSec- tion 4.1, are clustered based on descriptor similarity. where m is the similarity to the closest neighbor and k In our algorithm we use agglomerative clustering. The m is the second highest similarity in the second im- k2 merging of clusters stops when the similarity measure age. All matches for which the distance ratio is greater between the closest clusters is below a certain thresh- than a certain threshold (in our case 0.9) are rejected. old (normalized cross-correlation below 0.85). The re- Thischoiceofthresholddistanceratioisrelativelyhigh sult of this process is a set of clusters of features. Non- (therearemanyworkswhere0.85oreven0.8areused) repeatingfeaturesyieldclustersofsizeone.Eachcluster andmanymorematchesarekept.Thisisdonesincewe isrepresentedbythemediandescriptorofitsmembers. rely on the next steps of our method to deal with them correctly (See for example Figure 2.). In addition we follow the scheme introduced byBrahmachariandSarkar(2013b),whichisadifferent waytodefineputativecorrespondencesandweights.We find putative correspondences {X } that exhibit the B highest similarity measure in both images. This means thatputativecorrespondencex =(u ,v )isamember k k k of{X }ifforakeypointinthefirstimageu itsnearest B k neighborinthesecondimageisv ,andforthekeypoint k in the second image v its nearest neighbor in the first k image is u . With each such putative match pair, they k associate a confidence measure which is referred to as the similarity weight. They define the similarity weight t for the correspondence x as k k (cid:18) (cid:19)(cid:18) (cid:19) tk =(cid:0)1−exp−mk(cid:1)2 1− mmkk1 1− mmkk2 , Ffixiged. 3oriAenntaextiaomn.plIemoafgkeeoybpjoeicntt0c0l0u8s.tveireiwng04wwitahsatnadkewnitfhrooumt theZuBuDdataset.Blackcircles:clusteringwithfixedSIFT orientation. Red points: clustering without fixed orientation. where m is the second highest similarity in the first k1 Inthelattercasethreedifferenttypesofcornersareclustered image and mk2 is the second highest similarity in the together. secondimageasmentionedabove.Whilethethirdterm (cid:16) (cid:17) of the tk, i.e. 1− mmkk2 can be interpreted as another In order to overcome the problem, which is com- version of d , the two other terms are new. The second r mon in buildings, of features looking similar to rotated (cid:16) (cid:17) term, i.e. 1− mk1 is a symmetric complimentary of features (such as window corners), for the clustering mk the third one, and it emphasizes that there should be step only, the orientation of all the SIFT features in no difference between the treatment of first and second the image is fixed in one specific direction. Thus, for image.Thefirstterm,i.e.(1−exp−mk)2 isasimilarity example different corners of a window will not be clus- based component. While Lowe in his work did not use teredtogether.Anexampleofkeypointsclusteringwith the absolute distance/similarity as a measure of simi- and without fixing the orientation of all the SIFT fea- larity, in BLOGS the contribution of the absolute sim- tures is presented in Figure 3. The Red points show a ilarity exists. clusterwithoutfixedorientation.Inthatcasethreedif- AGeneralPreprocessingMethodforImprovedPerformance of EpipolarGeometryEstimation Algorithms 7 (a) (b) (c) (d) (e) (f) Fig. 4 Anexampleofa relativerollangle αestimation.Images(a)corridor1and (b)corridor2weretakenfromtheBLOGS dataset. The images in (a) and (b) are taken with relative roll of α = 78◦. (c) The kernel density estimation of the angle difference for (a) and (b) with the maximal peak at α = 78◦. Images (d) IMG0047 and (e) IMG0106 were taken from exp the Open1 dataset. The images in (d) and (e) are taken with zero relative roll. (f) The kernel density estimation of the angle differencefor(d) and (e)withthemaximalpeakatα =−3◦. exp ferent orientations of the window corner are clustered tiveandnotaffine,themaximalpeakα ofthisfunc- exp together. The Black circles are features clustered to- tion can be used as a rough approximation for α. gether, when fixing the SIFT orientation. All of them TwoexamplesofαextractionareshowninFigure4. are upper left corners of a window. In general, cluster- The red arrows were added to indicate the upright di- ing without fixed orientation of all the SIFT features rection. In the first row the image pair, taken with a in the image, leads not only to larger clusters which relativerollofα=78◦,alongwithitskerneldensityes- can be handled by our method, but to systematic er- timation of the angle difference is presented. The max- rors when matching features from those clusters. This imalpeakofthiskerneldensityispreciselyα =78◦. exp will be further explained later on. In the second row an image pair, taken with zero rela- tive roll is shown. The maximal peak of its kernel den- sity estimation is located at α =−3◦ which is quite exp Theapproach,ofdefiningonespecificorientationfor a good approximation for α. alltheSIFTfeaturesintheimage,hasbeenextensively Using this approximation and the fact that many usedintheliteratureintheframeofuprightSIFT,and imagesaretakenwithzerorollangleasaprior,wepro- in this work we generalize this idea to any orientation. ceed as follows. When running the algorithm, the first It is true, that there are many applications such as vi- image is processed one time using an upright SIFT, sionbasedrobotnavigationandstructurefrommotion, while the second image is processed in two orientations where all the images are taken with a zero roll angle, [α ,0◦]. Therefore all the following steps of our algo- exp which justifies the upright SIFT assumption in all the rithm, described in Sections 4.3-4.6 are repeated twice, images. However, since we do not limit our approach once for each orientation. to any specific application, we propose to find a rough approximation of the relative roll angle α, between the two images, from the existing data. For that purpose 4.3 Generation of all keypoint matches wecalculatethedifferenceofSIFTorientationsineach putativematchfoundinSection4.1,andbuildakernel After both of the images have been processed as de- density estimation of those angle differences. Although scribedabove,thenextstepistogeneratepairsofpos- the transformation between the two images is perspec- sible matches between features from the two images. 8 MariaKushnir,IlanShimshoni The main problem we have to overcome is when a real fixed orientation of all the SIFT features in the image. cluster is segmented into several clusters. We try to As was already shown, when the orientation of all the deal with this problem as follows. Each cluster from SIFT features is not kept fixed, features looking sim- the first image is matched to the closest cluster from ilar to rotated features (such as window corners) will thesecondimageusingthenormalizedcross-correlation be clustered together. In that case, when generating between the cluster representatives. This process is re- putative feature matches, there will be matches of the peatedwhentherolesoftheimagesareswitched.Thus, same feature (left upper corner in both images for ex- if a real cluster was over-segmented in one image but ample) and there also will be matches of the rotated not in the other we can still match the clusters from features (left upper corner in the first image matched the two images correctly. If there is over-segmentation to right lower corner in the second image for example). in both images, not all possible matches will be found. Inthenextstepsofthealgorithm,thesematchesofthe Due to this problem we can not use the distance rotated features will all vote together supporting each ratio method suggested by Lowe (2004). Because if the other and will lead to systematic errors. We therefore distances to the closest cluster and the second closest chose to keep the orientation of all the SIFT features clusteraresimilar,wecannotdistinguishbetweenover- fixed during the clustering in Section 4.2, to prevent segmentationandwhenbothclustersinthesecondim- such failures. age equally far from the cluster in the first image and shouldnotbematchedatall.Thereforetheclosestclus- 4.4 Generation and ranking of 2keypoint matches ter is always chosen. Mostofthefeaturepairsgeneratedinthepreviousstage are incorrect and therefore in order to be able to use them for epipolar geometry estimation prior probabil- ities have to be assigned to them. This will be done in two steps: a step which uses only local information which will be described here and a step which uses global information described in the next section. Recall that each SIFT feature p has besides a de- scriptoralsoascales(p)andanorientationangleα(p). (a) (b) These values will be used in our analysis to make it scale and orientation invariant. For each feature point Fig.5 Anexampleofkeypointclusteringandmatching.Im- ages (a) object0076.view02 and (b) object0076.view04 were p we add a neighboring feature n. The distance be- taken from the ZuBuD dataset. The large Red cluster in (b) tween the features in terms of the scale of p is denoted issegmentedintotwosmaller(GreenandYellow)clustersin d=|p−n|/s(p).Theanglebetweenthevectorconnect- (a).Inthisexampleduetoclustermatchinginbothdirections ing p to n with respect to α(p) is denoted θ. This pair allthecorrect matcheswerefound. of features will be termed a 2keypoint. A 2keypoint {p ,n } in the first image is matched to a 2keypoint 1 1 An example of the result of the clustering process {p ,n } in the second image. Naturally, p and p , and 2 2 1 2 is shown in Figure 5. The large Red cluster in the sec- n andn ,havetobeputativematches.Thissetoffour 1 2 ond image is segmented into two smaller (Green and features is illustrated in Figure 6. Yellow)clustersinthefirstimage.TheRedclusterwas Wesuggestthreemethodstochoosetheneighboring matched to the Yellow cluster when clusters from the pairs{n ,n }closeto{p ,p }.Thefirstmethodsimply 1 2 1 2 secondimagearematchedtoclustersinthefirstimage. takes n , i = 1,2 from the K closest features around i 1 Because the matching is also done from clusters in the p . The second method chooses n from all the features i i first image to the closest cluster in the second image, withinacertaindistanceK s(p )inpixelsfromp .This 2 i i the Green cluster is also matched to the Red cluster. parameter is given in units of scale in order to be scale In this example all of the correct matches were found invariant.Finally,thethirdmethodchoosesn fromthe i together with many incorrect matches. K closestfeatureswhichbelongtothesameclusteras 3 Eachpairofclusterswhichismatchedyieldsasetof p . Experimentally we found that optimal values are i putative feature matches from the members of the two achieved for K =5,K =5, and K =1. 1 2 3 clusters. The result of this step is a large number of Inordertoestimatetheprobabilitythata2keypoint possiblematchesmostofwhichareobviouslyincorrect. match consists only of inliers, we have to take into ac- We will now refer to the systematic errors men- countquiteafewfactors.Duetotheinterdependencies tioned above when explaining the clustering without a betweenthesefactorsandtheireffectsontheestimated AGeneralPreprocessingMethodforImprovedPerformance of EpipolarGeometryEstimation Algorithms 9 (a) (b) (c) (d) Fig. 6 An example of 2keypoints and the parameters used for their matching. Images (a) object0041.view01 and (b) ob- ject0041.view04weretakenfromtheZuBuDdataset.(c)Zoom-inof(a).(d)Zoom-inof(b).GreenandMagentapointsshow the2keypointsgeneratedby thefirstmethod. BlueandRedpointsshowa2keypoint generatedbythe thirdmethod. probability,weconstructa2keypointmatchdescriptor, denoted 2kpmd, and train a classifier on it. The descriptor consists of the following fields: 2kpmd=[N ;N ;dist ;angle ;cluster ;min ]. 1 2 r d t d The definitions of the fields are as follows: N 1 and N are the number of 2keypoint matches that 2 the 2keypoints {p ,n } and {p ,n } belong to respec- 1 1 2 2 tively. The smaller the values, the higher the prob- ability that the 2keypoint match consists of inliers. dist = min(d /d ,d /d ) is the ratio of the distance r 1 2 2 1 between p and n in the first image in terms of s(p ) 1 1 1 to the distance between p and n in the second image 2 2 in terms of s(p ). This measure is scale invariant and 2 its value should be close to one. The value angle = Fig.7 2keypointmatchclassifierperformancesonthetrain- d ingset.Solidcurvesrepresentresultsoftheclassifier.Dashed ang (θ ,θ )measuresthedifferencebetweenthean- diff 1 2 curvesshowresultswithouttheclassifier(theinlierspercent- gles associated with the two 2keypoints and should be ages from all the 2keypoint matches). When using the clas- close to zero. The field cluster is equal one if p and sifier, there are many more inliers in the top K = 100 t i 2kp n belong to the same cluster and zero otherwise. Fi- 2keypoint matches. i nally, the distance between the point and its neighbor also affects the probability that the 2keypoint match is correct. The further the points are, the lower the prob- 91.6% of the 2keypoint matches, but also gives a max- ability is. We therefore define min =min(d ,d ). d 1 2 imal precision (the proportion of positive results that In order to train the classifier we chose six im- aretruepositive)of73.8%,whichisthemostimportant age pairs from the ZuBuD dataset. From them four parameter as will be now explained. were cases that state-of-the-art algorithms were able to match, while the other two were more challenging. The 2keypoint matches are then sorted by proba- For these image pairs we manually found the ground bility and the highest K (K = 100 in our imple- 2kp 2kp truth matches and trained on the data a C4.5 decision mentation)arechosen.Thus,whatismostimportantis tree classifier (Quinlan, 1993). This classifier returns thatfromthetopK afairamountofthemshouldbe 2kp for each descriptor the probability that the 2keypoint inliers (precision). This is evident from the results on match consists of inliers. The training set consists of thetrainingsetshowninFigure7.Thecumulativepre- 31352 2keypoint matches of which 4102 are inliers and cision of the classifier is shown as a function (on a log the rest are outliers. The quality of the classifier was scale) of the number of 2keypoint matches ordered by estimated using a 10-fold cross-validation procedure. theprobabilityreturnedbytheclassifier.Dashedcurves Whenchoosingaclassifier,wetriedseveraloptionssuch represent the inlier percentages from all the 2keypoint asrandomforest(Breiman,2001),SVM(Vapnik,1995) matches,whichwouldbecorrectifnoclassifierexisted. and others. The C4.5 decision tree classifier was se- Considerforexamplethehardestcaseoftheimagepair lected, as the one which not only classifies correctly (obj066view2,obj066view5). From the top ranked 100 10 MariaKushnir,IlanShimshoni 2keypoint matches, 41% were inliers, while their per- centagefromallthe2keypointmatcheswasonly3.84%. 4.5 Global ranking of matches Eventhoughwecouldusethe2keypointmatchesfound in the previous step as the input for epipolar geometry estimation,betterresultscanbeobtainedbyexploiting global information. In BEEM (Goshen and Shimshoni, 2008) a method was proposed to generate a rough es- timate of the fundamental matrix using only two pairs of matches instead of 7 or 8. This is done by using the similarity transformation between the regions around thecorrespondingfeatures,togeneratethreeadditional matchesforeach“real”match.Theresultingestimated Fig. 8 An example of most supported inliers. Image ob- fundamentalmatrixisquiteinaccuratebutcanbeused ject0092.view02wastakenfromtheZuBuDdataset.Matches as a basis for local optimization (Chum et al, 2003), supported by more than 600 sfms are plotted in Blue, yielding good results. In our case we use two 2keypoint matches with more than 200 sfms in Green, and the rest inRed. matches (four matched points) as the input for esti- mating the fundamental matrix. One 2keypoint match could not be used since all the points from each image are too close to each other to generate a meaningful result. ForeachoftheK (K −1)/2pairsof2keypoint 2kp 2kp matches a fundamental matrix F is generated. All the putative matches generated in Section 4.3 are checked toseewhethertheysupportF ornot.Insteadoftaking thefundamentalmatrixwiththelargestsupportasthe resultofouralgorithm,wesuggesthereamethodtoex- ploit all the generated fundamental matrices. Since we assume that many of them were generated from inlier 2keypoint matches they are therefore rough estimates of the required solution. Thus, we measure the support of the putative matches. The larger the number of fun- damentalmatriceswhichsupportthematch(sfm),the Fig. 9 2keypoint match classifier performance vs. higher the probability that the match is correct. global ranking of matches performance. Red: images An example of the spatial distribution of the in- object0076.view02 and object0076.view04 were taken from lier matches is shown in Figure 8. Since the fundamen- the ZuBuD dataset, shown in Figure 5. Green: images FLH00010 and FLH00016 were taken from the Open2 talmatricesgeneratedfrominliersarequiteinaccurate, dataset, shown in Figure 2. Blue: images GEO00029 and only a small number of matches which lie close to each GEO00038 were taken from the Urban dataset, shown in other, are supported by a large number of fundamen- Figure 1. Solid curves show the results of exploiting global tal matrices (marked in Blue). The other matches with information, while the dashed curves show the results found in Section 4.4. Better results are obtained by exploiting lower support are distributed around this group in an globalinformation. irregular manner. InFigure9wecompareresultsobtainedbyexploit- ing global information, to the ones obtained using only sfms. The dashed curves show the results found in the 2keypoint matches found in Section 4.4. For that, Section 4.4, namely the cumulative precision as a func- we present the cumulative precision (inliers fraction) tion of the number of keypoint matches ordered by the as a function of the number of putative matches, or- probability returned by the 2keypoint match classifier. deredbasedontheir associatedprobabilities.The solid Each color represents a different image pair. As can curves show the results obtained by exploiting global be seen in the graph, better results are obtained by information, namely cumulative precisions as a func- exploiting global information, in addition to using the tion of number of keypoint matches ordered by their 2keypointmatchrankingcomputedinSection4.4.Con-