Bundle Adjustment — A Modern Synthesis
Bill Triggs, Philip McLauchlan, Richard Hartley, Andrew Fitzgibbon

To cite this version: Bill Triggs, Philip McLauchlan, Richard Hartley, Andrew Fitzgibbon. Bundle Adjustment — A Modern Synthesis. In B. Triggs, A. Zisserman and R. Szeliski (Eds.), Vision Algorithms: Theory and Practice, Springer-Verlag, Lecture Notes in Computer Science (LNCS) 1883, pp. 298–372, 2000. doi:10.1007/3-540-44480-7_21.

HAL Id: inria-00590128, https://hal.inria.fr/inria-00590128. Submitted on 3 May 2011.

Bundle Adjustment — A Modern Synthesis

Bill Triggs(1), Philip McLauchlan(2), Richard Hartley(3) and Andrew Fitzgibbon(4)

(1) INRIA Rhône-Alpes, 655 avenue de l'Europe, 38330 Montbonnot, France.
    [email protected], http://www.inrialpes.fr/movi/people/Triggs
(2) School of Electrical Engineering, Information Technology & Mathematics,
    University of Surrey, Guildford, GU2 5XH, U.K.
    [email protected], http://www.ee.surrey.ac.uk/Personal/P.McLauchlan
(3) General Electric CRD, Schenectady, NY, 12301.
    [email protected]
(4) Dept of Engineering Science, University of Oxford, 19 Parks Road, OX1 3PJ, U.K.
    [email protected], http://www.robots.ox.ac.uk/awf

Abstract

This paper is a survey of the theory and methods of photogrammetric bundle adjustment, aimed at potential implementors in the computer vision community.
Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal structure and viewing parameter estimates. Topics covered include: the choice of cost function and robustness; numerical optimization including sparse Newton methods, linearly convergent approximations, updating and recursive methods; gauge (datum) invariance; and quality control. The theory is developed for general robust cost functions rather than restricting attention to traditional nonlinear least squares.

Keywords: Bundle Adjustment, Scene Reconstruction, Gauge Freedom, Sparse Matrices, Optimization.

[Footnote: This work was supported in part by the European Commission Esprit LTR project CUMULI (B. Triggs), the UK EPSRC project GR/L34099 (P. McLauchlan), and the Royal Society (A. Fitzgibbon). We would like to thank A. Zisserman, A. Grün and W. Förstner for valuable comments and references. A version of this paper will appear in Vision Algorithms: Theory & Practice, B. Triggs, A. Zisserman & R. Szeliski (Eds.), Springer-Verlag LNCS 1883, 2000.]

1 Introduction

This paper is a survey of the theory and methods of bundle adjustment aimed at the computer vision community, and more especially at potential implementors who already know a little about bundle methods. Most of the results appeared long ago in the photogrammetry and geodesy literatures, but many seem to be little known in vision, where they are gradually being reinvented. By providing an accessible modern synthesis, we hope to forestall some of this duplication of effort, correct some common misconceptions, and speed progress in visual reconstruction by promoting interaction between the vision and photogrammetry communities.

Bundle adjustment is the problem of refining a visual reconstruction to produce jointly optimal 3D structure and viewing parameter (camera pose and/or calibration) estimates. 'Optimal' means that the parameter estimates are found by minimizing some cost function that quantifies the model fitting error, and 'jointly' that the solution is simultaneously optimal with respect to both structure and camera variations.
The name refers to the 'bundles' of light rays leaving each 3D feature and converging on each camera centre, which are 'adjusted' optimally with respect to both feature and camera positions. Equivalently, all of the structure and camera parameters are adjusted together 'in one bundle', unlike independent model methods, which merge partial reconstructions without updating their internal structure.

Bundle adjustment is really just a large sparse geometric parameter estimation problem, the parameters being the combined 3D feature coordinates, camera poses and calibrations. Almost everything that we will say can be applied to many similar estimation problems in vision, photogrammetry, industrial metrology, surveying and geodesy. Adjustment computations are a major common theme throughout the measurement sciences, and once the basic theory and methods are understood, they are easy to adapt to a wide variety of problems. Adaptation is largely a matter of choosing a numerical optimization scheme that exploits the problem structure and sparsity. We will consider several such schemes below for bundle adjustment.

Classically, bundle adjustment and similar adjustment computations are formulated as nonlinear least squares problems [19, 46, 100, 21, 22, 69, 5, 73, 109]. The cost function is assumed to be quadratic in the feature reprojection errors, and robustness is provided by explicit outlier screening. Although it is already very flexible, this model is not really general enough. Modern systems often use non-quadratic M-estimator-like distributional models to handle outliers more integrally, and many include additional penalties related to overfitting, model selection and system performance (priors, MDL). For this reason, we will not assume a least squares / quadratic cost model. Instead, the cost will be modelled as a sum of opaque contributions from the independent information sources (individual observations, prior distributions, overfitting penalties, ...).
The functional forms of these contributions and their dependence on fixed quantities such as observations will usually be left implicit. This allows many different types of robust and non-robust cost contributions to be incorporated, without unduly cluttering the notation or hiding essential model structure. It fits well with modern sparse optimization methods (cost contributions are usually sparse functions of the parameters) and object-centred software organization, and it avoids many tedious displays of chain-rule results. Implementors are assumed to be capable of choosing appropriate functions and calculating derivatives themselves.

One aim of this paper is to correct a number of misconceptions that seem to be common in the vision literature:

- "Optimization / bundle adjustment is slow": Such statements often appear in papers introducing yet another heuristic Structure from Motion (SFM) iteration. The claimed slowness is almost always due to the unthinking use of a general-purpose optimization routine that completely ignores the problem structure and sparseness. Real bundle routines are much more efficient than this, and usually considerably more efficient and flexible than the newly suggested method (§6, 7). That is why bundle adjustment remains the dominant structure refinement technique for real applications, after 40 years of research.

- "Only linear algebra is required": This is a recent variant of the above, presumably meant to imply that the new technique is especially simple. Virtually all iterative refinement techniques use only linear algebra, and bundle adjustment is simpler than many in that it only solves linear systems: it makes no use of eigen-decomposition or SVD, which are themselves complex iterative methods.

- "Any sequence can be used": Many vision workers seem to be very resistant to the idea that reconstruction problems should be planned in advance (§11), and results checked afterwards to verify their reliability (§10).
System builders should at least be aware of the basic techniques for this, even if application constraints make it difficult to use them. The extraordinary extent to which weak geometry and lack of redundancy can mask gross errors is too seldom appreciated, c.f. [34, 50, 30, 33].

- "Point P is reconstructed accurately": In reconstruction, just as there are no absolute references for position, there are none for uncertainty. The 3D coordinate frame is itself uncertain, as it can only be located relative to uncertain reconstructed features or cameras. All other feature and camera uncertainties are expressed relative to the frame and inherit its uncertainty, so statements about them are meaningless until the frame and its uncertainty are specified. Covariances can look completely different in different frames, particularly in object-centred versus camera-centred ones. See §9.

There is a tendency in vision to develop a profusion of ad hoc adjustment iterations. Why should you use bundle adjustment rather than one of these methods?

- Flexibility: Bundle adjustment gracefully handles a very wide variety of different 3D feature and camera types (points, lines, curves, surfaces, exotic cameras), scene types (including dynamic and articulated models, scene constraints), information sources (2D features, intensities, 3D information, priors) and error models (including robust ones). It has no problems with missing data.

- Accuracy: Bundle adjustment gives precise and easily interpreted results because it uses accurate statistical error models and supports a sound, well-developed quality control methodology.

- Efficiency: Mature bundle algorithms are comparatively efficient even on very large problems. They use economical and rapidly convergent numerical methods and make near-optimal use of problem sparseness.

In general, as computer vision reconstruction technology matures, we expect that bundle adjustment will predominate over alternative adjustment methods in much the same way as it has in photogrammetry.
We see this as an inevitable consequence of a greater appreciation of optimization (notably, more effective use of problem structure and sparseness), and of systems issues such as quality control and network design.

Coverage: We will touch on a good many aspects of bundle methods. We start by considering the camera projection model and the parametrization of the bundle problem (§2), and the choice of error metric or cost function (§3). §4 gives a rapid sketch of the optimization theory we will use. §5 discusses the network structure (parameter interactions and characteristic sparseness) of the bundle problem. The following three sections consider three types of implementation strategies for adjustment computations: §6 covers second order Newton-like methods, which are still the most often used adjustment algorithms; §7 covers methods with only first order convergence (most of the ad hoc methods are in this class); and §8 discusses solution updating strategies and recursive filtering bundle methods. §9 returns to the theoretical issue of gauge freedom (datum deficiency), including the theory of inner constraints. §10 goes into some detail on quality control methods for monitoring the accuracy and reliability of the parameter estimates. §11 gives some brief hints on network design, i.e. how to place your shots to ensure accurate, reliable reconstruction. §12 completes the body of the paper by summarizing the main conclusions and giving some provisional recommendations for methods. There are also several appendices. §A gives a brief historical overview of the development of bundle methods, with literature references. §B gives some technical details of matrix factorization, updating and covariance calculation methods. §C gives some hints on designing bundle software, and pointers to useful resources on the Internet. The paper ends with a glossary and references.

General references: Cultural differences sometimes make it difficult for vision workers to read the photogrammetry literature.
The collection edited by Atkinson [5] and the manual by Karara [69] are both relatively accessible introductions to close-range (rather than aerial) photogrammetry. Other accessible tutorial papers include [46, 21, 22]. Kraus [73] is probably the most widely used photogrammetry textbook. Brown's early survey of bundle methods [19] is well worth reading. The often-cited manual edited by Slama [100] is now quite dated, although its presentation of bundle adjustment is still relevant. Wolf & Ghiliani [109] is a text devoted to adjustment computations, with an emphasis on surveying. Hartley & Zisserman [62] is an excellent recent textbook covering vision geometry from a computer vision viewpoint. For nonlinear optimization, Fletcher [29] and Gill et al [42] are the traditional texts, and Nocedal & Wright [93] is a good modern introduction. For linear least squares, Björck [11] is superlative, and Lawson & Hanson is a good older text. For more general numerical linear algebra, Golub & Van Loan [44] is the standard. Duff et al [26] and George & Liu [40] are the standard texts on sparse matrix techniques. We will not discuss initialization methods for bundle adjustment in detail, but appropriate reconstruction methods are plentiful and well-known in the vision community. See, e.g., [62] for references.

Notation: The structure, cameras, etc., being estimated will be parametrized by a single large state vector x. In general the state belongs to a nonlinear manifold, but we linearize this locally and work with small linear state displacements denoted δx. Observations (e.g. measured image features) are denoted z̄. The corresponding predicted values at parameter value x are denoted z = z(x), with residual prediction error Δz(x) ≡ z̄ − z(x). However, observations and prediction errors usually only appear implicitly, through their influence on the cost function f(x) = f(pred_z(x)). The cost function's gradient is g ≡ df/dx, and its Hessian is H ≡ d²f/dx². The observation-state Jacobian is J ≡ dz/dx. The dimensions of δx, δz are n_x, n_z.
2 Projection Model and Problem Parametrization

2.1 The Projection Model

We begin the development of bundle adjustment by considering the basic image projection model and the issue of problem parametrization. Visual reconstruction attempts to recover a model of a 3D scene from multiple images. As part of this, it usually also recovers the poses (positions and orientations) of the cameras that took the images, and information about their internal parameters. A simple scene model might be a collection of isolated 3D features, e.g., points, lines, planes, curves, or surface patches. However, far more complicated scene models are possible, involving, e.g., complex objects linked by constraints or articulations, photometry as well as geometry, dynamics, etc. One of the great strengths of adjustment computations (and one reason for thinking that they have a considerable future in vision) is their ability to take such complex and heterogeneous models in their stride. Almost any predictive parametric model can be handled, i.e. any model that predicts the values of some known measurements or descriptors on the basis of some continuous parametric representation of the world, which is to be estimated from the measurements.

Similarly, many possible camera models exist. Perspective projection is the standard, but the affine and orthographic projections are sometimes useful for distant cameras, and more exotic models such as push-broom and rational polynomial cameras are needed for certain applications [56, 63]. In addition to pose (position and orientation), and simple internal parameters such as focal length and principal point, real cameras also require various types of additional parameters to model internal aberrations such as radial distortion [17, 18, 19, 100, 69, 5].

For simplicity, suppose that the scene is modelled by individual static 3D features X_p, p = 1...n, imaged in m shots with camera pose and internal calibration parameters P_i, i = 1...m. There may also be further calibration parameters C_c, c = 1...k, constant across several images (e.g., depending on which of several cameras was used).
We are given uncertain measurements x̄_ip of some subset of the possible image features x_ip (the true image of feature X_p in image i). For each observation x̄_ip, we assume that we have a predictive model x_ip = x(C_c, P_i, X_p) based on the parameters, that can be used to derive a feature prediction error:

    Δx_ip(C_c, P_i, X_p) ≡ x̄_ip − x(C_c, P_i, X_p)        (1)

In the case of image observations the predictive model is image projection, but other observation types such as 3D measurements can also be included.

To estimate the unknown 3D feature and camera parameters from the observations, and hence reconstruct the scene, we minimize some measure (discussed in §3) of their total prediction error. Bundle adjustment is the model refinement part of this, starting from given initial parameter estimates (e.g., from some approximate reconstruction method). Hence, it is essentially a matter of optimizing a complicated nonlinear cost function (the total prediction error) over a large nonlinear parameter space (the scene and camera parameters).

We will not go into the analytical forms of the various possible feature and image projection models, as these do not affect the general structure of the adjustment network, and only tend to obscure its central simplicity. We simply stress that the bundle framework is flexible enough to handle almost any desired model. Indeed, there are so many different combinations of features, image projections and measurements, that it is best to regard them as black boxes, capable of giving measurement predictions based on their current parameters. (For optimization, first, and possibly second, derivatives with respect to the parameters are also needed).

For much of the paper we will take quite an abstract view of this situation, collecting the scene and camera parameters to be estimated into a large state vector x, and representing the cost (total fitting error) as an abstract function f(x). The cost is really a function of the feature prediction errors Δx_ip = x̄_ip − x(C_c, P_i, X_p).
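To make the prediction error (1) concrete, here is a minimal sketch. The camera and point parametrizations are hypothetical simplifications chosen for brevity (a shared focal length and a translation-only pose), not the paper's prescription; a real model would add rotation, principal point, distortion, and so on.

```python
def project(C, P, X):
    """Predicted image feature x(C_c, P_i, X_p) for a deliberately minimal model:
    C = (f,) a shared focal length, P = (tx, ty, tz) a translation-only camera
    pose, X = (X, Y, Z) a Euclidean 3D point. All hypothetical choices."""
    f, = C
    dx, dy, dz = X[0] - P[0], X[1] - P[1], X[2] - P[2]
    return (f * dx / dz, f * dy / dz)

def prediction_error(C, P, X, x_obs):
    """Feature prediction error of equation (1):
    Delta x_ip = x_obs_ip - x(C_c, P_i, X_p)."""
    u, v = project(C, P, X)
    return (x_obs[0] - u, x_obs[1] - v)
```

In a full bundle problem, all the C_c, P_i and X_p would be stacked into the state vector x, and the residuals of every observed (i, p) pair would feed the cost function f(x).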
But as the observations x̄_ip are constants during an adjustment calculation, we leave the cost's dependence on them and on the projection model x(·) implicit, and display only its dependence on the parameters x actually being adjusted.

2.2 Bundle Parametrization

The bundle adjustment parameter space is generally a high-dimensional nonlinear manifold: a large Cartesian product of projective 3D feature, 3D rotation, and camera calibration manifolds, perhaps with nonlinear constraints, etc. The state x is not strictly speaking a vector, but rather a point in this space. Depending on how the entities that it contains are represented, x can be subject to various types of complications including singularities, internal constraints, and unwanted internal degrees of freedom. These arise because geometric entities like rotations, 3D lines and even projective points and planes do not have simple global parametrizations. Their local parametrizations are nonlinear, with singularities that prevent them from covering the whole parameter space uniformly (e.g. the many variants on Euler angles for rotations, the singularity of affine point coordinates at infinity). And their global parametrizations either have constraints (e.g. quaternions with ‖q‖² = 1), or unwanted internal degrees of freedom (e.g. homogeneous projective quantities have a scale factor freedom, two points defining a line can slide along the line). For more complicated compound entities such as matching tensors and assemblies of 3D features linked by coincidence, parallelism or orthogonality constraints, parametrization becomes even more delicate.

Figure 1: Vision geometry and its error model are essentially projective. Affine parametrization introduces an artificial singularity at projective infinity, which may cause numerical problems for distant features.

Although they are in principle equivalent, different parametrizations often have profoundly different numerical behaviours which greatly affect the speed and reliability of the adjustment iteration.
The most suitable parametrizations for optimization are as uniform, finite and well-behaved as possible near the current state estimate. Ideally, they should be locally close to linear in terms of their effect on the chosen error model, so that the cost function is locally nearly quadratic. Nonlinearity hinders convergence by reducing the accuracy of the second order cost model used to predict state updates (§6). Excessive correlations and parametrization singularities cause ill-conditioning and erratic numerical behaviour. Large or infinite parameter values can only be reached after excessively many finite adjustment steps.

Any given parametrization will usually only be well-behaved in this sense over a relatively small section of state space. So to guarantee uniformly good performance, however the state itself may be represented, state updates should be evaluated using a stable local parametrization based on increments from the current estimate. As examples we consider 3D points and rotations.

3D points: Even for calibrated cameras, vision geometry and visual reconstructions are intrinsically projective. If a 3D (X Y Z)ᵀ parametrization (or equivalently a homogeneous affine (X Y Z 1)ᵀ one) is used for very distant 3D points, large X, Y, Z displacements are needed to change the image significantly. I.e., in (X Y Z) space the cost function becomes very flat and steps needed for cost adjustment become very large for distant points. In comparison, with a homogeneous projective parametrization (X Y Z W)ᵀ, the behaviour near infinity is natural, finite and well-conditioned so long as the normalization keeps the homogeneous 4-vector finite at infinity (by sending W → 0 there). In fact, there is no immediate visual distinction between the images of real points near infinity and virtual ones 'beyond' it (all camera geometries admit such virtual points as bona fide projective constructs). The optimal reconstruction of a real 3D point may even be virtual in this sense, if image noise happens to push it 'across infinity'.
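The flatness of the affine parametrization for distant points is easy to check numerically. In this sketch (a hypothetical unit-focal camera displaced along the optical axis), doubling the depth of a distant point barely moves its image, while a homogeneous point with W = 0 (a point at infinity) still projects to a perfectly ordinary image location:

```python
def project_affine(X, Y, Z, t=(0.0, 0.0, -1.0)):
    """u-coordinate of the image of the affine point (X, Y, Z, 1) in a unit-focal
    camera whose centre sits at t (hypothetical example geometry)."""
    return (X - t[0]) / (Z - t[2])

def project_homog(X, Y, Z, W, t=(0.0, 0.0, -1.0)):
    """Same camera, homogeneous point (X, Y, Z, W): the translation enters only
    scaled by W, so the projection stays finite as W -> 0 (point at infinity)."""
    return (X - W * t[0]) / (Z - W * t[2])

# Doubling the depth of a distant point moves its image by less than 0.005 units,
# so in (X, Y, Z) space the cost is nearly flat along this direction:
shift = project_affine(1.0, 0.0, 100.0) - project_affine(1.0, 0.0, 200.0)

# A pure direction still has a finite, well-defined image:
u_inf = project_homog(1.0, 0.0, 1.0, 0.0)
```

Conversely, forcing an optimizer to represent such a point as (X, Y, Z) means that reaching the well-fitting region near infinity requires enormous parameter steps, which is exactly the ill-conditioning described above.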
Also, there is nothing to stop a reconstructed point wandering beyond infinity and back during the optimization. This sounds bizarre at first, but it is an inescapable consequence of the fact that the natural geometry and error model for visual reconstruction is projective rather than affine. Projectively, infinity is just like any other place. Affine parametrization (X Y Z 1)ᵀ is acceptable for points near the origin with close-range convergent camera geometries, but it is disastrous for distant ones because it artificially cuts away half of the natural parameter space, and hides the fact by sending the resulting edge to infinite parameter values. Instead, you should use a homogeneous parametrization (X Y Z W)ᵀ for distant points, e.g. with spherical normalization Σᵢ Xᵢ² = 1.

Rotations: Similarly, experience suggests that quasi-global 3 parameter rotation parametrizations such as Euler angles cause numerical problems unless one can be certain to avoid their singularities and regions of uneven coverage. Rotations should be parametrized using either quaternions subject to ‖q‖² = 1, or local perturbations R δR or δR R of an existing rotation R, where δR can be any well-behaved 3 parameter small rotation approximation, e.g. δR = (I + [δr]×), the Rodriguez formula, local Euler angles, etc.

State updates: Just as state vectors x represent points in some nonlinear space, state updates x → x + δx represent displacements in this nonlinear space that often cannot be represented exactly by vector addition. Nevertheless, we assume that we can locally linearize the state manifold, locally resolving any internal constraints and freedoms that it may be subject to, to produce an unconstrained vector δx parametrizing the possible local state displacements. We can then, e.g., use Taylor expansion in δx to form a local cost model f(x + δx) ≈ f(x) + (df/dx) δx + ½ δxᵀ (d²f/dx²) δx, from which we can estimate the state update δx that optimizes this model (§4).
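As a concrete illustration of a state update that involves more than vector addition, the sketch below builds δR from a 3-parameter increment δr via the (exact) Rodriguez formula and composes it with the current rotation, so that the updated matrix remains exactly orthonormal. This is an illustrative pure-Python sketch; a real implementation would use a linear algebra library.

```python
import math

def matmul(A, B):
    """3x3 matrix product (pure Python, for self-containment)."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def rodrigues(r):
    """Exact rotation matrix exp([r]_x) for a 3-parameter increment r.
    Reduces to the first-order update I + [r]_x for small |r|."""
    theta2 = r[0] ** 2 + r[1] ** 2 + r[2] ** 2
    theta = math.sqrt(theta2)
    K = [[0.0, -r[2], r[1]],
         [r[2], 0.0, -r[0]],
         [-r[1], r[0], 0.0]]          # [r]_x, the cross-product matrix
    if theta < 1e-12:
        a, b = 1.0, 0.5               # small-angle limits of sin(t)/t, (1-cos t)/t^2
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta2
    KK = matmul(K, K)
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * KK[i][j]
             for j in range(3)] for i in range(3)]

def update_rotation(R, dr):
    """State update R -> R * dR: composes exactly, so the result stays a rotation
    matrix, unlike the raw first-order update R(I + [dr]_x)."""
    return matmul(R, rodrigues(dr))
```

Note that the increment dr is a plain 3-vector even though the state R is a 9-entry matrix: the local displacement parametrization need not share the state's representation, which is exactly the point made in the text.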
The displacement δx need not have the same structure or representation as x (indeed, if a well-behaved local parametrization is used to represent δx, it generally will not), but we must at least be able to update the state with the displacement to produce a new state estimate. We write this operation as x → x + δx, even though it may involve considerably more than vector addition. For example, apart from the change of representation, an updated quaternion q → q + dq will need to have its normalization ‖q‖² = 1 corrected, and a small rotation update of the form R → R(1 + [r]×) will not in general give an exact rotation matrix.

3 Error Modelling

We now turn to the choice of the cost function f(x), which quantifies the total prediction (image reprojection) error of the model parametrized by the combined scene and camera parameters x. Our main conclusion will be that robust statistically-based error metrics based on total (inlier + outlier) log likelihoods should be used, to correctly allow for the presence of outliers. We will argue this at some length as it seems to be poorly understood. The traditional treatments of adjustment methods consider only least squares (albeit with data trimming for robustness), and most discussions of robust statistics give the impression that the choice of robustifier or M-estimator is wholly a matter of personal whim rather than data statistics.

Bundle adjustment is essentially a parameter estimation problem. Any parameter estimation paradigm could be used, but we will consider only optimal point estimators, whose output is by definition the single parameter vector that minimizes a predefined cost function designed to measure how well the model fits the observations and background knowledge. This framework covers many practical estimators including maximum likelihood (ML) and maximum a posteriori (MAP), but not explicit Bayesian model averaging. Robustification, regularization and model selection terms are easily incorporated in the cost.
A typical ML cost function would be the summed negative log likelihoods of the prediction errors of all the observed image features. For Gaussian error distributions, this reduces to the sum of squared covariance-weighted prediction errors (§3.2). A MAP estimator would typically add cost terms giving certain structure or camera calibration parameters a bias towards their expected values.

The cost function is also a tool for statistical interpretation. To the extent that lower costs are uniformly 'better', it provides a natural model preference ordering, so that cost iso-surfaces above the minimum define natural confidence regions. Locally, these regions are nested ellipsoids centred on the cost minimum, with size and shape characterized by the dispersion matrix (the inverse of the cost function Hessian H = d²f/dx² at the minimum). Also, the residual cost at the minimum can be used as a test statistic for model validity (§10). E.g., for a negative log likelihood cost model with Gaussian error distributions, twice the residual is a χ² variable.

3.1 Desiderata for the Cost Function

In adjustment computations we go to considerable lengths to optimize a large nonlinear cost model, so it seems reasonable to require that the refinement should actually improve the estimates in some objective (albeit statistical) sense. Heuristically motivated cost functions can not usually guarantee this. They almost always lead to biased parameter estimates, and often severely biased ones. A large body of statistical theory points to maximum likelihood (ML) and its Bayesian cousin maximum a posteriori (MAP) as the estimators of choice. ML simply selects the model for which the total probability of the observed data is highest, or saying the same thing in different words, for which the total posterior probability of the model given the observations is highest. MAP adds a prior term representing background information.
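To spell out the Gaussian case mentioned above, here is a sketch with independent scalar residuals (real bundle problems use full covariance weighting): the summed negative log likelihood is half the sum of squared covariance-weighted prediction errors plus a parameter-independent normalization constant.

```python
import math

def gaussian_nll(residuals, sigmas):
    """Summed negative log likelihood of independent scalar prediction errors
    dz_i ~ N(0, sigma_i^2)."""
    return sum(0.5 * (dz / s) ** 2 + math.log(s * math.sqrt(2.0 * math.pi))
               for dz, s in zip(residuals, sigmas))

def weighted_ssq(residuals, sigmas):
    """Sum of squared covariance-weighted prediction errors: the classical
    nonlinear least squares cost, equal to 2 * (NLL - const)."""
    return sum((dz / s) ** 2 for dz, s in zip(residuals, sigmas))
```

Since the normalization constant does not depend on the parameters, minimizing either function gives the same estimate; and at the minimum, weighted_ssq (twice the residual cost) is approximately the χ² test statistic referred to above.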
ML could just as easily have included the prior as an additional 'observation': so far as estimation is concerned, the distinction between ML / MAP and prior / observation is purely terminological.

Information usually comes from many independent sources. In bundle adjustment these include: covariance-weighted reprojection errors of individual image features; other measurements such as 3D positions of control points, GPS or inertial sensor readings; predictions from uncertain dynamical models (for 'Kalman filtering' of dynamic cameras or scenes); prior knowledge expressed as soft constraints (e.g. on camera calibration or pose values); and supplementary sources such as overfitting, regularization or description length penalties. Note the variety. One of the great strengths of adjustment computations is their ability to combine information from disparate sources. Assuming that the sources are statistically independent of one another given the model, the total probability for the model given the combined data is the product of the probabilities from the individual sources. To get an additive cost function we take logs, so the total log likelihood for the model given the combined data is the sum of the individual source log likelihoods.

Properties of ML estimators: Apart from their obvious simplicity and intuitive appeal, ML and MAP estimators have strong statistical properties. Many of the most notable ones are asymptotic, i.e. they apply in the limit of a large number of independent measurements, or more precisely in the central limit where the posterior distribution becomes effectively Gaussian[1]. In particular:

- Under mild regularity conditions on the observation distributions, the posterior distribution of the ML estimate converges asymptotically in probability to a Gaussian with covariance equal to the dispersion matrix.

- The ML estimate asymptotically has zero bias and the lowest variance that any unbiased estimator can have. So in this sense, ML estimation is at least as good as any other method[2].
Non-asymptotically, the dispersion is not necessarily a good approximation for the covariance of the ML estimator. The asymptotic limit is usually assumed to be valid for well-designed highly-redundant photogrammetric measurement networks, but recent sampling-based empirical studies of posterior likelihood surfaces [35, 80, 68] suggest that the case is much less clear for small vision geometry problems and weaker networks. More work is needed on this.

[1] Cost is additive, so as measurements of the same type are added the entire cost surface grows in direct proportion to the amount of data n_z. This means that the relative sizes of the cost and all of its derivatives (and hence the size r of the region around the minimum over which the second order Taylor terms dominate all higher order ones) remain roughly constant as n_z increases. Within this region, the total cost is roughly quadratic, so if the cost function was taken to be the posterior log likelihood, the posterior distribution is roughly Gaussian. However the curvature of the quadratic (i.e. the inverse dispersion matrix) increases as data is added, so the posterior standard deviation shrinks as O(σ/√(n_z − n_x)), where O(σ) characterizes the average standard deviation from a single observation. For n_z − n_x ≫ (σ/r)², essentially the entire posterior probability mass lies inside the quadratic region, so the posterior distribution converges asymptotically in probability to a Gaussian. This happens at any proper isolated cost minimum at which second order Taylor expansion is locally valid. The approximation gets better with more data (stronger curvature) and smaller higher order Taylor terms.

[2] This result follows from the Cramér-Rao bound (e.g. [23]), which says that the covariance of any unbiased estimator is bounded below by the Fisher information or mean curvature of the posterior log likelihood surface: ⟨(x̂ − x̄)(x̂ − x̄)ᵀ⟩ ⪰ (−d²log p/dx²)⁻¹, where p is the posterior probability, x the parameters being estimated, x̂ the estimate given by any unbiased estimator, x̄ the true underlying x value, and A ⪰ B denotes positive semidefiniteness of A − B.
Asymptotically, the posterior distribution becomes Gaussian and the Fisher information converges to the inverse dispersion (the curvature of the posterior log likelihood surface at the cost minimum), so the ML estimate attains the Cramér-Rao bound.

Figure 2: Beware of treating any bell-shaped observation distribution as a Gaussian. Despite being narrower in the peak and broader in the tails, the probability density function of a Cauchy distribution, p(x) = (π(1 + x²))⁻¹, does not look so very different from that of a Gaussian (top left). But their negative log likelihoods are very different (bottom left), and large deviations ("outliers") are much more probable for Cauchy variates than for Gaussian ones (right). In fact, the Cauchy distribution has infinite covariance.

The effect of incorrect error models: It is clear that incorrect modelling of the observation distributions is likely to disturb the ML estimate. Such mismodelling is to some extent inevitable because error distributions stand for influences that we can not fully predict or control. To understand the distortions that unrealistic error models can cause, first realize that geometric fitting is really a special case of parametric probability density estimation. For each set of parameter values, the geometric image projection model and the assumed observation error models combine to predict a probability density for the observations. Maximizing the likelihood corresponds to fitting this predicted observation density to the observed data. The geometry and camera model only enter indirectly, via their influence on the predicted distributions.

Accurate noise modelling is just as critical to successful estimation as accurate geometric modelling.
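The contrast shown in Figure 2 is easy to verify numerically. Below is a sketch of the two negative log likelihoods at unit scale, dropping the additive constants that do not affect the fit:

```python
import math

def nll_gaussian(dz):
    """Negative log likelihood of a unit Gaussian, up to an additive constant:
    grows quadratically, so one large deviation can dominate the whole fit."""
    return 0.5 * dz * dz

def nll_cauchy(dz):
    """Negative log likelihood of a unit Cauchy, p(x) = (pi*(1 + x^2))^-1,
    up to an additive constant: grows only logarithmically in the tails."""
    return math.log1p(dz * dz)
```

At dz = 10 the Gaussian cost is 50 while the Cauchy cost is log(101) ≈ 4.6, so a '10-sigma' outlier that would overwhelm a quadratic cost barely perturbs a Cauchy-based one.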
The most important mismodelling is failure to take account of the possibility of outliers (aberrant data values, caused e.g., by blunders such as incorrect feature correspondences). We stress that so long as the assumed error distributions model the behaviour of all of the data used in the fit (including both inliers and outliers), the above properties of ML estimation including asymptotic minimum variance remain valid in the presence of outliers. In other words, ML estimation is naturally robust: there is no need to robustify it so long as realistic error distributions were used in the first place. A distribution that models both inliers and outliers is called a total distribution. There is no need to separate the two classes, as ML estimation does not care about the distinction. If the total distribution happens to be an explicit mixture of an inlier and an outlier distribution (e.g., a Gaussian with a locally uniform background of outliers), outliers can be labeled after fitting using likelihood