EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow JeromeRevauda PhilippeWeinzaepfela ZaidHarchaouia,b CordeliaSchmida a Inria∗ b NYU [email protected] 5 1 Abstract 0 2 We propose a novel approach for optical flow estima- y tion, targeted at large displacements with significant oc- a clusions. It consists of two steps: i) dense matching by M Figure 1. Image edges detected with SED [15] and ground-truth edge-preservinginterpolationfromasparsesetofmatches; optical flow. Motion discontinuities appear most of the time at 9 ii) variational energy minimization initialized with the imageedges. 1 dense matches. The sparse-to-dense interpolation relies onanappropriatechoiceofthedistance, namelyanedge- ] aware geodesic distance. This distance is tailored to han- matching between adjacent frames together with the inte- V grationofthesematchesinavariationalapproach. Indeed, dle occlusions and motion boundaries – two common and C matching operators are robust to large displacements and difficult issues for optical flow computation. We also pro- . motiondiscontinuities[9,34]. Energyminimizationiscar- s poseanapproximationschemeforthegeodesicdistanceto c allow fast computation without loss of performance. Sub- riedoutinacoarse-to-fineschemeinordertoobtainafull- [ scaledenseflowfieldguidedbythematches.Amajordraw- sequent to the dense interpolation step, standard one-level 2 variationalenergyminimizationiscarriedoutonthedense back of coarse-to-fine schemes is error-propagation, i.e., v errors at coarser levels, where different motion layers can matches to obtain the final flow estimation. The proposed 5 overlap,canpropagateacrossscales. Evenifcoarse-to-fine approach, called Edge-Preserving Interpolation of Corre- 6 techniquesworkwellinmostcases, wearenotawareofa 5 spondences(EpicFlow)isfastandrobusttolargedisplace- theoreticalguaranteeorproofofconvergence. 2 ments. It significantly outperforms the state of the art on 0 MPI-SintelandperformsonparonKittiandMiddlebury. Instead,weproposetosimplyinterpolateasparsesetof . matchesinadensemannertoinitiatetheopticalflowesti- 1 0 mation. We then use this estimate to initialize a one-level 5 1.Introduction energy minimization, and obtain the final optical flow es- 1 timation. This enables us to leverage recent advances in v: Accurate estimation of optical flow from real-world matching algorithms, which can now output quasi-dense i videos remains a challenging problem [10], despite the correspondence fields [6, 34]. In the same spirit as [22], X abundantliteratureonthetopic. Themainremainingchal- weperformasparse-to-denseinterpolationbyfittingalocal r lengesareocclusions,motiondiscontinuitiesandlargedis- affinemodelateachpixelbasedonnearbymatches. Ama- a placements,allpresentinreal-worldvideos. jor issue arises for the preservation of motion boundaries. Effectiveapproacheswerepreviouslyproposedforhan- Wemakethefollowingobservation: motionboundariesof- dling the case of small displacements (i.e., less than a few ten tend to appear at image edges, see Figure 1. Con- pixels) [19, 35, 24]. These approaches cast the optical sequently, we propose to exchange the Euclidean distance flow problem into an energy minimization framework, of- with a better, i.e., edge-aware, distance and show that this tensolvedusingefficientcoarse-to-finealgorithms[8,28]. offersanaturalwaytohandlemotiondiscontinuities.More- However, due to the complexity of the minimization, such over,weshowhowanapproximationoftheedge-awaredis- methods get stuck in local minima and may fail to esti- tanceallowstofitonlyoneaffinemodelperinputmatch(in- mate large displacements, which often occur due to fast steadofoneperpixel). Thisleadstoanimportantspeed-up motion. This problem has recently received significant at- oftheinterpolationschemewithoutlossinperformance. tention. State-of-the-art approaches [9, 38] use descriptor The obtained interpolated field of correspondences is ∗LEARteam, InriaGrenoble Rhone-Alpes, Laboratoire JeanKuntz- sufficiently accurate to be used as initialization of a mann,CNRS,Univ.GrenobleAlpes,France. one-level energy minimization. Our work suggests that wasrecentlyinvestigatedinseveralpapers. Apenalization ofthedifferencebetweenflowandHOGmatcheswasadded totheenergybyBroxandMalik[9].Weinzaepfeletal.[34] replaced the HOG matches by an approach based on simi- CCooaarrsseesstt lleevveell FFllooww eessttiimmaattee aatt ccooaarrsseesstt lleevveell laritiesofnon-rigidpatches: DeepMatching. Xuetal.[38] mergedtheestimatedflowwithmatchingcandidatesateach levelofthecoarse-to-finescheme.Braux-Zinetal.[7]used segment features in addition to keypoints. However, these OOrriiggiinnaall ffrraammee FFllooww eessttiimmaattee aafftteerr ccooaarrssee--ttoo--ffiinnee methods rely on a coarse-to-fine scheme, that suffers from intrinsicflaws.Namely,detailsarelostatcoarsescales,and thin objects with substantially different motions cannot be detected. Those errors correspond to local minima, hence GGrroouunndd--ttrruutthh ffllooww EEppiiccFFllooww theycannotberecoveredandarepropagatedacrosslevels, Figure 2. Comparison of coarse-to-fine flow estimation and seeFigure2. EpicFlow. Errorsatthecoarsestlevelofestimation,duetoalow Incontrast,ourapproachisconceptuallyclosertorecent resolution,oftengetpropagatedtothefinestlevel(right,topand work that rely mainly on descriptor matching [36, 23, 12, middle). In contrast, our interpolation scheme benefits from an 22, 37, 27, 5]. Lu et al. [23] propose a variant of Patch- edgeprioratthefinestlevel(right,bottom). Match[6],whichusesSLICsuperpixels[1]asbasicblocks in order to better respect image boundaries. The purpose there may be better initialization strategies than the well- istoproduceanearest-neighbor-field(NNF)whichislater establishedcoarse-to-finescheme,seeFigure2. Inparticu- translatedintoaflow. However,SLICsuperpixelsareonly lar,ourapproach,EpicFlow(edge-preservinginterpolation locallyawareofimageedges,whereasouredge-awaredis- ofcorrespondences)performsbestonthechallengingMPI- tance is able to capture regions at the image scale. Simi- Sinteldataset[10]andiscompetitiveonKitti[16]andMid- larly, Chen et al. [12] propose to compute an approximate dlebury[4]. AnoverviewofEpicFlowisgiveninFigure3. NNF, and then estimate the dominant motion patterns us- Tosummarize,wemakethreemaincontributions: ing RANSAC. They, then, use a multi-label graph-cut to •WeproposeEpicFlow,anovelsparse-to-denseinterpola- solvetheassignmentofeachpixeltoamotionpatterncan- tion scheme of matches based on an edge-aware distance. didate. Theirmulti-labeloptimizationcanbeinterpretedas Weshowthatitisrobusttomotionboundaries, occlusions amotionsegmentationproblemorasalayeredmodel[29]. andlargedisplacements. Theseproblemsarehardandasmallerrorintheassignment •Weproposeanapproximationschemefortheedge-aware canleadtolargeerrorsintheresultingflow. distance, leading to a significant speed-up without loss of In the same spirit as our approach, Ren [26] proposes accuracy. to use edge-based affinities to group pixels and estimate •Weshowempiricallythattheproposedopticalflowesti- a piece-wise affine flow. Nevertheless, this work relies mationschemeismoreaccuratethanestimationsbasedon on a discretization of the optical flow constraint, which coarse-to-fineminimization. is valid only for small displacements. Closely related to This paper is organized as follows. In Section 2, we EpicFlow, Leordeanu et al. [22] also investigate sparse- review related work on large displacement optical flow. to-dense interpolation. Their initial matching is obtained We then present the sparse-to-dense interpolation in Sec- through the costly minimization of a global non-convex tion 3 and the energy minimization for optical flow com- matching energy. In contrast, we directly use state-of-the- putation in Section 4. Finally, Section 5 presents experi- art matches [34, 18] as input. Furthermore, during their mental results. Source code is available online at http: sparse-to-denseinterpolation,theycomputeanaffinetrans- //lear.inrialpes.fr/software. formationindependentlyforeachpixelbasedonitsneigh- borhoodmatches, whicharefoundinaEuclideanballand 2.RelatedWork weighted by an estimation of occluded areas that involves learningabinaryclassifier.Incontrast,weproposetousean Mostopticalflowapproachesarebasedonavariational edge-preservingdistancethatnaturallyhandlesocclusions, formulationandarelatedenergyminimizationproblem[19, andcanbeveryefficientlycomputed. 4, 28]. The minimization is carried out using a coarse-to- fine scheme [8]. While such schemes are attractive from 3.Sparse-to-denseinterpolation acomputationalpointofview,theminimizationoftengets stuckinlocalminimaandleadstoerroraccumulationacross The proposed approach, EpicFlow, consists of three scales,especiallyinthecaseoflargedisplacements[2,9]. steps,asillustratedinFigure3. First,wecomputeasparse Totacklethisissue, theadditionofdescriptor/matching setofmatchesbetweenthetwoimages,usingastate-of-the- 2 Contour Dense Interpolation First Image Energy Minimization Second Image Matching Figure3.OverviewofEpicFlow.Giventwoimages,wecomputematchesusingDeepMatching[34]andtheedgesofthefirstimageusing SED[15].Wecombinethesetwocuestodenselyinterpolatematchesandobtainadensecorrespondencefield.Thisisusedasinitialization ofaone-levelenergyminimizationframework. artmatchingalgorithm. Second,weperformadensification top: of this set of matches, by computing a sparse-to-dense in- (cid:80) k (p ,p)p(cid:48) terpolationfromthesparsesetofmatches,whichyieldsan D m m initial estimate of the optical flow. Third, we compute the FNW(p)= (pm,p(cid:48)m(cid:80))∈M k (p ,p) , (1) D m finalopticalflowestimationbyperformingonestepofvari- (pm,p(cid:48)m)∈M ational energy minimization using the dense interpolation asinitialization,seeSection4. wherekD(pm,p)=exp(−aD(pm,p))isaGaussianker- nelforadistanceDwithaparametera. 3.1.Sparsesetofmatches •Locally-weightedaffine(LA)estimation[17]. Thesec- ond estimator is based on fitting a local affine transforma- The first step of our approach extracts a sparse set of tion. The correspondence field F (p) is interpolated us- matches, see Figure 3. Any state-of-the-art matching al- LA ing a locally-weighted affine estimator at a pixel p ∈ I as gorithm can be used to compute the initial set of sparse F (p) = A p+t(cid:62) , where A and t are the parame- matches. In our experiments, we compare the results LA p p p p tersofanaffinetransformationestimatedforpixelp. These whenusingDeepMatching[34]orasubsetofanestimated parametersarecomputedastheleast-squaresolutionofan nearest-neighbor field [18]. We defer to Section 5.1 for a overdetermined system obtained by writing two equations description of these matching algorithms. In both cases, we obtain ∼ 5000 matches for an image of resolution foreachmatch(pm,p(cid:48)m)∈Mweightedaspreviously: 1024 × 436, i.e., an average of around one match per 90 k (p ,p)(cid:0)A p +t(cid:62)−p(cid:48) (cid:1)=0 . (2) D m p m p m pixels. Wealsoevaluatetheimpactofmatchingqualityand densityontheperformanceofEpicFlowbygeneratingarti- Local interpolation. Note that the influence of remote ficial matches from the ground-truth in Section 5.3. In the matches is either negligible, or could harm the interpola- following,wedenotebyM={(p ,p(cid:48) )}thesparsesetof tion,forexamplewhenobjectsmovedifferently.Therefore, m m inputmatches,whereeachmatch(p ,p(cid:48) )definesacorre- werestrictthesetofmatchesusedintheinterpolationata m m spondencebetweenapixelp inthefirstimageandanda pixelptoitsK nearestneighborsaccordingtothedistance m pixelp(cid:48) inthesecondimage. D,whichwedenoteasN (p). Inotherwords,wereplace m K thesummationoverMintheNWoperatorbyasummation 3.2.Interpolationmethod overN (p), andlikewiseforbuildingtheoverdetermined K We estimate a dense correspondence field F : I → I(cid:48) systemtofittheaffinetransformationforFLA. between a source image I and a target image I(cid:48) by inter- 3.3.Edge-preservingdistance polatingasparsesetofinputsmatchesM = {(p ,p(cid:48) )}. m m TheinterpolationrequiresadistanceD : I ×I → R+ be- Using the Euclidean distance for the interpolation pre- tweenpixels,seeSection3.3.Weconsiderheretwooptions sentedaboveispossible. However,inthiscasetheinterpo- fortheinterpolation. lationissimplybasedonthepositionoftheinputmatches • Nadaraya-Watson (NW) estimation [31]. The cor- anddoesnotrespectmotionboundaries. Supposeforamo- respondence field F (p) is interpolated using the mentthatthemotionboundariesareknown. Wecan,then, NW Nadaraya-Watson estimator at a pixel p ∈ I and is ex- useageodesicdistanceD basedonthesemotionbound- G pressed by a sum of matches weighted by their proximity aries. Formally, the geodesic distance between two pixels 3 (a) (c) (e) (g) (b) (d) (f) (h) Figure4. (a-b)twoconsecutiveframes;(c)contourresponseC fromSED[15](thedarker,thehigher);(d)matchpositions{p }from m DeepMatching[34]; (e-f)geodesicdistancefromapixelp(markedinblue)toallothersD (p,.)(thebrighter, thecloser). (g-h)100 G nearestmatches,i.e.,N (p)(red)usinggeodesicdistanceD fromthepixelpinblue. 100 G pandq isdefinedastheshortestdistancewithrespecttoa costmapC: 2 (cid:90) 3 D (p,q)= inf C(p )dp , (3) 1 G s s 4 Γ∈Pp,q Γ (a) (b) (c) (d) whereP denotesthesetofallpossiblepathsbetweenp Figure5.Fortheregionshownin(a),(b)showstheimageedges p,q C andwhitecrossesrepresentingthematchpositions{p }. (c) andq,andC(p )thecostofcrossingpixelp (theviscos- m s s displaystheassignmentL,i.e.,geodesicVoronoicells.Webuilda ityinphysics). Inoursettings,C correspondstothemotion graphGfromL(seetext).(d)showstheshortestpathbetweentwo boundaries. Hence, a pixel belonging to a motion layer is neighbor matches, which can go through the edge that connects close to all other pixels from the same layer according to them(3-4)orashorterpathfoundbyDijkstra’salgorithm(1-2). D ,butfarfromeverythingbeyondtheboundaries. Since G eachpixelisinterpolatedbasedonitsneighbors, theinter- requiredbyourinterpolationscheme)ishigh. Wenowpro- polationwillrespectthemotionboundaries. poseanefficientapproximationD˜ . Inpractice, weuseanalternativetotruemotionbound- G Akeyobservationisthatneighboringpixelsareoftenin- aries,makingtheplausibleassumptionthatimageedgesare terpolatedsimilarly,suggestingastrategythatwouldlever- asupersetofmotionboundaries. Thisway,thedistancebe- age such local information. In this section we employ the tweenpixelsbelongingtothesameregionwillbelow.Iten- term‘match’torefertop insteadof(p ,p(cid:48) ). sures a proper edge-respecting interpolation as long as the m m m Geodesic Voronoi diagram. We first define a clustering number of matches in each region is sufficient. Similarly, L, such that L(p) assigns a pixel p to its closest match Criminisi et al. [13] showed that geodesic distances are a according to the geodesic distance, i.e., we have L(p) = natural tool for edge-preserving image editing operations argmin D (p,p ). L defines geodesic Voronoi cells, (denoising,textureflattening,etc.) anditwasalsousedre- pm G m asshowninFigure5(c). centlytogenerateobjectproposals[21]. Inpractice,weset Approximated geodesic distance. We then approximate thecostmapCusingarecentstate-of-the-artedgedetector, namely the “structured edge detector” (SED) [15]1. Fig- the distance between a pixel p and any match pm as the distancetotheclosestmatchL(p)plusanapproximatedis- ure4showsanexampleofaSEDmap,aswellasexamples tancebetweenmatches: of geodesic distances and neighbor sets N (p) for differ- K entpixelsp. Noticehowneighborsarefoundonthesame D˜ (p,p )=D (p,L(p))+DG(L(p),p ) (4) G m G G m objects/partsoftheimagewithD ,incontrasttoEuclidean G distance(seealsoFigure6). whereDG isagraph-basedapproximationofthegeodesic G distance between two matches. To define DG we use G 3.4.Fastapproximation a neighborhood graph G whose nodes are {p }. Two m matches p and p are connected by an edge if they are The geodesic distance can be rapidly computed from a m n neighbors in L. The edge weight is then defined as the pointtoallotherpixels. Forinstance,Weberetal.[32]pro- geodesicdistancebetweenp andp ,wherethegeodesic pose parallel algorithms that simulate an advancing wave- m n distancecalculationisrestrictedtotheVoronoicellsofp front. Nevertheless, the computational cost for computing m andp . We, then, calculatetheapproximategeodesicdis- thegeodesicdistancebetweenallpixelsandallmatches(as n tancebetweenanytwomatchesp ,p usingDijkstra’sal- m n 1https://github.com/pdollar/edges gorithmonG,seeFigure5(d). 4 Piecewisefield. Sofar,wehavebuiltanapproximationof estimation that accurately minimizes the full-scale energy. thedistancebetweenpixelsandmatchpoints.Wenowshow Thus, the coarse-to-fine scheme should be considered as a that our interpolation model results in a piece-wise corre- heuristictoprovideaninitializationforthefull-scaleflow. spondence field (either constant for the Nadaraya-Watson Our approach can be thought of as an alternative to the estimator, or piece-wise affine for LA). This property is above strategy, by offering a smart heuristic to accurately crucial to obtain a fast interpolation scheme, and experi- initialize the optical flow before performing energy mini- ments shows that it does not impact the accuracy. Let us mization at the full-scale. This offers several advantages considerapixelpsuchthatL(p) = p . Thedistancebe- over the coarse-to-fine scheme. First, the cost map C in m tweenpandanymatchp isthesameastheonebetween our method acts as a prior on boundary location. Such n p and p up to a constant independent from p (Equa- a prior could also be incorporated by a local smoothness m n n tion 4). As a consequence, we have N (p) = N (p ) weight in the coarse-to-fine minimization, but would then K K m and k (p,p ) = k (p,p )×k (p ,p ). For the be difficult to interpret at coarse scales where boundaries NadarDa˜yGa-WatnsonestiDmGator,wmethusoDbGGtainm: n might strongly overlap. In addition, since our method di- rectlyworksatthefullimageresolution,itavoidspossible FNW(p)= (cid:80)(cid:80)(p(npn,p,(cid:48)np(cid:48)n)k)Dk˜DG˜G(p(,pp,npn)p)(cid:48)n (5) iosvseuressmroeolathteeddtaottchoearpsreessecnalcees.oSfuthcihneorrbojersctasttchoaatrcseouslcdalbees = kDkDGG(p(,pp,mpm)(cid:80))(cid:80)(p(npn,p,(cid:48)np(cid:48)n)k)DkDGGGG(p(mpm,p,npn)p)(cid:48)n =FNW(pm) aprreocpereodpsa,gsaeteedFtiogufirnee2r.scalesasthecoarse-to-fineapproach Variational Energy Minimization. We minimize an en- whereallthesumsarefor(p ,p(cid:48))∈N (p)=N (p ). n n K K m ergydefinedasasumofadatatermandasmoothnessterm. The same reasoning holds for the weighted affine interpo- We use the same data term as [40], based on a classical lator, which is invariant to a multiplication of the weights color-constancy and gradient-constancy assumption with a byaconstantfactor. Asaconsequence, itsufficestocom- normalizationfactor. Forthesmoothnessterm,wepenalize pute |M| estimations (one per match) and to propagate it the flow gradient norm, with a local smoothness weight α to the pixel assigned to this match. This is orders of mag- (cid:0) (cid:1) as in [33, 38]: α(x) = exp −κ(cid:107)∇ I(x)(cid:107) with κ = 5. 2 nitudefasterthananindependentestimationforeachpixel, WehavealsoexperimentedusingSEDinsteadandobtained e.g.asdonein[22]. WesummarizetheapproachinAlgo- similarperformance. rithm 1 for Nadaraya-Watson estimator. The algorithm is Forminimization,weinitializethesolutionwiththeout- similar for LA interpolator (e.g. line 6 becomes “Estimate put of our sparse-to-dense interpolation and use the ap- affine parameters A ,t ” and line 8 “Set W (p) = pm pm LA proachof[8]withoutthecoarse-to-finescheme. Morepre- A p+t(cid:62) ”). L(p) L(p) cisely, we perform 5 fixed point iterations, i.e., compute the non-linear weights (that appear when applying Euler- Algorithm1InterpolationwithNadaraya-Watson Lagrange equations [8]) and the flow updates 5 times it- Input:apairofimagesI,I(cid:48),asetMofmatches eratively. Theflowupdatesarecomputedbysolvinglinear Output:densecorrespondencefieldFNW systemsusing30iterationsofthesuccessiveoverrelaxation 1 ComputethecostCforIusingSED[15] method[39]. 2 ComputetheassignmentmapL 3 BuildthegraphGfromL 5.Experiments 4 For(p ,p(cid:48) )∈M m m 5 ComputeNK(pm)fromGusingDijkstra’salgorithm In this section, we evaluate EpicFlow on three state-of- 6 ComputeFNW(pm)fromNK(pm)usingEq.1 the-artdatasets: 7 Foreachpixelp •MPI-Sinteldataset[10]isachallengingevaluationbench- 8 SetF (p)=F (L(p)) NW NW markobtainedfromananimatedmovie.Itcontainsmultiple sequences including large/rapid motions. We only use the ‘final’ version that features realistic rendering effects such 4.OpticalFlowEstimation asmotion,defocusblurandatmosphericeffects. Coarse-to-finevs. EpicFlow. Theoutputofthesparse-to- •TheKittidataset[16]containsphotosshotincitystreets dense interpolation is a dense correspondence field. This from a driving platform. It features large displacements, field is used as initialization of a variational energy min- different materials(complex 3D objects liketrees), a large imization method. In contrast to our approach, state-of- varietyoflightingconditionsandnon-lambertiansurfaces. the-art methods usually rely on a coarse-to-fine scheme to •TheMiddleburydataset[4]hasbeenextensivelyusedfor computethefull-scalecorrespondencefield. Tothebestof evaluating optical flow methods. It contains complex mo- ourknowledge,thereexistsnotheoreticalprooforguaran- tions,butdisplacementsarelimitedtoafewpixels. tee that a coarse-to-fine minimization leads to a consistent Asin[34],weoptimizetheparametersonasubset(20%) 5 oftheMPI-Sinteltrainingset. Wethenreportaverageend- pointerror(AEE)ontheremainingMPI-Sinteltrainingset (80%), the Kitti training set and the Middlebury training set. Thisallowsustoevaluatetheimpactofparameterson differentdatasetsandavoidoverfitting. Theparametersare Figure6.Left:Matchpositionsreturnedby[34]areshowninblue. typically a (cid:39) 1 for the coefficient in the kernel k , the D Reddenotesoccludedareas. Right: Yellow(resp. blue)squares numberofneighborsisK (cid:39) 25forNWinterpolationand correspond to the 100 nearest matches with a Euclidean (resp. K (cid:39) 100 when using LA. In Section 5.4, we compare to edge-awaregeodesic)distancefortheoccludedpixelshowninred. thestateoftheartonthetestsets. Inthiscase,theparam- AllneighbormatcheswithaEuclideandistanceholdstoadiffer- etersareoptimizedonthetrainingsetofthecorresponding entobjectwhilethegeodesicdistanceallowstocapturematches dataset. TimingisreportedforoneCPU-coreat3.6GHz. fromthesameregion,eveninthecaseoflargeoccludedareas. In the following, we first describe two types of input matchesinSection5.1. Section5.2thenstudiesthediffer- Matching Interpolator MPI-Sintel Kitti Middlebury entparametersofourapproach.InSection5.3,wecompare on KPM NW 6.052 15.679 0.765 our method to a variational approach with a coarse-to-fine polati KDPMM NLWA 64..313443 152..406101 00..787968 scheme. Finally, we show that EpicFlow outperforms cur- nter DM LA 4.068 3.560 0.840 I rentmethodsonchallengingdatasetsinSection5.4. KPM NW 5.741 15.240 0.388 w o KPM LA 5.764 11.307 0.315 Fl 5.1.Inputmatches pic DM NW 3.804 4.900 0.485 E DM LA 3.686 3.334 0.380 Togenerateinputmatches,weuseandcomparetwore- Table1.Comparisonofaverageendpointerror(AEE)fordifferent cent matching algorithms. They each produce about 5000 sparsematches(DM,KPM)andinterpolators(NW,LA)aswell asforsparse-to-denseinterpolation(top)andEpicFlow(bottom). matchesperimage. TheapproximatedgeodesicdistanceD˜ isused. • The first one is DeepMatching (DM), used in Deep- G Flow[34],whichhasshownexcellentperformanceforop- ticalflow. Itbuildscorrespondencesbycomputingsimilar- Wealsoexperimentwithsyntheticsparsematchesofvar- itiesofnon-rigidpatches,allowingforsomedeformations. ious densities and noise levels in Section 5.3, in order to We use the online code2 on images downscaled by a fac- evaluate the sensitivity of EpicFlow to the quality of the tor 2. A reciprocal verification is included in DM. As a matchingapproach. consequence,themajorityofmatchesinoccludedareasare pruned,seematchesinFigure6(left). 5.2.Impactofthedifferentparameters •ThesecondoneisarecentvariantofPatchMatch[6]that In this section, we evaluate the impact of the matches reliesonkd-treesandlocalpropagationtocomputeadense and the interpolator. We also compare the quality of the correspondence field [18] (KPM). We use the online code sparse-to-dense interpolation and EpicFlow. Furthermore, to extract the dense correspondence field3. It is noisy, as weexamimetheimpactofthegeodesicdistanceanditsap- it is based on small patches without global regularization, proximationaswellastheimpactofthequalityofthecon- as well as often incorrect in case of occlusion. Thus, we tourdetector. perform a two-way matching and eliminate non-reciprocal Matchesandinterpolators. Table1comparestheresultof matchestoremoveincorrectcorrespondences.Wealsosub- our sparse-to-dense interpolation, i.e., before energy min- sampletheseprunedcorrespondencestospeed-uptheinter- imization, and EpicFlow for different matches (DM and polation. Wehaveexperimentallyverifiedonseveralimage KPM) and for the two interpolation schemes: Nadaraya- pairsthatthissubsamplingdoesnotresultinalossofper- Watson (NW) and locally-weighted affine (LA). The ap- formance. proximated geodesic distance is used in the interpolation, Pruningofmatches. Inbothcases, matchesareextracted seeSection3.4. locally and might be incorrect in regions with low texture. We can observe that KPM is consistently outperformed Thus, we remove matches corresponding to patches with by DeepMatching (DM) on MPI-Sintel and Kitti datasets, low saliency, which are determined by the eigenvalues of with a gap of 2 and 8 pixels respectively. Kitti contains autocorrelationmatrix. Furthermore,weperformaconsis- manyrepetitivetexturesliketreesorroads,whichareoften tencychecktoremoveoutliers. Werunthesparse-to-dense mismatched by KPM. Note that DM is significantly more interpolationoncewiththeNadaraya-Watsonestimatorand robust to repetitive textures than KPM, as it uses a multi- removematchesforwhichthedifferencetotheinitialesti- scalescoringscheme. TheresultsonMiddleburyarecom- mateisover5pixels. parableandbelow1pixel. 2http://lear.inrialpes.fr/src/deepmatching/ We also observe that LA performs better than NW on 3http://j0sh.github.io/thesis/kdtree/ Kitti, while the results are comparable on MPI-Sintel and 6 Contour Distance MPI-Sintel Kitti Middlebury Time tance, but weights k (p ,p) are set according to the ap- SED[15] Geodesic(approx.) 3.686 3.334 0.380 16.4s D˜ m SED[15] Geodesic(exact) 3.677 3.216 0.393 204s proximategeodesicdistance. Table2showsthatthisleads - Euclidean 4.617 3.663 0.442 40s to a drop of performance by around 0.3 pixels for MPI- SED[15] mixed 3.975 3.510 0.399 300s gPb[3] Geodesic(approx.) 4.161 3.437 0.430 26s SintelandKitti. Figure6illustratesthereason: noneofthe Canny[11] Geodesic(approx.) 4.551 3.308 0.488 16.4s Euclidean neighbor matches (yellow) belong to the region GT(cid:107)b∇ou2nId(cid:107)a2ries GGeeooddeessiicc((aapppprrooxx..)) 43..056818 3.399 0.388 16.4s correspondingtotheselectedpixel(red),butallofgeodesic Table2.ComparisonoftheAEEofEpicFlow(withDMandLA) neighbormatches(blue)belongtoit. Thisdemonstratesthe fordifferentdistancesanddifferentcontourextractors. Thetime importance of using an edge-preserving geodesic distance (rightcolumn)isreportedforaMPI-Sintelimagepair. throughout the whole pipeline, in contrast to [22] who in- terpolatesmatchesfoundinaEuclideanneigbhorhood. Impactofcontourdetector.Wealsoevaluatetheimpactof Middlebury. This is due to the specificity of the Kitti thecontourdetectorinTable2, i.e., theSEDdetector[15] dataset, where the scene consists of planar surfaces and, is replaced by the Berkeley gPb detector [3] or the Canny thus, affine transformations are more suitable than transla- edgedetector[11]. UsinggPbleadstoasmalldropinper- tions to approximate the flow. Based on these results, we formance(around0.1pixelonKittiand0.5onMPI-Sintel) use DM matches and LA interpolation in the remainder of and significantly increases the computation time. Canny theexperimentalsection. edges perform similar to the Euclidean distance. This can The interpolation is robust to the neighborhood size K be explained by the insufficient quality of the Canny con- withforinstanceanAEEof4.082,4.053,4.068and4.076 tours.Usingthenormofimage’sgradientimprovesslightly for K = 50,100,160 (optimal value on the training set), overgPb. Wefoundthatthisisduetothepresenceofholes 200respectively,onMPI-SintelwiththeLAestimatorand whenestimatingcontourswithgPb.Finally,weperformex- before variational minimization. We also implemented a perimentsusingground-truthmotionboundaries,computed variant where we use all matches closer than a threshold fromthenormofground-truthflowgradient,andobtainan andobtainedsimilarperformance. improvement of 0.1 on MPI-Sintel (0.2 before the varia- Sparse-to-denseinterpolationversusEpicFlow. Wealso tionalpart). Theground-truthflowisnotdenseenoughon evaluatethegainduetothevariationalminimizationusing MiddleburyandKittidatasetstoestimateGTboundaries. the interpolation as initialization. We can see in Table 1 thatthisstepclearlyimprovestheperformanceinallcases. 5.3.EpicFlowversuscoarse-to-finescheme Theimprovementisaround0.5pixel. Figure7presentsre- sults for three image pairs with the initialization only and To show the benefit of our approach, we have carried thefinalresultofEpicFlow(rowthreeandfour). Whilethe out a comparison with a coarse-to-fine scheme. Our im- flow images look similar overall, the minimization allows plementation of the variational approach is the same as in tofurthersmoothandrefinetheflow,explainingthegainin Section4,withacoarse-to-fineschemeandDeepMatching performance. Yet,itpreservesdiscontinuitiesandsmallde- integratedintheenergythroughapenalizationofthediffer- tails,suchasthelegsintherightcolumn. Inthefollowing, ence between flow and matches[9, 34]. Table 3 compares results are reported for EpicFlow, i.e., after the variational EpicFlow to the variational approach with coarse-to-fine minimizationstep. scheme,usingexactlythesamematchesasinput.EpicFlow Edge-aware versus Euclidean distances. We now study performs better and is also faster. The gain is around 0.4 theimpactofdifferentdistances. First,weexaminetheef- pixel on MPI-Sintel and over 1 pixel on Kitti. The impor- fect of approximating the geodesic distance (Section 3.4). tant gain on Kitti might be explained by the affine model Table 2 shows that our approximation has a negligible im- usedforinterpolation,whichfitswellthepiecewiseplanar pact when compared to the exact geodesic distance. Note structure of the scene. On Middlebury, the variational ap- thattheexactversionperformsdistancecomputationaswell proach achieves slightly better results, as this dataset does aslocalestimationperpixelandis, thus, anorderofmag- notcontainlargedisplacements. nitudeslowertocompute,seelastcolumnofTable2. Figure 7 shows a comparison to three state-of-the-art Next, we compare the geodesic distance and Euclidean methods,allbuiltuponacoarse-to-finescheme. Notehow distances. Table 2 shows that using a Euclidean distance motionboundariesarepreservedbyEpicFlow. Evensmall leadstoasignificantdropinperformance,inparticularfor details, like the limbs in the right column, are captured. the MPI-Sintel dataset, the drop is 1 pixel. This confirms Similarly, inthecaseofoccludedareas, EpicFlowbenefits the importance of our edge-preserving distance. Note that fromthegeodesicdistancetoproduceacorrectestimation, the result with the Euclidean distance is reported with an seetherightpartoftheleftexample. exactversion,i.e.,theinterpolationiscomputedpixelwise. Sensitivitytothematchingquality. Inordertogetabet- We also compare to a mixed approach, in which the ter understanding of why EpicFlow performs better than neighbor list N is constructed using the Euclidean dis- a coarse-to-fine scheme, we have evaluated and compared K 7 s e g a m I h ut Tr d- n u o Gr n o ati ol p r e nt I w o Fl c pi E 4] 3 [ w o Fl p e e D 8] 3 [ 2 w o Fl P D M 9] [ F O D L Figure7.Eachcolumnshowsfromtoptobottom:meanoftwoconsecutiveimages,ground-truthflow,resultofsparse-to-denseinterpola- tion(Interpolation),fullmethod(EpicFlow),and3state-of-the-artmethods.EpicFlowbetterrespectsmotionboundaries,isabletocapture smallpartslikethelimbsofthecharacter(rightcolumn)andsuccessfullyestimatestheflowinoccludedareas(rightpartofleftcolumn). Flowmethod MPI-Sintel Kitti Middlebury Time 10 DM+coarse-to-fine 4.095 4.422 0.321 25s AEE for EpicFlow AEE for coarse-to-fine 9 TabDleM3.+CEopmicFpalorwisonofA3E.E68f6orEpi3c.F3l3o4w(with0.3D8M0 +LA1)6a.n4sda density 2222−−−−4567 DM KPM density 2222−−−−4567 DM KPM 678 coarse-to-finescheme(withDM). Matching 22222−−−−−11189012 Matching 22222−−−−−11189012 2345 tohfetihrepienrpfourtmmaantccehsesf.orToditfhfearteanitmd,ewnseitgieesnearnadtedersryonrthraetteics 0.00.010.020.040.060.080.10.150.20.30.40.5 0.00.010.020.040.060.080.10.150.20.30.40.5 01 Matching noise Matching noise matchesbytakingtheground-truthflow,removingpointsin Figure 8. Comparison of AEE between EpicFlow (left) and a theoccludedareas, subsamplingtoobtainthedesiredden- coarse-to-finescheme(right)forvarioussyntheticinputmatches sity and corrupting the matches to the desired percentage withdifferentdensitiesanderrorlevels. Forpositionsabovethe ofincorrectmatches. Foreachsetofmatcheswithagiven redline,EpicFlowperformsbetter. density and quality, we have carefully determined the pa- rametersofEpicFlowandthecoarse-to-finemethodonthe 8 Method AEE AEE-occ s0-10 s10-40 s40+ Time Method AEE-noc AEE Out-Noc3 Out-All3 Time EpicFlow 6.285 32.564 1.135 3.727 38.021 16.4s EpicFlow 1.5 3.8 7.88% 17.08% 16s TF+OFM[20] 6.727 33.929 1.512 3.765 39.761 ∼400s NLTGV-SC[25] 1.6 3.8 5.93% 11.96% 16s(GPU) DeepFlow[34] 7.212 38.781 1.284 4.107 44.118 19s BTF-ILLUM[14] 1.5 2.8 6.52% 11.03% 80s S2D-Matching[22] 7.872 40.093 1.172 4.695 48.782 ∼2000s TGV2ADCSIFT[7] 1.5 4.5 6.20% 15.15% 12s(GPU) Classic+NLP[28] 8.291 40.925 1.208 5.090 51.162 ∼800s Data-Flow[30] 1.9 5.5 7.11% 14.57% 180s MDP-Flow2[38] 8.445 43.430 1.420 5.449 50.507 709s DeepFlow[34] 1.5 5.8 7.22% 17.79% 17s NLTGV-SC[25] 8.746 42.242 1.587 4.780 53.860 TF+OFM[20] 2.0 5.0 10.22% 18.46% 350s LDOF[9] 9.116 42.344 1.485 4.839 57.296 30s Table 5. Results on Kitti test set. AEE-noc is the AEE over Table4.ResultsonMPI-Sinteltestset(finalversion). AEE-occ non-occluded areas. Out-Noc 3 (resp. Out-all 3) refers to the istheAEEonoccludedareas. s0-10istheAEEforpixelswhose percentage of pixels where flow estimation has an error above 3 motionsisbetween0and10pxandsimilarlyfors10-40ands40+. pixelsinnon-occludedareas(resp.allpixels). MPI-Sinteltrainingsubset,andthenevaluatedthemonthe remainingtrainingimages. ResultsintermofAEEaregiveninFigure8,whereden- s e sityisrepresentedverticallyastheratioof#matches/#non- ag m occludedpixelsandmatchingerrorisrepresentedhorizon- I tallyastheratioof#falsematches/#matches. Wecanob- serve that EpicFlow yields better results provided that the T G matching is sufficiently dense for a given error rate. For low-densityorstronglycorruptedmatches,EpicFlowyields unsatisfactoryperformance(Figure8left),whilethecoarse- w to-fine method remains relatively robust (Figure 8 right). o Fl Thisshowsthatourinterpolation-basedheuristicforinitial- pic E izing the flow takes better advantage of the input matches thanacoarse-to-fineschemesforsufficientlydensematches Figure 9. Failure cases of EpicFlow due to missing matches on spearandhornsofthedragon(leftcolumn)andmissingcontours andisabletorecoverfrommatchingfailures. Wehavein- onthearm(rightcolumn). dicatedthepositionofDeepMatchingandKPMintermsof densityandqualityontheplots: theylieinsidetheareain whichEpicFlowoutperformsacoarse-to-finescheme. dataset,therearenolargedisplacements,andconsequently, 5.4.Comparisonwiththestateoftheart thebenefitsofamatching-basedapproacharelimited.Note Results on MPI-Sintel test set are given in Table 4. that we have slightly increased the number of fixed point Parameters are optimized on the MPI-Sintel training set. iterations to 25 in the variational method for this dataset EpicFlow outperforms the state of the art with a gap of (stillusingonelevel)inordertogetanadditionalsmooth- 0.5 pixel in AEE compared to the second best perform- ing effect. This leads to a gain of 0.1 pixels (measured on ing method, TF+OFM [20], and 1 pixel compared to the theMiddleburytrainingsetwhensettingtheparameterson third one, DeepFlow [34]. In particular, we improve for MPI-Sinteltrainingset). both AEE on occluded areas and AEE over all pixels and Timings. While most methods often require several min- for all displacement ranges. In addition, our approach is utestorunonasingleimagepair,oursrunsin16.4seconds significantlyfasterthanmostofthemethods, e.g.anorder for a MPI-Sintel image pair (1024 × 436 pixels) on one ofmagnitudefasterthanthesecondbest. CPU-core at 3.6Ghz. In detail, computing DeepMatching Table5reportstheresultsontheKittitestsetformeth- takes15s, extractingSEDedges0.15s, denseinterpolation odsthatdonotuseepipolargeometryorstereovision. Pa- 0.25s,andvariationalminimization1s.Wecanobservethat rameters are optimized on the Kitti training set. We can 91%ofthetimeisspentonmatching. see that EpicFlow performs best in terms of AEE on non- occludedareas. Intermofpercentageoferroneouspixels, Failure cases. EpicFlow can be incorrect due to errors in ourmethodiscompetitivewiththeotheralgorithms. When thesparsematchesorerrorsinthecontourextraction. Fig- comparing the methods on both Kitti and MPI-Sintel, we ure 9 (left column) shows an example where matches are outperformTF+OFM[20]andDeepFlow[34](secondand missing on thin elements (spear and horns of the dragon). third on MPI-Sintel) on the Kitti dataset, in particular for Thus, the optical flow takes the value of the surrounding occluded areas. We perform on par with NLTGV-SC [25] region for these elements. An example for incorrect con- onKittithatweoutperformby2.5pixelsonMPI-Sintel. tourextractionispresentedinFigure9(rightcolumn). The OntheMiddleburytestset,weobtainanAEEbelow0.4 contourofthecharacter’sleftarmispoorlydetected. Asa pixel. This is competitive with the state of the art. In this result,themotionofthearmspreadsintothebackground. 9 6.Conclusion [11] J.Canny.Acomputationalapproachtoedgedetection.IEEE Trans.PAMI,1986. 7 This paper introduces EpicFlow, a novel state-of-the- [12] Z.Chen,H.Jin,Z.Lin,S.Cohen,andY.Wu.Largedisplace- art optical flow estimation method. EpicFlow computes ment optical flow from nearest neighbor fields. In CVPR, a dense correspondence field by performing a sparse-to- 2013. 2 dense interpolation from an initial sparse set of matches, [13] A. Criminisi, T. Sharp, C. Rother, and P. Pe´rez. Geodesic leveragingcontourcuesusinganedge-awaregeodesicdis- imageandvideoediting. ACMTrans.Graph.,2010. 4 tance. The approach builds upon the assumption that con- [14] O. Demetz, M. Stoll, S. Volz, J. Weickert, and A. Bruhn. toursoftencoincidewithmotiondiscontinuities.Theresult- Learningbrightnesstransferfunctionsforthejointrecovery ing dense correspondence field is fed as an initial optical ofilluminationchangesandopticalflow. InECCV.2014. 9 flow estimate to a one-level variational energy minimiza- [15] P.Dolla´randC.L.Zitnick. Structuredforestsforfastedge tion. ExperimentalresultsshowthatEpicFlowoutperforms detection. InICCV,2013. 1,3,4,5,7 current coarse-to-fine approaches. Both the sparse set of [16] A.Geiger,P.Lenz,C.Stiller,andR.Urtasun. Visionmeets matchesandthecontourestimatesarekeytoourapproach. robotics:TheKITTIdataset. IJRR,2013. 2,5 Futureworkwillfocusonimprovingthesetwocomponents [17] R. Hartley and A. Zisserman. Multiple View Geometry in separatelyaswellasinaninterleavedmanner. ComputerVision. CambridgeUniversityPress,2003. 3 [18] K. He and J. Sun. Computing nearest-neighbor fields via Acknowledgments. propagation-assistedkd-trees. InCVPR,2012. 2,3,6 [19] B.K.P.HornandB.G.Schunck.DeterminingOpticalFlow. This work was supported by the European integrated ArtificialIntelligence,1981. 1,2 project AXES, the MSR-Inria joint centre, the LabEx [20] R.KennedyandC.J.Taylor.Opticalflowwithgeometricoc- Persyval-Lab (ANR-11-LABX-0025), the Moore-Sloan clusionestimationandfusionofmultipleframes. InEMM- DataScienceEnvironmentatNYU,andtheERCadvanced CVPR,2015. 9 grantALLEGRO. [21] P.Kra¨henbu¨hlandV.Koltun. Geodesicobjectproposals. In ECCV.2014. 4 References [22] M. Leordeanu, A. Zanfir, and C. Sminchisescu. Locally [1] R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and affinesparse-to-densematchingformotionandocclusiones- S. Susstrunk. Slic superpixels compared to state-of-the-art timation. InICCV,2013. 1,2,5,7,9 superpixelmethods. IEEETrans.PAMI,2012. 2 [23] J.Lu,H.Yang,D.Min,andM.Do. Patchmatchfilter: Ef- [2] L.Alvarez,J.Weickert,andJ.Sa´nchez. Reliableestimation ficientedge-awarefilteringmeetsrandomizedsearchforfast ofdenseopticalflowfieldswithlargedisplacements. IJCV, correspondencefieldestimation. InCVPR,2013. 2 2000. 2 [24] N. Papenberg, A. Bruhn, T. Brox, S. Didas, and J. Weick- [3] P.Arbelaez, M.Maire, C.Fowlkes, andJ.Malik. Contour ert. Highly accurate optic flow computation with theoreti- detectionandhierarchicalimagesegmentation. IEEETrans. callyjustifiedwarping. IJCV,2006. 1 PAMI,2011. 7 [25] R.Ranftl,K.Bredies,andT.Pock. Non-localtotalgeneral- [4] S. Baker, D. Scharstein, J. P. Lewis, S. Roth, M. J. Black, ized variation for optical flow estimation. In ECCV, 2014. andR.Szeliski. Adatabaseandevaluationmethodologyfor 9 opticalflow. IJCV,2011. 2,5 [26] X.Ren. Localgroupingforopticalflow. InCVPR,2008. 2 [5] L. Bao, Q. Yang, and H. Jin. Fast edge-preserving patch- [27] F. Steinbrucker, T. Pock, and D. Cremers. Large displace- matchforlargedisplacementopticalflow. IEEETrans.Im- ment optical flow computation without warping. In ICCV, ageProcessing,2014. 2 2009. 2 [6] C. Barnes, E. Shechtman, D. B. Goldman, and A. Finkel- [28] D.Sun,S.Roth,andM.J.Black. Aquantitativeanalysisof stein. The generalized PatchMatch correspondence algo- currentpracticesinopticalflowestimationandtheprinciples rithm. InECCV,2010. 1,2,6 behindthem. IJCV,2014. 1,2,9 [7] J.Braux-Zin,R.Dupont,andA.Bartoli.Ageneraldenseim- [29] D. Sun, E. B. Sudderth, and M. J. Black. Layered image agematchingframeworkcombiningdirectandfeature-based motion with explicit occlusions, temporal consistency, and costs. InICCV,2013. 2,9 depthordering. InNIPS,2010. 2 [8] T.Brox,A.Bruhn,N.Papenberg,andJ.Weickert. Highac- curacyopticalflowestimationbasedonatheoryforwarping. [30] C.Vogel,S.Roth,andK.Schindler. Anevaluationofdata InECCV,2004. 1,2,5 costsforopticalflow. InGCPR,2013. 9 [9] T.BroxandJ.Malik. Largedisplacementopticalflow: de- [31] L.Wasserman. AllofStatistics:AConciseCourseinStatis- scriptor matching in variational motion estimation. IEEE ticalInference. Springer,2010. 3 Trans.PAMI,2011. 1,2,7,8,9 [32] O. Weber, Y. S. Devir, A. M. Bronstein, M. M. Bronstein, [10] D. J. Butler, J. Wulff, G. B. Stanley, and M. J. Black. A and R. Kimmel. Parallel algorithms for approximation of naturalistic open source movie for optical flow evaluation. distancemapsonparametricsurfaces. ACMTrans.Graph., InECCV,2012. 1,2,5 2008. 4 10