Lecture Notes in Computer Science 7082

Commenced Publication in 1973
Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison, Lancaster University, UK
Takeo Kanade, Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler, University of Surrey, Guildford, UK
Jon M. Kleinberg, Cornell University, Ithaca, NY, USA
Alfred Kobsa, University of California, Irvine, CA, USA
Friedemann Mattern, ETH Zurich, Switzerland
John C. Mitchell, Stanford University, CA, USA
Moni Naor, Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz, University of Bern, Switzerland
C. Pandu Rangan, Indian Institute of Technology, Madras, India
Bernhard Steffen, TU Dortmund University, Germany
Madhu Sudan, Microsoft Research, Cambridge, MA, USA
Demetri Terzopoulos, University of California, Los Angeles, CA, USA
Doug Tygar, University of California, Berkeley, CA, USA
Gerhard Weikum, Max Planck Institute for Informatics, Saarbruecken, Germany

Daniel Cremers, Marcus Magnor, Martin R. Oswald, Lihi Zelnik-Manor (Eds.)

Video Processing and Computational Video
International Seminar
Dagstuhl Castle, Germany, October 10-15, 2010
Revised Papers

Volume Editors
Daniel Cremers, Technische Universität München, Germany, E-mail: [email protected]
Marcus Magnor, Technische Universität Braunschweig, Germany, E-mail: [email protected]
Martin R. Oswald, Technische Universität München, Germany, E-mail: [email protected]
Lihi Zelnik-Manor, The Technion, Israel Institute of Technology, Haifa, Israel, E-mail: [email protected]

ISSN 0302-9743, e-ISSN 1611-3349
ISBN 978-3-642-24869-6, e-ISBN 978-3-642-24870-2
DOI 10.1007/978-3-642-24870-2
Library of Congress Control Number: 2011938798
CR Subject Classification (1998): I.4, I.2.10, I.5.4-5, F.2.2, I.3.5
LNCS Sublibrary: SL 6 - Image Processing, Computer Vision, Pattern Recognition, and Graphics

© Springer-Verlag Berlin Heidelberg 2011

Preface

With the swift development of video imaging technology and the drastic improvements in CPU speed and memory, both video processing and computational video are becoming more and more popular. Similar to the digital revolution in photography fifteen years ago, digital methods are today revolutionizing the way television and movies are made. With the advent of professional digital movie cameras, digital projector technology for movie theaters, and 3D movies, the movie and television production pipeline is turning all-digital, opening up numerous new opportunities for the way dynamic scenes are acquired, video footage is edited, and visual media are experienced.

This book provides a compilation of selected articles resulting from a workshop on "Video Processing and Computational Video", held at Dagstuhl Castle, Germany, in October 2010.
During this workshop, 43 researchers from all over the world discussed the state of the art, contemporary challenges, and future research in imaging, processing, analyzing, modeling, and rendering of real-world, dynamic scenes. The seminar was organized into 11 sessions of presentations, discussions, and special-topic meetings. It brought together junior and senior researchers from computer vision, computer graphics, and image communication, from both academia and industry, to address the challenges in computational video.

For five days, workshop participants discussed the impact of, as well as the opportunities arising from, digital video acquisition, processing, representation, and display. Over the course of the seminar, the participants addressed contemporary challenges in digital TV and movie production; pointed at new opportunities in an all-digital production pipeline; discussed novel ways to acquire, represent, and experience dynamic content; compiled a wish list for future video equipment; proposed new ways to interact with visual content; and debated possible future mass-market applications for computational video.

Viable research areas in computational video identified during the seminar included motion capture of faces, non-rigid surfaces, and entire performances; reconstruction and modeling of non-rigid objects; acquisition of scene illumination; time-of-flight cameras; motion field and segmentation estimation for video editing; as well as free-viewpoint navigation and video-based rendering. With regard to technological challenges, seminar participants agreed that the "rolling shutter" effect of CMOS-based video imagers currently poses a serious problem for existing computer vision algorithms. It is expected, however, that this problem will be overcome by future video imaging technology. Another item on the seminar participants' wish list for future camera hardware concerned high frame-rate acquisition to enable more robust motion field estimation or time-multiplexed acquisition. Finally, it was expected that plenoptic cameras will hit the commercial market within the next few years, allowing for advanced post-processing features such as variable depth of field, stereopsis, or motion parallax.

The papers presented in these post-workshop proceedings were carefully selected through a blind peer-review process with three independent reviewers for each paper.

We are grateful to the people at Dagstuhl Castle for supporting this seminar. We thank all participants for their talks and contributions to discussions and all authors who contributed to this book. Moreover, we thank all reviewers for their elaborate assessment and constructive criticism, which helped to further improve the quality of the presented articles.

August 2011
Daniel Cremers
Marcus Magnor
Martin R. Oswald
Lihi Zelnik-Manor

Table of Contents

Video Processing and Computational Video

Towards Plenoptic Raumzeit Reconstruction
Martin Eisemann, Felix Klose, and Marcus Magnor

Two Algorithms for Motion Estimation from Alternate Exposure Images
Anita Sellent, Martin Eisemann, and Marcus Magnor

Understanding What We Cannot See: Automatic Analysis of 4D Digital In-Line Holographic Microscopy Data
Laura Leal-Taixé, Matthias Heydt, Axel Rosenhahn, and Bodo Rosenhahn

3D Reconstruction and Video-Based Rendering of Casually Captured Videos
Aparna Taneja, Luca Ballan, Jens Puwein, Gabriel J. Brostow, and Marc Pollefeys
Silhouette-Based Variational Methods for Single View Reconstruction
Eno Töppe, Martin R. Oswald, Daniel Cremers, and Carsten Rother

Single Image Blind Deconvolution with Higher-Order Texture Statistics
Manuel Martinello and Paolo Favaro

Compressive Rendering of Multidimensional Scenes
Pradeep Sen, Soheil Darabi, and Lei Xiao

Efficient Rendering of Light Field Images
Daniel Jung and Reinhard Koch

Author Index

Towards Plenoptic Raumzeit Reconstruction

Martin Eisemann, Felix Klose, and Marcus Magnor
Computer Graphics Lab, TU Braunschweig, Germany

Abstract. The goal of image-based rendering is to evoke a visceral sense of presence in a scene using only photographs or videos. A huge variety of different approaches has been developed during the last decade. Examining the underlying models, we find three main categories: view interpolation based on geometry proxies, pure image interpolation techniques, and complete scene flow reconstruction. In this paper we present three approaches to free-viewpoint video, one for each of these categories, and discuss their individual benefits and drawbacks. We hope that studying the different approaches will help others in making important design decisions when planning a free-viewpoint video system.

Keywords: Free-Viewpoint Video, Image-Based Rendering, Dynamic Scene Reconstruction.

1 Introduction

As humans we perceive most of our surroundings through our eyes, and visual stimuli affect all of our senses, drive emotion, arouse memories, and much more. That is one of the reasons why we like to look at pictures.

A major revolution occurred with the introduction of moving images, or videos. The dimension of time was suddenly added, which gave incredible freedom to film and movie makers to tell their story to the audience.

With more powerful hardware, computation power, and clever algorithms, we are now able to add a new dimension to videos, namely the third spatial dimension. This gives users or producers the possibility to change the camera viewpoint on the fly. But there is a difference between the spatial dimensions and time. While free-viewpoint video allows changing the viewpoint not only to positions captured by the input cameras but also to any other position in between, the dimension of time is usually captured only at discrete time steps, dictated by the recording frame rate of the input cameras. For a complete scene flow representation, not only space but also time needs to be reconstructed faithfully.

In this paper we present three different approaches to free-viewpoint video and space-time interpolation. After reviewing related work in the next section, we continue with our Floating Textures [1] in Section 3 as an example of high-quality free-viewpoint video with discrete time steps. For each discrete time step, a geometry of the scene is reconstructed and textured by multi-view projective texture mapping; as this process is error-prone, we present a warping-based refinement to correct the resulting artifacts. In Section 4 we describe the transition from discrete to continuous space-time interpolation: by discarding the geometry and working on image correspondences alone, we can create perceptually plausible image interpolations [2,3,4].
As purely image-based approaches place restrictions on the viewing position, we introduce an algorithm towards complete scene flow reconstruction in Section 5 [5]. All three approaches have their benefits and drawbacks, and the choice should always be based on the requirements of the target application.

2 Related Work

In a slightly simplified version, the plenoptic function $P(x,y,z,\theta,\phi,t)$ describes radiance as a function of 3D position in space $(x,y,z)$, direction $(\theta,\phi)$, and time $t$ [6]. With sufficiently many input views, a direct reconstruction of this function is possible. Initially developed for static scenes, light field rendering [7] is possibly the most purist approach and the one closest to direct resampling. Light field rendering can be directly extended to incorporate discrete [8,9] or even continuous time steps [10]. To cover a larger range of viewing angles at acceptable image quality, however, a large number of densely packed images is necessary [11,12,13]. By employing a prefiltering step, the number of necessary samples can be reduced, but at the cost of blurrier output images [14,15,16].

Wider camera spacings require more sophisticated interpolation techniques. One possibility is the incorporation of a geometry proxy. Given the input images, a 3D representation for each discrete time step is reconstructed and used for depth-guided resampling of the plenoptic function [17,18,19,20,21,22]. For restricted scene setups, the incorporation of template models proves beneficial [23,24,25]. Only a few approaches reconstruct a temporally consistent mesh, which allows for continuous time interpolation [26,27]. To deal with insufficient reconstruction accuracy, Aganj et al. [28] deform the input images, and Takai et al. [29] deform both the mesh and the input images, to diminish rendering artifacts. Unfortunately, none of these approaches allows for real-time rendering without a time-intensive preprocessing phase.

Instead of reconstructing a geometry proxy, purely image-based interpolation techniques rely only on image correspondences. If complete correspondences between image pixels can be established, accurate image warping becomes possible [30]. Mark et al. [31] follow the seminal approach of Chen et al. [30] but also handle occlusion and discontinuities during rendering. While useful to speed up rendering performance, their approaches are applicable only to synthetic scenes. Beier et al. [32] propose a manually guided line-based warping method to interpolate between two images, known from its use in Michael Jackson's music video "Black or White". A physically valid view synthesis by image interpolation is proposed by Seitz et al. [33,34].

For very similar images, optical flow techniques have proven useful [35,36]. Highly precise approaches exist which can be computed at real-time or (almost) interactive rates [37,38,39]. Einarsson et al. [40] created a complete acquisition system, the so-called Light Stage 6, for acquiring and relighting human locomotion. Due to the high number of images acquired, they could directly incorporate optical flow techniques to create virtual camera views in a light field renderer by direct warping of the input images.
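To make the direct-resampling view above concrete, the following minimal Python sketch colors a desired viewing ray by blending the input cameras that observe the ray's scene point from the most similar directions, with angle-based weights. It is only an illustration of the principle under stated assumptions: the camera interface (a `center` attribute), the `project` helper, and the weighting heuristic are inventions of this sketch, not part of any cited system.

    import numpy as np

    # Minimal sketch (not from the paper): direct resampling of the plenoptic
    # function from a set of calibrated input views.  A desired viewing ray is
    # colored by blending the k input cameras that see the ray's scene point
    # from the most similar directions.

    def angular_weights(point, view_dir, cam_centers, k=3):
        """Blending weights from the angular deviation between the desired
        viewing direction and each camera's direction to `point`.
        `view_dir` is the unit vector from `point` toward the novel viewpoint."""
        dirs = cam_centers - point                        # (n, 3)
        dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
        cos = dirs @ view_dir                             # alignment per camera
        idx = np.argsort(-cos)[:k]                        # k best-aligned views
        w = np.clip(cos[idx], 0.0, None)
        return idx, w / (w.sum() + 1e-9)

    def resample_ray(point, view_dir, cams, images, project, k=3):
        """Color of the desired ray as a weighted blend of the input views.
        `project(cam, point)` is assumed to return the pixel (u, v) at which
        `point` is seen by `cam`; `cam.center` is its assumed optical center."""
        centers = np.stack([c.center for c in cams])
        idx, w = angular_weights(point, view_dir, centers, k)
        color = np.zeros(3)
        for i, wi in zip(idx, w):
            u, v = project(cams[i], point)
            color += wi * images[i][int(v), int(u)]
        return color

With densely packed cameras, such nearest-view blending already looks convincing; the wider the camera spacing, the more this simple angular heuristic breaks down, which motivates the interpolation techniques discussed above.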
Correspondence estimation is only one part of an image-based renderer; the image interpolation itself is another critical part. Fitzgibbon et al. [41] use image-based priors, i.e., they enforce similarity to the input images, to remove any ghosting artifacts. Drawbacks are very long computation times, and the input images must be relatively similar in order to achieve good results. Mahajan et al. [42] proposed a method for plausible image interpolation that searches for the optimal path of a pixel transitioning from one image to the other in the gradient domain. As each output pixel in the interpolated view is taken from only a single source image, ghosting and blurring artifacts are avoided, but if wrong correspondences are estimated, unaesthetic deformations may occur. Linz et al. [43] extend the approach of Mahajan et al. [42] to space-time interpolation with multi-image interpolation based on graph cuts and symmetric optical flow. In the unstructured video rendering of Ballan et al. [44], the static background of a scene is reconstructed directly, while the actor in the foreground is projected onto a billboard; the view switches between the cameras at a specific point where the transition is least visible.

3 Floating Textures

Image-based rendering (IBR) systems using a geometry proxy have the benefit of free camera movement for each reconstructed time step. The drawback is that any reconstruction method with an insufficient number of input images is imprecise. While this may be no big problem when looking at the mesh alone, it becomes rather obvious when the mesh is textured again. The challenge is therefore to generate a perceptually plausible rendering with only a sparse setup of cameras and a possibly imperfect geometry proxy.

Commonly in IBR, the full bidirectional reflectance distribution function, i.e., how a point on a surface appears depending on the viewpoint and lighting, is approximated by projective texture mapping [45] and image blending. Typically, the blending factors are based on the angular deviation of the view vector to capture view-dependent effects. If too few input images are available or the geometry is too imprecise, ghosting artifacts appear because the projected textures do not match on the surface.

In this section we assume that a set of input images, the corresponding (possibly imprecise) calibration data, and a geometry proxy are given. The task is to find a way to texture this proxy without noticeable artifacts, hiding the imprecision of the underlying geometry.

Without occlusion, any novel viewpoint can, in theory, be rendered directly from the input images by warping, i.e., by simply deforming the images, so that the following property holds:

$$I_j = W_{I_i \to I_j} \circ I_i, \qquad (1)$$

where $W_{I_i \to I_j} \circ I_i$ warps an image $I_i$ towards $I_j$ according to the warp field $W_{I_i \to I_j}$. The problem of determining the warp field $W_{I_i \to I_j}$ between two images $I_i$, $I_j$ is a heavily researched area in computer graphics and vision. If pixel distances between corresponding image features are not too large, algorithms to robustly estimate per-pixel optical flow are available [37,46]. The issue here is that in most cases these distances will be too large.
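Before turning to how Floating Textures shrink these distances, a concrete reading of Eq. (1) may help. The sketch below estimates a dense warp field with an off-the-shelf optical flow method and applies it by backward warping; OpenCV's Farnebäck flow merely stands in for the flow estimators cited above [37,46], and the function names are our own.

    import cv2
    import numpy as np

    # Illustrative sketch of Eq. (1): estimate a dense warp field between two
    # images with off-the-shelf optical flow and apply it by backward warping.
    # This works well only while corresponding features are close -- exactly
    # the limitation discussed in the text.

    def estimate_warp_field(img_i, img_j):
        """Dense field W_{I_i -> I_j}: one 2D offset per pixel of I_j telling
        where to sample I_i (Farneback flow computed from I_j to I_i)."""
        gray_i = cv2.cvtColor(img_i, cv2.COLOR_BGR2GRAY)
        gray_j = cv2.cvtColor(img_j, cv2.COLOR_BGR2GRAY)
        return cv2.calcOpticalFlowFarneback(gray_j, gray_i, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)

    def warp(img_i, flow):
        """Backward warp W o I_i: sample I_i at the flow-displaced positions
        so that the result is pixel-aligned with I_j."""
        h, w = flow.shape[:2]
        xs, ys = np.meshgrid(np.arange(w), np.arange(h))
        map_x = (xs + flow[..., 0]).astype(np.float32)
        map_y = (ys + flow[..., 1]).astype(np.float32)
        return cv2.remap(img_i, map_x, map_y, cv2.INTER_LINEAR)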
In order to relax the correspondence-finding problem, the problem can literally be projected into another space, namely the output image domain. By first projecting the photographs from cameras $C_i$ onto the approximate geometry surface $G_A$ and rendering the scene from the desired viewpoint $C_v$, creating the intermediate images $I_v^i$, the corresponding image features are brought much closer together than they were in the original input images (Figure 1). This opens up the possibility of applying optical flow estimation to the intermediate images $I_v^i$ to robustly determine the pairwise flow fields $W_{I_v^i \to I_v^j}$. To compensate for more than two input images, a linear combination of the flow fields according to (3) can be applied to all intermediate images $I_v^i$, which can then be blended together to obtain the final rendering result $I_v^{Float}$. To reduce the computational cost, instead of establishing $(n-1)n$ flow fields for $n$ input photos, it often suffices to consider only the 3 input images closest to the current viewpoint. If more than 3 input images are needed, the quadratic effort can be reduced to linear complexity by using intermediate results.

Fig. 1. Rendering with Floating Textures [1]. The input photos are projected from camera positions $C_i$ onto the approximate geometry $G_A$ and onto the desired image plane of viewpoint $V$. The resulting intermediate images $I_v^i$ exhibit mismatch, which is compensated by warping all $I_v^i$ based on the optical flow to obtain the final image $I_v^{Float}$.

The processing steps are summarized in the following functions and visualized in Figure 1:

$$I_v^{Float} = \sum_{i=1}^{n} \left( W_{I_v^i} \circ I_v^i \right) \omega_i \qquad (2)$$

$$W_{I_v^i} = \sum_{j=1}^{n} \omega_j \, W_{I_v^i \to I_v^j} \qquad (3)$$
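Assuming the intermediate images $I_v^i$ and blending weights $\omega_i$ are already available from projective texturing, Eqs. (2) and (3) translate into a few lines. The sketch below reuses `estimate_warp_field` and `warp` from the previous listing; it illustrates the formulas only and is not the paper's GPU implementation.

    import numpy as np

    # Sketch of Eqs. (2) and (3), reusing `estimate_warp_field` and `warp`
    # from the previous listing.  Because all intermediate images are rendered
    # from the same viewpoint they are nearly aligned, so their pairwise flow
    # fields can be combined linearly (Eq. 3) before a single warp per image.

    def floating_texture(intermediates, weights):
        """intermediates: n projectively textured renderings I_v^i (HxWx3);
        weights: n blending weights omega_i, assumed to sum to one."""
        n = len(intermediates)
        h, w = intermediates[0].shape[:2]
        result = np.zeros_like(intermediates[0], dtype=np.float64)
        for i in range(n):
            # Eq. (3): W_{I_v^i} = sum_j omega_j * W_{I_v^i -> I_v^j}
            combined = np.zeros((h, w, 2), np.float32)
            for j in range(n):
                if j != i:                 # W_{i->i} is the zero field
                    combined += weights[j] * estimate_warp_field(
                        intermediates[i], intermediates[j])
            # Eq. (2): accumulate (W_{I_v^i} o I_v^i) * omega_i
            result += weights[i] * warp(intermediates[i], combined)
        return result                      # float image; cast/clip as needed

As noted above, restricting the sum to the three input views closest to the novel viewpoint keeps the number of pairwise flow computations small while the warped textures "float" into alignment on the imperfect proxy.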