Total variational optical flow for robust and accurate bladder image mosaicing

Sharib Ali

To cite this version: Sharib Ali. Total variational optical flow for robust and accurate bladder image mosaicing. Image Processing [eess.IV]. Université de Lorraine, 2016. English. NNT: 2016LORR0006. tel-01754509v2

HAL Id: tel-01754509
https://theses.hal.science/tel-01754509v2
Submitted on 30 Jan 2016

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

École doctorale IAEM Lorraine

Total variational optical flow for robust and accurate bladder image mosaicing

THESIS presented and publicly defended on 4 January 2016 for the award of the Doctorat de l'Université de Lorraine (Mention: Automatique, Traitement du Signal et des Images, Génie Informatique) by Sharib ALI

Composition of the jury:

Reviewers: Cédric DEMONCEAUX, PU, Université de Bourgogne Franche-Comté, Laboratoire Le2i, UMR CNRS 6306; João BARETTO, PU, University of Coimbra, Portugal, Electrical and Computer Engineering Department

Examiners: Adrien BARTOLI, PU, Université d'Auvergne, ISIT, UMR CNRS 6284, CENTI, Faculté de Médecine; François GUILLEMIN, PU-PH, Institut de Cancérologie Jean Godinot, Reims

Invitees: Pascal ESCHWEGE, PU-PH, CHU Nancy-Hôpitaux de Brabois; Ismaël DIDELON, SD-Innovation, Frouard

Thesis supervisor: Christian DAUL, PU, Université de Lorraine, CRAN, UMR CNRS 7039
Thesis co-supervisor: Walter BLONDEL, PU, Université de Lorraine, CRAN, UMR CNRS 7039

Centre de Recherche en Automatique de Nancy, UMR
7039, Université de Lorraine - CNRS, avenue de la forêt de Haye, 54516 Vandœuvre-lès-Nancy. Tel: +33 (0)3 83 59 59 59. Fax: +33 (0)3 83 59 56 44

Acknowledgment

First of all, I would like to express my deepest gratitude towards my advisors Prof. Christian Daul and Prof. Walter Blondel. Without their joint motivation and support, this thesis work would not have been possible. Both of them were very patient and helpful at the same time. I also had an opportunity to learn and enjoy teaching with Prof. Daul. His hard work and passion for both research and teaching have inspired me a lot. He constantly encouraged me up to the final submission of my thesis. I am overwhelmed by his consistent support throughout my thesis. I definitely look forward to working with you in the future.

I would like to give my sincere thanks to Prof. François Guillemin for helping me understand the cystoscopic data during my first year of Ph.D. and also for accepting to be on my examination board. I am very thankful to all the jury members and the invitees for agreeing to participate in my thesis defense. I sincerely would like to thank Prof. Cédric Demonceaux and Prof. João Baretto for reviewing this manuscript and Prof. Adrien Bartoli for accepting to be on the examination board. I must not forget to thank Prof. Fabrice Meriaudeau for being there whenever I needed professional or personal advice. I would like to mention and thank Ernest for his time in helping me figure out some issues with software programming. I have learnt some very useful tips from him. I would also like to thank Marine for sharing some patient data with me. She has also been quite positive and a helping hand in data acquisition at CRAN. I would like to thank Christine, Carole and all my colleagues at CRAN for being wonderful, nice and helpful. I must admit that I will miss you all and I wish you all a marvelous time in your lives.

I would like to acknowledge and thank the constant support of my family and friends.
I would like to thank Binod for giving me shelter for a few days during my thesis writing. Yes, I was jobless and homeless during the first weeks of October. I would like to convey my special thanks to one of my most wonderful friends, Mariia, who not only patiently listened to my thesis-writing frustrations at times but also encouraged me to do my best. She also helped me with my English corrections. Thank you, Mariia, for your patience and for going through this manuscript. I hope you enjoyed it at times too. I must not miss my brother Shakir Ali and my dearest sister Mona; they both have been the greatest blessings to me. Their constant support and encouragement have always helped me a lot to become who I am today. Thank you, both of you. This thesis is also for you all.

I would like to dedicate this thesis to my late mom Samsun Nesha.

List of Tables

2.1 Variants of gradient-based assumptions according to [Papenberg et al., 2006]. . . . 37

2.2 Average end-point error (AEPE), in pixels, and average angular error (AAE), in degrees, given for some well-known TV-L1 based methods on the Middlebury test dataset. The methods are presented in the order of their rank on this benchmark for the test dataset (for details refer to http://vision.middlebury.edu/flow/eval/results/results-e1.php). An average value (avg.) is also provided for validating the algorithm tolerance to varying textures in this dataset. . . . 62

3.1 AEPE/AAE (in pixels/in degrees) given for different TV methods applied on the Middlebury database. AAE and AEPE are mathematically defined in Chapter 2. . . . 80

3.2 Standard deviation of the registration parameters for phantom sequence I. The first two columns present the standard deviation from constant translation magnitudes √(t_x² + t_y²) of approximately 20 pixels and 50 pixels. The last four columns present the deviation from constant pure in-plane rotations for values of 1°, 3°, 5° and 7°. . .
. . . 81

3.3 Homography parameter values for simulated sequences I and II. The RGB color channels are blurred for some images of simulated sequence-I. The value of the standard deviation σ_blurr of the Gaussian function used for blurring [Chadebecq et al., 2012] is 2.5. . . . 83

3.4 Parameter settings for the optical flow determination. The dense point correspondence is used to determine the homography parameters. NA stands for a parameter not applicable to a method. . . . 84

3.5 Mean registration (ε̂_{i,i+1}) and mosaicing (ε̂_{0,50}) errors in pixels for 50 image pairs of the simulated video sequence-I. Simulated bladder video sequences without (σ_blurr = 0) and with (σ_blurr = 2.5) additional Gaussian blur (depicting defocus/refocus in cystoscopy [Chadebecq et al., 2012]) are used for quantifying the robustness of the methods. Mean registration times for image pairs of size 512×512 pixels are also presented (CPU implementation time for images under no blur). . . . 85

3.6 Errors obtained for the methods compared on simulated sequence-II. Mean registration times of image pairs of size 400×400 pixels for CPU implementations are also given unless GPU is mentioned. . . . 87

3.7 Error quantification (EPE/AAE in pixels and degrees respectively) for the Grove 3 image pair with and without illumination change. . . . 90

4.1 Comparison of state-of-the-art methods with the proposed ROF-NND method with the overall AEPE (in pixels) and AAE (in degrees) on the Middlebury optical flow benchmark [Baker et al., 2011]. Runtimes are provided for the Urban image pair under CPU implementation unless mentioned as GPU. The average ranking is provided online at http://vision.middlebury.edu/flow/eval/results-e1.php. . .
108

4.2 KITTI flow benchmark comparison of the proposed and existing reference state-of-the-art methods (without incorporating stereo-matching or epipolar geometry). Average runtimes (t) are given for CPU implementations unless mentioned. For details see also http://www.cvlibs.net/datasets/kitti/eval_stereo_flow.php?benchmark=flow. Noc represents errors evaluated for non-occluded regions and occ represents errors evaluated for all image pixels including occlusions. The percentage of bad pixels and the AEPE for the pixels at an AEPE threshold of 3 pixels are given. . . . 109

4.3 Performance on the MPI Sintel benchmark (http://sintel.is.tue.mpg.de/results). Only methods with a variational approach implementation are shown here, for the final pass dataset. The columns “s0-10” and “s10-40” represent the AEPE over regions with flow vector magnitudes in the ranges [0,10] pixels and [10,40] pixels. Average runtime (t) is given for CPU implementation unless mentioned. NA stands for not applicable. . . . 110

4.4 Percentage of bad pixels and AEPE value (in brackets), at an AEPE threshold of 3 pixels, for the state-of-the-art methods and the proposed ROF-NND method. The results are given for the non-occluded ground truth of the four KITTI training image sequences (#11, #15, #44 and #74) which include illumination changes. . . . 114

4.5 Large displacement tests on four training image sequences of the KITTI dataset (pair numbers 117, 144, 147 and 181) with non-occluded ground truth results. Two criteria at an error threshold of 3 pixels (percentage of bad pixels and the AEPE value given in brackets) are used to evaluate the state-of-the-art methods and the proposed ROF-NND method. See also http://www.dagm.de/symposien/special-sessions/ for such large displacement tests. . .
115

4.6 H^true_{i,i+1} homography parameter intervals used for computing the displacements between consecutive images. θ, {s_x, s_y}, {f_x, f_y}, {t_x, t_y} and {h_1, h_2} are the in-plane rotation, shear, scale, 2D translation and perspective parameters respectively. . . . 117

4.7 Registration and mosaicing results obtained for dataset “data I” (human skin epithelium). . . . 118

4.8 Registration and mosaicing results for dataset “data-II” (pig bladder phantom). . . . 119

List of Figures

1 Example of a mosaic of a bladder epithelium. This extended field of view was computed with 900 images of a 43-second video-sequence. The dashed white circle represents the first image and corresponds to the field of view of the endoscope. This video-sequence (patient data) was acquired with a rigid cystoscope during a standard clinical procedure under white light. . . . xviii

2 Accurate optical flow estimation for the “marble” sequence (see http://i21www.ira.uka.de/image_sequences/). (a) Frame 10 of the sequence, (b) frame 20 of the sequence, (c) ground truth corresponding to the optical flow between (a) and (b). The hue represents the orientation of the vectors and the color saturation encodes the vector length, (d) results obtained with the classical total variational method based on the l1 norm, (e) results with the proposed RFLOW method and (f) optical flow obtained with the proposed ROF-NDD method. . . . xxi

3 Homologous point estimation in bladder images. (a) Source image I_{i+1} (image with number i+1 in the sequence). (b) Target image (i-th image of the sequence). The target was blurred and the mean value of its grey levels lowered (image darkened with respect to the one in (a)). (c) Displacement field obtained with the classical total variational method based on the l1 norm.
The field vectors (arrows) are displayed every five pixels in the x and y directions of the image axes. (d) Displacement field obtained with the proposed model. . . . xxii

4 Image pair used for the robustness tests with respect to illumination changes. (a) Original textured “Grove3” image (source). (b) Textured target image (“Grove3”) in which strong illumination changes were simulated. (c) Ground truth giving the exact optical flow between (a) and (b). (d) Optical flow determined with the ROF-NDD method for the image pair in (a) and (b). . . . xxii

5 Mosaics of bladder images acquired under two different modalities. (a) Mosaic built with 200 images acquired under white light. Some images are blurred or affected by strong specular reflections. (b) Mosaic (fluorescence modality) visualizing a region of interest after a transurethral resection. . . . xxiii

6 Bladder epithelium mosaic (extended FOV) using 500 frames of a 20-second video-sequence. The black dashed line represents the camera trajectory and the red rectangular region represents the starting frame (also the small FOV seen through cystoscopy). . . . xxv

1.1 Bladder wall layers and different stagings of carcinoma. Courtesy: The Urology Group (http://www.urologygroupvirginia.com/). . . . 2

1.2 Sketch illustrating a cystoscopic examination procedure. A cystoscope is inserted into the bladder via the urethra. The images of the acquired video-sequence are displayed on a screen and appear in a circular and small field of view (FOV). The image shown in this diagram was obtained for a white light source. Courtesy: The Urology Care Foundation (http://www.urologyhealth.org/). . .
4

1.3 Endoscopes used in urology. a) Rigid cystoscope, Karl Storz company. b) Flexible cystoscope (EndoEYE model from the Olympus company). . . . 4

1.4 Image mosaicing framework. . . . 6

1.5 Examples of images obtained from different endoscopic applications. (a-b) urinary bladder (WL and FL cystoscopy), (c) near urethral opening (WL cystoscopy), (d) esophagus (gastroscopy), (e) stomach (gastroscopy), (f) larynx (laryngoscopy), (g) pituitary gland (endo-nasal neurosurgery), (h) colon polyp (colonoscopy) and (i) microscopic image of cardiac type epithelium in vivo (confocal laser endomicroscopy, CLE). . . . 7

1.6 Image mosaics obtained for the different endoscopic applications given in Fig. 1.5. (a) 2D large extended FOV mosaic for images acquired with a WL cystoscope [Hernandez-Mier et al., 2010], (b) 2D planar panoramic image generated in real-time using FL cystoscopy video frames [Behrens et al., 2009], (c) 2D image mosaic representing a quasi-planar surface near the urethral opening [Ali et al., 2013b], (d-e) mosaics of gastroscopic quasi-planar image sequences showing extended FOVs around the angularis [Liu et al., 2015] and pylorus regions, (f) image mosaic of the larynx generated with a general-purpose stitching software [Schuster et al., 2012], (g) real-time view expansion of an endo-nasal region [Berger et al., 2013], (h) 3D reconstruction of a polyp region using a shape-from-motion approach for a colonoscopic image acquisition set-up [Koppel et al., 2007] and (i) extended FOV mosaic of CLE for round cardiac type epithelium in vivo [Vercauteren, 2008]. . . . 9

1.7 Acquisition of cystoscopic data and illustration of image texture variability. (a) Schematic sketch of the bladder scene. (b) Example of an image with contrasted texture. (c) Image with vignetting effect. (d) Example of a weakly contrasted texture image. (e) Image with motion blur. . .
. . . 10

1.8 Background removal using median filtering. a) Original image, b) background image obtained with a median filtering technique, c) mask estimated from the background image in (b), d) pixel-wise difference of the original gray level image in (a) and the low-pass filtered (median filter) image in (b), e) image normalized to zero mean with only the pixels of the FOV mask shown in (c), f) intensity profile V along the red line shown in (a) for the images (b), (d) and (e). . . . 13

1.9 Contrast enhancement in cystoscopic image sequences. (a) poor contrast image (I_test) due to view-point, b) target image to which the image in (a) needs to be registered, c) reference image (I_ref), d) enhanced image (I_enhanced), (e) singular value profile V(σ) in good and bad contrast images. . . . 16

1.10 Mosaicing results based on feature point extraction with and without image pre-processing. a) Impact of large brightness variability affecting a few images (the first two images on the right are underexposed). b) The alignment errors are indicated by the arrows, which point to regions where the textures (vessels) of two images should be perfectly superimposed (these structures are in fact shifted). c) Mosaic after SVD enhancement of the images. d) After illumination correction with the SVD technique, the structures of the first two images are now perfectly superimposed. . . . 17

1.11 A 3D point X lying on a plane π has projection x_i on image I_i and x_j on image I_j (j = i+1 for consecutive image pairs). These points are projectively equivalent and can be mapped by a 2D homography H^π_{i,j} which can be used to express the points of I_j in the coordinate system of I_i. . . . 18

1.12 An image I showing a set of selected points such that the points chosen for homography estimation using the 4-point DLT algorithm are well-distributed in the image. . .
18

1.13 Feature extraction under different image quality/texture conditions [Ali et al., 2013b]. a) Contrasted texture: dense matching with the SURF feature extraction technique. This image shows its own extracted feature points (in green) and the successfully extracted and matched feature points (in red) of a second contrasted bladder image. b) Blurred textures: the sparse and undistributed feature matching with SURF is illustrated by the too few green-red mark pairs. Registering the image with this poor information leads to inaccurate results. c) Dense correspondence with a variational optical flow method on image (b). Red points again correspond to the key feature points in the target image and green points represent the key feature points in the source image. Yellow lines connect the matched key feature points in the overlaid target and source images. . . . 21

1.14 Two composited maps before and after blending. a, c) Without blending, b, d) with the Laplacian-Gaussian blending technique described in [Burt and Adelson, 1983, Szeliski, 2006]. Intensity discrepancies along the image transitions during stitching are diminished in the blended mosaics. To limit the contrast expansion with the Laplacian blending algorithm, the background of the blended mosaic has been subtracted from the Laplacian blended mosaic with a weight of 0.1. Structures present in the mosaic are preserved and enhanced while keeping the original texture. In (d) the small structures in the red circles are preserved by the blending technique. . . . 26

2.1 Representation of motion in video data. a) Video frame as a function of space (x,y) and time t, b) 2D displacement (u,v) between consecutive video frames. . . . 30

2.2 Illustration of temporal aliasing effects on optical flow. (a) Small motion giving the correct nearest match, i.e. no aliasing. (b) Large motion, the nearest match is not correct due to aliasing. . .
. . . 31

2.3 Illustration of the ambiguity of the OFC equation. Left: aperture problem, only the flow field (u,v) normal to the edge (denoted by the solid arrow) can be computed. However, the two other (dashed) arrows could also represent the actual solution. Right: in image regions without intensity variations all solutions are possible for the vector (u,v). . . . 32

2.4 Illustration of intensity order transforms (b-e) in a 3×3 neighborhood patch P_{3×3} around the pixel of interest marked in grey. a) Intensity of the original pixels in P_{3×3}, (b) rank, (c) census, (d) complete rank and (e) complete census. . . . 40

2.5 Illustration of the behaviour of penalty functions for the spatial terms. In red: Horn and Schunck quadratic term (over-smoothing effect). In blue: Charbonnier convex formulation with p = 0.5. In blue dashed: Charbonnier non-convex penalty with p < 0.5. In magenta: Lorentzian function with σ = 0.03. . . . 42

2.6 Classification of regularizers used in various optical flow models. . . . 44

2.7 Illustration of the convexity concept. (a) A convex set. (b) A non-convex (concave) set. (c) A convex function F : Rⁿ → R is represented by a curve; the line in green represents the convex set of points between the points v_1 and v_2. A linear combination of points lying on this line gives the convex set in the function domain between F(v_1) and F(v_2) on the curve. . . . 47

2.8 a) F(v) is convex and differentiable, so F(v) ≥ F(v_1) + ∇F(v_1)^T (v − v_1). b) A function F : Rⁿ → R and a value v* ∈ R such that v* represents the slope of the function F; the conjugate function F*(v*) is the maximum gap. . . . 48

2.9 Limitations of gradient descent techniques for non-convex and convex non-differentiable functions. (a) Gradient descent giving a local minimal solution for a non-convex approach.
b) Non-differentiability in convex functions (usually all norms) being modeled as a differentiable function by adding a small constant ε, TV(v) being a 1D representation of the function. . . . 51

2.10 Blob and vessel measures determined by the magnitudes of the eigenvalues. (a) A blob-like structure is obtained when λ_+ ∼ λ_−. (b) Elongated vessel structure (with λ_+ >> λ_−). . . . 54

2.11 Structure estimate and its gradient. (a, d) Original images of a classical scene and a bladder scene. (b, e) Structure estimates of (a) and (d) respectively and (c, f) respective gradient images. . . . 54

2.12 Flow color code. . . . 59

2.13 Visual validation of the improvement of the classical TV-L1 algorithm. (a, b) Frames 16 and 17 respectively of the Marble sequence, (c) ground truth flow between (a) and (b), (d) flow field obtained with the classical Horn-Schunck approach, (e) flow field obtained with the classical TV-L1 algorithm and (f) flow field obtained with the structure estimate (RFLOW). . . . 61

2.14 Results on the Middlebury test image sequences. The images (a, d, g) of the first column represent the ground truth, the second column (b, e, h) corresponds to the optical flow estimation by TV-L1-improved and the third column (c, f, i) represents the optical flow estimation using the proposed model. Flow errors are shown adjacent to the color representation of the OF field. . . . 62

2.15 Optical flow estimation using the proposed method for dynamic scenes in the Middlebury dataset. (a-c) Backyard sequence, (d-e) Basketball sequence. The image pairs are in the first and last columns and the flow field is given in the central column. . . . 63

2.16 Homologous point estimation in the WL modality.
a) Source image I_2, b) target image I_1, c) displacement field obtained using the classical TV-L1 method [Pock et al., 2007], d) displacement field with the proposed model. The target image is blurred and darkened relative to the source image. Flow vectors (arrows) at every 5th pixel in the x and y directions are shown. . . . 63