SIMD and GPU-Accelerated Rendering of Implicit Models by Pourya Shirazian Iran University of Science and Technology, 2008 Azad University of Tehran, 2005 A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY in the Department of Computer Science (cid:13)c Pourya Shirazian, 2014 University of Victoria All rights reserved. This dissertation may not be reproduced in whole or in part, by photocopying or other means, without the permission of the author. ii SIMD and GPU-Accelerated Rendering of Implicit Models by Pourya Shirazian Iran University of Science and Technology, 2008 Azad University of Tehran, 2005 Supervisory Committee Dr. Brian Wyvill, Supervisor (Department of Computer Science, University of Victoria) Dr. Roy Eagleson, Outside Member (Department of Electrical and Computer Engineering, Western University) Dr. Paul Lalonde, Departmental Member (Department of Computer Science, University of Victoria) iii Supervisory Committee Dr. Brian Wyvill, Supervisor (Department of Computer Science, University of Victoria) Dr. Roy Eagleson, Outside Member (Department of Electrical and Computer Engineering, Western University) Dr. Paul Lalonde, Departmental Member (Department of Computer Science, University of Victoria) ABSTRACT Implicit models inherently support automatic blending and trivial collision detection which makes them an effective tool for designing complex organic shapes with many applications in various areas of research including surgical simulation systems. How- ever, slow rendering speeds can adversely affect the performance of simulation and modelling systems. In addition, when the models are incorporated in a surgical sim- ulation system, interactive and smooth cutting becomes a required feature for many procedures. In this research, we propose a comprehensive framework for high-performance rendering and physically-based animation of tissues modelled using implicit surfaces. Our goal is to address performance and scalability issues that arise in rendering complex implicit models as well as in dynamic interactions between surgical tool and models. Complexmodelscanbecreatedwithimplicitprimitives, blendingoperators, affine transformations, deformations and constructive solid geometry in a design environ- ment that organizes all these in a scene graph data structure called the BlobTree. We show that the BlobTree modelling approach provides a very compact data structure which supports the requirements above, as well as incremental changes and trivial col- lision detection. A GPU-assisted surface extraction algorithm is proposed to support iv interactive modelling of complex BlobTree models. Using a finite element approach we discretize those models for accurate physically- based animation. Our system provides an interactive cutting ability using smooth intersection surfaces. We show an application of our system in a human skull cran- iotomy simulation. v Contents Supervisory Committee ii Abstract iii Table of Contents v List of Tables viii List of Figures x Acknowledgements xvi Dedication xvii 1 Introduction 1 1.1 General aim of this Research . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.3 Limitation of Current Models . . . . . . . . . . . . . . . . . . . . . . 4 1.4 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.5 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2 Background Material 7 2.1 Implicit Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 2.2 Sweep Surfaces and Sketching . . . . . . . . . . . . . . . . . . . . . . 10 2.3 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3.1 Surface extraction methods . . . . . . . . . . . . . . . . . . . 12 2.3.2 GPU-Accelerated rendering of implicits . . . . . . . . . . . . . 15 2.3.3 Volume mesh cutting . . . . . . . . . . . . . . . . . . . . . . . 16 3 High Performance Rendering on Multi-Core Architectures 19 vi 3.1 Architecture Constraints . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.2 Naming Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.3 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.3.1 BlobTree Linearization . . . . . . . . . . . . . . . . . . . . . . 24 3.3.2 Surface Extraction . . . . . . . . . . . . . . . . . . . . . . . . 26 3.4 MPU size Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 3.5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.6 Chapter Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4 GPU discretization 43 4.1 Many-cores architectures . . . . . . . . . . . . . . . . . . . . . . . . . 44 4.2 Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.3 Memory foot prints . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 4.4 Stackless BlobTree traversal . . . . . . . . . . . . . . . . . . . . . . . 49 4.5 GPU Surface Extraction Algorithm . . . . . . . . . . . . . . . . . . . 53 4.6 Analysis and Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 5 Deformable Model 59 5.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 5.2 Physical Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 5.3 Collision and Contact with tools . . . . . . . . . . . . . . . . . . . . . 62 5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 6 Real-time Cutting 70 6.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.2 Tetrahedral Mesh Data structure . . . . . . . . . . . . . . . . . . . . 72 6.3 Cutting Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 6.3.1 Edge Intersections . . . . . . . . . . . . . . . . . . . . . . . . 77 6.3.2 Produce cut-node list . . . . . . . . . . . . . . . . . . . . . . . 78 6.3.3 Filter intersected-edges . . . . . . . . . . . . . . . . . . . . . . 80 6.3.4 Compute configuration codes for cut elements . . . . . . . . . 80 6.3.5 Topological Updates . . . . . . . . . . . . . . . . . . . . . . . 83 6.4 Cutting Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 7 Evaluation, Analysis and Comparisons 89 7.1 Previous work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 vii 7.2 Architectural constraints . . . . . . . . . . . . . . . . . . . . . . . . . 91 7.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91 7.3.1 The Eggshell Model . . . . . . . . . . . . . . . . . . . . . . . . 91 7.3.2 Craniotomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93 8 Conclusions 97 8.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 Bibliography 100 viii List of Tables Table 3.1 Comparison of speedups and field value evaluations per triangle (FVEPT) for polygonization of Tower model with different SIMD instruction sets. Note that FVEPT was 17 before adding SIMD optimizations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 Table 3.2 Comparison of our polygonization method against Schmidt et al. ’s [62] when rendering Medusa model at 5 different resolutions on one single core with AVX instructions. All timings are in milliseconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 Table 4.1 Memory footprint of the input BlobTree in our GPU polygoniza- tion algorithm in bytes. The entire BlobTree for a model with 64K nodes (primitives and operators) takes up about 20 MiB in our current system. . . . . . . . . . . . . . . . . . . . . . . . . 49 Table 4.2 Stackless BlobTree traversal improved the performance of our BlobTree field evaluation significantly. Here is the comparison between our novel stackless approach versus the stack-based im- plementation for various models. Timings are the average of 100 runs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 Table 5.1 The following deformable models are used in our experiments. . 65 Table 6.1 The following look-up table is used to differentiate between differ- ent cutting configurations based on the number of cut-edges and cut-nodes. All cases subdivide tetrahedral elements into smaller cells except for case Z where the original cell is left intact. . . . 81 Table 6.2 Lookup table for generating sub-elements for type A configuration 82 Table 6.3 Lookup table for generating sub-elements for type B configura- tion. Sub-elements are separated by semi-colon to fit on the line. 82 Table 6.4 Mesh quality measurements. . . . . . . . . . . . . . . . . . . . . 86 ix Table 7.1 Segmented brain data-set statistics. . . . . . . . . . . . . . . . . 93 x List of Figures Figure 2.1 BlobTree structureofacoffeemugcreatedwithCSGandskeletal implicit primitives. . . . . . . . . . . . . . . . . . . . . . . . . . 10 Figure 3.1 The MPU is our unit of computation per each core illustrated as a 2D cross section here. Field-values due to every 4 or 8 points are computed in parallel with SSE or AVX instructions, respectively. When the field at a vertex is zero no iso-surface will pass in the neighborhood of a unit circle (sphere in 3D) centered at that vertex. . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Figure 3.2 Left Column: The one-time preparation steps before scheduling kernel functions for computation. Middle Column: The early discard kernel function. Right Column: The MPU processing kernel function. . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Figure 3.3 Thereductionprocesscombinesalltransformationnodesatprim- itives. After the operation, all internal operators are associated with identity transformation matrices. . . . . . . . . . . . . . . 24 Figure 3.4 The BlobTree is flattened to a structure of arrays (SOA), and is stored at aligned memory addresses for performance reasons. Each row represents an array of values of the same type. N is determined in such a way that the entire structure fits in the cache memory of the processor. . . . . . . . . . . . . . . . . . 25 Figure 3.5 Top: A cell edge is intersected with part of the surface shown in blue. By performing one field evaluation using AVX or two with SSE instructions the interval containing the intersection point can be identified. The final root is computed using linear interpolation within the interval marked with bold line segment. 28 Figure 3.6 AveragetimespentpereachMPU whenpolygonizingtheTowers model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Description: