High-Performance High-Order Simulation of Wave and Plasma Phenomena by AndreasKlo¨ckner Dipl.-Math. techn.,Universita¨tKarlsruhe(TH);Karlsruhe,Germany,2005 M.S.,BrownUniversity;Providence,RI,2006 Adissertationsubmittedinpartialfulfillmentofthe requirementsforthedegreeofDoctorofPhilosophy inTheDivisionofAppliedMathematicsatBrownUniversity PROVIDENCE,RHODEISLAND May2010 (cid:13)c Copyright2010 byAndreasKlo¨ckner ThisdissertationbyAndreasKlo¨cknerisacceptedinitspresentform byTheDivisionofAppliedMathematicsassatisfyingthe dissertationrequirementforthedegreeofDoctorofPhilosophy. Date JanSickmannHesthaven,Ph.D.,Advisor RecommendedtotheGraduateCouncil Date JohnnyGuzma´n,Ph.D.,Reader Date Chi-WangShu,Ph.D.,Reader ApprovedbytheGraduateCouncil Date SheilaBonde,DeanoftheGraduateSchool iii Vitae Biographical Information Birth August5th,1977 Konstanz,Germany Education 2005–2010 Ph.D.inAppliedMathematics(inprogress) DivisionofAppliedMathematics,BrownUniversity,Providence, RI Advisor: JanHesthaven 2005 DiplomdegreeinAppliedMathematics(Technomathematik) Institut fu¨r Angewandte Mathematik, Universita¨t Karlsruhe, Ger- many Advisor: WillyDo¨rfler 2001–2002 ExchangeStudent,DepartmentofMathematics UniversityofNorthCarolinaatCharlotte,Charlotte,NC 2000 VordiplominComputerScience,Universita¨tKarlsruhe,Germany Experience 6/2006–9/2006 J.WallaceGivensResearchAssociate MathematicsandComputerScienceDiv.,ArgonneNat’l Laboratory,Illinois Worked on high-order unstructured electromagnetic simulation of particle accelerators (with Paul Fischer, Misun Min, and col- leaguesatANL’sAdvancedPhotonSource). iv 2/2005–7/2005 ResearchAssociate(WissenschaftlicherMitarbeiter) Institutfu¨rAngewandteMathematik,Universita¨tKarlsruhe, Germany Workedonvariousextensionsofmythesisresearch(withWilly Do¨rfler). 5/2002–11/2002 ResearchIntern DaimlerChryslerResearch&Technology,PaloAlto,CA Worked on driver stress detection, precision GPS, and software infrastructure(withStefanSchro¨dl). Publications 2010 ViscousShockCapturingwithanExplicitlyTime-Stepped DiscontinuousGalerkinMethod. AK,T.Warburton,J.S.Hesthaven. Inpreparation. 2009 PyCUDA:GPURun-TimeCodeGenerationfor High-PerformanceComputing. AK, N. Pinto, Y. Lee, B. Catanzaro, P. Ivanov, and A. Fasih. Submitted, available at http://arxiv.org/abs/0911. 3456. 2009 NodalDiscontinuousGalerkinMethodsonGraphicsProcessors. AK,T.Warburton,J.Bridge,J.S.Hesthaven. JournalofCompu- tationalPhysics,Volume228,Issue21,20November2009. 2009 DeterministicNumericalSchemesfortheBoltzmannEquation. A.Narayan,AK.BrownUniversityScientificComputingTechnni- calReport2009-39. 2005 OntheComputationofMaximallyLocalizedWannierFunctions. DiplomThesis,Universita¨tKarlsruhe,Germany. v Acknowledgments First and foremost, the support of my advisor Jan Hesthaven has been the cornerstone of my working life in the past five years. He was a source of questions, of answers, of inspiration, he encouraged me to be bold in the scientific questions I pursue, all while givingmegreatfreedominfollowingmyinterests. Hehasalsopatientlyputupwiththe thingsthatturnedoutnottobesosmartinhindsight. Beyondscience,hehasbeenarole modelanda tremendousinfluenceonmy lifeasawhole. Iconsidermyselflucky tohave hadhimasamentor. Overtheyears,IhaveworkedverycloselywithTimWarburtonatRiceUniversityon manyofthetopicsthatthisthesisdiscusses. Hisgenerosity,help,andinsighthavebenefited meinmanyways. Both of the above, along with Chi-Wang Shu and Johnny Guzma´n have graciously agreedto serveon myPhD committee, spenttime thinkingabout mywork, andprovided invaluablefeedback. Throughoutmygraduatestudies,Ihavehadthehonorofworkingonvariousprojects withalargeanddiversegroupofpeople. Theirinsights,commentaryandencouragement, sharedinmanyconversations,wereandcontinuetobeagreatassettomyscientificlife. ThegraduatestudentandpostdoccommunityatBrown’sDivisionofAppliedMathe- maticsis agreat crowdin whichtogrow upacademically. Many ofyouhavebecomemy vi friends,andIhopewewillbeabletostaycloseevenaslifescattersusacrosstheglobe. Nvidia Corporation have been very generous with equipment and travel support and were instrumental in initiating, furthering and publicizing the GPU-based part of my research. Manycontributorsaroundtheworldhavecreatedtheopen-sourcesoftwareandtools on which my work has crucially depended. This notably includes the communities that haveformedaroundmyvariousprojects. Partsofthisthesisarebasedontwopublications,[Klo¨ckneretal.,2009b]and[Klo¨ckner etal.,2009a]. Mycoauthorshavecontributedconsiderablytobotharticlesthroughtheir ideas,suggestions,andfeedback. Last,butbynomeansleast,myparentsBinaandHeinrichKlo¨cknerhave,throughout myentirelife,givenmetheirunconditionalsupport,advice,andlove. Thankyou,allofyou. Some of the computational meshes used in this work were generated using Triangle [Shewchuk,1996]andTetGen[SiandGaertner,2005]. ThesurfacemeshforFigure5.10 originatesintheFlightGearflightsimulatorandwasprocessedusingBlenderandMeshLab, atooldevelopedwiththesupportoftheEpochEuropeanNetworkofExcellence. vii Abstract of “High-Performance High-Order Simulation of Wave and Plasma Phenomena” byAndreasKlo¨ckner,Ph.D.,BrownUniversity,May2010 Thisthesispresentsresultsaimingtoenhanceandbroadentheapplicabilityofthediscon- tinuousGalerkin(“DG”)methodinavarietyofways. DGwaschosenasafoundationfor this work because it yields high-order finite element discretizations with very favorable numericalpropertiesforthetreatmentofhyperbolicconservationlaws. Inafirstpart,IexamineprogressthatcanbemadeonimplementationaspectsofDG. In adapting the method to mass-market massively parallel computation hardware in the formofgraphicsprocessors(“GPUs”),Iobtainanincreaseincomputationperformanceper unitofcostbymorethananorderofmagnitudeoverconventionalprocessorarchitectures. KeytothisadvanceisarecipethatadaptsDGtoavarietyofhardwarethroughautomated self-tuning. I discuss new parallel programming tools supporting GPU run-time code generation which are instrumental in the DG self-tuning process and contribute to its reaching application floating point throughput greater than 200 GFlops/s on a single GPU andgreater than 3 TFlops/son a 16-GPUcluster in simulationsof electromagnetics problemsinthreedimensions. Ifurtherbrieflydiscussthesolverinfrastructurethatmakes thispossible. Inthesecondpartofthethesis,Iintroduceanumberofnewnumericalmethodswhose motivationispartlyrootedinthe opportunitycreatedbyGPU-DG:First,Iconstruct and examine a novel GPU-capable shock detector, which, when used to control an artificial viscosity,helps stabilize DGcomputationsingas dynamicsandanumber ofotherfields. Second,Idescribemypursuitofamethodthatallowsthesimulationofrarefiedplasmas using a DG discretization of the electromagnetic field. Finally, I introduce new explicit multi-rate time integrators for ordinary differential equations with multiple time scales, withafocusonapplicabilitytoDGdiscretizationsoftime-dependentproblems. Contents Vitae iv Acknowledgments vi 1 Introduction 1 1.1 AboutthisThesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 TheScientificMethodandtheComputationalExperiment . . . . . . . . . 3 1.3 AnArgumentforHybridCodes . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 AssemblingaSetofTools . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.5 ReproducibilityforResultsinthisThesis . . . . . . . . . . . . . . . . . . 7 2 Preliminaries 10 2.1 TheDiscontinuousGalerkinMethod . . . . . . . . . . . . . . . . . . . . 11 2.1.1 ImplementingDG . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.2 GPUHardware: ABriefIntroduction . . . . . . . . . . . . . . . . . . . . 15 2.2.1 SpecificsofNvidiahardware . . . . . . . . . . . . . . . . . . . . 18 3 ACode-GeneratingDiscontinuousGalerkinSolver 21 3.1 OntheDesignofaDiscontinuousGalerkinPDESolver . . . . . . . . . . 22 3.2 ALanguageforDiscontinuousGalerkinMethods . . . . . . . . . . . . . 26 3.2.1 FluxesandFlux-LocalBinding . . . . . . . . . . . . . . . . . . . 30 3.2.2 CommonSubexpressionElimination . . . . . . . . . . . . . . . . 31 3.2.3 AnExample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.3 TheProcessingPipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.3.1 TypeInferenceandOperatorSpecialization . . . . . . . . . . . . 35 3.3.2 Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.3.3 Target-SpecificRewriting . . . . . . . . . . . . . . . . . . . . . . 37 3.4 TheVirtualMachine . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.4.1 TheCompilationStep . . . . . . . . . . . . . . . . . . . . . . . . 38 3.4.2 TheExecutionModel . . . . . . . . . . . . . . . . . . . . . . . . 40 viii 3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4 CodeGenerationonGraphicsProcessors 45 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.2 GPUSoftwareCreation . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4.3 ProblemsSolvedbyGPURun-TimeCodeGeneration . . . . . . . . . . . 51 4.3.1 AutomatedTuning . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.3.2 TheCostofFlexibility . . . . . . . . . . . . . . . . . . . . . . . 53 4.3.3 High-PerformanceAbstractions . . . . . . . . . . . . . . . . . . 54 4.3.4 GPUsandtheNeedforFlexibility . . . . . . . . . . . . . . . . . 56 4.4 PyCUDA:AScripting-BasedApproachtoGPURTCG . . . . . . . . . . 57 4.4.1 AbstractionsinPyCUDA . . . . . . . . . . . . . . . . . . . . . . 61 4.4.2 CodeGenerationwithPyCUDA . . . . . . . . . . . . . . . . . . 63 4.4.3 PyOpenCL:OpenCLandGPURTCG . . . . . . . . . . . . . . . 66 4.5 SuccessfulApplications . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 5 DiscontinuousGalerkinMethodsonGraphicsProcessors 70 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 5.2 DGontheGPU:Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 5.3 DGontheGPU:Implementation . . . . . . . . . . . . . . . . . . . . . . 78 5.3.1 HowtoreadthisSection . . . . . . . . . . . . . . . . . . . . . . 78 5.3.2 FluxLifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.3.3 FluxExtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 5.3.4 Element-LocalDifferentiation . . . . . . . . . . . . . . . . . . . 89 5.4 ExperimentalResults . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 5.4.1 FurtherResults: DoublePrecision,DistributedComputation . . . 105 5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6 ViscousShockCapturinginaTime-ExplicitDiscontinuousGalerkinMethod111 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 6.2 BasicDesignConsiderations . . . . . . . . . . . . . . . . . . . . . . . . 116 6.3 ApplicationsandEquations . . . . . . . . . . . . . . . . . . . . . . . . . 120 6.3.1 AdvectionEquation . . . . . . . . . . . . . . . . . . . . . . . . . 120 6.3.2 Second-OrderWaveEquation . . . . . . . . . . . . . . . . . . . 121 6.3.3 Burgers’Equation . . . . . . . . . . . . . . . . . . . . . . . . . . 122 6.3.4 Euler’sEquationsofGasDynamics . . . . . . . . . . . . . . . . 123 6.4 ASmoothness-EstimatingDetectorfortheSelectiveApplicationofArtifi- cialViscosity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 6.4.1 DetectionMethodsintheLiterature . . . . . . . . . . . . . . . . 124 6.4.2 EstimatingSolutionSmoothness . . . . . . . . . . . . . . . . . . 127 6.4.3 AmbiguitiesinTwoandMoreDimensions . . . . . . . . . . . . . 140 6.5 FromSmoothnesstoViscosity . . . . . . . . . . . . . . . . . . . . . . . 142 6.5.1 ScalingtheViscosity . . . . . . . . . . . . . . . . . . . . . . . . 142 6.5.2 SmoothingtheViscosity . . . . . . . . . . . . . . . . . . . . . . 146 ix
Description: