ebook img

PhD Thesis - Andreas Klöckner PDF

303 Pages·2010·10.5 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview PhD Thesis - Andreas Klöckner

High-Performance High-Order Simulation of Wave and Plasma Phenomena by AndreasKlo¨ckner Dipl.-Math. techn.,Universita¨tKarlsruhe(TH);Karlsruhe,Germany,2005 M.S.,BrownUniversity;Providence,RI,2006 Adissertationsubmittedinpartialfulfillmentofthe requirementsforthedegreeofDoctorofPhilosophy inTheDivisionofAppliedMathematicsatBrownUniversity PROVIDENCE,RHODEISLAND May2010 (cid:13)c Copyright2010 byAndreasKlo¨ckner ThisdissertationbyAndreasKlo¨cknerisacceptedinitspresentform byTheDivisionofAppliedMathematicsassatisfyingthe dissertationrequirementforthedegreeofDoctorofPhilosophy. Date JanSickmannHesthaven,Ph.D.,Advisor RecommendedtotheGraduateCouncil Date JohnnyGuzma´n,Ph.D.,Reader Date Chi-WangShu,Ph.D.,Reader ApprovedbytheGraduateCouncil Date SheilaBonde,DeanoftheGraduateSchool iii Vitae Biographical Information Birth August5th,1977 Konstanz,Germany Education 2005–2010 Ph.D.inAppliedMathematics(inprogress) DivisionofAppliedMathematics,BrownUniversity,Providence, RI Advisor: JanHesthaven 2005 DiplomdegreeinAppliedMathematics(Technomathematik) Institut fu¨r Angewandte Mathematik, Universita¨t Karlsruhe, Ger- many Advisor: WillyDo¨rfler 2001–2002 ExchangeStudent,DepartmentofMathematics UniversityofNorthCarolinaatCharlotte,Charlotte,NC 2000 VordiplominComputerScience,Universita¨tKarlsruhe,Germany Experience 6/2006–9/2006 J.WallaceGivensResearchAssociate MathematicsandComputerScienceDiv.,ArgonneNat’l Laboratory,Illinois Worked on high-order unstructured electromagnetic simulation of particle accelerators (with Paul Fischer, Misun Min, and col- leaguesatANL’sAdvancedPhotonSource). iv 2/2005–7/2005 ResearchAssociate(WissenschaftlicherMitarbeiter) Institutfu¨rAngewandteMathematik,Universita¨tKarlsruhe, Germany Workedonvariousextensionsofmythesisresearch(withWilly Do¨rfler). 5/2002–11/2002 ResearchIntern DaimlerChryslerResearch&Technology,PaloAlto,CA Worked on driver stress detection, precision GPS, and software infrastructure(withStefanSchro¨dl). Publications 2010 ViscousShockCapturingwithanExplicitlyTime-Stepped DiscontinuousGalerkinMethod. AK,T.Warburton,J.S.Hesthaven. Inpreparation. 2009 PyCUDA:GPURun-TimeCodeGenerationfor High-PerformanceComputing. AK, N. Pinto, Y. Lee, B. Catanzaro, P. Ivanov, and A. Fasih. Submitted, available at http://arxiv.org/abs/0911. 3456. 2009 NodalDiscontinuousGalerkinMethodsonGraphicsProcessors. AK,T.Warburton,J.Bridge,J.S.Hesthaven. JournalofCompu- tationalPhysics,Volume228,Issue21,20November2009. 2009 DeterministicNumericalSchemesfortheBoltzmannEquation. A.Narayan,AK.BrownUniversityScientificComputingTechnni- calReport2009-39. 2005 OntheComputationofMaximallyLocalizedWannierFunctions. DiplomThesis,Universita¨tKarlsruhe,Germany. v Acknowledgments First and foremost, the support of my advisor Jan Hesthaven has been the cornerstone of my working life in the past five years. He was a source of questions, of answers, of inspiration, he encouraged me to be bold in the scientific questions I pursue, all while givingmegreatfreedominfollowingmyinterests. Hehasalsopatientlyputupwiththe thingsthatturnedoutnottobesosmartinhindsight. Beyondscience,hehasbeenarole modelanda tremendousinfluenceonmy lifeasawhole. Iconsidermyselflucky tohave hadhimasamentor. Overtheyears,IhaveworkedverycloselywithTimWarburtonatRiceUniversityon manyofthetopicsthatthisthesisdiscusses. Hisgenerosity,help,andinsighthavebenefited meinmanyways. Both of the above, along with Chi-Wang Shu and Johnny Guzma´n have graciously agreedto serveon myPhD committee, spenttime thinkingabout mywork, andprovided invaluablefeedback. Throughoutmygraduatestudies,Ihavehadthehonorofworkingonvariousprojects withalargeanddiversegroupofpeople. Theirinsights,commentaryandencouragement, sharedinmanyconversations,wereandcontinuetobeagreatassettomyscientificlife. ThegraduatestudentandpostdoccommunityatBrown’sDivisionofAppliedMathe- maticsis agreat crowdin whichtogrow upacademically. Many ofyouhavebecomemy vi friends,andIhopewewillbeabletostaycloseevenaslifescattersusacrosstheglobe. Nvidia Corporation have been very generous with equipment and travel support and were instrumental in initiating, furthering and publicizing the GPU-based part of my research. Manycontributorsaroundtheworldhavecreatedtheopen-sourcesoftwareandtools on which my work has crucially depended. This notably includes the communities that haveformedaroundmyvariousprojects. Partsofthisthesisarebasedontwopublications,[Klo¨ckneretal.,2009b]and[Klo¨ckner etal.,2009a]. Mycoauthorshavecontributedconsiderablytobotharticlesthroughtheir ideas,suggestions,andfeedback. Last,butbynomeansleast,myparentsBinaandHeinrichKlo¨cknerhave,throughout myentirelife,givenmetheirunconditionalsupport,advice,andlove. Thankyou,allofyou. Some of the computational meshes used in this work were generated using Triangle [Shewchuk,1996]andTetGen[SiandGaertner,2005]. ThesurfacemeshforFigure5.10 originatesintheFlightGearflightsimulatorandwasprocessedusingBlenderandMeshLab, atooldevelopedwiththesupportoftheEpochEuropeanNetworkofExcellence. vii Abstract of “High-Performance High-Order Simulation of Wave and Plasma Phenomena” byAndreasKlo¨ckner,Ph.D.,BrownUniversity,May2010 Thisthesispresentsresultsaimingtoenhanceandbroadentheapplicabilityofthediscon- tinuousGalerkin(“DG”)methodinavarietyofways. DGwaschosenasafoundationfor this work because it yields high-order finite element discretizations with very favorable numericalpropertiesforthetreatmentofhyperbolicconservationlaws. Inafirstpart,IexamineprogressthatcanbemadeonimplementationaspectsofDG. In adapting the method to mass-market massively parallel computation hardware in the formofgraphicsprocessors(“GPUs”),Iobtainanincreaseincomputationperformanceper unitofcostbymorethananorderofmagnitudeoverconventionalprocessorarchitectures. KeytothisadvanceisarecipethatadaptsDGtoavarietyofhardwarethroughautomated self-tuning. I discuss new parallel programming tools supporting GPU run-time code generation which are instrumental in the DG self-tuning process and contribute to its reaching application floating point throughput greater than 200 GFlops/s on a single GPU andgreater than 3 TFlops/son a 16-GPUcluster in simulationsof electromagnetics problemsinthreedimensions. Ifurtherbrieflydiscussthesolverinfrastructurethatmakes thispossible. Inthesecondpartofthethesis,Iintroduceanumberofnewnumericalmethodswhose motivationispartlyrootedinthe opportunitycreatedbyGPU-DG:First,Iconstruct and examine a novel GPU-capable shock detector, which, when used to control an artificial viscosity,helps stabilize DGcomputationsingas dynamicsandanumber ofotherfields. Second,Idescribemypursuitofamethodthatallowsthesimulationofrarefiedplasmas using a DG discretization of the electromagnetic field. Finally, I introduce new explicit multi-rate time integrators for ordinary differential equations with multiple time scales, withafocusonapplicabilitytoDGdiscretizationsoftime-dependentproblems. Contents Vitae iv Acknowledgments vi 1 Introduction 1 1.1 AboutthisThesis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.2 TheScientificMethodandtheComputationalExperiment . . . . . . . . . 3 1.3 AnArgumentforHybridCodes . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 AssemblingaSetofTools . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.5 ReproducibilityforResultsinthisThesis . . . . . . . . . . . . . . . . . . 7 2 Preliminaries 10 2.1 TheDiscontinuousGalerkinMethod . . . . . . . . . . . . . . . . . . . . 11 2.1.1 ImplementingDG . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.2 GPUHardware: ABriefIntroduction . . . . . . . . . . . . . . . . . . . . 15 2.2.1 SpecificsofNvidiahardware . . . . . . . . . . . . . . . . . . . . 18 3 ACode-GeneratingDiscontinuousGalerkinSolver 21 3.1 OntheDesignofaDiscontinuousGalerkinPDESolver . . . . . . . . . . 22 3.2 ALanguageforDiscontinuousGalerkinMethods . . . . . . . . . . . . . 26 3.2.1 FluxesandFlux-LocalBinding . . . . . . . . . . . . . . . . . . . 30 3.2.2 CommonSubexpressionElimination . . . . . . . . . . . . . . . . 31 3.2.3 AnExample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34 3.3 TheProcessingPipeline . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 3.3.1 TypeInferenceandOperatorSpecialization . . . . . . . . . . . . 35 3.3.2 Optimizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 3.3.3 Target-SpecificRewriting . . . . . . . . . . . . . . . . . . . . . . 37 3.4 TheVirtualMachine . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.4.1 TheCompilationStep . . . . . . . . . . . . . . . . . . . . . . . . 38 3.4.2 TheExecutionModel . . . . . . . . . . . . . . . . . . . . . . . . 40 viii 3.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4 CodeGenerationonGraphicsProcessors 45 4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 4.2 GPUSoftwareCreation . . . . . . . . . . . . . . . . . . . . . . . . . . . 50 4.3 ProblemsSolvedbyGPURun-TimeCodeGeneration . . . . . . . . . . . 51 4.3.1 AutomatedTuning . . . . . . . . . . . . . . . . . . . . . . . . . 52 4.3.2 TheCostofFlexibility . . . . . . . . . . . . . . . . . . . . . . . 53 4.3.3 High-PerformanceAbstractions . . . . . . . . . . . . . . . . . . 54 4.3.4 GPUsandtheNeedforFlexibility . . . . . . . . . . . . . . . . . 56 4.4 PyCUDA:AScripting-BasedApproachtoGPURTCG . . . . . . . . . . 57 4.4.1 AbstractionsinPyCUDA . . . . . . . . . . . . . . . . . . . . . . 61 4.4.2 CodeGenerationwithPyCUDA . . . . . . . . . . . . . . . . . . 63 4.4.3 PyOpenCL:OpenCLandGPURTCG . . . . . . . . . . . . . . . 66 4.5 SuccessfulApplications . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 4.6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 5 DiscontinuousGalerkinMethodsonGraphicsProcessors 70 5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 5.2 DGontheGPU:Design . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 5.3 DGontheGPU:Implementation . . . . . . . . . . . . . . . . . . . . . . 78 5.3.1 HowtoreadthisSection . . . . . . . . . . . . . . . . . . . . . . 78 5.3.2 FluxLifting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 5.3.3 FluxExtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 5.3.4 Element-LocalDifferentiation . . . . . . . . . . . . . . . . . . . 89 5.4 ExperimentalResults . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 5.4.1 FurtherResults: DoublePrecision,DistributedComputation . . . 105 5.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 6 ViscousShockCapturinginaTime-ExplicitDiscontinuousGalerkinMethod111 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 6.2 BasicDesignConsiderations . . . . . . . . . . . . . . . . . . . . . . . . 116 6.3 ApplicationsandEquations . . . . . . . . . . . . . . . . . . . . . . . . . 120 6.3.1 AdvectionEquation . . . . . . . . . . . . . . . . . . . . . . . . . 120 6.3.2 Second-OrderWaveEquation . . . . . . . . . . . . . . . . . . . 121 6.3.3 Burgers’Equation . . . . . . . . . . . . . . . . . . . . . . . . . . 122 6.3.4 Euler’sEquationsofGasDynamics . . . . . . . . . . . . . . . . 123 6.4 ASmoothness-EstimatingDetectorfortheSelectiveApplicationofArtifi- cialViscosity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 6.4.1 DetectionMethodsintheLiterature . . . . . . . . . . . . . . . . 124 6.4.2 EstimatingSolutionSmoothness . . . . . . . . . . . . . . . . . . 127 6.4.3 AmbiguitiesinTwoandMoreDimensions . . . . . . . . . . . . . 140 6.5 FromSmoothnesstoViscosity . . . . . . . . . . . . . . . . . . . . . . . 142 6.5.1 ScalingtheViscosity . . . . . . . . . . . . . . . . . . . . . . . . 142 6.5.2 SmoothingtheViscosity . . . . . . . . . . . . . . . . . . . . . . 146 ix

Description:
dissertation requirement for the degree of Doctor of Philosophy. Date. Jan Sickmann Worked on high-order unstructured electromagnetic simulation of particle
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.