Towards a More Principled Compiler: Register Allocation and Instruction Selection Revisited (PDF)

197 Pages·2009·9.17 MB·English
by David Ryan Koes

Towards a More Principled Compiler: Register Allocation and Instruction Selection Revisited

David Ryan Koes

CMU-CS-09-157
October 2009

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Seth Copen Goldstein, Chair
Peter Lee
Anupam Gupta
Michael D. Smith, Harvard University

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy.

Copyright © 2009 David Ryan Koes

This research was sponsored by the National Science Foundation under grant numbers CCF-0702640, CCR-0205523, EIA-0220214, and IIS-0117658; and Hewlett Packard under grant number 1010162.

The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: Compilers, Register Allocation, Instruction Selection, Backend Optimization

For Mary, Andrew, and Alex
But especially for Mary

Abstract

Backend optimizations are a critical part of an optimizing compiler. This thesis develops a principled approach for understanding, evaluating, and solving backend optimization problems. Our principled approach is to develop a comprehensive and expressive model of the backend optimization problem, and design solution techniques for this model that achieve or approach optimality. We apply our principled approach to the classical backend optimizations of register allocation and instruction selection.

We develop an expressive model of register allocation based on multi-commodity network flow. This model exactly represents the complexities of the target architecture. We design progressive solution techniques for our model. Progressive solution techniques quickly find an initial solution and then improve upon the solution as more time is allotted for compilation. Our progressive allocator allows the programmer to explicitly manage the trade-off between compile-time and code quality.
As more time is allowed for compilation, the resulting allocation approaches optimal, and substantial improvements in code quality are obtained.

We describe an expressive directed acyclic graph representation of the instruction selection problem and develop a near-optimal, linear-time algorithm that solves the instruction selection problem using this expressive model. Our principled approach to instruction selection results in significant improvements in code quality compared to traditional algorithms.

We evaluate our principled approaches to register allocation and instruction selection on a range of architectures and benchmarks. We achieve significant reductions in code size and increases in performance relative to previous approaches. Our results confirm that our principled approach is a major advance in the state of the art of backend optimization.

Contents

1 Introduction
  1.1 Problem Description
    1.1.1 Register Allocation
    1.1.2 Instruction Selection
  1.2 Contribution
2 Related Work
  2.1 Register Allocation
    2.1.1 Graph Coloring Register Allocation
    2.1.2 SSA Register Allocators
    2.1.3 Linear Scan Allocators
    2.1.4 Alternative Heuristic Allocators
    2.1.5 Optimal Register Allocation
    2.1.6 Limitations
    2.1.7 Summary
  2.2 Instruction Selection
3 Global MCNF Register Allocation Model
  3.1 Multi-commodity Network Flow
  3.2 Local Register Allocation Model
    3.2.1 Source Nodes
    3.2.2 Sink Nodes
    3.2.3 Allocation Class Nodes
    3.2.4 Crossbar Groups
    3.2.5 Instruction Groups
    3.2.6 Full Model
  3.3 Global Register Allocation Model
  3.4 Persistent Memory
  3.5 Modeling Costs
  3.6 Limitations
  3.7 Hardness of Single Global Flow
  3.8 Simplifications
  3.9 Summary
4 Evaluation Methodology
  4.1 Benchmarks
  4.2 Code Quality Metrics
    4.2.1 Code Size
    4.2.2 Code Performance
  4.3 Instruction Set Architectures
    4.3.1 x86-32
    4.3.2 x86-64
    4.3.3 ARM
    4.3.4 Thumb
  4.4 Microarchitectures
5 Heuristic Register Allocation
  5.1 Iterative Heuristic Allocator
    5.1.1 Algorithm
    5.1.2 Improvements
    5.1.3 Asymptotic Analysis
  5.2 Simultaneous Heuristic Allocator
    5.2.1 Algorithm
    5.2.2 Improvements
    5.2.3 Asymptotic Analysis
  5.3 Boundary Constraints
    5.3.1 Asymptotic Analysis
  5.4 Hybrid Allocator
  5.5 Compile Time
  5.6 Summary
6 Progressive Register Allocation
  6.1 Relaxation Techniques
    6.1.1 Linear Programming Relaxation
    6.1.2 Lagrangian Relaxation
  6.2 Subgradient Optimization
    6.2.1 Flow Calculation
    6.2.2 Step Update
    6.2.3 Price Update
    6.2.4 Price Initialization
    6.2.5 Summary
  6.3 Progressive Register Allocation
    6.3.1 Code Quality: Size
    6.3.2 Code Quality: Performance
    6.3.3 Optimality
    6.3.4 Compile Time
  6.4 Summary
7 Near-Optimal Linear-Time Instruction Selection
  7.1 Problem Description and Hardness
  7.2 NOLTIS
  7.3 0-1 Programming Solution
  7.4 Implementation
  7.5 Results
    7.5.1 Optimality
    7.5.2 Comparison of Algorithms
    7.5.3 Impact on Code Size
    7.5.4 Compile Time Performance
  7.6 Limitations and Future Work
  7.7 Interaction with Register Allocation
  7.8 Summary
8 Conclusion
Bibliography

List of Figures

1.1 The structure of a typical compiler.
1.2 Simple register allocation example.
1.3 An example of instruction selection on a tree-based IR.
2.1 The flow of a traditional graph coloring algorithm.
2.2 Live ranges and the corresponding interference graph.
2.3 An example of the simplify and select phases of a graph coloring allocator.
2.4 The linear ordering of basic blocks, live intervals, and lifetime holes.
2.5 Result of simple linear scan and second-chance binpacking linear scan.
2.6 Percent of functions which do not spill.
2.7 Decrease in code quality resulting from spill code and assignment heuristics.
2.8 The effect of various components of register allocation.
3.1 A simple example of a multi-commodity network flow problem.
3.2 A simple example of local register allocation.
3.3 Source nodes of a MCNF model of register allocation.
3.4 Sink nodes of a MCNF model of register allocation.
3.5 Crossbar groups for the local register allocation problem of Figure 3.2.
3.6 Two possible crossbar group network structures.
3.7 Instruction groups for the local register allocation problem of Figure 3.2.
3.8 The full MCNF model of the local register allocation problem of Figure 3.2.
3.9 A simple control flow graph.
3.10 The three types of flow nodes in the global MCNF model of register allocation.
3.11 Entry and exit groups of a global MCNF model of register allocation.
3.12 A crossbar group with nodes for anti-variables.
3.13 A network that demonstrates value modification, load remat. and anti-variables.
3.14 The accuracy of the code size global MCNF cost model.
3.15 Impact of single-execution costs on dynamic memory operations.
3.16 Impact on performance of varying single-execution costs.
3.17 Decrease in code quality when coalescing is separated from an optimal allocator.
3.18 An example of a reduction from global MCNF to minimum graph labeling.
3.19 Decrease in code quality when move insertion is restricted in an optimal allocator.
5.1 An example of the behavior of the iterative heuristic allocator.
5.2 A simple example of global variable usage.
5.3 The importance of block ordering in the iterative allocator.
5.4 The importance of tie breaking strategies in the iterative allocator.
5.5 Running time of iterative allocator for all benchmarked functions.
5.6 Example execution of the simultaneous heuristic allocator.
5.7 Example eviction decisions in the simultaneous heuristic allocator.
5.8 Effect of tie breaking heuristics on code quality in the simultaneous allocator.
5.9 An example control flow graph decomposed into traces.
5.10 Effect of trace decompositions on code quality in the simultaneous allocator.
5.11 Effect of trace update policy on code quality in the simultaneous allocator.
5.12 Running time of the simultaneous allocator for all benchmarked functions.
5.13 A CFG that illustrates the subtleties of setting boundary constraints.
5.14 Code size improvement of heuristic allocators.
5.15 Code size improvement of heuristic allocators.
5.16 Memory operation reduction of heuristic allocators.
5.17 Average code quality improvement of heuristic allocators.
5.18 Slowdown of various allocators relative to extended linear scan.
6.1 The percentage of functions that demonstrate an integrality gap.
6.2 Linear programming solution times of the global MCNF problem.
6.3 Convergence behavior of the basic subgradient optimization algorithm.
6.4 Convergence of subgradient optimization with different flow calculations.
6.5 Graphical depiction of five ratio step update rules.
6.6 Convergence of subgradient optimization with different step update rules.
6.7 Convergence of the subgradient optimization with Newton’s method step update.
6.8 Example price behavior using different price update strategies.
6.9 Convergence of subgradient optimization with different price update strategies.
6.10 Effect of price initialization on the initial lower bound.
6.11 Convergence of subgradient optimization with different price initializations.
6.12 Convergence of heuristic price initialization with different initial allocations.
6.13 The behavior of three heuristic allocators within a progressive allocator.
6.14 Average code size improvement of the progressive allocator.
6.15 Code size improvement of the progressive allocator.
6.16 Code size improvement of the progressive allocator.
6.17 Average memory operation reduction of the progressive allocator.
6.18 Average performance improvement of the progressive allocator.
6.19 Memory operation reduction of the progressive allocator.
6.20 Code performance improvement of the progressive allocator for x86-32.
6.21 Code performance improvement of the progressive allocator for x86-64.
6.22 Effect of block frequency estimator on code quality.
6.23 Code size optimality bounds of progressive allocator.
6.24 Code performance optimality bounds of progressive allocator.
6.25 Register allocation time breakdown of progressive allocator.
7.1 An example of instruction selection as a tiling problem.
7.2 Expressing Boolean satisfiability as an instruction selection problem.
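The abstract describes register allocation as a multi-commodity network flow (MCNF) problem. As a rough illustration only, the sketch below casts a tiny straight-line allocation problem as an integer MCNF over a time-expanded network. Each variable is a commodity carrying one unit of flow from its definition to its last use. Each program point has one node per register plus a memory node, register nodes have capacity one, and changing location (a load, store, or move) costs one unit. The two-register machine (`r0`, `r1`), the unit costs, and the helper names (`candidate_paths`, `allocate`) are hypothetical. The brute-force search stands in for the thesis's actual solution techniques (heuristic allocators, and Lagrangian relaxation with subgradient optimization). The thesis's model also has crossbar groups, instruction groups, and anti-variables, all omitted here.

```python
from itertools import product

# Hypothetical toy target: two registers plus unbounded memory (spill slots).
REGISTERS = ("r0", "r1")
LOCATIONS = REGISTERS + ("mem",)


def transition_cost(src, dst):
    """Cost of the flow edge between two consecutive program points.

    Staying in place is free; a store (reg -> mem), a load (mem -> reg),
    or a register-to-register move is assumed to cost one instruction.
    """
    return 0 if src == dst else 1


def candidate_paths(var):
    """Enumerate every path (one location per program point) that this
    variable's unit of flow could take over its live range."""
    start, end = var["live"]
    choices = []
    for point in range(start, end + 1):
        if point == start or point in var["uses"]:
            choices.append(REGISTERS)  # definitions and uses need a register
        else:
            choices.append(LOCATIONS)
    return list(product(*choices))


def path_cost(path):
    return sum(transition_cost(a, b) for a, b in zip(path, path[1:]))


def allocate(variables):
    """Exhaustively solve a tiny integer MCNF instance: choose one path per
    variable so that no register node carries more than one unit of flow
    at any program point, minimizing the total edge cost."""
    names = list(variables)
    best_cost, best = None, None
    for paths in product(*(candidate_paths(variables[n]) for n in names)):
        occupied = set()  # (program point, register) nodes already used
        feasible = True
        for name, path in zip(names, paths):
            start = variables[name]["live"][0]
            for offset, loc in enumerate(path):
                if loc == "mem":
                    continue  # memory node: unbounded capacity
                if (start + offset, loc) in occupied:
                    feasible = False  # register capacity is one
                    break
                occupied.add((start + offset, loc))
            if not feasible:
                break
        if feasible:
            cost = sum(path_cost(p) for p in paths)
            if best_cost is None or cost < best_cost:
                best_cost, best = cost, dict(zip(names, paths))
    return best_cost, best


# Three variables; a, b and c are all live at program points 1 and 2.
variables = {
    "a": {"live": (0, 3), "uses": {3}},
    "b": {"live": (1, 2), "uses": {2}},
    "c": {"live": (1, 3), "uses": {3}},
}
cost, assignment = allocate(variables)
print(cost)  # 2
```

With two registers and three variables live at points 1 and 2, `b` and `c` both need registers at point 1, so `a` must be stored and later reloaded, for a total cost of 2. Enumeration is exponential in the number of variables, so this formulation is only workable for toy instances.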

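The abstract also treats instruction selection as covering an intermediate representation with target-instruction patterns; the list of figures describes it as a tiling problem. The thesis's near-optimal algorithm works on directed acyclic graphs. The sketch below shows only the well-known bottom-up dynamic program for optimally tiling an expression tree, a traditional technique rather than the thesis's algorithm. The tile set, opcode names, and costs are hypothetical.

```python
from functools import lru_cache

# Hypothetical tile set: (instruction, pattern, cost). A pattern is a tuple
# (opcode, child patterns...); "_" matches any subtree, whose value must be
# computed into a register by some other tile first.
TILES = [
    ("li", ("CONST",), 1),
    ("add", ("ADD", "_", "_"), 1),
    ("addi", ("ADD", "_", ("CONST",)), 1),
    ("mul", ("MUL", "_", "_"), 3),
    ("load", ("LOAD", "_"), 1),
    ("loadi", ("LOAD", ("ADD", "_", ("CONST",))), 1),
]


def match(pattern, node, operands):
    """Return True if `pattern` covers the top of `node`, appending the
    subtrees left uncovered (matched by "_") to `operands`."""
    if pattern == "_":
        operands.append(node)
        return True
    if node[0] != pattern[0]:
        return False
    return all(match(p, child, operands)
               for p, child in zip(pattern[1:], node[1:]))


@lru_cache(maxsize=None)
def best_tiling(node):
    """Minimum-cost tiling of an expression tree by bottom-up dynamic
    programming. Returns (cost, instructions in emission order)."""
    if node[0] == "REG":
        return 0, ()  # value is already in a register: no tile needed
    best = None
    for name, pattern, cost in TILES:
        operands = []
        if not match(pattern, node, operands):
            continue
        total, code = cost, ()
        for operand in operands:
            sub_cost, sub_code = best_tiling(operand)
            total += sub_cost
            code += sub_code
        if best is None or total < best[0]:
            best = (total, code + (name,))
    if best is None:
        raise ValueError(f"no tile covers opcode {node[0]}")
    return best


# mem[x + 8] + y * 2
expr = ("ADD",
        ("LOAD", ("ADD", ("REG", "x"), ("CONST", 8))),
        ("MUL", ("REG", "y"), ("CONST", 2)))
print(best_tiling(expr))  # (6, ('loadi', 'li', 'mul', 'add'))
```

Here `loadi` absorbs the address addition, so the cheapest cover of the whole tree costs 6 under these made-up costs. The recursion assumes every subtree has a single parent; run unchanged on a DAG, it would count and emit a shared subexpression once per parent, which is why the DAG case needs separate treatment (Section 7.1 of the thesis covers its hardness).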
Description:
Keywords: Compilers, Register Allocation, Instruction Selection, Backend … solve the halting problem (halting side-effect free code would be replaced by a nop).