Runtime Compilation of Array-Oriented Python Programs

by

Alex Rubinsteyn

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy
Department of Computer Science
New York University
September 2014

Professor Dennis Shasha

Dedication

This thesis is dedicated to my parents and to the area code 60076.

Acknowledgements

When I came to New York in 2007, I brought with me a Subaru Outback (mostly full of books), a thinly acquired degree in Neuroscience, a rapidly shrinking bank account, and a nebulous plan to become a mathematician. When I wrote to a researcher at MIT, seeking a position in his lab, I had to admit that: “my GPA is horrible, my recommendations grudgingly extracted from laughable sources.” To my earnest surprise, he never replied. Undeterred and full of confidence in the victory of my enthusiasm over my historical inability to get anything done, I applied to Courant’s Masters program in Mathematics and was promptly rejected. In a panic, I applied to Columbia’s School of Continuing Education and was just as quickly turned away. I peppered them with embarrassing pleas to reconsider, until one annoyed administrator replied that “inconsistency and concern permeate each semester” of my transcript. Ouch.

That I still ended up having the privilege to pursue my curiosity feels like a miracle and I owe a large debt of gratitude to many people. I would like to thank:

• My former project-mate Eric Hielscher, with whom I carved out many of the ideas present in this thesis.
• My advisor, Dennis Shasha, who gave us guidance, support, discipline and chocolate almonds.
• Professor Alan Siegel, who helped me get started on this grad school adventure, taught me about algorithms, and got me a job which both paid the tuition for my Masters and trained me in “butt-time” (meaning, I needed to learn how to sit for more than an hour).
• The job that Professor Siegel conjured for me was reading for Nektarios Paisios, who became my friend and collaborator. We worked together until he graduated, and I think both benefited greatly from the arrangement.
• Professor Amir Pnueli, who was a great teacher and whose course in compilers strongly influenced me.
• My floor secretary, Leslie, who bravely shields us all from absurdities so we can get work done. Without you, I probably would have dropped out by now.
• Ben, for being a great friend and making me leave my office to eat dinner at Quantum Leap.
• Geddes, for demolishing the walls we imagine between myth and reality. Stay stubborn, reality doesn’t stand a chance.
• Most of all, I am grateful for a million things to my parents, Irene Zakon and Arkady Rubinsteyn.

Abstract

The Python programming language has become a popular platform for data analysis and scientific computing. To mitigate the poor performance of Python’s standard interpreter, numerically intensive computations are typically offloaded to library functions written in high-performance compiled languages such as Fortran or C. When there is no efficient library implementation available for a particular algorithm, the programmer must accept suboptimal performance or switch to a low-level language to implement the routine.

This thesis seeks to give Python programmers a means to implement high-performance algorithms in a high-level form. We present Parakeet, a runtime compiler for an array-oriented subset of Python. Parakeet selectively augments the standard Python interpreter by compiling and executing functions explicitly marked for acceleration by the programmer. Parakeet uses runtime type specialization to eliminate the performance-defeating dynamicism of untyped Python code. Parakeet’s pervasive use of data parallel operators as a means for implementing array operations enables high-level restructuring optimization and compilation to parallel hardware such as multi-core CPUs and graphics processors. We evaluate Parakeet on a collection of numerical benchmarks and demonstrate its dramatic capacity for accelerating array-oriented Python programs.

Contents

Dedication
Acknowledgements
Abstract
List of Figures
List of Tables
List of Code Listings
List of Algorithms

1 Introduction

2 Overview of Parakeet
  2.1 Typed Intermediate Representation
  2.2 Data Parallel Operators
  2.3 Compilation Process
    2.3.1 Type Specialization
    2.3.2 Optimization
  2.4 Backends
  2.5 Limitations
  2.6 Differences from Python
  2.7 Detailed Compilation Pipeline
    2.7.1 From Python into Parakeet
    2.7.2 Untyped Representation
    2.7.3 Type-specialized Representation
    2.7.4 Optimization
    2.7.5 Generated C code
    2.7.6 Generated x86 Assembly
    2.7.7 Execution Times

3 History and Related Work
  3.1 Array Programming
  3.2 Data Parallel Programming
    3.2.1 Collection-Oriented Languages
  3.3 Related Projects

4 Parakeet’s Intermediate Representation
  4.1 Simple Expressions
  4.2 Statements and Control Flow
  4.3 Array Properties
  4.4 Simple Array Operators
  4.5 Memory Allocation
  4.6 Higher Order Array Operators
    4.6.1 Mapping Operations
    4.6.2 Reductions
    4.6.3 Scans
  4.7 Formal Syntax

5 Type Inference and Specialization
  5.1 Type System
  5.2 Type Specialization Algorithm
    5.2.1 Specialization Rules for Statements
    5.2.2 Specialization Rules for Higher Array Operators

6 Optimizations
  6.1 Standard Compiler Optimizations
    6.1.1 Simplification
    6.1.2 Dead Code Elimination
    6.1.3 Loop Invariant Code Motion
    6.1.4 Scalar Replacement
  6.2 Fusion
    6.2.1 Nested Fusion
    6.2.2 Horizontal Fusion
  6.3 Symbolic Execution and Shape Inference
  6.4 Value Specialization

7 Evaluation
  7.1 Growcut
  7.2 Matrix Multiplication
  7.3 Rosenbrock Gradient
  7.4 Image Convolution
  7.5 Univariate Regression
  7.6 Tensor Rotation
  7.7 Harris Corner Detector
  7.8 Julia Fractal
  7.9 Smoothed Particle Hydrodynamics

8 Conclusion

9 Bibliography