Heterogeneous Computing with OpenCL

278 pages · 2011 · English
Heterogeneous Computing with OpenCL

Benedict Gaster
Lee Howes
David R. Kaeli
Perhaad Mistry
Dana Schaa

Acquiring Editor: Todd Green
Development Editor: Robyn Day
Project Manager: André Cuello
Designer: Joanne Blank

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

© 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of product liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Heterogeneous computing with OpenCL / Benedict Gaster ... [et al.].
p. cm.
ISBN 978-0-12-387766-6
1. Parallel programming (Computer science) 2. OpenCL (Computer program language) I. Gaster, Benedict.
QA76.642.H48 2012
005.2'752–dc23
2011020169

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-387766-6

For information on all MK publications visit our website at www.mkp.com

Printed in the United States of America
12 13 14 15 10 9 8 7 6 5 4 3 2 1

Foreword

For more than two decades, the computer industry has been inspired and motivated by the observation made by Gordon Moore (a.k.a. "Moore's law") that the density of transistors on a die was doubling every 18 months. This observation created the expectation that the performance an application achieves on one generation of processors would double within two years, when the next generation of processors was announced. Constant improvement in manufacturing and processor technologies was the main driver of this trend, since it allowed each new processor generation to shrink all of the transistor's dimensions by the "golden factor", 0.3 (ideal shrink), and to reduce the power supply accordingly. Thus, each new processor generation could double the density of transistors and gain a 50% speed improvement (frequency) while consuming the same power and keeping the same power density. When better performance was required, computer architects focused on using the extra transistors to push the frequency beyond what the shrink provided, and to add new architectural features aimed mainly at improving performance for existing and new applications.

During the mid 2000s, transistors became so small that the "physics of small devices" started to govern the characterization of the entire chip. Thus, frequency improvement and density increase could no longer be achieved without a significant increase in power consumption and power density. A recent report by the International Technology Roadmap for Semiconductors (ITRS) supports this observation. It indicates that this trend will continue for the foreseeable future and will most likely become the most significant factor affecting technology scaling and the future of computer-based systems.
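The opening paragraph of the Foreword compresses a scaling argument into a single sentence. Below is a minimal worked sketch using classic constant-field (Dennard) scaling. It assumes that the 0.3 "golden factor" denotes a 30% reduction of every linear dimension (scale factor s = 0.7), and that supply voltage V and gate capacitance C shrink by the same factor s. These assumptions are ours, not the Foreword's.

```latex
% Assumed: linear scale factor s = 0.7; V, C scale with s; f scales with 1/s
\begin{aligned}
A_{\text{transistor}} &\propto s^2 = 0.7^2 = 0.49, & \text{density} &\propto \frac{1}{s^2} \approx 2.04\times \\
t_{\text{gate}} &\propto s = 0.7, & f &\propto \frac{1}{s} \approx 1.43\times \\
P_{\text{transistor}} &\propto C V^2 f \propto s \cdot s^2 \cdot \frac{1}{s} = s^2 \approx 0.49 \\
\text{power density} &\propto P_{\text{transistor}} \cdot \text{density} \propto s^2 \cdot \frac{1}{s^2} = 1
\end{aligned}
```

Under these assumptions, density roughly doubles and frequency rises by roughly 40%, the same order as the Foreword's 50% figure. Power density stays constant, and it is this last property that stopped holding in the mid 2000s.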
To meet the expectation of doubling performance over a fixed period of time (no longer 2 years), two major changes happened.

(1) Instead of increasing the frequency, modern processors increase the number of cores on each die. This trend forces software to change as well. Since we can no longer expect the hardware to deliver significantly better performance for a given application, we need to develop new implementations of the same application that take advantage of the multicore architecture.

(2) Thermal behavior and power have become first-class citizens in the design of any future architecture.

These trends encourage the community to start looking at heterogeneous solutions: systems assembled from different subsystems, each optimized for a different design point or workload. For example, many systems combine "traditional" CPU architectures with special-purpose FPGAs or graphics processors (GPUs). Such integration can be done at different levels, e.g., at the system level, at the board level, and recently at the core level.

Developing software for homogeneous parallel and distributed systems is considered a nontrivial task. This is true even though such development uses well-known paradigms and well-established programming languages, development methods, algorithms, debugging tools, etc. Developing software for general-purpose heterogeneous systems is relatively new, and therefore less mature and much more difficult. As heterogeneous systems become unavoidable, many of the major software and hardware manufacturers have started creating software environments to support them.
AMD proposed the use of the Brook language, developed at Stanford University, to handle streaming computations. It later extended the software environment to include Close to Metal (CTM) and the Compute Abstraction Layer (CAL), which give access to its low-level streaming hardware primitives and so to its highly threaded parallel architecture. NVIDIA took a similar approach, co-designing its recent generations of GPUs and the CUDA programming environment to take advantage of the highly threaded GPU environment. Intel proposed extending the use of multi-core programming to program its Larrabee architecture. IBM proposed message-passing-based software to take advantage of its heterogeneous, non-coherent Cell architecture. FPGA-based solutions integrate libraries written in VHDL with C or C++ programs to get the best of both environments. Each of these programming environments can benefit domain-specific applications. However, all of them failed to meet the need for general-purpose software that can serve different hardware architectures in the way that, for example, Java code can run on very different ISAs.
The Open Computing Language (OpenCL) was designed to meet this important need. It was defined by, and is managed by, the Khronos Group, a nonprofit technology consortium. The language and its development environment "borrow" many of their basic concepts from very successful, hardware-specific environments such as CUDA, CAL, and CTM. OpenCL blends these concepts into a hardware-independent software development environment. It supports different levels of parallelism and maps efficiently to homogeneous or heterogeneous, single- or multiple-device systems consisting of CPUs, GPUs, FPGAs, and potentially other future devices. To support future devices, OpenCL defines a set of requirements; any device that meets them can be included seamlessly in the OpenCL environment. OpenCL also defines run-time support for managing resources and for combining different types of hardware under the same execution environment. Hopefully, in the future it will also allow computation, power, and other resources such as the memory hierarchy to be balanced dynamically in a more general manner.

This book is a textbook that aims to teach students how to program heterogeneous environments. The book starts with a very important discussion of how to program parallel systems and defines the concepts students need to understand before programming any heterogeneous system. It also provides a taxonomy for understanding the different models used for parallel and distributed systems. Chapters 2–4 build the students' understanding step by step, starting with the basic structures of OpenCL (Chapter 2) and then the host and device architecture (Chapter 3). Chapter 4 puts these concepts together in a nontrivial example.
Chapters 5 and 6 extend the concepts learned so far. Chapter 5 deepens the reader's understanding of concurrency and run-time execution in OpenCL, and Chapter 6 dissects how OpenCL executes on the CPU and on the GPU. After building the basics, the book dedicates four chapters (7–10) to more sophisticated examples. These chapters are vital for showing students that OpenCL can be used for a wide range of applications, beyond any domain-specific mode of operation. The book also demonstrates how the same program can run on different platforms, such as NVIDIA or AMD. The book ends with three chapters dedicated to advanced topics.

There is no doubt that this is a very important book. It gives students and researchers a better understanding of the world of heterogeneous computers in general and of the solutions provided by OpenCL in particular. The book is well written and suits students with different levels of experience. It can therefore be used as a textbook in a course on OpenCL, or parts of it can be used to extend other courses. For example, the first two chapters are well suited to a course on parallel programming, and some of the examples can be used in advanced courses.

Dr. Avi Mendelson
Microsoft R&D Israel
Adjunct Professor, Technion

Preface

OUR HETEROGENEOUS WORLD

Our world is heterogeneous in nature. This kind of diversity provides a richness and detail that is difficult to describe. At the same time, it provides a level of complexity and interaction in which a wide range of different entities are optimized for specific tasks and environments.

In computing, heterogeneous computer systems also add richness. They let the programmer select the best architecture for the task at hand, or choose the right task to make optimal use of a given architecture. Both of these views of a heterogeneous system's flexibility become apparent when solving a computational problem involves a variety of different tasks.
Recently, there has been an upsurge in the computer design community of experimentation with heterogeneous systems. We are seeing new systems on the market that combine a number of different classes of architectures. What has slowed this progression is the lack of a standardized programming environment that can manage the diverse set of resources in a common framework.

OPENCL

OpenCL has been developed specifically to ease the programming burden when writing applications for heterogeneous systems. OpenCL also addresses the current trend of increasing the number of cores on a given architecture. The OpenCL framework supports execution on multi-core central processing units, digital signal processors, field programmable gate arrays, graphics processing units, and heterogeneous accelerated processing units. The architectures already supported cover a wide range of approaches to extracting parallelism and efficiency from memory systems and instruction streams. Such diversity lets designers provide an optimized solution to their problem. If that solution is designed within the OpenCL specification, it can scale with the growth and breadth of available architectures. OpenCL's standard abstractions and interfaces allow the programmer to seamlessly "stitch" together an application that executes on a rich set of heterogeneous devices from one or many manufacturers.

THIS TEXT

Until now, there has not been a single definitive text that helps programmers and software engineers leverage the power and flexibility of the OpenCL programming standard. This is our attempt to fill that void. With this goal in mind, we have not attempted to create a syntax guide; there are numerous good sources in which programmers can find a complete and up-to-date description of OpenCL syntax. Instead, this text tries to show a developer or student how to leverage the OpenCL framework to build interesting and useful applications. We provide a number of examples of real applications to demonstrate the power of this programming standard.
Our hope is that the reader will embrace this new programming framework and explore the full benefits of heterogeneous computing that it provides. We welcome comments on how to improve this text, and we hope it will help you build your next heterogeneous application.

Acknowledgments

We thank Manju Hegde for proposing the book project, Bao Huong Phan and Todd Green for their management and input from the AMD and Morgan Kaufmann sides of the project, and Jay Owen for connecting the participants on this project with each other. On the technical side, we thank Jay Cornwall for his thorough work editing much of this text. We also thank Joachim Deguara, Takahiro Harada, Justin Hensley, Marc Romankewicz, and Byunghyun Jang for their significant contributions to individual chapters, particularly the sequence of case studies, which could not have been produced without their help. Also instrumental were Jari Nikara, Tomi Aarnio, and Eero Aho from the Nokia Research Center and Janne Pietiäinen from the Tampere University of Technology.

About the Authors

Benedict R. Gaster is a software architect working on programming models for next-generation heterogeneous processors. He particularly examines high-level abstractions for parallel programming on the emerging class of processors that contain both CPUs and accelerators such as GPUs. He has contributed extensively to OpenCL's design and has represented AMD at the Khronos Group open standard consortium. He has a Ph.D. in computer science for his work on type systems for extensible records and variants.

Lee Howes has spent the past 2 years working at AMD and currently focuses on programming models for the future of heterogeneous computing. His interests lie in declaratively representing mappings of iteration domains to data. He is also interested in communicating complicated architectural concepts and optimizations succinctly to a developer audience, both through programming model improvements and through education. He has a Ph.D. in computer science from Imperial College London for work in this area.
David Kaeli received a B.S. and Ph.D. in electrical engineering from Rutgers University and an M.S. in computer engineering from Syracuse University. He is Associate Dean of Undergraduate Programs in the College of Engineering and a Full Professor on the ECE faculty at Northeastern University, where he directs the Northeastern University Computer Architecture Research Laboratory (NUCAR). Prior to joining Northeastern in 1993, he spent 12 years at IBM, the last 7 at the T. J. Watson Research Center, Yorktown Heights, NY. He has co-authored more than 200 critically reviewed publications. His research spans a range of areas, from microarchitecture to back-end compilers and software engineering. He leads a number of research projects in the area of GPU computing. He currently serves as the Chair of the IEEE Technical Committee on Computer Architecture. He is an IEEE Fellow and a member of the ACM.

Perhaad Mistry is a Ph.D. candidate at Northeastern University. He received a B.S. in electronics engineering from the University of Mumbai and an M.S. in computer engineering from Northeastern University. He is currently a member of the Northeastern University Computer Architecture Research Laboratory (NUCAR) and is advised by Dr. David Kaeli. He works on a variety of parallel computing projects. He has designed scalable data structures for physics simulations on GPGPU platforms and has also implemented medical reconstruction algorithms for heterogeneous devices. His current research focuses on the design of profiling tools for heterogeneous computing. He is studying the potential of standards such as OpenCL for building tools that simplify parallel programming and performance analysis across the variety of heterogeneous devices available today.
