
High Performance Computing PDF

695 Pages·2017·47.247 MB·English


High Performance Computing: Modern Systems and Practices

Thomas Sterling, Matthew Anderson, Maciej Brodowicz
School of Informatics, Computing, and Engineering, Indiana University, Bloomington
Foreword by C. Gordon Bell

Morgan Kaufmann is an imprint of Elsevier
50 Hampshire Street, 5th Floor, Cambridge, MA 02139, United States

Copyright © 2018 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher's permissions policies, and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.
Library of Congress Cataloging-in-Publication Data
A catalog record for this book is available from the Library of Congress

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-12-420158-3

For information on all Morgan Kaufmann publications visit our website at https://www.elsevier.com/books-and-journals

Publisher: Katey Birtcher
Acquisition Editor: Steve Merken
Developmental Editor: Nate McFadden
Production Project Manager: Punithavathy Govindaradjane
Designer: Mark Rogers

Typeset by TNQ Books and Journals

Dedicated to Dr. Paul C. Messina
Leader, colleague, collaborator, mentor, friend

Foreword

High Performance Computing is a needed follow-on to Becker and Sterling's 1994 creation of the Beowulf cluster recipe for building scalable high performance computers (also known as supercomputers) from commodity hardware. Beowulf enabled groups everywhere to build their own supercomputers. Now, with hundreds of Beowulf clusters operating worldwide, this comprehensive text addresses the critical missing link of an academic course for training domain scientists and engineers, and especially computer scientists. Competence involves knowing exactly how to create and run (e.g., controlling, debugging, monitoring, visualizing, evolving) parallel programs on the congeries of computational elements (cores) that constitute today's supercomputers.

Mastery of these ever-increasing, scalable, parallel computing machines gives entry into a comparatively small but growing elite, and is the authors' goal for readers of the book. Lest the reader believe the name is unimportant: the first conference in 1988 was the ACM/IEEE Supercomputing Conference, also known as Supercomputing '88; in 2006 the name evolved to the International Conference on High Performance Computing, Networking, Storage, and Analysis, abbreviated SCXX. About 11,000 people attended SC16.
It is hard to describe a "supercomputer," but I know one when I see one. Personally, I never pass up a visit to a supercomputer, having seen the first one in 1961: the UNIVAC LARC (Livermore Advanced Research Computer) at Lawrence Livermore National Laboratory, specified by Edward Teller to run hydrodynamic simulations for nuclear weapons design. LARC consisted of a few dozen cabinets of densely packed circuit boards interconnected with a few thousand miles of wires and a few computational units operating at a 100 kHz rate. In 2016 the largest supercomputer, the Sunway TaihuLight in China, operated a trillion times faster than LARC. It consists of over 10 million processing cores operating at a 1.5 GHz rate, and consumes 15 MW. The computer is housed in four rows of 40 cabinets, each containing 256 processing nodes. A node has four interconnected 8 GB processors, each controlling 64 processing elements or cores. Thus the 10.6 million processing elements deliver 125 peak petaflops, i.e., 160 cabinets × 256 physical nodes × 4 computers × (1 control + 8 × 8) processing elements or cores, with a 1.31 PB memory (160 × 256 × 4 × 8 GB).

Several of the Top 500 supercomputers have O(10,000) computing nodes that connect and control graphics processing units (GPUs) with O(100) cores. Today's challenge for computational program developers is designing the architecture and implementation of programs to utilize these megaprocessor computers.

From a user perspective, the "ideal high performance computer" has an infinitely fast clock, executes a single instruction stream program operating on data stored in an infinitely large and fast single memory, and comes in any size to fit any budget or problem. In 1957 Backus established the von Neumann programming model with Fortran.
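The cabinet arithmetic quoted in the foreword can be verified with a few lines of Python. This is a sketch using only the figures given above, not vendor specifications:

```python
# Sunway TaihuLight core and memory counts, using only the foreword's figures.
cabinets = 160             # four rows of 40 cabinets
nodes_per_cabinet = 256    # physical nodes per cabinet
cpus_per_node = 4          # interconnected processors per node
cores_per_cpu = 1 + 8 * 8  # 1 control core plus an 8 x 8 grid of processing elements
mem_per_cpu_gb = 8         # memory per processor, in GB

total_cores = cabinets * nodes_per_cabinet * cpus_per_node * cores_per_cpu
total_mem_pb = cabinets * nodes_per_cabinet * cpus_per_node * mem_per_cpu_gb / 1e6

print(total_cores)             # 10649600 -- the "10.6 million processing elements"
print(round(total_mem_pb, 2))  # 1.31 -- the "1.31 PB memory"
```

Both products reproduce the totals stated in the text, confirming the formulas 160 × 256 × 4 × 65 for cores and 160 × 256 × 4 × 8 GB for memory.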
The first or "Cray" era of supercomputing, from the 1960s through the early 1990s, saw the evolution of hardware to support this simple, easy-to-use ideal by increasing processor speed, pipelining an instruction stream, processing vectors with a single instruction, and finally adding processors for a program held in the single-memory computer. By the early 1990s evolution of a single computer toward the ideal had stopped: clock speeds reached a few GHz, and the number of processors accessing a single memory through interconnection was limited to a few dozen. Still, the limited-scale, multiple-processor shared memory is likely to be the most straightforward to program and use!

Fortunately, in the mid-1980s the "killer microprocessor" arrived, demonstrating cost effectiveness and unlimited scaling just by interconnecting increasingly powerful computers. Unfortunately, this multicomputer era has required abandoning both the single memory and the single sequential program ideal of Fortran. Thus "supercomputing" has evolved from a hardware engineering design challenge of the single (mono-memory) computer of the Seymour Cray era (1960-95) to a software engineering design challenge of creating a program to run effectively using multicomputers. Programs first operated on 64 processing elements (1983), then 1000 elements (1987), and now 10 million (2016) processing elements in thousands of fully distributed (mono-memory) computers in today's multicomputer era. So in effect, today's high performance computing (HPC) nodes are like the supercomputers of a decade ago, as processing elements have grown 36% per year from 1000 computers in 1987 to 10 million processing elements (contained in 100,000 computer nodes).

High Performance Computing is the essential guide and reference for mastering supercomputing, as the authors enumerate the complexity and subtleties of structuring for parallelism, creating, and running these large parallel and distributed programs.
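As an aside, the "36% per year" growth figure quoted above can be sanity-checked from its two endpoints. This is a back-of-envelope sketch based only on the numbers in the foreword:

```python
# Implied compound annual growth of processing elements, from the
# foreword's endpoints: 1000 elements in 1987, 10 million in 2016.
start_elements = 1_000
end_elements = 10_000_000
years = 2016 - 1987  # 29 years

rate = (end_elements / start_elements) ** (1 / years) - 1
print(f"{rate:.1%}")  # 37.4% -- close to the rounded "36% per year" in the text
```

The exact value depends on which years are taken as endpoints, so the text's rounded figure is consistent with this estimate.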
For example, the largest climate models simulate ocean, ice, atmosphere, and land concurrently, created by a team of a dozen or more domain scientists, computational mathematicians, and computer scientists.

Program creation includes understanding the structure of the collection of processing resources and their interaction for different computers, from multiprocessors to multicomputers (Chapters 2 and 3), and the various overall strategies for parallelization (Chapter 9). Other topics include synchronization and message-passing communication among the parts of parallel programs (Chapters 7 and 8), additional libraries that form a program (Chapter 10), file systems (Chapter 18), long-term mass storage (Chapter 17), and components for the visualization of results (Chapter 12). Standard benchmarks for a system give an indication of how well your parallel program is likely to run (Chapter 4). Chapters 15 and 16 introduce and describe the techniques for controlling accelerators and special hardware cores, especially GPUs, attached to nodes to provide an extra two orders of magnitude more processing per node. These attachments are an alternative to the vector processing units of the Cray era, and are typified by the Compute Unified Device Architecture, or CUDA, model and standard to encapsulate parallelism across different accelerators.

Unlike the creation, debugging, and execution of programs that run interactively on a personal computer, smartphone, or within a browser, supercomputer programs are submitted via batch processing control. Running a program requires specifying to the computer the resources and conditions for controlling your program with batch control languages and commands (Chapter 5), getting the program into a reliable and dependable state through debugging (Chapter 14), checkpointing, i.e., saving intermediate results on a timely basis as insurance for the computational investment (Chapter 20), and evolving and enhancing a program's efficacy through performance monitoring (Chapter 13).
Chapter 21 concludes with a forward look at the problems and alternatives for moving supercomputers, and the ability to use them, to petascale and beyond. In fact, the only part of HPC not described in this book is the incredible teamwork and evolution of team sizes for writing and managing HPC codes. However, the most critical aspect of teamwork resides with the competence of the individual members. This book is your guide.

Gordon Bell
October 2017

Preface

THE PURPOSE OF THIS TEXTBOOK

High performance computing (HPC) is a multidisciplinary field combining hardware technologies and architecture, operating systems, programming tools, software, and end-user problems and algorithms. Acquiring the necessary concepts, knowledge, and skills for capable engagement within HPC routinely involves an apprenticeship at one of a few rarefied sites with the essential experts, facilities, and mission objectives. Whether one's goals are associated with specific end-user domains such as science, engineering, medicine, or commercial applications, or focused on the enabling systems' technologies and methodologies that make supercomputing effective, the entry-level practitioner must embrace a wide range of distinct but interrelated and interdependent areas that require an understanding of their synergies to yield the necessary expertise. The study material could easily encompass a dozen or more books and manuals, but even together they would not deliver the necessary perspective that fully embodies the field as a whole and guides the student in pursuit of an effective path to achieve sufficient expertise.
This textbook is designed to bridge the gap between myriad sources of narrow focus and the need for a single source that spans and interconnects the range of disciplines comprising the HPC field. It is an entry-level text requiring a minimum of prerequisites, but provides a full understanding of the domains and their mutual effects that make supercomputing an interdisciplinary field. From a practical point of view, this textbook builds valuable and specific skill-sets for parallel programming, debugging, performance monitoring, system resource usage and tools, and result visualization, among other useful techniques. These skills are provided in the reinforcing context of basic foundational concepts of prolonged relevance, and knowledge of detailed attributes of hardware and software system components more likely to evolve over time.

The textbook is chartered as support for a single-semester course for beginners to prepare themselves for a diversity of roles in supercomputing to pursue their chosen professional career goals. It is appropriate for future computational scientists who are dedicated to the use of supercomputers to solve science, engineering, or societal-domain applications, among others. It provides a base-level description of possible target capabilities for system designers and engineers in hardware and software. It also is a foundation for those who wish to proceed as researchers in supercomputing itself, as an introductory presentation of conventional systems and practices as well as a representation of the challenges facing this exciting domain of exploration. The book is equally appropriate for those engaged in supporting supercomputing environments, such as data centers and system administrators, operators, and management. In informing future professionals, the textbook can be used in multiple ways.
It serves as a reference work of basic information for supercomputing. It provides a sequence of lecture content for classroom delivery. It supports a hands-on approach with substantial examples, all of which can be executed on parallel computers, and exercises to guide students as they learn by doing. It makes clear where skill-sets and training are presented, with an easy-to-learn tutorial style. Concepts are presented in a detailed but accessible form to establish the "why" of methods conveyed and assist future users in decision-making based on fundamental truths, factors, and sensitivities. Finally, this book unifies within the same context the many sets of facts associated with the multiplicity of subdisciplines that in combination make up the field of supercomputing.

ORGANIZATION OF THIS BOOK

This textbook serves as a bridge between the reader's initial curiosity, interests, and requirements in HPC and the ultimate knowledge, capabilities, and proficiency to be acquired through its study. It is a starting point for those in pursuit of a number of different possible professional paths that share a common foundation in the nature and use of these state-of-the-art systems. Whether the reader intends ultimately to be able to build hardware or software systems, use such systems as a critical tool in the pursuit of other fields in science, engineering, commerce, or security, conduct research to devise future means of pushing the state of the art in HPC, or administer, manage, and maintain HPC systems for other users, the textbook is structured to create a seamless flow of topics, each benefiting from those preceding while contributing to the foundations supporting those following. Thus the book presents its major subjects in an order that provides early basic skills of HPC use even as it conveys underlying concepts upon which a deeper understanding of these complex systems and their use is based. Where necessary, an introductory view of a topic is given with enough information to consider other topics that are dependent, only to return in greater depth in later chapters.
The readers' understanding and capabilities are ratcheted up through incremental enhancement across the diversity of interrelated topical areas.

The textbook is about computing performance. For current and next-generation systems, this means the use and exploitation of workload parallelism to achieve scalability and the means of managing data to achieve efficiency of operation. The four principal overarching subject domains are listed below.

• System hardware architecture and enabling technologies.
• Programming models, interfaces, and methods.
• System software environments, support, and tools.
• Parallel algorithms and distributed data structures.

This would suggest an obvious pedagogical organization of the textbook based on a logical flow. But there is another dimension to HPC: alternative strategies for organizing and coordinating parallelism and data management, and the roles of each of the component layers that contribute to them. This book presents four major strategies.

• Job-stream parallelism, throughput, or capacity computing.
• Communicating sequential processes, or message passing.
• Multiple-threaded shared memory.
• SIMD or graphics processing unit (GPU) accelerated.

From a pedagogical perspective, the authors wish to convey three kinds of information to facilitate the learning process and hopefully also the enjoyment of the reader. At the foundational level are the concepts that establish understanding of the underlying principles that guide the form and function of HPC. There is a lot of basic information as well as some cultural (who, what, when) facts making up the necessary collection of knowledge that provides the framework (scaffolding) of the field. Finally, there are the skill-sets that teach how to do things. While admittedly not orthogonal to each other, the textbook approaches the presentation of all the material in each case as one of these three forms. For example, chapters with headings that begin "The Essential…" (such as "The Essential OpenMP") are crafted as skills modules with a tutorial presentation style for easiest learning.
While the mixing of concepts and knowledge is unavoidable, separate sections emphasize one or the other. The importance of this distinction is that while much of the knowledge about this rapidly evolving field will change, and even become obsolete in some cases, the basic concepts offered are invariant with time and will serve the reader with strong long-term understanding even as the details of some specific machine or language may become largely irrelevant over time.

This textbook is organized first according to the four separate models of parallel computation, and then for each model according to the underlying concepts, the relevant knowledge with an emphasis on system architectures that support them, and the skills required to train the reader in how each class of system is programmed. In preparation for this approach, some initial material, including the introductory chapter, provides the basic premises and context upon which the textbook is established. Each of the four parallel computing models is described in terms of concepts, knowledge details, and programming skills. But while this covers a large part of the useful information needed to understand and program HPC systems, it misses some of the cross-cutting topics related to environments and tools that are an important, even pervasive, aspect of the full context of a system that makes it truly useful beyond the limits of an idealized beginner's viewpoint. After all, the intent of the textbook is to give the reader an effective working ability to take advantage of supercomputers in the professional workplace for diverse purposes. Thus a number of important and useful tools and methods of their use are given in an effective order. Finally, the reader is given a clear picture of the wide field of HPC, and where within this broader context the subject matter of this book fits. This can be used to guide planning for future pursuits and more advanced courses selected in part based on readers' ultimate professional goals. The overall structure and flow of this textbook are summarized below.

I.
INTRODUCTORY AND BASIC IDEAS (CHAPTERS 1 AND 4)

These chapters provide a firm grounding in the basics, including an introduction to the domains of execution models, architecture concepts, performance and parallelism metrics, and the dominant class of parallel computing systems (commodity clusters). They give a first experience with running parallel programs through the use of a special kind of benchmarks that allow measurement and comparisons among different HPC systems. It is here that a sense of the history, the evolution of the contributing ideas, and the culture of the field is first given to the reader.

II. THROUGHPUT COMPUTING FOR JOB-STREAM PARALLELISM (CHAPTERS 5 AND 11)

Although among the simplest ways to take advantage of parallel computers, throughput computing (also referred to as capacity computing) as widely used is sufficient for many objectives and workflows. It can also prove to be among the most efficient, as it usually exhibits the most coarse-grained tasks and a minimum of control overheads. Widely used middleware that manages job-stream workloads, such as SLURM and PBS, is given in tutorial form for both independent jobs and related sets, such as parameter sweeps and Monte Carlo simulations.

III. SHARED-MEMORY MULTITHREADED COMPUTING (CHAPTERS 6 AND 7)

One of the dominant models of user parallel processing is task (or thread) parallelism in the context of shared memory. All the user data can be directly accessed by any of the user threads, and sequential consistency is assumed by hardware cache coherence. This part of the book describes this parallel execution model, the characteristics of shared-memory multiprocessors, and the OpenMP parallel programming language.
IV. MESSAGE-PASSING COMPUTING (CHAPTER 8)

For truly scalable parallel computing that may employ a million cores or more on a single application, the distributed-memory architecture and communicating sequential processes execution model is the dominant approach. This part of the book builds on topics associated with the nodes used for SMPs and the cluster approach previously described for throughput computing by adding the semantics of message passing, collective operations, and global synchronization. It is in this section that the message-passing interface (MPI) is taught, the single most widely employed programming interface for scalable science and engineering applications.

V. ACCELERATING GPU COMPUTING (CHAPTERS 15 AND 16)

For certain widely used dataflow patterns, higher-level structures of specialized cores can provide exceptional performance and energy efficiency. Such subsystems, classified in the most general sense as "accelerators," can speed up applications by many times, sometimes by over an order of magnitude. Also referred to as GPGPUs, these often take the form of attached array processors, but in some cases are being integrated within single-socket packages or even the same die. This part of the textbook describes GPU structures, available products, and programming, with an emphasis on one programming interface, OpenACC.

VI.
BUILDING SIGNIFICANT PROGRAMS (CHAPTERS 9, 10, AND 12-14)

By this point in the book the reader is well acquainted with the primary modes of HPC, knows the rules for the principal programming interfaces, and has hands-on experience with making basic parallel functions work within these frameworks. But for more complicated, more sophisticated, more useful, and frankly more professional supercomputing programs, a number of additional methods and tools are required. This segment of the textbook takes the HPC novice from the beginner level to that of useful apprentice. Several key topics and skills are introduced here to give the student the necessary abilities to be useful in system design and application. First among these is a broad array of parallel algorithms for a diverse set of needs. Many of these are already made available in collections known as "libraries" that can save the application developer an enormous amount of time, if appropriately used. To get a program from its first draft to its final correct and efficient form requires a combined approach involving parallel debugging for correctness of answers and performance optimization through operation monitoring. Tools and methods for both are presented here, including the detailed skill-sets required. Finally, HPC runs tend to produce enormous amounts of data, as much as terabytes or petabytes of results in a single execution. Scientific visualization, the producing of images or even movies from such massive datasets, is the only practical way to achieve understanding of the results of a technical computing simulation. Examples of widely used tools for this purpose are presented, with essential techniques to make them useful.
