Tools for High Performance Computing 2011: Proceedings of the 5th International Workshop on Parallel Tools for High Performance Computing, September 2011, ZIH, Dresden

165 Pages·2012·4.451 MB·English

Tools for High Performance Computing 2011
Proceedings of the 5th International Workshop on Parallel Tools for High Performance Computing, September 2011, ZIH, Dresden

Holger Brunst · Matthias S. Müller · Wolfgang E. Nagel · Michael M. Resch
Editors

Editors

Holger Brunst
Matthias S. Müller
Wolfgang E. Nagel
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Technische Universität Dresden
01062 Dresden
Germany

Michael M. Resch
Höchstleistungsrechenzentrum Stuttgart (HLRS)
Universität Stuttgart
Nobelstraße 19
70569 Stuttgart
Germany

Front cover figure: MPI communication pattern of a 3D cloud simulation on 512 CPU cores with dynamic load balancing. Hilbert space-filling curves are used for the distribution of the simulation data.

ISBN 978-3-642-31475-9
ISBN 978-3-642-31476-6 (eBook)
DOI 10.1007/978-3-642-31476-6
Springer Heidelberg New York Dordrecht London
Library of Congress Control Number: 2012948084
Mathematics Subject Classification (2010): 68-06, 68Q85, 68Q60, 68N19, 68U99, 94A99, 68N99

© Springer-Verlag Berlin Heidelberg 2012

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer.
Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

While the advice and information in this book are believed to be true and accurate at the date of publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

Preface

In the pursuit of maintaining exponential growth in the performance of high-performance computers, the HPC community is currently targeting Exascale systems. The initial planning for Exascale already started when the first Petaflop system was delivered. Many challenges need to be addressed to reach the required performance level. Scalability, energy efficiency, and fault-tolerance need to be increased by orders of magnitude. The goal can only be achieved when advanced hardware is combined with a suitable software stack. In fact, the importance of software is rapidly growing. As a result, many international projects focus on the necessary software. The International Exascale Software Project (IESP), the European Exascale Software Initiative (EESI), and the Virtual Institute for High Productivity Supercomputing (VI-HPS) are examples. They all share the view that tools are components of the software stack.

The Parallel Tools Workshop that took place in Dresden on September 26–27, 2011, is the fifth in a series of workshops that started in 2007 at the High Performance Computing Center Stuttgart (HLRS).
The goal of this series is to bring together tool developers and users from science and industry in an interactive environment. Researchers and tool developers from science and industry were invited to the workshop, which attracted scientists from all over the world.

This year's presentations covered the fields of System Management, Parallel Debugging, and Performance Analysis and came from a wide range of scientific and industrial tool developers. These include tools from vendors such as Allinea, ClusterVision, Intel, Rogue Wave Software, and SysFera, as well as research institutions, including Technische Universität Dresden, Universität Erlangen, University of Oregon, and Rice University. Contributions from research and computer centers came from Barcelona Supercomputing Center, Research Center Jülich, Karlsruhe Institute of Technology, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, Pacific Northwest National Laboratory, and the High Performance Computing Center Stuttgart.

Dresden, Germany
Holger Brunst
Matthias S. Müller
Wolfgang E. Nagel
Michael M. Resch

Contents

1 Creating a Tool Set for Optimizing Topology-Aware Node Mappings
Martin Schulz, Abhinav Bhatele, Peer-Timo Bremer, Todd Gamblin, Katherine Isaacs, Joshua A. Levine, and Valerio Pascucci
1.1 Motivation and Background
1.2 Gaining Insight from Multiple Perspectives
1.2.1 The HAC Model
1.2.2 Domains Relevant for Node Mappings
1.2.3 Mapping Communication to Hardware Data
1.3 Creating a Flexible Measurement Environment
1.3.1 Concurrent Measurements Using PNMPI
1.3.2 Gathering Data in the Hardware Domain
1.3.3 Gathering Data in the Communication Domain
1.3.4 Phase Attribution
1.3.5 Data Storage in Structured YAML Files
1.3.6 Investigating Best Case Mappings
1.3.7 Approximating Collectives
1.4 Preliminary Results
1.5 Related Work
1.6 Conclusions and Future Work
References

2 Using Sampling to Understand Parallel Program Performance
Nathan R. Tallent and John Mellor-Crummey
2.1 Introduction
2.2 Call Path Profiling
2.3 Pinpointing Scaling Bottlenecks
2.4 Blame Shifting
2.4.1 Parallel Idleness and Overhead in Work Stealing
2.4.2 Lock Contention
2.4.3 Load Imbalance
2.5 Call Path Tracing
2.6 Data-Centric Performance Analysis
2.7 Conclusions
References

3 likwid-bench: An Extensible Microbenchmarking Platform for x86 Multicore Compute Nodes
Jan Treibig, Georg Hager, and Gerhard Wellein
3.1 Introduction
3.2 Related Work
3.3 Architecture
3.4 Benchmark .ptt File Format
3.5 Command Line Syntax
3.6 Examples
3.6.1 Identifying Bandwidth Bottlenecks
3.6.2 Characterizing ccNUMA Properties
3.7 Conclusion and Outlook
References

4 An Open-Source Tool-Chain for Performance Analysis
Kevin Coulomb, Augustin Degomme, Mathieu Faverge, and François Trahay
4.1 Introduction
4.2 Related Work
4.3 Instrumenting Applications with EZTRACE
4.3.1 Tracing the Execution of an Application
4.3.2 Instrumenting an Application
4.4 Creating Trace Files with GTG
4.4.1 Overview of GTG
4.4.2 Interaction Between GTG and EZTRACE
4.5 Analyzing Trace Files with VITE
4.5.1 A Generic Trace Visualizer
4.5.2 Displaying Millions of Events
4.6 Evaluation
4.6.1 Overhead of Trace Collection
4.6.2 NAS Parallel Benchmarks
4.7 Conclusion and Future Work
References

5 Debugging CUDA Accelerated Parallel Applications with TotalView
Chris Gottbrath and Royd Lüdtke
5.1 Introduction
5.1.1 HPC Debugging Challenges
5.1.2 CUDA and Heterogeneous Acceleration Architectures
5.1.3 Challenges Introduced by CUDA
5.1.4 The TotalView Debugger
5.2 TotalView for CUDA
5.2.1 Previous Experience: Cell
5.2.2 The NVIDIA GPU Architecture and CUDA
5.2.3 The TotalView Model: Extended for CUDA
5.2.4 Challenges and Features

6 Advanced Memory Checking Frameworks for MPI Parallel Applications in Open MPI
Shiqing Fan, Rainer Keller, and Michael Resch
6.1 Introduction
6.2 Overview of Debugging Tools
6.2.1 Valgrind
6.2.2 Intel Pin
6.3 Design and Implementation
6.3.1 Valgrind Extensions
6.3.2 MemPin
6.4 Memory Checks in Parallel Application
6.4.1 Pre-communication Checks
6.4.2 Post-communication Checking
6.5 Performance Implications
6.6 Detectable Error Classes and Findings from Actual Applications
6.7 Conclusion
References

7 Score-P: A Joint Performance Measurement Run-Time Infrastructure for Periscope, Scalasca, TAU, and Vampir
Andreas Knüpfer, Christian Rössel, Dieter an Mey, Scott Biersdorff, Kai Diethelm, Dominic Eschweiler, Markus Geimer, Michael Gerndt, Daniel Lorenz, Allen Malony, Wolfgang E. Nagel, Yury Oleynik, Peter Philippen, Pavel Saviankou, Dirk Schmidl, Sameer Shende, Ronny Tschüter, Michael Wagner, Bert Wesarg, and Felix Wolf
7.1 Introduction
7.1.1 Motivation for a Joint Measurement Infrastructure
7.2 The SILC and PRIMA Projects
7.3 Score-P
7.4 The Open Trace Format Version 2
7.4.1 The SIONlib OTF2 Substrate
7.5 The CUBE4 Format and GUI
7.6 The OPARI2 Instrumentor for OpenMP
7.7 The On-Line Access Interface
7.8 Early Evaluation
7.8.1 Run-Time Measurement Overhead
7.8.2 Trace Format Memory Consumption
7.9 Conclusion and Outlook
References

8 Trace-Based Performance Analysis for Hardware Accelerators
Guido Juckeland
8.1 Introduction
8.2 Related Work
8.3 Gathering Accelerator Related Performance Information
8.3.1 Accelerator APIs
8.3.2 Tracking Accelerator Events
8.3.3 Including Accelerator Specific Data
8.4 Example: Integration of CUDA and OpenCL into VampirTrace/Vampir
8.5 Summary and Future Work
References

9 Folding: Detailed Analysis with Coarse Sampling
Harald Servat, Germán Llort, Judit Giménez, Kevin Huck, and Jesús Labarta
9.1 Introduction
9.2 Folding: Instrumentation and Sampling
9.2.1 The Hardware Counters
9.2.2 The Callstack
9.3 Example of Usage
9.4 Validation of the Results
9.5 Related Work
9.6 Conclusions and Future Directions
References

10 Advances in the TAU Performance System
Allen Malony, Sameer Shende, Wyatt Spear, Chee Wai Lee, and Scott Biersdorff
10.1 Introduction
10.2 Instrumentation of GPU Accelerated Code
10.2.1 Synchronous Method
10.2.2 Event Queue Method
10.2.3 Callback Method
10.2.4 TAU Performance System Implementation
