Tools for High Performance Computing

Proceedings of the 2nd International Workshop on Parallel Tools for High Performance Computing, July 2008, HLRS, Stuttgart

Editors: Michael Resch, Rainer Keller, Valentin Himmler, Bettina Krammer, Alexander Schulz

Höchstleistungsrechenzentrum Stuttgart (HLRS)
Nobelstr. 19
70569 Stuttgart
Germany

Front cover figure: Visualisation of Parallel Tools for HPC, Vampir, Totalview, Acumem, Kcachegrind and the NEC SX-8

ISBN 978-3-540-68561-6
e-ISBN 978-3-540-68564-7
DOI 10.1007/978-3-540-68564-7
Library of Congress Control Number: 2008927892
Mathematics Subject Classification (2000): 68-06, 68N18, 68Q85, 68Q60, 68U99, 94A99

© 2008 Springer-Verlag Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Cover design: WMXDesign, Heidelberg
Printed on acid-free paper
springer.com

Preface

Developing software for current and especially for future architectures will require application and library programmers to know parallel programming techniques. Multi-core processors are already available today, and processors with a dozen and more cores are on the horizon.
The major driving force in hardware development, the game industry, has already shown interest in using parallel programming paradigms, such as OpenMP, for further developments. Therefore developers have to be supported in the even more complex task of programming for these new architectures.

HLRS has a long-standing tradition of providing its user community with the most up-to-date software tools. Additionally, important research and development projects are worked on at the center: among the software packages developed are the MPI correctness checker Marmot, the OpenMP validation suite and the MPI implementations PACX-MPI and Open MPI. All of these software packages are being extended in the context of German and European community research projects, such as ParMA, the InterActive European Grid (I2G) project and the German Collaborative Research Center (Sonderforschungsbereich 716). Furthermore, industrial collaborations, i.e. with Intel and Microsoft, allow HLRS to bring its software to production-grade readiness.

In April 2007, a European project on Parallel Programming for Multi-core Architectures, in short ParMA, was launched, with a major focus on providing and developing tools for parallel programming. This project is funded through the ITEA initiative and involves partners from industry and research, from application providers and tool developers, such as platform provider Bull, Allinea with its parallel debugger DDT, the Center for Information Services and High Performance Computing (ZIH) with the parallel performance analyser Vampir-NG and the Central Institute for Applied Mathematics (ZAM) with Kojak/Scalasca.

As a spin-off of all these activities, the 1st Parallel Tools Workshop was held on 7-9 July 2007 at the High-Performance Computing Center Stuttgart (HLRS). Participants from research and developers from science and industry were invited to this interactive workshop, which attracted 67 scientists from all over the world.
The focus was on presentations on the various tools, but also on giving hands-on sessions to demonstrate the strengths of each tool.

With this year's 2nd Parallel Tools Workshop on July 7-8, HLRS wants to offer its industrial and scientific user community precisely this information in the form of a thorough publication on the software packages, again ranging from debugging tools to performance analysis and best practices in integrated development environments for parallel platforms. The papers of this workshop are presented here. Last year's workshop brought together software developers from the US, Germany, France and Great Britain, and we expect an even wider audience this year.

This year's contributions are in the fields of Integrated Development Environments, Parallel Debugging and Performance Analysis tools from a wide range of scientific and industrial tool developers. This includes tools from vendors such as Cray, Intel, IBM, Sun, Acumem, Allinea and Totalview, as well as research institutions, including the University of Oregon, Technical University of Dresden and the Research Center in Juelich.

Stuttgart, April 2008
Michael Resch, Rainer Keller, Valentin Himmler, Bettina Krammer, Alexander Schulz

Contents

I Integrated Development Environments

Sun HPC ClusterTools™ 7+: A Binary Distribution of Open MPI
Terry Dontje, Don Kerr, Daniel Lacher, Pak Lui, Ethan Mallove, Karen Norteman, Rolf Vandevaart, and Leonard Wisniewski
  1 Introduction
  2 History
  3 Sun-Driven features
  4 Sun Product Activity
  5 Pros and Cons
  6 Future work and conclusions
  References

An Integrated Environment For the Development of Parallel Applications
Gregory R. Watson and Craig E. Rasmussen
  1 Introduction
  2 Challenges
  3 Architecture
  4 A Simple Case Study
  5 Future Directions
  6 Conclusion
  References

Debugging MPI Programs on the Grid using g-Eclipse
Christof Klausecker, Thomas Köckerbauer, Robert Preissl, and Dieter Kranzlmüller
  1 Introduction
  2 Related Work
  3 Overview of g-Eclipse Approach
  4 Remote Builder
  5 Grid Application Launchers
  6 Trace Viewer
  7 Conclusions and Future Work
  References

II Parallel Communication and Debugging

Enhanced Memory debugging of MPI-parallel Applications in Open MPI
Shiqing Fan, Rainer Keller, and Michael Resch
  1 Introduction
  2 Overview of Memcheck
  3 Design and Implementation
  4 Performance Implications
  5 Detectable error classes and findings in actual applications
  6 Conclusion and future work
  References

MPI Correctness Checking with Marmot
Bettina Krammer, Tobias Hilbrich, Valentin Himmler, Blasius Czink, Kiril Dichev, and Matthias S. Müller
  1 Introduction
  2 Related Work
  3 Design of Marmot
  4 Collaboration with other tools
  5 Experiences with real Applications
  6 How to install and use Marmot
  7 Conclusion and Future Work
  References

Memory Debugging in Parallel and Distributed Applications
Chris Gottbrath
  1 Introduction
  2 The Challenges of Memory Debugging in Parallel Development
  3 Classifying Memory Errors
  4 Detecting Memory Leaks
  5 The MemoryScape Debugger
  6 MemoryScape Architecture
  7 MemoryScape Features
  8 MemoryScape Usage Tips
  9 MemoryScape User Case Study: SIMULIA Uses MemoryScape to Find and Fix Bugs Quickly
  10 Future MemoryScape Product Plans
  11 Conclusion

III Performance Analysis Tools

Sequential Performance Analysis with Callgrind and KCachegrind
Josef Weidendorfer
  1 Introduction
  2 Callgrind: a Call-Graph building Online Cache Simulator
  3 KCachegrind: Profile Visualization
  4 Usage Example
  5 Future Development
  References

Improving Cache Utilization Using Acumem VPE
Erik Hagersten, Mats Nilsson and Magnus Vesterlund
  1 Introduction
  2 Throughput Study of SPEC CPU2006
  3 First Generation Performance Tools Based on Hardware Counters
  4 Enter: The New Performance Tool
  5 Utilization Study of the Worst SPEC CPU2006 Applications
  6 Tuning Example: 179.art
  7 Tuning Example: Revisiting the Throughput Applications
  8 Conclusion
  References

Parallel Performance Analysis Tools

The Vampir Performance Analysis Tool-Set
Andreas Knüpfer, Holger Brunst, Jens Doleschal, Matthias Jurenz, Matthias Lieber, Holger Mickler, Matthias S. Müller, and Wolfgang E. Nagel
  1 Introduction
  2 Performance Analysis via Profiling or Tracing
  3 Instrumentation with VampirTrace
  4 Run-Time Measurement and Event Recording
  5 Trace Visualization with Vampir and VampirServer
  6 Related Work
  7 Conclusions and Future Work
  References

Usage of the SCALASCA toolset for scalable performance analysis of large-scale parallel applications
Felix Wolf, Brian J. N. Wylie, Erika Ábrahám, Daniel Becker, Wolfgang Frings, Karl Fürlinger, Markus Geimer, Marc-André Hermanns, Bernd Mohr, Shirley Moore, Matthias Pfeifer, and Zoltán Szebenyi
  1 Introduction
  2 Overview
  3 Instrumentation and Measurement
  4 Trace Analysis
  5 Understanding Performance Behavior
  6 Outlook
  References

Evolution of a Parallel Performance System
Allen D. Malony, Sameer Shende, Alan Morris, Scott Biersdorff, Wyatt Spear, Kevin Huck, and Aroon Nataraj
  1 Introduction
  2 TAU Performance System Design and Architecture
  3 TAU Instrumentation
  4 TAU Measurement
  5 TAU Analysis
  6 Conclusion and Future Work
  References

Cray Performance Analysis Tools
Luiz DeRose, Bill Homer, Dean Johnson, Steve Kaufmann, and Heidi Poxon
  1 Introduction
  2 The Cray Performance Analysis Tools
  3 Conclusions and Future Work
  References

Index

List of Contributors

Erika Ábrahám
Daniel Becker
Scott Biersdorff
Holger Brunst
Blasius Czink
Luiz DeRose
Kiril Dichev
Jens Doleschal
Terry Dontje
Shiqing Fan
Wolfgang Frings
Karl Fürlinger
Markus Geimer
Chris Gottbrath
Erik Hagersten
Marc-André Hermanns
Tobias Hilbrich
Valentin Himmler
Bill Homer
Kevin Huck
Dean Johnson
Matthias Jurenz
Steve Kaufmann
Rainer Keller
Don Kerr
Christof Klausecker
Andreas Knüpfer
Thomas Köckerbauer
Bettina Krammer
Dieter Kranzlmüller
Daniel Lacher
Matthias Lieber
Pak Lui
Ethan Mallove
Allen D. Malony
Holger Mickler
Bernd Mohr
Shirley Moore
Alan Morris
Matthias S. Müller
Wolfgang E. Nagel
Aroon Nataraj
Mats Nilsson
Karen Norteman
Matthias Pfeifer
Heidi Poxon
Robert Preissl
Craig E. Rasmussen
Michael Resch
Sameer Shende
Wyatt Spear
Zoltán Szebenyi
Rolf Vandevaart
Magnus Vesterlund
Gregory R. Watson
Josef Weidendorfer
Leonard Wisniewski
Felix Wolf
Brian J. N. Wylie