Beyond the NAS Parallel Benchmarks: Measuring Dynamic Program Performance and Grid Computing Applications

Rob F. Van der Wijngaart (Computer Sciences Corp.)
Rupak Biswas, Michael Frumkin (NAS Division)
Huiyu Feng (George Washington University)
NASA Ames Research Center, Moffett Field, CA

Contents
1. A brief history of NPB
2. What is (not) being measured by NPB?
3. Irregular dynamic applications (UA Benchmark)
4. Wide area distributed computing (NAS Grid Benchmarks - NGB)

A Brief History of NPB
Goal: Measure performance of modern (parallel) architectures running scientific apps
Contents: 5 kernels, 3 pseudo apps (implicit CFD)
Approach:
• NPB1: Paper-and-pencil specs
• NPB2: Source code implementations (F77/C/MPI)
• PBN: Source code implementations (HPF/OpenMP/Java)

A Brief History of NPB (cont'd)
Kernels:
• EP Random-number generator
• IS Integer sort
• CG Conjugate gradient
• MG Multigrid method for Poisson eqn
• FT Spectral method (FFT) for Laplace eqn
Pseudo apps:
• BT ADI; Block-Tridiagonal systems
• SP ADI; Scalar Pentadiagonal systems
• LU Lower-Upper symmetric Gauss-Seidel

What is tested in NPB2?
[Table: benchmarks (EP, IS, CG, MG, FT, SP, LU) versus the system features each one stresses (functions, network bandwidth, network latency, memory bandwidth, cache); the check marks are not recoverable.]

What is not tested in NPB2?
• Dynamically changing memory accesses (→ UA)
• Irregular accesses (→ UA)
• False sharing / cache coherence costs
• System software
• Grid computing middleware (→ NGB)
• Fault tolerance (→ NGB)
• Wide area (public) network bandwidth/latency (→ NGB)
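To make the first two untested items concrete, here is a minimal sketch in Python that contrasts a fixed-stride sweep over a structured grid with a sweep driven by neighbour lists that change when the mesh is refined. It is an illustration only, not code from NPB or UA; the function names (structured_sweep, unstructured_sweep, refine_edge) and the 1-D chain mesh are assumptions made for this example.

```python
def structured_sweep(u):
    """Average each interior point with its two neighbours.

    The neighbours are always at i-1 and i+1, so the memory access
    pattern is fixed and known before the program runs.
    """
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def unstructured_sweep(u, neighbours):
    """Average each node with the nodes named in its neighbour list.

    Accesses go through an index list (indirect addressing), so they
    are irregular and depend on the current mesh connectivity.
    """
    out = []
    for i, nbrs in enumerate(neighbours):
        vals = [u[i]] + [u[j] for j in nbrs]
        out.append(sum(vals) / len(vals))
    return out


def refine_edge(u, neighbours, a, b):
    """Insert a midpoint node on edge (a, b) and rewire the neighbour lists.

    After this call the access pattern of unstructured_sweep has changed,
    which is the 'dynamically changing memory accesses' case.
    """
    m = len(u)
    u.append(0.5 * (u[a] + u[b]))
    neighbours[a] = [m if j == b else j for j in neighbours[a]]
    neighbours[b] = [m if j == a else j for j in neighbours[b]]
    neighbours.append([a, b])
    return m
```

In this sketch, calling refine_edge between two sweeps changes which entries of u each node reads, something no static NPB2 grid does.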
Unstructured Adaptive Mesh Refinement (UA)
• Structured, static grids → fixed, static memory access
• Message passing → private address spaces
• Choice of parallelization paradigm not (very) important
Unstructured, adaptive mesh refinement (Biswas, Oliker)
• No solver - only refine, (re)partition, and remap
• Implementations:
  > MPI - SGI Origin, Cray T3E
  > OpenMP - SGI Origin
  > Multithreading - Tera (Cray) MTA

UA (cont'd)
[Figure: Different kinds of triangular refinement]

UA (cont'd)
[Figure: Overview of PLUM]

UA (cont'd)
Overall assessment

Paradigm         Code increase   Memory increase   Scalability   Portability
MPI              100%            70%               Medium        High
OpenMP           10%             5%                None          Medium
Multithreading   2%              7%                High          Low

UA (cont'd)
Complete, practically useful benchmark:
• Includes solver
• Is 3-dimensional
• Does coarsening in addition to refinement
• Spends little time on (re)partitioning/(re)mapping
• Is well load balanced
• Requires no data files
• Is compact: NPB2 apps ~ 5,000 lines (SP, BT, LU)

UA (cont'd)
Flame propagation problem (Feng & Mavriplis):
• Scalar transport eqn: ∂T/∂t + V·∇T = (coefficient)·ΔT − e^(−…)·F(T) (coefficient and exponent illegible in source)
• Velocity field given: linear problem
• Rectangular domain
• Rectangular nonconforming elements
• Spectral elements of relatively high order (5th)
• Mixed explicit/implicit time integration
• Spatial refinement/coarsening
• Interface ops cheap compared to overall scheme

UA (cont'd)
[Figure: flame front separating hot and cold regions, with 3 levels of refinement]

NAS Grid Benchmarks (NGB)
Goal: Measure performance of modern distributed systems, emphasizing Grid computing. Also gauges functionality.
Contents: 4 compound tasks: 3 pseudo apps (implicit CFD), 2 kernels, all from NPB2.
Approach:
• NGB1: Paper-and-pencil specs.
• NGB2: Source code implementations (NPB2/PBN + Globus/Legion/Corba/Condor/Java ...)
NGB (cont'd)
Must define boundaries because:
• Grid concept/environment/infrastructure not well defined
• Grid has very many software/hardware components
• Grid has complex functional hierarchy
• Grid is a time/organization/application dependent target
Cannot freeze benchmark implementation because:
• No clear Grid environment winner
• No clear picture of representative Grid application

NGB (cont'd)
• Provide synthetic Grid app. for scientific computing:
  > data flow graph coupling NPB codes
• Specify:
  > abstract services necessary
  > problem sizes (classes): S, W, A, B, C
• Do not specify:
  > fault recovery reqs
  > security reqs
• Measure and report turnaround time
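
The idea of a data flow graph whose end-to-end turnaround time is measured can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the NGB specification: the function run_dataflow, the task names, and the toy callables standing in for NPB codes are hypothetical, and the tasks run sequentially on one machine rather than on distributed Grid resources.

```python
import time
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_dataflow(tasks, deps, source_input):
    """Run tasks in dependency order and report the turnaround time.

    tasks: dict mapping task name -> callable taking a list of inputs
    deps:  dict mapping task name -> list of predecessor task names
           (every task must appear as a key; sources map to [])
    source_input: value fed to tasks that have no predecessors
    """
    # static_order() yields each node after all of its predecessors.
    order = list(TopologicalSorter(deps).static_order())
    results = {}
    start = time.perf_counter()
    for name in order:
        # A task consumes the outputs of its predecessors; source
        # tasks consume the initial input instead.
        inputs = [results[p] for p in deps[name]] or [source_input]
        results[name] = tasks[name](inputs)
    turnaround = time.perf_counter() - start
    return results, turnaround


# Hypothetical three-node graph: A feeds B, and both feed C.
# Real NGB nodes would be NPB codes; these callables are stand-ins.
example_tasks = {
    "A": lambda xs: sum(xs) + 1,
    "B": lambda xs: sum(xs) * 2,
    "C": lambda xs: sum(xs),
}
example_deps = {"A": [], "B": ["A"], "C": ["A", "B"]}
```

Calling run_dataflow(example_tasks, example_deps, 1) returns the per-task results together with one wall-clock figure for the whole graph, which mirrors the slide's choice to report turnaround time rather than per-task rates.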