ebook img

Charm++ for Productivity and Performance PDF

28 Pages·2012·0.86 MB·English
by  
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Charm++ for Productivity and Performance

Charm++ for Productivity and Performance A Submission to the 2011 HPC Class II Challenge Laxmikant V. Kale Anshu Arya Abhinav Bhatele Abhishek Gupta Nikhil Jain Pritish Jetley Jonathan Lifflander Phil Miller Yanhua Sun Ramprasad Venkataraman Lukasz Wesolowski Gengbin Zheng Parallel Programming Laboratory DepartmentofComputerScience UniversityofIllinoisatUrbana-Champaign May 7, 2012 Kaleetal. (PPL,Illinois) Charm++forProductivityandPerformance May7,2012 1/24 Benchmarks Required Dense LU Factorization 1D FFT Random Access Optional Molecular Dynamics Barnes-Hut Kaleetal. (PPL,Illinois) Charm++forProductivityandPerformance May7,2012 2/24 Metrics: Performance Our Implementations in Charm++ Code Machine Max Cores Best Performance LU Cray XT5 8K 67.4% of peak FFT IBM BG/P 64K 2.512 TFlop/s RandomAccess IBM BG/P 64K 22.19 GUPS Cray XE6 16K 1.9 ms/step (125K atoms) MD IBM BG/P 64K 11.6 ms/step (1M atoms) Barnes-Hut IBM BG/P 16K 27×109 interactions/s Kaleetal. (PPL,Illinois) Charm++forProductivityandPerformance May7,2012 3/24 Remember: Lots of freebies! automatic load balancing, fault tolerance, overlap, composition, portability Metrics: Code Size Our Implementations in Charm++ Code C++ CI Total1 Libraries LU 1231 418 1649 BLAS FFT 112 47 159 FFTW, Mesh RandomAccess 155 23 178 Mesh MD 645 128 773 Barnes-Hut 2871 56 2927 TIPSY C++ Regular C++ code CI Parallel interface descriptions and control flow DAG 1Required logic, excluding test harness, input generation, verification, etc. Kaleetal. (PPL,Illinois) Charm++forProductivityandPerformance May7,2012 4/24 Metrics: Code Size Our Implementations in Charm++ Code C++ CI Total1 Libraries LU 1231 418 1649 BLAS FFT 112 47 159 FFTW, Mesh RandomAccess 155 23 178 Mesh MD 645 128 773 Barnes-Hut 2871 56 2927 TIPSY C++ Regular C++ code CI Parallel interface descriptions and control flow DAG Remember: Lots of freebies! automatic load balancing, fault tolerance, overlap, composition, portability 1Required logic, excluding test harness, input generation, verification, etc. Kaleetal. (PPL,Illinois) Charm++forProductivityandPerformance May7,2012 4/24 Block-centric (cid:73) Algorithm from a block’s perspective (cid:73) Agnostic of processor-level considerations Separation of concerns (cid:73) Domain specialist codes algorithm (cid:73) Systems specialist codes tuning, resource mgmt etc LU: Capabilities Composable library (cid:73) Modular program structure (cid:73) Seamless execution structure (interleaved modules) Kaleetal. (PPL,Illinois) Charm++forProductivityandPerformance May7,2012 5/24 Separation of concerns (cid:73) Domain specialist codes algorithm (cid:73) Systems specialist codes tuning, resource mgmt etc LU: Capabilities Composable library (cid:73) Modular program structure (cid:73) Seamless execution structure (interleaved modules) Block-centric (cid:73) Algorithm from a block’s perspective (cid:73) Agnostic of processor-level considerations Kaleetal. (PPL,Illinois) Charm++forProductivityandPerformance May7,2012 5/24 LU: Capabilities Composable library (cid:73) Modular program structure (cid:73) Seamless execution structure (interleaved modules) Block-centric (cid:73) Algorithm from a block’s perspective (cid:73) Agnostic of processor-level considerations Separation of concerns (cid:73) Domain specialist codes algorithm (cid:73) Systems specialist codes tuning, resource mgmt etc Lines of Code Module-specific CI C++ Total Commits Factorization 517 419 936 472/572 83% Mem. Aware Sched. 9 492 501 86/125 69% Mapping 10 72 82 29/42 69% Kaleetal. (PPL,Illinois) Charm++forProductivityandPerformance May7,2012 5/24 LU: Decomposition Previously factored block U U block A Active panel block T Trailing submatrix block A U U U U A T T T T A T T T T A T T T T A T T T T Column being factored Kaleetal. (PPL,Illinois) Charm++forProductivityandPerformance May7,2012 6/24 LU: Pseudo-Synchronous Scheduling Time Proc 1 Proc 2 Proc 3 Proc 4 Proc 5 Proc 6 Proc 7 ... Rank 1 Proc n update Reduction root Contribute to Reduction up Trailing Update Active Panel reduction tree Kaleetal. (PPL,Illinois) Charm++forProductivityandPerformance May7,2012 7/24

Description:
Barnes-Hut. IBM BG/P. 16K. 27 × 109 interactions/s. Kale et al. (PPL, Illinois). Charm++ for Productivity and Performance. May 7, 2012. 3 / 24
See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.