Performance Study: Abaqus/Standard 6.8-3 Stan Posey Director, Industry and Applications Market Development Panasas, Fremont, CA, USA Bill Loewe, Ph.D. Sr. Applications Engineer Panasas, Fremont, CA, USA Background on Abaqus/Standard Study Motivation Since Apr 2007, SIMULIA and Panasas have made joint investments in a business and technical alliance that ensures Abaqus will fully leverage Panasas PanFS This study demonstrates benefits of Panasas parallel file system and parallel storage for Abaqus/Standard 6.8-3 with tests for both single job and mulit-job computing Considerations Abaqus is an application from SIMULIA -- not a benchmark kernel The FEA model and tests are relevant to customer practice All tests were run on a dedicated system at Panasas The results were validated by SIMULIA SlidSeli d2e 2 PPleleaassee K Keeeepp C Coonnfifdideenntitaial lt oB eCtuwseteonm CerS aCn da nPda Pnaasnaassas 3 Panasas Study on Abaqus/Standard 6.8-3 Abaqus/Standard 6.8-3: Model S4b 5M DOF Non-linear Static Analysis Automotive engine block cylinder head bolt-up SlidSeli d3e 3 PPleleaassee K Keeeepp C Coonnfifdideenntitaial lt oB eCtuwseteonm CerS aCn da nPda Pnaasnaassas Abaqus/Standard I/O Scheme Job Task IO Scheme IO Operation Work Dir: serial IO Read nodes, CSM implicit solver start elements and Abaqus/Standard is element matrix control file generation and direct and single- assembly into global matrix step, with out-of-core Scratch Dir: parallel IO READS and WRITES matrix factor (dominant phase, Factor matrix as much as 85% out-of-core, -- I/O occurs in the sparse of total time, often I/O wait) reads/writes factor phase of the solver . -- this scheme is for static, if . . an eigen (Lanzcos) solution, . . then I/O can be VERY heavy . -- NOTE: Abaqus also has an FBS solve phase, stress recovery, Work Dir: serial IO implicit iterative solver multiple RHS’s Write solution results complete [100’s of GB’s of I/O] SlidSeli d4e 4 PPleleaassee K Keeeepp C Coonnfifdideenntitaial lt oB eCtuwseteonm CerS aCn da nPda Pnaasnaassas Panasas Study on Abaqus/Standard 6.8-3 Features of the Hardware System Configurations Features of Penguin cluster configuration: Processors: 2.3GHz QC AMD Opteron 8 nodes, 64 cores Nodes: 8 x 2 Sockets x 4 cores; 2 GB/core 10 GigE Interconnect: 10GigE CISCOSYSTEMS Local FS: Ext3, single drive per node, 160 GB SATA, 7200 RPM 10 GigE Features of the Panasas storage system: 3 shelves: 1 director + 10 storage blades NOTE: Panasas total 30 Each shelf 10 TB, total of 30 TB TB in 12U, installed and operational in just 1 hr! SlidSeli d5e 5 PPleleaassee K Keeeepp C Coonnfifdideenntitaial lt oB eCtuwseteonm CerS aCn da nPda Pnaasnaassas S4b Performance for Single Core Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS PanFS -- Num Ops PanFS -- Total Time 18000 Times for Single Job on a Single Core Local FS -- Num Ops Local FS -- Total Time 11% 15108 13674 Lower s 12000 12732 12770 d is n 5M DOF o better c Engine Block e S n i e NOTE: PanFS m NOTE: Num-Ops Ti 6000 times within 1% 11% Advantage al Difference is IO t in Total Time o T vs. Local FS 0 1 Job x 1 Core x 1 Node SlidSeli d6e 6 PPleleaassee K Keeeepp C Coonnfifdideenntitaial lt oB eCtuwseteonm CerS aCn da nPda Pnaasnaassas Numerical vs. IO Computational Profile Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS Local FS -- IO-Ops Job Profiles of Numerical Ops % vs. IO % 18000 Local FS -- Num-Ops PanFS -- IO-Ops PanFS -- Num-Ops 15108 13674 IO – 16% IO – 7% Lower s 12000 is d n better o c 5M DOF e Engine Block S n 97% i me 93% 84% NOTE: PanFS Ti 6000 Numerical Numerical l 11% Advantage a t Operations Operations o T in Total Time 50% vs. Local FS r e v ol S 0 1 Job x 1 Core x 1 Node SlidSeli d7e 7 PPleleaassee K Keeeepp C Coonnfifdideenntitaial lt oB eCtuwseteonm CerS aCn da nPda Pnaasnaassas S4b Performance for 1 Core x 8 Nodes Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS PanFS -- Num Ops PanFS -- Total Time 18000 Average Times of 8 Simultaneous Jobs Local FS -- Num Ops Local FS -- Total Time 5% 15064 14373 s d Lower n 12000 12641 12654 o is c 5M DOF e better Engine Block S n i e m NOTE: PanFS i NOTE: N-Ops T l 6000 times within 1% 5% Advantage a ot Difference is IO T in Total Time vs. Local FS 0 8 Jobs x 1 Core x 8 Nodes Average of 8 Jobs | Each on 1 Core | Each on 1 Node | 7 Cores Idle on Each Node SlidSeli d8e 8 PPleleaassee K Keeeepp C Coonnfifdideenntitaial lt oB eCtuwseteonm CerS aCn da nPda Pnaasnaassas Numerical vs. IO Computational Profile Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS Local FS -- IO-Ops Job Profiles of Numerical Ops % vs. IO % 18000 Local FS -- Num-Ops PanFS -- IO-Ops PanFS -- Num-Ops 15064 14373 IO – 19% s IO – 12% d Lower n o 12000 is c e better S n 5M DOF i Engine Block e m 97% i T 88% 81% l a NOTE: PanFS ot 6000 Numerical Numerical T 5% Advantage Operations Operations in Total Time 50% vs. Local FS r e v ol S 0 8 Jobs x 1 Core x 8 Nodes Average of 8 Jobs | Each on 1 Core | Each on 1 Node | 7 Cores Idle on Each Node SlidSeli d9e 9 PPleleaassee K Keeeepp C Coonnfifdideenntitaial lt oB eCtuwseteonm CerS aCn da nPda Pnaasnaassas S4b Performance for 8 Cores x 1 Node Abaqus/Standard 6.8-3: Comparison of PanFS vs. Local FS PanFS -- Total Time 6000 Singe Job on Single 8-Core Node Local FS -- Total Time 5495 s d n 46% o c Lower e 4000 S is n 5M DOF i better Engine Block e m i T 2946 l a NOTE: PanFS t o T 2000 46% Advantage in Total Time vs. Local FS 0 1 Job x 8 Cores x 1 Node SlidSeli d1e0 10 PPleleaassee K Keeeepp C Coonnfifdideenntitaial lt oB eCtuwseteonm CerS aCn da nPda Pnaasnaassas
Description: