Sustained Simulation Performance 2021
Michael M. Resch • Johannes Gebert
Hiroaki Kobayashi • Wolfgang Bez
Editors
Sustained Simulation
Performance 2021
Proceedings of the Joint Workshop
on Sustained Simulation Performance,
University of Stuttgart (HLRS)
and Tohoku University, 2021
Editors

Michael M. Resch
High-Performance Computing Center
University of Stuttgart, HLRS
Stuttgart, Germany

Johannes Gebert
High-Performance Computing Center
University of Stuttgart
Stuttgart, Germany

Hiroaki Kobayashi
Graduate School of Information Sciences
Tohoku University
Aoba-ku, Japan

Wolfgang Bez
NEC High Performance Computing Europe GmbH
Düsseldorf, Germany
ISBN 978-3-031-18045-3
ISBN 978-3-031-18046-0 (eBook)
https://doi.org/10.1007/978-3-031-18046-0
Mathematics Subject Classification (2020): 65-XX, 65Exx, 65Fxx, 65Kxx, 68-XX, 68Mxx, 68Uxx, 68Wxx, 70-XX, 70Fxx, 70Gxx, 76-XX, 76Fxx, 76Mxx, 92-XX, 92Cxx
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2023
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors
or the editors give a warranty, expressed or implied, with respect to the material contained herein or for
any errors or omissions that may have been made. The publisher remains neutral with regard to
jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
The Workshop on Sustained Simulation Performance was held online at HLRS in March 2021 and in hybrid mode at the Cyberscience Center, Tohoku University, in December 2021. The collaboration between the High-Performance Computing Center Stuttgart, Tohoku University and NEC has been marked by the Covid pandemic, during which we demonstrated our ability to adapt to new situations and continue our partnership. Ultimately, we are happy to continue the relationship that began in 2004 with the establishment of what we then called the 'Teraflop Workshop'. While the homepage still carries this name, the workshop has evolved into the Workshop on Sustained Simulation Performance, with more than 30 events on two continents.
Perhaps we were able to adapt so quickly to the pandemic because the field of high-performance computing has always evolved rapidly. While HPC systems were for many years designed as single-processor vector machines, they are now large cluster systems with fast interconnects, typically combining a variety of processors and accelerators, among them still vector processors. Climate and weather simulation is one of the scientific fields with a particularly high demand for computing power, and research has shown that we need to use our resources more sustainably. This is at odds with the ever larger size and ever higher energy consumption of modern HPC systems. At the same time, however, there has been a tremendous increase in efficiency. The contributions in this book and the upcoming workshops will help to continue and accelerate the development of fast and efficient high-performance computing.
We would like to thank all the contributors and organizers of this book and the Sustained Simulation Performance Workshops. We especially thank Prof. Hiroaki Kobayashi for his close collaboration over the past years and look forward to intensifying our cooperation in the future.
Stuttgart, Germany
December 2021

Michael M. Resch
Johannes Gebert
Contents
Supercomputer for Quest to Unsolved Interdisciplinary Datascience (SQUID) and its Five Challenges
Susumu Date, Yoshiyuki Kido, Yuki Katsuura, Yuki Teramae and Shinichiro Kigoshi

Simulating Molecular Docking on the SX-Aurora TSUBASA Vector Engine
Leonardo Solis-Vasquez, Erich Focht and Andreas Koch

Simulation of Field-induced Chiral Phenomena in Inhomogeneous Superconductivity
Hirono Kaneyasu, Kouki Otsuka, Singo Haruna, Shinji Yoshida and Susumu Date

Exploiting Hybrid Parallelism in the LBM Implementation Musubi on Hawk
Harald Klimach, Kannan Masilamani and Sabine Roller

MPI Continuations And How To Invoke Them
Joseph Schuchart and George Bosilca

Xevolver for Performance Tuning of C Programs
Hiroyuki Takizawa, Shunpei Sugawara, Yoichi Shimomura, Keichi Takahashi and Ryusuke Egawa

Scalability Evaluation of the CFD Solver CODA on the AMD Naples Architecture
Michael Wagner
Supercomputer for Quest to Unsolved
Interdisciplinary Datascience (SQUID) and its
Five Challenges
Susumu Date, Yoshiyuki Kido, Yuki Katsuura, Yuki Teramae and Shinichiro Kigoshi

Susumu Date and Yoshiyuki Kido
Cybermedia Center, Osaka University, Japan, e-mail: date@cmc.osaka-u.ac.jp

Yuki Katsuura, Yuki Teramae and Shinichiro Kigoshi
Department of Information and Communication Technology Services, Osaka University, Japan
Abstract The Cybermedia Center at Osaka University started the operation of a supercomputing system named Supercomputer for Quest to Unsolved Interdisciplinary Datascience (SQUID) in May 2021. SQUID is a hybrid supercomputing system composed of three kinds of heterogeneous compute nodes and delivers a theoretical peak performance of 16.591 PFlops. This paper gives an overview of the architecture and structure of SQUID and then explains the five challenges which we set in designing SQUID: Tailor-made computing, HPC and HPDA integration, Cloud-interlinked and -synergized, Secure computing environment, and Data aggregation environment. After that, the future issues to be tackled through the actual operation of SQUID are described.
1 Introduction
Recently, the globalization of academic research has been accelerating. It requires the aggregation and integration of computing, data and even human resources. Accompanying this globalization, it is becoming more common that researchers and scientists affiliated with different organizations work together as a team to solve a common scientific problem [4]. This trend is not exceptional to Osaka University but is observed worldwide. For higher productivity in globalized academic research, information and communication technologies (ICT) will take on a role of greater importance. For this reason, the Cybermedia Center at Osaka University (CMC), which is a supercomputing center in charge of the administration and management of ICT infrastructures including the supercomputing system for research and education [2], is expected to implement supercomputing systems well prepared for the rapid expansion and globalization of academic research.
Furthermore, high performance data analysis (HPDA) has been increasing in importance. Today, many researchers and scientists are enthusiastic about applying data analysis techniques, characterized by keywords such as artificial intelligence (AI), machine learning (ML) and deep learning (DL), to large data sets to solve their scientific problems. This enthusiasm for and expectation of HPDA have triggered the use of supercomputing systems by researchers who have never used such systems before. As a result, newly developed supercomputing systems are expected to accommodate the new computing needs and requirements derived from HPDA as well as those of traditional high performance computing (HPC).
Against this background, the CMC developed and installed a new supercomputing system named Supercomputer for Quest to Unsolved Interdisciplinary Datascience (SQUID) [13] in May 2021, in the hope that the new supercomputing system helps researchers and scientists working on research for the advancement of academia and industry to explore unsolved data-scientific problems. In realizing SQUID, we set five challenges toward our envisaged next-generation supercomputing systems. In this paper, we briefly introduce SQUID and then explain these five challenges.
This paper is structured as follows. Section 2 briefly introduces the hardware
configuration of SQUID. In Section 3 the five challenges set in realizing SQUID
are explained. After that, Section 4 describes the issues to be tackled. Section 5
summarizes this paper.
2 Hardware configuration of SQUID
Figure 1 shows the exterior view of SQUID installed at the CMC. SQUID is a hybrid supercomputing system composed of three different node architectures: general-purpose CPU, GPU and vector nodes. All processors and accelerators deployed on the compute nodes of SQUID are cooled with DLC (direct liquid cooling) for stable operation and high performance delivery. For the parallel file system, the Lustre-based DDN EXAScaler was adopted to provide users with a single, fast disk image of 20 PB of HDD and 1.2 PB of NVMe SSD. Mellanox InfiniBand HDR (200 Gbps) was adopted to connect all three types of compute nodes and the Lustre parallel file system (Fig. 2). As the topology, a combination of Dragonfly+ and fat-tree was adopted. In the Dragonfly+ topology for the CPU nodes, the 1,520 CPU nodes are divided into three groups (513, 513 and 494 nodes), and each group is connected to each of the other two groups with 95 IB HDR links (19 Tbps). The CPU nodes within each group are connected in a fat-tree topology to take advantage of full-bisection bandwidth. On the other hand, the GPU nodes, the vector nodes, the file servers for the Lustre file system, and other management servers for SQUID are connected in a fat-tree topology to utilize full-bisection bandwidth. The spine switches of this fat-tree interconnect for the GPU nodes, the vector nodes, the file servers and the other management servers are connected to each group of the CPU nodes with 36 IB HDR links (7.2 Tbps).

Fig. 1: Exterior view of SQUID.

Fig. 2: Overview of the interconnect on SQUID.
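The aggregate inter-group bandwidths quoted above follow directly from the 200 Gbps per-link rate of InfiniBand HDR. The short Python sketch below simply reproduces this arithmetic; the node and link counts are taken from the text, and everything else is plain multiplication.

```python
# Back-of-the-envelope check of the inter-group bandwidths quoted above.
# InfiniBand HDR provides 200 Gbps per link.
HDR_LINK_GBPS = 200

# The CPU nodes are split into three Dragonfly+ groups.
cpu_groups = [513, 513, 494]
assert sum(cpu_groups) == 1520  # 1,520 CPU nodes in total

# Each pair of CPU groups is connected with 95 HDR links.
cpu_group_to_group_tbps = 95 * HDR_LINK_GBPS / 1000
print(f"CPU group <-> CPU group:      {cpu_group_to_group_tbps:.1f} Tbps")  # 19.0 Tbps

# The fat-tree spine (GPU nodes, vector nodes, file and management servers)
# is connected to each CPU group with 36 HDR links.
spine_to_cpu_group_tbps = 36 * HDR_LINK_GBPS / 1000
print(f"Fat-tree spine <-> CPU group: {spine_to_cpu_group_tbps:.1f} Tbps")  # 7.2 Tbps
```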
Table 1 shows the system performance and configuration of SQUID. The theoretical peak performance of SQUID reaches 16.591 PFlops. The major portion of SQUID, as the table indicates, is the cluster of general-purpose CPU nodes; SQUID has 1,520 CPU
Table 1: System performance and configuration of SQUID.

Compute nodes (16.591 PFlops total):
  General-purpose CPU nodes, 1,520 nodes (8.871 PFlops)
    CPU: Intel Xeon Platinum 8368 (Ice Lake, 2.4 GHz, 38 cores) x2
    Memory: 256 GB
  GPU nodes, 42 nodes (6.797 PFlops)
    CPU: Intel Xeon Platinum 8368 (Ice Lake, 2.4 GHz, 38 cores) x2
    Memory: 512 GB
    GPU: NVIDIA HGX A100 8-GPU board (Delta)
  Vector nodes, 36 nodes (0.922 PFlops)
    CPU: AMD EPYC 7402P (Rome, 2.8 GHz, 24 cores) x1
    Memory: 128 GB
    Vector: NEC SX-Aurora TSUBASA Type 20A x8

Storage: DDN EXAScaler (Lustre)
    HDD: 20.0 PB
    NVMe SSD: 1.2 PB

Interconnect: Mellanox InfiniBand HDR (200 Gbps)

Front-end nodes:
  Front-end node for HPC, 4 nodes
    CPU: Intel Xeon Platinum 8368 (Ice Lake, 2.4 GHz, 38 cores) x2
    Memory: 256 GB
  Front-end node for HPDA, 4 nodes
    CPU: Intel Xeon Platinum 8368 (Ice Lake, 2.4 GHz, 38 cores) x2
    Memory: 512 GB
  Secure front-end node, 1 node
    CPU: Intel Xeon Platinum 8368 (Ice Lake, 2.4 GHz, 38 cores) x2
    Memory: 256 GB
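As a rough cross-check of Table 1, the short sketch below recomputes the peak-performance figures from per-device peaks. The per-device numbers (Xeon Platinum 8368 with two AVX-512 FMA units at 2.4 GHz, the A100 FP64 Tensor Core peak of 19.5 TFlops, and roughly 3.07 TFlops for an SX-Aurora TSUBASA Type 20A vector engine) are common vendor figures assumed here for illustration; they are not stated in this paper.

```python
# Rough reconstruction of the peak-performance figures in Table 1.
# The per-device peaks below are common vendor numbers, assumed here for
# illustration; they are not taken from the paper.

# Intel Xeon Platinum 8368: 38 cores x 2.4 GHz x 32 DP FLOP/cycle (2 AVX-512 FMA units)
xeon_8368_tflops = 38 * 2.4e9 * 32 / 1e12                 # ~2.92 TFlops per socket

# NVIDIA A100: FP64 Tensor Core peak
a100_tflops = 19.5

# NEC SX-Aurora TSUBASA Type 20A vector engine (double precision)
ve_type20a_tflops = 3.07

# AMD EPYC 7402P: 24 cores x 2.8 GHz x 16 DP FLOP/cycle (2 AVX2 FMA units)
epyc_7402p_tflops = 24 * 2.8e9 * 16 / 1e12                # ~1.08 TFlops

cpu_part = 1520 * 2 * xeon_8368_tflops / 1000                        # ~8.87 PFlops
gpu_part = 42 * (2 * xeon_8368_tflops + 8 * a100_tflops) / 1000      # ~6.80 PFlops
vec_part = 36 * (epyc_7402p_tflops + 8 * ve_type20a_tflops) / 1000   # ~0.92 PFlops

print(f"CPU nodes:    {cpu_part:6.3f} PFlops")
print(f"GPU nodes:    {gpu_part:6.3f} PFlops")
print(f"Vector nodes: {vec_part:6.3f} PFlops")
print(f"Total:        {cpu_part + gpu_part + vec_part:6.3f} PFlops")
```

Under these assumptions, each partial sum lands within rounding distance of the 8.871, 6.797 and 0.922 PFlops listed in Table 1, and the total matches the quoted 16.591 PFlops.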
Fig. 3: The internal architecture of a SQUID CPU node.
nodes in total. Figure 3 shows the block diagram of a CPU node. Each CPU node has two Intel Xeon Platinum 8368 (Ice Lake, 2.4 GHz, 38 cores) processors and 256 GB of memory. The two processors are connected with three UPI (Ultra Path Interconnect) links, and eight channels of 16 GB DDR4-3200 DIMMs are connected to