Draft version January 23, 2017
Preprint typeset using LaTeX style AASTeX6 v. 1.0

PATCHWORK: A MULTIPATCH INFRASTRUCTURE FOR MULTIPHYSICS/MULTISCALE/MULTIFRAME FLUID SIMULATIONS

Hotaka Shiokawa^{1,2}, Roseanne M. Cheng^2, Scott C. Noble^3, Julian H. Krolik^2
^1 Harvard-Smithsonian Center for Astrophysics, 60 Garden Street, Cambridge, MA 02138, USA
^2 Department of Physics and Astronomy, Johns Hopkins University, Baltimore, MD 21218, USA
^3 Department of Physics and Engineering Physics, University of Tulsa, Tulsa, OK 74104, USA

ABSTRACT

We present a "multipatch" infrastructure for numerical simulation of fluid problems in which sub-regions require different gridscales, different grid geometries, different physical equations, or different reference frames. Its key element is a sophisticated client-router-server framework for efficiently linking processors supporting different regions ("patches") that must exchange boundary data. This infrastructure may be used with a wide variety of fluid dynamics codes; the only requirement is that their primary dependent variables be the same in all patches, e.g., fluid mass density, internal energy density, and velocity. Its structure can accommodate either Newtonian or relativistic dynamics. The overhead imposed by this system is both problem- and computer cluster architecture-dependent. Compared to a conventional simulation using the same number of cells and processors, the increase in runtime can be anywhere from negligible to a factor of a few; however, one of the infrastructure's advantages is that it can lead to a very large reduction in the total number of zone-updates.

Keywords: methods: numerical — hydrodynamics — MHD

1. INTRODUCTION

1.1. Importance of multiphysics/multiscale/multiframe capability

Many important physical processes involve heterogeneous systems in which the nature of the matter in different regions exhibits strong contrasts. The material may vary in its characteristic internal length or time scales, or in its local geometric symmetry. There may even be contrasts in which the physical mechanisms of importance differ between regions: for example, chemical reactions or self-gravity may be significant in some, but not all, locations. These regions may also move with respect to one another, perhaps changing shape as they do. When the regions have relative motion, the fact that physics is often most concisely described in a system's mean rest-frame means that no single rest-frame is appropriate for the entire problem. At the same time, interactions between these regions may nonetheless demand simulation methods allowing data from one region to inform the behavior of another.

Problems exhibiting strong contrasts in length or time scales are called "multiscale problems". We will also use this term to include contrasts in grid symmetry. In multiscale problems, numerical methods work best with different grid systems in different regions, perhaps contrasting in resolution, perhaps in symmetry, e.g., polar vs. Cartesian. Those involving disparities in mechanisms are called "multiphysics problems". In these problems, one must solve entirely different equations: those of magnetohydrodynamics (MHD) rather than those of hydrodynamics, or with or without transport processes such as viscosity or diffusion. Problems with internal frame shifts we dub "multiframe problems". For these, it would be desirable to translate the equations from one frame to another in different portions of the calculation. Astrophysics is rich in problems to which at least one, and sometimes all, of these labels apply, and therefore at least one, and sometimes all, of the difficulties, both technical and conceptual, that they pose.

To illustrate their significance, consider a few examples. The topic that initially motivated our work is the mechanics of accretion around a binary system. For us, the partners in the binary are supermassive black holes (see, e.g., Schnittman 2013), but much about this problem changes little whether the binary comprises a pair of proto-stars (e.g., as imaged and analyzed by Mayama et al. 2010) or a pair of black holes. In this situation, there are widely disparate scales because the structure of the circumbinary disk varies on the scale a of the orbital separation, whereas most of the accretion power emerges at the inner edges of the disks orbiting around the individual masses (often called "mini-disks"), which could be a great deal smaller. In addition, throughout these disks, even to define the saturation level of the MHD turbulence supplying the accretion torques requires treatment of lengthscales small compared to the disk scale heights, which could be considerably smaller than the radial scale. There are also differences in the symmetry of well-designed local grids. Because angular momentum transport in accretion disks is slow compared to the orbital time, it is very important that there be little numerical momentum diffusion; this fact demands a grid mimicking the symmetry of their nearly-circular flow. However, such a grid would be polar and centered on the binary center-of-mass for the circumbinary disk, whereas for each mini-disk it would be polar and centered on the object whose gravity is most important for that disk. A single Cartesian grid for the entire system would likely produce an intolerable level of numerical diffusion. This binary accretion problem is also one that demands multiple reference frames, for much the same reason it requires multiple sub-grids with different symmetries. The physics of the circumbinary flow is easiest to grasp in the center-of-mass frame; that of the individual mini-disks in the frame of each member of the binary. Thus, one would like to be able to divide this calculation into at least three different zones, each with its own grid and reference frame.

Another example may be found in tidal disruption of stars by supermassive black holes, which has become a subject of great interest in recent years as numerous examples have been found (in both optical/UV, e.g., Gezari et al. (2009) and Arcavi et al. (2014), and in X-rays: Auchettl et al. (2016)). This is a multiscale problem because it is necessary both to resolve dynamics within the star as it is broken apart and to follow the fluid dynamics of the debris as it gradually accretes onto the black hole. Measured in terms of gravitational radii r_g defined relative to the mass M of the black hole (r_g ≡ GM/c^2), main sequence stars are ∼ 1 M_6^{-1} r_g in diameter, where M_6 is the black hole mass in units of 10^6 M_⊙. Thus, to follow their break-up requires cells ≪ 0.1 M_6^{-1} r_g in size. On the other hand, the debris orbits have semi-major axes ∼ 10^3 M_6^{-1/3} r_g, so that the fluid motion after stellar break-up takes place on a much larger scale. Nonetheless, despite this dramatic scale contrast, the break-up of the star is inextricably tied to the much larger-scale debris motion. It is also a multiphysics problem because stellar self-gravity, not surprisingly, is of the essence so long as the star stays in one piece, but after its matter is spread sufficiently widely, it becomes inconsequential. And it is a multiframe problem because the mechanics of a nearly hydrostatic star are definitely best viewed in the star's frame, where the fluid velocities are small, whereas the mechanics of an accretion flow are far more easily understood in the black hole frame. Its multiframe nature also creates a contrast in grid symmetry because coherent stars are most naturally treated in a spherical coordinate system whose origin is the center of the star, whereas flow around a black hole is best described in a cylindrical or spherical coordinate system whose origin is the center of the black hole. Thus, this problem, too, involves all these categories of complication.
1.2. State of the art and its limitations

The desirability of overcoming these challenges has not gone entirely unnoticed by the computational community, and a number of partial solutions have been developed. Adaptive Mesh Refinement (AMR) methods can dynamically adjust spatial resolution to follow local lengthscales (Berger & Oliger 1984; Berger & Colella 1989); a closely-related scheme, overlapping moving grids, can be used to follow a coherent region with a distinct spatial scale or symmetry. Multiprogram/Multidata (MPMD) methods (Barney 2017) offer a convenient way to evolve different regions according to the different mechanisms acting in them¹. We will call these separate regions "patches", hence the name "multipatch" for our general approach. Because each patch is run by an independent program under MPMD, their only interaction is through the exchange of boundary conditions.

¹ In practical terms, users run multiple executables all sharing the same Message Passing Interface (MPI) "MPI_COMM_WORLD" communicator.

For some of these methods, professionally-supported implementations are available. Chombo, for example, is a particularly well-developed AMR package (Adams et al. 2014). It permits the use of two different methods to divide up regions into separate grids, embedded boundaries and mapped multiblocks. There are also relativistic versions of these fixed multiblock methods (Clough et al. 2015; Schnetter et al. 2014). General relativistic hydrodynamics has been treated (Blakely et al. 2015) using the techniques of Overture (Brown et al. 1997), which offers users the option of moving overlapping grids. Numerical relativity calculations can use multiblock infrastructure with AMR, but all dependent variables must be defined with respect to a global Cartesian tensor basis (Pollney et al. 2017). Similarly, there are numerous MPMD systems permitting computation of different physics in different parts of a global system. These range in their applications from linking multiscale fluid simulation to molecular dynamics (Nie et al. 2006) to modeling bloodflow through the brain (Grinberg et al. 2013).

Another approach to solving the problems of multiple scales, but not multiple physics, is the use of moving unstructured grids (e.g., the codes AREPO: Springel 2010 and TESS: Duffell & MacFadyen 2011). Schemes like these very flexibly place resolution where it is required for the hydrodynamics. It has also recently become possible to extend them from hydrodynamics to magnetohydrodynamics (Duffell 2016; Mocz et al. 2016). They do not, however, naturally retain the virtues of conforming to natural symmetries of the problem (e.g., suppressing numerical diffusion by aligning cell axes with the fluid velocity), nor do they readily permit the use of contrasting physics in different regions. With significant effort, it is possible to avoid the first drawback (Duffell 2016), but a new solution must be created for each new problem.

Despite the real successes of all these different schemes, there remain significant barriers to their employment on many kinds of problems. Multiblocks must fit smoothly against one another in a fixed configuration, while embedded boundaries require Cartesian grids. Neither of these allows relative motion of the cell blocks. Most importantly, none of the methods introduced so far achieves the simplification and efficiency gains that arise from following moving regions' physics in their own reference frames.

The advantages of working in the most suitable reference frame can be substantial. Consider, for example, a hydrodynamics problem in which structure A, containing only subsonic motions relative to its own center-of-mass that vary on short lengthscales, moves supersonically within a larger background fluid B with longer gradient scales. If such a problem were treated with a moving grid scheme, the time-step within region A would be severely limited by its supersonic velocity and its small cell sizes; transformation to the moving frame could reduce the number of time-steps required by a large factor. Numerical accuracy would also be substantially improved, as there would be no need to perform numerous close subtractions of velocities in order to find the relative velocities between cells.

Analogous advantages can stem from equation simplification. Suppose, for example, that in the moving region there is a diffusive transport process that is unimportant in the background. Treating this transport process in the moving frame eliminates what would otherwise be a large, unnecessary, advective flux. If the velocity contrast between the regions approaches the relativistic level, treating everything in the background frame introduces serious conceptual problems because true diffusion problems cannot be formulated covariantly; transformation to a frame in which local motions are slow permits a clean use of the local Newtonian limit, in which diffusive transport is mathematically consistent.
1.3. Our innovation

The system we present here, which we call PATCHWORK, is designed to eliminate the limitations due to use of a single reference frame for all patches, while also maximizing its ability to deal simultaneously with issues of multiple scales, multiple grid symmetries, and multiple varieties of local physics. Exploiting the flexibility of the MPMD approach, it utilizes well-defined coordinate transformations informed by relativistic methods (but not restricted to relativistic problems) to simulate heterogeneous systems in which regions requiring independent treatment are regarded as independent processes operating in independent reference frames. Each patch has its own grid, with its own resolution scale and symmetry; among the many benefits offered by independent local coordinate systems, patches give an easy solution to the problems raised by coordinate singularities (e.g., at the origin of a polar system). Each patch also solves its own equations, whatever is necessary to do the job in that region.

The relationships between these frames are defined by coordinate transformations with the ability to eliminate local mean velocities, so as to reap the benefits just described, while retaining a common overall time so that the entire simulation can advance together. This approach creates an important simplification: the same coordinate transformation that relates array index-space to spatial coordinates can also be used to eliminate bulk velocities. This single coordinate transformation also applies to the metric with respect to the coordinates. In addition, provided that the physical quantities entering into the boundary conditions are scalars, vectors, or rank-2 tensors, their transformation from the coordinate system of one region to that of another is well-defined and straightforward. In this way, the free use of arbitrarily curvilinear and arbitrarily discretized coordinates can be combined with the virtues of treating local physics in its most natural reference frame.

However, in order for the patches to exchange boundary conditions at simultaneous times, the time coordinate in all the patches must be the same. In relativistic terms, this means that the coordinate transformations relating patches with relative motion are not Lorentz transformations; as a result, the reference frames of moving patches are in general non-inertial. This policy may be somewhat unfamiliar, but because the equations of physics can all be written in completely covariant fashion, their form under this sort of transformation is well-defined.

Our version of the multipatch system also offers several additional features. Because the separate patches are simulated using independent programs, they can have independent time-steps; because long time-step patches need many fewer updates to traverse the same physical time, when parallelized those patches can be assigned many more cells per processor to achieve better load-balancing. In addition, patches can be added or removed from time to time as conditions change and different demands arise.

To mitigate the complexity and overhead created by inter-patch communications when the computation is parallelized, we have created a client-router-server system that efficiently links the correct processors in each patch to their boundary condition partners in other patches.

Lastly, our package is structured as a "wrapper" to fit around a user-supplied hydrodynamic or magnetohydrodynamic simulation code. The only requirement placed upon these codes is that their primary dependent variables (the ones whose boundary conditions are exchanged between patches) are all the same. In many simulations, they may be the fluid primitive variables, proper rest-mass density, proper internal energy density, and 4-velocity, but that specific list is not a requirement. Thus, different codes can be used in different patches without harm to the system, provided only that they share a common core set of dependent variables.
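As a concrete illustration of the boundary-data transformation just described, consider how vector and tensor data change coordinates between patches. The index notation below is ours, introduced only for illustration; the paper states simply that scalars, vectors, and rank-2 tensors transform in the standard, well-defined way.

```latex
% Sketch (our notation): x^{\nu} are one patch's coordinates, x'^{\mu} another's,
% and \partial x'^{\mu}/\partial x^{\nu} is the Jacobian of the map between them.
v'^{\mu} = \frac{\partial x'^{\mu}}{\partial x^{\nu}}\, v^{\nu},
\qquad
T'^{\mu\nu} = \frac{\partial x'^{\mu}}{\partial x^{\alpha}}
              \frac{\partial x'^{\nu}}{\partial x^{\beta}}\, T^{\alpha\beta},
\qquad
g'_{\mu\nu} = \frac{\partial x^{\alpha}}{\partial x'^{\mu}}
              \frac{\partial x^{\beta}}{\partial x'^{\nu}}\, g_{\alpha\beta}.
```

Because each patch supplies only its own Jacobian relative to the background Cartesian coordinates, a transfer between two patches composes two such maps (sending patch to background, then background to receiving patch), as described in Section 2.2 below.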
2. METHOD

The multipatch method is applicable to any numerical simulation code in which the computational domain is discretized into a fixed set of discrete grid-cells, and the primary dependent variables, the ones exchanged between patches as boundary conditions, are the same in every patch. For development and testing purposes, we have used it with the finite volume general relativistic hydrodynamics code HARM3D (Noble et al. 2009) running in every patch, but it should be consistent with a great many simulation codes. Subject to the stipulation about variable consistency, different codes can run in different patches.

2.1. Overview

PATCHWORK's structure is based on the concept of "patches". A patch is a region of space defined by the user. Locations within it are described by its particular coordinate system and discretized according to its own particular grid. Its physical evolution in time is governed by a particular set of equations, always including the Euler fluid equations, but potentially extensible to the Navier-Stokes equations or the MHD equations, and potentially supplementable by chemical or nuclear reaction networks, a Poisson solver for self-gravity, or other sorts of equations. Evolving an individual patch is the responsibility of an individual process within the MPMD environment. Although it is possible to run the simulation as a single program, using one program for each patch keeps the code simple and conceptually clear. As a result, the method is intrinsically parallelized: there must be at least one processor for each patch.

PATCHWORK coordinates a number of different individual patch processes through the incorporation of several specific routines into the fluid simulation code chosen by the user. One of these routines, the one controlling inter-patch communications, is independent of the individual problem, and is therefore not intended to be altered by users. Others must conform to a standard form, but must be supplied by the user because they are specific to the problem. Not all of these are required for all problems, but three are always necessary. One defines the initial conditions, but differs from conventional initial condition specification only in that the user may need to choose them in a fashion informed by knowledge of how the other patches will be initialized. A second implements the boundary conditions on the outer physical boundary of the problem volume, again an operation occurring in all fluid simulations. The third contains information that is specific to multipatch operation: the coordinate transformation Jacobian between an individual patch's coordinates and the "background coordinates" (see below for definition); the trajectory followed by the patch relative to the background coordinates; and the location of the problem volume boundaries in background coordinates. Individual patches need to know where the overall problem boundaries are in case they encounter these boundaries as they move. Several other multipatch-specific routines are optional and will be described later.
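To fix ideas, the three always-required, user-supplied ingredients could be gathered behind an interface like the following. This is a purely hypothetical sketch written for this summary; PATCHWORK itself consists of routines linked into the user's (typically compiled) fluid code, and none of these names come from the paper.

```python
class PatchProblemSpec:
    """Hypothetical container for the three user-supplied routines described above."""

    def initial_conditions(self, patch_grid):
        """Return the primitive variables at t = 0 on this patch's grid, chosen
        with knowledge of how the other patches will be initialized."""
        raise NotImplementedError

    def outer_boundary_conditions(self, patch_state, t):
        """Apply boundary conditions on the outer physical boundary of the
        problem volume (needed only where this patch touches that boundary)."""
        raise NotImplementedError

    def patch_geometry(self, t):
        """Return the multipatch-specific information: the Jacobian between this
        patch's coordinates and the background coordinates, the patch's trajectory
        in background coordinates, and the problem-volume boundaries."""
        raise NotImplementedError
```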
A system of "background coordinates" underlies the entire region being simulated. The locations and boundaries of all the patches are defined in terms of this system. It is always Cartesian, and its time coordinate is the universal time for all the patches' coordinate systems. Its purpose is both to serve as a reference for positions and to serve as a "common language" for all patches to describe the locations of exchanged data.

Individual patches can have any shape or size. They can be stationary relative to the background coordinates or move. Their internal coordinate systems and grids are entirely independent of all other patches' spatial coordinate systems and grids. It is convenient in many problems to divide the patches into two categories, "global" and "local". Frequently, one patch provides the great majority of boundary condition data for the other patches and occupies all or a large part of the problem volume. When that is the case, that patch is deemed "global", and its reference frame is tied to the frame of the background coordinates. Its internal spatial coordinate system, however, can still be defined independently of the background coordinates. All the other patches are then considered to be "local". Physics in whatever part of the problem volume falling under a local patch is controlled by that patch; when there is a global patch, its process is relieved of all responsibility for computing the time-dependence of the matter within such a covered region.

To make physical sense, there is one smoothly-varying spacetime for the entire problem volume. Its metric, however, can be expressed in terms of any of the specific coordinate systems. Transformations from one coordinate system to another adjust the metric elements accordingly.

Physical consistency likewise demands that all patches are updated according to the same time coordinate and must reach a given value of this time coordinate together. Such synchronization is achieved automatically if all advance with the same time-step. However, as we discuss below (Sec. 2.6), this is not necessary. If some patches can be evolved stably and accurately with a longer time-step than others, it is necessary only for the patches all to be synchronized after one time-step of the patch with the longest step. Note that because there is a single time coordinate for all patches, the coordinate transformations between them are not Lorentz transformations unless the relative velocity between the two patches being linked is zero and both patches are inertial.

In the course of each update, patches bordering on one another must exchange boundary condition data. Accomplishing this step is the core of our system.
2.2. Boundary condition exchange between patches

Within any particular patch, we distinguish three types of boundary zones for individual processors. In the first category, there are boundaries whose ghost zones are covered by other processors in the patch. The second category is boundaries lying on the physical boundary of the problem. The third category is of greatest interest to the multipatch scheme: those whose ghost zones are covered by processors assigned to other patches.

The first two can be handled by conventional means. Here we describe how the boundary information is obtained for ghost zones in the third category. We begin by displaying an example so that readers can easily visualize the issues involved (Fig. 1). In this figure, the fluid's internal energy density is represented by color contours and grid cells are delineated by black lines. We have chosen to follow the fluid mechanics in this example by means of two patches, a finely-resolved Cartesian local patch and a more coarsely-resolved polar global patch whose radial grid is logarithmically-spaced. In the upper panel of the figure, the physical boundary of the Cartesian patch is shown by the inner white box; the area covered by its "ghost zones", the cells needed to establish boundary conditions for the physical region, lies between the two white boxes. The lower panel shows the converse situation: the jagged white contour shows the boundary of territory in the global patch not covered by physical cells of the local patch; the cells between that jagged contour and the white cells are where the global patch needs boundary data.

Figure 1. A snapshot of internal energy density (color contours) and grid-cells in a 3D blast wave simulation. (Upper panel): White squares show the physical boundary (inner) and numerical boundary (outer) of the local patch. Where the local and global patches overlap, only the local grid is shown. (Lower panel): Like the upper panel, but where the patches overlap, only the global grid is shown. The colored cells inside the jagged loop are filled with data interpolated from the local patch to the global patch. The white cells inside the jagged loop are unused when the patches are in this configuration because the local patch updates the physics in their volume. The physical (inner) and numerical (outer) boundaries of the local patch are shown as thin white lines for reference.
The first step is to discover which processors in which patches have the information. To minimize inter-patch communication time, we organize this process to avoid exchanging unused data. Because this procedure is almost independent of whether the patch needing boundary data is a global or a local patch, for this part of the discussion we call them "patch A" and "patch B". We begin by labeling all the zones in patch A (here this happens to be the global patch) with an integer array, as illustrated in Figure 2. The values in this "flag array" denote whether a zone is a ghost zone, and if so, what type of ghost zone. This array must be updated at each time-step if any of the relevant patches move (at each synchronization time-step in the case of heterogeneous time-steps: Sec. 2.6). In the figure, the white zones are in the interior of patch A and have nothing to do with boundary conditions. Gray zones are the zones in patch A completely covered by patch B; they, too, are irrelevant to boundary conditions². A zone in the global patch is considered to be covered by the local patch if its center falls within the local patch's physical region. The red zones are ghost-cells in the patch A grid directly touching physical cells in patch A. Fluxes across their inner (in a topological sense) faces are used in updates of physical patch A cells. Blue zones are the outer layer of ghost-cells; they are used in the internal reconstruction by which cell-center values of physical quantities are extrapolated to the face touching the physical boundary.

² If patch A were a local patch, it would have gray zones only if patch B were another local patch, and patch B took priority over patch A; we have not yet implemented "local-local" boundary data exchange, but plan to do so soon.

Figure 2. The "flags" assigned to the global grid-cells for the same snapshot shown in Figure 1. Red and blue indicate inner and outer ghost-zones (for the global patch), respectively. White cells are ordinary cells in the global patch interior; gray labels global cells covered by the local patch and ignored. The thin gray squares show the physical (inner) and numerical (outer) boundaries of the local patch.

It is important to note that this system is thoroughly agnostic about many of the possible choices made in different codes. Because the coordinates at which the boundary data are needed are determined by the fluid code operating in the requesting patch, it doesn't matter whether that code defines the variables at cell-centers or the centers of cell-faces or anywhere else; it knows the locations at which it needs the information, and it is the job of the responding code (which may be an entirely different one) simply to interpolate its data, no matter how defined in terms of location within cells, to the proper point. The system is even capable of accommodating codes with different numbers of ghost-cells. HARM3D, for example, requires three layers of ghost-cells, but PATCHWORK contains a parameter that can be set to whatever number of layers the user's code needs.

Once patch A determines the locations of its ghost zones' centers, a list of the background coordinates for these locations is sent to all other potential patch B's along with a request for interpolation values at the coordinate locations. Patch B interpolates within its grid in order to find the values at the locations desired by patch A. It then transforms the data from its coordinate system to the background coordinate system, using the coordinate transformation Jacobian linking patch B to the background system. Only then are the boundary data transmitted back to patch A, which transforms them from the background system to its own coordinates. This procedure enables every patch to deal with the incoming coordinate list independently, without knowing anything about other patches. Doing things this way is especially important when patch B moves.

Note that if patch A is the global patch, the first step is done differently. At initialization, the local patches are informed of the cell locations at which the global patch dependent variable data are defined. Because the local patches know their own positions in terms of background coordinates, they can determine on their own what data the global patch needs. This alternate procedure has the virtue of diminishing inter-patch data transmission.
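The flag-array bookkeeping can be made concrete with a small sketch. The code below is our own illustration, not PATCHWORK source: it assumes a 2D global patch, a caller-supplied boolean mask of covered cell centers, and it ignores complications such as domain decomposition and array edges.

```python
import numpy as np

INTERIOR, COVERED, GHOST_INNER, GHOST_OUTER = 0, 1, 2, 3

def flag_global_cells(covered_by_local, n_outer_layers=2):
    """Classify global-patch cells for one snapshot, in the spirit of Figure 2.

    covered_by_local : boolean array, True where a global cell center lies inside
                       the local patch's physical region (assumed to be supplied).
    """
    flags = np.where(covered_by_local, COVERED, INTERIOR)

    # Covered cells that touch an uncovered (interior) cell become inner ghost zones:
    # fluxes across their faces feed updates of physical global-patch cells.
    neighbor_uncovered = np.zeros_like(covered_by_local)
    for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
        neighbor_uncovered |= ~np.roll(covered_by_local, shift, axis=axis)
    flags[covered_by_local & neighbor_uncovered] = GHOST_INNER

    # Successive covered layers behind them become outer ghost zones, used by the
    # global patch's own reconstruction stencil.
    layer = flags == GHOST_INNER
    for _ in range(n_outer_layers):
        grow = np.zeros_like(layer)
        for shift, axis in [(1, 0), (-1, 0), (1, 1), (-1, 1)]:
            grow |= np.roll(layer, shift, axis=axis)
        layer = grow & (flags == COVERED)
        flags[layer] = GHOST_OUTER
    return flags
```

Rebuilding such an array whenever a patch moves (every step, or every synchronization step under heterogeneous time-steps) is what keeps the request lists described above up to date.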
2.3. Interpolation

Although remarked on only briefly in the preceding sub-section, there are a number of subtleties to data interpolation, and multiple mechanisms may be used. In the current version of our system, we use a comparatively simple method, but this could readily be upgraded to something more sophisticated for problems requiring it.

In principle, an arbitrary number of zones could be used to support interpolation to a single point. However, it is generally best for the interpolation stencil to extend away from the point by a number of zones that is no more than the number of ghost-zone layers (usually 2 to 3), in order to avoid complications with parallelization of the code.

For our current method, we employ tri-linear interpolation. We locate the grid corner closest to the interpolation point and define the stencil in 3D to be the centers of all eight cells touching that corner. This method works quite well when the dimensions of the cells in patch A and patch B are comparable (see Sec. 3.3), but can lead to errors when they are not. In some sense, this is unsurprising: if there is structure on the finest scale supported by one of the patches, it cannot be well-represented by a much coarser grid in the other. However, the trouble can also move in the opposite direction, because the eight cells in the finer grid nearest the interpolation point may together cover only a small part of the volume of the ghost-zone in question if its grid is much coarser. Sometimes errors of this latter variety can be substantially reduced by replacing the values in the inner layer of ghost-cells with a wider average over nearby cells. Such an operation effectively magnifies the volume of the finer-scale grid contributing to the coarser-grid ghost-cells.

Without special methods, interpolation does not necessarily conserve quantities. To achieve strict mass (or momentum or energy) conservation in our data interpolation would require identifying all cells that fall within the ghost-cell and summing their contributions. If the ghost-cell boundaries cut obliquely (or, even worse, in a curve) across some of the interpolation cells, one would need to adjust their volumes accordingly. Although this is possible if both global and local patches are in Cartesian coordinates, it becomes a non-trivial mathematical problem once any of the patches are in curvilinear coordinates.

When interpolation that is more nearly conservative is desirable, yet the cell geometry makes rigorous conservation arduous, another avenue may be taken. In this method, one constructs the conserved quantity density for a coarse ghost-cell by averaging the conserved quantity densities for all those cells in the finer-scale patch whose centers are located inside that ghost-cell. This method is not perfect because not all the averaged cells contribute the same volume to the ghost-cell, but when the cell-size contrast is large, there are many small cells entirely contained within the large ghost-cell and only a few at its edges, so the error should be small.
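For reference, the tri-linear interpolation used here reduces to the following kernel. This is a minimal sketch of our own for a uniform Cartesian grid of cell-centered values; PATCHWORK's version must also work through each patch's coordinate map, which is omitted here.

```python
import numpy as np

def trilinear(field, x0, dx, point):
    """Interpolate a 3D array of cell-centered values to an arbitrary point.

    field : values on a uniform grid; field[i, j, k] sits at x0 + (i, j, k) * dx
    point : location of the requesting patch's ghost-zone center (same frame)
    """
    s = (np.asarray(point, dtype=float) - np.asarray(x0)) / np.asarray(dx)
    i0 = np.floor(s).astype(int)   # "lower" member of the 2x2x2 stencil
    f = s - i0                     # fractional offsets, each in [0, 1)

    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((f[0] if di else 1.0 - f[0]) *
                     (f[1] if dj else 1.0 - f[1]) *
                     (f[2] if dk else 1.0 - f[2]))
                value += w * field[i0[0] + di, i0[1] + dj, i0[2] + dk]
    return value
```

The stencil never reaches more than one cell away from the target point, which respects the ghost-zone-layer limit mentioned above.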
2.4. Adding and removing patches

Stationary or moving patches can be added or removed throughout the simulation anywhere within the physical problem volume. This is done using the flag array for the ghost-zones discussed in Section 2.2. Although these flags are most often used to signal the need for data interpolation from overlapping patches, they can also be used to signal the need to interpolate data for other reasons as well, such as removing or introducing a new patch. To remove a local patch, one temporarily changes the flags on all the zones in the global patch covered by the local patch to "ghost zones" so that all of them are filled with interpolated data provided by the local patch. Once these zones are filled with data, one changes the flags back to their normal state. To add a local patch, one creates a new patch process, and in its initial condition sets all its cell flags to indicate they are ghost zones. Just as in the patch removal operation, these cells are then filled with the data they need, and the flags can be reset to normal as soon as that is done. However, the simulation must be stopped immediately after a patch removal or immediately prior to a patch addition because either one demands a new domain decomposition for parallel processor assignment.
2.5. Parallelization and inter-patch communication

One of the most difficult tasks in developing a multipatch code is its parallelization. It requires a sophisticated infrastructure combining two levels of data communication. In one, boundary data exchange within a patch, a single executable exchanges information between its multiple processors exactly in the way made familiar by ordinary parallelized methods. In the other level, boundary data exchange between patches, it is necessary to enable effective data communication when the pairing of processors with overlapping boundaries evolves dynamically, and two independent executables, both running within the MPMD environment, must be coordinated.

To describe how we achieve this, we first define a notation. We label each CPU in a simulation by C^i_j, where i is a patch-ID and j denotes local CPU rank within that patch. We set the patch-ID of the global patch to i = 0 and that of local patches to i = 1, 2, ..., N − 1, where N is the total number of patches in the simulation, including the global patch. The index j runs from 0 to n_i − 1, where n_i is the number of CPUs used for patch i.

Consider a CPU at the edge of a patch, designated C^i_0. This CPU possesses boundary zones that need to be filled with interpolated values. It needs to know which CPUs C^k_j in other patches are lying under these zones (there could be multiple values of j satisfying this criterion, and sometimes multiple values of k). It must also contact them to request interpolation values. The partner CPUs C^k_j, on the other hand, need to know in advance that other CPUs may be contacting them. Because these relationships constantly change if the patches move relative to one another, this information must somehow be updated dynamically, even though the patches may have differing time-steps.
To solve this problem, we construct a client-router-server system, setting up inter-patch communication relationships that can persist throughout the simulation. In its simplest form, one CPU in each patch is chosen to serve as the router, its liaison with all the other patches. Then, when client CPU C^i_j needs information from beyond the boundary of patch i, it transforms the coordinates of the cell-centers in question to background coordinates and broadcasts that list to processors C^k_{r_k}, where the r_k processor in patch k is the designated router for that patch. The router processor in the kth patch then transforms the list from background coordinates to patch k coordinates. If all the cell-centers on the list lie outside patch k, the router replies accordingly. On the other hand, if some of them are inside patch k, the router processor determines which of the other processors working on patch k have responsibility for those cells and distributes the request to those processors. These processors, the servers, interpolate their data to the correct positions, transform the results to background coordinates, and return the results to the router. Finally, the router transmits the information back to the client, CPU C^i_j.

This communication scheme is conceptually simple and easy to code. However, if only a single CPU is given router duties for an entire patch, the communication load is shared very unevenly and the great majority of processors sit idle while waiting for the routers to finish their work. To divide the workload more evenly, we regard all processors as potential routers for their patch and redefine the client-router relationship uniquely for each individual CPU (see Fig. 3). These relationships are defined at the beginning of the simulation and remain unchanged unless patches are added or removed. For example, one may decide that C^1_0 always contacts C^0_0 for any information regarding patch 0, C^1_1 always contacts C^0_1, and so on. The function of the router is unchanged; it still determines which, if any, of the processors on its patch holds the information requested and acts as the go-between connecting clients and servers.

Figure 3. Schematic view of client-router-server relations for the multiple-router scheme. White, blue, and green patches represent the global patch (patch 0), local patch 1, and local patch 2, respectively. Example clients, routers, and servers are marked with C's, R's, and S's, respectively. Data requests are shown by red arrows, data returned by blue arrows. The local patches reside inside the global patch, but they are placed outside the global patch and enlarged for visualization of the information exchange system. The squares in the patches represent CPU domains, not grid-cells. In a possible instance of data exchange, a client CPU in a local patch, C^1_0 (upper left in patch 1), sends its list of ghost zones to its designated router in the global patch, C^0_0 (upper left in patch 0). C^0_0 then communicates with appropriate server CPUs on its patch to collect the requested data and returns the data to its client. Simultaneously, CPU C^1_3 also requests data from the global patch, working with its global patch router C^0_3, which in turn collects the information from the relevant server CPUs and transmits it back to the client. Even while these two patch 1 CPUs communicate with their partners in the global patch, it is possible for a CPU in the global patch, for example C^0_15 (lower right in patch 0), to be a client, requesting data from other patches such as patch 2; in this case, the router is C^2_6.

Although the varying numbers of processors per patch make an exactly even division of labor impossible, a simple assignment scheme can spread it in a reasonably even-handed manner. If C^{p1}_i requires information regarding patch p2, it contacts

router_{p2}(C^{p1}_i) = C^{p2}_{i mod n_{p2}}.

Note that CPUs on patch p2 could have 0 or multiple clients on patch p1, depending on their index, n_{p1}, and n_{p2}.
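The modular assignment above is simple enough to sketch directly; the helper below is ours (illustrative only) and merely restates router_{p2}(C^{p1}_i) = C^{p2}_{i mod n_{p2}}.

```python
def designated_router(client_rank, n_cpus_target_patch):
    """Rank, within the target patch, of the router this client always contacts."""
    return client_rank % n_cpus_target_patch

# Example: client C^1_15 asking about a patch run on 6 CPUs is routed to rank 3 there.
# Conversely, a CPU of a small patch can serve as router for several clients, or for none.
assert designated_router(15, 6) == 3
```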
2.6. Heterogeneous time-steps

One of the common problems in simulating multiscale systems with grid-based hydrodynamics codes is that the time-step of the entire computational domain is limited to a small value by a few regions with small grid-cells and high characteristic fluid velocity. As a result, the remainder of the simulation, where the intrinsic timescales can be much larger, is required to integrate with unnecessarily short time-steps, leading to a large computational cost. However, the multipatch method, in which different regions are updated by independent processes, allows each patch to have different time-steps while nonetheless evolving the system in a fashion synchronized across all patches. We call this mode of operation one of "heterogeneous time-steps", in contrast to the simpler "homogeneous time-step" case in which all patches are forced to have the same time-step.

Heterogeneous time-steps can be managed with great flexibility. The only restriction placed on the time-steps in different patches is that the update times should all be synchronized at intervals equal to the longest of the time-steps, ∆T ≡ max_k(∆t_k), where, as before, k is an index labeling the different patches. To optimize computational resource use, the user adjusts the number of CPUs assigned to each patch so that the wall-clock time to advance by a time ∆T is approximately the same for all patches.

In practice, the coordination works as follows. At the nth synchronization time t^sync_n, the different patches exchange boundary condition data. They then also exchange information about their time-steps so they can determine which is the longest and in which patch it is found (call that patch K). If all the other patches receive their boundary data either from patch K or the problem boundary, the next synchronization time is set to be t^sync_{n+1} = t^sync_n + ∆T. If ∆T is a factor Q_l (> 1 by definition) larger than the time-step ∆t_l in some other patch l, patch l performs ∼ [Q_l] updates, where [x] is the greatest integer ≤ x, while patch K works to advance to t^sync_{n+1}. When patch l reaches a time t′ such that t^sync_{n+1} − t′ < ∆t_l, ∆t_l is reset to t^sync_{n+1} − t′ to achieve synchronization. Patches that have arrived at t^sync_{n+1} before the rest of the patches wait until all have reached it. When that has been achieved, the cycle is repeated. Because, by definition, conditions in patch K change by at most a modest amount over a time ∆T, it is unnecessary for the other patches to receive boundary condition information from it during their individual time-steps within the interval ∆T. However, when there are more than two patches, it is possible that some patches may require boundary data from other patches whose time-steps are shorter than ∆T. When that occurs, that pair must exchange boundary data at times determined by the longer of their two time-steps. Note that processors within the same patch trade boundary data in the usual way at each of that patch's internal time-steps.

If all the patches are solving the same equations, optimal load balancing can be achieved when patch K can be identified with reasonable reliability in advance, and the ratios Q_l can similarly be estimated. If those criteria are met, all that is necessary is to assign CPUs in patch K a number of cells N_K ≃ Q_l N_l for each of the other patches labeled by l.

In some situations, the Q_l might be essentially fixed throughout the simulation. For example, this would be the case in a simulation of gas dynamics in an isotropic gravitational potential in which the patches are nested spherical shells. In such a situation, the time-step for each shell would always be ≃ [N_{φ,k} Ω(r_{min,k})/2π]^{-1}, where N_{φ,k} is the number of azimuthal cells in patch k, Ω(r) is the orbital frequency as a function of radius, and r_{min,k} is the smallest radius in patch k. In such a case, load-balancing could be achieved fairly reliably and would need no adjustment during the simulation. More often, however, the Q_l may vary as functions of time. When this condition obtains, because the MPMD environment does not permit dynamic domain decomposition, perfect load-balancing will nearly always be an unreachable goal. Nonetheless, as we show in Sec. 4, even approximate load-balancing by combining appropriately chosen numbers of processors per cell in each patch with heterogeneous time-steps can lead to significant gains in computational efficiency. These gains can be sustained even if the ratios Q_l change significantly through the simulation if the user periodically stops the simulation and restarts with an adjusted choice in numbers of processors per patch.
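In outline, one synchronization interval of the heterogeneous time-step scheme proceeds as sketched below. This is our pseudocode-style summary of the coordination just described, not PATCHWORK source; the patch objects and their methods are assumptions.

```python
def advance_one_sync_interval(patches):
    """Advance all patches from t_sync_n to t_sync_{n+1} = t_sync_n + max_k(dt_k)."""
    for p in patches:
        p.exchange_interpatch_boundary_data()   # all patches trade boundary data at t_sync_n

    dT = max(p.dt for p in patches)             # longest time-step sets the interval
    t_target = patches[0].t + dT

    for p in patches:                           # in reality the patches run concurrently
        while p.t < t_target - 1e-12 * dT:
            dt = min(p.dt, t_target - p.t)      # shrink the last sub-step to land on t_target
            p.update(dt)                        # intra-patch boundary exchange happens per step
        # Patches that arrive early simply wait here until all have reached t_target.
```

The extra pairwise exchanges between patches whose time-steps are both shorter than ∆T, described above, are omitted from this sketch.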
3. PHYSICS TESTS

In this section, we showcase the performance quality of the multipatch method. First we demonstrate that it accurately reproduces the analytic solutions to two classic hydrodynamic simulation test-cases even when critical features of these solutions pass through patch boundaries and the grid symmetries and resolutions of the patches differ sharply. We then explore how well non-conservative mass density interpolation maintains mass conservation and identify the conditions in which it does not.

3.1. Sod shock tube

For this test, we created a square planar 2D problem volume in which, following the Sod prescription (Sod 1978), the fluid is initially at rest everywhere, but there is a sharp pressure and density discontinuity at a specific value of x within the volume. There is no initial variation as a function of y. Within this volume, we placed a square local patch and gave it a constant velocity so that it moves diagonally in the xy-direction. Both patches are described in Cartesian coordinates and have uniform spatial grids in Minkowski space. The local grid was oriented parallel to the global patch grid.

Because HARM3D is framed in terms of relativistic dynamics, it is convenient to choose c as the unit of speed. Given arbitrary code-units of length ℓ_0 and mass density ρ_0, the unit of time is ℓ_0/c and the unit of pressure is ρ_0 c^2. To ensure Newtonian flow (Hawley et al. 1984), it suffices to make p/ρ ≪ 1 when measured in code-units. We also chose an adiabatic index γ = 1.4.

Our problem volume was 40 code-units on a side, and its 800^2 cells had dimensions 0.05 × 0.05; the local patch had a side-length of 8 and was cut into 320^2 cells of dimension 0.025 × 0.025. For the initial state, the gas was divided at x = 0 into left (L) and right (R) states, with density and pressure ρ_L = 1.0 × 10^5, p_L = 1.0 and ρ_R = 1.25 × 10^4, p_R = 0.1. The sound speed was therefore ≃ 3–4 × 10^{-3} on both sides, clearly sub-relativistic. Zero-gradient boundary conditions were used for the problem exterior.

The origin of the local patch was placed initially at (0, 10) in global patch Cartesian coordinates, so that it straddled the discontinuity. Its velocity was V⃗ = 1.0 × 10^{-3} (x̂ + ŷ), so that it traveled both subsonically and rather slower than the shock front. As a result, although the shock initially forms inside the local patch, the shock eventually passes out of the local patch. The contact discontinuity stays within the local patch for a longer time, but it, too, eventually leaves it.

In Figure 4, we compare 1D cuts through the data at three different times with exact analytic solutions (Laney 1998). The left-hand column shows t = 500, when the shock is completely contained in the local patch; the middle column displays t = 900, the time at which the shock emerges from the local patch; the right column illustrates t = 2500, when the contact discontinuity is about to leave the local patch. As this figure shows, our numerical results in both the global and the local patches follow the exact solution very closely at all times. Thus, in this test, crossing a patch boundary creates no artifacts, even though the local patch moves obliquely across two different kinds of discontinuity.
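The initial state of this test is easy to write down explicitly; the snippet below is our own restatement of the numbers quoted above (array conventions are ours, not HARM3D's).

```python
import numpy as np

N, dx = 800, 0.05                                # 40 x 40 code-units, 800^2 cells
x = (np.arange(N) + 0.5) * dx - 20.0             # cell-center x coordinates
left = np.broadcast_to((x < 0.0)[:, None], (N, N))

rho = np.where(left, 1.0e5, 1.25e4)              # rho_L, rho_R
p   = np.where(left, 1.0, 0.1)                   # p_L, p_R
vel = np.zeros((2, N, N))                        # gas initially at rest
gamma = 1.4

# Sound speed sqrt(gamma * p / rho) is ~3-4e-3 in units of c on both sides,
# so the flow stays safely Newtonian, as required.
cs = np.sqrt(gamma * p / rho)
```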
3.2. Sedov-Taylor blast wave

The purpose of this test was to show the performance of the multipatch when at least one of the patches has a grid whose symmetry is a poor match to the natural symmetry of the problem. To that end, we study a Sedov-Taylor 3D spherical blast wave (Sedov 1959) with a central local patch using Cartesian coordinates and a global patch using spherical coordinates. As a standard of comparison, we also contrast a monopatch simulation with entirely Cartesian coordinates. Although the Cartesian grids of the local patch are poor matches to the spherical symmetry of the physical problem, they do have the virtue of covering the coordinate singularity at the origin.

A blast wave is formed when a large amount of energy E is deposited in a small region. If the ambient gas is motionless and has uniform density ρ_1, a spherical shock wave travels rapidly outward. Once the mass swept up by the shock exceeds the mass originally located in the small energy-deposition region, the shock front's radial position as a function of time is given by

R_s = ξ (E/ρ_1)^{1/5} t^{2/5}        (1)

until E/R_s^3 is small enough to be comparable with the ambient pressure. Here ξ is a dimensionless number ∼ 1.

To simulate this, we follow Fryxell et al. (2000) and divide the initial state into two regions. As in the Sod shock tube problem, we choose the unit of velocity to be c, but use arbitrary code-units for length and mass. In terms of these units, region 1 is a small sphere of radius δr = 25. Its initial pressure p_1 = (γ − 1) E/(4π δr^3) = 1 for adiabatic index γ (again = 1.4), while its density ρ_1 = 10^{-3}. Region 2 is everything outside r = δr. Here the initial pressure p_2 = 10^{-10} and initial density ρ_2 = ρ_1. The dimensionless coefficient of eqn. 1 is a function of γ; for γ = 1.4, it is 1.175 (Ostriker & McKee 1988).
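Equation (1) together with these parameters gives the expected shock trajectory directly; the short check below is ours and simply inverts the pressure relation quoted above to recover E.

```python
import math

gamma, dr, p1, rho1, xi = 1.4, 25.0, 1.0, 1.0e-3, 1.175

# Invert p1 = (gamma - 1) E / (4 pi dr^3) for the deposited energy E (code units).
E = 4.0 * math.pi * dr**3 * p1 / (gamma - 1.0)

def shock_radius(t):
    """Sedov-Taylor shock radius R_s(t) from eqn. (1)."""
    return xi * (E / rho1) ** 0.2 * t ** 0.4

# e.g. shock_radius(50.0) is roughly 3e2 code-units.
```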
For the monopatch, the computational domain is a cube of dimension 2500 having N_mono = 400^3 equal-volume cubical zones with side-length ∆ = 6.25. For the multipatch simulation, the global patch is a sphere of radius 1000 described in spherical coordinates, but with two kinds of cut-out: a sphere of radius 290 surrounding the origin and a bi-cone of half opening-angle π/10 surrounding the polar axis. Its angular grid is uniform with 120 cells in polar angle θ and 320 cells in azimuthal angle φ, but its radial grid has 80 logarithmically-spaced cells. The local patch is a cube of side-length 600 centered on the origin with 120^3 equal-volume cubical cells of side-length 5, similar to the Cartesian cell-size in the monopatch simulation. Where the local and global patches overlap, their cell dimensions are likewise similar.

The quality with which all these discretized representations reproduce the analytic result is shown in Figure 5, which again shows the situation at three different times. In this case, at the earliest time the shock front is entirely within the local patch, while it is a short distance outside the local patch at the middle time, and far outside the local patch at the last time. At the earliest time, the data for the Cartesian local patch and the Cartesian monopatch are, not surprisingly, nearly identical; the entire global patch remains in the initial state at this time. Interestingly, the shock at this time is at slightly larger radius than predicted by the analytic solution in both the monopatch and the multipatch simulation. At the middle time, the local patch and monopatch still closely agree, but the shock region is located in the global patch. The global patch data conform noticeably more closely to the analytic solution than do the monopatch data, presumably because the spherical geometry of the global patch better represents the essentially spherical character of the blast wave. This cannot be an ordinary resolution effect because the cell-sizes of the two grids in the shock region are quite similar at this time. At still later times, the monopatch and global patch data are slightly closer to one another, with both showing some departures from
