ebook img

Ultrascale Simulations of Non-smooth Granular Dynamics PDF

0.76 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Ultrascale Simulations of Non-smooth Granular Dynamics

Computational Particle Mechanics manuscript No. (will be inserted by the editor) Ultrascale Simulations of Non-smooth Granular Dynamics Tobias Preclik · Ulrich Ru¨de 5 1 Received:31.12.2014/Accepted: 0 2 Abstract Thisarticlepresentsnewalgorithmsformas- 1 Introduction n a sively parallel granular dynamics simulations on dis- J tributed memory architectures using a domain parti- Granular matter exhibits intriguing behaviours akin to 3 tioning approach. Collisions are modelled with hard 2 solids, liquids or gases. However, in contrast to those contactsinordertohidetheirmicro-dynamicsandthus fundamentalstatesofmatter,granularmatterstillcan- to extend the time and length scales that can be simu- ] not be described by a unified model equation homoge- E lated.Themulti-contactproblemissolvedusinganon- nizing the dynamics of the individual particles [26]. To C linearblockGauss-Seidelmethodthatisconformingto date, the rich set of phenomena observed in granular . the subdomain structure. The parallel algorithms em- s matter, can only be reproduced with simulations that c ploy a sophisticated protocol between processors that [ resolve every individual particle. In this paper, we will delegate algorithmic tasks such as contact treatment consider methods where also the spatial extent and ge- 1 and position integration uniquely and robustly to the ometric shape of the particles can be modelled. Thus v processors.Communicationoverheadisminimizedthrough 0 in addition to position and translational velocity the aggressivemessageaggregation,leadingtoexcellentstrong 1 orientationandangularvelocityofeachparticleconsti- 8 and weak scaling. The robustness and scalability is as- tute the state variables of the dynamical system. The 5 sessed on three clusters including two peta-scale super- shapes of the particles can be described for example 0 computerswithupto458752processorcores.Thesim- . by geometric primitives, such as spheres or cylinders, 1 ulationscanreachunprecedentedresolutionofuptoten with a low-dimensional parameterization. Composite 0 billion (1010) non-spherical particles and contacts. 5 objects can be introduced as a set of primitives that 1 arerigidlygluedtogether.Eventually,evenmesheswith : Keywords Granular Dynamics · High Performance v a higher-dimensional parameterization can be used. In Computing · Non-smooth Contact · Parallel Com- i this article the shape of the particles does not change X puting · Message Passing Interface in time, i.e. no agglomeration, fracture or deformation r a takes place. The rates of change of the state variables MathematicsSubjectClassification(2000) 65Y05 · are described by the Newton-Euler equations, and the 70F35 · 70F40 · 70E55 particleinteractionsaredeterminedbycontactmodels. Two fundamentally different model types must be T.Preclik Lehrstuhl fu¨r Informatik 10 (Systemsimulation), Friedrich- distinguished: Soft and hard contacts. Soft contacts al- Alexander-Universita¨t Erlangen-Nu¨rnberg, Cauerstr. 11, low a local compliance in the contact region, whereas 91052Erlangen,Germany hard contacts forbid penetrations. In the former class E-mail:[email protected] the contact forces can be discontinuous in time, lead- U.Ru¨de ing to non-differentiable but continuous velocities after Lehrstuhl fu¨r Informatik 10 (Systemsimulation), Friedrich- integration. The differential system can be cast e.g. as Alexander-Universita¨t Erlangen-Nu¨rnberg, Cauerstr. 11, 91052Erlangen,Germany an ordinary differential equation with a discontinuous E-mail:[email protected] right-hand side or as differential inclusions. However, 2 Tobias Preclik,Ulrich Ru¨de the resulting differential system is typically extremely computational power to integrate such systems for a stiff if realistic material parameters are employed. relevant simulation time. Consequently a massive par- Inthelatterclass,discontinuousforcesarenotsuffi- allelization of the numerical method for architectures cienttoaccomplishnon-penetrationoftheparticles.In- with distributed memory is absolutely essential. stead,impulsesarenecessarytoinstantaneouslychange velocities on collisions or in self-locking configurations ifCoulombfrictionispresent[33].Strongermathemat- Inthelasthalfdecadeseveralapproacheswerepub- icalconceptsarerequiredtodescribethedynamics.For lished suggesting parallelizations of the methods inte- that purpose, Moreau introduced the measure differen- gratingtheequationsofmotionofrigidparticlesinhard tial inclusions in [27]. contact[38,39,20,34,15,16,28].Theapproachputfor- Hard contacts are an idealization of reality. The ward in this article builds conceptually on these previ- rigidity of contacts has the advantage that the dynam- ous approaches but exceeds them substantially by con- ics of the micro-collisions does not have to be resolved sistently parallelizing all parts of the code, consistently intime.However,thisalsointroducesambiguities:The distributing all simulation data (including the descrip- rigidity has theeffectthatthe forcechainsalongwhich tion of the domain partitioning), systematically mini- a particle is supported are no longer unique [29]. If en- mizing the volume of communication and the number ergy is dissipated, this also effects the dynamics. To of exchanged messages, and relying exclusively on ef- integrate measure differential inclusions numerically in ficient nearest-neighbor communication. The approach time, two options exist: In the first approach the inte- describedhereadditionallysparestheexpensiveassem- grationisperformedinsubintervalsfromoneimpulsive bly of system matrices by employing matrix-free com- event to the next [25, 11]. At each event an instan- putations. All this is accomplished without sacrificing taneous impact problem must be solved whose solu- accuracy.Thematrix-freenessallowsthedirectandstraight tion serves as initial condition of the subsequent inte- forward evaluation of wrenches in parallel and thus re- gration subinterval. Impact problems can range from ducestheamountofcommunicateddata.Furthermore, simple binary collisions, to self-locking configurations, an exceptionally robust synchronization protocol is de- to complicated instantaneous frictional multi-contact fined, which is not susceptible to numerical errors. The problemswithsimultaneousimpacts.Thedynamicsbe- excellentparallelscalingbehaviouristhendemonstrated tween events are described by differential inclusions, fordiluteanddensetestproblemsinstrong-andweak- differential algebraic equations or ordinary differential scaling experiments on three clusters with fundamen- equations.Predictingthetimesoftheupcomingevents tally different interconnect networks. Among the test correctly is non-trivial in general and handling them in machinesarethepeta-scalesupercomputersSuperMUC order in parallel is impeding the scalability [25]. In the and Juqueen. The results show that given a sufficient second approach no efforts are made to detect events, computational intensity of the granular setup and an but the contact conditions are only required to be sat- adequate interconnect, few hundred particles per pro- isfied at discrete points in time. This approach is com- cess are enough to obtain satisfactory scaling even on monly referred to as a time-stepping method. millions of processes. This article focuses on the treatment of hard con- tactsinordertoavoidthetemporalresolutionofmicro- collisionsandthusthedependenceofthetime-steplength In Sect. 2 of this paper the underlying differential onthestiffnessofthecontacts.Inordertoavoidtheres- equations and the time-continuous formulation of the olution of events a time-stepping method is employed. hard contact models are formulated. Sect. 3 proposes This considerably pushes the time scales accessible to a discretization scheme and discrete constraints for the granular flow simulations for stiff contacts. hardcontactmodel.Theproblemofreducingthenum- To estimate the order of a typical real-life problem ber of contacts in the system for efficiency reasons is size of a granular system, consider an excavator bucket addressed in Sect. 4. Subsequently, an improved nu- with a capacity of 1m3. Assuming sand grains with merical method for solving multi-contact problems in a diameter of 0.15mm, and assuming that they are parallel is introduced in Sect. 5 before turning to the packed with a solid volume fraction of 0.6, the exca- design of the parallelization in Sect. 6. The scalability vator bucket contains in the order of 1010 particles. In oftheparallelizationisthendemonstratedinSect.7by such a dense packing the number of contacts is in the meansofdiluteanddensesetupsonthreedifferentclus- same order as the number of particles. Only large scale ters. Finally, the algorithms and results are compared parallel systems with distributed memory can provide topreviousworkbyotherauthorsinSect.8beforesum- enoughmemorytostorethedataandprovidesufficient marizing in Sect. 9. UltrascaleSimulationsofNon-smoothGranularDynamics 3 2 Continuous Dynamical System and time t. The wrench contributions from contact re- actions are summed up with external forces f and ext The Newton-Euler equations for a system with νb par- torques τext such as fictitious forces from non-inertial ticles are [22] reference frames. (cid:18)x˙(t)(cid:19) (cid:18) v(t) (cid:19) Let λj(t) ∈ R3 be the contact reaction of a con- = , tact j ∈ C, where C = {1..ν } is the set of poten- ϕ˙(t) Q(ϕ(t))ω(t) c tial contact indices. Let (j ,j )∈B2 be the index pair (cid:18) (cid:19) (cid:18) (cid:19) 1 2 v˙(t) f(s(t),t) M(ϕ(t)) = , of both particles involved in the contact j, where B = ω˙(t) τ(s(t),t)−ω(t)×I(ϕ(t))ω(t) {1..ν }isthesetofbodyindices.Letxˆ (x(t),ϕ(t))∈ b j where the positions x(t) ∈ R3νb, the rotations ϕ(t) ∈ R3 be the location of contact j, then the wrench on R4νb, translational velocities v(t) ∈ R3νb, and angular body i is veloDciitffieerseωnt(tp)a∈raRm3eνtbearirzeatthioensstaexteistvafroirabtlhees arotttaitmioents., (cid:18)fτii((ss((tt)),,tt))(cid:19)=(cid:18)fτii,,eexxtt((ss((tt)),,tt))(cid:19)+(cid:88)j∈C(cid:20)(xˆj(x(t),ϕ(1t))−xi(t))×(cid:21)λj(t) but quaternions having four real components are the j1=i (cid:88)(cid:20) 1 (cid:21) (1) parameterizationofchoicehere.Independentofthepa- −j∈C (xˆj(x(t),ϕ(t))−xi(t))× λj(t), rameterization, the derivatives of the rotation compo- j2=i (cid:124) (cid:123)(cid:122) (cid:125) nentscanbeexpressedintermsofamatrix-vectorprod- wrenchcontributions uct between a block-diagonal matrix and the angular where (·)× is a matrix, which when multiplied to a velocities [10]. If the rotation of particle i is described vector corresponds to the cross product between its by the quaternion qw +qxi+qyj+qzk ∈ H then, ac- operand (·) and the vector. cording to [10], the i-th diagonal block of Q(ϕ(t)) is In contrast to soft contact models, the contact re-   actions in hard contact models cannot be explicitly ex- −q −q −q x y z pressed as a function of the state variables but are de- Qii(ϕi(t))= 21−qqwz qqwz −qqxy. finedimplicitly,e.g.byimplicitnon-linearfunctions[21], complementarity constraints [1, 3], or inclusions [36]. q −q q y x w In any case, the constraints distinguish between reac- Each particle has an associated body frame whose tions in the directions normal to the contact surfaces origin coincides with the body’s center of mass and and reactions in the tangential planes of the contact whoseaxesareinitiallyalignedwiththeaxesoftheob- surfaces. The former are used to formulate the non- servational frame. The body frame is rigidly attached penetration constraints, and the latter are used to for- to the body and translates and rotates with it. All of mulate the friction constraints. For that reason, each the state variables and other quantities are expressed contact j is associated with a contact frame, where in the observational frame unless noted otherwise. Fur- the axis n (x(t),ϕ(t)) ∈ R3 points along the direc- j thermore, the matrix tion normal to the contact surface, and the other two  diag m 1  axes tj(x(t),ϕ(t)) ∈ R3 and oj(x(t),ϕ(t)) ∈ R3 span i M(ϕ(t))=i=1..νb  the tangential plane of the contact. diag Iii(ϕi(t)) LetSibethesetofpointsintheobservationalframe i=1..νb definingtheshapeofparticlei,andletf (x (t),ϕ (t),y)∈ i i i istheblock-diagonalmassmatrix,where1denotesthe Rbetheassociatedsigneddistancefunctionforapoint 3 × 3 identity matrix. The mass matrix contains the y intheobservationalframe.Thesigneddistancefunc- constant particle masses m and the particles’ inertia tion shall be negative in the interior of S . Assum- i i matricesI (ϕ (t))abouttheparticles’centersofmass. ing that all particles are (strictly) convex with suffi- ii i The latter can be calculated by similarity transforma- ciently smooth boundaries, then for a pair of particles tionsfromtheconstantbodyframeinertiamatricesI0. (j ,j ) involved in a contact j, the contact location ii 1 2 Ifthebodyframesareattachedsuchthattheycoincide xˆ (x(t),ϕ(t)) is defined by the optimization problem j withtheprincipalaxesoftheirparticles,thenthebody frame inertia matrices are diagonal, and floating-point xˆ (t):=xˆ (x(t),ϕ(t)) operations as well as memory can be saved. The lower- j j (2) right block of the mass matrix corresponds to the ma- = argmin fj1(xj1(t),ϕj1(t),y), trixI(ϕ(t)).f(s(t),t)andτ(s(t),t)arethetotalforces fj2(xj2(t),ϕj2(t),y)≤0 andtorques(togethertheyarereferredtoaswrenches) with associated contact normal acting at the particles’ centers of mass. Both may de- pend on any of the state variables s(t) of the system n (t):=n (x(t),ϕ(t))=∇ f (x (t),ϕ (t),xˆ (t)), j j y j2 j2 j2 j 4 Tobias Preclik,Ulrich Ru¨de pointing outwards with respect to S and associated These non-penetration conditions can be comple- j2 signed contact distance mented by a friction condition. The most prominent model for dry frictional contact is the Coulomb model ξ (t) := ξ (x(t),ϕ(t)) = f (x (t),ϕ (t),xˆ (t)) j j j1 j1 j1 j which restricts the relative contact velocity in the tan- gentialplaneofthecontact.Therelativecontactveloc- which is negative in the case of penetrations. ityforapairofparticles(j ,j )involvedinacontactj For convex particles each pair of bodies results in 1 2 is a potential contact, and thus the total number of con- tacts νc is limited by 12νb(νb −1). Non-convex objects δvj+(s(t))=vj+1(t)+ωj+1(t)×(xˆj(x(t),ϕ(t))−xj1(t)) e.g. can be implemented as composite objects of con- −v+(t)−ω+(t)×(xˆ (x(t),ϕ(t))−x (t)). vex particles. By convention a positive reaction in nor- j2 j2 j j2 maldirectionisrepulsive,andthusthecontactreaction Let λj(t)actspositivelyonparticlej1 andnegativelyonj2, (cid:18)t (x(t),ϕ(t))Tδv+(s(t))(cid:19) thus explaining the signs in (1). By applying the oppo- δvj+,to(t):=δvj+,to(s(t))= oj(x(t),ϕ(t))Tδvj+(s(t)) site reactions at the same point in the observational j j frame,notonlythelinearmomentumcanbeconserved be the relative contact velocity in the tangential plane but also the angular momentum of the system. Con- afterapplicationofthecontactimpulses,thentheCou- servation of energy can only hold if the contact model lomb conditions for a non-impulsive point in time t are does not include dissipative effects. Hard-contact mod- (cid:107)λ (t)(cid:107) ≤µ λ (t) and els require the Signorini condition to hold. Written as j,to 2 j j,n (cid:107)δv+ (t)(cid:107) λ (t)=−µ λ (t)δv+ (t). a complementarity condition for a contact j, it reads j,to 2 j,to j j,n j,to However, if (cid:107)δv+ (t)(cid:107) = 0 these conditions must be ξj(t)≥0 ⊥ λj,n(t)≥0, j,to 2 supplemented by the constraint where λ (t) = n (t)Tλ (t). The signed contact dis- tanceisrj,enquiredtojbenonj-negative,resultinginanon- (cid:107)δ˙v+j,to(t)(cid:107)2λj,to(t)=−µjλj,n(t)δ˙v+j,to(t) penetration constraint. The contact reaction in direc- on acceleration level in order to determine the friction tion of the contact normal is also required to be non- force. Likewise constraints for the friction impulse are negative, resulting in non-adhesive contact reactions. necessary.Atthispointwerefrainfromformulatingthe Furthermore, both quantities must be complementary, measure differential inclusion in detail since it would meaningthateitherofthemmustbeequaltozero.This not contribute information essential to the remaining effects that the contact reaction can only be non-zero paper which only deals with the discrete-time system. if the contact is closed. However,theSignoriniconditiondoesnotdetermine the contact reaction force if the contact is closed. In 3 Discrete Dynamical System thatcasethenon-penetrationconstraintonthevelocity level, In simulations of granular matter impulsive reactions areabundant.Higher-orderintegratorsfortime-stepping ξ˙+(t)≥0 ⊥ λ (t)≥0, j j,n schemes are still subject to active research [31]. In par- ticular, discontinuities pose problems for these integra- must be added to the system, where ξ˙+ is the right j tors. Hence, the continuous dynamical system is dis- derivative of the signed contact distance with respect cretizedinthefollowingwithanintegratoroforderone, to time. The constraint allows the contact to break resembling the semi-implicit Euler method and similar only if no reaction force is present and otherwise forces to the one suggested in [2]. ξ˙+(t) = 0. In the latter case the reaction force is still j Lets,x,ϕ,v andω denotethegivendiscrete-time not fixed. The non-penetration constraint on the accel- state variables at time t and λ the contact reactions at eration level, time t. Then the state variables at time t+δt are func- ξ¨+(t)≥0 ⊥ λ (t)≥0, tions depending on the contact reactions: s(cid:48)(λ), x(cid:48)(λ), j j,n ϕ(cid:48)(λ), v(cid:48)(λ) and ω(cid:48)(λ). The discrete-time Newton- then determines the force also if ξ¨+(t)=0. When con- Euler equations integrated by the proposed scheme are j sidering impacts, a non-penetration constraint for the reaction impulse in the direction normal to the contact (cid:18)x(cid:48)(λ)(cid:19) (cid:18)x(cid:19) (cid:18) v(cid:48)(λ) (cid:19) surfacemustbeformulated,and,ifthecontactisclosed, ϕ(cid:48)(λ) = ϕ +δt Q(ϕ)ω(cid:48)(λ) , (3) an additional constraint modelling an impact law such (cid:18)v(cid:48)(λ)(cid:19) (cid:18)v(cid:19) (cid:18) f(s,λ,t) (cid:19) as Newton’s impact must be added. ω(cid:48)(λ) = ω +δtM(ϕ)−1 τ(s,λ,t)−ω×I(ϕ)ω . UltrascaleSimulationsofNon-smoothGranularDynamics 5 Positions and orientations at time t+δt appear exclu- Adetaileddiscussionofsolutionalgorithmsforone- sively on the left-hand side of the position and orien- contactproblemsisoutofthescopeofthisarticle.How- tation integration. Velocities at time t+δt appear on ever,splittingmethods,wherenon-penetrationandfric- the left-hand side of the velocity integration and addi- tionconstraintsaresolvedseparately,arepronetoslow tionallyintheintegrationofpositionsandorientations. convergence or cycling. In [7] Bonnefon et al. solve the Thenumericalintegrationofthequaternionhastheef- one-contact problem by finding the root of a quartic fectthatthequaterniongraduallyloosesitsunitlength. polynomial.Numerousotherapproachesexistformod- This deficiency can be compensated by renormalizing ifiedfrictionlaws,notablythosewherethefrictioncone the quaternions after each integration. is approximated by a polyhedral cone and solution al- Insteadofdiscretizingeachofthefiveintermittently gorithms for linear complementarity problems can be activecontinuous-timecomplementarityconstraints,the used [1, 30]. In any case the algorithm of choice should Signorini condition is only required to hold at the end be extremely robust in order to successfully resolve ν c ofeachtimestep.Thishastheeffectthatimpulsivere- contacts per iteration and time step, where ν can be c actions are no longer necessary to satisfy the condition in the order of 1010 in this article. since the condition is no longer required to be fulfilled instantaneously.Furthermore,thesigneddistancefunc- 4 Contact Detection tion gets linearized, resulting in ξ (t+δt)=ξ (t)+δtξ˙ (t)+O(δt2), The contact problem F(λ) = 0 has O(ν2) non-linear j j j b equations.Thus,alreadythesetupofthecontactprob- wherethetimederivativeofthesignedcontactdistance lem would not run in linear time, much less the so- can be determined to be lution algorithm even if it were optimal. The contact constraintsofacontactj canberemovedfromthesys- ξ˙ (t)=n (t)Tδv+(s(t)) j j j tem without altering the result if the contact is known undertheassumptionthatthecontactpointxˆj(t)trans- tostayopen(λj =0)withinthecurrenttimestep.Let latesandrotatesinaccordancewithbodyj2,suchthat Si(t)=(cid:8)y ∈R3(cid:12)(cid:12)fi(xi(t),ϕi(t),y)≤0(cid:9) xˆ˙ (t)=v (t)+ω (t)×(xˆ (t)−x (t)). j j2 j2 j j2 be the set of points in space corresponding to the ro- tated and translated shape of particle i at time t and Let the time-discrete relative contact velocity be let δv(cid:48)(λ)=v(cid:48) (λ)+ω(cid:48) (λ)×(xˆ −x ) j −vj(cid:48)1(λ)−ωj(cid:48)1(λ)×(xˆj −xj1), Hi(t)=Si(t)+(cid:8)y ∈R3(cid:12)(cid:12)(cid:107)y(cid:107)2 ≤hi(t)(cid:9) j2 j2 j j2 beanintersectionhullthatsphericallyexpandsthepar- where the velocities are discretized implicitly. The dis- ticle shape by the radius h (t) > 0. If h (t) is chosen i i crete non-penetration constraint then is largeenoughthenanalgorithmfindingintersectionsbe- ξ tween the hulls can detect all contacts that can poten- j +nTδv(cid:48)(λ)≥0 ⊥ λ ≥0. (4) δt j j j,n tiallybecomeactiveinthecurrenttimestep.Apossible choice for the expansion radius is The term ξj acts as an error correction term if pen- δt etrations are present (ξj < 0). In that case it can be hi(t)=δt((cid:107)vi(t)(cid:107)2+(cid:107)ωi(t)(cid:107)2ri)+τ, (6) scaled down to avoid introducing an excessive amount where r = max (cid:107)y(cid:107) is the bounding radius of of energy. If no numerical error is present, the contact i y∈Si(0) 2 particle i, and τ is a safety margin. The safety mar- is inelastic. The frictional constraints translate into gin becomes necessary since an explicit Euler step is (cid:107)λ (cid:107) ≤µ λ and underlying the derivation of (6). In practice, the us- j,to 2 j j,n (5) (cid:107)δv(cid:48) (λ)(cid:107) λ =−µ λ δv(cid:48) (λ). ageofintersectionhullsreducesthenumberofcontacts j,to 2 j,to j j,n j,to considerably.E.g.,monodispersesphericalparticlescan Let F (λ) = 0 denote a non-linear system of equa- have at most 12 contacts per particle if the expansion j tionsequivalenttotheconstraintsfrom(4)and(5)ofa radiiaresmallenough[32],resultinginO(ν )potential b single contact j, and let F(λ) denote the collection of contacts. all F (λ). Neither F(λ) = 0 nor F (λ) = 0 for given Broad-phase contact detection algorithms aim to j j λ have unique solutions. Let F−1(0,λ ) be a possible find as few as possible candidate particle pairs for con- j j j solution of the one-contact problem of contact j, given tactsbyusinge.g.spatialpartitioningapproachesorex- the contact reactions λ of all other contacts j. ploitingtemporalcoherenceoftheparticlepositions[9]. j 6 Tobias Preclik,Ulrich Ru¨de 1: k←0 The algorithm is of iterative nature and needs an 2: λ(k)←0 appropriate stopping criterion to terminate. In each it- 3: whileconvergencecriterionnotmetdo 4: forj←1toν do eration k a sweep over all contacts is performed, where c 5: forl∈C do eachcontactj isrelaxed,givenanapproximationofall 6: λ˜(k,j)←(cid:40)λ(lk+1) ifl<j∧sc(l)=sc(j) othercontactreactionsλ˜(k,j).InthesubdomainNBGS, l λ(lk) else the approximation of contact reaction l is taken from 7: endfor the current iteration if it was already relaxed (l < j) 8: y←F−1(0,λ˜(k,j)) j j and if it is associated with the same subdomain as the 9: λ(k+1)←ωy+(1−ω)λ(k) j j contact j to be relaxed (s (l) = s (j)). In all other 10: endfor c c cases, the approximation is taken from the previous it- 11: k←k+1 12: endwhile eration.Thecontactreactionλ(k+1) isthenaweighted j mean between the previous approximation and the re- Algorithm 1: The subdomain NBGS method with re- laxation result. If all contacts are associated with the laxation parameter ω. same subdomain and ω =1 then Alg. 1 corresponds to aclassicNBGS.Ifeachcontactisassociatedtoadiffer- ent subdomain then Alg. 1 corresponds to a non-linear The candidate pairs are then checked in detail in the block Jacobi (NBJ) with relaxation parameter ω. narrow-phase contact detection, where (2) is solved for eachpair,leadingtothecontactlocationxˆ ,normaln j j and signed distance ξ for a contact j. j To solve (2) for non-overlapping particles, the Gil- 6 Parallelization Design bert-Johnson–Keerthi(GJK)algorithmcanbeused[13, Sect. 6.1 introduces the domain partitioning approach. 4]. For overlappingparticle shapes the expanding poly- Sect. 6.2then discusses requirements thatmust bemet topealgorithm(EPA)computesapproximatesolutions[5]. in order to be able to treat all contacts exactly once in For simple geometric primitives like spheres, the opti- parallel. Sect. 6.3 explains how accumulator and cor- mization problem can be solved analytically. The in- rection variables can be used in order to reduce data dices of all contacts found that way form the set of po- dependenciestootherprocesses.InSect.6.4conditions tential contacts C = {1..ν } at time t. Let F(λ) = 0 c are discussed under which the set of communication fromnowondenotethecontactproblemwhereallcon- partnerscanbereducedtothenearestneighbors.Time- tact conditions and contact reactions whose indices are integration and the subsequent necessity of synchro- not part of C have been filtered out. nization are addressed in Sect. 6.5 before summarizing the time-stepping procedure in Sect. 6.6. 5 Numerical Solution Algorithms To solve the multi-contact problem, when suitable so- 6.1 Domain Partitioning lution algorithms for the one-contact problems F−1 j are given, a non-linear block Gauss-Seidel (NBGS) can Under the assumption that no contacts are present, be used as propagated by the non-smooth contact dy- there exists no coupling between the data of any two namics(NSCD)method[18].Unfortunately,theGauss- particles,andtheproblembecomesembarrassinglypar- Seidel algorithm cannot be efficiently executed in par- allel: Each process integrates (cid:98)νb(cid:99) or (cid:100)νb(cid:101) particles. νp νp allel for irregular data dependencies as they appear in Let s (i)∈P determine the process responsible for the b contact problems [20]. time-integrationofparticleiasofnowreferredtoasthe As an alternative, a more general variant is pro- parent process. All data associated with this particle, posed here, accommodating the subdomain structure that is the state variables (position, orientation, veloc- that will arise in the domain partitioning. Therefore, ities) and constants (mass, body frame inertia matrix, each contact j ∈ C is associated with a subdomain shape parameters), are instantiated only at the parent number s (j) ∈ P, where P = {1..ν } is the set of process in order to distribute the total memory load. c p subdomain indices for ν subdomains. Alg. 1 presents However, contacts or short-range potentials introduce p pseudo-code for the subdomain NBGS with the relax- data dependencies to particles that in general are not ationparameterω >0.Theinitialsolutionischosento instantiated on the local process nor on a process close be zero, however, any other initialization can be used, to the local one, rendering a proper scaling impossible. in particular contact reactions from the previous time A domain partitioning approach alleviates this prob- step. lem. UltrascaleSimulationsofNon-smoothGranularDynamics 7 LetΩdenotethecomputationaldomainwithinwhich These additionalinstantiations shallbetermedshadow all particles are located and Ω ⊆ Ω,p ∈ P, a family copies in the following. They must be kept in synchro- p of disjoint subdomains into which the domain shall be nization with the original instantiation on the parent partitioned. In this connection subdomain boundaries process. In order to agree upon the detecting process areassociatedtoexactlyoneprocess.Oneprocessshall responsible for treating the contact without commu- be executed per subdomain. The number of processes nication a rule is needed. Here, the statement that a cane.g.correspondtothenumberofcomputenodesin process is responsible for treating a contact refers to a hybrid parallelization or to the total number of cores the responsibility of the process for executing the re- oreventhreadsinahomogeneousparallelization.Inthe laxationoftherespectivecontactinAlg.1.Thetypical domain partitioning approach the integration of a par- choiceforthisrulerequiresthattheprocesswhosesub- ticlewhosecenterofmassx islocatedinasubdomain domain contains the point of contact is put in charge i Ω at time t is calculated by process p. That way data to treat the contact [34]. p dependencies typically pertain the local or neighboring However, this rule only works if the process whose subdomains since they are considered to be of short subdomaincontainsthepointisabletodetectthecon- range. Let s (i) be adapted accordingly. Special care tact. This is only guaranteed if the point of contact is b is required when associating a particle to a subdomain located within the hull intersection. Also, if the point whose center of mass is located on or near subdomain of contact is located outside of the domain Ω, then no interfaces.Especially,periodicboundaryconditionscan process will treat it. complicate the association process since the finite pre- A more intricate drawback of this approach is that cision of floating-point arithmetics does in general not it can fail in case of periodic boundary conditions: If allowaconsistentparametricdescriptionofsubdomains the contact point is located near the periodic bound- across periodic boundaries. Sect. 6.5 explains how the ary, the periodic image of the contact point will be synchronizationprotocolcanbeusedtoallowareliable detected at the other end of the simulation box. Due association. to the shifted position of the contact point image and Thedomainpartitioningshouldbechosensuchthat thelimitednumericalprecision,thesubdomainscanno an equal number of particles is located initially in each longer consistently decide the subdomain affinity. subdomainandsustainedoverthecourseofthesimula- A more robust rule to determine the subdomain tion in order to balance the computational load which affinity can be established by fulfilling the following re- isdirectlyproportionaltothenumberofparticles.Par- quirement: ticles now migrate between processes if their positions Requirement 2 All shadow copy holders of a parti- change the subdomain. Migration can lead to severe cle maintain a complete list of all other shadow copy load imbalances that may need to be addressed by dy- holders and the parent process of that particle. namicallyrepartitioningthedomain.Suchload-balancing Then each process detecting a contact can determine techniques are beyond the scope of this article. the list of all processes detecting that very same con- tact, which is the list of all processes with an instan- tiation of both particles involved in the contact. This 6.2 Shadow Copies list is exactly the same on all processes detecting the contact and is not prone to numerical errors. The rule Apurelocalinstantiationofparticleshastheeffectthat can then e.g. appoint the detecting process with small- contacts cannot be detected between particles that are est rank to treat the contact. In order to enhance the notlocatedonthesameprocess.Aprocesscandetecta locality of the contact treatment, the rule should fa- contact if both particles involved in the contact are in- vor the particle parents if they are among the contact stantiated on that process. In order to guarantee that witnesses. Any such rule defines a partitioning of the at least one process can detect a contact, the condi- contact set C. Let C be the set of all contacts treated p tion that a contact j must be detected by all processes by process p ∈ P. Then process p instantiates all con- whose subdomains intersect with the hull intersection tacts j ∈C . p H ∩H is sufficient if the intersection of the hull in- j1 j2 tersectionandthedomainisnon-empty.Thiscondition can be fulfilled by the following requirement: 6.3 Accumulator and Correction Variables Requirement 1 A particle i must be instantiated not The contact relaxations in Alg. 1 exhibit sums with only on the parent process but also on all processes non-local data dependencies. In the following, the re- whose subdomains intersect with the particle’s hull. dundantevaluationofthesesumsispreventedbyintro- 8 Tobias Preclik,Ulrich Ru¨de ducingaccumulatorvariablesandthenon-localdatade- parameter. Since the subdomain NBGS respects the pendencies are reduced by introducing correction vari- subdomain affinity of the contacts, the remote wrench ables. contributionstoparticleicancelout,andjustthetotal The relaxation of a contact j depends on the data wrench on particle i from the last iteration is needed of the state variables of both particles (j ,j ) involved in addition to corrections stemming from contacts that 1 2 in the contact, their constants and shape parameters, were already relaxed by the same process. ascanbeseenbyinspecting(4),(5)andthedefinitions of the terms appearing therein. All of these quantities (cid:18)fi(λ˜(k,j))(cid:19)=(cid:18)fi(λ(k))(cid:19)+ (cid:88) (cid:20) 1 (cid:21)(λ(k+1)−λ(k)) are instantiated on the detecting process, either as a τi(λ˜(k,j)) τi(λ(k)) l∈Csc(j) (xˆl−xi)× l l shadow copy or as an original instance. The contact l<j l1=i variables of contact j (location, signed distance and − (cid:88) (cid:20) 1 (cid:21)(λ(k+1)−λ(k)) the contact frame) are also required. They are avail- (xˆl−xi)× l l able on the detecting process since they result from l∈Cl<scj(j) l2=i the positions, orientations, and the shape parameters of the particles (j ,j ) in the contact detection. Fur- Our implementation instantiates variables on pro- 1 2 thermore, the force and torque terms from (1) acting cess p for the reaction approximations λ[p] ∈ R3|Cp| of on these particles additionally depend on the locations all contacts treated by process p. Any updates to the xˆ and reaction approximations λ˜(k,j) of all other con- reaction approximations occur in place. Furthermore, l l tactslinvolvingoneoftheparticles(j ,j ).Neitherthe an implementation can instantiate accumulator vari- 1 2 locationsnorthereactionapproximationsofthesecon- ables f[p],τ[p] ∈ R3|Bp| on process p for the wrenches tacts are necessarily available on the process treating fromthelastiterationofallinstantiatedparticles(shadow contact j. To rectify this deficiency, one can introduce copiesandoriginalinstances),whereB containsthein- p contactshadowcopiessothatlocationandreactionap- dicesofallshadowcopiesandoriginalinstancesinstan- proximation can be mirrored at every instantiation of tiated on process p. This set is partitioned into B p,local bothparticlesinvolvedinthecontact.However,theor- and B , containing the indices of the original in- p,shadow ganisational overhead of contact shadow copies can be stances and the shadow copies respectively. circumvented.Itisnotnecessarythattheprocesstreat- Instead of evaluating the wrench contribution sums ing the contact evaluates all the wrench contributions each time when calculating the total wrench on parti- to the particles involved in the contact. Instead, parts cleianew,thecontributionscanbeaccumulatedasthe ofthewrenchcontributionsumcanbeevaluatedonthe contactsarerelaxed.Forthatpurpose,implementations processesactuallytreatingtheremotecontactsandcan can instantiate corrections variables δf[p] ∈R3|Bp| and subsequently be communicated: δτ[p] ∈R3|Bp|.Then,afterline9ofAlg.1,thesewrench corrections can be updated by assigning (cid:18)fτii((λλ))(cid:19)=(cid:18)fτii,,eexxtt(cid:19)+l(cid:88)l1∈=Ci(cid:20)(xˆl−1xi)×(cid:21)λl−l(cid:88)l2∈=Ci(cid:20)(xˆl−1xi)×(cid:21)λl  (cid:32)δfj[1sc(j)](cid:33)←(cid:32)δfj[1sc(j)](cid:33)+(cid:20) 1 (cid:21)(λ(k+1)−λ(k)), =(cid:18)fτii,,eexxtt(cid:19)+p(cid:88)∈Pw(cid:124)renll(cid:88)∈1ch=Cpico(cid:20)n(txˆrilbu−t1ioxni)(f×i,(cid:21)pλτil,p(cid:123)−)(cid:122)Tll(cid:88)∈2t=oCpip(cid:20)ar(txˆiclle−i1fxroim)×p(cid:21)roλcelss(cid:125)p (cid:32)δδδfττjjj[[[12ss2sccc(((jjj)))]]](cid:33)←(cid:32)δδδfττjjj[[[12ss2sccc(((jjj)))]]](cid:33)−(cid:20)((xxˆˆjj−−1xxjj12))××(cid:21)(λj(jk+1)−λj(jk)). Thetotalwrenchonparticleicanalsobeexpressed The evaluation of the total wrench on particle i in in terms of the total wrench on particle i at the begin- line 8 of Alg. 1 when relaxing contact j in iteration k ning of iteration k: becomes (cid:18)fτii((λλ))(cid:19)=(cid:18)fτii((λλ((kk))))(cid:19)+p(cid:88)∈Pl(cid:88)∈Cp(cid:20)(xˆl−1xi)×(cid:21)(λl−λ(lk)) (cid:18)fτii((λλ˜˜((kk,,jj))))(cid:19)=(cid:32)fτii[[sscc((jj))]](cid:33)+(cid:32)δδfτii[[sscc((jj))]](cid:33), l1=i  that is the sum of the accumulator and the correction (cid:20) (cid:21) − (cid:88) (xˆl−1xi)× (λl−λ(lk)) variAabtltehs.e end of each iteration the wrench corrections l∈Cp l2=i foreachbodyhavetobereducedandaddedtotheaccu- When relaxing the contact j in iteration k of the mulatedwrenchfromthelastiteration.Thiscanbeper- subdomain NBGS, the wrench on particle i ∈ {j ,j } formed in two message exchanges. In the first message 1 2 is evaluated with the reaction approximation λ˜(k,j) as exchange each process sends the wrench correction of UltrascaleSimulationsofNon-smoothGranularDynamics 9 eachshadowcopytoitsparentprocess.Theneachpro- be the set of process indices in direct neighborhood of cess sums up for each original instance all wrench cor- process p’s subdomain, and let rectionsobtainedfromtheshadowcopyholders,itsown wrench correction, and the original instance’s accumu-  (cid:12)   (cid:12)(cid:12) (cid:91)  lated wrench. Subsequently, the updated accumulated ldd=mininf (cid:107)yp−yq(cid:107)2(cid:12)yp∈Ωp,yq ∈ Ωq , wrench of each original instance is sent to the shadow p∈P  (cid:12)(cid:12) q∈P\(Np∪{p})  copy holders in a second message-exchange communi- cation step. The wrench corrections are then reset ev- be the shortest distance from a point inside a subdo- erywhere. main to a non-nearest neighbor. Then the condition The accumulated wrenches f[p], τ[p] are initialized on each process p before line 3 in Alg. 1 to r +(cid:107)v (t)(cid:107) δt+τ <l ∀i∈B (7) i i 2 dd (cid:32)fi[p](cid:33)←(cid:18)fi,ext(cid:19) ∀i∈B ensures in the first approximation that no hull extends τi[p] τi,ext p pastneighboringsubdomains.Thisimmediatelydefines a hard upper limit of r <l −τ for the bounding ra- i dd unlesstheinitialsolutionischosentobenon-zero.The dius and thus for the size of all objects. Furthermore, wrench corrections are initially set to 0. If the external given the particle shapes, velocities, and safety mar- forcesandtorquesarenotknownoneachprocessorare gins, the condition defines an upper limit for the time- scattered among the processes having instantiated the step length. The introduction of condition (7) entails particles, the initialization requires another two mes- that on a process p only the description of the subdo- sageexchanges,astheyarenecessaryattheendofeach mains within Ωp +(cid:8)y ∈R3(cid:12)(cid:12)(cid:107)y(cid:107)2 ≤ldd(cid:9) needs to be iteration. available, meaning that the description of non-nearest- Analternativetostoringaccumulatedwrenchesand neighbor subdomains can be dispensed with, and that wrench corrections is to store accumulated velocities the description of nearest-neighbor subdomains do not and velocity corrections. In that case, a process p in- havetobecorrectoutsideofthel -surroundingofΩ . dd p stantiates variables v[p], ω[p], δv[p], δω[p] ∈R3|Bp|. The This leads to a localized description of the domain par- accumulated velocities are set to v(cid:48)(λ(k)) and ω(cid:48)(λ(k)) titioning on each process, describing the surrounding i i for all i ∈ B in each iteration. They are initialized subdomains only. p and updated accordingly. The velocity corrections are Typically, the size limit stemming from (7) is not a initialized and updated analogously to the wrench cor- problem for the particles of the granular matter them- rections. Hereby, the velocity variables can be updated selves,butverywellforboundariesormechanicalparts in place. In the classic NBGS no wrench or velocity the granular matter interacts with. However, the num- correctionvariableswouldbenecessary,butthecorrec- ber of such enlarged bodies is typically significantly tionscouldbeaddedtothevelocityvariablesrightaway smaller than and independent of the number of small- whichissimilartotheapproachsuggestedbyTasoraet sizedparticles,suggestingthattheycanbetreatedglob- al. in [37]. ally.LetB bethesetofallbodyindicesexceeding global the size limit. These bodies will be referred to as being global in the following. All associated state variables 6.4 Nearest-Neighbor Communication andconstantsshallbeinstantiatedonallprocessesand initialized equally. The time-integration of these global In the following we describe how the strict locality of bodies then can be performed by all processes equally. particle interactions can be used to optimize the paral- If a global body i has infinite inertia (m = ∞ and i lel communication and synchronization by exchanging I0 =∞1),suchasastationarywalloranon-stationary ii messages only between nearest neighbors. So far the vibratingplate,thebodyvelocitiesareconstant,andno shadow copies can be present on any process, and the wrenchesneedtobecommunicated.Globalbodieshav- corrections in the summation over wrench or velocity ing a finite inertia can be treated by executing an all- corrections can originate from a long list of processes. reducecommunicationprimitivewheneverreducingthe However, by requiring that the particle hulls do not wrenchorvelocitycorrectionsofthesmall-sizedbodies. extend past any neighboring subdomains, all message Insteadofonlyinvolvingneighboringprocesses,theall- exchanges can be reduced to nearest-neighbor commu- reduceoperationsumsupthecorrectionsforeachglobal nications. Let body with finite inertia from all processes and broad- casts the result, not requiring any domain partitioning N ={q∈P\{p}|inf{(cid:107)y −y (cid:107) |y ∈Ω ,y ∈Ω }=0} information. p p q 2 p p q q 10 Tobias Preclik,Ulrich Ru¨de 6.5 Time-Integration and Synchronization Protocol 1: proceduresimulateTimeStep 2: C =broadPhaseCollisionDetection p,bp 3: C =narrowPhaseCollisionDetection(C ) p,np p,bp HavingsolvedthecontactproblemF(λ)=0byAlg.1, 4: C =filterContacts(C ) p p,np the time-integration defined in (3) needs to be per- 5: initializeAccumulatorAndCorrectionVariables 6: k←0 formed. If the NBGS implementation uses velocity ac- 7: λ[p]←0 cumulators, the integrated velocities are at hand after 8: whileconvergencecriterionnotmetdo the final communication of the velocity corrections. If 9: forj←1toν ∧j∈C do c p instead the NBGS implementation uses wrench accu- 10: λ[p]←ωF−1(0,λ[p])+(1−ω)λ[p] j j j j mulators, the wrenches are at hand, and the velocities 11: endfor of all local bodies can be updated immediately. 12: reduceCorrections 13: k←k+1 Subsequently, the time-integration of the positions 14: endwhile can take place. Updating a body’s position or orienta- 15: integrateStateVariables tioneffectsthatthelistofshadowcopyholderschanges 16: synchronize 17: endprocedure since the intersection hull possibly intersects with dif- ferent subdomains. Also, the body’s center of mass can Algorithm 2: A single time step of the simulation on moveoutoftheparent’ssubdomain.Inordertorestore process p. the fulfillment of the requirements 1 and 2, a process must determine the new list of shadow copy holders and the new parent process for each local body after proposed by Shojaaee et al. in [34] are canonical. Con- the position update. Shadow copy holders must be in- cerning the geometry of the subdomains at least the formed when such shadow copies become obsolete and subdomain closures can be used for intersection test- must be removed. Analogously, processes must be no- ing. In our implementation we chose to determine al- tified when new shadow copies must be added to their most minimal sets of shadow copy holders by testing state.Inthiscasecopiesofthecorrespondingstatevari- the intersections of the actual hull geometries of the ables,constants,listofshadowcopyholderindices,and particles with the closures of the subdomains. This re- index of the parent process must be transmitted. ducesthenumberofshadowcopiesandthustheoverall All other shadow copy holders must obtain the new communication volume in exchange for more expensive state variables, list of shadow copy holder indices, and intersection tests. indexoftheparentprocess.Hereby,theconditionfrom (7)guaranteesthatallcommunicationpartnersareneigh- bors. All information can be propagated in a single ag- 6.6 Summary gregated nearest-neighbor message-exchange. The in- formationshouldbecommunicatedexplicitlyandshould Alg.2summarizesthestepsthatneedtobeexecutedon not be derived implicitly, in order to avoid inconsisten- a process p when time-integrating the system for a sin- cies.Thisisessentialtoguaranteeasafedetermination gletimestepδtinparallel.Thealgorithmrequiresthat ofthecontacttreatmentresponsibilitiesaswellastime- all shadow copies are instantiated on all subdomains integration responsibilities. theirhullintersectswith.Furthermore,theshadowcopies Our implementation of the synchronization proto- must be in sync with the original instance, and the colmakesuseofseparatecontainersforstoringshadow global bodies must also be in sync to each other. The copiesandoriginalinstancesinordertobeabletoenu- positions of all local bodies must be located within the merate these different types of bodies with good per- local subdomain. The time step proceeds by executing formance. Both containers support efficient insertion, the broad-phase contact detection which uses the po- deletion and lookup operations for handling the fluctu- sitions, orientations, shapes, hull expansion radii, and ations and updates of the particles efficiently. Further- possiblyinformationfromprevioustimesteps,inorder more,thedeterminationofthenewlistofshadowcopy to determine a set of contact candidates (body pairs) holders involves intersection tests between intersection C on process p in near-linear time. p,bp hullsoflocalbodiesandneighboringsubdomainsasre- Then, in the narrow-phase contact detection, for quirement1explainsinSect.6.2.However,determining all candidates the contact location, associated contact theminimalsetofshadowcopyholdersisnotnecessary. frame, and signed contact distance is determined if the Any type of bounding volumes can be used to ease in- hullsactuallyintersect.Finally,thissetofdetectedcon- tersectiontesting.Inparticularboundingsphereseither tacts C needs to be filtered according to one of the p,np with tightly fitting bounding radii r + h (t) or even rulespresentedabove,resultinginC ,thesetofcontacts i i p with an overall bounding radius max r +h (t) as to be treated by process p. Before entering the itera- i∈B i i

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.