ASTRONOMY A&A manuscript no. AND (will be inserted by hand later) ASTROPHYSICS Your thesaurus codes are: missing; you have not inserted them February 5, 2008 A 3D radiative transfer framework: I. non-local operator splitting and continuum scattering problems Peter H. Hauschildt1 and E. Baron1,2,3 1 HamburgerSternwarte, Gojenbergsweg 112, 21029 Hamburg, Germany; [email protected] 2 HomerL.DodgeDept.ofPhysicsandAstronomy,UniversityofOklahoma,440W.Brooks,Rm100,Norman,OK73019-2061 6 USA;[email protected] 0 3 CRD,Lawrence Berkeley National Laboratory, MS 50F-1650, 1 Cyclotron Rd,Berkeley, CA 94720-8139 USA 0 2 received 18/July/2005, accepted 03/Jan/2006 n a J Abstract. We describe a highly flexible framework to radiation (the cooling function) although Dykema et al. 9 solve 3D radiation transfer problems in scattering dom- (1996) presented a full time-dependent 2-D NLTE radia- inatedenvironmentsbasedonalongcharacteristicspiece- tive transfer code, based on a variable Eddington factor 1 wise parabolic formal solution and an operator splitting method and equivalent two level atom formulation. v 3 method. We find that the linear systems are efficiently Recently Rijkhorst et al. (2005) presented a method 8 solvedwithiterativesolverssuchasGauss-SeidelandJor- of including 3-D radiative transfer into modern 3-D hy- 1 dan techniques. We use a sphere-in-a-box test model to drodynamicalcodes (such as the ASCI FLASH code) but 1 compare the 3D results to 1D solutions in order to as- they only treated the solution of the radiative transfer 0 6 sess the accuracy of the method. We have implemented equation in the absence of scattering, that is they their 0 the method for static media, however, it can be used to method only treats the formal solution of the radiative / solve problems in the Eulerian-frame for media with low transfer equation and not the full self consistent scatter- h p velocity fields. ing problem where the right hand side of the radiative - transfer problem involves the radiation field itself. How- o ever,inastrophysicalsystems,theeffectofscatteringcan- r 1. Introduction t not be ignored and in fact it is due to scattering that the s a With the increase in computer power in the last few radiation field decouples from the local emission and ab- v: years, 3D hydrodynamical calculations are becoming in- sorption and the effects of the existence of the boundary i creasinglycommoninastrophysics.Mosthydrodynamical are communicated globally over the atmosphere (Mihalas X calculationstreatradiationinasimplifiedmanner,sincea 1978). It is just this strong non-locality of the radiation ar full solution of the 3-D non-LTE radiative transfer prob- field that makes the solution of the generalized radiative lem is numerically much more expensive than the hydro- transfer problem so computationally demanding. dynamicalcalculationitself,whichalreadystretchthelim- Steiner (1991) presented a 2-D multi-grid its ofmodernparallelcomputers.Inmanyinstances,such method based on the short-characteristics method as the thermonuclearexplosionof a white dwarf (thought (Olson & Kunasz 1987; Olson et al. 1987) and showed tobetheprogenitorofTypeIasupernova),thegoalofthe that it worked in the case of a purely absorptive at- hydrodynamicalsimulationsis tounderstandthe mode of mosphere, i.e., that the formal solution was tractable. combustion and to handle the effects of turbulence in as Vath (1994) presenteda 3-Dshortcharacteristicsmethod realistic manner as possible and radiative transfer effects and showed that it had adequate scaling on a SIMD areignored.However,ingeneralsincetheultimatevalida- parallel architecture. In a series of papers 3-D, short tion or falsification of the results of sophisticated hydro- characteristics methods for disk systems were presented dynamical modeling will be via comparison with the ob- by Adam (1990); Hummel (1994a,b); Hummel & Dachs servedradiationfromtheastrophysicalobjectbeingstud- (1992);Papkalla(1995).Ourmethodissimilarinspiritto ied,andtheradiationstronglyaffectsthephysicalstateof these works, but we present a more detailed description thematterintheatmosphereoftheobject(wheretheob- of the construction of the approximate lambda operator servedradiationoriginates)theeffectofdetailedradiative (ALO), and the method of solution of the scattering transfer effects cannot be ignored. problem. In many multi-dimensional hydrodynamics codes (for Fabiani Bendicho et al.(1997)presentedamulti-level, example ZEUS3D, Norman 2000), radiative transfer is multi-grid, multi-dimensional radiative transfer scheme, treated in a simplified matter in order to determine the using a lower triangular ALO and solving the scattering amount of energy transfered between the matter and problem via a Gauss-Seidel method. 2 Please give a shorter version with: \authorrunning and/or \titilerunning prior to \maketitle van Noort, Hubeny, & Lanz (2002) presented a we use this for the tests presented in this paper for con- method of solving the full NLTE radiative transfer venience. problem using the short characteristics method in 2-D The applications that we intend to solve with the for Cartesian, spherical, and cylindrical geometry. They 3DRTframeworkwillinvolveopticallythickenvironments also used the technique of accelerated lambda iteration with a significant scattering contribution, e.g., modeling (ALI)(Olson et al.1987;Olson & Kunasz1987),however the light reflected by an extrasolar giant planet close to they restricted themselves to the case of a diagonalALO. its parent star. Therefore, not only a formal solution is In addition they considered the case including velocity required but the full solution of the 3D radiative trans- fields, but their method is feasible only in the case of a fer equation with scattering. In this paper, we describe a small velocity gradient across the atmosphere. method based on the operator splitting approach.Opera- ∗ We describe below a simple framework to solve three tor splitting works best if a non-local Λ operator is used dimensional(3D)radiationtransport(RT)problemswith inthe calculations(e.g.,Hauschildt et al.1994),therefore ∗ a non-local operator splitting method. In subsequent pa- we describe a non-local Λ method here. The operator pers we will extend this framework to solve 3D RT prob- splitting method can be combined with other methods, lemsinrelativisticallymovingconfigurations.Ourmethod like multigrids, to allow for greater flexibility and better is similar to those described above, although we con- convergence,which we will discuss in a later paper. sider both short characteristics and long characteristics methods for the formal solution of the radiative trans- 2.2. Radiative transfer equation fer equation. We show that long-characteristics produce a significantly better numerical solution in our test cases Thestaticradiativetransferequationin3-Dmaybewrit- for a strongly scattering dominated atmosphere. Short- ten characteristics are known to be diffusive and it is appar- nˆ ·∇I(ν,x,nˆ)=η(ν,x)−χ(ν,x)I(ν,x,nˆ) (1) ent in our numerical results. We also implement a partial where I(ν,x,nˆ) is the specific intensity at frequency ν, parallelizationofthemethod,althoughwedeferafulldis- position x, in the direction nˆ, η(ν,x) is the emissivity at cussion of the parallelization to future work. frequency ν and position x, and χ(ν,x) is the total ex- tinction at frequency ν and position x. The source func- tion S = η/χ. Here, we will work in the steady-state 2. Method so that ∂I/∂t = 0, and in Cartesian coordinates so the ∇ = ∂ + ∂ + ∂ and the direction nˆ is defined by two ∂x ∂y ∂z In the following discussion we use notation of Hauschildt angles (θ,φ) at the position x. (1992). First, we will describe the process for the formal solution, then we will describe how we construct the non- ∗ 2.3. The operator splitting method local approximate Λ operator, Λ , and methods to solve the necessary linear equations in the operator splitting ThemeanintensityJ isobtainedfromthesourcefunction step. S by a formal solution of the RTE which is symbolically written using the Λ-operator Λ as J =ΛS. (2) 2.1. Framework ThesourcefunctionisgivenbyS =(1−ǫ)J+ǫB,whereǫ The 3D RT equations are easiest solved in a Cartesian denotesthethermalcouplingparameterandB isPlanck’s coordinate system (e.g., Fabiani Bendicho et al. 1997), function. therefore we use a Cartesian grid of volume cells (vox- The Λ-iterationmethod, i.e. to solve Eq. 2 by a fixed- els). Basically, the voxels are allowed to have different point iteration scheme of the form sizes but for the tests presented later in this paper we Jnew =ΛSold, Snew =(1−ǫ)Jnew+ǫB, (3) use fixed size voxels (as the simplest option). The values fails in the case of large optical depths and small ǫ. Here, ofphysicalquantities,suchastemperatures,opacitiesand Sold is the current estimate for the source function S and mean intensities, are averages over a voxel, which, there- Snewisnew,improved,extimateofSforthenextiteration. fore, also fixes the local physical resolution of the grid. The failuretoconvergeofthe Λ-iterationiscausedby the In the following we will specify the size of the voxel grid factthatthelargesteigenvalueoftheamplificationmatrix by the number of voxels along each positive axis, e.g., is approximately(Mihalas et al.1975) λmax ≈(1−ǫ)(1− n = n = n = 32 specifies a voxel grid from voxel T−1), where T is the optical thickness of the medium. x y z coordinates (−32,−32,−32) to (32,32,32) for a total of For small ǫ and large T, this is very close to unity and, (2∗32+1)3 =274625voxels,65alongeachaxis.Thevoxel therefore, the convergence rate of the Λ-iteration is very (0,0,0)isatthecenterofthevoxelgrid.Thevoxelcenters poor. A physicaldescription of this effect can be found in are the grid points. The voxel coordinates are related by Mihalas (1980). grid scaling factors to physical space, depending on the TheideaoftheALIoroperatorsplitting(OS)method problem. The framework does not require n = n = n , is to reduce the eigenvaluesof the amplification matrix in x y z Please give a shorter version with: \authorrunning and/or \titilerunning prior to \maketitle 3 ∗ theiterationscheme(Cannon1973)byintroducinganap- of Jfs, (b) the time needed to construct Λ , (c) the time ∗ proximate Λ-operator (ALO) Λ and to split Λ according required for the solution of Eq. 6, and (d) the number of to iterationsrequiredforconvergencetotheprescribedaccu- ∗ ∗ Λ=Λ +(Λ−Λ ) (4) racy.Points(a),(b)and(c)dependmostlyonthenumber and rewrite Eq. 3 as of spatial points, and can be assumed to be fixed for any Jnew =Λ∗Snew+(Λ−Λ∗)Sold. (5) givenconfiguration.However,thenumberofiterationsre- This relation can be written as Hamann (1987) quiredtoconvergencedependsstronglyonthebandwidth [1−Λ∗(1−ǫ)]Jnew =Jfs−Λ∗(1−ǫ)Jold, (6) of Λ∗. This indicates, that there is anoptimum bandwidth ∗ where Jfs = ΛSold and Jold is the current estimate of of the Λ -operator which will result in the shortest pos- the mean intensity J. Equation6 is solvedto get the new sible CPU time needed for the solution of the RTE, see values of J which is then used to compute the new source Hauschildt et al. (1994). function for the next iteration cycle. Mathematically, the OS method belongs to the same 2.4. Formal solution family of iterative methods as the Jacobi or the Gauss- Seidelmethods(Golub & Van Loan1989).Thesemethods The formal solution through the voxel grid can be per- have the general form formed by a variety of methods. So far, we have imple- Mxk+1 =Nxk+b (7) mentedbothashort-characteristic(SC,Olson et al.1987) for the iterative solution of a linear system Ax=b where and a long-characteristic (LC) method. Long and short the system matrix A is split according to A=M −N. In characteristics are shown schematically in Fig. 1. In our ∗ the case of the OS method we have M = 1−Λ (1−ǫ) current implementation, the long characteristics are fol- ∗ and, accordingly,N =(Λ−Λ )(1−ǫ) for the system ma- lowedcontinuouslythroughthevoxelgrid,theshortchar- trix A = 1−Λ(1−ǫ). The convergence of the iterations acteristicsstartatthecenterofavoxelandstepclosestto depends onthe spectralradius,ρ(G), ofthe iterationma- the center of the next voxel.The distances along a (short trix G=M−1N. For convergence the condition ρ(G)<1 or long) characteristic are then used to compute the op- ∗ mustbefulfilled,thisputsarestrictiononthechoiceofΛ . tical depth steps. Along a characteristic (either short or Ingeneral,the iterationswillconvergefasterfor asmaller long), the formal solution is computed using a piece-wise spectralradius.Toachieveasignificantimprovementcom- parabolic(PPM)orpiece-wiselinear(PLM)interpolation ∗ pared to the Λ-iteration, the operator Λ is constructed and integration of the source function (Olson & Kunasz sothattheeigenvaluesoftheiterationmatrixGaremuch 1987). Auer (2003) discusses the effect that high order smaller than unity, resulting in swift convergence. Using interpolationmaycauseproblems,therefore,weautomat- parts of the exact Λ matrix (e.g., its diagonal or a tri- ically use piece-wise linear interpolation if the change in diagonal form) will optimally reduce the eigenvalues of the source function along the 3 points of the PPM step ∗ the G. The calculation and the structure of Λ should be wouldbelargerthanaprescribedthreshold(typicallyfac- simpleinordertomaketheconstructionofthelinearsys- torsof100)oriftheopticaldepthalongthecharacteristic ∗ tem in Eq. 6 fast. For example, the choice Λ =Λ is best is verysmall(typically lessthan10−3).Depending onthe inviewoftheconvergencerate(itisequivalenttoadirect direction (θ,φ) of the characteristic, the formal solution solutionbymatrixinversion)buttheexplicitconstruction proceeds through the voxel grid. of Λ is more time consuming than the construction of a simpler Λ∗. The solution of the system Eq. 6 in terms of Therefore,along a characteristic[whichisinthestatic linear algebra,using modern linear algebra packagessuch case just a straight line with given (θ,φ)] the transport as, e.g., LAPACK (Anderson et al. 1992), is so fast that its equation is simply CPU time can be neglected for the small number of vari- dI =I−S (8) ables encountered in 1D problems (typically the number dτ ofdiscreteshellsisabout50).However,for2Dor3Dprob- Withthisdefinition,theformalsolutionoftheRTEalong lems the size of Λ gets very large due to the much larger thecharacteristicscanbewritteninthefollowingway(cf. number of grid points as compared to the 1D case. Ma- Olson & Kunasz 1987, for a derivation of the formulae) trixinversions,whicharenecessarytosolveEq.6directly, where we have suppressed the index labeling the charac- therefore become extremely time consuming. This makes teristic; τ denotes the optical depth along the character- i the direct solution of Eq. 6 more CPU intensive even for istic with τ1 ≡0 and τi−1 ≤τi while τ is calculated using ∗ Λ ’s ofmoderate bandwidth, exceptfor the trivialcase of piecewiselinearinterpolationofχalongthecharacteristic, ∗ a diagonal Λ . Different methods like modified conjugate viz. gradient methods (Turek 1993) may be effective for these I(τi) = I(τi−1)exp(τi−1−τi) 2D or 3D problems. τi The CPU time required for the solution of the RTE + S(τ)exp(τ −τ )dτ (9) Z i using the OS method depends on several factors: (a) the τi−1 time required for a formal solution and the computation I(τi) ≡ Ii−1exp(−∆τi−1)+∆Ii (10) 4 Please give a shorter version with: \authorrunning and/or \titilerunning prior to \maketitle i labels the points along a characteristic and ∆τ is cal- has the advantage that the solid angle points can vary i culatedusing piecewiselinearinterpolationofχalongthe fromvoxeltovoxel(importantforconfigurationswithve- characteristic locity fields where the transfer equation is solved in the ∆τi−1 =(χi−1+χi)|si−1−si|/2 (11) locally co-moving frame). In the static case, the accu- Thestartingpointsofeachcharacteristicareatthecenter racy of the MC method is only insignificantly worse than of the voxels on their starting faces of the voxel grid. thatofthedeterministic quadrature,whichindicatesthat The source function S(τ) along a characteristic is in- the MC integration will be very useful in the case of 3D terpolated by linear or parabolic polynomials so that radiation transport in moving media. ∆Ii =αiSi−1+βiSi+γiSi+1 (12) with ∗ 2.4.1. Computation of Λ αi = e0i+[e2i−(∆τi+2∆τi−1)e1i] /[∆τi−1(∆τi+∆τi−1)] (13) As demonstrated by Olson et al. (1987) and Olson & Kunasz (1987), the coefficients α, β, and γ βi = [(∆τi+∆τi−1)e1i−e2i]/[∆τi−1∆τi] (14) can be used to construct diagonal and tri-diagonal γi = [e2i−∆τi−1e1i]/[∆τi(∆τi+∆τi−1)] (15) Λ∗ operators for 1D radiation transport problems. In for parabolic interpolation and fact, up to the full Λ matrix can be constructed by a αi = e0i−e1i/∆τi−1 (16) straightforward extension of the idea (Hauschildt et al. βi = e1i/∆τi−1 (17) 1994; Hauschildt & Baron 2004). These non-local Λ∗ γ = 0 (18) operators not only lead to excellent convergence rates i for linear interpolation. The auxiliary functions are given but they also avoid the problem of false convergence by that is inherent in the Λ iteration method and can also ∗ e0i = 1−exp(−∆τi−1) (19) be an issue for diagonal (purely local) Λ operators. Therefore, it is highly desirable to implement a non-local e1i = ∆τi−1−e0i (20) ∗ Λ also for the 3D case. The tri-diagonal operator in the e2i = (∆τi−1)2−2e1i (21) 1D case is simply a nearest neighbor Λ∗ that considers and∆τi ≡τi+1−τi is the opticaldepth alongthe charac- the interaction of a point with its two direct neighbors. teristic from point i to point i+1. The linear coefficients ∗ In the 3D case, the nearest neighbor Λ considers the have to be used (at least) at the last integration point interaction of a voxel with the (up to) 33 − 1 = 26 along each characteristic, and for some cases it might be surrounding voxels (this definition considers a somewhat bettertoalsouselinearinterpolationforsomeinnerpoints largerrange of voxels that a strictly face-centeredview of so as to ensure stability. just 6 nearest neighbors). This means that the non-local The integration over solid angle can be done in the ∗ Λ requires the storage of 27 (26 surrounding voxels plus static case using a simple Simpson or trapezoidalquadra- local, i.e., diagonal effects) times the total number of ture formula. However, in the case of Lagrangian frame ∗ voxels Λ elements. This can be reduced, for example if radiationtransport,theangles(θ,φ)varyalonga(curved) one considers only the voxels that share a face to a total characteristic. Therefore, the (θ,φ) grid changes for each of 7 elements for each voxel. However, this would ignore voxelanddeveloping a quadratureformula in advance re- the effect of characteristicsthat pass’diagonally’through quires a pass through all voxels, storing all (θ,φ) points a voxel and would therefore lead to a slower convergence foreachofthem.For largergridsthis will amountto sub- rate. stantial long term memory requirements as the resulting ∗ The construction of the Λ operator proceeds in the quadrature weights will have to be stored for each (θ,φ) same way as discussed in Hauschildt (1992). In the 3D pair at all voxels. To avoid this, we have implemented a case, the ’previous’ and ’next’ voxels along each char- simple Monte-Carlo(MC) scheme to performthe integra- acteristic must be known so that the contributions can tion over solid angle. In the MC integration, the integral be attributed to the correct voxel. Therefore, we use a 1 2π π datastructurethatattachestoeachvoxelitseffectsonits J = Isinθdθdφ 4π Z Z neighbors.Theschemecanbeextendedtriviallytoinclude 0 0 longer range interactions for better convergence rates (in is replaced by a simple MC sum of the form particular on larger voxel grids), however the memory re- 1 quirements to simply store Λ∗ ultimately scales like n3 J ≈ Isinθ 2π2 X where n is the total number of voxels. The storage re- ∗ wherethe sumgoesoverallsolidanglepoints (θ,φ). The quirements canbe reducedby, e.g.,using Λ ’s of different (θ,φ)arerandomlyselectedandgivenequalweightinthe widths for different voxels. Storage requirements are not MCsums.Thisalsoworksforprecribed(θ,φ)gridsaslong somucha problemif adomaindecompositionparalleliza- asthe numberof(θ,φ)pointsissufficiently large.The ac- tion method is used and enough processors are available. curacy is improved by maintaining the normalization nu- Below we will show some results for test cases with larger mericallyforaunityvaluedtestfunction.TheMCmethod operators. Please give a shorter version with: \authorrunning and/or \titilerunning prior to \maketitle 5 We describe here the general procedure of calcu- As an alternative to direct solutions of Eq. 6 we have lating the Λ∗ with arbitrary bandwidth, up to the implementediterativesolvers.Thesesolvershavethehuge full Λ-operator, for the method in spherical symme- advantagethattheirmemoryrequirementsareminimalto try (Hauschildt et al. 1994). The construction of the modest.Inaddition,theycanbetailoredtothespecialfor- ∗ Λ is described in Olson & Kunasz (1987), so that we mat ofthe matrix inEq.6 which makescoding the meth- here summarize the relevant formulae. In the method of odsverysimple.Weobtainedextremelygoodresultswith ∗ Olson & Kunasz (1987), the elements of the row of Λ theJordanandtheGauss-Seidelmethods,whichconverge are computed by setting the incident intensities (bound- rapidlytoarelativeaccuracyof10−10.Foreithermethod, aryconditions)tozeroandsettingS(i ,i ,i )=1forone Ngaccelerationturnedouttobeveryuseful,reducingthe x y z voxel (i ,i ,i ) and performing a formal solution analyt- numberofiterationssignificantly.Inaddition,theRapido x y z ically. method(Zurmu¨hl & Falk 1986)whichwasconstructedto solve systems of the form ∗ We describe the construction of Λ using the (1−M)x=a (27) example of a single characteristic. The contri- ∗ can be used. However, Rapido has strong constraints on butions to the Λ at a voxel j are given by thespectralradiusofM andthiscannotbeusedforprob- Λ =0 for i<j−1 (22) i,j lems that involvesubstantialscattering.Inthe test calcu- Λj−1,j =γj−1 for i=j−1 (23) lationsdiscussedbelowwehaveusedtheJordanorGauss- Λj,j =Λj−1,jexp(−∆τj−1)+βjk for i=j (24) Seidelsolversastheyarethefastestofthesolverswehave implemented.Weverifiedthatallsolversgivethesamere- Λj+1,j =Λj,jexp(−∆τj)+αj+1 for i=j+1 (25) sults. Λi,j =Λi−1,jexp(−∆τi−1) for j+1<i (26) These contributions are computed along a characteristic, 3. Application examples here i labels the voxels along the characteristic under consideration. These contributions are integrated over AsafirststepwehaveimplementedthemethodasaMPI solid angle with the same method (either deterministic parallelizedFortran95program.Theparallelizationofthe or through the Monte-Carlo integration) that is used for formal solution is presently implemented over solid angle ∗ the computation of the J. For a nearest neighbor Λ , the spaceasthisisthesimplestparallelizationoptionandalso processofEq.26 is stoppedwith i=j+1,otherwiseit is one of the most efficient (a domain decomposition paral- continued until the required bandwidth has been reached lelization method will be discussed in a subsequent pa- (or the characteristichas reachedanoutermostvoxeland per). In addition, the Jordansolverof the Operatorsplit- terminates). tingequationsisparallelizedwithMPI(seebelowforscal- ing properties of the MPI implementation). The number of parallelization related statements in the code is small, 2.5. Operator splitting step about 320 out of a total of about 7900. Our basic continuum scattering test problem is sim- ∗ For a diagonal Λ the solution of Eq. 6 is just a division ilar to that discussed in Hauschildt (1992) and in ∗ by[1−Λ (1−ǫ)]whichrequiresverylittle effort.Forthe Hauschildt & Baron (2004). This test problem covers a ∗ non-localΛ operator,a matrixequationhasto be solved large dynamic range of about 9 dex in the opacities and at each step in order to compute Jnew. For the nearest overallopticaldepthstepsalongthecharacteristicsand,in neighbor operatorthe matrix structure is that of a sparse ourexperience,constitutesareasonablychallengingsetup band matrix where the bandwidth of the matrix is pro- for the radiative transfer code.The application of the 3D portionalto the squareofthe maximumnumberofpoints code to ’real’ problems is in preparation and requires a along a coordinate and there are a total of 27 non-zero substantialamountofdevelopmentwork(inprogress).For entries in every column of the matrix. This system can the 1D code we have found that the test case is actually be solved by any suitable method. For example, we have pretty much a worst case scenario and that it generally implemented a solver based on LAPACK (Anderson et al. works better in real world problems. We use a sphere 1992) routines. Although this solver works fine for small with a grey continuum opacity parameterized by a power grids [(2∗16+1)3 voxels with a bandwidth of 1123], the law in the continuum optical depth τstd. The basic model memory requirements rise beyond available single-CPU parameters are limits with already only (2∗32+1)3 voxels with a band- width of 4291. For a more realistic grid of (2∗256+1)3 1. Inner radius rc = 1013cm, outer radius rout = 1.01× voxels the bandwidth is more than 250,000. Therefore, a 1015cm. standard band matrix solver is only useful for compari- 2. Minimumopticaldepthinthecontinuumτmin =10−4 std son and testing on very small grids. Other sparse matrix andmaximum opticaldepth in the continuum τmax = std solvers(forexample,Xiaoye & Demmel2003)havesimilar 104. memory scaling properties. 3. Grey temperature structure with Teff =104 K. 6 Please give a shorter version with: \authorrunning and/or \titilerunning prior to \maketitle − 4. Outer boundary condition I ≡0 and diffusion inner Note the factor of about 1000 difference between the so- bc boundary condition for all wavelengths. lution for ǫ = 10−4 and the results of the formal solution 5. Continuumextinctionχ =C/r2,withtheconstantC with S = B shown in Fig. 2 at the largest distances (the c fixed by the radius and optical depth grids. iterations were started with S = B). Figure 3 shows the 6. Parameterized coherent & isotropic continuum scat- results for the ‘large’ and ‘medium’ spatial grids and for tering by defining 2 different solid angle discretizations (642 and 162 solid anglepoints). Forboththe 1Dandthe 3Dcode the mean χc =ǫcκc+(1−ǫc)σc (28) intensities were iterated to a relative accuracy of 10−8 with 0≤ǫ ≤1. κ and σ are the continuum absorp- (see below for details on convergence rates). The graph c c c tion and scattering coefficients. highlights the need for a rather fine solid angle grid for the test problem. With a small number of angular points The test model is just an optically thick sphere put into (bottompanelinFig.3)thenumericallyinducedspreadof the 3D grid. This problemis used because the results can the mean intensities is significantly larger than with a 42 be directly compared with the results obtained with our finer angular grid (shown in the middle panel). A change 1Dsphericalradiationtransportcode(Hauschildt1992)to in spatial gridresolutionhas a much smaller effect on the assesstheaccuracyofthemethod.Thesphereis centered results than the number of solid angle points as shown in at the center of the Cartesian grid, which is in each axis the top two panels of Fig. 3. The spatial resolutionof the 10% larger than the radius of the sphere. For the test ‘small’ grid is, however, too coarse to represent the test calculations we use voxel grids with the same number of problem, in particular in the inner parts of the structure. spatialpointsineachdirection(seebelow).Thesolidangle The importance of the angular resolution is also space was discretized in (θ,φ) with n = n if not stated θ φ demonstrated in Fig. 4 which shows the contours of the otherwise.Inthefollowingwediscusstheresultsofvarious meanintensityonthesix‘faces’ofthevoxelcubefor1293 tests. In all tests we use the LC method for the 3D RT spatial points and 162 (left panel) and 642 (right panel) solution unless stated otherwise. solid angles. The calculation with 642 solid angle points shows symmetric contours whereas the smaller 162 angle 3.1. LTE tests points modelshowsasymmetriesand‘banding’like struc- tures on all faces (in particular on the x−y faces). This In this test we have set ǫ = 1 to test the accuracy of clearly indicates that for the test conditions used the an- the formalsolution by comparingto the results of the 1D gular resolution has to be larger than 162 for accurate code.The 1Dsolveruses50radialpoints,distributedlog- solutions. This can also be seen in surface plots of the arithmically in optical depth. For the 3D solver we tested mean intensities at the −nx faces of the z−y planes for ‘small’ grids with n =n =n =2∗32+1 points along x y z both calculations, cf. Fig. 5. The surface calculated with each axis, for a total of 653 ≈ 2.7×105 voxels, as well 642 angles is much smoother and shows no or little band- as ‘medium’ (nx = ny = nz = 2∗ 64+ 1 with a total ing compared to the surface for 162 angular points. The of 1293 ≈ 2.1×106 voxels) and ’large (n = n = n = x y z effects of higher spatial resolutionon the J surface at the 2∗96+1 with a total of 1933 ≈ 7.2×106 voxels). The z−y face is shown in Fig. 6. Clearly, 642 angles produce large grid was limited by available memory for the stor- a smooth surface without significant artifacts for the test ∗ age of the non-local Λ operator. The solid angle space case. discretization uses, in general, n = n = 64 points. In θ φ Theconvergencepropertiesofthe variousmethods for Fig. 2 we show the mean intensities as function of dis- this test case are shown in Fig. 7. As expected, the Λ tance from the center for both the 1D (+ symbols) and iteration converges very slowly and requires more than the 3D solver. The results show excellent agreement be- 300 iterations to reach a relative error of less than 10−8. tweenthetwosolutions,thusthe3DRTformalsolutionis ∗ The diagonal Λ operator converges significantly better comparable in accuracy with the 1D formal solution. We than the Λ iteration but is still rather slowly convergent. demonstrate below that for the conditions used in these ∗ ThenearestneighborΛ convergessubstantiallyfaster(by tests a larger number of solid angle points significantly ∗ morethanafactorof3thanthe diagonalΛ ).The diago- improves the accuracy of the mean intensities. nal and nearest neighbor iterations can be accelerated by using Ng’s method (Ng 1974), as shownin Fig.7.In both 3.2. Tests with continuum scattering cases,a4thorderNgaccelerationwasusedafter20initial 3.2.1. ǫ=10−4 iterations, starting Ng acceleration too early can lead to convergence failures since Ng acceleration is based on ex- For this test, we use the same basic structure setup as for trapolation. An attempt to apply the Ng acceleration to the LTE test, however we now use ǫ = 10−4 for a scat- the Λ iteration was not successful, similar to the 1D case, tering dominated atmosphere. The comparison with the as the convergence rate of the Λ iteration appears to be results obtained with the 1D RT code (with 100 radial too small to be useful for the Ng method. For compari- points)are comparedto the 3DRT code resultsin Fig.3. son, we show in Fig. 7 the convergence properties of the Please give a shorter version with: \authorrunning and/or \titilerunning prior to \maketitle 7 ∗ 1D RT solver with a tri-diagonal Λ and Ng acceleration themeanintensitiesishuge,nearly12dex,the resultsare (here started much earlier). The convergence rates of the quite accurate. For this case the lack of spatial resolution 1Dtri-diagonaland3Dnearestneighbormethodsarevery intheinnerpartsofthevoxelgridcanbe seen.Here|∇J| comparable. The convergence properties are also relevant is huge and cannot be fully resolved by the 3D RT code for the overall speed of the method: whereas the time for (the 1Dcodenaturallyhasmuchhigherresolution).How- theformalsolution(foragivennumberofvoxelsandpro- ever, only a few voxels away from the center the results cessors) depends roughly linearly on the total number of agree very well. solid angle points, the time for the solution of the opera- The convergence plots in Fig 12 show the results for torsplitting stepdoesnotdepend onthe number ofangle a very difficult test case with τmax = 108 for a fixed std points andactuallydecreases withiterationnumberifthe voxelandsolidanglegridwith653 voxelsand162 angular Jordan, Gauss-Seidel or Rapido solvers are used. There- points. For this test, the Λ iteration fails completely. The fore, the nearest neighbor operator will become more and diagonal operator provides significant speed-up, but still more efficient as the number of angles and/or voxels in- requires more than 1600 iterations to reach the required creases. convergence limit. For this test, the Ng acceleration does The smaller initial corrections of the Λ iteration are not work with the diagonal operator, the iteration pro- actually an indication that it just corrects too little, the cess failed immediately after it was started. It is likely operator splitting method gives initially much larger cor- thatabetterresultcouldbeobtainedifNgaccelerationis rections. This is exactly like similar test cases in 1D that startedinthesteeppartofthediagonaloperator’sconver- we have tried. The initial correctionsare so large because gence,e.g.,afterabout500regulariterations.Thenearest the initial guesses are very wrong (intentionally) so that neighbor operator leads to much faster convergence,even between initial guess and final result we have changes by without Ng acceleration the solution converges in about close to 10 orders of magnitude. This means that the ini- 450iterations.Here,Ng accelerationworksverywellwith tial corrections of the operator splitting method have to thenearestneighboroperator,convergenceisreachedafter be large. 177 iterations. This is still about a factor of 2 more than The convergence rates will depend on the grid resolu- for the 1D code, but much better than with the diagonal tion as the optical depths between voxels will be smaller operator. The variation of the convergence rate for the forlargerresolutionsandthusthecouplingbetweenvoxels nearest neighbor operator and Ng acceleration with the will be stronger. This effect is shown in Fig. 8 where we size of the solid angle grid is shown in Fig. 13. The case show the convergence rates for various grids. The conver- with the smallest angular grid (162 points) actually con- gence rates are independent of the number of solid angle vergesmoreslowly thanthe322 and642 grids.Thehigher points but depend weakly on the number of grid points, resolution grids convergence rate compares well with the as expected. Thus, for voxel grids with higher resolution, 1D code. This highlights the importance of the non-local a larger Λ∗ (more neighbors) will be more useful than for Λ∗ operator and a large enough solid angle grid for rapid coarser voxel grids. convergence and accuracy. Figure 9 shows the convergencerates for the ǫ=10−4 ∗ ∗ testcasefordifferentΛ bandwidths.ThewiderΛ ’slead 3.2.3. Visualization toimprovedconvergencerates;however,thenetwallclock ∗ timeincreasesforthewiderΛ ’sintheparallelcode.This Wehaveimplementedasimplevisualizationofthe results is caused by the substantially larger amount of informa- in order to view images of the emitted intensities. The tion that needs to be passed between processes in order visualization uses IDL to read the results of the 3D RT ∗ to build the wider Λ . In addition, the memory require- code, performs a formal solution for a specific (θ,φ) and ments for a given number of voxels scales like the cube of displays the result as an image of the intensities leaving ∗ the number of neighbors considered in the Λ . As Fig. 9 the voxel grid. These figures are extremely helpful for demonstrates,theconvergenceisstronglyimprovedbythe discovering even small problems with the generation and ∗ use of Ng acceleration, however, for larger Λ ’s the Ng handling of the characteristics or issues due to low reso- ∗ methodcanbedetrimentalforwiderΛ comparedtonar- lution in solid angle or space. Such problems will imme- ∗ rower Λ . diately show up as asymmetries in the generated images. We show the results for the LTE test (with 653 voxels)in 3.2.2. ǫ=10−8 Fig. 14. The limb darkening of the test sphere is clearly visible in the figures. The slight pixellation and asymme- The final test we present in this paper considers a case tries are due to spatial and angular resolution. Note that withamuchlargerscatteringcontribution:ǫ=10−8.The the scales of the different panels are different due to the results for a grid with 1293 voxels and 642 angular points changing orientationof the data cube. The generated im- are shown in Fig. 10. Fig. 11 shows the results for slices ages for the ǫ=10−4 test case with 1293 and 1933 voxels alongthe coordinateaxes(the two coordinatesbeing cen- are shown in Figs. 15 and 16, respectively. For both fig- tered, respectively). Even though the dynamic range of ures,thesourcefunctionswerecalculatedwith642 angles. 8 Please give a shorter version with: \authorrunning and/or \titilerunning prior to \maketitle The images show much less pixellation and far less arti- Other authors (Steiner 1991; Fabiani Bendicho et al. factsthantheimagesshowninFig.14.Theresultsforthe 1997; Trujillo Bueno & Fabiani Bendicho 1995; Vath ǫ=10−8 test case are shown in Fig. 17, they look similar 1994; Auer et al. 1994; van Noort et al. 2002) have used to the ǫ=10−4 case with the same grid sizes. SC methods in multi-dimensional radiative transport problems. Short characteristics techniques are faster but require special considerations to reduce numerical diffu- 3.2.4. MPI scaling properties sion (Auer 2003). We may further look into the SC Figure 18showsthe scalingpropertiesof the MPI version method in later papers. of the 3D radiation transport code for the ǫ = 10−8 test We have generalized the operator splitting to include case.Therunswereperformedon2parallelcomputeclus- larger bandwidth operators. They lead to faster conver- ters, one equipped with 1.8GHz dual Opteron CPUs and gence although they do require more memory and ulti- an Infiniband interconnect from Delta computer and one mately more computing time. Nevertheless, they will be equipped with 2.0GHz dual G5 CPUs with Gbit ethernet usefulforhighlycomplexproblemsandwehavedeveloped ∗ networkfromApple computer(Xserves).Thespeedupwe a highly flexible approach to the construction of the Λ obtainintheMPIversionisclosetooptimal,aboutafac- operator so that the bandwidth may be set for each spa- tor of 28 with 32 MPI processes on 32 CPUs (or 16 com- tial point individually as the problem and computational putenodes).Thefactthatthespeedupisverygoodshows resources require. thattheloadbalancingisoptimalandthatthetimespent We have designed an especially general and flexible intheMPIcommunicationroutinesisnegligiblecompared framework for 3D radiative transfer problems with scat- to the compute times. tering. In future papers of this series we will describe its With 1293 voxels,the code uses about0.6GB ofmem- extensionto linetransferproblems,multi-levelNLTEcal- ory. With 10 CPUs and 642 angles, the wallclock time culations, and differentially moving flows. for a formal solution is about 310sec (400sec) on 2.0GHz Acknowledgements. This work was supported in part by by Xserve G5s (on 1.8GHz Opterons), 9sec (9sec) for the NASAgrantsNAG5-12127andNAG5-3505,NSFgrantsAST- required MPI communication, and between 3–26sec (12– 0204771 and AST-0307323. PHH was supported in part by 120sec)tosolvethelinearsystem.Sincethelinearsystem thePˆoleScientifiquedeMod´elisationNum´eriqueatENS-Lyon. issolvediteratively,the timeforthesolutionisreducesas Someofthecalculationspresentedherewereperformedatthe the overallconvergence limit is approached. National Energy Research Supercomputer Center (NERSC), whichissupportedbytheOfficeofScienceoftheU.S.Depart- ment of Energy under Contract No. DE-AC03-76SF00098; at 4. Conclusions theH¨ochstleistungsRechenzentrumNord(HLRN);andatthe Hamburger Sternwarte Apple G5 and Delta Opteron clusters We have described a framework for solving three- financially supported by the DFG and the State of Hamburg. dimensionalradiativetransferproblemsinscatteringdom- We thank all these institutions for a generous allocation of inated environments. The method uses a non-local oper- computer time. ator splitting technique to solve the scattering problem. Theformalsolutionisbasedonalongcharacteristicpiece- References wise parabolic procedure. For strongly scattering domi- nated test cases (sphere in a box) we find good conver- Adam, J. 1990, A&A, 240, 541 ∗ gence withnon-localΛ operators,as wellasminimal nu- Anderson, E., Bai, Z., Bischof, C., et al. 1992, LAPACK mericaldiffusionwiththelongcharacteristicsmethodand Users’ Guide (SIAM) adequate resolution. A simple MPI parallelization gives Auer, L. 2003, in Stellar Atmosphere Modeling, ed. excellentspeedupsonparallelclusters.Insubsequentwork I. Hubeny, D. Mihalas, & K. Werner, Vol. 288 (San we will implement a domain decompositionmethod to al- Franscisco: ASP Conf. Series), 3 low much larger spatial grids. Presently, we have imple- Auer,L.,FabianiBendicho,P.,&TrujilloBueno,J.1994, mented the method for static media, it can be used with- A&A, 292, 599 out significant changes to solve problems in the Eulerian- Cannon, C. J. 1973, JQSRT, 13, 627 frame for media with low velocity fields. The distribution Dykema, P. G., Klein, R. I., & Castor, J. I. 1996, ApJ, of matter over the voxels is, in the general 3D case, arbi- 457, 892 trary.Wechoseasphericaltestcasetobeabletocompare FabianiBendicho,P.,TrujilloBueno,J.,&Auer,L.1997, the results our 1D code. A&A, 324, 161 In Figs. 19 and 20 we compare the results for Golub, G. H. & Van Loan, C. F. 1989, Matrix computa- the ǫ = 10−4 test case using a simple implemen- tions (Baltimore: Johns Hopkins University Press) tation of the short characteristics method and the Hamann, W.-R. 1987, in Numerical Radiative Transfer, long characteristics method used in this paper. The ed. W. Kalkofen (Cambridge University Press), 35 test grid contains 1293 voxel and 642 angular points. Hauschildt, P. H. 1992, JQSRT, 47, 433 The high diffusivity of the SC method is evident. Hauschildt, P. H. & Baron, E. 2004, A&A, 417, 317 Please give a shorter version with: \authorrunning and/or \titilerunning prior to \maketitle 9 Fig.1. Schematic sketchof the different types of charac- Fig.5. Mean intensity surfaces at the z−y face for the teristics used in the framework. The left panel shows the test case with ǫ=10−4, 1293 spatial points, and 162 (left long characteristics,the right panel the short characteris- panel)and642 solidanglepoints.The axesarelabeledby tics. The voxel boundaries and centers (’+’ symbols) are voxelindexwith(x,y,z)=(0,0,0)beingthecenterofthe indicated. The ’+’ denote the points between which the voxel grid. Each plot shows one outside face of the voxel geometricdistanceisusedtocomputeopticaldepthsteps. cube (the physical scales are the same in all directions). Fig.2. Comparison of the results obtained for the LTE Fig.6. Mean intensity surface at the z −y face for the testwiththe1Dsolver(+symbols)andthe3Dsolver.The test case with ǫ = 10−4 with 1933 spatial points and 642 x axis shows the distances from the center of the sphere, solid angle points. The axes are labeled by voxel index the y axis the log of the mean intensity. with (x,y,z)=(0,0,0)being the center of the voxelgrid. Fig.3. Comparison of the results obtained for the scat- Fig.7. Convergence properties of the codes for the ǫ = tering dominated (ǫ = 10−4) test with the 1D solver (+ 10−4 test case. The labels indicate the different methods symbols) and the 3D solver. The x axis shows the dis- used. The 3D test runs use 653 spatial and 162 angular tances from the center of the sphere, the y axis the log of points. the mean intensity. The top panel shows the results for a (spatial;solidangle)gridwith(1933;642)points,themid- Fig.8. Convergence properties of the codes for the ǫ = dle panel for (1293;642) points and the bottom panel for 10−4 testcaseanddifferent gridsizes.The labels indicate (1293;162) points. thedifferentgridsizesused,allbuttheΛiterationusethe nearest neighbor operator with Ng acceleration. Hauschildt,P.H., St¨orzer,H.,& Baron,E. 1994,JQSRT, 51, 875 Fig.9. Convergence properties of the ǫ=10−4 test case Hummel, W. 1994a, Astrophys. Space. Sci., 216, 87 ∗ forvariousΛ operatorbandwidthchoiceswithandwith- Hummel, W. 1994b, A&A, 289, 458 out Ng acceleration. Hummel, W. & Dachs, J. 1992, A&A, 262, L17 Mihalas, D. 1978,Stellar Atmospheres (New York:W. H. Freeman) Fig.10. Comparisonofthe results obtainedfor the scat- Mihalas, D. 1980, ApJ, 237, 574 tering dominated (ǫ = 10−8) test with the 1D solver (+ Mihalas, D., Kunasz, P., & Hummer, D. 1975, ApJ, 202, symbols) and the 3D solver. The x axis shows the dis- 465 tances from the center of the sphere, the y axis the log Ng, K. C. 1974, J. Chem. Phys., 61, 2680 of the mean intensity. The graph shows the results for a Norman,M. L. 2000,in Revista Mexicana de Astronomia (spatial;solidangle)gridwith(1293;642)points.Notethe y Astrofisica Conference Series, Vol. 9, 66–71 large dynamic range (12 dex) of the mean intensities. Olson,G.L., Auer, L.H., & Buchler,J. R.1987,JQSRT, 38, 431 Fig.11. Comparisonofthe results obtainedfor the scat- Olson, G. L. & Kunasz, P. B. 1987, JQSRT, 38, 325 tering dominated (ǫ = 10−8) test with the 1D solver (+ Papkalla, R. 1995,A&A, 295, 551 symbols)andthe 3Dsolverforslicesalongthe x,y,andz Rijkhorst, E.-J., Plewa, T., Dubey, A., & Mellema, G. axes.Theplotshowstheresultsfora(spatial;solidangle) 2005, A&A, in press, astro-ph/0505213 gridwith(1293;642)points.Notethelargedynamicrange Steiner, O. 1991, A&A, 242, 290 (12 dex) of the mean intensities. TrujilloBueno,J.&FabianiBendicho,P.1995,ApJ,455, 646 Fig.12. Convergence properties of the codes for the Turek, S. 1993, preprint ǫ=10−8 testcase.Thelabelsindicate thedifferentmeth- van Noort, M., Hubeny, I., & Lanz, T. 2002, ApJ, 568, ods used. These test runs use 653 spatial and 162 angular 1066 points. Vath, H. M. 1994, A&A, 284, 319 Xiaoye, S. L. & Demmel, J. W. 2003, ACM Transaction on Mathematical Software, 29, 110 Fig.13. Convergence properties of the codes for the ǫ= Zurmu¨hl, R. & Falk, S. 1986, Matrizen und ihre Anwen- 10−8 test case and different angular grid sizes. The labels dungen, 5th edn., Vol. 2 (Berlin: Springer-Verlag) indicatethedifferentgridsizesused,allbuttheΛiteration use the nearest neighbor operator with Ng acceleration. 10 Please give a shorter version with: \authorrunning and/or \titilerunning prior to \maketitle Fig.4. Mean intensity contour plots for the test case with ǫ=10−4 with 1293 spatialpoints and 162 (left panel) and 642 solid angle points. The axes are labeled by voxel index with (x,y,z)=(0,0,0) being the center of the voxel grid. Each plot shows one outside face of the voxel cube (the physical scales are the same in all directions). Fig.14. VisualizationoftheresultsfortheLTEcase.The voxelgridhas 653 elements.The intensity imageis shown for(θ,φ)=(0,0)(upperleftpanel),(45,45)(bottomright panel), (140,250)(upper right panel), and (89,139)(bot- tom right panel) degrees. The intensities are mapped lin- early to 255 shades of gray. The direction of the axes are givenwithaxeslengthscorrespondingtoatotalof50vox- els. The borders of the front faces of the voxel cube are shown, their widths corresponds to the apparent width of a voxel. Note that the different panels are at different scales. Fig.15. Visualization of the results for the ǫ = 10−4 case with a 1293 elements voxelgrid.The intensity image is shown for (θ,φ) = (0,0) (upper left panel), (45,45) (bottom right panel), (140,250) (upper right panel), and (89,139)(bottomrightpanel)degrees.The intensitiesare mappedlinearlyto255shadesofgray.Thedirectionofthe axes are given with axes lengths corresponding to a total of 100 voxels. The borders of the front faces of the voxel cube areshown,their widths correspondsto the apparent width of a voxel. Note that the different panels are at different scales. Fig.16. Visualizationoftheresultsfortheǫ=10−4 case with a 1933 voxel grid. The intensity image is shown for (θ,φ) = (0,0) (upper left panel), (45,45) (bottom right panel), (140,250)(upper right panel), and (89,139)(bot- tom right panel) degrees. The intensities are mapped lin- early to 255 shades of gray. The direction of the axes are given with axes lengths corresponding to a total of 100 voxels. The borders of the front faces of the voxel cube areshown,theirwidthscorrespondstotheapparentwidth of a voxel. Note that the different panels are at different scales. Fig.17. Visualization of the results for the ǫ = 10−8 case with a 1293 elements voxelgrid.The intensity image is shown for (θ,φ) = (0,0) (upper left panel), (45,45) (bottom right panel), (140,250) (upper right panel), and (89,139)(bottomrightpanel)degrees.The intensitiesare mappedlinearlyto255shadesofgray.Thedirectionofthe axes are given with axes lengths corresponding to a total of 100 voxels. The borders of the front faces of the voxel cube areshown,their widths correspondsto the apparent width of a voxel. Note that the different panels are at different scales.