ebook img

Bayesian power spectrum estimation at the Epoch of Reionization PDF

1.8 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Bayesian power spectrum estimation at the Epoch of Reionization

Mon.Not.R.Astron.Soc.000,000–000(0000) Printed13January2017 (MNLATEXstylefilev2.2) Bayesian power spectrum estimation at the Epoch of Reionization L. Lentati1(cid:63), P. H. Sims1,2, C. Carilli3,1, M. P. Hobson1, P. Alexander1, P. Sutter4,5,6 7 1 1Astrophysics Group, Cavendish Laboratory, JJ Thomson Avenue, Cambridge, CB3 0HE, UK 0 2Department of Physics, Brown University, Providence, RI 02912, USA 2 3National Radio Astronomy Observatory, Socorro, NM 87801, USA n 4INFN - National Institute for Nuclear Physics, via Valerio 2, I-34127 Trieste, Italy a 5INAF - Osservatorio Astronomico di Trieste, via Tiepolo 11, 1-34143 Trieste, Italy J 6Center for Cosmology and Astro-Particle Physics, Ohio State University, Columbus, OH 43210 2 1 13January2017 ] M ABSTRACT I . We introduce a new method for performing robust Bayesian estimation of the h three-dimensional spatial power spectrum at the Epoch of Reionization (EoR), from p interferometricobservations.Theversatilityofthistechniqueallowsustopresenttwo - o approaches. First, when the observations span only a small number of independent r spatial frequencies (k-modes) we sample directly from the spherical power spectrum t s coefficients that describe the EoR signal realisation. Secondly, when the number a of k-modes to be included in the model becomes large, we sample from the joint [ probability density of the spherical power spectrum and the signal coefficients, using 1 Hamiltonian Monte Carlo methods to explore this high dimensional (∼ 20000) space v efficiently. This approach has been successfully applied to simulated observations 4 that include astrophysically realistic foregrounds in a companion publication. Here 8 we focus on explaining the methodology in detail, and use simple foreground models 3 to both demonstrate its efficacy, and highlight salient features. In particular, we show 3 that including an arbitrary flat spectrum continuum foreground that is 108 times 0 greater in power than the EoR signal has no detectable impact on our parameter . 1 estimates of the EoR power spectrum recovered from the data. 0 7 1 : Key words: methods: data analysis, cosmology: dark ages, reionization, first stars, v techniques: interferometric i X r a 1 INTRODUCTION a function of redshift are questions that currently remain largely unanswered. The Epoch of Reionization (EoR) marks a period of his- tory that began approximately 400 Myr after the big bang, While recent observations have been able to constrain when the first ionizing sources formed in an otherwise neu- the bright end of the galaxy luminosity function at low tral Universe. For a detailed review of the EoR refer to, redshifts (z (cid:46) 8) (Bouwens et al. 2010; Schenker et al. for example, Pritchard & Loeb (2012); Loeb & Furlanetto 2013),andotherobservationalprogrammeshaveplacedcon- (2013); Morales & Wyithe (2010). In brief, the emergence straintsonreionization,forexample,fromtheopticaldepth ofthesesourcesresultedinthegradualformationofionized of Thompson scatering to the CMB (Planck Collaboration ‘bubbles’ in the neutral Hydrogen that made up the sur- et al. 2014), it is thought that the key to answering these rounding intergalactic medium. These bubbles are thought questions more completely lies in the detection of the red- to have expanded over a redshift range from z ∼ 16 → 6, shifted 21–cm signal from the EoR epoch, which provides however the precise timing and duration of the period, as a direct link to the density and distribution of the neutral well as the spatial scales on which these bubbles existed as Hydrogen during that time. This motivation has meant that the detection of the 21–cm signal is one of the major goals for existing, and up- (cid:63) E-mail:[email protected] coming low frequency interferometers (e.g the Giant Me- (cid:13)c 0000RAS 2 L. Lentati et al. trewave Radio Telescope (GMRT) (Paciga et al. 2013) , mates of the three-dimensional power spectrum of the data the LOw Frequency ARray (LOFAR) (van Haarlem et al. sets. 2013),theMurchisonWidefieldArray(MWA)(Tingayetal. In Sections 2 - 4 we derive the likelihood functions for 2013),thePrecisionArraytoProbetheEpochofReioniza- boththesecases.InSection5wedescribetheguidedHamil- tion(PAPER)(Parsonsetal.2014),theHydrogenEpochof toniansamplerandhowitcanbeappliedtointerferometric ReionizationArray1andtheSquareKilometreArray(SKA) dataanalysis.InSection6wethenapplythisframeworktoa (Mellema et al. 2013)) . setofsimulationstodemonstratetheefficacyofthemethod. In this paper we will, in particular, be concerned with These include both high and low signal-to-noise data sets, extracting information from the ‘late stage’ (z ∼ 6 → 10) with and without an additional flat spectrum continuum 21–cm EoR power spectrum obtained using interferometric component. In the latter case we show that this extra con- observations, spatially averaged in either 2 or 3 dimensions tinuum component does not affect the power spectrum es- toform‘cylindrical’or‘spherical’powerspectrarespectively. timation of the EoR signal present in the data set despite In recent years, Bayesian methods have become more assuming no prior knowledge of the distribution or ampli- prevelant in the analysis of interferometric data sets , both tudesofsources.FinallyinSection7weoffersomeconclud- in terms of providing optimal imaging techniques (Sutter ing remarks. et al. 2014), and power spectrum analysis in the context of the cosmic microwave background (e.g. Sutter, Wandelt, & Malu (2012); Karakci et al. (2013)). 2 OBSERVING WITH AN InthecaseoftheEoR,inordertodeterminethepower INTERFEROMETER spectrumofthefluctuationsrobustly,onemustalsoaccount for the presence of Galactic and extragalactic foreground For a generic radio interferometer, the Measurement Equa- emission (see e.g. Shaver et al. (1999)) which can be orders tion (Hamaker, Bregman, & Sault 1996; Smirnov 2011) for ofmagnitudegreaterthantheEoR signal ofinterest.Typi- apairofantennasp,qobservingasinglepointsourceallows callytheseforegroundsaredealtwithbeforetheestimation us to construct a ‘visibility matrix’, V , as: pq ofthesignal(e.g.Harkeretal.2009;Bonaldi&Brown2014 ),butideallyonewouldfitsimultaneouslyfortheEoRsignal and the foregrounds in order to account for the covariance Vpq =JpBJHq, (1) between the two correctly, and hence produce an unbiased where H denotes the Hermitian transpose, B is the ‘bright- estimate of the significance of the EoR power spectrum re- ness matrix’, given by covered from a given data set. In this paper we outline a general Bayesian framework (cid:20) (cid:21) thatallowsjustsuchajointanalysis,whichwewillconsider I+Q U +iV B= , (2) in two different regimes. First, when the number of spatial U −iV I−Q scales we wish to include in our model for the EoR power for Stokes parameters (I,Q,U,V), and J are 2×2 Jones spectrum is small ((cid:46) 20000) we sample directly from the p,q matrices that describe the cumulative product of all pro- sphericalpowerspectrumcoefficientsoftheEoRsignal.This pogation effects along the signal path. results ina computationalproblem that islow–dimensional In this work we will be considering only observations (∼10),buthighincomputationalexpense,withlargedense thatareuncorruptedby(orcorrectedfor),forexample,cal- matrix inversions required in every likelihood calculation. ibration errors, or ionospheric effects. Thus the only contri- Assuch,whenthenumberofspatialscalestobeincludedis butionstotheJonesmatricesthatwewillconsiderarethose larger,andthematrixinversionsrequiredbytheanalysisbe- thatcomefromascalarphasedelayK foreachantennap, come computationally intractable, we sample instead from p defined: the joint probability density of the spherical power spec- trumcoefficientsandtheEoRsignalrealisation.Thisallows ustoeliminateallmatrix-matrixmultiplicationsandcostly K =exp(−2πi(u l+v m+w (n−1)), (3) p p p p matrix inversions from the likelihood calculation entirely, √ replacing them with matrix-vector operations and diagonal with l,m, and n = 1−l2−m2 the direction cosines of matrix inversions. In this case the dimensionality is much the vector from the point source to the antenna, which we larger(∼20000),andsoweperformthesamplingprocessus- denote Ω, and (u,v,w) the antenna coordinates in wave- ingaGuidedHamiltonianSampler(GHS)(Balan,Ashdown lengths,andatermthatdescribesthepowerprimarybeam & Hobson, in prep, henceforth B17, see e.g. Lentati et al. of the antenna, which we will denote Pp. (2013) for uses in other astrophysical fields) which exploits Integratingoverthewholesky,wecanthereforerewrite Hamiltonian Monte Carlo sampling methods to provide an Eq. 1 explicitly including only the discussed terms as: efficient means of sampling in large numbers of dimensions (potentially ∼ 106). This framework has been applied to (cid:90) 4π simulated interferometric observations that combined real- Vpq = dΩPp(Ω)B(Ω)PHq(Ω) istic astrophysical foregrounds, and the EoR signal in Sims 0 × exp(−2πi(u ·x)), (4) et al. (2016), where it was shown to result in unbiased esti- pq √ with x = (l,m, 1−l2−m2), and u = (u −u ,v − pq p q p v ,w −w ). q p q In practice this integral is difficult to evaluate directly, 1 http://reionization.org andsoweperformasineprojectionontotheplane(l,m)at (cid:13)c 0000RAS,MNRAS000,000–000 Bayesian power spectrum estimation 3 the field centre. If the field of interest is sufficiently small, allowing us to construct a general likelihood for a model we can also make the approximation that: vector m constructed from the set of parameters Θ as: (cid:16)(cid:112)1−l2−m2−1(cid:17)w≈−1(l2+m2)w≈0 (5) Pr(d|Θ) = 1 2 (cid:112) (2π)2Nvis)detN andsoconsideronlyx=(l,m),u=(u,v).Inthecontextof (cid:20) (cid:21) 1 thesimulationspresentedinSection6,weconsideraprimary × exp − (d−m(Θ))T N−1(d−m(Θ))(1.1) 2 beam with a full width at half maximum of 8 degrees. For an8degreeseperationwewouldhavel2+m2 ∼0.02,which Henceforthforclarityinthemathematicalnotation,wewill we consider to be in the regime where this approximation consider our data and hence model vectors to be the con- holds, resulting in the expression: catenationoftherealpartandimaginaryparts,ratherthan a set of complex values. As such d, and m will be vectors (cid:90) of length 2×N , while the diagonal elements of N will V = d2xP (x)B(x)PH(x) vis pq p q be given by the rms of the Gaussian noise in the real and imaginary parts of the observed visibilities separately. × exp(−2πi(u ·x)). (6) pq Finally if we consider only the total intensity of the skyI(x),weobtainforanypairofantennas(or‘baseline’), 2.2 A model grid operating at a single frequency ν the expression: InthecontextofEq.8,ourmodelm(Θ)willbearepresen- (cid:90) tion of the complex visibility plane S. In principle we can V (u ,ν )= d2xP (x,ν )I(x,ν )exp(2πiu ·x), (7) simplydividethecomplexplaneintoequalareacellsofside i i i i i i i ∆u, however clearly our ability to determine the properties wherewehavedroppedthesubscriptpqforthecoordinate ofthepowerspectrumcorrectlywilldependonthenumber vector ui, and have replaced the visibility matrix with the ofcellschosentomakeupourmodel.Sinceboththenumber complex number Vi(ui,νi), and with Pi(x,νi) the primary of samples required, and the speed of the likelihood evalu- beamprofileforbaselinei,whichwehavewrittenexplicitly ation will be strongly dependent upon the number of cells as a function of the observing frequency νi. used, a compromise must be made between how accurately By the convolution theorm, which states that the our model can represent the true complex plane, and our Fourier transform of a product of functions is the convo- ability to perform the computational analysis. lution of the Fourier transforms of the functions seperately, In practice a natural maximum size for the model cells we can define the apeture function A(u,ν) as the Fourier exist, as an antenna of diameter D will convolve the com- transform of the primary beam P(x,ν), and the complex plex visibility plane with a function that has scale length visibility plane S(u,ν) as the Fourier transform of the sky ∼ D/2λ. As such, from sampling theory we require ∆u < brightness I(x,ν), and so rewrite Eq. 7: D/2λinorderforourmodeltoadequatelydescribetheun- derlying visibility plane. (cid:90) As we are working with a model grid, initially it may V (u ,ν )= d2uA(u−u ,ν )S(u,ν ). (8) i i i i i i seemamorenaturalchoicetodefineourmodelintheimage domain. Here we can construct a uniformly spaced N × Inthefollowingsectionswenowdescribeourmodelfor pix N ×N cube,whereN isthenumberofpixelsalong S(u,ν ) that allows us to reconstruct the observed visibili- pix chan pix i onesideoftheimage,andN isthenumberofchannels. tiesV (u ,ν ),whileremainingcomputationallytractableto chan i i i Definingthevectorofmodelamplitudesforthepixelsinthe evaluate. image as c, we can then generate a set of model visibilities m as: 2.1 Constructing a likelihood We begin by considering the complex visibilities obtained m=F−1Pc, (12) n during an interferometric observation to be the sum of wherePisadiagonalN ×N ×N matrixthatencodes a signal component s sampled from the visibility plane, pix pix chan theprimarybeamcorrection,andF−1isa2N ×N2 N and an instrumental noise component n, where we describe n vis pix chan matrix,wherethefactor2,aspreviouslydiscussed,accounts the noise as a zero–mean statistically homogeneous Gaus- for the real and imaginary parts, and describes the inverse sianrandomfield,uncorrelatedbetweendifferentvisibilities, Fourier transform from our primary beam corrected image, with covariance matrix N given by: to the sampled (u,v) coordinates. AswewillshowinSection3,however,whenweinclude Nij =(cid:10)nin∗j(cid:11)=δijσj (9) a prior on the EoR signal, that prior is defined in k-space, suchthatcontributionstothesignalfromtermswithsimilar where (cid:104)..(cid:105) represents the expectation value, and σ is the j |k|valueswillbeconsideredtocomefromasingleGaussian rms value of the noise term for visibility j. distribution of some variance to be determined during the We can therefore write our data vector d containing analysis. When defining the prior in this way, construct- N complex visibilities as: vis ing our model in the image domain results in large dense matrix inversions, rapidly making the analysis intractable. d=s+n, (10) If we construct our model in the UV domain instead, this (cid:13)c 0000RAS,MNRAS000,000–000 4 L. Lentati et al. priormatrixbecomesdiagonal,andtheinversionstrivialto describing power at large spatial scales, to the uniformly compute. gridded image, which remains as in Eq. 13. Inprinciplewecouldreplicatetheeffectoftheprimary beam in the UV domain by performing a convolution of ourmodelUVgridwiththetelescopeapeturefunction,the 2.4 Incomplete UV coverage Fourier transform of the primary beam. This, however, is In any interferometric observation the coverage of the UV- muchlesscomputationallytractablethanperformingasim- plane will not be complete. In particular, an interferometer plemultiplicationintheimagedomain,andthenperforming isnotsensitivetothe(0,0)UVcoordinate,astheminimum a Fourier transform of this primary beam corrected image separation between two antennas cannot be less than the to the UV domain. We can however combine the speed of size of the dish, D, so that any observation made will be defining our model in the UV domain with the efficiency of insensitivetothetruemeanofthesky.Moregenerallyhow- performing the primary beam correction in the image do- everthesamplingoftheUVplanebyourinterferometerwill main by defining a new matrix P, which acts on a vector result in gaps, or areas of decreased sensitivity, the precise ofmodelparametersa,whichnowdescribestheamplitudes nature of which will be determined by the arrangement of for the real and imaginary parts of the uv–plane such that antennas and length of observation. our model visibilities m are given by: WecancomputetheweightingintheUV–planethatre- sults from the sampling of a set of N discrete visibilities, vis m = F¯a (13) which can also be considered the Fourier transform of the interferometerpointspreadfunction.Forthecurrentanaly- = F−1PFa n sis we compute these weights by defining a gridding matrix G as: wherethematrixFissimplytheFouriertransformfromour gridded(u,v)domainpoints,tothegriddedimage.Thisad- ditionalmultiplicationhasnoimpactontheevaluationtime G=F−1F , (14) n of our likelihood as we can simply precompute the matrix product F−n1PF and still evaluate the model vector m in a where Fn is a Np2ix×2Nvis matrix representing the direct single matrix–vector multiplication. FouriertransformofthevisibilitiestoanNpix×Npix image domain grid, and F−1 describes the Fourier transform from the gridded image to our gridded uv cells. In principle UV cells far from the points sampled by 2.3 Including large spatial scales the interferometer could have non–zero weights, but would contribute a negligable amount to our model. We therefore While our definition of the matrix F in Eq. 13 represents compute the weight of any UV cell, W , as: a standard 2–dimensional Fourier transform, this will not i correctlymodelpoweronspatialscalesgreaterthanthesize oftheimage,orequivlantly,onscalessuchthat|u|<D/2λ. W =GN−1GT, (15) i i i Linear trends that extend across the image, such as those andsowecanconsideronlytheN cellsinourmodelthat that can be expected from Galactic foregrounds, will ‘leak’ uv contain some fraction of the total weight. For the purposes intothemodelcoefficientsthatdescribepoweronscalesless of this work we take that fraction to be 99%, and include than the image size, with |u|>D/2λ. only this subset of cells in the definition of our matrix F. In principle we could incorporate these scales into our WorkingdirectlyintheUVdomainitisthereforetrivial modelbysimplyreducingthecellsizeinourUVmodel,∆u. toaccountfortheincompleteUVcoverageofagivenobser- Forexample,poweronscales10timesthesizeoftheimage vation.Forcompletenesswewilldiscussbrieflyoneapproach couldbeincoporatedrobustlysimplybychoosingacellsize to account for this if the model is defined in the image do- of0.1×∆u.Thishoweverisnotcomputationallytractable, main, though we will not pursue this approach further as as it will increase the dimensionality of our problem by a discussed in Section 2.2. factorof100.Inthecontextofone–dimensionalpowerspec- Given a set of N UV cells to include in the model, trumrecoveryfromatimeseriesoflengthT inotherfields, uv wethereforehaveasetofN ≡N2 −N cellsthatour van Haasteren et al. (2014) show that using a log spacing Nuv pix uv observation is not sensitive to. By constructing the N2 × for sub-harmonic frequencies (ν <1/T), and linear spacing pix forν (cid:62)1/T instepsof∆ν =1/T,allowsforaccuraterecov- NNuv matrix FNG that describes the Fourier transform of those cells to the image grid we can produce a set of basis ery of the power spectrum when there is significant power vectorsthatdescribethenull-sky.IfwetaketheSVDofthe in these low frequency terms. matrix, which results in three matrices: Wethereforetakeanequivalentapproachwithourtwo– dimensionalanalysis.Wedefineasetof10evenlylogspaced spatial scales between the size of the image, and 10 times F =USVT, (16) NG the size of the image, and include them in our model, si- multaneously with the linear UV domain grid with cell size wecantaketheresultantNp2ix×Np2ix matrixUandseparate ∆u. it into two components: The matrix F therefore no longer represents the trans- form from a uniformly gridded UV domain model, to the (cid:0) (cid:1) U= M,M . (17) C uniformly gridded image. Instead it defines the transform from our complete gridded UV model, including the points Here M is a N2 ×N matrix, representing the set of pix Nuv (cid:13)c 0000RAS,MNRAS000,000–000 Bayesian power spectrum estimation 5 N orthonormalbasisvectorsthatfullydescribetheorig- a log spacing for the sub-harmonics allows for accurate re- Nuv inal matrix F , while M is the N2 ×N complement. covery of the spectrum when there is significant power in NG C pix uv Thiscomplementcanbeconsideredaprojectionmatrixthat these low frequency terms. In van Haasteren et al. (2014) allows us to take a length N vector of parameters, and theynote,however,thatexceptforthemostextremecases, uv produce an N ×N image that contains no contribu- these sub-harmonics terms can also be well modelled by a pix pix tion from the UV cells that formed our null-sky basis. This simple quadratic in frequency. is completely analogous to the operation of projecting out In our model we therefore include a quadratic in fre- unwanted model parameters in other fields, such as pulsar quency to act as a proxy to the sub–harmoinic structure in timing (see e.g., van Haasteren & Levin 2013). our data: 2ν2κ 2.5 The full K-cube Qz = 10−o23Bc2(q0+νq1+ν2q2)≡Qzq, (21) AsmentionedinSection2.2wedefineourEoRsignalmodel whereq=(q ,q ,q )areamplitudeparameterstobefitfor. 0 1 2 directlyin(kx,ky,kz)space,usingthesetof(kx,ky)points We can therefore write our final model given the con- that correspond to the set of Nuv gridded UV coordinates catenatedNuv×nmaxη lengthvectorofofsignalcoefficients that we include in our analysis as discussed in section 2. a,andN ×3quadraticcoefficientsqdefinedinthek-cube uv These(u,v)coordinatescanbetranslateddirectlytokx,ky as: coordinates through the relations: m=F¯(F a+Q q), (22) 2πu z z k = (18) x Dm whereFzandQzbothnowrepresentblockdiagonalmatrices 2πv thatactindependentlyoneachsetofcoefficients(a,q)for k = i i y Dm eachmodelUVcelli,andF¯ isthetwo–dimensionalprimary beam corrected transform described in Eq.13, so that our likelihood at this stage becomes with D the transverse comoving distance from the obser- m vatory to the redshift of the EoR observation. Our sampled visibilities, however, are defined in (u,ν) 1 Pr(d|a,q) = space and so we must also transform our model cube from (cid:112)(2π)2Nvis)detN kmzattroixoFbszearvsing frequency ν. We therefore first define the × exp(cid:20)−12(cid:0)d−F¯(Fza+Qzq)(cid:1)T √ × N−1(cid:0)d−F¯(F a+Q q)(cid:1)(cid:3). (23) 2ν2κ 2 (cid:18)2π (cid:19) z z F (ν,n )= B sin n ν , (19) z η 10−23c2N B η chan 2.6 Including foreground models with an equivalent cosine term, B the bandwidth of the observation, and η = nη/B the Fourier domain parame- In order to make a detection of the EoR power spectrum, ter after transforming along the frequency axis, where we correctly accounting for foreground signals in the visibil- include terms up to some maximum nη = nmaxη. The fac- ities will be key. These include diffuse emission from the tor 2ν2κB/c2 from the Rayleigh-Jeans law, with κB the Galaxy, and continuum emission from extragalactic sources Boltzmann constant, and c the speed of light, at the front (e.g.Jeli´cetal.2008),whichincombinationcanbeuptofive of this expression allows us to convert from units of mK ordersofmagnitudegreaterthantheEoRsignalofinterest in the model (kx,ky,kz) cube, to Janskys (1 Jy = 10−26 (Shaver et al. 1999). Wm−2Hz−1). We can then relate η to the cosmological pa- In principle, any additional foreground model can be rameter kz via the relation: added to the model in Eq 23, either in the image domain, orintheUV.Writingtheforegroundmodelasafunctionof k = 2πH0f21E(z)η, (20) some parameters m(Θfg), we would write: z c(1+z)2 1 with z the redshift of the EoR observation, H0 the Hubble Pr(d|a,Θfg) = (cid:112) (24) constant,E(z)thedimensionlessHubbleparameter,f the (2π)2Nv)detN 21 (cid:20) frequencyofthe21cmlineemission,andcthespeedoflight. × exp −1(cid:0)d−F¯(F a+Q q)−m(Θ )(cid:1)T WhileFz representsatypical1DFouriertransform,the 2 z z fg EoR signal present in the data will include fluctuations on × N−1(cid:0)d−F¯(F a+Q q)−m(Θ )(cid:1)(cid:3),(25) scales much longer than the bandwidth of the observation. z z fg Written as in Eq. 19 this transform will not correctly ac- and then proceed to sample over the joint parameter space count for these low frequencies, causing them to ‘leak’ into (a,q,Θ ). fg the higher frequency terms included in our model biasing One approach advocated to model smooth foreground thepowerspectrumparameterestimatesatthescalesofin- emission is to use a simple polynomial in frequency (e.g. terest. In principle these low frequencies could be included Bowman, Morales, & Hewitt 2009). In Section 2.5 we note simply by adding additional log–spaced Fourier modes to that we include a quadratic in our Fourier transform from our matrix 19 with n < 1, as in Section 2.3 that using frequency to the parameter η in order to model any low η (cid:13)c 0000RAS,MNRAS000,000–000 6 L. Lentati et al. frequencyvariationsthatexistinthedatathathaveperiods be a set of independent parameters ϕ , one for each |k| bin i longer than the bandwidth of the observation. We reiterate i. here, however, that the primary purpose of the quadratic Wethenwritethejointprobabilitydensityofthemodel is simply to provide us with an unbiased estimate of the coefficientsthatdefineourpowerspectrum,andthek-space scales of interest (i.e. with n (cid:62) 1 in Eq. 19). In principle signal coefficients Pr(ϕ,a|d) as: η higher order terms could also be added, however these will be increasingly covariant with the Fourier modes included Pr(ϕ,a,q|d) ∝ Pr(d|q,a)Pr(a|ϕ)Pr(ϕ)Pr(q) (27) in the model, and so we do not take this approach. Insection3wedescribeourapproachtoestimatingthe and then marginalise over all a and q in order to find the EoRpowerspectrum,incorporatingaprioronthesignalco- posteriorfortheparametersthatdefinethepowerspectrum efficients a that incorporates the assumption that the EoR ϕ alone. signalisspatiallyhomogenous.Wedonot,however,incorpo- ForourchoiceofPr(ϕ)weuseeitherauniformpriorin ratesuchaprioronthequadratictermsinourestimationof the amplitude of the coefficient, or a uniform prior in log 10 thepowerspectrum,asthesetermswilllikelybedominated space. The latter case is the least informative prior we can byforegroundemissionandsowillnothavethesamehomo- choose,howeverwhenthegoalistosetanupperlimitaprior geneity,atleastinthecaseoftheGalacticforeground.While thatisuniforminlogspaceisnotappropriate,astheupper inprincipleaseparateGaussianpriorcouldbeincludedfor limitisdependentupontheboundsoftheprior.Whenthis these quadratic terms, in order to make our analysis of the is the case we use a prior that is uniform in the amplitude. EoRsignalmoreconservativeweusealessinformativeuni- In either case we draw our samples from the parameter ρ , i form prior on the amplitudes of these coefficients. such that: In our simulations in Section 6 we will be considering only simple continuum models, with flat spectrum sources, 2π2N2 N Ω4 howeveramoredetailedaccountontheeffectofforegrounds, ϕ = pix chan pix10ρi, (28) i |k|3V when including realistic frequency evolution and spatial i structure,usingthetechniquedescribedinthisworkisgiven with V the surveyed volume in Mpc3, and Ω the image pix in Sims et al. (2016). pixel size in radians, and the spherical power spectrum co- efficients 10ρi defined in units of mK2 (h−1 Mpc)3. Given these two choices of prior, and assuming a uni- form prior on the quadraditc amplitude parameters q such 3 ESTIMATING THE POWER SPECTRUM that Pr(q) = 1, the conditional distribution Pr(d|q,a) re- Assuming the EoR signal to be spatially homogenous the mains as in Eq. 23, while the latter part of Eqn 27 is given covariance matrix Φ of the k-space coefficients a will be by: diagonal, with components 1 (cid:104) (cid:105) Pr(a|ρ)Pr(ρ) ∝ √ exp −a∗TΦ−1a . (29) Φ =(cid:104)a a (cid:105)=ϕ δ , (26) detϕ ij i j i ij when assuming a log-uniform prior on the amplitude of the where there is no sum over i, and the set of coefficients ϕ i power spectrum coefficients (Pr(ρ)=1), and when using a representthetheoreticalpowerspectrumfortheEoRsignal. prior that is uniform in the amplitude: In the framework we will describe below we are free to choose any functional form for the coefficients ϕ . It is i htheereptohweenrtshpaetc,trsuhmouladtotnheewpiosihnttooffistamapsplinecgi,fictompoedrefolrtmo Pr(a|ρ)Pr(ρ) ∝ √ 1 exp(cid:104)−a∗TΦ−1a(cid:105)(cid:89)Ns 10ρs, (30) detϕ modelselectionforexample,thesetofcoefficientsϕ should s=1 i be given by some function f(Θ), where we sample from the with N the number of spherical power spectrum bins used s parameters Θ from which the power spectrum coefficients in the prior. ϕ can then be derived. i In Section 6 we will be comparing the results of our 3.1 A non-Gaussian prior methodwithaninputsimulationobtainedusingtheseminu- merical21cmFASTalgorithm(Mesinger,Furlanetto,&Cen The approach outlined in Section 3 explicitly assumes that 2011; Mesinger & Furlanetto 2007). After computing the the signal coefficients that fall into a specific |k| bin are EoRsimulation,21cmFASToutputsasphericalpowerspec- welldescribedbyaGaussianrandomprocess,howeverifde- trum of the simulated cube, performing a 3–dimensional sired we can relax this assumption and use a non-Gaussian FFT and averaging all the Fourier coefficients that fall priorinstead.WeusetheapproachdevelopedinRochaetal. within some spherical shell in k-space in order to calculate (2001), which is based on the energy eigenmode wavefunc- the power spectrum within that bin. tions of a simple harmonic oscillator, and has since been In order to draw the most direct comparison with the applied to other areas of astrophysics (Lentati, Hobson, & input simulation, we therefore calculate the quantity |k| = Alexander 2014), which we outline in brief below. (cid:112) kx2+ky2+kz2 for each k–space coefficient a in our model, For a general random variable x, we write the PDF for anddefineasetofbinsinthequantity|k|.Asin21cmFAST, fluctuations in x as: fwoerbdienfisnne =the1.e.d.gnemsaoxfwthitehsenbmianxstthoeblaergsepsatcbedinaisnc1l.u5dne∆d|ikn| Pr(x|σ,α)=exp(cid:20)− x2 (cid:21)(cid:12)(cid:12)(cid:12)(cid:88)∞ α C H (cid:18)√x (cid:19)(cid:12)(cid:12)(cid:12)2 (31) the model. Our model for the power spectrum ϕ will then 2σ2 (cid:12)(cid:12) n n n 2σ (cid:12)(cid:12) n=0 (cid:13)c 0000RAS,MNRAS000,000–000 Bayesian power spectrum estimation 7 withα freeparametersthatdescribetherelativecontribu- situation we can perform the marginalisation numerically, n tions of each term to the sum, and sampling directly from the high dimension, joint probabil- itydistributiondescribedinEq27,aprocessmadepossible 1 Cn(σ)= √ , (32) through the use of a GHS (B17) which we describe in the (2nn! 2πσ)1/2 Section 5. is a normalization factor. Equation 31 forms a complete set of PDFs, normalised such that: (cid:90) ∞ (cid:20) x2(cid:21) (cid:18) x (cid:19) (cid:18) x (cid:19) 4 THE SMALL K-CUBE REGIME: dx exp − C H √ C H √ =δ , ANALYTICAL MARGINALISATION OVER σ2 n n 2σ m m 2σ mn −∞ THE SIGNAL COEFFICIENTS (33) with δ the Kronecker delta, where the ground state, In order to perform the marginalisation over the signal co- mn H , reproduces a standard Gaussian PDF, and any non- efficientsaandq,wefirstsimplifyournotationbydefining 0 Gaussianityinthedistributionofxwillbereflectedinnon- the vector b as the concatenation of the vectors a and q, zero values for the coefficients α associated with higher and the matrix T such that our signal can be rewritten: n order states. The onlyconstraintwe must placeon the valuesof the m=F¯(F a+Q q)=Tb. (37) amplitudes α is: z z n(cid:88)max|αn|2 =1 (34) denoWtinegthTeTnNw−r1itTe+thΦel−o1gaosftΣh,ewlikheelriehotohde ienleEmqen2t7s,owfhtihche n=0 matrix Φ−1 that correspond to the coefficients q are set to with n the maximum number of coefficients to be in- zero, and denoting TTN−1d as d¯ is given by: max cluded in the model for the PDF. This is performed most 1 1 simply by setting: logL=− dTTTN−1Td− bTΣb+d¯Tb. (38) 2 2 (cid:118) α0 =(cid:117)(cid:117)(cid:116)1−n(cid:88)max|αn|2. (35) Taking the derivitiv∂eloogfLlogL with respect to a gives us: n=1 =−Σb+d¯T, (39) ∂b Usingthisformalismwecanthenparameteriseanynon- whichcanbesolvedtogiveusthemaximumlikelihoodvec- Gaussianity in the coefficients a by rewriting Eq. 29: tor of coefficients bˆ: (cid:20) 1 (cid:21) bˆ =Σ−1d¯. (40) Pr(a|ϕ,α) = exp − aTΦ−1a (36) 2 Re-expressing Eq. 38 in terms of bˆ: × (cid:89)n (cid:12)(cid:12)(cid:12)n(cid:88)maxα C (ϕ )H (cid:18)√ai (cid:19)(cid:12)(cid:12)(cid:12)2. i=1(cid:12)(cid:12)n=0 n n i n 2ϕi (cid:12)(cid:12) logL = −1dTTTN−1Td+ 1bˆTΣbˆ 2 2 Theadvantageofthismethodisthatonemayuseafi- 1 − (b−bˆ)TΣ(b−bˆ), (41) nitesetofnon-zeroαn tomodelthenon-Gaussianity,with- 2 outmathematicalinconsistency.Anytruncationoftheseries the 3rd term in this expression can then be integrated with still yields a proper distribution, in contrast to the more respect to the m elements in b to give: commonly used Edgeworth expansion (e.g. Contaldi et al. 2000). (cid:90) +∞ (cid:20) 1 (cid:21) I = dbexp − (b−bˆ)TΣ(b−bˆ) 2 −∞ 3.2 Performing the sampling = (2π)m det Σ−21. (42) How we now perform the sampling depends entirely on the Our marginalised probability distribution for a set of EoR sizeofthek-cubewewillbeusingtodescribetheEoRsignal power spectrum coefficients is then given as: present in the visibilities. When the size of the k-cube, and thus the number of signal parameters used to describe the det(Σ)−21 Pr(ϕ|d) ∝ (43) (cid:112) signal is small (< 20000), we can marginalise over the co- det(ϕ) det(N) efficients a analytically and sample directly from the power (cid:20) 1(cid:16) (cid:17)(cid:21) spectrum coefficients ρ, a process we describe in Section 4. × exp − dTN−1d−d¯TΣ−1d¯ . 2 In this scenario we can perform the sampling using Multi- Nest(Feroz&Hobson2008;Feroz,Hobson,&Bridges2009), WhentakingthisapproachinSection6weusetheMAGMA allowingustoperformrobustevidenceevaluation,andper- (MatrixAlgebraonGPUandMulticoreArchitectures)GPU form model selection on the EoR power spectrum. accelerated linear algebra package2 to perform the cholesky If however we wish to sample over a larger number of decomposition for each likelihood evaluation. signal coefficients, the matrix to be inverted when perform- ing the marginalisation analytically will become too large to make this approach computationally tractable. In this 2 http://icl.cs.utk.edu/magma/ (cid:13)c 0000RAS,MNRAS000,000–000 8 L. Lentati et al. 5 THE LARGE K-CUBE REGIME: for the maximum set of signal coefficients b analyti- max NUMERICAL MARGINALISATION OVER cally using Eq. 40 so when searching for the global maxi- THE SIGNAL COEFFICIENTS mum we need only search over the subset of parameters ρ. This is achieved by using either a particle swarm algorithm For a detailed account of both Hamiltonian Monte Carlo (Kennedy(1995,2001)andforusesincosmologicalparame- (HMC) and GHS refer to B17 or Lentati et al. (2013), here terestimationseee.g.Prasad&Souradeep(2012))orusing we will provide only a brief introduction of the key aspects agradientsearchoptimizationGilbert&Lemarchal(1989). of each. HMC sampling (Duane et al. 1987) has been widely applied in Bayesian computation (Neal, R 1993), and has been successfully applied to problems with extremely large 5.1 Low signal-to-noise parameterisation numbers of dimensions (∼106 see e.g. Taylor, Ashdown, & In Lentati et al. (2016) an alternative parameterisation of Hobson(2008)).WhereconventionalMCMCmethodsmove the likelihood described in Eq. 29 is described that is much through the parameter space by a random walk and there- more efficient in the low signal-to-noise regime, where the forerequireaprohibitivenumberofsamplestoexplore-high power spectrum coefficients are not detected. We can an- dimensional spaces, HMC exploits techniques that describe ticipate that, at least at first, this is likely to be the case themotionofparticlesinpotentialwells,andsuppressesthis with the EoR signal, and thus we summarise this new pa- random walk behaviour. This allows the HMC approach to rameterisationinthecontextofourthree-dimensionalpower maintain a reasonable efficiency even for high-dimensional spectrum analysis below. problems. Rather than sample from the parameters a, we instead Possibly the main shortcoming of traditional HMC samplefromtherelatedparametersuwherefortheithsig- methods is that it requires a large number of tuning pa- nal amplitude we will have: rameters in order to navigate the parameter space. In par- ticular, every parameter requires a step size, and the total √ numberofstepsineachiterationofthesamplermustalsobe a =u ϕ , (49) i i i chosen.Thesearetypicallydeterminedviaexpensivetuning runs.TheGHSisdesignedtobypassmuchofthistuningby whereasbeforeϕi isthethree-dimensionalpowerspectrum using the Hessian of the sampled probability distribution, coefficient that describes the standard deviation of the ith calculatedatitspeak,tosetthestepsizeandcovarianceof amplitude parameter. In order to still sample uniformly in the parameter space. The number of steps at each iteration the original parameters, a, we then include an additional isthendrawnfromauniformdistributionU(1,nmax),with term,thedeterminantoftheJacobiandescribingthetrans- nmax of ten found to be suitable for all tested problems. A formation from ai to ui. The Jacobian in this case has ele- single global scaling parameter for the step size is then the ments: only tunable parameter, chosen such that the acceptance rate for the GHS is ∼68%. √ J = ϕ δ , (50) Therefore in order to perform sampling we need the i,j i i,j, following: withδ thekroneckardelta.Thedeterminantistherefore: i,j, • The gradient of Ψ for each parameter x i • The peak of the joint distribution (cid:89)m √ • The Hessian at that peak det(J)= ϕi, (51) i=0 Thegradientsofourparametersaregivenbythefollowing: which acts to cancel exactly with the determinant of the matrix Ψ in Eq. 29. In the following work when using the ∂Ψ =−(d−Tb)TN−1T+bTΦ−1 (44) GHS we will use both parameterisations dependent upon ∂b whether we are in the high or low signal-to-noise regime. (cid:18) (cid:19) ∂Ψ 1 ∂Φ 1 ∂Φ = Tr Φ−1 − bTΦ−1 Φ−1b (45) ∂ρ 2 ∂ρ 2 ∂ρ i i i and the components of the Hessian are: 6 APPLICATION TO SIMULATIONS We now apply the methods described in the preceding sec- ∂2Ψ =TTN−1T+Φ−1 (46) tionstoaseriesoffivesimulations.Inthefirstthreeofthese ∂b2 we perform the simulation using a 37 element interferome- ter,withantennalocationsshowninFig.1(top,left),while ∂2Ψ ∂Φ ∂Φ 1 ∂2Φ forthefinaltwosimulationsweusea61elementarray(Fig. ∂ρ2 =bTΦ−1∂ρ Φ−1∂ρ Φ−1b− 2bTΦ−1 ∂ρ2 Φ−1b 1 bottom, left). In both cases we simulate 38 2KHz chan- i i i i (47) nels spanning the range 122.17∼129.90MHz and take our antennas configuration to be representative of the upcom- ing HERA 37 and 61 element arrays. ∂2Ψ ∂Φ =−Φ−1 Φ−1b (48) Wecalculatethetheoreticalinstrumentalthermalnoise ∂ρ ∂b ∂ρ i i for our simulation per unit time τ given the parameters in For a set of power spectrum coefficients ρ we can solve Table 1, as in Taylor, Carilli, & Perley (1999): (cid:13)c 0000RAS,MNRAS000,000–000 Bayesian power spectrum estimation 9 0.018 30 0.016 40 40 s) 20 0.014 h 30 30 gt 10 0.012 n (m) 20 20 elen 0 0.01 ositio 10 m) 10 wav-10 00..000068 Y p 0 V ( 0 V (-20 0.004 nna -10 -10 -30 0.002 nte-20 -20 0 A -30 -20 -10 0 10 20 30 -30 -30 U (wavelengths) -40 -40 -50 -40 -30 A-n20tenn-10a X p0osit 1i0on ( m20) 30 40 50 -40 -30 -20 -10 U ( 0m) 10 20 30 40 0.012 60 150 s) 40 0.01 h gt 20 0.008 40 100 n Antenna Y position (m)- 22 000 V (m)- 55 000 V (wavele--42 000 0000...000000246 -40 -100 -40 -20 0 20 40 U (wavelengths) -60 -150 -60 -40 A-2n0tenna X p0osition ( m20) 40 60 -150 -100 -50 U ( 0m) 50 100 150 Figure1.[Top](Left)Antennapositionsforthe37elementinterferometerusedinsimulations1-3inSection6.(Middle)Sampled(u,v) coordinatesfroma30minuteobservation,and(Right)therelativeweightsofthe(u,v)obtainedfromEq.15normalisedtohaveasum of1.[Bottom]As[Top]butforthe61elementarrayusedinsimulations4-5inSection6. overasingle30minutepointinggiventhoseconfigurations. Parameter Description Value We take the pointing centre to have right ascension equal ηs Systemefficiency 1 to 0.0, and declination equal to -30.0. This results in 21870 ∆ν Bandpassofobservation 200kHz sampled (u,v) coordinates per channel for the 37 element ηa Antennaefficiency 1 array, and 58141 per channel for the 61 element array (Fig. A Antennaeffectivearea 150m2 1, middle, top and bottom panels respectively). Tsys Systemtemperature 550K Ourinputskymodelsareconstructedusing2048×2048 pixel channels, with a resolution of ∼ 40 arcseconds per Table 1. Instrument and observation parameters used to calcu- pixel, giving a total field of view of ∼ 23×23 degrees. We latethevarianceofthetheoreticalinstrumentalthermalnoiseper thenmultiplytheseskymodelsbyaGaussianprimarybeam unittimeτ. with a full width at half max of 8 degrees at 122.17 MHz, and evaluate the direct Fourier transform of the observed sky-models onto the sampled (u,v) points obtained previ- ously. 1 2k T 1 σ(τ)=10−26 b sys√ (52) We now describe each of the five simulations in more ηs ηaA 2∆ντ detail below: Foreachsimulationweusethearrayconfigurationsfor the 37 or 61 element interferometers in Fig.1 as input to Simulation 1 the CASA3 (Common Astronomy Software Applications) A 2000 hrs simulation of a single flat spectrum point simobserve tool, to obtain the set of observed (u,v) coor- source 10.4 degrees away from the primary beam center, dinatesthatcorrespondtoaseriesof30secondintegrations resulting in a 1000σ detection using the 37 element array shown in Fig. 1 (top). We include uncorrelated thermal noise in each visibility. To simulate 4000 repetitions of our 3 http://casa.nrao.edu 30 minute observation we therefore add 0.045Jy noise to (cid:13)c 0000RAS,MNRAS000,000–000 10 L. Lentati et al. Figure2.(left)AsinglechannelfromsimulationoftheEoRsignalusingtheseminumerical21cmFASTalgorithm(Mesinger,Furlanetto, & Cen 2011; Mesinger & Furlanetto 2007) described in section 6, after subtracting the mean in the spatial variations for that channel. (right)Asinglechannelfromasimplecontinuumskysimulationdescribedinsection6,aftersubtractingthemeaninthespatialvariations for that channel. The continuum simulation is scaled such that the power is a factor 108 greater than in the EoR simulation. In both casesthecolourscaleisinmK. each of the 21870 visibilities in each channel. In order to shown in Fig. 1 (bottom). compareequivalentsimulationsweusethesamewhitenoise realization for simulations 1-3, and for simulations 4-5. Simulation 5 For ease of interpretation we do not include the quadratic AsSimulation4,butanadditionalflatspectrumcontinuum described in Eq. 21 in our model when analysing this componentisaddedtothemodelasdescribedinSimulation simulation. The purpose of this first simulation is to show 3. in a straightforward way how our approach automatically accountsforthefrequencydependenceoftheUV-sampling, In order to adequately sample the aperture function of which causes observed low frequency structure along the Gaussian primary beam in the UV plane, we define our individual baselines. UV cells to each have a width of 2.5λ. We then use Eq. 15 to determine the set of cells to include in our model. Simulation 2 The weightsfor each cellare shown inFig. 1 (right) for the A 160 hour integration including only the EoR signal and 37 and 61 element arrays (top and bottom panels respec- the uncorrelated thermal noise described in Simulation tively). Including all cells that contribute up to 99% of the 1. We generate the EoR signal using the seminumerical totalweight,wefindresultsin650,and1142UVcellsperη 21cm FAST algorithm (Mesinger, Furlanetto, & Cen 2011; modeincludedinthemodelforthe37and61elementarrays Mesinger & Furlanetto 2007) to simulate a cosmological respectively. volume of 10243 Mpc3. We use this same EoR realisation For simulations 1-3 we will use the analytic marginal- in all subsequent simulations. An example of one channel isation over the signal coefficients described in Section 4, from this EoR simulation is shown in Fig. 2 (left). samplingfromthe7dimensionalsphericalpowerspectrum usingtheMultiNestalgorithm.Forsimulations4-5thenum- Simulation 3 ber of signal coefficients included in the model is too great AsSimulation2,butanadditionalflatspectrumcontinuum forthisanalyticapproach,andsoweperformthismarginal- component is added to the model, shown in Fig. 2 (right). isation numerically using the GHS described in Section 5. Each pixel in the continuum model is assigned a random positivevaluedrawnuniformlybetweenzeroandone,which is then held constant across the 38 channels for that pixel. 6.1 Results for Simulation 1 We then scale the image so that the total power in the In Figure 3 (red lines) we show the injected real (left) and meansubtractedcontinuumis∼108 timesthatoftheEoR imaginary (right) signal for the longest (top) and shortest signal. An example of one channel from this continuum (bottom)baselinesforsimulation1,containingasinglepoint simulation is shown in Fig. 2 (right). source observed by the 37 element array shown in Fig. 1 (top).Noticeablythebaselinesshowstructureasafunction Simulation 4 of frequency despite the fact that the injected source has a As Simulation 2, however we use the 61 element array flat spectrum. This is simply a result of the baselines sam- (cid:13)c 0000RAS,MNRAS000,000–000

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.