Table Of ContentMon.Not.R.Astron.Soc.000,000–000(0000) Printed13January2017 (MNLATEXstylefilev2.2)
Bayesian power spectrum estimation at the Epoch of
Reionization
L. Lentati1(cid:63), P. H. Sims1,2, C. Carilli3,1, M. P. Hobson1, P. Alexander1, P. Sutter4,5,6
7
1 1Astrophysics Group, Cavendish Laboratory, JJ Thomson Avenue, Cambridge, CB3 0HE, UK
0 2Department of Physics, Brown University, Providence, RI 02912, USA
2 3National Radio Astronomy Observatory, Socorro, NM 87801, USA
n 4INFN - National Institute for Nuclear Physics, via Valerio 2, I-34127 Trieste, Italy
a 5INAF - Osservatorio Astronomico di Trieste, via Tiepolo 11, 1-34143 Trieste, Italy
J 6Center for Cosmology and Astro-Particle Physics, Ohio State University, Columbus, OH 43210
2
1
13January2017
]
M
ABSTRACT
I
. We introduce a new method for performing robust Bayesian estimation of the
h
three-dimensional spatial power spectrum at the Epoch of Reionization (EoR), from
p
interferometricobservations.Theversatilityofthistechniqueallowsustopresenttwo
-
o approaches. First, when the observations span only a small number of independent
r spatial frequencies (k-modes) we sample directly from the spherical power spectrum
t
s coefficients that describe the EoR signal realisation. Secondly, when the number
a
of k-modes to be included in the model becomes large, we sample from the joint
[
probability density of the spherical power spectrum and the signal coefficients, using
1 Hamiltonian Monte Carlo methods to explore this high dimensional (∼ 20000) space
v efficiently. This approach has been successfully applied to simulated observations
4 that include astrophysically realistic foregrounds in a companion publication. Here
8
we focus on explaining the methodology in detail, and use simple foreground models
3
to both demonstrate its efficacy, and highlight salient features. In particular, we show
3
that including an arbitrary flat spectrum continuum foreground that is 108 times
0
greater in power than the EoR signal has no detectable impact on our parameter
.
1 estimates of the EoR power spectrum recovered from the data.
0
7
1
: Key words: methods: data analysis, cosmology: dark ages, reionization, first stars,
v
techniques: interferometric
i
X
r
a
1 INTRODUCTION a function of redshift are questions that currently remain
largely unanswered.
The Epoch of Reionization (EoR) marks a period of his-
tory that began approximately 400 Myr after the big bang, While recent observations have been able to constrain
when the first ionizing sources formed in an otherwise neu- the bright end of the galaxy luminosity function at low
tral Universe. For a detailed review of the EoR refer to, redshifts (z (cid:46) 8) (Bouwens et al. 2010; Schenker et al.
for example, Pritchard & Loeb (2012); Loeb & Furlanetto 2013),andotherobservationalprogrammeshaveplacedcon-
(2013); Morales & Wyithe (2010). In brief, the emergence straintsonreionization,forexample,fromtheopticaldepth
ofthesesourcesresultedinthegradualformationofionized of Thompson scatering to the CMB (Planck Collaboration
‘bubbles’ in the neutral Hydrogen that made up the sur- et al. 2014), it is thought that the key to answering these
rounding intergalactic medium. These bubbles are thought questions more completely lies in the detection of the red-
to have expanded over a redshift range from z ∼ 16 → 6, shifted 21–cm signal from the EoR epoch, which provides
however the precise timing and duration of the period, as a direct link to the density and distribution of the neutral
well as the spatial scales on which these bubbles existed as Hydrogen during that time.
This motivation has meant that the detection of the
21–cm signal is one of the major goals for existing, and up-
(cid:63) E-mail:ltl21@cam.ac.uk coming low frequency interferometers (e.g the Giant Me-
(cid:13)c 0000RAS
2 L. Lentati et al.
trewave Radio Telescope (GMRT) (Paciga et al. 2013) , mates of the three-dimensional power spectrum of the data
the LOw Frequency ARray (LOFAR) (van Haarlem et al. sets.
2013),theMurchisonWidefieldArray(MWA)(Tingayetal. In Sections 2 - 4 we derive the likelihood functions for
2013),thePrecisionArraytoProbetheEpochofReioniza- boththesecases.InSection5wedescribetheguidedHamil-
tion(PAPER)(Parsonsetal.2014),theHydrogenEpochof toniansamplerandhowitcanbeappliedtointerferometric
ReionizationArray1andtheSquareKilometreArray(SKA) dataanalysis.InSection6wethenapplythisframeworktoa
(Mellema et al. 2013)) . setofsimulationstodemonstratetheefficacyofthemethod.
In this paper we will, in particular, be concerned with These include both high and low signal-to-noise data sets,
extracting information from the ‘late stage’ (z ∼ 6 → 10) with and without an additional flat spectrum continuum
21–cm EoR power spectrum obtained using interferometric component. In the latter case we show that this extra con-
observations, spatially averaged in either 2 or 3 dimensions tinuum component does not affect the power spectrum es-
toform‘cylindrical’or‘spherical’powerspectrarespectively. timation of the EoR signal present in the data set despite
In recent years, Bayesian methods have become more assuming no prior knowledge of the distribution or ampli-
prevelant in the analysis of interferometric data sets , both tudesofsources.FinallyinSection7weoffersomeconclud-
in terms of providing optimal imaging techniques (Sutter ing remarks.
et al. 2014), and power spectrum analysis in the context of
the cosmic microwave background (e.g. Sutter, Wandelt, &
Malu (2012); Karakci et al. (2013)).
2 OBSERVING WITH AN
InthecaseoftheEoR,inordertodeterminethepower
INTERFEROMETER
spectrumofthefluctuationsrobustly,onemustalsoaccount
for the presence of Galactic and extragalactic foreground For a generic radio interferometer, the Measurement Equa-
emission (see e.g. Shaver et al. (1999)) which can be orders tion (Hamaker, Bregman, & Sault 1996; Smirnov 2011) for
ofmagnitudegreaterthantheEoR signal ofinterest.Typi- apairofantennasp,qobservingasinglepointsourceallows
callytheseforegroundsaredealtwithbeforetheestimation us to construct a ‘visibility matrix’, V , as:
pq
ofthesignal(e.g.Harkeretal.2009;Bonaldi&Brown2014
),butideallyonewouldfitsimultaneouslyfortheEoRsignal
and the foregrounds in order to account for the covariance Vpq =JpBJHq, (1)
between the two correctly, and hence produce an unbiased
where H denotes the Hermitian transpose, B is the ‘bright-
estimate of the significance of the EoR power spectrum re-
ness matrix’, given by
covered from a given data set.
In this paper we outline a general Bayesian framework
(cid:20) (cid:21)
thatallowsjustsuchajointanalysis,whichwewillconsider I+Q U +iV
B= , (2)
in two different regimes. First, when the number of spatial U −iV I−Q
scales we wish to include in our model for the EoR power
for Stokes parameters (I,Q,U,V), and J are 2×2 Jones
spectrum is small ((cid:46) 20000) we sample directly from the p,q
matrices that describe the cumulative product of all pro-
sphericalpowerspectrumcoefficientsoftheEoRsignal.This
pogation effects along the signal path.
results ina computationalproblem that islow–dimensional
In this work we will be considering only observations
(∼10),buthighincomputationalexpense,withlargedense
thatareuncorruptedby(orcorrectedfor),forexample,cal-
matrix inversions required in every likelihood calculation.
ibration errors, or ionospheric effects. Thus the only contri-
Assuch,whenthenumberofspatialscalestobeincludedis
butionstotheJonesmatricesthatwewillconsiderarethose
larger,andthematrixinversionsrequiredbytheanalysisbe-
thatcomefromascalarphasedelayK foreachantennap,
come computationally intractable, we sample instead from p
defined:
the joint probability density of the spherical power spec-
trumcoefficientsandtheEoRsignalrealisation.Thisallows
ustoeliminateallmatrix-matrixmultiplicationsandcostly K =exp(−2πi(u l+v m+w (n−1)), (3)
p p p p
matrix inversions from the likelihood calculation entirely, √
replacing them with matrix-vector operations and diagonal with l,m, and n = 1−l2−m2 the direction cosines of
matrix inversions. In this case the dimensionality is much the vector from the point source to the antenna, which we
larger(∼20000),andsoweperformthesamplingprocessus- denote Ω, and (u,v,w) the antenna coordinates in wave-
ingaGuidedHamiltonianSampler(GHS)(Balan,Ashdown lengths,andatermthatdescribesthepowerprimarybeam
& Hobson, in prep, henceforth B17, see e.g. Lentati et al. of the antenna, which we will denote Pp.
(2013) for uses in other astrophysical fields) which exploits Integratingoverthewholesky,wecanthereforerewrite
Hamiltonian Monte Carlo sampling methods to provide an Eq. 1 explicitly including only the discussed terms as:
efficient means of sampling in large numbers of dimensions
(potentially ∼ 106). This framework has been applied to (cid:90) 4π
simulated interferometric observations that combined real- Vpq = dΩPp(Ω)B(Ω)PHq(Ω)
istic astrophysical foregrounds, and the EoR signal in Sims 0
× exp(−2πi(u ·x)), (4)
et al. (2016), where it was shown to result in unbiased esti- pq
√
with x = (l,m, 1−l2−m2), and u = (u −u ,v −
pq p q p
v ,w −w ).
q p q
In practice this integral is difficult to evaluate directly,
1 http://reionization.org andsoweperformasineprojectionontotheplane(l,m)at
(cid:13)c 0000RAS,MNRAS000,000–000
Bayesian power spectrum estimation 3
the field centre. If the field of interest is sufficiently small, allowing us to construct a general likelihood for a model
we can also make the approximation that: vector m constructed from the set of parameters Θ as:
(cid:16)(cid:112)1−l2−m2−1(cid:17)w≈−1(l2+m2)w≈0 (5) Pr(d|Θ) = 1
2 (cid:112)
(2π)2Nvis)detN
andsoconsideronlyx=(l,m),u=(u,v).Inthecontextof (cid:20) (cid:21)
1
thesimulationspresentedinSection6,weconsideraprimary × exp − (d−m(Θ))T N−1(d−m(Θ))(1.1)
2
beam with a full width at half maximum of 8 degrees. For
an8degreeseperationwewouldhavel2+m2 ∼0.02,which Henceforthforclarityinthemathematicalnotation,wewill
we consider to be in the regime where this approximation consider our data and hence model vectors to be the con-
holds, resulting in the expression: catenationoftherealpartandimaginaryparts,ratherthan
a set of complex values. As such d, and m will be vectors
(cid:90) of length 2×N , while the diagonal elements of N will
V = d2xP (x)B(x)PH(x) vis
pq p q be given by the rms of the Gaussian noise in the real and
imaginary parts of the observed visibilities separately.
× exp(−2πi(u ·x)). (6)
pq
Finally if we consider only the total intensity of the
skyI(x),weobtainforanypairofantennas(or‘baseline’), 2.2 A model grid
operating at a single frequency ν the expression:
InthecontextofEq.8,ourmodelm(Θ)willbearepresen-
(cid:90) tion of the complex visibility plane S. In principle we can
V (u ,ν )= d2xP (x,ν )I(x,ν )exp(2πiu ·x), (7) simplydividethecomplexplaneintoequalareacellsofside
i i i i i i i
∆u, however clearly our ability to determine the properties
wherewehavedroppedthesubscriptpqforthecoordinate ofthepowerspectrumcorrectlywilldependonthenumber
vector ui, and have replaced the visibility matrix with the ofcellschosentomakeupourmodel.Sinceboththenumber
complex number Vi(ui,νi), and with Pi(x,νi) the primary of samples required, and the speed of the likelihood evalu-
beamprofileforbaselinei,whichwehavewrittenexplicitly ation will be strongly dependent upon the number of cells
as a function of the observing frequency νi. used, a compromise must be made between how accurately
By the convolution theorm, which states that the our model can represent the true complex plane, and our
Fourier transform of a product of functions is the convo- ability to perform the computational analysis.
lution of the Fourier transforms of the functions seperately, In practice a natural maximum size for the model cells
we can define the apeture function A(u,ν) as the Fourier exist, as an antenna of diameter D will convolve the com-
transform of the primary beam P(x,ν), and the complex plex visibility plane with a function that has scale length
visibility plane S(u,ν) as the Fourier transform of the sky ∼ D/2λ. As such, from sampling theory we require ∆u <
brightness I(x,ν), and so rewrite Eq. 7: D/2λinorderforourmodeltoadequatelydescribetheun-
derlying visibility plane.
(cid:90) As we are working with a model grid, initially it may
V (u ,ν )= d2uA(u−u ,ν )S(u,ν ). (8)
i i i i i i seemamorenaturalchoicetodefineourmodelintheimage
domain. Here we can construct a uniformly spaced N ×
Inthefollowingsectionswenowdescribeourmodelfor pix
N ×N cube,whereN isthenumberofpixelsalong
S(u,ν ) that allows us to reconstruct the observed visibili- pix chan pix
i onesideoftheimage,andN isthenumberofchannels.
tiesV (u ,ν ),whileremainingcomputationallytractableto chan
i i i Definingthevectorofmodelamplitudesforthepixelsinthe
evaluate.
image as c, we can then generate a set of model visibilities
m as:
2.1 Constructing a likelihood
We begin by considering the complex visibilities obtained m=F−1Pc, (12)
n
during an interferometric observation to be the sum of
wherePisadiagonalN ×N ×N matrixthatencodes
a signal component s sampled from the visibility plane, pix pix chan
theprimarybeamcorrection,andF−1isa2N ×N2 N
and an instrumental noise component n, where we describe n vis pix chan
matrix,wherethefactor2,aspreviouslydiscussed,accounts
the noise as a zero–mean statistically homogeneous Gaus-
for the real and imaginary parts, and describes the inverse
sianrandomfield,uncorrelatedbetweendifferentvisibilities,
Fourier transform from our primary beam corrected image,
with covariance matrix N given by:
to the sampled (u,v) coordinates.
AswewillshowinSection3,however,whenweinclude
Nij =(cid:10)nin∗j(cid:11)=δijσj (9) a prior on the EoR signal, that prior is defined in k-space,
suchthatcontributionstothesignalfromtermswithsimilar
where (cid:104)..(cid:105) represents the expectation value, and σ is the
j |k|valueswillbeconsideredtocomefromasingleGaussian
rms value of the noise term for visibility j.
distribution of some variance to be determined during the
We can therefore write our data vector d containing
analysis. When defining the prior in this way, construct-
N complex visibilities as:
vis ing our model in the image domain results in large dense
matrix inversions, rapidly making the analysis intractable.
d=s+n, (10) If we construct our model in the UV domain instead, this
(cid:13)c 0000RAS,MNRAS000,000–000
4 L. Lentati et al.
priormatrixbecomesdiagonal,andtheinversionstrivialto describing power at large spatial scales, to the uniformly
compute. gridded image, which remains as in Eq. 13.
Inprinciplewecouldreplicatetheeffectoftheprimary
beam in the UV domain by performing a convolution of
ourmodelUVgridwiththetelescopeapeturefunction,the 2.4 Incomplete UV coverage
Fourier transform of the primary beam. This, however, is
In any interferometric observation the coverage of the UV-
muchlesscomputationallytractablethanperformingasim-
plane will not be complete. In particular, an interferometer
plemultiplicationintheimagedomain,andthenperforming
isnotsensitivetothe(0,0)UVcoordinate,astheminimum
a Fourier transform of this primary beam corrected image
separation between two antennas cannot be less than the
to the UV domain. We can however combine the speed of
size of the dish, D, so that any observation made will be
defining our model in the UV domain with the efficiency of
insensitivetothetruemeanofthesky.Moregenerallyhow-
performing the primary beam correction in the image do-
everthesamplingoftheUVplanebyourinterferometerwill
main by defining a new matrix P, which acts on a vector
result in gaps, or areas of decreased sensitivity, the precise
ofmodelparametersa,whichnowdescribestheamplitudes
nature of which will be determined by the arrangement of
for the real and imaginary parts of the uv–plane such that
antennas and length of observation.
our model visibilities m are given by:
WecancomputetheweightingintheUV–planethatre-
sults from the sampling of a set of N discrete visibilities,
vis
m = F¯a (13) which can also be considered the Fourier transform of the
interferometerpointspreadfunction.Forthecurrentanaly-
= F−1PFa
n sis we compute these weights by defining a gridding matrix
G as:
wherethematrixFissimplytheFouriertransformfromour
gridded(u,v)domainpoints,tothegriddedimage.Thisad-
ditionalmultiplicationhasnoimpactontheevaluationtime G=F−1F , (14)
n
of our likelihood as we can simply precompute the matrix
product F−n1PF and still evaluate the model vector m in a where Fn is a Np2ix×2Nvis matrix representing the direct
single matrix–vector multiplication. FouriertransformofthevisibilitiestoanNpix×Npix image
domain grid, and F−1 describes the Fourier transform from
the gridded image to our gridded uv cells.
In principle UV cells far from the points sampled by
2.3 Including large spatial scales the interferometer could have non–zero weights, but would
contribute a negligable amount to our model. We therefore
While our definition of the matrix F in Eq. 13 represents
compute the weight of any UV cell, W , as:
a standard 2–dimensional Fourier transform, this will not i
correctlymodelpoweronspatialscalesgreaterthanthesize
oftheimage,orequivlantly,onscalessuchthat|u|<D/2λ. W =GN−1GT, (15)
i i i
Linear trends that extend across the image, such as those
andsowecanconsideronlytheN cellsinourmodelthat
that can be expected from Galactic foregrounds, will ‘leak’ uv
contain some fraction of the total weight. For the purposes
intothemodelcoefficientsthatdescribepoweronscalesless
of this work we take that fraction to be 99%, and include
than the image size, with |u|>D/2λ.
only this subset of cells in the definition of our matrix F.
In principle we could incorporate these scales into our
WorkingdirectlyintheUVdomainitisthereforetrivial
modelbysimplyreducingthecellsizeinourUVmodel,∆u.
toaccountfortheincompleteUVcoverageofagivenobser-
Forexample,poweronscales10timesthesizeoftheimage
vation.Forcompletenesswewilldiscussbrieflyoneapproach
couldbeincoporatedrobustlysimplybychoosingacellsize
to account for this if the model is defined in the image do-
of0.1×∆u.Thishoweverisnotcomputationallytractable,
main, though we will not pursue this approach further as
as it will increase the dimensionality of our problem by a
discussed in Section 2.2.
factorof100.Inthecontextofone–dimensionalpowerspec-
Given a set of N UV cells to include in the model,
trumrecoveryfromatimeseriesoflengthT inotherfields, uv
wethereforehaveasetofN ≡N2 −N cellsthatour
van Haasteren et al. (2014) show that using a log spacing Nuv pix uv
observation is not sensitive to. By constructing the N2 ×
for sub-harmonic frequencies (ν <1/T), and linear spacing pix
forν (cid:62)1/T instepsof∆ν =1/T,allowsforaccuraterecov- NNuv matrix FNG that describes the Fourier transform of
those cells to the image grid we can produce a set of basis
ery of the power spectrum when there is significant power
vectorsthatdescribethenull-sky.IfwetaketheSVDofthe
in these low frequency terms.
matrix, which results in three matrices:
Wethereforetakeanequivalentapproachwithourtwo–
dimensionalanalysis.Wedefineasetof10evenlylogspaced
spatial scales between the size of the image, and 10 times F =USVT, (16)
NG
the size of the image, and include them in our model, si-
multaneously with the linear UV domain grid with cell size wecantaketheresultantNp2ix×Np2ix matrixUandseparate
∆u. it into two components:
The matrix F therefore no longer represents the trans-
form from a uniformly gridded UV domain model, to the (cid:0) (cid:1)
U= M,M . (17)
C
uniformly gridded image. Instead it defines the transform
from our complete gridded UV model, including the points Here M is a N2 ×N matrix, representing the set of
pix Nuv
(cid:13)c 0000RAS,MNRAS000,000–000
Bayesian power spectrum estimation 5
N orthonormalbasisvectorsthatfullydescribetheorig- a log spacing for the sub-harmonics allows for accurate re-
Nuv
inal matrix F , while M is the N2 ×N complement. covery of the spectrum when there is significant power in
NG C pix uv
Thiscomplementcanbeconsideredaprojectionmatrixthat these low frequency terms. In van Haasteren et al. (2014)
allows us to take a length N vector of parameters, and theynote,however,thatexceptforthemostextremecases,
uv
produce an N ×N image that contains no contribu- these sub-harmonics terms can also be well modelled by a
pix pix
tion from the UV cells that formed our null-sky basis. This simple quadratic in frequency.
is completely analogous to the operation of projecting out In our model we therefore include a quadratic in fre-
unwanted model parameters in other fields, such as pulsar quency to act as a proxy to the sub–harmoinic structure in
timing (see e.g., van Haasteren & Levin 2013). our data:
2ν2κ
2.5 The full K-cube Qz = 10−o23Bc2(q0+νq1+ν2q2)≡Qzq, (21)
AsmentionedinSection2.2wedefineourEoRsignalmodel whereq=(q ,q ,q )areamplitudeparameterstobefitfor.
0 1 2
directlyin(kx,ky,kz)space,usingthesetof(kx,ky)points We can therefore write our final model given the con-
that correspond to the set of Nuv gridded UV coordinates catenatedNuv×nmaxη lengthvectorofofsignalcoefficients
that we include in our analysis as discussed in section 2. a,andN ×3quadraticcoefficientsqdefinedinthek-cube
uv
These(u,v)coordinatescanbetranslateddirectlytokx,ky as:
coordinates through the relations:
m=F¯(F a+Q q), (22)
2πu z z
k = (18)
x Dm whereFzandQzbothnowrepresentblockdiagonalmatrices
2πv thatactindependentlyoneachsetofcoefficients(a,q)for
k = i i
y Dm eachmodelUVcelli,andF¯ isthetwo–dimensionalprimary
beam corrected transform described in Eq.13, so that our
likelihood at this stage becomes
with D the transverse comoving distance from the obser-
m
vatory to the redshift of the EoR observation.
Our sampled visibilities, however, are defined in (u,ν) 1
Pr(d|a,q) =
space and so we must also transform our model cube from (cid:112)(2π)2Nvis)detN
kmzattroixoFbszearvsing frequency ν. We therefore first define the × exp(cid:20)−12(cid:0)d−F¯(Fza+Qzq)(cid:1)T
√ × N−1(cid:0)d−F¯(F a+Q q)(cid:1)(cid:3). (23)
2ν2κ 2 (cid:18)2π (cid:19) z z
F (ν,n )= B sin n ν , (19)
z η 10−23c2N B η
chan
2.6 Including foreground models
with an equivalent cosine term, B the bandwidth of the
observation, and η = nη/B the Fourier domain parame- In order to make a detection of the EoR power spectrum,
ter after transforming along the frequency axis, where we correctly accounting for foreground signals in the visibil-
include terms up to some maximum nη = nmaxη. The fac- ities will be key. These include diffuse emission from the
tor 2ν2κB/c2 from the Rayleigh-Jeans law, with κB the Galaxy, and continuum emission from extragalactic sources
Boltzmann constant, and c the speed of light, at the front (e.g.Jeli´cetal.2008),whichincombinationcanbeuptofive
of this expression allows us to convert from units of mK ordersofmagnitudegreaterthantheEoRsignalofinterest
in the model (kx,ky,kz) cube, to Janskys (1 Jy = 10−26 (Shaver et al. 1999).
Wm−2Hz−1). We can then relate η to the cosmological pa- In principle, any additional foreground model can be
rameter kz via the relation: added to the model in Eq 23, either in the image domain,
orintheUV.Writingtheforegroundmodelasafunctionof
k = 2πH0f21E(z)η, (20) some parameters m(Θfg), we would write:
z c(1+z)2
1
with z the redshift of the EoR observation, H0 the Hubble Pr(d|a,Θfg) = (cid:112) (24)
constant,E(z)thedimensionlessHubbleparameter,f the (2π)2Nv)detN
21
(cid:20)
frequencyofthe21cmlineemission,andcthespeedoflight. × exp −1(cid:0)d−F¯(F a+Q q)−m(Θ )(cid:1)T
WhileFz representsatypical1DFouriertransform,the 2 z z fg
EoR signal present in the data will include fluctuations on × N−1(cid:0)d−F¯(F a+Q q)−m(Θ )(cid:1)(cid:3),(25)
scales much longer than the bandwidth of the observation. z z fg
Written as in Eq. 19 this transform will not correctly ac- and then proceed to sample over the joint parameter space
count for these low frequencies, causing them to ‘leak’ into (a,q,Θ ).
fg
the higher frequency terms included in our model biasing One approach advocated to model smooth foreground
thepowerspectrumparameterestimatesatthescalesofin- emission is to use a simple polynomial in frequency (e.g.
terest. In principle these low frequencies could be included Bowman, Morales, & Hewitt 2009). In Section 2.5 we note
simply by adding additional log–spaced Fourier modes to that we include a quadratic in our Fourier transform from
our matrix 19 with n < 1, as in Section 2.3 that using frequency to the parameter η in order to model any low
η
(cid:13)c 0000RAS,MNRAS000,000–000
6 L. Lentati et al.
frequencyvariationsthatexistinthedatathathaveperiods be a set of independent parameters ϕ , one for each |k| bin
i
longer than the bandwidth of the observation. We reiterate i.
here, however, that the primary purpose of the quadratic Wethenwritethejointprobabilitydensityofthemodel
is simply to provide us with an unbiased estimate of the coefficientsthatdefineourpowerspectrum,andthek-space
scales of interest (i.e. with n (cid:62) 1 in Eq. 19). In principle signal coefficients Pr(ϕ,a|d) as:
η
higher order terms could also be added, however these will
be increasingly covariant with the Fourier modes included
Pr(ϕ,a,q|d) ∝ Pr(d|q,a)Pr(a|ϕ)Pr(ϕ)Pr(q) (27)
in the model, and so we do not take this approach.
Insection3wedescribeourapproachtoestimatingthe and then marginalise over all a and q in order to find the
EoRpowerspectrum,incorporatingaprioronthesignalco- posteriorfortheparametersthatdefinethepowerspectrum
efficients a that incorporates the assumption that the EoR ϕ alone.
signalisspatiallyhomogenous.Wedonot,however,incorpo- ForourchoiceofPr(ϕ)weuseeitherauniformpriorin
ratesuchaprioronthequadratictermsinourestimationof the amplitude of the coefficient, or a uniform prior in log
10
thepowerspectrum,asthesetermswilllikelybedominated space. The latter case is the least informative prior we can
byforegroundemissionandsowillnothavethesamehomo- choose,howeverwhenthegoalistosetanupperlimitaprior
geneity,atleastinthecaseoftheGalacticforeground.While thatisuniforminlogspaceisnotappropriate,astheupper
inprincipleaseparateGaussianpriorcouldbeincludedfor limitisdependentupontheboundsoftheprior.Whenthis
these quadratic terms, in order to make our analysis of the is the case we use a prior that is uniform in the amplitude.
EoRsignalmoreconservativeweusealessinformativeuni- In either case we draw our samples from the parameter ρ ,
i
form prior on the amplitudes of these coefficients. such that:
In our simulations in Section 6 we will be considering
only simple continuum models, with flat spectrum sources,
2π2N2 N Ω4
howeveramoredetailedaccountontheeffectofforegrounds, ϕ = pix chan pix10ρi, (28)
i |k|3V
when including realistic frequency evolution and spatial i
structure,usingthetechniquedescribedinthisworkisgiven with V the surveyed volume in Mpc3, and Ω the image
pix
in Sims et al. (2016). pixel size in radians, and the spherical power spectrum co-
efficients 10ρi defined in units of mK2 (h−1 Mpc)3.
Given these two choices of prior, and assuming a uni-
form prior on the quadraditc amplitude parameters q such
3 ESTIMATING THE POWER SPECTRUM
that Pr(q) = 1, the conditional distribution Pr(d|q,a) re-
Assuming the EoR signal to be spatially homogenous the mains as in Eq. 23, while the latter part of Eqn 27 is given
covariance matrix Φ of the k-space coefficients a will be by:
diagonal, with components
1 (cid:104) (cid:105)
Pr(a|ρ)Pr(ρ) ∝ √ exp −a∗TΦ−1a . (29)
Φ =(cid:104)a a (cid:105)=ϕ δ , (26) detϕ
ij i j i ij
when assuming a log-uniform prior on the amplitude of the
where there is no sum over i, and the set of coefficients ϕ
i
power spectrum coefficients (Pr(ρ)=1), and when using a
representthetheoreticalpowerspectrumfortheEoRsignal.
prior that is uniform in the amplitude:
In the framework we will describe below we are free
to choose any functional form for the coefficients ϕ . It is
i
htheereptohweenrtshpaetc,trsuhmouladtotnheewpiosihnttooffistamapsplinecgi,fictompoedrefolrtmo Pr(a|ρ)Pr(ρ) ∝ √ 1 exp(cid:104)−a∗TΦ−1a(cid:105)(cid:89)Ns 10ρs, (30)
detϕ
modelselectionforexample,thesetofcoefficientsϕ should s=1
i
be given by some function f(Θ), where we sample from the with N the number of spherical power spectrum bins used
s
parameters Θ from which the power spectrum coefficients in the prior.
ϕ can then be derived.
i
In Section 6 we will be comparing the results of our
3.1 A non-Gaussian prior
methodwithaninputsimulationobtainedusingtheseminu-
merical21cmFASTalgorithm(Mesinger,Furlanetto,&Cen The approach outlined in Section 3 explicitly assumes that
2011; Mesinger & Furlanetto 2007). After computing the the signal coefficients that fall into a specific |k| bin are
EoRsimulation,21cmFASToutputsasphericalpowerspec- welldescribedbyaGaussianrandomprocess,howeverifde-
trum of the simulated cube, performing a 3–dimensional sired we can relax this assumption and use a non-Gaussian
FFT and averaging all the Fourier coefficients that fall priorinstead.WeusetheapproachdevelopedinRochaetal.
within some spherical shell in k-space in order to calculate (2001), which is based on the energy eigenmode wavefunc-
the power spectrum within that bin. tions of a simple harmonic oscillator, and has since been
In order to draw the most direct comparison with the applied to other areas of astrophysics (Lentati, Hobson, &
input simulation, we therefore calculate the quantity |k| = Alexander 2014), which we outline in brief below.
(cid:112)
kx2+ky2+kz2 for each k–space coefficient a in our model, For a general random variable x, we write the PDF for
anddefineasetofbinsinthequantity|k|.Asin21cmFAST, fluctuations in x as:
fwoerbdienfisnne =the1.e.d.gnemsaoxfwthitehsenbmianxstthoeblaergsepsatcbedinaisnc1l.u5dne∆d|ikn| Pr(x|σ,α)=exp(cid:20)− x2 (cid:21)(cid:12)(cid:12)(cid:12)(cid:88)∞ α C H (cid:18)√x (cid:19)(cid:12)(cid:12)(cid:12)2 (31)
the model. Our model for the power spectrum ϕ will then 2σ2 (cid:12)(cid:12) n n n 2σ (cid:12)(cid:12)
n=0
(cid:13)c 0000RAS,MNRAS000,000–000
Bayesian power spectrum estimation 7
withα freeparametersthatdescribetherelativecontribu- situation we can perform the marginalisation numerically,
n
tions of each term to the sum, and sampling directly from the high dimension, joint probabil-
itydistributiondescribedinEq27,aprocessmadepossible
1
Cn(σ)= √ , (32) through the use of a GHS (B17) which we describe in the
(2nn! 2πσ)1/2
Section 5.
is a normalization factor. Equation 31 forms a complete set
of PDFs, normalised such that:
(cid:90) ∞ (cid:20) x2(cid:21) (cid:18) x (cid:19) (cid:18) x (cid:19) 4 THE SMALL K-CUBE REGIME:
dx exp − C H √ C H √ =δ , ANALYTICAL MARGINALISATION OVER
σ2 n n 2σ m m 2σ mn
−∞ THE SIGNAL COEFFICIENTS
(33)
with δ the Kronecker delta, where the ground state, In order to perform the marginalisation over the signal co-
mn
H , reproduces a standard Gaussian PDF, and any non- efficientsaandq,wefirstsimplifyournotationbydefining
0
Gaussianityinthedistributionofxwillbereflectedinnon- the vector b as the concatenation of the vectors a and q,
zero values for the coefficients α associated with higher and the matrix T such that our signal can be rewritten:
n
order states.
The onlyconstraintwe must placeon the valuesof the
m=F¯(F a+Q q)=Tb. (37)
amplitudes α is: z z
n(cid:88)max|αn|2 =1 (34) denoWtinegthTeTnNw−r1itTe+thΦel−o1gaosftΣh,ewlikheelriehotohde ienleEmqen2t7s,owfhtihche
n=0 matrix Φ−1 that correspond to the coefficients q are set to
with n the maximum number of coefficients to be in- zero, and denoting TTN−1d as d¯ is given by:
max
cluded in the model for the PDF. This is performed most 1 1
simply by setting: logL=− dTTTN−1Td− bTΣb+d¯Tb. (38)
2 2
(cid:118)
α0 =(cid:117)(cid:117)(cid:116)1−n(cid:88)max|αn|2. (35) Taking the derivitiv∂eloogfLlogL with respect to a gives us:
n=1 =−Σb+d¯T, (39)
∂b
Usingthisformalismwecanthenparameteriseanynon-
whichcanbesolvedtogiveusthemaximumlikelihoodvec-
Gaussianity in the coefficients a by rewriting Eq. 29: tor of coefficients bˆ:
(cid:20) 1 (cid:21) bˆ =Σ−1d¯. (40)
Pr(a|ϕ,α) = exp − aTΦ−1a (36)
2 Re-expressing Eq. 38 in terms of bˆ:
× (cid:89)n (cid:12)(cid:12)(cid:12)n(cid:88)maxα C (ϕ )H (cid:18)√ai (cid:19)(cid:12)(cid:12)(cid:12)2.
i=1(cid:12)(cid:12)n=0 n n i n 2ϕi (cid:12)(cid:12) logL = −1dTTTN−1Td+ 1bˆTΣbˆ
2 2
Theadvantageofthismethodisthatonemayuseafi- 1
− (b−bˆ)TΣ(b−bˆ), (41)
nitesetofnon-zeroαn tomodelthenon-Gaussianity,with- 2
outmathematicalinconsistency.Anytruncationoftheseries
the 3rd term in this expression can then be integrated with
still yields a proper distribution, in contrast to the more
respect to the m elements in b to give:
commonly used Edgeworth expansion (e.g. Contaldi et al.
2000). (cid:90) +∞ (cid:20) 1 (cid:21)
I = dbexp − (b−bˆ)TΣ(b−bˆ)
2
−∞
3.2 Performing the sampling = (2π)m det Σ−21. (42)
How we now perform the sampling depends entirely on the Our marginalised probability distribution for a set of EoR
sizeofthek-cubewewillbeusingtodescribetheEoRsignal power spectrum coefficients is then given as:
present in the visibilities. When the size of the k-cube, and
thus the number of signal parameters used to describe the det(Σ)−21
Pr(ϕ|d) ∝ (43)
(cid:112)
signal is small (< 20000), we can marginalise over the co- det(ϕ) det(N)
efficients a analytically and sample directly from the power (cid:20) 1(cid:16) (cid:17)(cid:21)
spectrum coefficients ρ, a process we describe in Section 4. × exp − dTN−1d−d¯TΣ−1d¯ .
2
In this scenario we can perform the sampling using Multi-
Nest(Feroz&Hobson2008;Feroz,Hobson,&Bridges2009), WhentakingthisapproachinSection6weusetheMAGMA
allowingustoperformrobustevidenceevaluation,andper- (MatrixAlgebraonGPUandMulticoreArchitectures)GPU
form model selection on the EoR power spectrum. accelerated linear algebra package2 to perform the cholesky
If however we wish to sample over a larger number of decomposition for each likelihood evaluation.
signal coefficients, the matrix to be inverted when perform-
ing the marginalisation analytically will become too large
to make this approach computationally tractable. In this 2 http://icl.cs.utk.edu/magma/
(cid:13)c 0000RAS,MNRAS000,000–000
8 L. Lentati et al.
5 THE LARGE K-CUBE REGIME: for the maximum set of signal coefficients b analyti-
max
NUMERICAL MARGINALISATION OVER cally using Eq. 40 so when searching for the global maxi-
THE SIGNAL COEFFICIENTS mum we need only search over the subset of parameters ρ.
This is achieved by using either a particle swarm algorithm
For a detailed account of both Hamiltonian Monte Carlo
(Kennedy(1995,2001)andforusesincosmologicalparame-
(HMC) and GHS refer to B17 or Lentati et al. (2013), here
terestimationseee.g.Prasad&Souradeep(2012))orusing
we will provide only a brief introduction of the key aspects
agradientsearchoptimizationGilbert&Lemarchal(1989).
of each.
HMC sampling (Duane et al. 1987) has been widely
applied in Bayesian computation (Neal, R 1993), and has
been successfully applied to problems with extremely large 5.1 Low signal-to-noise parameterisation
numbers of dimensions (∼106 see e.g. Taylor, Ashdown, &
In Lentati et al. (2016) an alternative parameterisation of
Hobson(2008)).WhereconventionalMCMCmethodsmove
the likelihood described in Eq. 29 is described that is much
through the parameter space by a random walk and there-
more efficient in the low signal-to-noise regime, where the
forerequireaprohibitivenumberofsamplestoexplore-high
power spectrum coefficients are not detected. We can an-
dimensional spaces, HMC exploits techniques that describe
ticipate that, at least at first, this is likely to be the case
themotionofparticlesinpotentialwells,andsuppressesthis
with the EoR signal, and thus we summarise this new pa-
random walk behaviour. This allows the HMC approach to
rameterisationinthecontextofourthree-dimensionalpower
maintain a reasonable efficiency even for high-dimensional
spectrum analysis below.
problems.
Rather than sample from the parameters a, we instead
Possibly the main shortcoming of traditional HMC
samplefromtherelatedparametersuwherefortheithsig-
methods is that it requires a large number of tuning pa-
nal amplitude we will have:
rameters in order to navigate the parameter space. In par-
ticular, every parameter requires a step size, and the total
√
numberofstepsineachiterationofthesamplermustalsobe a =u ϕ , (49)
i i i
chosen.Thesearetypicallydeterminedviaexpensivetuning
runs.TheGHSisdesignedtobypassmuchofthistuningby whereasbeforeϕi isthethree-dimensionalpowerspectrum
using the Hessian of the sampled probability distribution, coefficient that describes the standard deviation of the ith
calculatedatitspeak,tosetthestepsizeandcovarianceof amplitude parameter. In order to still sample uniformly in
the parameter space. The number of steps at each iteration the original parameters, a, we then include an additional
isthendrawnfromauniformdistributionU(1,nmax),with term,thedeterminantoftheJacobiandescribingthetrans-
nmax of ten found to be suitable for all tested problems. A formation from ai to ui. The Jacobian in this case has ele-
single global scaling parameter for the step size is then the ments:
only tunable parameter, chosen such that the acceptance
rate for the GHS is ∼68%. √
J = ϕ δ , (50)
Therefore in order to perform sampling we need the i,j i i,j,
following:
withδ thekroneckardelta.Thedeterminantistherefore:
i,j,
• The gradient of Ψ for each parameter x
i
• The peak of the joint distribution (cid:89)m √
• The Hessian at that peak det(J)= ϕi, (51)
i=0
Thegradientsofourparametersaregivenbythefollowing:
which acts to cancel exactly with the determinant of the
matrix Ψ in Eq. 29. In the following work when using the
∂Ψ
=−(d−Tb)TN−1T+bTΦ−1 (44) GHS we will use both parameterisations dependent upon
∂b
whether we are in the high or low signal-to-noise regime.
(cid:18) (cid:19)
∂Ψ 1 ∂Φ 1 ∂Φ
= Tr Φ−1 − bTΦ−1 Φ−1b (45)
∂ρ 2 ∂ρ 2 ∂ρ
i i i
and the components of the Hessian are: 6 APPLICATION TO SIMULATIONS
We now apply the methods described in the preceding sec-
∂2Ψ =TTN−1T+Φ−1 (46) tionstoaseriesoffivesimulations.Inthefirstthreeofthese
∂b2 we perform the simulation using a 37 element interferome-
ter,withantennalocationsshowninFig.1(top,left),while
∂2Ψ ∂Φ ∂Φ 1 ∂2Φ forthefinaltwosimulationsweusea61elementarray(Fig.
∂ρ2 =bTΦ−1∂ρ Φ−1∂ρ Φ−1b− 2bTΦ−1 ∂ρ2 Φ−1b 1 bottom, left). In both cases we simulate 38 2KHz chan-
i i i i (47) nels spanning the range 122.17∼129.90MHz and take our
antennas configuration to be representative of the upcom-
ing HERA 37 and 61 element arrays.
∂2Ψ ∂Φ
=−Φ−1 Φ−1b (48) Wecalculatethetheoreticalinstrumentalthermalnoise
∂ρ ∂b ∂ρ
i i for our simulation per unit time τ given the parameters in
For a set of power spectrum coefficients ρ we can solve Table 1, as in Taylor, Carilli, & Perley (1999):
(cid:13)c 0000RAS,MNRAS000,000–000
Bayesian power spectrum estimation 9
0.018
30 0.016
40 40 s) 20 0.014
h
30 30 gt 10 0.012
n (m) 20 20 elen 0 0.01
ositio 10 m) 10 wav-10 00..000068
Y p 0 V ( 0 V (-20 0.004
nna -10 -10 -30 0.002
nte-20 -20 0
A -30 -20 -10 0 10 20 30
-30 -30 U (wavelengths)
-40 -40
-50 -40 -30 A-n20tenn-10a X p0osit 1i0on ( m20) 30 40 50 -40 -30 -20 -10 U ( 0m) 10 20 30 40
0.012
60 150 s) 40 0.01
h
gt 20 0.008
40 100 n
Antenna Y position (m)- 22 000 V (m)- 55 000 V (wavele--42 000 0000...000000246
-40 -100 -40 -20 0 20 40
U (wavelengths)
-60 -150
-60 -40 A-2n0tenna X p0osition ( m20) 40 60 -150 -100 -50 U ( 0m) 50 100 150
Figure1.[Top](Left)Antennapositionsforthe37elementinterferometerusedinsimulations1-3inSection6.(Middle)Sampled(u,v)
coordinatesfroma30minuteobservation,and(Right)therelativeweightsofthe(u,v)obtainedfromEq.15normalisedtohaveasum
of1.[Bottom]As[Top]butforthe61elementarrayusedinsimulations4-5inSection6.
overasingle30minutepointinggiventhoseconfigurations.
Parameter Description Value
We take the pointing centre to have right ascension equal
ηs Systemefficiency 1 to 0.0, and declination equal to -30.0. This results in 21870
∆ν Bandpassofobservation 200kHz sampled (u,v) coordinates per channel for the 37 element
ηa Antennaefficiency 1 array, and 58141 per channel for the 61 element array (Fig.
A Antennaeffectivearea 150m2
1, middle, top and bottom panels respectively).
Tsys Systemtemperature 550K Ourinputskymodelsareconstructedusing2048×2048
pixel channels, with a resolution of ∼ 40 arcseconds per
Table 1. Instrument and observation parameters used to calcu-
pixel, giving a total field of view of ∼ 23×23 degrees. We
latethevarianceofthetheoreticalinstrumentalthermalnoiseper
thenmultiplytheseskymodelsbyaGaussianprimarybeam
unittimeτ.
with a full width at half max of 8 degrees at 122.17 MHz,
and evaluate the direct Fourier transform of the observed
sky-models onto the sampled (u,v) points obtained previ-
ously.
1 2k T 1
σ(τ)=10−26 b sys√ (52) We now describe each of the five simulations in more
ηs ηaA 2∆ντ detail below:
Foreachsimulationweusethearrayconfigurationsfor
the 37 or 61 element interferometers in Fig.1 as input to Simulation 1
the CASA3 (Common Astronomy Software Applications) A 2000 hrs simulation of a single flat spectrum point
simobserve tool, to obtain the set of observed (u,v) coor- source 10.4 degrees away from the primary beam center,
dinatesthatcorrespondtoaseriesof30secondintegrations resulting in a 1000σ detection using the 37 element array
shown in Fig. 1 (top). We include uncorrelated thermal
noise in each visibility. To simulate 4000 repetitions of our
3 http://casa.nrao.edu 30 minute observation we therefore add 0.045Jy noise to
(cid:13)c 0000RAS,MNRAS000,000–000
10 L. Lentati et al.
Figure2.(left)AsinglechannelfromsimulationoftheEoRsignalusingtheseminumerical21cmFASTalgorithm(Mesinger,Furlanetto,
& Cen 2011; Mesinger & Furlanetto 2007) described in section 6, after subtracting the mean in the spatial variations for that channel.
(right)Asinglechannelfromasimplecontinuumskysimulationdescribedinsection6,aftersubtractingthemeaninthespatialvariations
for that channel. The continuum simulation is scaled such that the power is a factor 108 greater than in the EoR simulation. In both
casesthecolourscaleisinmK.
each of the 21870 visibilities in each channel. In order to shown in Fig. 1 (bottom).
compareequivalentsimulationsweusethesamewhitenoise
realization for simulations 1-3, and for simulations 4-5. Simulation 5
For ease of interpretation we do not include the quadratic AsSimulation4,butanadditionalflatspectrumcontinuum
described in Eq. 21 in our model when analysing this componentisaddedtothemodelasdescribedinSimulation
simulation. The purpose of this first simulation is to show 3.
in a straightforward way how our approach automatically
accountsforthefrequencydependenceoftheUV-sampling,
In order to adequately sample the aperture function of
which causes observed low frequency structure along
the Gaussian primary beam in the UV plane, we define our
individual baselines.
UV cells to each have a width of 2.5λ. We then use Eq.
15 to determine the set of cells to include in our model.
Simulation 2 The weightsfor each cellare shown inFig. 1 (right) for the
A 160 hour integration including only the EoR signal and 37 and 61 element arrays (top and bottom panels respec-
the uncorrelated thermal noise described in Simulation tively). Including all cells that contribute up to 99% of the
1. We generate the EoR signal using the seminumerical totalweight,wefindresultsin650,and1142UVcellsperη
21cm FAST algorithm (Mesinger, Furlanetto, & Cen 2011; modeincludedinthemodelforthe37and61elementarrays
Mesinger & Furlanetto 2007) to simulate a cosmological respectively.
volume of 10243 Mpc3. We use this same EoR realisation For simulations 1-3 we will use the analytic marginal-
in all subsequent simulations. An example of one channel isation over the signal coefficients described in Section 4,
from this EoR simulation is shown in Fig. 2 (left). samplingfromthe7dimensionalsphericalpowerspectrum
usingtheMultiNestalgorithm.Forsimulations4-5thenum-
Simulation 3 ber of signal coefficients included in the model is too great
AsSimulation2,butanadditionalflatspectrumcontinuum forthisanalyticapproach,andsoweperformthismarginal-
component is added to the model, shown in Fig. 2 (right). isation numerically using the GHS described in Section 5.
Each pixel in the continuum model is assigned a random
positivevaluedrawnuniformlybetweenzeroandone,which
is then held constant across the 38 channels for that pixel. 6.1 Results for Simulation 1
We then scale the image so that the total power in the
In Figure 3 (red lines) we show the injected real (left) and
meansubtractedcontinuumis∼108 timesthatoftheEoR
imaginary (right) signal for the longest (top) and shortest
signal. An example of one channel from this continuum
(bottom)baselinesforsimulation1,containingasinglepoint
simulation is shown in Fig. 2 (right).
source observed by the 37 element array shown in Fig. 1
(top).Noticeablythebaselinesshowstructureasafunction
Simulation 4 of frequency despite the fact that the injected source has a
As Simulation 2, however we use the 61 element array flat spectrum. This is simply a result of the baselines sam-
(cid:13)c 0000RAS,MNRAS000,000–000