Table Of ContentLLNL-JRNL-661076
Hierarchical Probabilistic
Inference of Cosmic Shear
M. D. Schneider, D. W. Hogg, P. J. Marshall, W. A.
Dawson, D. J. Bard, D. Boutigny, D. Lang, J. Meyers
September 23, 2014
The Astrophysical Journal
Disclaimer
This document was prepared as an account of work sponsored by an agency of the United States
government. Neither the United States government nor Lawrence Livermore National Security, LLC,
nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or
responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or
process disclosed, or represents that its use would not infringe privately owned rights. Reference herein
to any specific commercial product, process, or service by trade name, trademark, manufacturer, or
otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the
United States government or Lawrence Livermore National Security, LLC. The views and opinions of
authors expressed herein do not necessarily state or reflect those of the United States government or
Lawrence Livermore National Security, LLC, and shall not be used for advertising or product
endorsement purposes.
LLNL-JRNL-661076
PreprinttypesetusingLATEXstyleemulateapjv.8/13/10
HIERARCHICAL PROBABILISTIC INFERENCE OF COSMIC SHEAR
Michael D. Schneider1,2, David W. Hogg3, Philip J. Marshall4,
William A. Dawson1, Joshua Meyers5, Deborah J. Bard4, Dustin Lang6
(Dated: Draft May 14, 2015)
LLNL-JRNL-661076
ABSTRACT
Point estimators for the shearing of galaxy images induced by gravitational lensing involve a com-
plex inverse problem in the presence of noise, pixelization, and model uncertainties. We present a
probabilistic forward modeling approach to gravitational lensing inference that has the potential to
mitigatethebiasedinferencesinmostcommonpointestimatorsandispracticalforupcominglensing
surveys. The first part of our statistical framework requires specification of a likelihood function for
the pixel data in an imaging survey given parameterized models for the galaxies in the images. We
derive the lensing shear posterior by marginalizing over all intrinsic galaxy properties that contribute
to the pixel data (i.e., not limited to galaxy ellipticities) and learn the distributions for the intrinsic
galaxy properties via hierarchical inference with a suitably flexible conditional probabilitiy distri-
bution specification. We use importance sampling to separate the modeling of small imaging areas
from the global shear inference, thereby rendering our algorithm computationally tractable for large
surveys. With simple numerical examples we demonstrate the improvements in accuracy from our
importance sampling approach, as well as the significance of the conditional distribution specification
for the intrinsic galaxy properties when the data are generated from an unknown number of distinct
galaxy populations with different morphological characteristics.
Subject headings: gravitational lensing: weak; methods: data analysis; methods: statistical; catalogs;
surveys; cosmology: observations
1. INTRODUCTION 2014). All analyses of which we are aware use point
estimators for the 2D ellipticities of galaxies in a cat-
All observations to date are consistent with a cosmo-
alog to infer cosmic shear effects. From the estimated
logicalmodelinwhichmassisclusteredonspatialscales
galaxy ellipticity components one can compute a two-
ranging from galaxy sizes of order a kiloparsec to hun-
point angular correlation function that is related to the
dredsofmegaparsecs,correspondingtoangularscaleson
mass clustering power spectrum in the standard cosmo-
theskyoftensofarcsecondstoseveraldegrees. Thecos-
logical framework.
mological mass distribution acts as a gravitational lens
There are several drawbacks to existing cosmic shear
for all luminous sources, which imparts inhomogeneous
inference methods that may severely limit the amount
image distortions because the mass distribution is clus-
of information that can be extracted about the 3D cos-
tered. In principle, the coherent lensing distortions of
mological mass distribution and underlying cosmologi-
luminous sources can be used to reconstruct the 3D cos-
cal model, for a given set of observations (LSST Dark
mological mass distribution over most of the observable
Energy Science Collaboration 2012). Bernstein & Arm-
volume of the universe. However, except near the most
strong (2014) (BA14) give a review of the primary chal-
dense and rare mass peaks, the gravitational lensing
lenges in existing methods including low signal-to-noise,
shearing of galaxy images is a percent-level effect per-
large instrumental and observational systematics, finite
turbing unknown at order unity intrinsic galaxy shapes.
pixelsampling,uncertaintyinthemorphologiesofgalax-
The ‘cosmic shear’ of galaxies therefore can only be de-
ies prior to lensing distortions, and biased ellipticity es-
tected through statistical correlations of large numbers
timators in the presence of pixel noise.
of galaxy images.
AsinBA14andalsoSheldon(2014),Milleretal.(2007,
Recently, cosmicshearhasbeenusedtoplacecompet-
2013) we consider a probabilistic model for cosmic shear
itive constraints on cosmological model parameters (Jee
inference that can in principle avoid many of the weak-
et al. 2013; Kilbinger et al. 2013; Heymans et al. 2013;
nesses in existing methods. Unlike in BA14 we initially
MacCrann et al. 2014; Kitching et al. 2014; Kilbinger
set aside issues of computational feasibility and aim to
specify a statistical framework that is a complete de-
schneider42@llnl.gov
scription of the data (i.e., including all physical param-
1Lawrence Livermore National Laboratory, Livermore, CA,
94551,USA. eters that are needed to completely describe the mea-
2UniversityofCalifornia,Davis,Davis,CA95616,USA. sured pixel values in a photometric survey). While we
3Center for Cosmology and Particle Physics, New York Uni- seevalueinsimplywritingdownthecompletestatistical
versity,NewYork,NY,USA.
4SLAC National Accelerator Laboratory, Menlo Park, CA framework for cosmic shear we were also able to identify
94025,USA. statistical sampling and computational algorithms that
5Kavli Institute for Particle Astrophysics and Cosmology, are likely to make our framework practical for upcom-
StanfordUniversity,Stanford,CA94035,USA. ing surveys. We demonstrated a simple implementation
6Department of Physics, Carnegie Mellon University, Pitts-
burgh,PA,15213,USA.
2 M. D. Schneider et al.
of our framework in the GREAT37 community cosmic iesarefurthercomplicatedbyobjectblendingthat
shear measurement challenge (Mandelbaum et al. 2013, may mimic intrinsic features of the galaxy popula-
2014) and will further demonstrate the computational tion(Dawsonetal.2014). Werequireaparametric
performance of our framework in subsequent papers. model for galaxies and a likelihood function that
Most of the parameters that are needed to describe can describe all such variations in galaxy images
the pixel data are uninteresting for cosmology and can over the model parameter ranges that are impor-
be marginalized out; this is an advantage of proba- tant for describing the data. With an incomplete
bilistic modeling. If we can infer probability distribu- galaxy parametric model and without propagating
tion functions (PDFs) for the nuisance parameters and all the information about the model in the likeli-
marginalize—instead of either asserting distributions or hood function, it is well known that ‘model fitting
else fixing the nuisance parameters to heuristically cho- biases’ can be catastrophic for cosmic shear (Voigt
sen or maximum-likelihood values—we expect to obtain &Bridle2010;Melchioretal.2010;Bernstein2010;
better (more accurate and also more conservative) infer- Kacprzak et al. 2014).
ences about the cosmological parameters. Adopting this
probabilistic approach moves us (relative to other meth- 2. Specification of the probability distribution for the
ods)alongthebias–variancetrade-offtowardslessbiased galaxymodelparameterstobemarginalizedinthe
inferences. Ofcourse,inthecontextoffullyprobabilistic shearinference. Whiletheaccuratespecificationof
inference, without point estimators, there isn’t as clear modelsforgalaxyimagesmaysoundchallengingto
a definition of “bias” and it may be difficult to put the the discerning reader, the specification of the joint
long literature on weak-lensing biases into context here. distribution of intrinsic galaxy properties is per-
However, we expect many of the known biases in point haps even more daunting. A primary aim of this
estimators to be ameliorated when we permit the free- paper is to describe and demonstrate how a hier-
dom to infer and marginalize out nuisances. In detail, archical model can allow meaningful inference of
as we give more freedom to the model (more flexibility the properties of the galaxy distribution simulta-
in the nuisance-parameter space), we will move to even neouslywiththeshear. Withouthierarchicalinfer-
lessbiased(thoughalsoprobablylessprecise)inferences, ence(i.e.,inferringthedistributionsofnuisancepa-
possibly at large computational cost. rameters) we are susceptible to an ‘overconfidence
Explicit specification or inference of the distributions problem’ wherein asserting a prior that is too sim-
of all model parameters not only has the potential to re- ple can yield inferences that appear more precise
duce biases in the shear inference but also is the only than warranted.
way to guarantee an optimal measurement (in the sense
3. Monte Carlo sampling of model parameters from
that no other accurate inference method could yield
the joint posterior distribution given multi-epoch
tighter marginal posterior distributions on the cosmo-
imaging of a large galaxy survey. Because both
logicalmodelparameters). Mostpastcosmicshearanal-
theshearandthePSFvaryincorrelatedwaysover
yseshavenotspecifiedthedistributionsofgalaxyintrin-
any given image, the inferences for different galax-
sic properties and observing conditions (e.g., the PSF)
iesarestatisticallydependent. Inastraightforward
that are assumed in their cosmological inferences. But,
probabilistic inference we would need to fit model
because intrinsic galaxy properties and observing con-
parameters simultaneously to all galaxy images,
ditions do describe important features of the data, all
whichisaformidablecomputationalchallenge. In-
cosmic shear analyses must have (at least implicitly)
stead we sample model parameters for each galaxy
marginalized over some distributions in these parame-
independently and perform a subsequent joint in-
ters. Implicit marginalization over un-specified priors
ference using importance sampling given interim
cannotyieldanoptimalmeasurement. Saidanotherway,
posterior samples for each galaxy image.
without explicit marginalization of model parameters it
is not possible to saturate the Cram´er-Rao bound (e.g.
In Section 2 we outline our framework for the com-
Kendall, M. G. & Stuart, A. 1969). In particular, anal-
plete cosmic shear inference problem. In Section 2.1 we
yses in which measured ellipticities of galaxies are aver-
enumerate the components the framework and in Sec-
agedtogetheroverskypatches(orsomethingequivalent)
tion 2.2 we outline the key algorithms we use for Monte
havemadeanimplicit(andstrong)assumptionaboutthe
Carlosamplingofmodelparameterstorenderourframe-
distribution of intrinsic shapes; this assumption (since
work computationally feasible. In Section 3 we focus on
unexamined) is likely to be sub-optimal at best.
implementation details for just the inference of intrinsic
Once a complete statistical model is specified, there
galaxyproperties,leavingsimilardetailsforthePSFand
are three key practical challenges for probabilistic shear
lensing mass inferences to future work. We describe our
inference,
model for the distribution of intrinsic galaxy properties
1. Specificationofaninterimmodeldescribinggalaxy in Section 3.1 and give numerical examples of shear and
morphologies,fluxprofiles,andspectralenergydis- ellipticity inferences in Section 3.2. In Section 4 we dis-
tributions. Galaxyfeaturesarecomplexandmulti- cuss how to expand the inferences demonstrated for the
faceted because of the evolution of galaxy proper- intrinsic ellipticity distribution to the PSF and lensing
tieswithcosmictime, galaxymergers, andvarying mass models and summarize our conclusions.
environments. The observable features of galax-
2. PART1: DESCRIPTIONOFTHESTATISTICAL
7 http://www.great3challenge.info/, Our GREAT3 submis- FRAMEWORK
sions are under team ‘MBI’: http://great3.projects.phys.ucl.
2.1. Three conditionally independent branches
ac.uk/leaderboard/team/14
Hierarchical probabilistic inference of cosmic shear 3
TABLE 1
Sampling parameters for the full statistical model. The
central line separates sampled from conditional
parameters.
Parameter Description
θ Cosmologicalparameters
ΨIC Initialconditonsforthe3Dgravitationalpotential
ΨLT Late-time3Dgravitationalpotential
ψs 2Dlenspotential(givensourcephoto-z bins)
As Parametersfortheline-of-sightsourcedistribution
Πns,i PSFforgalaxyns observedinepochi
Ωi Observingconditionsinepochi
{ωns} Galaxymodelparameters;ns=1,...,ngal,s
{αns} Parametersforthedistributionof{ωns}
{ξns} Scalingparametersfor{ωns}
m,τ Hyperpriorparametersfor{ξns}
A Hyperparameterfor{αns}classifications
{dns,i} Pixeldataforgalaxiesns=1,...,ngal,s inepochi
G0|aη Priorspecificationfor{αns}
s Sourcesample(e.g.,photo-z bin)
W Surveywindowfunction
danc,i AncillarydataforPSFinepochi
p Priorparams. forobservingconditions
a Priorparams. forA
σpix,i Pixelnoiser.m.s. inepochi
I Modelselectionassumptions
Fig. 1.— Probabilistic graphical model for our complete frame-
workforshearinference. Arrowsindicatestatisticaldependencies.
We begin this section by enumerating the variables
Grey shading indicates quantities that are not sampled. See Ta-
anddependenciesofacompletestatisticalframeworkfor ble1fordescriptionsoftheparametersymbols.
shear inference. In subsequent sections we demonstrate
the inference of a subset of the variables in our frame- constraints on ΨIC in Section 2.2.1.0 but otherwise con-
work with numerical models. We defer a more detailed fine our analyses to the inference of gravitational lensing
examinationofsomeaspectsofthestatisticalframework quantities after projecting the 3D gravitational poten-
to later publications. But we believe it is useful at this tial ΨLT over the line-of-sight distribution of lenses in a
stage to list all contributions to the cosmic shear infer- survey.
ence problem given the considerable literature on the We define the 2D lensing potential that describes the
subject that has not yet converged on a unified statis- shearing (and magnification) of any sample of galaxies
tical picture. in a survey by the following integral of the 3D potential
Under a limited set of assumptions described below, ΨLT (see, e.g., equation 6.14 of Bartelmann & Schneider
the statistical framework for cosmic shear separates into 2001)(also section3.2of Narayan& Bartelmann1996),
three conditionally independent branches: 1) cosmolog- (cid:90)
ical lensing mass density, 2) observational point-spread ψ¯ (x)≡W(x) daΨLT(x,a)K(a;A ), (1)
s s
functions, 3) intrinsic galaxy properties.
In Appendix F we describe the computational imple- where a is the cosmological scale factor in the Freid-
mentation of the framework in this Section. mann equation, x are the coordinates in the plane of
the sky, W(x) is the angular window function of the
2.1.1. Lensing mass distribution
survey, the subscript s denotes a particular sample of
We start by specifying the parameters θ of the cos- ‘source’galaxiesthatarelensedbytheforegroundpoten-
mological model (top right of Figure 1), sampled from a tialΨLT,andK isthelensingkernelincludingthelensing
prior distribution Pr(θ) given all past cosmological ex- efficiency based on the distances between observer, lens,
periments. Although not explicitly implemented in this andsources,andtheintegraloverthesourcedistribution
paper, we expect θ to include parameters such as the with parameters A (see equation 6.19 of Bartelmann &
s
mean mass density Ω and the r.m.s. of mass density Schneider 2001).
m
fluctuations σ that primarily determine the amplitude TheparametersA canincludetheuncertaintiesinthe
8 s
of the cosmic shear correlation function. survey redshift distribution (e.g., from photometric red-
From these parameters, we can predict the 3D cos- shifterrors,seeLoh&Spillar1986;Connollyetal.1995;
mological mass distribution, described by the 3D grav- Huterer et al. 2006) as well as other astrophysical sys-
itational lensing potential Ψ. Assuming, for example, tematics that are redshift-dependent (such as intrinsic
Gaussian initial conditions ΨIC for the 3D mass den- alignments of galaxies, see Catelan et al. 2001; Bridle &
sity in the early universe and gravitational evolution King 2007). While both the galaxy redshift distribution
according to General Relativity, the probability distri- andintrinsicalignmentcorrelationsmightbemoreprop-
bution for the late-time 3D mass density ΨLT responsi- erly included in the specification of the properties of the
ble for gravitational lensing of galaxies depends only on galaxy distribution (Section 2.1.3), we include parame-
the cosmological parameters and the initial conditions, ters A for the line-of-sight integration of the potentials
s
Pr(ΨLT|θ,ΨIC). We discuss possibilities for inferring to connect with more common analysis conventions.
4 M. D. Schneider et al.
Marginalizing the nuisance parameters in the line-of- potential ΨLT. We introduce the scaling parameters ξ
ns
sight projection, the 2D lensing potential ψ given the inAppendixBasacomputationalconvenienceinMCMC
cosmologyandsurveysampleisdistributedaccordingto, sampling. The gravitational potential ΨLT can influence
the distribution of galaxy properties because of assem-
(cid:90)
bly bias (Wechsler et al. 2002), which is the label in the
Pr(ψ |θ,s,W)∝ dΨLT
s literature for variations in galaxy (or surrounding dark
(cid:90) matter halo) properties with the local surrounding mass
× dΨICPr(cid:0)ΨLT|ΨIC,θ(cid:1)Pr(cid:0)ΨIC|θ(cid:1) density. In a simple description of the dependence of
galaxy properties on mass density, it is well established
(cid:90)
× dA Pr(ψ |ΨLT,A ,W)Pr(A |θ,s,W). (2) that the amplitude of galaxy clustering correlations de-
s s s s pendsongalaxycolor(e.g.,Zehavietal.2011). However,
because cosmic shear is insensitive to the clustering of
For a deterministic cosmological model (as empha-
source galaxies to first order (Schneider et al. 2002), we
sized in Jasche & Wandelt 2013), Pr(ψ|ΨLT,A,W) ∝ do not consider ΨLT dependencies further in this paper
δD(cid:0)ψ−ψ¯(ΨLT,A,W)(cid:1),whereδDistheDiracdeltafunc- in any model distributions for the intrinsic galaxy prop-
tion. Inserting this expression into Equation (2) yields a erties ω.
trivialsolutionfortheintegraloverΨLT,leavingintegra-
2.2. Sampling methods to divide and conquer
tionsoverthe3DmassdensityinitialconditionsΨIC and
line-of-sight uncertainties A to evaluate the marginal 2.2.1. Interim sampling of model parameters for individual
s
distribution of the lens potential ψ given the cosmology sources
θ. Efficient numerical evaluation of Equation (2) is po- The pixel data d in the vicinity of a galaxy n enters
n
tentially a major computational challenge for our frame- our framework only through the likelihood function
work, whichwewilladdressinsubsequentpublications.
Pr(d |ω ,ψ,Π). (4)
n n
2.1.2. Point-spread functions Wecanreasonablyassumeuncorrelatednoiseinthecom-
Toconnectthelenspotentialtotheobservableproper- mon case that sky noise (i.e., Poisson noise from sky
ties of galaxy images, we must also specify the distribu- background)dominatesthenoiseinthepixelvalues,giv-
tion of the point-spread function (PSF) realizations spe- ing a likelihood function,
cimifigeincvtteson, taphnaedr(adtmiemteetece-tvrosarrΩiraeibslspeuo)cnhosbeas,esPrtvrh(inΠegsceoe|Ωnindg)i,.tiooFnpostricagstiveaeplniogΩcnh-, lnPr(dn)=−12ne(cid:88)pochs (cid:0)dn,i−σd¯2(ωn,ψ,Πi)(cid:1)2 +const.,
n,i i i=1 pix,n,i
the PSF Π(Hn;Ω) can be computed at the focal plane (5)
position Hn for any galaxy n. We model the PSF Πn,i where d¯ is the model prediction for the pixel values, i
as distinct random variables for each galaxy location n
indexes multiple observation epochs, and σ is the
and epoch i. pix,n,i
noise r.m.s. per pixel for object n in epoch i. Note the
The observing conditions Ω in an observation epoch i
i model prediction in the likelihood includes the PSF Π
willnotbeknownperfectlybutwillbeconstrainedbyan- i
for each observation i of a given galaxy n. This makes
cillarydatad andadistributionPr(Ω |p)conditioned
anc i the model predictions potentially expensive to compute
on time-independent PSF model parameters p (e.g., the
unlesstheconvolutionofthePSFwiththegalaxymodel
alloweddistributionsofopticsdistortionsoratmospheric
can be done efficiently. We will return to this issue in
turbulence power spectra).
Appendix A.
In Equation (4) we have assumed that the likelihood
2.1.3. Intrinsic source properties
functionforallsourcesinanimagecanbefactoredintoa
Finally, the intrinsic properties ωn of a model for product of likelihoods for distinct sources. This will not
galaxy n (e.g., ellipticity, size, brightness) are described be true in when the pixel noise is correlated for sources
by the distribution that are adjacent on the sky (e.g., when unresolved flux
contaminates the pixels of neighboring sources).
Pr(ω |α,I), (3)
n The model prediction d¯ for the pixel values of a
where α are parameters in the distribution of the in- galaxyimagedependsonthecosmologicalmassdensityψ
trinsic galaxy properties and I denotes galaxy modeling throughtheactionofgravitationallensingonthegalaxy
assumptions (such as the form of the galaxy morphol- morphologyandflux(inanygivenbandpassaslensingis
ogy parameterization). For example, α could describe achromatic). Forsimpleellipticalmodelsofgalaxies,the
the width of the intrinsic (i.e., unlensed) ellipticity dis- ellipticityparameterinω isdegeneratewiththereduced
n
tribution of a given galaxy population. And in our con- shearg ≡γ/(1−κ). However,thestatisticaldistribution
tribution to the GREAT3 challenge described in Man- of all elements of ω should be isotropic across different
delbaum et al. (2014), the model assumptions I include galaxies in the sky (although this is not true for small
modeling all galaxies as elliptical S´ersic profiles. It will fields, e.g., around a cluster), while the ellipticities of
be important to keep I explicit as we consider possible lensed galaxy images are not. We therefore maintain a
model-fitting biases in our framework (Voigt & Bridle conceptualdistinctionbetweenω andψinthemodelfor
n
2010;Zuntzetal.2013;Violaetal.2014;Kacprzaketal. pixelvaluesofagalaxyimaged¯ eventhoughthisdistinc-
2014; Bruderer et al. 2015). tion is not computationally meaningful for the simplest
In Figure 1, the galaxy properties ω also depend on models of galaxy images. See also the discussion follow-
ns
‘scaling parameters’ ξ and, via α , the gravitational ing Equation (15) below.
ns ns
Hierarchical probabilistic inference of cosmic shear 5
The first step in our computational algorithm is to shears at different sky locations and redshifts. This is
draw‘interimsamples’of(ω ,ψ,Π )viaMarkovChain distinct from conventional algorithms that involve cross-
n n,i
Monte Carlo (MCMC) from the likelihood in Equa- correlating galaxy ellipticity estimators at different sky
tion (5) for a single source (or region of sky) identified locations. We instead seek a generative model for the
in all available exposures of a survey. The method of shear correlations. So, the interim sampling of the vari-
source identification for selecting the pixel values d in able shear or lens potential must happen at a computa-
n
Equation (5) is not important as long as our model pre- tional step following the interim sampling of individual
dictions for the observed pixels allow for observationally galaxy and PSF properties.
truncated source profiles or multiple overlapping sources Wecaninfervariableshearmodelswithoutacosmolog-
(as will be needed for blended objects). Allowing for ical prediction of the mass density (which can be com-
such selection effects may also admit interim sampling putationally expensive) by specifying an interim prior
of the combined pixel data from different instruments or for the lens potential ψ for a given source distribution.
s
surveys. The only requirement is that the mathematical specifi-
cationoftheinterimpriorforψ allowsforspatialcorre-
s
PSFmodelhierarchy— NotethatweinferadistinctPSF lations that encapsulate the possible cosmological inter-
realization Πn,i for each galaxy n in each observation pretations. For example, even though the late-time lens
epochi,whicharerelatedacrossgalaxiesinagivenepoch potential is not Gaussian distributed, we might specify
by the common observing conditions Ωi. The PSFs in a multi-variate Gaussian prior on ψ for the purposes of
different epochs are further related by another level of interimsampling. Correlationsinthelenspotentialsam-
hierarchywithacommonPSFmodelwithparametersp. plescouldbecapturedwithacovariancemotivatedfrom
So, we can also directly marginalize the likelihood over cosmology,
the PSF realizations,
nepoch(cid:90)
(cid:89)
Pr(d|{ω },ψ,p)= dΩ Pr(Ω |p)
n i i
(cid:90) (cid:96)d(cid:96)
i=1 (cid:104)ψ(x)ψ(x(cid:48))(cid:105)= C (|x−x(cid:48)|)J ((cid:96)|x−x(cid:48)|), (9)
ngal(cid:90) 2π (cid:96) 0
(cid:89)
× dΠ Pr(Π |Ω )Pr(d |ω ,ψ,Π )
n,i n,i i n,i n n,i
n=1
nepoch(cid:90) (cid:34)ngal (cid:35)
(cid:89) (cid:89)
= dΩiPr(Ωi|p) Pr(dn,i|ωn,ψ,Ωi) (6) where C(cid:96) is an angular power spectrum computed, e.g.,
i=1 n=1 via Equation (1), and J0 is the zeroth order Bessel func-
nepoch tion of the first kind. While Equation (9) is motivated
(cid:89)
= Pr(di|{ωn},ψ,p) (7) fromcosmologicaltheory,wedonotconnectC(cid:96) toacos-
mological model when interim sampling ψ. Instead, we
i=1
useEquation(9)tointroducesomedegreeofspatialcor-
ngal
(cid:89) relationsintheinterimsamplesofψthatcanincreasethe
(cid:54)= Pr(d |ω ,ψ,p) (8)
n n efficiencyoflatersamplingstepsforconstrainingcosmol-
n=1 ogy as we describe in the next section.
Marginalizing the PSF realizations Π over all galax- In practice, we first sample galaxy model parameters
n,i
ies n in a given epoch i still yields a likelihood function without a correction for the applied shear and magnifi-
that is separable for separate objects n in Equation (6), cation. Giventhegalaxymodelparameters,wecanthen
conditioned on the observing conditions Ω . This shows reinterpret the parameters under an assumed lens po-
i
that more information about the observing conditions tential model because we know how lensing affects the
Ω leads to smaller statistical correlations among the model for the galaxy light profile. We therefore ignore
i
marginal likelihoods for different galaxies independent the lensing parameters in the first step of our inference,
from the actual PSF realizations. Under our framework including a model for lensing shears and magnifications
therefore, it is the parameters of the distribution from only when combining inferences from distinct sources as
which PSF realizations are drawn that are most impor- we describe next.
tant to constrain rather than estimation and interpola-
tion of PSF realizations at different image plane loca-
tions.
Lensing mass inference— The intrinsic galaxy properties
ω and the PSF Π can be interim sampled with dis-
n n,i 2.2.2. Hierarchical inference via importance sampling
tinct parameters for each object. We might consider in-
terim sampling uncorrelated shears and magnifications
for every object, but in most cases the shear and the in- It is possible to infer model constraints independently
trinsic ellipticity are strongly degenerate for isolated ob- for each galaxy and then combine these independent in-
jects (the exception being resolved galaxy images with ferences in a hierarchical framework using the technique
multiple morphological components that have different of‘importancesampling’(e.g.Geweke1989;Wraithetal.
responses to the action of an applied shear). To in- 2009; Hogg et al. 2010).
fer a shear or lens potential that varies over the sky, Our goal with the importance sampling algorithm is
we then need to specify the model correlations between to evaluate the likelihood marginalized over individual
6 M. D. Schneider et al.
galaxy intrinsic properties and PSF realizations8, the marginalized likelihood can be approximated by
n(cid:89)gal(cid:90) Pr(dn|α,Ω,ψ)≈ ZKn (cid:88)PPr(rω(ωnk|α|I,ψ))PPrr((ΠΠnk||IΩ)),
Pr(d|α,Ω,ψ)∝ dω Pr(ω |α) nk 0 nk 0
n n k
(15)
n=1
n(cid:89)epoch(cid:90) n(cid:89)gal
× dΠ Pr(Π |Ω ) Pr(d|α,Ω,ψ)= Pr(d |α,Ω,ψ). (16)
n,i n,i i n
i=1 n=1
×Pr(d |ω ,ψ,Π ), (10)
n,i n n,i WhatwehaveachievedwithEquation(15)istheability
tofitgalaxymodelsandPSFstoindividualgalaxies(via
where
MCMC) and then to combine the fit information from
d≡(cid:8){d }nepoch(cid:9)ngal , (11) eachgalaxyintoinferencesonthedistributionsofintrin-
n,i i=1 n=1 sic galaxy properties α and PSF parameters p. Equa-
tion (15) is fast to compute once the K samples for each
and Ω≡{Ω}nepoch. Using the identity, galaxyaregenerated. Forthefinalshearinferencewewill
i=1
marginalizetheparametersα,butwithEquation(15)we
(cid:90) (cid:90) f(x) have already addressed the primary computational bot-
dxp(x)f(x)= dxp(x)g(x) , (12) tleneck in modeling images of large numbers of sources
g(x)
in a field.
Although not part of the preceding derivation, we in-
for arbitrary probability distributions p(x),f(x) and as-
serted the lensing potential ψ into the list of conditional
suming g(x) has nonzero probability mass over the do-
dependencies in the numerator of Equation (15). This is
mainwheref(x)isnonzero,wecanfactorEquation(10)
because given interim posterior samples of galaxy model
as,
parameters, we can always reinterpret those model pa-
rameters under a different assumption for the shear for
ngal(cid:90) nepoch(cid:90) any model where we know how to calculate the action of
(cid:89) (cid:89)
Pr(d|α,Ω,ψ)∝ dω dΠ shear on the model galaxy9.
n n,i
To evaluate Equation (15) we need K independent
n=1 i=1
samples from the interim posteriors for each galaxy.
Pr(ω |α) Pr(Π |Ω )
× n n,i i WhilethiscanbeslowtogenerateviaMCMCwhenK is
Pr(ω |I ) Pr(Π |I )
n 0 n,i 0 required to be large, we have found in simple tests that
×Pr(d |ω ,ψ,Π )Pr(ω |I )Pr(Π |I ). (13) even K =1 may be sufficient when assuming a constant
n,i n n,i n 0 n,i 0
shearoveragivenareaofthesky. WeuseK =10inthe
The final line of Equation (13) defines the ‘interim pos- numerical examples in Section 3.
terior’ from which we draw samples when we analyze
Importance sampling the lens potential— Given interim
each galaxy independently (following Hogg et al. 2010).
samples of the lens potential ψ we can then infer cos-
We refer to Pr(ω |I ) and Pr(Π |I ) as ‘interim priors’. s
n 0 i 0 mological parameter constraints from the sampled lens
The interim priors can be chosen to make computations
potentials for all galaxy samples in all fields using im-
convenient. We have found flat or broad Gaussian dis-
portance sampling as in Equation (15), but with a new
tributions to be sufficient interim prior specifications.
conditionalPDFthatincludesthedependenceofthelens
We use the following algorithm to combine samples
potential cosmological parameters θ and the cosmologi-
from the interim posteriors for each galaxy into a hier-
cal initial conditions ΨIC,
archical inference of the global marginal posterior. For
eachobjectnwegenerateK samples(cid:2)ω ,{Π }nepoch(cid:3),
sampled from the interim posterior givnekn pixneilkdia=t1a for Pr(d |ΨIC,θ,W)∝ 1 (cid:88)N (cid:81)sPr(ψˆs,k|ΨIC,θ,W),
galaxy n in all epochs, n N (cid:81) Pr(ψˆ |I)
k=1 s s,k
(17)
1 where, again, we have implicitly marginalized over the
Pr(ω ,Π |d )=
n n n Z line-of-sightdistributionerrorparametersAs fromEqua-
n
tion (2).
×Pr(d |ω ,Π )Pr(ω |I )Pr(Π |I ), (14)
n n n n 0 n 0 Evaluating the numerator on the right-hand side of
Equation (17) requires first realizing the 3D mass den-
where Z is a normalization that we never need to com-
n sity perturbations at some early time (e.g., by a tra-
pute,andI denotesthesetofassumptionsencodedinto
0 ditional initial conditions generator for cosmological N-
theinterimpriors. OncewehavethisK-elementinterim bodysimulations). TheinitialconditionsΨIC thenmust
sampling for every galaxy n we can build importance-
be evolved under gravity in an expanding universe (e.g.,
sampling approximations to various other marginalized
byrunninganN-bodysimulation). Eachrealizationand
likelihoods. Specifically,fortheintegralinEquation(13),
evolution of ΨIC is a potentially expensive computation
becauseΨICmustcoveratleasttheentirevolumeprobed
8 WemaynotalwayswanttoassumethatthePSFrealizations
foragivengalaxyarestatisticallyindependentacrossobservation 9 Thisisadistinctconceptfromestimatingtheactionofshear
epochs,butitisausefulclarifyingassumptionhere. ontheobservedpixelvaluesofagalaxyasinBA14.
Hierarchical probabilistic inference of cosmic shear 7
by the survey in question (tens of gigaparsecs for future the shear field, apply the realized shear to each galaxy
surveys) and contain enough small-scale mass resolution model and then compare the pixel model with the data
to identify model galaxy locations with respect to the for each galaxy. We consider the computational sepa-
cosmologicalmassdensity(typicallyofordertensofkilo- ration of each of these steps to be a key benefit of the
parsecs). Thus evaluating Equation (17) for one pair of framework we present here.
values of θ and ΨIC could require running an N-body Finally, one should keep in mind that the relationship
simulation covering ∼6 orders of magnitude in dynamic between the lens potential defined over the sky and the
range. Ultimately to infer cosmological parameter con- lensingdistortionsofmeasuredgalaxyimagesaredepen-
straints we would run an MCMC chain sampling in val- dentontheastrometricsolutionforeachexposure. How-
ues of ΨIC and θ, potentially requiring (cid:38) 104 expensive ever,weexpectthistobeanegligiblecontributiontothe
cosmological N-body simulations10. shear inference in most cases. Possible centroid shifts of
Inferring cosmological parameter constraints from the galaxy images should instead be captured by the PSF
procedure just described may sound impractical. How- model.
ever, we are encouraged by noting that,
2.3. Statistical framework summary
• Jasche et al. (2014) demonstrated feasible MCMC In Sections 2.1 and 2.2 we introduced a number of pa-
sampling of 3D density cells of the cosmological rameterswithnontrivialinterdependencies. Herewecol-
mass density by using an analytic perturbative lect all the parameters we have introduced and summa-
model for the evolution of the mass density from rizethestatisticalframeworkbypresentingthefactoriza-
Gaussian initial conditions, tionoftheunmarginalizedjointposteriorforallsampled
model parameters,
• upcoming surveys such as LSST are already con-
sidering in excess of 104 expensive cosmological Pr(θ,{ψ },{Π ,Ω },{ω ,α ,ξ },A,τ,m,a |{d },X)∝
s i i n n n η n
simulations for parameter inferences (Dodelson &
Pr(θ)·Pr({ψ }|W,{s},θ)·Pr(A|a)
Schneider 2013; Taylor et al. 2013), s
(cid:34)ngalnepoch (cid:35)
(cid:89) (cid:89)
• several post-processing methods have shown × Pr(Π |Ω ,I)·Pr(d |ω ,ξ ,ψ ,Π )
n,i i n,i n n s n,i
promise for reducing the computation times
n=1 i=1
in generating ensembles of cosmological simula- (cid:34)ngal (cid:35)
tions(Schneideretal.2011;Angulo&White2010; (cid:89)
× Pr(ω |α ,I)·Pr(α |A,a )·Pr(ξ |m,τ)
Mead & Peacock 2014; Angulo & Hilbert 2015). n n n η n
n=1
The practicality and tradeoffs of these methods remains (cid:34)nepoch (cid:35)
(cid:89)
an area for future research, both with respect to each × Pr(Ωi|p,danc,i) (18)
other and with respect to traditional analyses with two- i=1
point function estimators of the mass density.
where in the first line we collapsed the conditional pa-
Equation (17) also helps to understand the utility in
rameters into,
separating the galaxy population into different samples
s. It is straightforward to derive separate inferences of X ≡[d ,{s},W,a,p,I]. (19)
anc
ΨIC and θ for different sub-samples s of the galaxies to
WecollectthedefinitionsofallvariablesinEquation(18)
test for consistency with respect to unmodeled system-
in Table 1 and display the statistical dependencies in
atic errors or new physical phenomena not captured by
Figure 1. The variables a ,m,τ,A will be defined in
the cosmological model for the mass density evolution. η
Section 3.
One could similarly infer different distributions of the
The final line of Equation (18) is the likelihood func-
intrinsic galaxy properties ω for different sub-samples to
tion for the pixel data d of galaxy n in observation
testforvariablemodelfittingbiasesandunmodeledred- n,i
epoch i, which depends on the intrinsic galaxy proper-
shift evolution in the galaxy population. But, once we
ties ω ,ξ , the lensing potential ψ for source sample
havecomputedtheinferencesofthe2Dlenspotentialfor n n s
s containing galaxy n, and the PSF Π . The preced-
different sub-samples s, the combined inference of cos- n,i
inglinesinEquation(18)describethehierarchicalPDFs
mology in Equation (17) is not much more complicated
for the conditional dependencies in the likelihood func-
than that for an undivided galaxy sample.
tion, including the important factorizations across dis-
To summarize, by placing an interim prior on ψ that
tinct galaxies and observation epochs. In Section 4.1 we
includes spatial correlations over the observed sky area,
discusshowmarginalizationoftheconditionaldependen-
we stand to gain from a similar computational factoriza-
cies in the likelihood couples the inference from individ-
tion of the analysis as that achieved with the model for
ualgalaxiesandepochsasdeterminedbythehierarchical
the intrinsic galaxy properties. Most published cosmic
parameters that are constant across galaxies and epochs
shearanalysesfollowareversealgorithmwheretheshear
in Equation (18) (e.g., parameters that do not have n or
is estimated, turned into two-point correlation estima-
i subscripts). See Appendix F for a description of the
tors and compared with a two-point correlation model.
computational implementation of this framework.
Instead, our framework must start with the two-point
correlation model (for a Gaussian density field), realize 3. PART2: IMPLEMENTATIONFORINFERRING
INTRINSICGALAXYPROPERTIES
10 Thenumber104issimplyaguessfortherequirednumberof 3.1. Model for the conditional distribution of intrinsic
MCMC steps to infer marginal constraints on θ from a converged
galaxy properties
chain.
8 M. D. Schneider et al.
We choose a non-parametric distribution to describe the parameters α (eq. 2.2 of Neal 2000),
the galaxy model parameters both because we have lit-
tle information to guide us on the choice of a paramet- n−1
1 (cid:88)
ric distribution and because we need a flexible distri- α |α ,...,α ∼ δ (α )
n 1 n−1 n−1+A D (cid:96)
bution to minimize any biases in the model parameter
(cid:96)=1
inferences (as mentioned in Section 1). We therefore A
choose a Dirichlet Process (DP) model (e.g. Ferguson + n−1+AG0(·), (21)
1973; Walker et al. 1999; Neal 2000; Mu¨ller & Quintana
2004; Wang & Dunson 2011) for the distribution of in- where n indexes galaxies and δ is the Dirac delta func-
D
trinsicgalaxyproperties,withdistinctparametersα for tion. Given the discreteness property of the DP there
n
each galaxy n. isnonzeroprobabilitythatdrawsof α willberepeated.
n
For a given galaxy n we assume the properties ω Letα∗,...,α∗ betheuniquevaluesamongα ,...,α ,
n 1 K 1 n−1
describing the galaxy image are drawn from a (multi- and let N be the number of repeats of α∗ in the ‘latent
c c
variate) Normal distribution with mean 0 and variance class’c. Thentheconditionalupdatedistributioncanbe
parameters α . Denote this by, ω ∼ N(0,α ), where equivalently written as (eq. 10 of Teh 2010)
n n n
N indicates the Normal distribution. By including the
index n on α , we allow for the possibility that every 1
n α |α ,...,α ∼
galaxy has properties drawn from a distinct generating n 1 n−1 n−1+A
distribution. But, we are accustomed to thinking about (cid:32) K (cid:33)
galaxies as belonging to different types or populations. (cid:88)N δ (α∗)+AG (·) . (22)
So a better model might be to select a small number of c D c 0
c=1
valuesforαthatcharacterizedifferentgalaxytypes. The
parameters ω for galaxy n would then be drawn from a This equation is useful because it shows the clustering
n
zero-meanNormaldistributionwithvarianceparameters properties of the DP and the meaning of the param-
α selected from one of a few possible classes. This is a eter A. The value of αc∗ will be assigned to αn with
description of a mixture of Normal distributions for ω. probability proportional to the number of times α∗ has
c
The DP model for ω can be derived by starting been previously drawn, Nc. Note that when A is large,
with a mixture of Normal distributions where the num- successive values for αn are drawn from the base distri-
ber of mixture components is allowed to be arbitrarily bution G0 with high probability, generating a new αK∗
large (e.g., Wang & Dunson 2011). We can describe this each time. When A is small, αn is assigned to one of
mathematicallywiththe followinghierarchy(Neal2000, the previous values of α∗ with high probability. We will
c
Eqn. 2.1), refer to groups of galaxies {n} that have equal values
of α∗ as a ‘cluster’ or ‘sub-population’, representing po-
k
tentialgalaxytypeclassifications(analogoustothecom-
mon ‘early-’ or ‘late-type’ classifications). Equation (21)
showsthatlargervaluesofAwillleadtomoreclustersor
categories of galaxy types while bringing A to zero will
force all galaxies to be assigned to the same cluster or
ω ∼N(0,α ), α ∼G(α |A), G∼DP(A,G ). typeclassification. Notethataslongastheobservations
n n n n 0
(20) areexchangeable(whichistrueforsourcesinanimage),
Equation (20) indicates that the properties of galaxy n the ordering of values in α is arbitrary.
aredrawnfromaNormaldistributionwithvariancesα , To summarize, we are motivated to use the DP model
n
as previously described. The variances α for the distri- for the following reasons,
n
bution of the properties ω of galaxy n are also modeled
n • The DP is analogous to a mixture model when the
asrandomvariables,assumedtobedrawnfromsomedis-
numberofmixturecomponentsislearnedfromthe
tribution G(α |A) (yet to be specified) with parameter
n data,
A.
A key feature of the DP model is that G(α |A) is a
n • Samples for galaxy n are naturally informed by
discrete distribution in α . If we consider again a mix-
n those for galaxies 1,...,n−1, thereby improving
ture of Normal distributions for ω, then G(α|A) would
the sampling efficiency (for arbitrary ordering of
specify the finite and discrete values that α can take in
observed galaxies),
ordertodescribetheparametersofeachmixturecompo-
nent. ThefinalstatementinEquation(20)saysthatthe • The DP can find unknown clusters in the galaxy
discrete support of the distribution G(α|A) is sampled parameters, teaching us about galaxy formation
from yet another distribution denoted DP(A,G0) with and different prospects for shear inference from
parameter A and ‘base distribution’ G0. The base dis- different data subsets. For example, we show
tributiondescribesthecontinuousrangeofsupportfrom in Section 3.2.2 that knowledge of a galaxy sub-
whichthediscretesupportofG(α|A)canbedrawn. The population that is more round than average can
parameter A is called the DP ‘precision’ or ‘clustering’ improve the shear inference.
parameter, which we describe next.
Forsamplingalgorithmsandtobetterunderstandhow With such an algorithm to cluster the features in the
parameters α for distinct galaxies may take identical galaxy population we are able to simultaneously fit dif-
n
values it is useful to first marginalize the distribution G ferent distributions for each galaxy and exploit the sta-
in Equation (20) to get the conditional update rule for tistical information from a large sample of galaxies. As
Description:Kuijken, K., Rowe, B., Schrabback, T., Semboloni, E., Vafaei,. S., & Velander, M. 2013, MNRAS, 432, 2433 [1]. Hogg, D. W. 2012, Hogg's Research Blog [4.2]. Hogg, D. W., Myers, A. D., & Bovy, J. 2010, ApJ, 725, 2166. [2.2.2, 2.2.2]. Huterer, D., Takada, M., Bernstein, G., & Jain, B. 2006,. MNRAS, 36