Draft version February 5, 2008
Preprint typeset using LaTeX style emulateapj v. 11/26/04
arXiv:astro-ph/0508461v2 11 Jan 2006

A NESTED SAMPLING ALGORITHM FOR COSMOLOGICAL MODEL SELECTION

Pia Mukherjee, David Parkinson and Andrew R. Liddle
Astronomy Centre, University of Sussex, Brighton, BN1 9QH, United Kingdom

ABSTRACT

The abundance of new cosmological data becoming available means that a wider range of cosmological models are testable than ever before. However, an important distinction must be made between parameter fitting and model selection. While parameter fitting simply determines how well a model fits the data, model selection statistics, such as the Bayesian evidence, are now necessary to choose between these different models, and in particular to assess the need for new parameters. We implement a new evidence algorithm known as nested sampling, which combines accuracy, generality of application and computational feasibility, and apply it to some cosmological data sets and models. We find that a five-parameter model with Harrison-Zel'dovich initial spectrum is currently preferred.

Subject headings: cosmology: theory

1. INTRODUCTION

The higher-level inference problem of allowing the data to decide the set of parameters to be used in fitting is known as model selection. In the Bayesian framework, the key model selection statistic is the Bayesian evidence (Jeffreys 1961; MacKay 2003), being the average likelihood of a model over its prior parameter space. The evidence can be used to assign probabilities to models and to robustly establish whether the data require additional parameters.

While the use of Bayesian methods is common practice in cosmological parameter estimation, its natural extension to model selection has lagged behind. Work in this area has been hampered by difficulties in calculating the evidence to high enough accuracy to distinguish between the models of interest. The issue of model selection has been raised in some recent papers (Marshall, Hobson & Slosar 2003; Niarchou, Jaffe & Pogosian 2004; Saini, Weller & Bridle 2004; Bassett, Corasaniti & Kunz 2004), and information criterion based approximate methods were introduced in Liddle (2004). Semi-analytic approximations such as the Laplace approximation, which works well only for Gaussian likelihoods, and the Savage-Dickey density ratio, which works for more general likelihood functions but requires the models being compared to be nested, are methods that were recently exploited by Trotta (2005). A more general and accurate numerical method is thermodynamic integration, but Beltran et al. (2005) found that in realistic applications around 10^7 likelihood evaluations were needed per model to obtain good accuracy, making it a supercomputer-class problem in cosmology, where likelihood evaluations are computationally costly.

In this paper we present a new algorithm which, for the first time, combines accuracy, general applicability, and computational feasibility. It is based on the method of nested sampling, proposed by Skilling (2004), in which the multi-dimensional integral of the likelihood of the data over parameter space is performed using Monte Carlo sampling, working through the prior volume to the regions of high likelihood.

2. BAYESIAN INFERENCE

Using Bayes' theorem, the probability that a model (hypothesis H) is true in light of observed data D is given by

    P(H|D) = P(D|H) P(H) / P(D).    (1)

It shows how our prior knowledge P(H) is modified in the presence of data.

The posterior probability of the parameters theta of a model in light of the data is given by

    P(theta|D,H) = P(D|theta,H) P(theta|H) / P(D|H),    (2)

where P(D|theta,H) is the likelihood of the data given the model and its parameters, and P(theta|H) is the prior on the parameters. This is the relevant quantity for parameter estimation within a model, for which the denominator P(D|H) is of no consequence. P(D|H) is, however, the evidence for the model H, the key quantity of interest for the purpose of model selection (Jeffreys 1961; MacKay 2003; Gregory 2005). Normalizing the posterior P(theta|D,H) marginalized over theta to unity, it is given by

    E = P(D|H) = ∫ dtheta P(D|theta,H) P(theta|H),    (3)

the prior also being normalized to unity. The evidence for a given model is thus the normalizing constant that sets the area under the posterior P(theta|D,H) to unity.

In the now-standard Markov Chain Monte Carlo method for tracing the posterior (Gilks et al. 1996; Lewis & Bridle 2002), the posterior is reflected in the binned number density of accumulated samples. The evidence found in this way would not generally be accurate, as the algorithm samples the peaks of the probability distribution well but under-samples the tails, which might occupy a large volume of the prior.

When comparing two different models using Bayes' theorem, the ratio of posterior probabilities of the two models is the ratio of their evidences (called the Bayes factor) multiplied by the ratio of any prior probabilities that we may wish to assign to these models (Eq. 1). It can be seen from Eq. (3) that while more complex models will generally result in better fits to the data, the evidence, being proportional to the volume occupied by the posterior relative to that occupied by the prior, automatically implements Occam's razor. It favours simpler models with greater predictive power provided they give a good fit to the data, quantifying the tension between model simplicity and the ability to fit the data in the Bayesian sense. Jeffreys (1961) provides a useful guide to what constitutes a significant difference between two models: 1 < Delta ln E < 2.5 is substantial, 2.5 < Delta ln E < 5 is strong, and Delta ln E > 5 is decisive. For reference, a Delta ln E of 2.5 corresponds to odds of about 1 in 13.
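As a concrete illustration of the Bayes factor and the Jeffreys scale (an example we add here, not part of the original paper; the function name and the equal-prior-probability assumption are ours), a few lines of Python convert a set of log-evidences into posterior model probabilities via Eq. (1):

```python
import math

def model_probabilities(ln_evidences, priors=None):
    """Posterior probabilities of competing models from their log-evidences.

    Assumes equal model prior probabilities unless `priors` is supplied,
    following Eq. (1): posterior odds = Bayes factor x prior odds.
    """
    if priors is None:
        priors = [1.0] * len(ln_evidences)
    # Subtract the maximum log-evidence before exponentiating, for
    # numerical stability; the constant cancels in the normalization.
    m = max(ln_evidences)
    weights = [p * math.exp(lnE - m) for lnE, p in zip(ln_evidences, priors)]
    total = sum(weights)
    return [w / total for w in weights]

# Two models separated by Delta ln E = 2.5 ("substantial" on the
# Jeffreys scale):
probs = model_probabilities([0.0, -2.5])  # probs[0] is about 0.92
```

For Delta ln E = 2.5 the favoured model carries a posterior probability of about 0.92, in line with the odds quoted above.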
While for parameter fitting the priors become irrelevant once the data are good enough, for model selection some dependence on the prior ranges always remains, however good the data. The dependence on prior parameter ranges is a part of Bayesian reasoning, and priors should be chosen to reflect our state of knowledge about the parameters before the data came along. The Bayesian evidence is unbiased, as opposed to approximations such as the information criteria.

Perhaps the most important application of model selection is in assessing the need for a new parameter, describing some new physical effect proposed to influence the data. Frequentist-style approaches are commonplace, where one accepts the parameter on the basis of a better fit, corresponding to an improvement in Delta(ln L) by some chosen threshold (leading to phrases such as 'two-sigma detection'). Such approaches are non-Bayesian: the evidence shows that the size of the threshold depends both on the properties of the dataset and on the priors, and in fact the more powerful the dataset the higher the threshold that must be set (Trotta 2005). Further, as the Bayesian evidence provides a rank-ordered list of models, the need to choose a threshold is avoided (though one must still decide how large a difference in evidence is needed for a robust conclusion).

The main purpose of this paper is to present an algorithm for evidence computation with widespread applications. However, as a specific application we examine the need for extra parameters against the simplest viable cosmological model, a LambdaCDM model with Harrison-Zel'dovich initial spectrum, whose five parameters are the baryon density Omega_b h^2, cold dark matter density Omega_cdm h^2, the Hubble parameter H_0 = 100 h km s^-1 Mpc^-1 (or the ratio Theta of the approximate sound horizon at decoupling to its angular diameter distance), the optical depth tau, and the amplitude A_s of primordial perturbations. We study the case for two additional parameters, the scalar spectral index and the dark energy equation of state (assumed constant).

3. NESTED SAMPLING

3.1. Basic Method

Nested sampling (Skilling 2004) is a scheme to trace the variation of the likelihood function with prior mass, with the effects of topology, dimensionality and everything else implicitly built into it. It breaks up the prior volume into a large number of 'equal mass' points and orders them by likelihood. Rewriting Eq. (3) in the notation of Skilling (2004), with X as the fraction of total prior mass such that dX = P(theta|H) dtheta, and the likelihood P(D|theta,H) = L(X), the equation for the evidence becomes

    E = ∫_0^1 L dX.    (4)

Thus the problem of calculating the evidence has become a one-dimensional integral, in which the integrand is positive and decreasing. Suppose we can evaluate the likelihood as L_j = L(X_j), where the X_j are a sequence of decreasing values, such that

    0 < X_m < ... < X_2 < X_1 < 1,    (5)

as shown schematically in Figure 1. Then the evidence can be estimated by any numerical method, for example the trapezoid rule

    E = sum_{j=1}^{m} E_j,    E_j = (L_j / 2) (X_{j-1} - X_{j+1}).    (6)

Fig. 1. The nested sampling algorithm integrates the likelihood over the prior volume by peeling away thin iso-surfaces of equal likelihood.

The nested sampling algorithm achieves the above summation in the following way:

1. Sample N points randomly from within the prior, and evaluate their likelihoods. Initially we will have the full prior range available, i.e. (0, X_0 = 1).

2. Select the point with the lowest likelihood (L_j). The prior volume corresponding to this point, X_j, can be estimated probabilistically. The average volume decrease is given as X_j / X_{j-1} = t, where t is the expectation value of the largest of N random numbers from Uniform(0,1), which is N/(N+1).

3. Increment the evidence by E_j = L_j (X_{j-1} - X_{j+1}) / 2.

4. Discard the lowest-likelihood point and replace it with a new point, which is uniformly distributed within the remaining prior volume (0, X_j). The new point must satisfy the hard constraint on likelihood of L > L_j.

5. Repeat steps 2-4, until the evidence has been estimated to some desired accuracy.
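The steps above can be sketched in a self-contained way. The following Python example is our own illustration, not the authors' released code: it runs nested sampling on a toy two-dimensional Gaussian likelihood with a uniform prior on the unit square, for which the evidence is analytically very close to 2 pi sigma^2. For simplicity it replaces each discarded point with a short constrained random walk started from a randomly chosen live point (a common simplification of step 4; the paper instead uses the ellipsoidal scheme of Sec. 3.2), and it uses the rectangle rule with the deterministic shrinkage X_j = exp(-j/N) rather than the trapezoid rule of Eq. (6):

```python
import math
import random

random.seed(1)

def log_likelihood(theta, mu=0.5, sigma=0.1):
    # Unnormalized 2D Gaussian; its evidence over the unit square is
    # very nearly 2*pi*sigma^2, i.e. ln E ~ -2.77.
    return -sum((t - mu) ** 2 for t in theta) / (2 * sigma ** 2)

def new_point(live, logl_min, steps=40):
    # Step 4: a new prior point subject to the hard constraint
    # log L > logl_min, via a constrained random walk started from a
    # randomly chosen live point, with a proposal scale set by the
    # current spread of the live points.
    pts = [p for _, p in live]
    scales = []
    for d in range(len(pts[0])):
        vals = [p[d] for p in pts]
        scales.append(max(1e-6, 0.5 * (max(vals) - min(vals))))
    theta = list(random.choice(pts))
    for _ in range(steps):
        trial = [t + random.gauss(0.0, s) for t, s in zip(theta, scales)]
        if all(0.0 <= t <= 1.0 for t in trial) and log_likelihood(trial) > logl_min:
            theta = trial
    return (log_likelihood(theta), theta)

def nested_sampling(n_live=100, max_iter=3000, tol=1e-3):
    # Step 1: N points sampled from the (uniform) prior.
    live = sorted(((log_likelihood(p), p) for p in
                   ([random.random(), random.random()] for _ in range(n_live))),
                  key=lambda lp: lp[0])
    evidence, x_prev, x_j = 0.0, 1.0, 1.0
    for j in range(1, max_iter + 1):
        logl_min = live[0][0]               # step 2: lowest-likelihood point
        x_j = math.exp(-j / n_live)         # expected prior-mass shrinkage
        evidence += math.exp(logl_min) * (x_prev - x_j)   # step 3
        x_prev = x_j
        # Step 5: stop once the live points cannot change E appreciably.
        if math.exp(live[-1][0]) * x_j < tol * evidence:
            break
        live[0] = new_point(live, logl_min)               # step 4
        live.sort(key=lambda lp: lp[0])
    # Final contribution <L> * X_j of the remaining live points.
    evidence += x_j * sum(math.exp(l) for l, _ in live) / n_live
    return math.log(evidence)

print(nested_sampling())  # close to the analytic value ln E ~ -2.77
```

With N = 100 live points the scatter in ln E is roughly sqrt(H/N) ~ 0.2 here, consistent with the accuracy discussion below.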
The method thus works its way up the likelihood surface, through nested surfaces of equal likelihood. After j steps the prior mass remaining shrinks to X_j ~ (N/(N+1))^j. The process is terminated when some stopping criterion is satisfied, and a final amount of <L> x X_j due to the N-1 remaining sample points is added to the thus far accumulated evidence.

Thus a multi-dimensional integral is performed using Monte Carlo sampling, imposing a hard constraint in likelihood on samples that are uniformly distributed in the prior, implying a regularized probabilistic progression through the prior volume. Besides implementing and testing this scheme in the cosmological context, our main contribution lies in developing a general strategy to sample new points efficiently.

3.2. Details

The prior space to sample from reduces by a constant factor of N/(N+1) every time the lowest-likelihood point is discarded. The most challenging task in implementing the algorithm is to sample uniformly from the remaining prior volume, without creating too large an overhead in likelihood evaluations even when the remaining volume of prior space may be very small. The new point must be uncorrelated to the existing points, but we can still use the set of existing points as a guide. We find the covariance of the live points, rotate our coordinates to the principal axes, and create an ellipsoid which just touches the maximum coordinate values of the existing points. To allow for the iso-likelihood contours not being exactly elliptical, these limits are expanded by a constant enlargement factor, aiming to include the full volume with likelihood exceeding the current limit (if this is not done, new points will be biased towards the centre, thus overestimating the evidence). New points are then selected uniformly within the expanded ellipse until one has a likelihood exceeding the old minimum, which then replaces the discarded lowest-likelihood point.

The two algorithm parameters to be chosen are the number of points N and the enlargement factor of the ellipsoid. Figure 2 shows evidence versus N, for a flat Harrison-Zel'dovich model with a cosmological constant. The mean evidence values and standard deviations obtained from 4 repetitions are shown, for two different values of the enlargement factor. When the enlargement factor is large enough (here 1.5), the evidence obtained with 100 sample points agrees with that obtained with 500, while when the enlargement factor is not large enough, the evidences obtained with small N are systematically biased high. Similar tests done on multi-dimensional Gaussian likelihood functions, for which the expected evidence can be found analytically, indicated the same. Based on such tests we choose to work with N of 300, and enlargement factors of 1.5 for the 5D model (this corresponds to a 50% increase in the range of each parameter), 1.7 for 6D and 1.8 for 7D models. These choices are conservative (smaller values would reduce the computing time), and were made in order to ensure evidences that are accurate and free of systematics. In our cosmological applications we have also computed evidences with larger enlargement factors and found that they remained unchanged. The typical acceptance rate in finding a new point during the course of the algorithm was found to be roughly constant at ~20-25% for an enlargement factor of 1.5, and lower for larger enlargement factors, after an initial period of almost 100% acceptance, as expected.

Fig. 2. Evidence against the number of sample points N for a 6D Gaussian likelihood function as a test case. The horizontal lines show the actual analytical value. Points are displaced slightly horizontally for visual clarity.

Figure 3a shows the accumulated evidence, the evidence contributed by the remaining points at each stage, and their sum, against prior volume remaining, again for a flat Harrison-Zel'dovich model with a cosmological constant. The X at which the calculation can be terminated will depend on the details of the problem (e.g. dimensionality, priors, etc.). We define a parameter tol as the maximum possible fractional amount that the remaining points could increase the evidence by: tol = (L_max)_j X_j / E, where L_max is the maximum likelihood of the current set of sample points. Figure 3b zooms into a late part of the solid curve, now plotting it against the value of the parameter tol. The calculation need only be carried out until the standard error on the mean evidence, computed for a certain number of repetitions, drops below the desired accuracy. An uncertainty in ln E of 0.1 would be the highest conceivable accuracy one might wish, and with 8 repetitions this happens when ln(tol) ~ a few. This takes us to quite small X, of order 10^-6 - 10^-7 in our actual cosmological simulations.

Fig. 3. (a) The accumulated evidence (dashed curve), the evidence contributed by the remaining points at each stage (dot-dashed curve), and their sum (solid curve), shown against prior volume remaining for a 6D Gaussian likelihood function. (b) A later part of the solid curve shown against log(tol) (see text).

TABLE 1
Parameter ranges and evidences for various cosmological models. Other parameter ranges are given in text.

    Model             LCDM+HZ      LCDM+n_s     LCDM+n_s (wide prior)   HZ+w          w+n_s
    n_s               1            0.8-1.2      0.6-1.4                 1             0.8-1.2
    w                 -1           -1           -1                      -1 to -1/3    -1 to -1/3
    e.f.              1.5          1.7          1.7                     1.7           1.8
    N_like (x10^4)    8.4          17.4         16.7                    10.6          18.0
    ln E              0.00±0.08    -0.58±0.09   -1.16±0.08              -0.45±0.08    -1.52±0.08
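The ellipsoidal sampling strategy described in Sec. 3.2 can be sketched in a few lines (our illustration using NumPy, not the authors' code; function names are ours). The ellipsoid is built from the covariance of the live points, with semi-axes set by the extreme coordinates along the principal axes and then enlarged, and draws are made uniformly within it:

```python
import numpy as np

rng = np.random.default_rng(0)

def bounding_ellipsoid(points, enlarge=1.5):
    """Ellipsoid (centre, axes) built as in Sec. 3.2: principal axes from
    the sample covariance, limits from the extreme coordinates of the
    points along each axis, then expanded by the enlargement factor."""
    centre = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    # Coordinates of the points along the principal axes.
    proj = (points - centre) @ eigvec
    half_width = np.abs(proj).max(axis=0) * enlarge
    # Columns of `axes` map the unit ball onto the ellipsoid.
    axes = eigvec * half_width
    return centre, axes

def sample_ellipsoid(centre, axes, rng):
    """Uniform draw from the ellipsoid {centre + axes @ z : |z| <= 1}."""
    d = len(centre)
    z = rng.normal(size=d)
    z /= np.linalg.norm(z)            # random direction on the sphere
    z *= rng.uniform() ** (1.0 / d)   # radius giving uniformity in the ball
    return centre + axes @ z

# Example: live points clustered in a small region of a 2D unit prior.
live = 0.5 + 0.02 * rng.normal(size=(300, 2))
centre, axes = bounding_ellipsoid(live)
draws = np.array([sample_ellipsoid(centre, axes, rng) for _ in range(1000)])
```

In the full algorithm, draws from the ellipsoid are made until one exceeds the current likelihood limit; the ~20-25% acceptance rate quoted in Sec. 3.2 refers to that rejection loop.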
4. RESULTS

We have calculated the evidences of four different cosmological models: 1) a flat, Harrison-Zel'dovich model with a cosmological constant (LCDM+HZ); 2) the same as 1, except allowing the tilt of the primordial perturbation spectrum n_s to vary (LCDM+n_s); 3) the same as 1, except allowing the equation of state of the dark energy to take alternative values to w = -1 (w+HZ); and finally 4) allowing both n_s and w to vary (w+n_s). The prior ranges for the other parameters were fixed at 0.018 <= Omega_b h^2 <= 0.032, 0.04 <= Omega_cdm h^2 <= 0.16, 0.98 <= Theta <= 1.1, 0 <= tau <= 0.5, and 2.6 <= ln(A_s x 10^10) <= 4.2.

The data sets we use are CMB TT and TE anisotropy power spectrum data from the WMAP experiment (Bennett et al. 2003; Spergel et al. 2003; Kogut et al. 2003), together with higher-l CMB temperature power spectrum data from VSA (Dickinson et al. 2004), CBI (Pearson et al. 2003) and ACBAR (Kuo et al. 2004), matter power spectrum data from SDSS (Tegmark et al. 2004) and 2dFGRS (Percival et al. 2001), and supernovae apparent magnitude-redshift data from Riess et al. (2004).

Results are shown in Table 1. For the spectral tilt, evidences have been found for two different prior ranges, as an additional test of the method. For a prior range twice the size of the original in n_s, the evidence is expected to change by -ln 2 at most, and that difference is recovered. The first two rows show the priors on the additional parameters of the model, or their constant values if they were fixed. The third row shows the enlargement factor (e.f.) we used for the model. The fourth row shows the total number of likelihood evaluations needed to compute the mean ln evidence to an accuracy ~0.1, and the fifth row shows the mean ln E and the standard error in that mean computed from 8 repetitions of the calculation, normalized to the LCDM+HZ evidence.

The LCDM+HZ model has the highest evidence, and as such is the preferred fit to the data. Hence we do not find any indication of a need to introduce parameters beyond the base set of 5, in agreement with the conclusions of Liddle (2004) and Trotta (2005). However, the difference between the ln E of the higher-dimensional models and the base model is not large enough to significantly exclude any of those models at present.

5. CONCLUSIONS

We introduce the nested sampling algorithm for the computation of Bayesian evidences for cosmological model selection. We find that this new algorithm uniquely combines accuracy, general applicability and computational feasibility. It is able to attain an accuracy (standard error in the mean ln evidence) of 0.1 in O(10^5) likelihood evaluations. It is therefore much more efficient than thermodynamic integration, which is the only other method that shares the general applicability of nested sampling. Nested sampling also leads to a good estimate of the posterior probability density of the parameters of the model for free, which we will discuss in a forthcoming paper. We also plan to make a public release of the code in the near future.

Using nested sampling we have computed the evidence for the simplest cosmological model, with a base set of 5 parameters, which provides a good fit to current cosmological data. We have computed the evidence of models with additional parameters: the scalar spectral tilt, a constant dark energy equation of state parameter, and both of these together. We find that current data offer no indication of a need to add extra parameters to the base model, which has the highest evidence amongst the models considered.

The authors were supported by PPARC. We thank Martin Kunz, Sam Leach, Peter Thomas and especially John Skilling for helpful advice and comments. The analysis was performed on the UK national cosmology supercomputer (COSMOS) in Cambridge.

REFERENCES

Beltran, M., Garcia-Bellido, J., Lesgourgues, J., Liddle, A. R., & Slosar, A. 2005, Phys. Rev. D, 71, 063532
Bennett, C. L., et al. 2003, ApJS, 148, 1 [WMAP Collaboration]
Dickinson, C., et al. 2004, MNRAS, 353, 732 [VSA Collaboration]
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (eds.) 1996, Markov Chain Monte Carlo in Practice, Chapman & Hall
Gregory, P. 2005, Bayesian Logical Data Analysis for the Physical Sciences, Cambridge University Press
Jeffreys, H. 1961, Theory of Probability, Oxford University Press
Kogut, A., et al. 2003, ApJS, 148, 161 [WMAP Collaboration]
Kuo, C. J., et al. 2004, ApJ, 600, 32 [ACBAR Collaboration]
Lewis, A., & Bridle, S. 2002, Phys. Rev. D, 66, 103511
Liddle, A. R. 2004, MNRAS, 351, L49
MacKay, D. J. C. 2003, Information Theory, Inference and Learning Algorithms, Cambridge University Press
Marshall, P. J., Hobson, M. P., & Slosar, A. 2003, MNRAS, 346, 489
Niarchou, N., Jaffe, A. H., & Pogosian, L. 2004, Phys. Rev. D, 69, 063515
Pearson, T. J., et al. 2003, ApJ, 591, 556 [CBI Collaboration]
Percival, W. J., et al. 2001, MNRAS, 327, 1297; 2002, MNRAS, 337, 1068 [2dFGRS Collaboration]
Riess, A. G., et al. 2004, ApJ, 607, 665 [Supernova Search Team Collaboration]
Saini, T. D., Weller, J., & Bridle, S. L. 2004, MNRAS, 348, 603
Skilling, J. 2004, 'Nested Sampling for General Bayesian Computation', unpublished, available from: http://www.inference.phy.cam.ac.uk/bayesys/
Spergel, D. N., et al. 2003, ApJS, 148, 175 [WMAP Collaboration]
Tegmark, M., et al. 2004, ApJ, 606, 702 [SDSS Collaboration]
Trotta, R. 2005, astro-ph/0504022
