StatisticalScience 2011,Vol.26,No.3,352–368 DOI:10.1214/11-STS364 (cid:13)c InstituteofMathematicalStatistics,2011 A Problem in Particle Physics and Its Bayesian Analysis Joshua Landon, Frank X. Lee and Nozer D. Singpurwalla 2 Abstract. There is a class of statistical problems that arises in several 1 0 contexts, the Lattice QCD problem of particle physics being one that 2 has attracted the most attention. In essence, the problem boils down to n the estimation of an infinite number of parameters from a finite number a of equations, each equation being an infinite sum of exponential func- J tions. By introducing a latent parameter into the QCD system, we are 5 able to identify a pattern which tantamounts to reducing the system to ] a telescopic series. A statistical model is then endowed on the series, and E inference about the unknown parameters done via a Bayesian approach. M A computationally intensive Markov Chain Monte Carlo (MCMC) algo- . rithm is invoked to implement the approach. The algorithm shares some t a parallels with that used in the particle Kalman filter. The approach is t s validated against simulated as well as data generated by a physics code [ pertaining to the quark masses of protons. The value of our approach 1 is that we are now able to answer questions that could not be readily v answered using some standard approaches in particle physics. 1 4 The structure of the Lattice QCD equations is not unique to physics. 1 Such architectures also appear in mathematical biology, nuclear magnetic 1 imaging, network analysis, ultracentrifuge, and a host of other relaxation . 1 and time decay phenomena. Thus, the methodology of this paper should 0 have an appeal that transcends the Lattice QCD scenario which moti- 2 1 vated us. : Thepurposeof this paperis twofold. Oneis todraw attention toa class v i ofproblemsinstatisticalestimationthathasabroadappealinscienceand X engineering. The second is to outline some essentials of particle physics r a thatgivebirthtothekindofproblemsconsideredhere.Itisbecauseofthe latter that the first few sections of this paper are devoted to an overview of particle physics, with the hope that more statisticians will be inspired to work in one of the most fundamental areas of scientific inquiry. Keywordsandphrases: Exponentialpeeling,MarkovchainMonteCarlo, mathematical biology, quarks, reliability, simulation, telescopic series. Joshua Landon is Assistant Professor, Department of University, Washington, District of Columbia 20052, Statistics, George Washington University, Washington, USA e-mail: [email protected]. District of Columbia 20052, USA e-mail: [email protected]. Frank X. Lee is Professor, This is an electronic reprint of the original article Department of Physics, George Washington University, published by the Institute of Mathematical Statistics in Washington, District of Columbia 20052, USA e-mail: Statistical Science, 2011, Vol. 26, No. 3, 352–368. This [email protected]. Nozer D. Singpurwalla is Professor, reprint differs from the original in pagination and Department of Statistics, George Washington typographic detail. 1 2 J. LANDON,F. X.LEE ANDN.D. SINGPURWALLA 1. INTRODUCTION AND OVERVIEW Section 2 gives an overview of some essentials of particlephysics,andtheensuingLatticeQCDequa- Lattice Quantum Chromodynamics, or Lattice tions.Thissection, writtenby anonphysicist(NDS) QCD,isanactively researchedtopicinparticlephy- but reviewed by a physicist (FXL), has been devel- sics. Many investigators in this field have received oped by fusing material from a variety of sources, the Physics Nobel Prize, the 2004 prize going to some notable ones being Pagels (1982), Dzierba, Gross, Politzer and Wilczek, developers of the no- Meyer and Swanson (2000), Yam (1993), Riordan tion of “asymptotic freedom” that characterizes and Zajc (2006) and Frank Wilczek’s (2005) Nobel QCD. Underlying the Lattice QCD equations are lecture.Interjectedthroughoutthissectionareafew issues of parameter estimation that have proved to comments of historical interest; their purpose is to bechallenging. Essentially, oneneedsto estimate an inform a nonphysicist reader about the individuals infinite number of parameters from a finite number who have contributed to the building of a magnif- of equations, each equation being an infinite sum of icent edifice. Section 2 concludes with a graphical exponential functions. display of the structure of matter via a template The approach proposed here is Bayesian; it is dri- that is familiar to statisticians, in particular, those ven by a computationally intensive Markov Chain working in network theory and in reliability. MonteCarlo(MCMC)implementation. However,to Section 3 pertains to an anatomy of the Lattice invoke this approach, we need to introduce a latent QCDequationsandtheresultingmathematical pat- parameter and then explore the “anatomy” of the tern that it spawns. It is not necessary to read Sec- QCD equations. This reveals a pattern, which when tion 2 (save perhaps for an inspection of Figure 5) harnessed with some reasonable statistical assump- in order to read Section 3, which is wherethis paper tions provided a pathway to a solution. Theinferen- really begins; indeed,Section 2 could have been del- ces provided by our approach were successfully vali- egated toan Appendix.Section 3is afoundationfor datedagainstsimulatedaswellasrealdata.However, the rest of the paper. It is here that the inferential therealvalueofourapproachisthatitisabletoans- problem is introduced along with its accompanying werquestionsthatcouldnotbeansweredusingsome notation and terminology. Section 3.1 gives a broad of the conventional approaches of particle physics. overviewoftheseveralotherscenariosinscienceand The approach can therefore be seen as an addition engineering where the Lattice QCD type equations to the lattice field theorists’ data analysis tool kit. also arise. Of particular note are the several exam- The structure of the Lattice QCD equations is ples in mathematical biology wherein the QCD like not as specialized as one is inclined to suppose. In- equations are often discussed. deed,suchequationsalsoappearinothercontextsof Section4pertainstothestatisticalmodelthatthe engineering, physics, nuclear magnetic imaging and material of Section 3 creates, and an outline of the mathematical biology where they go under the la- MCMC approach that is used to estimate the para- bel of “exponential peeling;” see Section 3.1. Our meters of the model. These are the parameters that focus on the physics scenario is due to the fact that areof interesttophysicistsandother scientists. Sec- this is how we got exposed to the general problem tion 5 pertains to validation against simulated and addressed here. actual data and proof of principles. Section 6 per- This paper is directed toward both statisticians tainstosomesuggestionsforextendingtheworkdo- and physicists, and could serve as an example of nehere,andstrategiesforovercomingsomeoftheen- the interplay between the two disciplines. The for- countereddifficulties.Section7concludesthepaper. mer may gain an added appreciation of problems in Since the Lattice QCD equations can be seen as modernphysics thatcan beaddressedviastatistical a prototype for similar equations that arise in other methods. In the sequel, they may also get to know scientific endeavors, this paper also serves as an in- more about particle physics and the beautiful theo- vitation to other statisticians to develop approaches ries about it that Mother Nature has revealed. It is, for solving such equations using methods more so- with the above in mind,that Section 2 is devoted to phisticated and/or alternate to the one we have en- an overview of aspects of particle physics, its asso- tertained. ciated terminology and the awe inspiring discover- ies aboutit.Reciprocally, thephysicists may benefit 2. ESSENTIALS OF PARTICLE PHYSICS by exposureto some modern statistical technologies The smallest quantity of anything we can see or thatcan bebroughtto bearforaddressingproblems feel is a molecule, and all matter is made up of mo- that may have caused them some consternation. lecules,whichinturnaremadeupofatoms.Molecu- 3 PARTICLEPHYSICS ton. The pions are said to be carriers (or media- tors) of the strong force (or the nuclear force), and thephotonsarecarriersoftheelectromagnetic force. Physicists look at thenuclear glues as forcecarrying particles, and thus collectively regard the electrons, the neutrons, the photons, the pions and the pro- tons as subatomic particles. Figure 2 displays the structure of matter as understood around the 1946 timeframe.ThedottedlinesofFigure2indicatethe glued members. In 1911, when Rutherford announced the struc- ture of the atom, the existence of electrons and pro- tonswasknown.Theneutron,asamajorconstituent of the nucleus, was discovered in 1932 by Chadwick, and the pion was discovered in 1946. But these dis- Fig. 1. Architecture of a carbon atom. coveries were just the tip of the iceberg. Many more subatomic particles have subsequently been discov- les and atoms are called particles, and the physics ered.Collectively, these subatomicparticles arenow that describes the interactions between the parti- called hadrons. Physicists speculate that there exist cles is known as particle physics; see, for example, an infinite number of such hadrons. This discovery Griffiths (1987). ofhadronswas madepossiblebyaccelerators, which An atom consists of electrons, which carry a neg- are essentially microscopes for matter. ative charge, and the electrons are centered around The invention of the accelerators opened up the anucleus thatismadeupofprotons thatcarryapo- subnuclearworldwiththeexperimentaldiscovery of sitive charge, andneutrons thatcarry nocharge. Fi- thousands of new particles. The question thus arose gure 1 illustrates the architecture of a carbon atom as to what the hadrons could be saying about the whichhassixelectrons,sixprotonsandsixneutrons; ultimate structure of matter. it is denoted 12C. 6 2.1 The Quark Structure of Matter The protons and the neutrons are held together within thenucleus by a nuclear glue called thepion. The current view is that hadrons are composite Similarly, the protons and the electrons are held to- objects madeout of morefundamentalparticles cal- gether within the atom by a glue called the pho- led quarks, and no one has ever seen a quark! This Fig. 2. The structure of matter (circa 1946). 4 J. LANDON,F. X.LEE ANDN.D. SINGPURWALLA Fig. 3. Illustration of a quark orbit. point of view came about in the early 1960s when The essence of Gell-Mann’s idea is that hadrons are Murray Gell-Mann discovered that the hadrons or- bound states of quarks, just like how the atoms ganized themselves into classes (or families) based are bound states of electrons, neutrons and pro- onamathematicalsymmetry.Aneasywaytounder- tons.Furthermore,Gell-Mann postulatedthatthere stand why this organizational principle worked is to ought to exist a force carrying particle, called the assumethatthehadronsaremadeupofquarks,only gluon, that holds the quarks together. The gluon is three of which were needed to build the hadrons. said to be the carrier of the strong force. Figure 4 These quarks were named the up quark, the down illustrates the quark structure of a hadron. quark and the strange quark. For example, a proton The quark model was purely a theoretical cons- has two up quarks and one down quark, whereas truct.ItsvaliditywasaffirmedwhenGell-Mannused a neutron has two down quarks and one up quark. it to postulate in 1962 the existence of a particle In general, every hadron is made up of quarks that neverseenbefore.Thiswasascientificbreakthrough orbit around each other in a specific configuration, of the highest order! It showed that discoveries in each configuration resulting in a hadron. Figure 3 is physics can come from mathematical patterns—not an illustration of a quark orbit. justthelaboratory.Forunravelingthemathematical Since there could be several orbit configurations, symmetries of the hadron, Gell-Mann received the there ought to be an infinite number of hadrons. 1969 Nobel Prize in Physics. Fig. 4. The quark structure of hadrons. 5 PARTICLEPHYSICS Fig. 5. Matter as a coherent system. Figure 5 gives a pictorial representation of the is because solving the QCD equation (which is just quark structure of matter using a template that is one line) by analytical methods is difficult. Thecur- familiar to statisticians. It represents an atom as rent approach is to solve the QCD equation numer- a coherent (or logical) system with quarks as the ically, by discretizing it over a space–time lattice. basic building blocks of the system. The logic sym- Lattice QCD refers to the representation of space– bols of “and” and “or” are represented by and re- time as a scaffold in four dimensions wherein the spectively. The neutrons and the protons can be re- quarks rest on the connecting sites, and the gluons garded as subsystems, and the gluons, photons and as connections between the lattice points. the pions that link the quarks, the nucleus and the The scaffold is first restricted to a finite volume; electrons can beseen as the structure(or link) func- it is then replicated with periodic boundary condi- tions ofthesystem (cf.Barlow andProschan,1975). tions. All this entails on the order of 100 million billion arithmetic operations on typical lattices; this These are the carriers of the strong force and the is one example as to why physicists need supercom- electromagneticforce,respectively.Figure5contains puters. Lattice QCD has been able to explain as to Gell-Mann’s famous quote that “everything that is why a free quark has not been seen and will not be not forbidden is compulsory;” the logical systems seen; this is because it will take an infinite amount analogue tothisquoteis thenotion of “irrelevance.” of energy to isolate a quark. 2.2 Quantum Chromodynamics and Lattice QCD LatticeQCD,beinganapproximationtotheQCD, The theory of QCD can be thought of as a recipe improves as the lattice points increase indefinitely forproducinghadronsfromquarksandgluons.Since and as the volume of the lattice grid expands. In quarks and gluons make upmost of theknown mass so doing it opens up avenues for statistical methods ofthephysicalworld,unravelingthequarkstructure to enter the picture. Physicists have explored some ofmatteristhekeytoanunderstandingofthephys- of these avenues, one of which is the focus of this ical world, and thus the importance of the subject paper; see Section 3 below. of this paper. 3. THE UNDERLYING PROBLEM: QCD TheQCDtheorywassuccessfulinenunciatingthe EQUATIONS properties of the hadrons. However, its complexity made its use for predicting unobservable quantum With Lattice QCD, an archetypal scenario is the quantities,likequarkmasses,almostimpossible.This estimation of an infinitenumber of parameters from 6 J. LANDON,F. X.LEE ANDN.D. SINGPURWALLA a finite number of equations. The left-hand side of 3.1 Relevance to Other Scenarios in Science and each equation is the result of a physics based Monte Engineering Carlo run, each run taking a long time to complete. The Lattice QCD architecture of equation (3.1) is Thus, there are only a finite number of runs. For not unique to physics. They occur in several other example, a meson correlator, G(t|·), takes the form scenarios in the physical, the chemical, the engi- (cf. Lepage et al., 2002) neering and the biological sciences, a few of which ∞ are highlighted below. Most attempts at estimation (3.1) G(t|·)= A e−Ent for t=0,1,2,..., n of the underlying parameters have involved least Xn=1 squares or numerical techniques based on local lin- wheretheparameters An denotetheamplitude, and earizationwithiterativeimprovements.Besideslack- En denotetheenergy.Also,E1<E2<···<En<···. ing a theoretical foundation vis-`a-vis the require- Interest centers around the estimation of An ment of coherence (cf. Bernardo and Smith, 1994, and En, n=1,2,..., based on G(t|·), estimated as page 23), techniques have proved notoriously unre- G(t|·),t=0,1,...,k,for somefinite k [23 in the case liable and not robust to slight changes in the exper- of Lepage et al. (2002)]. The physics codes which imental data (cf. Hildebrand, 1956). b generate the G(t|·)’s do not involve the A ’s and n Mathematical biology: exponential peeling in com- theE ’s,andareautocorrelated,thusthelabel“cor- n b partmentsystems Whenconsideringradioactivetra- relator.”Thephysicscodesalsoprovideestimates of cers used for studying transfer rate of substances in the autocorrelation matrix. living systems (cf. Robertson, 1957; Rubinow, 1975, DeterministicapproachestosolvefortheA ’sand n page 125), sums of exponentials are encountered. the E ’s cannot be invoked, and statistical approa- n Here,theG(t|·)ofequation(3.1)representsthecon- ches involving curvefitting bychi-square, maximum centration of a substance, the t’s are integer values likelihoodandempiricalBayeshaveprovedtobeun- oftime, andtheA ’s andtheE ’s areconstants that i i satisfactory(cf.Morningstar,2002).Foranapprecia- needtobeestimated.Hereinterestgenerally centers tion of these efforts, see Lepage et al. (2002), Fiebig around the case of n= 2, and the coefficients A n (2002) and Chen et al. (2004); the latter authors and E of equation (3.1) are negative. An ad hoc n proposewhattheycalla“sequentialempiricalBayes graphicalprocedurecalledthe method of exponential approach.”However,empiricalBayesapproachesuse peeling isusedtoestimatetheparameters(cf.Smith observed data to influence the choice of priors, and, and Morales, 1944, Perl, 1960; van Liew, 1967). as asserted by Morningstar (2002), are a violation Some other scenarios in biology where the Lattice of the Bayesian philosophy. Indeed, Fiebig (2002) QCDtypeequations appearareinbonemetabolism states that “Bayesian inference has too long been studies and cerebral blood flow (cf. Glass and de ignored by the lattice community as an analysis Garreta, 1967), and in biological decay (cf. Foss, tool. ...The method should be given serious con- 1969). In the latter context, Dyson and Isenberg sideration as an alternative for conventional ways.” (1971) consider for fluorescence decay an equation Bayesian approaches alternate to ours have been of the type considered by Nakahara, Asakawa and Hatsuda m (1999). These authors entertain the use of maxi- y(t)= α exp(−t/τ ), 0≤t≤T, j j mumentropypriors,but,asclaimedbyLepageetal. Xj=1 (2002), the accuracy of their estimator of E is in- 2 wherey(t)represents“momentsofthefluorescence,” ferior to those obtained using other approaches. Be- α ’s the amplitudes [the A ’s of equation (3.1)], cause priors based on the principle of maximum en- j n and the τ ’s are time constants corresponding to tropyresultindefaultpriors,suchpriorsalsoviolate j the E ’s of equation (3.1). Here the α ’s are zero the Bayesian philosophy. The approach of Lepage n j for j≥m+1. et al. (2002) is Bayesian in the sense that prior in- formation is used to augment a chi-square statistic Gene expression data When considering a time whichis then minimized.We findthis workvaluable seriesofgeneexpressiondata(cf.Giurcaneanuetal., because it articulates the underlyingissues and pro- 2005),asystemofequationsparallelingthatofequa- videsaframework forexaminingtheanatomy of the tion (3.1) arises again. In this context G(t|·) repre- QCD equations, which enables us to identify a pat- sents “mRNA concentrations” as afunction of time, tern,whichinturnenablesustoinvoketheBayesian andtheparameters A andE describeinteractions n n approach we propose. between the genes. In the gene expression context, 7 PARTICLEPHYSICS as in the Lattice QCD context, the parameters E n are increasing in n. Nuclear magnetic resonance (NMR) NMR exper- iments often generate data that are modeled as the sum of exponentials (cf. Bretthorst et al., 2005). Experiments relying on NMR to probe reaction ki- neticis, diffusion, molecular dynamics and xenobi- otic metabolism are some of the applications where parameter estimates provide insight into chemical and biological processes. See, for example, Paluszny etal.(2008/09)whostudybraintissuesegmentation from NMR data. Here one considers equations of the type Fig. 6. Number of An’s as a function of t. m di=C+ Ajexp{−αjti}+ni, plying that the An’s are constrained. When t→∞, Xj=1 G(t|·)=0,whichimpliesthatforlargevaluesoft,A n wheremisthenumberofexponentialsanddi adata and the En cannot be individually estimated. Thus, value sampled at t . The parameters of interest are simulatingG(t|·)forlargetdoesnothaveapayback; i the decay rate constants α , the amplitudes A and consequently, it is futile to do such a simulation. j j the constant offset C. The ns’s are the error terms. Sincethe En’s increase with n,wemay, as astart, Electromechanical oscillations in power systems reparameterize the En’s as En−En−1=c, for some unknown c, c>0, for n=2,3,.... It will be argued Equations entailing thesumof exponentials arealso later, in Section 6.1, that c is a latent parameter. encountered in the context of low frequency electro- Thus, mechanical oscillations of interconnected power sys- tems, the impulse response of linear systems in net- (3.2) E =E +(n−1)c, n=2,3,... n 1 works,ultracentrifugeandahostof otherrelaxation withE andcunknown.Withtheaboveassumption 1 andtime-decay phenomena(cf.DysonandIsenberg, in place, a parsimonious version of the Lattice QCD 1971). For example, in the electromagnetic oscilla- equation takes the form tions scenario, Sanchez-Gasca and Chow (1999) en- ∞ counter an equation analogous to our equation (3.1) (3.3) G(t|·)=e−E1t A e−(n−1)ct, n withG(t|·)denotingasignalandA connotingasig- n nX=1 nal residue associated with the “mode” E . n t=0,1,2,.... To summarize, the relationships of the type given byequation(3.1)ariseinsomanycontextsofscience With c fixed, the parsimonious model given above and engineering that it seems to be quintessential, reveals the following features: and almost some kind of law of nature. The Lat- (a) When t is small, the number of A ’s entering n tice QCD problem considered here can therefore be equation (3.3) is large; indeed, infinite when t=0. seenasaprototypeandaconvenient platformtoex- (b) Whentislarge,thenumberofA ’sweneedto n positastatisticalproblemofgeneralapplicability.In considerissmall,becausethecombinationofalarget most of the application scenarios described above, with any n will make the term A exp(−(n−1)ct) n statistical methods have been used, many ad hoc, get small enough to be ignored. some empirical Bayesian and a few Bayesian (un- (c) Moderate values of t and n will also make the der the rubric of maximum entropy). Many of these above term small, causing A to be irrelevant. n methodshavenotexploitedanunderlyingtelescopic Figure6illustratesthefeaturethatas tgetslarge, pattern in these equations which makes an appear- thenumberof A ’soneneedstoconsidergets small. ance when a latent parameter is introduced into the n As a consequence of the above, for any fixed c, we system, and inference about the latent parameter can find a t such that in the expression made. 1 e−tE1[A +A e−ct+A e−2ct+··· 3.2 Anatomy of the Lattice QCD equations 1 2 3 +A e−(n−1)ct+···], An examination of equation (3.1) yields the fol- n ∞ lowing boundary conditions. G(0|·)= A , im- all the terms, save for A , are essentially zero. n=1 n 1 P 8 J. LANDON,F. X.LEE ANDN.D. SINGPURWALLA Similarly, we can find a t , t <t , such that all Besidesprovidingy andy ,thephysicscodesalso 2 2 1 1 2 the terms save for A and A e−ct2 get annihilated. provide σ2, σ2 and ρ . As a consequence, the sta- 1 2 1 2 12 Continuing in this vein, there exists a sequence t < tistical model boils down to the bivariate normal k tk−1 <···<t2 <t1, such that all that matters are distribution, the terms associated with A1,A2,...,Ak. In what Y G(t ) σ2 ρ σ σ folTlohwuss,,wfoersaunpypfioxseedthcaatnkdiks,swpeitchifited>. t >···>t (4.3) (cid:20)Y12(cid:21)∼N(cid:18)(cid:20)G(t21)(cid:21),(cid:20)ρ12σ11σ2 12σ221 2(cid:21)(cid:19). 1 2 k Writing out a likelihood function for the unknowns chosen in the manner described above, our parsimo- A ,E ,A andc,basedonequation(4.3),isastraight- niousversionoftheLatticeQCDequationstelescope 1 1 2 forward matter. However, we need to bear in mind as follows: thatsincetheparameters A and E appearin both G(t |·)=e−E1t1A , 1 1 1 1 G(t |·) and G(t |·), both y and y provide informa- 1 2 1 2 G(t2|·)=e−E1t2(A1+A2e−ct2), tion about A1 and E1, with y2 providing informa- tion about A and c as well. To exploit this feature, G(t |·)=e−E1t3(A +A e−ct3 +A e−2ct3), 2 3 1 2 3 we construct our likelihoods based on the marginal (3.4) .. distribution of Y1, and the conditional distribution . of Y given Y . That is, on 2 1 G(tk|·)=e−E1tn(A1+A2e−ctk +··· (4.4) Y1∼N(A1e−E1t1,σ12) +A e−(k−1)ctk). and k σ To summarize, by introducing the constant c, fixing (Y |Y =y )∼N G(t |·)+ρ 2(y −G(t |·)), 2 1 1 2 12 1 1 (cid:18) σ a k, and identifying an underlying pattern in the 1 (4.5) Lattice QCD equations, we have reduced the prob- lemtothecaseofk equationsand(k+2)unknowns, σ2(1−ρ2 ) . 2 12 (cid:19) A ,...,A ,E and c.Thechoice ofwhat k tochoose 1 k 1 is determined by the number of physics code based Specifically, the likelihood of A1 and E1, with y1 fi- estimates G(t),t=0,1,...,k, that can be done and xed, is are available. (y −A e−E1t1)2 b (4.6) L(A ,E ;y )∝exp − 1 1 , 1 1 1 (cid:20) 2σ2 (cid:21) 4. STATISTICAL MODEL: SOLVING THE QCD 1 EQUATIONS andthelikelihoodofA1,E1,A2 andc,withy2 fixed, and the effect of y incorporated via the posterior 1 Many have expressed the view that it would be distribution of A and E , is of the form 1 1 considered goodprogress if trustworthy estimates of L(A ,E ,A ,c;y ,y ) just A ,A ,E and E can be had. The other pairs 1 1 2 1 2 1 2 1 2 (A ,E ), (A ,E ),..., can be considered later; see 3 3 4 4 ∝exp − y −(e−E1t2(A +A e−ct2)) Section 6. Thus, we start by focusing attention on 2 1 2 (cid:20) (cid:26) the first two equalities of equation (3.4); that is, the (4.7) casek=2andsomefixedc.Specifically,weconsider σ 2 +ρ 2(y −A e−E1t1) G(t1|A1,E1)=e−E1t1A1 and 12σ1 1 1 (cid:27) (4.1) G(t2|A1,E1,A2,c)=e−E1t2(A1+A2e−ct2). ·[2σ2(1−ρ2 )]−1 . 2 12 (cid:21) Ify = G(t |·),i=1,2,denotesthephysicscodebased i i In the above development, the covariance matrix is evaluationsofG(t |·),thenouraimistoestimateA , i 1 b provided by the physics code. As suggested by a ref- E ,A andc,inlightofy andy .Tosetupourlike- 1 2 1 2 eree, a deeper investigation of this matrix may be lihoods, we take a lead from what has been done called for, because with increasing t, the variances by Nakahara, Asakawa and Hatsuda (1999), and by are likely to increase, posing computational chal- Lepage et al. (2002), to write lenges to the proposed approach. Y =G(t |·)+ε and 1 1 1 4.1 Specification of the Prior Distributions (4.2) Y =G(t |·)+ε , 2 2 2 To implement our Bayesian approach, we need to where ε ∼N(0,σ2),i=1,2, and Corr(ε ,ε )=ρ . make assumptions about conditional independence, i i 1 2 12 9 PARTICLEPHYSICS and assign prior distributions for the unknown pa- tributionsobtainedinPhaseIwillserveasthepriors rameters.ThepriorsthatweendupchoosinginSec- of A and E in Phase II. Since the parameters A 1 1 1 tion 5 are not based on knowledge of the underlying and E reappear in the likelihood of equation (4.7) 1 physics, but are proper priors based on an appreci- as the mean of y , Phase II of the MCMC run cap- 2 ation of the material in Morningstar (2002), Lepage turestheeffect of y on theseparameters.Theeffect 2 et al. (2002) and Fleming (2005). of y was captured in Phase I. 1 The A ’s are supposedly between 0 and 1, and no Phase III. Repeat Phase I and Phase II m times i relationship between them has been claimed. Thus, using new starting values of E and c to produce 1 it is natural to assume that A and A are apriori a sample of size m from the posterior distribution 1 2 independent, and have a beta distribution on (0,1) of A , E , A and c, with y and y as the data. 1 1 2 1 2 withparameters(α,β);wedenotethisasB(A ;α,β), The MCMC exercise described above is routine, i i=1,2. The relationship between E and c is less but computer intensive and entails 12 steps, six in 1 straightforward.WeconjecturethatthelargertheE , each phase, and this too for a highly curtailed ver- 1 the smaller the c, and that E can take values over sion of the Lattice QCD equations. The details of 1 (0,∞). It is therefore reasonable to assume that the how this is done could be interesting, because they prior on E is a gamma distribution with scale pa- involve some discretization of the simulated poste- 1 rameter η and shape parameter λ; we denote this rior distributions,and working with individualsam- by G(E ;η,λ). Some other meaningful choices for pled values reminiscent of that done in particle Kal- 1 aprioronE couldbeaWeibull,oraPareto,thelat- manfiltering(cf.Gordon,SalmondandSmith,1993). 1 ter being noteworthy as a fat-tailed distribution. To Thus, we label our approach as Particle MCMC. encapsulate the dependence between E and c, we More details are given in Landon (2007), and the 1 supposethat, given E1, c has auniform distribution method illustrated in the Appendix. The software over (0,ω/E ), for some ω>0. Finally, we also as- can be downloaded at http://www.gwu.edu/˜stat/ 1 sume that E1 and c are independent of all the Ai’s. irra/Lattice QCD.htm. Theabove choice of priors, with user specified hy- 4.3 A Caveat of the Proposed Scheme perparameters α, β, ω, λ and η, is illustrative. In principle, any collection of meaningful priors can be The caveat mentioned here stems from the fea- used,sincetheensuinginferenceisdonenumerically tures that c has been fixed, and that the MCMC viaaMarkovchainMonteCarlo(MCMC)approach. runs are centered around fixed values of y and y . 1 2 Lepageetal.(2002),andalsoMorningstar(2002), To see why, recall that our parsimonious version of seem to use independentGaussian priors for the pa- the Lattice QCD equations [see equation (3.4)] is rameters in question—see equations (8) and (11) based on those t ’s for which the exponential terms i respectively. Indeed, Morningstar (2002) makes the vanish; however, the t ’s are determined by a fixed i claim that “practitioners often restrict the choice of value of c. Thus, any change in the value of c will apriortosomefamiliardistributionalform.”There- bring about a change in the values of t , and, as i stricted parameter space makes the choice of Gaus- a consequence, the Lattice QCD equations will also sian priors questionable. An overview of how the have to be different. This would be tantamount to MCMC is invoked here is given next. obtaining new values of the y ’s. However, all the i likelihoods in the MCMC runs are based on fixed 4.2 An Outline of the MCMC Excercise values of the y ’s; see equations (4.6) and (4.7). But i The telescopic nature of the Lattice QCD equa- a change in the value of c is inevitable, because tions suggests that the MCMC will have to be con- in Phase II of the MCMC run one iterates around ducted in the following three phases: sampled values from the posterior distribution of c, Phase I. Using E(0) as a starting value and y as so that the initial c(0) systematically gets replaced 1 1 data, obtain the posterior distribution of A and E by c(1), c(2),...,c(1,000). 1 1 via equation (4.6) as the likelihood, and 1,000 iter- Away toovercome this caveat is torecognize that ations of the MCMC run. forany c(i)>c(0), i=1,2,..., theexponentialterms Phase II. Using c(0) as a starting value, and y as mentioned above will continue to vanish, so that 2 data, obtain a sample from the posterior distribu- any specified values of y will continue to satisfy the i tion of A , E , A and c via the likelihood of equa- right-hand side of equation (4.2). 1 1 2 tion (4.7), and 1,000 iterations of the MCMC run. A strategy to ensure that the successively gen- Samplevaluesof A andE fromtheirposteriordis- erated values of c(i), i = 1,2,..., will tend to be 1 1 10 J. LANDON,F. X.LEE ANDN.D. SINGPURWALLA (a) (b) Fig. 7. Posterior distribution of E . (a) Posterior of E based on y . (b) Posterior of E based on y and y . 1 1 12 1 12 6 greater than c(0) is to pick small values of c(0) for Table 1 Simulated data for validating approach each of the m iterations of Phase III of the MCMC algorithm. During the course of the MCMC runs, Time t Indexti G(t) yi should one encounter a generated value of c(i) that is smaller than c(0), then one should discard the 1 0.54874373 0.54900146 so-generated value c(i), and generate another value 2 0.17764687 0.17756522 of c(i). Hopefully, the number of discarded c(i)’s will 3 0.06387622 0.06373037 4 t 0.02422158 0.02414992 3 not be excessive, but if they are, then the start- 5 0.00945326 0.00952723 ing value c(0) should be decreased, and new values 6 t 0.00375071 0.00377058 2 of t1 and t2 obtained. This of course would be tan- 7 0.00151265 0.00151498 tamount to obtaining new values of y and y as 8 0.00060552 0.00061147 1 2 well. 9 0.00024486 0.00024698 10 0.00009923 0.00009821 11 0.00004026 0.00004007 5. PROOF OF PRINCIPLE: VALIDATION 12 t 0.00001635 0.00001625 AGAINST DATA 1 We first validate the accuracy of our approach 5.1 Results Based on Simulated Data againstsimulateddata.Forthis,wechooseA =0.8, 1 A2 =0.6, A3 =0.4, A4 =0.2, A5 =0.1, and Ai =0 Figure 7(a) and (b) shows the posterior distribu- fori≥6.WealsochooseE1=0.9andc=0.5.Using tions of E1 based on y12, and on y12 and y6, re- these values in equation (3.3), we compute G(t), for spectively. Recall that y corresponds to t , and y 12 1 6 t=1,2,...,12; these are shown in column 3 of Ta- corresponds to t . Note that the posterior distribu- 2 ble 1. Since Yt=G(t|·)+εt, with εt∼N(0,σt2) [see tion of Figure 7(a) becomes the prior distribution equation (4.2)], we generate y1,...,y12, assuming for the construction of the posterior distribution of the εt’s are independent, with σt=0.001×G(t)×t; Figure 7(b). Both the distributions of Figure 7 indi- these are shown in column 4 of Table 1. We next cate a modal value of 0.9, suggesting a tendency to identify those t’s for which the leading exponen- converge to the true value of E . Furthermore, the 1 tial terms vanish. These happen to be t at t=12, difference between the two distributions is not very 1 t at t = 6, and t at t = 4; see column 2 of Ta- great, suggesting that y may not be contributing 2 3 6 ble 1. Our aim is to invoke the methods of Section 4 muchtoward inferencefor E ,beyondthatprovided 1 on the entries of Table 1, to see if the constants by y . 12 specified above can be returned. With the above A similar feature is revealed by the posterior dis- in place, Phases I, II and III of the MCMC run tributions of A , shown in Figures 8(a) and (b). 1 weremadearbitrarilychoosingthehyperparameters These distributions have a modal value of 0.8, sug- α=β=η=λ=ω=1, and m=1,000. gesting again a convergence to the true value of A . 1