1 The HST Key Project on the Extragalactic Distance Scale: A Search for Three Numbers

Barry F. Madore and Wendy L. Freedman

Observatories of the Carnegie Institution of Washington, 813 Santa Barbara Street, Pasadena, CA 91101, USA

Abstract. We present a review of the main results of the Hubble Space Telescope Key Project on the Extragalactic Distance Scale, with emphasis on the new techniques that were developed in order to undertake the observations, and the methods that were adopted both in reporting the results and in quantifying, especially, the associated statistical and systematic errors. The three numbers (the cosmological expansion rate H_0, its statistical error and the systematic uncertainty, respectively) are H_0 = 72 ± (3) ± [7] km/sec/Mpc.

1.1 Introduction

The Hubble Space Telescope (HST) was designed and built to measure the expansion rate of the Universe. And it succeeded. For decades before HST a debate was raging over a factor-of-two disagreement in the value of the Hubble constant. After 8 years of observing and data reduction the final results of the HST Key Project on the Extragalactic Distance Scale were published in Freedman et al. (2001) [5]. Preceding that, over thirty papers were published detailing the results on the discovery and measurement of Cepheids in individual galaxies constituting the Key Project sample. And with those publications the debate over the Hubble constant has been reduced to, and focused upon, an interesting discussion of dominant systematics and residual random errors, but now at the 10% level.

The overall goal of the Key Project was to measure the expansion rate of the Universe, using Cepheids to calibrate a variety of independent, secondary distance indicators so as to then reach beyond the locally perturbed flow out into the cosmologically dominated expansion. Given the history of having systematic errors dominate the accuracy of distance measurements, the approach adopted by the Key Project was to explicitly assess the systematics of any one approach by examining and using several different methods en route to a global measurement. These secondary methods included surface brightness fluctuations, Type II supernovae, the Tully-Fisher relation for spiral galaxies, the fundamental plane for elliptical galaxies, and finally Type Ia supernovae at the very farthest extreme in distance. Each of the secondary distance indicators had its own strengths and its own weaknesses; many overlapped in distance; some could be applied to the same galaxies; all had their own systematics; and both cluster environments and the field were being sampled. If systematic differences were to be found, this experiment was designed to highlight and to quantify them.

Fig. 1.1. Averaging over Systematics – An example of how averaging can reduce the noise in the statistical determination of a standard metric. In this case the measurement being undertaken is the length of a "standard rod" as defined by twenty randomly selected "feet"

And so, as the Cepheid observations were made and distances began to be compiled, the first order of business was to establish rigorous standards of documenting, propagating and reporting the statistical errors associated with sample sizes.
This was paralleled by the enumeration and assessment of the various systematic errors associated with each decision and every step taken along the way to evaluating the distances and velocities contributing to a final value of the Hubble constant.

This review will not so much dwell on the value of the Hubble constant that was finally reported; rather, the emphasis will be on the errors associated with that determination.

1.2 "Statistics, Damned Statistics, and..."

Benjamin Disraeli is reported to have said that there are three kinds of lies: "Lies, Damned Lies, and Statistics". Curiously, it is now believed that Disraeli never uttered these words, and that Mark Twain, who in his autobiography attributed this now famous phrase to the then Prime Minister of England, was himself apparently indulging in a lie of the first kind. Whether there was a bit of hidden irony there one will never know, but what is clearly meant by the original statement, whoever really said it, is that at least in some contexts statistics fall at the very bottom of the credibility heap. That, unfortunately, is where any new study of the expansion rate of the Universe had to start. But in the end, it was the careful application of statistics and error analysis that brought the results of this study into sharp and final focus.

There may be three types of lies, but for our purposes there are only two types of errors: statistical errors and systematic errors. Statistical (or random) errors usually are amenable to reduction by increasing the sample size, N. They obey a random-walk convergence around their mean, slowly decreasing as 1/√N. Various statistical errors can be combined in quadrature (if they are statistically (sic) independent), and in such a case a finally reported, single, combined error makes both mathematical and physical sense. Statistical errors measure precision.

Systematic errors are of an entirely different breed. They measure accuracy. No matter how many times an experiment may be repeated, and no matter how many samples may be taken, if the methodology is unchanged any inherent systematic errors will remain the same. Systematic errors are offsets and displacements of the answer from the truth that no increase in sample size can reveal or reduce. As such, systematic errors are hard to evaluate even when they are identified, and it is even harder to know when and whether they have been identified at all. After the noise of small numbers has been beaten down by "statistically significant" sample sizes, one is always left "dominated by the systematics". The Hubble constant long suffered from both large, but unknown, systematics, and from small, but over-worked, samples.
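To make the bookkeeping concrete, the short Python sketch below combines independent statistical terms in quadrature (where the per-galaxy scatter shrinks as 1/√N) while carrying the systematic term separately, in the ±(stat) ±[sys] style used for the final result quoted in the Abstract. The individual numbers are purely hypothetical placeholders, not the actual Key Project error budget, and summing the systematic terms in quadrature is simply one common convention.

```python
import math

# Hypothetical, illustrative error terms in km/sec/Mpc (NOT the Key Project budget).
per_galaxy_scatter = 5.0           # assumed random scatter of a single galaxy's H0 estimate
n_galaxies = 30                    # sample size; this random term shrinks as 1/sqrt(N)
other_statistical = [1.5]          # any further independent random terms
systematic_terms = [4.0, 3.5, 3.0] # e.g. calibration zero point, metallicity, photometry

# Independent statistical errors combine in quadrature...
stat_err = math.sqrt((per_galaxy_scatter / math.sqrt(n_galaxies)) ** 2
                     + sum(e ** 2 for e in other_statistical))
# ...while systematic terms do not decrease with N, no matter how large the sample.
sys_err = math.sqrt(sum(e ** 2 for e in systematic_terms))

H0 = 72.0
print(f"H0 = {H0:.0f} +/- ({stat_err:.1f}) +/- [{sys_err:.1f}] km/sec/Mpc")
```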
The Hubble Space Telescope took care of the sample size, as shown by Table 3 in Freedman et al. (2001) [5], which lists the revised Cepheid distances to thirty-one galaxies after the Key Project was completed. This is to be compared to the handful of distances available in 1990, say, after more than half a century of observing from the ground (see Madore & Freedman 1991 [7] for a compilation of Cepheid distances relevant to that pre-HST period in time).

Dealing with systematic errors required a sober enumeration of time-honored methods and an explicit evaluation of a host of implicit assumptions. But HST did not just guarantee the time to do the project as one would have done it from the ground. It demanded that the whole project be optimized. We had to know just how many observations of how many Cepheids in how many galaxies would be needed to sufficiently reduce the statistical errors. And then we had to select those galaxies to cover and include as many tests for systematics as we could conceive of. And do this before the shutter ever opened on the first target.

Below we outline how we optimized the scheduling of HST to monitor a large number of galaxies to find significant samples of Cepheids, and how the numbers of observations in the two filters were chosen so as to allow us to discover variables, select out the Cepheids, measure their periods, amplitudes, mean magnitudes and time-averaged colors, and accomplish all of this within a narrow window of time often not even as long as the periods of the longest-period Cepheids themselves.

1.3 Optimal Search Strategies

From the ground, after time is assigned on a given telescope, there is little control one can exercise over the rising and setting of the Moon, or the motion of weather patterns across the surface of the Earth. Even in sunny California one can observe only when it is dark, and even then only when it is clear. The situation from Earth orbit is, of course, not the same, in many different ways. Every ninety minutes there is an opportunity to open the shutter and begin observing. Within those orbital constraints one can in principle schedule the telescope to point at anything that is not occulted by the Earth, Moon or Sun. HST could be scheduled in just that way. The challenge was to capitalize on this feature and optimize the use of the telescope in monitoring a fair sample of objects.

The historical precedent was not good. From the ground, Hubble, Baade, Sandage and others spent decades semi-systematically (but still more or less randomly) observing some of the nearest galaxies (M31, IC 1613, NGC 6822, etc.) in search of variable stars. And with time and with great patience results did flow in. Nevertheless the lunar cycle still imposed aliases on the observations (Fig. 1.2), as extensive as they were, leaving gaps and clumps in the phase-folded light curves of some of the variables. The situation was clearly not optimal. In fact, it was not even possible to consider optimizing the situation, so little attention was paid to producing an observing strategy matched to the problem; that is, not until HST came along.

With HST it was not only possible but it quickly became mandatory that the telescope be scheduled in a highly optimized way. Time was extremely valuable and competition was fierce. Furthermore, it was not just one or two additional galaxies that needed Cepheid distances determined; it was an order of magnitude more than had already been done from the ground that was required in order to calibrate a variety of secondary distance indicators. So the challenge was this: Observe about a dozen different galaxies. Reduce the number of observations from more than 100 per target down to order 10 epochs. Do this in two colors (to measure reddenings). Detect the variables. Measure their periods. Extract time-averaged luminosities, amplitudes and colors. And complete each set of discovery/detection/measurement observations in a single window, generally not exceeding 60 days.

Exposures had to be long enough to get good photon statistics on the individual measurements of the stars, so as to unequivocally discriminate variables from constant stars.
Those same individual phase points had to be sufficiently high in signal-to-noise to allow a delineation of the light curve, so that the phasing of the data and a period determination could be made that was in itself sufficiently precise that a robust period-luminosity relation could be constructed and non-Cepheid interlopers discriminated against, either by the shape of their light curves or by their colors, magnitudes and/or corresponding periods.

Fig. 1.2. A Sample of Light Curves of Cepheids in M31 as published by Baade & Swope (1965) [1]. Note that even with over 70 observations, taken over a period of six years, there are still strong resonances in the data, especially for variables V326 and V254 whose periods are very close to an integral number of days, and in fact very close to being exactly a week

Color observations had to be woven into the observing schedule so that two independent apparent moduli could be derived and reddening corrections extracted and applied to the determination of a final true distance modulus.

Random sampling of any source (intrinsically variable or not) will only drive down the error on the mean as 1/√N; but for a variable with an intrinsic amplitude larger than the observing errors the empirical demands are considerably worse. In our case (for Cepheids with amplitudes expected to be anywhere up to 1.5 magnitudes in the visual) the dominant error on the mean magnitude and color is driven by the phase sampling of the light curve itself. Obviously an abundance of observations randomly clumped at maximum light would bias the mean to too bright a magnitude. Too many observations wasted at minimum light would bias the mean too low. The equivalent sigma of an intrinsic variable with a 1.0 mag amplitude is 1.0/√12 mag, or approximately ±0.30 mag. To drive this error on the mean down to 0.03 mag would then require on the order of 100 observations! This was unthinkable for a Hubble Space Telescope project intent on observing more than a dozen galaxies.
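That back-of-the-envelope estimate can be checked in a few lines. The sketch below treats the magnitudes of a 1.0 mag amplitude variable as uniformly spread over their full range (the approximation behind the 1/√12 figure) and asks how many randomly placed observations are needed before the 1/√N convergence reaches 0.03 mag.

```python
import math

amplitude = 1.0                               # peak-to-peak amplitude, mag (visual)
sigma_intrinsic = amplitude / math.sqrt(12.0) # ~0.29 mag: sigma of a uniform spread
target = 0.03                                 # desired error on the mean magnitude, mag

# Random phase sampling only beats the error on the mean down as 1/sqrt(N):
n_needed = math.ceil((sigma_intrinsic / target) ** 2)
print(f"sigma = {sigma_intrinsic:.2f} mag  ->  N ~ {n_needed} random epochs")
# prints N ~ 93, i.e. on the order of 100 observations per Cepheid
```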
Before describing our finally adopted sampling strategy, and the reasoning behind it, we note that in a more general context the detection and characterization of time-dependent signals has a long and well-studied history, especially in electrical engineering and its allied sciences. It will not be repeated here. However, we note that much of the theory and many of the practical applications involve equally spaced sampling at high signal-to-noise, obtained over long run times, and usually covering many cycles of the searched-for, but unknown, signal. After detection, the characterization or parameterization usually involves the determination of a period, an amplitude and a phase, and then finally the quantification of some shape parameters of the signal. With regard to the latter point, we note that periodic signals certainly are not always sinusoidal in form, but often they can reasonably be decomposed into the superposition of a few low-order Fourier components. Here we discuss an extreme corner of parameter space in signal detection not much explored by others: a region defined by very small numbers of observations (a mere handful), all of which are non-uniformly placed in time, covering at most a few cycles of highly asymmetric, but still periodic, signals.

1.4 A Figure of Merit

As with any parameter extraction it is useful to have a quantitative measure of the goodness of the solution. We begin by stating that the ideal distribution of points over the phase-folded waveform is one in which the observations fall equally spaced over the light curve, with none redundant (i.e., no coincident observations). With this in mind we have devised the following figure of merit. For a given number of observations (N) we first calculate a variance in phase space

\Delta_N^2 = \sum_{i=1}^{N} (\phi_{i+1} - \phi_i)^2

which for the case of ideal sampling (i.e., that of uniform and non-redundant placement around the light curve) reduces to

\Delta_{N,\mathrm{uniform}}^2 = \sum_{i=1}^{N} (1/N)^2 = 1/N

(where \phi_{N+1} = \phi_1 + 1.0, and \sum_{i=1}^{N} (\phi_{i+1} - \phi_i) = 1.0). For the actual resultant sampling (phase-folded at a given period) the equivalent (realized) statistic \Delta_N^2 is analogously derived from the sum of the squares of the intervals separating the ordered pairs of observations over the unit interval. The final figure of merit is then the difference between the realized statistic \Delta_N^2 and the ideal phasing statistic \Delta_{N,\mathrm{uniform}}^2, normalized by the ideal case, giving

U^2 = [\Delta_N^2 - \Delta_{N,\mathrm{uniform}}^2] / [(N-1)\,\Delta_{N,\mathrm{uniform}}^2]

or

U^2 = \left[ N \sum_{i=1}^{N} (\phi_{i+1} - \phi_i)^2 - 1 \right] / (N-1).

Interpreted as a normalized variance, the U^2 statistic has a value of zero when the distribution of points is non-redundantly uniform over the light curve, and a value of unity when all points are coincident in phase space. The added division by (N-1) is introduced to force the variance to unity, independent of sample size, when all points cluster at a single phase (i.e., total redundancy). In the following, however, we chose to plot the Uniformity Index (UI), which is based on U^2 but simply inverts it and maps it onto the interval [0, 100] by the simple transformation UI = 100 [1 - U^2]. In this way a score of 100 indicates perfectly uniform sampling over the light curve, and 0 indicates total failure, resulting from complete redundancy in the phase placement of the observations, where the data points, folded over the period of the variable, all end up at precisely the same phase point.
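The statistic just defined is straightforward to compute; the following minimal Python implementation of U^2 and UI is offered only as an illustration. The 12 equidistant epochs in a 100-day window anticipate the configuration of Fig. 1.6, and the two trial periods are chosen purely as examples.

```python
import numpy as np

def uniformity_index(times, period):
    """Uniformity Index: 100 = perfectly uniform phase coverage, 0 = total redundancy."""
    phases = np.sort(np.mod(np.asarray(times, dtype=float) / period, 1.0))
    n = phases.size
    # Phase gaps between the ordered observations over the unit interval,
    # with the wrap-around gap defined through phi_{N+1} = phi_1 + 1.
    gaps = np.diff(np.append(phases, phases[0] + 1.0))
    delta2 = np.sum(gaps ** 2)            # realized Delta_N^2
    u2 = (n * delta2 - 1.0) / (n - 1.0)   # 0 for uniform sampling, 1 for coincident points
    return 100.0 * (1.0 - u2)

# Example: 12 epochs placed equidistantly within a 100-day observing window,
# folded at two illustrative trial periods (cf. Fig. 1.6).
times = np.arange(12) * (100.0 / 11.0)
for period in (80.0, 36.0):
    print(f"P = {period:4.1f} d : UI = {uniformity_index(times, period):5.1f}")
```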
1.5 Sampling Strategies

While it is intuitively obvious that optimal sampling of a signal with known frequency should be undertaken in such a way that no two observations overlap in phase as seen by the signal, what is perhaps not so obvious is that such a uniform sampling strategy has very important consequences for the rate of convergence of such things as the measured amplitude and the error on the calculated mean (both of which are intimately related). Unlike random sampling, which is a random-walk 1/√N process, uniform sampling has both of its errors, on the amplitude and on the calculated mean, drop much more rapidly; in fact, those errors fall directly as 1/N. This is a considerable savings when additional observations come at a high premium. A factor of 3 in observing time is always easier to come by than is a factor of 10. Optimization of the type discussed here buys that difference.

In the following series of plots and diagrams (Figs. 1.3-1.9) we first explore the systematics of the random sampling of highly asymmetric light curves using extremely few observations. Indeed we begin with 2, 3 and 4 observations only and then jump to 12 observations, which is the number of observations typically adopted within the small observing window imposed by orbital constraints upon the Key Project. The important details of the simulations are given in extensive captions to the figures. The simulations bear out the expectation that random sampling is highly inefficient in its error convergence properties: the clearly Gaussian distributions of errors in the quantities of interest (such as the mean magnitude) only go down like 1/√N, while the first-order measure of the light-curve shape, the measured amplitude, has a distribution that can be proven using order statistics to have exactly the form of the Beta function, which only slowly converges on the true amplitude and has a long tail toward small (derived) amplitudes.

Fig. 1.3. Monte Carlo Simulations of the mean magnitude, the uniformity index and the derived amplitude for a synthetic light curve approximating that of a variable star. This panel shows the distributions and marginalized values for random samples of two observations (N = 2) only. In the upper middle panel is shown an expanded view of the correlation of the mean magnitude versus the uniformity index. The right-handed square bracket near UI = 1 shows the total range of mean magnitudes predicted for the equivalent number of observations uniformly sampling the light curve (but with random phase with respect to the periodic function). The next error bar to the right uses the simulated data and shows the data-derived standard deviation (thick line) and two-sigma (thin extension). The final error bar to the far right is the one-sigma error bar for the uniform sampling. It can be shown that the distribution function for the marginalized means (central panel) for N = 2 is a symmetrical triangular function centered on a mean of 0.5. The marginalized distribution of observed amplitudes is a ramp function with the modal value of the amplitude being zero

Fig. 1.4. Monte Carlo Simulations of the mean magnitude, the uniformity index and the derived amplitude for a synthetic light curve approximating that of a variable star. This panel shows the distributions and marginalized values for random samples of three observations (N = 3) only. The error bars are as discussed in Fig. 1.3; but note again how much smaller the error bar for the uniform sampling is (far right) as compared to the one-sigma error seen for random sampling (thick error bar). The distribution of marginalized means (central panel) is rapidly becoming Gaussian in appearance, while the marginalized distribution of amplitudes is very symmetric about Amplitude = 0.5

Fig. 1.5. The same as Figs. 1.3 and 1.4 except that this panel shows the results of Monte Carlo simulations of distributions and marginalized values for random sampling of a light curve using only four observations (N = 4). The marginalized means continue to become more Gaussian in their distribution, while the distributions of amplitudes and uniformity indices are both markedly asymmetric, and skewed towards larger values

Knowing that random sampling is too inefficient and that uniform sampling over the light curve is ideal, the problem becomes that of finding an observing strategy (in real time) that corresponds to uniform sampling as viewed by a variable whose period is not yet known.
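The contrast between the two sampling regimes can be sketched with a short Monte Carlo experiment. The sawtooth below is only a toy stand-in for the asymmetric synthetic light curves actually simulated, and the trial counts are illustrative; the point is the scaling, with the scatter of the derived mean magnitude falling roughly as 1/√N for randomly placed phases but as 1/N for uniformly spaced ones.

```python
import numpy as np

rng = np.random.default_rng(1)

def toy_lightcurve(phase):
    """Toy asymmetric (sawtooth) light curve with a 1.0 mag amplitude; a stand-in
    for the synthetic Cepheid-like light curves used in the actual simulations."""
    return np.mod(phase, 1.0)

def scatter_of_mean(n_obs, uniform, n_trials=20000):
    """Standard deviation of the derived mean magnitude over many sampling trials."""
    if uniform:
        # n_obs equally spaced phases, with a random overall phase offset per trial
        phases = np.mod(np.arange(n_obs) / n_obs + rng.random((n_trials, 1)), 1.0)
    else:
        phases = rng.random((n_trials, n_obs))   # completely random phase placement
    return toy_lightcurve(phases).mean(axis=1).std()

for n in (2, 3, 4, 12):
    print(f"N = {n:2d}:  random {scatter_of_mean(n, False):.3f} mag"
          f"   uniform {scatter_of_mean(n, True):.3f} mag")
```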
Fig. 1.6. Uniform Sampling in Time. The upper panel on the left shows the Uniformity Index (UI) as a function of period resulting from a sample of 12 observations placed equidistantly within an observing window W = 100 days. As a function of period (moving up in the panel) one can see that the mean level of the Uniformity Index (as marked by the solid vertical lines) is a decreasing function of period. Similarly, the variance in the Uniformity Index increases toward shorter periods. Excursions to low values of UI occur when data points fall redundantly at the same phase in the light curve. The middle panel shows a selection of light curves for periods ranging from a few days up to 80 days. The actual time of each observation within the 100-day window is shown by the 12 vertical lines crossing the light curves at points marked by encircled dots. The UI for the 80-day variable (top) is very high and, as can be seen in the right panel, its phase-folded light curve is very uniformly sampled. The 36-day variable is also uniformly sampled but its UI is significantly lower, given the fact that each plotted point represents three (overlapping) observations in the phase-folded plot. The 22-day Cepheid also shows redundant (phase-clumped) observations when folded over the known period. The 10-day variable is also very uniformly covered, but a strong alias can be readily seen in the real-time plot of the observations in the middle panel, where an equally good light curve having a period of about 100 days would produce an equally compelling fit
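The behavior summarized in the upper panel of Fig. 1.6 can be explored with the same U^2 statistic of Sect. 1.4. The sketch below folds 12 equidistant epochs from a 100-day window at an illustrative grid of trial periods and prints the resulting UI as a crude text bar.

```python
import numpy as np

# 12 epochs placed equidistantly within a 100-day observing window (as in Fig. 1.6).
n_obs, window = 12, 100.0
times = np.arange(n_obs) * window / (n_obs - 1)

for period in np.arange(4.0, 81.0, 4.0):                  # illustrative trial periods, days
    phases = np.sort((times / period) % 1.0)
    gaps = np.diff(np.append(phases, phases[0] + 1.0))     # phase gaps over the unit interval
    u2 = (n_obs * np.sum(gaps ** 2) - 1.0) / (n_obs - 1.0) # the U^2 statistic of Sect. 1.4
    ui = 100.0 * (1.0 - u2)
    print(f"P = {period:5.1f} d   UI = {ui:5.1f}   " + "#" * int(round(ui / 2.0)))
```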