Table Of Content

Draft version January 24, 2017 TypesetusingLATEXtwocolumnstyleinAASTeX61 A PRINCIPAL COMPONENT ANALYSIS OF THE DIFFUSE INTERSTELLAR BANDS T. Ensor,1 J. Cami,1,2 N.H. Bhatt,1 and A. Soddu1 1Department of Physics and Astronomy and Centre for Planetary Science and Exploration (CPSX), The University of Western Ontario, 7 London, ON N6A 3K7, Canada 1 2SETI Institute, 189 Bernardo Ave, Suite 100, Mountain View, CA 94043, USA 0 2 ABSTRACT n We present a principal component analysis of 23 line of sight parameters (including the strengths of 16 diffuse a J interstellarbands,DIBs)forawell-chosensampleofsingle-cloudsightlinesrepresentingabroadrangeofenvironmental 2 conditions. Our analysis indicates that the majority (∼93%) of the variations in the measurements can be captured 2 byonlyfourparametersThemaindriver(i.e.,thefirstprincipalcomponent)istheamountofDIB-producingmaterial in the line of sight, a quantity that is extremely well traced by the equivalent width of the λ5797 DIB. The second ] A principal component is the amount of UV radiation, which correlates well with the λ5797/λ5780 DIB strength ratio. G Theremainingtwoprincipalcomponentsaremoredifficulttointerpret, butarelikelyrelatedtothepropertiesofdust in the line of sight (e.g., the gas-to-dust ratio). With our PCA results, the DIBs can then be used to estimate these . h line of sight parameters. p - o Keywords: ISM:lines and bands — ISM:molecules — methods:data analysis — methods:statistical r t s a [ 1 v 0 8 1 6 0 . 1 0 7 1 : v i X r a Correspondingauthor: J.Cami 2 Ensor et al. 1. INTRODUCTION smallnumberofDIBs,typicallyfairlystrongandnarrow DIBs. Second,whiletheroleofmeasurementuncertain- One of the greatest outstanding astronomical chal- ties on the correlation coefficient is well established (see lenges is the identification of the diffuse interstellar e.g. discussions in Herbig 1975; Cami et al. 1997), they bands (DIBs): a series of ∼500 absorption features de- are often not taken into account in correlation studies. tected in optical and infrared spectra toward reddened Correlations between the DIB strengths and line of stars (see Herbig 1995; Sarre 2006; Snow 2014 for re- sight parameters can reveal additional properties about views and Hobbs et al. 2008, 2009 for recent surveys). the DIB carriers. DIB strengths often show some cor- It has been clear that the DIBs arise from material in relation (r typically 0.7) with various other line of sight interstellar clouds since they were first detected (Heger parameters, albeit typically with a large scatter around 1922); but despite nearly 100 years of research, most the mean relation; examples are the correlation with of the DIB carriers remain unidentified. The only no- E(B −V), or the λ5780 DIB strength with N(H I) or table exception is the recent identification of four DIBs asduetoC+ (Campbelletal.2015;Walkeretal.2015). thecolumndensitiesofotherinterstellarspecies(seee.g. 60 Herbig 1975, 1993, 1995; Krel(cid:32)owski et al. 1999; Welty This identification is in line with the general consensus etal.2006;Friedmanetal.2011;Lanetal.2015;Baron that the DIB carriers are highly stable, carbonaceous, etal.2015). Aparticularlyintriguingfindingisasubset gas-phase molecules. of DIBs (the so-called “C -DIBs”) that roughly corre- An identification of DIBs with specific carriers re- 2 latewithN(C )andarethoughttobechemicallyrelated quires a perfect match between laboratory spectra and 2 to C or else form under similar conditions (Thorburn astronomical observations; however, given the countless 2 et al. 2003). Modern surveys have confirmed such re- numbers of possible carrier candidates, this is not an lations for averaged DIB strengths on large scales, and easy task. To guide these laboratory efforts, obser- have furthermore also shown that much of the scatter vational studies aim to learn about the nature of the can be traced back to differences in the amounts of H carriers and constrain the set of possible species. Two 2 relative to H I in the line of sight (Herbig 1993; Lan types of such studies that are particularly relevant for et al. 2015). this paper are correlation studies – either mutual DIB Part of the scatter in these correlations must thus correlations, or correlations between the DIBs and line be due to changes in the physical environment that of sight properties (see e.g. Seab & Snow 1984; Herbig drive the carrier abundances. Indeed, the DIBs ex- 1993; Cami et al. 1997; McCall et al. 2010; Friedman hibit clear environmental behavior, and show intensity et al. 2011) – and research into the environmental be- variations that could be explained for instance by ion- havior of the DIBs (e.g. Jenniskens et al. 1994; Cami ization or (de-)hydrogenation (Jenniskens et al. 1994; et al. 1997; Sonnentrucker et al. 1997; Cox et al. 2006). Cami et al. 1997; Sonnentrucker et al. 1997). Interest- The basic idea behind pairwise correlations is simple: ingly, variousparametershavebeenshowntobeindica- If two DIBs arise from the same state in the same car- tive of these environmental conditions. The strength rier,theyshouldhavethesamestrengthratioinalllines ratio between two strong DIBs, λ5797 and λ5780, is of sight and thus, their equivalent widths (EWs) should highly variable and a good indicator of local conditions exhibitaperfectcorrelation. Observationalstudieshave (Krelowski et al. 1997). Using this ratio, diffuse clouds notfoundtwoDIBsthatshowsuchaperfectcorrelation. can typically be subcategorized into two groups – σ The best case is the λ6196 and λ6614 DIBs which cor- and ζ type clouds, named after their prototypes σ Sco relate well in a large sample of sightlines (correlation (HD 147165) and ζ Oph (HD 149757). σ clouds have coefficient r of 0.986; see McCall et al. 2010). How- lower W(λ5797)/W(λ5780) ratios and are characterized ever, even these two DIBs show quite different behavior by stronger UV exposure; ζ clouds, on the other hand, in the remarkable sightline towards Herschel 36 imply- probe deeper layers of diffuse clouds where material is ing that they are most likely originating from different sheltered from UV radiation, and this causes a much carriers (Dahlstrom et al. 2013; Oka et al. 2013). This larger W(λ5797)/W(λ5780) ratio while simultaneously has led to the “one DIB, one carrier” paradigm (Her- affecting the dust properties (Cami et al. 1997). big 1995; Cami et al. 1997; Snow 2014). At the same The large number of DIBs coupled with the lack of time, the notion of DIB “families” can be established: strongcorrelationssuggestthattheDIBscarryanenor- sets of DIBs that correlate fairly well with one another mous diagnostic potential to study the environments in and that might have similar or chemically related carri- which they reside. At the same time, it raises the ques- ers(Krelowski&Walker1987;Camietal.1997). There tion of what factors drive these variations in the DIB are two important caveats though in correlation stud- strengths, and how it is possible that there is such a ies. First, correlation studies generally include only a A PCA of the DIBs 3 lack of correlations in such a large collection of spectral this sample thus ensures a consistent treatment of the lines. Here, we address these questions, and in particu- requiredauxiliaryline-of-sightdataandallowsustoen- lar the key question: how many parameters do we need sureawidecoverageofenvironmentalconditions(tothe to explain the variations in the DIB spectrum and what extent that they can be traced by any of these parame- are those parameters? ters). Tothisend,wepresentamultivariateanalysisofaset WesearchedtheVLT/UVESandELODIEarchivesto of strong and clean DIBs with several line of sight pa- find good-quality, high-resolution spectra of these tar- rameters. In a proof-of-concept study, we first perform gets. UVES (the Ultraviolet and Visual Echelle Spec- a principal component analysis (PCA) on the data to trograph) is a high-resolution instrument on the VLT find out how many parameters are required to describe coveringwavelengthrangesfrom3000-4000˚Aand4200 the observed variations among the DIBs. We physi- - 11,000˚A, with maximum resolutions of 80,000 and callyinterpretthesenewparametersandfindconvenient 110,000, respectively (Dekker et al. 2000). ELODIE is quantitative alternatives to represent these parameters. an echelle spectrograph on the 1.93m telescope at the From this work, the huge diagnostic potential of the Observatoire de Haute-Provence in France. ELODIE DIBs becomes clear: since DIBs are products of their has a resolution of 42,000 and covers the wavelength environments, we can use DIBs to determine physical range from 3906 - 6801˚A (Moultaka et al. 2004). We parameters of their environment – even without identi- foundappropriatedatafor91oftheJenkinstargets: 43 fying the carriers. targets in the UVES database; the remaining 48 from This paper is organized as follows. In Section 2, we the ELODIE database. In a few rare cases, parts of the describe our sample selection and how we acquired our spectrum would be of too low quality, or simply miss- data. We then describe PCA in Section 3, followed by ing from the data, and in those cases we supplemented our results in Section 4. We interpret the results in our data with archival spectra from the ESPaDOns Section 5 and present our conclusions in Section 6. (Echelle spectropolarimetric device for the observation ofstars)instrumentontheCanada-France-HawaiiTele- 2. DATA, OBSERVATIONS & METHODS scope (CFHT) – a bimodal instrument with maximum resolutionsof68,000and81,000foritsspectropolarimet- 2.1. Target Selection ricandnon-polarimetricmodes, respectively, coveringa Our goal in this paper is to determine the parameters wavelength range of 3700 - 10,000˚A (Donati 2003). that drive the variations in the DIB spectrum. Thus, Tofurtherselectonlysingle-cloudlinesofsight,weex- we need to select a sample of sightlines where physical amined the interstellar NaI D lines at 5890 and 5895˚A conditionsarereasonablywelldetermined,andthatrep- and the K I lines at 7665 and 7699˚A (see e.g., Bhatt resent the overall observed variations of the DIB spec- & Cami (2015), illustrated in Figure 1). For our cur- trum. The first requirement implies that we should re- rent study, we consider a sightline to be a single cloud strict ourselves as much as possible to single-cloud lines if these interstellar lines show only one dominant com- ofsighttoavoidhaving todeal withill-definedaverages ponent at the spectral resolution of UVES or ELODIE. throughout multiple intervening clouds. The second re- Thus,atargetwasstillconsideredtobeasinglecloudif quirement stresses the importance of including lines of there were multiple radial velocity components, but one sightthatareasobservationallydifferentaspossible. Fi- component had significantly stronger features than the nally,inordertobeabletorelateanychangestoknown others. For example, HD 149757 is known to have two observables,weneedtopicklinesofsightforwhichaux- strongradialvelocitycomponentsatapproximately−27 iliary data (e.g. hydrogen column densities, E(B−V), and −15 km s−1, but the one at −15 km s−1 has much extinction properties, ...) are known and available. In larger column densities (Herbig 1968, see also Fig. 1). thispilotstudy,wechosetorestrictourselvestoinclude ForthetargetswithonlyELODIEspectra,theKIlines only a limited number of DIBs, and to lines of sight for are outside the available wavelength range, and thus we which we can find high-resolution spectra that allow us could only use the Na I D lines to inspect the number to exclude possible blends with stellar lines. of radial velocity components. An obvious and inherent We started our selection of targets from the thorough drawback of using these lines is that they are easily sat- and detailed study of elemental depletion in the lines urated. We searched for Ca I and CH lines too, but in of sight towards 243 stars published by Jenkins (2009). most cases, these were too weak to be seen. Thisstudycriticallyreviewsavailableliteraturedatafor With this exercise, we established that 33 of our tar- E(B-V), N(H I), N(H ), N(H), and introduces a deple- 2 getscanbeidentifiedassingle-cloudlinesofsight. How- tion strength factor, F , describing the collective level (cid:63) ever, for three of them, the UVES archival spectra have of elemental depletion in a line of sight. Starting from 4 Ensor et al. HD22951 1.0 00..68 Flux 0.4 0.2 K I at 7699Å 0.0 1.0 0.8 5789 5793 5797 5801 5805 5795 5799 5795 5799 malized flux 10000.....00246 K I at 7665Å Fthieguλr5e7927.DIlIluBstfroartitohneotfaWmragveeealtensguHthr D(iÅn)g22t9h5e1e.qu(Liveaflte:n)tIwnidathfeao-f or 0.8 N 0.6 tureless part of the spectrum near the feature, we measured 0.4 the S/N by determining the standard deviation of flux val- 0.2 Na I D at 5895Å 0.0 ues across a straight line. The blue parallel lines show the 1.0 0.8 mean flux value and the ±1σ values. (Center:) We chose a 0.6 point on either side of the feature, and determined a linear 0.4 0.2 Na I D at 5890Å continuumbetweenthetwo(showninred). Thechoseninte- 0.0 −45 −30 −15 0 15 30 45 grationlimitsareshownasdottedlines. (Right:) Thisfigure Velocity (km s(cid:60)1) showsthe1,000instancesofthecontinuumslopesgenerated Figure1. Interstellarlinesforoneofourtargets,HD149757 throughourMonteCarlosimulation,superimposedoverthe in the heliocentric rest frame. Here, the Na I lines show a spectrum. dominantcomponentatapproximately-15kms−1,aswellas a second component that is much weaker near -27 km s−1. ture, if not available. These velocities are listed in col- The Na I lines are somewhat saturated, but the K I lines umn 14 of Table 1. confirm that there is just a single, dominant radial velocity component. This star meets our criterion for a single-cloud 2.2. DIB Measurements line of sight. Note that the feature near +27 km s−1 in the K I (7665 ˚A) plot is a telluric oxygen line and not of For our purposes, we only wanted to include those interstellar origin. DIBs whose equivalent widths can be confidently measured, i.e., with small relative errors. This limits our a gap in the wavelength coverage from approximately selection to fairly strong and often narrow DIBs that 5760-5830˚A and thus two important DIBs, λ5780 and are as much as possible free of contamination from stel- λ5797 are missing. We therefore excluded these targets lar lines. The only exceptions we made was to include from our data set. After these considerations, we were four of the C2 DIBs – λλ4964, 5513, 5546, and 5769 leftwithasampleof30single-cloudlinesofsight. These – despite the latter three being quite weak. We thus targetsarelistedinTable1alongwiththeirlineofsight include a sample of 16 DIBs in our analysis: λλ4428, parameters we will use in this paper (see below). 4964, 5494, 5513, 5545, 5546, 5769, 5780, 5797, 5850, It should be noted that six of the targets in our sam- 6196, 6270, 6284, 6376, 6379, and 6614. ple – HD 23630, HD 24534, HD 110432, HD 149757, We measured the equivalent widths for these 16 DIBs HD164284,andHD202904–areBestars. Suchobjects inalllinesofsight. Mostofthesewerestraightforwardto have an intrinsic E(B-V) value relative to non-Be stars measure, as they are not heavily contaminated by stel- of the same spectral type (Schild 1978; Sigut & Patel lar and/or telluric features. The largest uncertainty on 2013); the hot, circumstellar gas produces excess emis- these measurement stems from establishing the contin- sion in the V filter and therefore, their E(B-V) values uum. To obtain a good estimate of these uncertainties, are over-estimated. Furthermore, any dust existing in we used the following Monte Carlo approach to vary the circumstellar shell (CS) can contribute to E(B-V), the continuum level and perform direct integration of while there is no evidence to suggest that DIBs exist the spectra (similar to that in Bhatt & Cami (2015); inCSenvironments(Krel(cid:32)owski&Sneden1995;Snow& see Figure 2 for illustration). First, we measured the Wallerstein1972;Snow1973). Hence,weexpectweaker- standard deviation of the flux values over a specified, than-normal DIB strengths relative to E(B-V) for these featureless range of data in the vicinity of the feature six targets. to estimate the uncertainty on the flux values – i.e. we Weappliedaheliocentriccorrectiontoalltargets,and measured the signal-to-noise ratio (S/N). Next, we se- then shifted all spectra to their interstellar rest frames. lected a point on each side of the feature and defined a Interstellarvelocitycomponentswereobtainedfromthe continuum baseline by adopting a linear continuum be- literature, or measured from a known interstellar fea- tween those two points. We also selected points as our integrationlimits;notethatforconsistency,weusedthe A PCA of the DIBs 5 sameintegrationlimitsforthesamefeatureinalllinesof the best-fit values and upper or lower envelope values sight. Tosimulatetheprocessofdeterminingthecontin- and determined the uncertainties on EW through error uum,wevariedthetwoselectedcontinuumpoints1,000 propagation. timesbyaddingarandomnumbertothefluxvalues selected from a normal distribution with a mean of zero and standard deviation equal to 1 the measured stan- 3 dard deviation in the featureless continuum; we believe 400 that this represents well how accurately one can posi- tion the continuum (which corresponds to determining themeanfluxintheadjacentcontinuum),andwefound that this produces reasonable continuum estimates (see 300 Figure 2). Using one full standard deviation produces per a p many continuum points which are clearly too high or s hi too low and thus result in unrealistic continuum levels. 0), T200 In essence, we thus simulated the entire process of de- 78 5 (cid:104) termining a continuum line 1,000 times. For each con- W( tinuum, we then measured the equivalent width. The 100 equivalent width we use in this paper is then the mean of these 1,000 measurements, and the standard deviation of these measurements provides the uncertainty. 0 We kept a few precautions in mind when using this 0 100 200 300 400 W((cid:104)5780), Friedman et. al 2011 method. For instance, the sharp and narrow λ5797 is known to be blended with the broader and shallower λ5795. To avoid measuring a contribution from λ5795, wemeasuredtheλ5797whiletreatingtheλ5795ascon- 140 tinuum (see Figure 2). If bad pixels were found within theintegrationrange,thatdatapointwasreplacedwith 120 the average of the neighboring points. For HD 23180, the λ5494 feature was strongly contaminated by a stel- er100 p larline. Wecouldnotresolvethetwofeaturestoobtain pa s a proper measurement, so instead we adopted the value Thi 80 obtainedfromhigherresolutionobservationsbyBondar 14), 66 60 (2012). (cid:104) W( The two broad DIBs in our sample – λ4428 and 40 λ6284 – cannot be measured using the methods de- scribed above, because they are heavily contaminated 20 by stellar and telluric features, respectively. For the λ6284 DIB, we first applied a telluric correction using 0 0 20 40 60 80 100 120 140 molecfit (version 1.1.0) (Smette et al. 2015; Kausch W((cid:104)6614), Friedman et. al 2011 et al. 2015) and then proceeded as for the other DIBs. The λ4428 DIB is extremely broad, and the region Figure 3. A comparison of our measured λ5780 and λ6614equivalentwidthvalues(verticalaxes)tothosefound spannedbythisDIBisplaguedbystellarfeatures. Snow in Friedman et al. (2011) (horizontal axes). The two data etal.(2002)showedthattheintrinsicprofileoftheband sets agree very well; the correlation coefficient between our is Lorentzian, and thus, rather than numerically inte- W(λ5780) values is r = 0.995, while that of W(λ6614) is grating the DIB profile, we preferred to fit a Lorentzian r=0.993. profile to the observations and determine the equivalent widths from the fitted parameters. In addition to We compared our measurements to values found in our best fit, we also determined Lorentzians that repre- the literature whenever possible and found a generally sent the upper and lower envelope of the observed pro- good agreement. For instance, 23 out of our 30 lines files; these then have a different full-width-at-half-max of sight were also studied by Friedman et al. (2011) and (FWHM) and central depth (CD) value (while we kept wefoundaverygoodcorrelationbetweentheirreported the same continuum). We found the difference between EW values and ours (see e.g. Figure 3). 6 Ensor et al. All of our equivalent width measurements are shown molecular hydrogen, f(H ), for each line of sight from 2 in Table 5 in the appendix to this paper. N(H ) and N(H). 2 Similarly, it has long been known that the ratio W(λ5797)/W(λ5780) is somehow related to the physical conditions in a region of space (Krelowski et al. 2.3. Line of Sight Parameters 1997; Cami et al. 1997); more specifically, it is also Tohaveabetterchanceofunderstandingwhatdrives suggested to trace UV exposure, where larger values of the variations in the DIB strengths, we need to include theW(λ5797)/W(λ5780)ratiocorrespondtomoreshel- otherparametersthatoffersomedescriptionofthelines tered (ζ-type) environments. We thus also include the of sight. The Jenkins (2009) study provides a critically W(λ5797)/W(λ5780) ratio as a line of sight parameter reviewed set of measurements that describe the lines of in our study. sight we study here and we reproduce some of their val- Oursampleisquitediverseintermsoff(H ): reported 2 ues in Table 1. f(H ) values in diffuse clouds run from 0 to ∼0.8 (Snow 2 There are two quantities that are related to the dust &McCall2006); oursamplecoversasimilarrangefrom in the line of sight. The amount of dust is traditionally 0 to 0.824. Our sample also covers a broad range of characterized by using the color excess, E(B-V), and is physicalconditions,asitincludesbotharchetypalσ and often used as a normalization factor for DIB strengths. ζ sightlines. For comparison, our W(λ5797)/W(λ5780) We adopt an uncertainty of ±0.02 mag for our E(B-V) values range from 0.10 to 0.65 whereas those calculated measurements. Jenkins(2009)furthermoreshowedthat from EWs in a large sample of 133 stars by Friedman the depletion of different elements in a line of sight can et al. (2011) range from 0.15 to 0.85. Furthermore, bedescribedbyasingleparameter,F ,thatisameasure (cid:63) the boundary between σ and ζ clouds is usually placed forthe“totaldepletion”; individualelementaldepletion around W(λ5797)/W(λ5780)=0.3 and therefore, we in- factors scale with F . Since depletion may play a role (cid:63) clude a large sample of both cloud types. For F we (cid:63) intheformationand/ordestructionoftheDIBcarriers, similarly cover much of the possible range. Jenkins de- we include it here in our analysis. fines F such that a line of sight with minimal deple- (cid:63) Information on the gas in the line of sight stems from tions has a value of 0, and the −15 km s−1 component measurements of hydrogen, in particular the column of HD 149757 – the archetype for large depletions – has densitiesofneutralandmolecularhydrogen,N(HI)and avalueof1.0. OurvaluesforF arefoundbetween0.37 (cid:63) N(H ), respectively. Note that total hydrogen column 2 and1.19. ForE(B-V),thesituationisslightlymorecom- densities N(H) are not listed, but can easily be calcu- plicated. Because we restrict ourselves to single clouds, lated as N(H) = N(H I) + 2N(H ). For HD 23630, a 2 we are biased towards objects of lower reddening. In- reliable N(H I) value could not be obtained by Jenk- deed, our E(B-V) values range from 0.00 to 0.47, while ins; consequently, he uses synthetically derived N(H) it is not uncommon for multiple-cloud sightlines to ex- and F values, which we adopt for this paper. We use (cid:63) ceed 1. In particular, E(B-V) values below 0.08 can thissyntheticN(H)valueandthemeasuredN(H )value 2 be problematic; below this threshold, H does not exist 2 to derive a synthetic N(H I) from N(H) − 2N(H ). For 2 in appreciable amounts (Savage et al. 1977) and many two other targets, HD 27778 and HD 202904, Jenkins DIBs similarly fall below the limit of detection. Within provides only upper limits and best values for N(H I). our sample there are six of these low-E(B-V) targets: We take the lower limit to be zero in both cases. For HD23630,HD24760,HD35715,HD36822,HD143275, N(H), he lists only upper and lower limits. We adopted and HD214680. Studies by Kumar et al.(1982), Feder- N(H) = N(H I) + 2 N(H ) for these targets, best best 2 best manetal.(1984),andKumar(1986)confirmthatstrong as well as synthetic F values. For HD 35149, only an (cid:63) DIBs such as λ5780, λ5797, λ6284, and λ6614 can be upperlimitforN(H )isprovided. Inthiscase,N(H )is 2 2 detected toward such objects. Moreover, Galazutdinov quite small, so Jenkins takes N(H) = N(HI) and calcu- et al. (1998) confirm that W(λ5797)/W(λ5780) is still latesF normally. BecauseN(H )issosmallinthiscase, (cid:63) 2 sufficienttodifferentiatebetweenσ andζ environments, wetakeJenkins’valuetobethebestvalue,setthelower despitetheweakreddening. SomeweakerDIBsmaynot limit equal to zero, and set the upper limit equal to the appear in these poorly reddened objects; in such cases, bestvalue. Ithasbeensuggestedthatf(H )canbeused 2 sightlines having larger E(B-V) values will still be ob- as an indicator for the amount of interstellar UV radi- servable and will dominate the overall trends. ation that can penetrate since H is dissociated when 2 not shielded (see e.g. Cami et al. 1997; Sonnentrucker et al. 1997). This could be an important parameter in our study, and we therefore calculated the fraction of A PCA of the DIBs 7 g a E E E E E E E E E E E E E E E E E E E E E E m ta rce DI DI DI DI DI DI DI DI DI ES DI DI DI DI ES ES ES DI ES ES ES DI ES DI DI DI DI DI DI DI 2 Da Sou ELO ELO ELO ELO ELO ELO ELO ELO ELO UV ELO ELO ELO ELO UV UV UV ELO UV UV UV ELO UV ELO ELO ELO ELO ELO ELO ELO ±0.0 949) of (1 aRef 1 1 2 2 2 5 2 1 2 2 1 1 3 2 3 2 2 2 2 2 2 1 6 2 4 2 2 1 1 1 nty ams ai d t A ] r avISM −1kms -9.58 12.47 13.45 16.76 14.54 14.5 7.06 11.2 15.22 24.09 25.2 25.53 25.2 15.29 6.8 -10.90 -8.95 -8.49 -6.26 -8.02 -14.98 -15.32 -12.9 -10.04 -12.90 -15.28 -11.39 -9.2 -9.44 -12.65 nunce 7);(6) [ a 0 2 2 4 4 2 4 2 1 3 4 4 4 4 4 1 2 1 1 1 3 4 2 1 1 5 1 1 2 2 1 me (20 W(λ5797)W(λ5780) 0.30±0.0 0.35±0.0 0.65±0.0 0.16±0.0 0.55±0.0 0.62±0.0 0.18±0.0 0.26±0.0 0.43±0.0 0.20±0.0 0.10±0.0 0.19±0.0 0.48±0.0 0.20±0.0 0.25±0.0 0.19±0.0 0.11±0.0 0.18±0.0 0.13±0.0 0.27±0.0 0.50±0.0 0.15±0.0 0.26±0.0 0.24±0.0 0.13±0.0 0.53±0.0 0.31±0.0 0.34±0.0 0.17±0.0 0.28±0.0 weassu eretal. e k r c F(cid:63) 0.37±0.09 0.73±0.05 0.84±0.06 0.89±0.10 0.88±0.05 0.90±0.06 0.68±0.04 0.83±0.02 1.19±0.07 0.54±0.11 0.66±0.11 0.74±0.08 0.57±0.04 0.49±0.04 1.17±0.11 0.90±0.03 0.81±0.02 0.80±0.11 0.76±0.06 1.09±0.08 1.05±0.02 0.89±0.18 1.02±0.11 0.81±0.05 0.39±0.11 0.90±0.03 0.57±0.26 0.50±0.06 0.68±0.10 0.60±0.06 2009)whe Sonnentru ( ) s 5 6 n ( f(H)2 +0.090.22−0.06 +0.270.35−0.18 +0.330.51−0.24 +0.230.28−0.15 +0.460.59−0.31 +0.130.76−0.11 +0.250.21−0.14 +0.210.35−0.15 +0.450.82−0.27 +0.000.02−0.02 −4±2×10 +0.040.06−0.03 +0.040.04−0.02 +0.100.12−0.07 +0.130.55−0.11 +0.030.03−0.02 +0.020.10−0.02 +0.080.12−0.07 +0.040.05−0.03 +0.090.15−0.07 +0.200.63−0.17 +0.180.25−0.20 +0.220.58−0.18 +0.270.42−0.20 +0.120.11−0.10 +0.050.28−0.05 +0.120.16−0.09 +0.030.06−0.03 +0.100.13−0.07 +0.190.24−0.13 fromJenkigen. al.(1994); Table1.Basictargetdata. DECVE(B-V)N(H)N(H)I2 21−220−22000][10cm][10cm] +0.57+0.263257.67.860.241.291.86−0.40−0.12 +0.35+1.485754.14.980.191.102.88−0.32−0.98 +0.26+1.641717.73.860.220.763.98−0.23−1.16 +0.10+0.180618.52.870.050.220.35−0.07−0.12 +0.06+2.405301.12.880.270.634.68−0.07−1.59 +0.08+0.800245.06.100.310.548.32−0.07−0.73 +0.05+0.270036.82.900.070.250.33−0.05−0.15 +0.26+1.404727.74.040.261.293.39−0.24−0.99 +0.55+1.061803.66.330.340.225.25−0.22−0.88 +0.12+0.003240.05.000.080.430.03−0.13−0.03 +0.13−60544.44.600.030.316±2×10−0.13 +0.13+0.092922.54.400.070.650.21−0.12−0.06 +0.16+0.085603.03.300.100.600.13−0.16−0.05 +0.16+0.315714.14.820.100.790.54−0.15−0.20 +0.29+0.420331.05.320.390.714.37−0.21−0.38 +0.29+0.153718.12.290.001.410.26−0.29−0.09 +0.12+0.104819.62.620.181.230.68−0.11−0.09 +0.56+0.322738.54.130.201.170.78−0.59−0.23 +0.90+0.253534.02.910.312.190.62−0.87−0.18 +0.98+1.532648.75.020.374.273.72−0.80−1.09 +0.02+0.903401.52.580.290.524.47−0.04−0.75 +0.23+0.292207.04.780.110.420.71−0.39−0.21 +0.59+1.474745.05.760.381.077.24−0.47−1.22 +0.84+3.060650.94.860.432.047.41−0.63−2.17 +0.21+0.075348.84.430.090.230.14−0.23−0.05 +0.59+0.652738.05.960.473.396.76−0.50−0.59 +0.41+0.621647.35.110.271.291.20−0.38−0.41 +0.14+0.050301.04.880.080.500.17−0.15−0.04 +0.20+0.221331.65.230.060.580.43−0.18−0.14 +0.28+0.732511.14.840.160.891.41−0.26−0.48 AD.V,E(B−V),N(H),N(H)andFvaluesareI2(cid:63)/[N(H)+2N(H)]isthefractionofmolecularhydro2 thedominantinterstellarcomponent. Hobbs(2001);(3)Bhatt&Cami(2015);(4)Weltyet [J +52 +33 +32 +24 +31 +31 +40 +35 +24 +03 +03 +09 +09 +25 −63 −22 −19 −19 −25 −23 −10 +04 −10 +46 +34 +62 +62 +39 +40 +59 MBH)2 yof & 0] 9.81 2.65 9.13 9.08 7.92 3.08 1.23 7.90 9.76 0.00 0.23 9.24 8.28 9.66 0.27 0.01 6.23 9.74 1.32 5.10 9.54 5.80 5.69 6.29 5.08 3.28 8.79 5.68 8.65 6.82 romSI=2N( velocit Welty RA [J200 02275 03422 03441 03472 03540 03552 03575 03585 04235 05225 05265 05344 05350 05575 12425 16002 16052 16115 16211 16253 16370 18001 18312 20485 21175 21445 22050 22391 22412 23063 takenff(H)2 tothe per;(2) Alt. Name 40Per oPer ηTau ζPer XPer (cid:15)Per ξPer 62Tau 23Ori ΨOri 1ϕOri λOriA 139Tau 2BZCru 5δSco 17βSco 2νSco 5σSco 3ρOphA 7ζOph 466Oph 0 855Cyg 4υCyg 8 519Cep 010Lac 312Lac 61Cas andDECareB−V)values. referencerefer s—(1)Thispa 7 1 0 0 8 4 0 2 8 9 5 2 1 1 3 7 1 0 6 3 5 8 4 7 0 9 7 8 9 7 A( d e Target HD1513 HD2295 HD2318 HD2363 HD2439 HD2453 HD2476 HD2491 HD2777 HD3514 HD3571 HD3682 HD3686 HD4011 HD1104 HD1432 HD1442 HD1455 HD1471 HD1479 HD1497 HD1642 HD1707 HD1984 HD2029 HD2071 HD2099 HD2146 HD2149 HD2183 Note—RontheEaValuean Referenc 8 Ensor et al. 3. PRINCIPAL COMPONENT ANALYSIS ence frame, the largest amount of variance in the data occurs along the axis defined by the first PC, and the Several parameters are known to contribute to DIB second PC accounts for as much of the remaining varia- strength;however,mostoftheseparametersareinterre- tion in the data as possible, with the constraint that it lated (e.g., a larger E(B-V) implies larger column den- is orthogonal to PC . Each successive PC has the same sities of all species). For this work, we want to extract 1 properties: they are linear combinations of the original a set of uncorrelated parameters that best describe ob- variables, and each accounts for as much of the remain- served DIB strengths. To achieve this goal, we perform ingvariationaspossible,whilebeingorthogonalto(and a principal component analysis (PCA). thus, uncorrelated with) all previous PCs. PCA is a statistical technique used to analyze high- The entire coordinate transformation can be written dimensional data. The objectives of PCA are two-fold: in matrix notation: (1) to reduce the number of dimensions in a data set, and (2) to identify hidden patterns in a set of data. Y =AX (3) We will primarily focus on the former in this analysis. where X is an n×m matrix containing the original set For derivations of PCA see Pearson (1901) or Hotelling of data; A is the n×n transformation matrix; and Y is (1933); formodernreviews,seeAbdi&Williams(2010) ann×mmatrixcontainingthetransformeddatapoints or Jolliffe (2014); for uses of PCA in astronomy see e.g. in the new reference frame defined by the set of PCs1. Yip et al. (2004); Suzuki (2006); Conselice (2006); Bu- The PCA obtains the transformation matrix A from davári et al. (2009); Pâris et al. (2011). the eigenvalues (λ , λ , ..., λ ) of the covariance ma- InthisSection,wefirstdefinesomeoftheterminology 1 2 n trix of the original variables. The rows of this matrix and notation associated with PCA and provide a basic A are the corresponding eigenvectors; each eigenvector mathematical overview of the process. We then discuss describing the projections of the original variables onto some of the problems and underlying assumptions as- the new coordinate system defined by the PCs. The sociated with PCA, and how we address each of them. eigenvalues are furthermore ordered such that λ is the Finally, we provide a simple example using two vari- 1 largest and λ is the smallest with (cid:80)nλ = n. Each ables to better illustrate our methodology and facilitate n i i eigenvalue indicates how many original variables a sin- the interpretation of our results. gle PC can account for, i.e., an eigenvalue of 2 indicates 3.1. Definitions and Terminology that the corresponding PC carries the same weight as The starting point of our analysis is a set of n vari- two of the original variables. ablesforwhichwehavemmeasurementsx ;inourcase, One of they key features of a PCA is the ability to i we use n = 23 variables (representing DIB equivalent reduce the number of dimensions in a data set. In- widths and line of sight parameters) that are measured deed, it is often the case that we can use the results for m=30 lines of sight. Our measurements thus span of a PCA to accurately describe the variation in a data ann-dimensionalparameterspace–eachdimensioncor- set with a reduced number of variables. As such, PCs respondingtoadifferentvariable–andafullsetofmea- that account for very little variation are often ignored, surementsforasinglelineofsightcanberepresentedby and only the leading p components are kept. Selecting an n×1 vector in this parameter space. Starting from a value for p is somewhat arbitrary, but the decision is thissetofn possiblycorrelated(andthusnotnecessarily usually based on one of three criteria: (1) once a cer- orthogonal) variables, a PCA finds a coordinate trans- tainamountofvariationisaccountedfor(e.g.,90%),all formation that casts the variables in terms of a new, further PCs are ignored; (2) only PCs with eigenvalues orthogonal, n-dimensional reference frame: greater than one are kept (representing variables that can replace more than one of the original variables); or, y nˆ(cid:48) =a x nˆ +a x nˆ +...+a x nˆ (1) i i i,1 1 1 i,2 2 2 i,n n n (3) if an “elbow” (a sharp, sudden drop-off) is noticed where the nˆ represent unit vectors in the original pa- in a so-called “screeplot” (i.e., a plot of eigenvalue vs. i rameter space and nˆ(cid:48) unit vectors in the new reference componentnumber, seee.g. Fig.6), allPCsbeyondthe i frame – these are the so-called principal components elbow are ignored (Jolliffe 2014). If p components are (PCs). The coefficients, a , are constrained such that kept, then the last n−p rows of A are eliminated, and i,j their squared sums equal one: Ybecomesap×mmatrix. Atthatpoint,weeffectively express the original n variables with only p new vari- a2 +a2 +...+a2 =1 (2) i,1 i,2 i,n The coordinate transformation is chosen such that 1 NotethatinthePCAanalysisbySuzuki(2006), theirEq.1 when representing the measurements in the new refer- correspondstotheinverseofourEq.3. A PCA of the DIBs 9 ables, while keeping as much of the variance in the data the value was replaced with zero. For each of these new as desired. datasets,westandardizedthevariables2andperformed a separate PCA. The result is 1,000 unique transforma- 3.2. Treatment of Data tionmatrices. Foreachentryinthetransformationma- Thevariablesxi thatwewanttoincludeinouranaly- trix, we found the mean (a¯i,j) and standard deviation sis are listed in column 1 of Table 2. Since PCA is con- (sai,j) among the 1,000 unique matrices. Then we com- cernedwithlinearcombinationsofvariables,itisredun- puted the upper and lower limit on each value in the dant to include N(H) in addition to N(HI) and N(H2); matrix A according to a¯i,j ±sai,j. We compared these however, whereas N(H I) and N(H ) measure the col- upper and lower limits to the original, unperturbed re- 2 umn densities of individual species, N(H) approximates sultsandultimately,wereleftwithpositiveandnegative thetotalamountofgasinthelineofsight. Recognizing errors on each entry in the transformation matrix (i.e., this distinction, we choose to keep N(H) in addition to each component of each eigenvector). We applied the N(HI) and N(H ). same methods to obtain uncertainty measurements on 2 There are a few underlying assumptions associated each entry in Y, the matrix of transformed data points. with PCA, which we will discuss. The first is that A third assumption associated with PCA is that each raw data is comparable in units and magnitude; other- variable follows a normal distribution. We tested our wise,theresultsofPCAwillbemorestronglyinfluenced variables for normality and found that in some cases, by variables that have larger variances (i.e., variations the distributions were not normal. The assumption of among N(H) values which are on the order of 1021 will normality,however,isnotcriticalandtheresultsofPCA completely dominate those of E(B-V), which are on the should still be valid if this condition is not met, so we orderof10−1). Toaddressthisproblem,westandardize decided to continue with the analysis. each variable prior to performing PCA: Finally, the results of PCA can be sensitive to out- liers. Despite including some unusual targets (e.g., x −x¯ z = i,j i (4) HD 147933 is found within a dark cloud, and HD 36822 i,j sxi and HD 36861 are found near H II regions), we do not remove any targets from our sample. In order to probe where i refers to the ith variable and j refers to the awiderangeofinterstellarconditions,itisimportantto jth observation. Essentially, we subtract the mean and include these abnormal lines of sight: Any conclusions divide the residuals by the standard deviation, and use made about the DIBs through this analysis should ex- the resulting variables z as input to the actual PCA. i,j tendtothesesightlinesaswell. Theonlypotentialcom- Columns2and3ofTable2showthemeanandstandard plication is for the six Be stars discussed in Section 2.1. deviation of each of our variables, respectively. After TheatypicalE(B-V)valuesarisefromcircumstellarma- this process, the standardized variables (z , z , ... z ) 1 2 23 terialandarethereforenotapartoftheinterstellarDIB each have a mean of zero and a standard deviation of environment. Weacknowledgethattheycouldinfluence one. ourresultsandthereforewedifferentiatethesepointsin AsecondassumptionofPCAisthatallvaluesareac- the plots and analyses that follow. curately known. This assumption is challenged by the fact that our measurements have uncertainties. Since PCA does not consider uncertainties, we need a way to quantifyhowreliableourresultsare. SincethePCsrep- resent vectors in a multi-dimensional parameter space, this is not straightforward. We therefore used a Monte Carlo (MC) simulation to 3.3. A Simple 2D Example estimate the uncertainties on our results. We first per- It is insightful to illustrate our methodology by performed PCA using our measured values. These results forming a PCA on a simple, two-dimensional dataset; provide the PCs that we discuss in the sections that we therefore performed a PCA analysis using only the follow. Next, we generated 1,000 perturbed data sets, measurementsforthevariablesE(B-V)andN(H).Using by adding random noise to each observation, selected the notation provided in Table 2, we denote these vari- from a normal distribution with a mean of zero and a ables x and x and their corresponding standardized standard deviation corresponding to the error bar on 17 20 eachmeasurement. Bydefinition, noneofourmeasured quantities can be negative, so if a perturbed measure- 2 Notethatwerecalculatedthemeansandstandarddeviations ment was negative after the addition of random noise, foreachperturbeddatasetbeforestandardizingthevariables. 10 Ensor et al. Table2. InputvariablesusedinthePCA,andtheir Table 3. Principalcomponents, eigenvaluesand relative mean values and standard deviations. importanceofeachPCfora2DexampleinvolvingE(B-V) and N(H). Variable Mean Standard PC Eigen- % Cumulative Eigenvector Name Deviation value Variation % (x ) (x¯ ) (s ) i i xi 1 1.813 90.63 90.63 (0.707, 0.707) x =W(λ4428) 645.9 350.9 1 2 0.187 9.37 100.00 (0.707, -0.707) x =W(λ4964) 6.8 5.7 2 x =W(λ5494) 5.6 4.1 3 x =W(λ5513) 3.9 3.9 4 x =W(λ5545) 5.9 4.3 5 Fromthis,thePCAresultsinthefollowing2×2trans- x6 =W(λ5546) 2.8 2.2 formation matrix A: x =W(λ5769) 2.6 2.2 7     a a 0.707 0.707 x8 =W(λ5780) 131.5 77.0 A= 1,1 1,2=  (6) x9 =W(λ5797) 37.7 27.8 a2,1 a2,2 0.707 −0.707 x10 =W(λ5850) 15.6 13.6 as well as the 2×30 matrix, Y, containing the trans- x =W(λ6196) 13.5 8.3 formed data points: 11 x =W(λ6270) 20.7 13.9 12     x13 =W(λ6284) 149.4 84.1 PC1 0.279 0.011 ... −0.437 Y = =  (7) x =W(λ6376) 9.3 7.4 14 PC 0.107 −0.161 ... −0.034 2 x =W(λ6379) 24.7 18.0 15 Note that we use the notation PC to describe the x =W(λ6614) 52.5 35.2 i 16 measurement values in the new coordinate frame de- x17 = E(B-V) 0.20 0.13 scribed by the PCs. That is, PC = y in Eq. (1). The i i x18 = N(HI) 1.0×1021 9.2×1020 results pertaining to the PCs for this example are sum- x = N(H ) 2.4×1021 2.6×1020 marizedintable3. Column1liststhePCnumber,while 19 2 Column2showstheeigenvalueassociatedwiththatPC. x = N(H) 1.5×1021 1.2×1021 20 Since there are two original variables, the eigenvalues x21 = f(H2) 0.27 0.23 sumto2. Thefractionofthetotalvariationinthedata x = F 0.75 0.21 for which each PC is responsible is shown in Column 3, 22 (cid:63) x = W(λ5797) 0.2907 0.1567 followed by the cumulative fraction in Column 4. The 23 W(λ5780) (unit length) eigenvectors are shown in Column 5. Note—Entries for i=1,2,...16 refer to equivalent Figure 4 illustrates these results in two ways. In the widths of named features in m˚A. Units for the top panel, we show the original data set along with the remaining entries are those specified in Table 1. PC vectors (i.e., the eigenvectors corresponding to PC Standardized variables z correspond to the x , 1 i i and PC ). From this figure, it is clear that the vector e.g. z isthestandardizedequivalentwidthofthe 2 8 PC liesinthedirectionofmaximumvariance. Fromta- λ5780 DIB. 1 ble3,weseethatinfactalmostallofthevariationinthe dataset(>90%)isaccountedforbyPC . Thisindicates 1 thatasinglevariablecanadequatelydescribevariations among E(B-V) and N(H) – an interstellar finding that formsasz andz . The2×30matrixof(standardized) 17 20 iswellknown(e.g.Bohlinetal.1978,seealsodiscussion measurements X from Equation 3 is then: below). In the bottom panel of Figure 4, we illustrate the results in the form of a so-called “biplot”. Here, we show thetransformeddatapoints(i.e.,thoseofmatrixY)and     z17 0.273 −0.106 ... −0.333 the projections of the original variables onto the PC1- X =  =  (5) PC plane. This format is much more convenient for 2 z 0.121 0.121 ... −0.285 20

A Principal component analysis of the diffuse interstellar bands PDF

2.4 MB·

by Tiffany Ensor

#journals #arxiv

Checking for file health...

Download

Upgrade Premium

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Download A Principal component analysis of the diffuse interstellar bands PDF Free - Full Version

by Tiffany Ensor| 2.4

Download A Principal component analysis of the diffuse interstellar bands by Tiffany Ensor in PDF format completely FREE. No registration required, no payment needed. Get instant access to this valuable resource on PDFdrive.to!

Free Download PDF

About A Principal component analysis of the diffuse interstellar bands

No description available for this book.

Detailed Information

Author:	Tiffany Ensor
File Size:	2.4
Format:	PDF
Price:	FREE

Download Free PDF

Safe & Secure Download - No registration required

Why Choose PDFdrive for Your Free A Principal component analysis of the diffuse interstellar bands Download?

100% Free: No hidden fees or subscriptions required for one book every day.
No Registration: Immediate access is available without creating accounts for one book every day.
Safe and Secure: Clean downloads without malware or viruses
Multiple Formats: PDF, MOBI, Mpub,... optimized for all devices
Educational Resource: Supporting knowledge sharing and learning

Frequently Asked Questions

Is it really free to download A Principal component analysis of the diffuse interstellar bands PDF?

Yes, on https://PDFdrive.to you can download A Principal component analysis of the diffuse interstellar bands by Tiffany Ensor completely free. We don't require any payment, subscription, or registration to access this PDF file. For 3 books every day.

How can I read A Principal component analysis of the diffuse interstellar bands on my mobile device?

After downloading A Principal component analysis of the diffuse interstellar bands PDF, you can open it with any PDF reader app on your phone or tablet. We recommend using Adobe Acrobat Reader, Apple Books, or Google Play Books for the best reading experience.

Is this the full version of A Principal component analysis of the diffuse interstellar bands?

Yes, this is the complete PDF version of A Principal component analysis of the diffuse interstellar bands by Tiffany Ensor. You will be able to read the entire content as in the printed version without missing any pages.

Is it legal to download A Principal component analysis of the diffuse interstellar bands PDF for free?

https://PDFdrive.to provides links to free educational resources available online. We do not store any files on our servers. Please be aware of copyright laws in your country before downloading.

The materials shared are intended for research, educational, and personal use in accordance with fair use principles.