ebook img

Using machine learning to classify the diffuse interstellar bands PDF

8.3 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Using machine learning to classify the diffuse interstellar bands

Mon.Not.R.Astron.Soc.000,000–000(0000) Printed5May2015 (MNLATEXstylefilev2.2) Using machine learning to classify the diffuse interstellar bands Dalya Baron1(cid:63), Dovi Poznanski1 , Darach Watson2, Yushu Yao3, 5 † 1 Nick L. J. Cox4,5, & J. Xavier Prochaska6 0 2 1School of Physics and Astronomy, Tel-Aviv University, Tel Aviv 69978, Israel. 2Dark Cosmology Centre, Niels Bohr Institute, University of Copenhagen, Juliane Maries Vej 30, DK-2100 Copenhagen, Denmark. y 3Lawrence Berkeley National Lab, 1 Cyclotron Road, Berkeley CA 94720, USA. a 4Universite´de Toulouse, UPS-OMP, IRAP, 31028, Toulouse, France. M 5CNRS, IRAP, 9 Av. colonel Roche, BP 44346, F-31028 Toulouse, France. 6Department of Astronomy and Astrophysics & UCO/Lick Observatory, University of California, Santa Cruz, CA 95064, USA. 2 ] A 5May2015 G . ABSTRACT h p - Using over a million and a half extragalactic spectra from the Sloan Digital Sky o SurveywestudythecorrelationsoftheDiffuseInterstellarBands(DIBs)intheMilky r Way. We measure the correlation between DIB strength and dust extinction for 142 t s DIBs using 24 stacked spectra in the reddening range E(B V) < 0.2, many more a − linesthaneverstudiedbefore.MostoftheDIBsdonotcorrelatewithdustextinction. [ However, we find 10 weak and barely studied DIBs with correlations that are higher 2 than 0.7 with dust extinction and confirm the high correlation of additional 5 strong v DIBs. Furthermore, we find a pair of DIBs, 5925.9˚A and 5927.5˚A, which exhibits 1 significant negative correlation with dust extinction, indicating that their carrier may 3 be depleted on dust. We use Machine Learning algorithms to divide the DIBs to 6 spectroscopicfamiliesbasedon250stackedspectra.Byremovingthedustdependency 4 we study how DIBs follow their local environment. We thus obtain 6 groups of weak 0 DIBs, 4 of which are tightly associated with C or CN absorption lines. . 2 1 0 Key words: ISM:general–ISM:linesandbands–ISM:molecules–dust,extinction 5 – surveys – techniques: spectroscopic – astrochemistry 1 : v i X 1 INTRODUCTION et al. (1985) suggested fullerenes. Motylewski et al. (2000), r Krel(cid:32)owski et al. (2010) and Maier et al. (2011) suggested a The diffuse interstellar bands (DIBs) are unidentified ab- differenthydrocarbonmoleculesandions(HC N+,HC H+, sorption features found in optical and near infrared stel- 5 4 and linear C H respectively). lar spectra. Since their discovery in 1919 by Heger (1922) 3 2 and the establishment of their interstellar medium (ISM) SincethehundredsofknownDIBsareunlikelytocome origin (Merrill 1936; Merrill & Wilson 1938) they remain from a single carrier (see for example Herbig & Leka 1991), an astronomical mystery (see reviews by Herbig 1995 and searches for correlations between DIBs are carried out with Sarre 2006). Hundreds of DIBs have been found so far and the purpose of revealing sets of DIBs that may come from many carriers have been proposed. Polycyclic aromatic hy- thesameorsimilarcarrier(seereviewsbyHerbig1995and drocarbons (PAHs) have been suggested by Salama et al. Sarre2006).Weselaketal.(2004)foundthatthestrengthof (1999),Zhouetal.(2006)andLeidlmairetal.(2011),Kroto the5797.1˚A DIBcorrelatestightlywithCHcolumndensity. Ehrenfreund et al. (2002) and Cox & Patat (2008) proved the existence of DIBs and molecular features in the spectra (cid:63) [email protected] of stars in the Magellanic Clouds and M100, respectively. † [email protected] Ka´zmierczaketal.(2009)havefoundacorrelationbetween (cid:13)c 0000RAS 2 Baron et al. the full width half maximum (FWHM) of the 6196˚A DIB Poznanski,Prochaska&Bloom(2012)andBaronetal. andtherotationaltemperatureoftheC molecule.Weselak (2015) (hereafter paper1) use millions of SDSS spectra of 2 et al. (2014) found a correlation between the 4964˚A DIB galaxies and quasars to study the properties of the MW and CH and between the 6916˚A DIB and CH+. NaID absorption doublet and DIBs. They show that even The concept of DIBs families was first described by thougheveryspectrumhasalowSNR,bybinningthespec- Krelowski&Walker(1987)andwassincedevelopedbyoth- tra in large numbers (typically more than 5000) one can ers. Some of the studies compute the correlation of equiv- detect the strongest DIBs and measure their EW with a alent width (EW) or central depth (CD) for pairs of DIBs 17%-20%accuracy.Hereweusethecentraldepth(thedepth (Chlewickietal.1986;Josafatsson&Snow1987;Camietal. at central wavelength, CD) of DIBs instead of the EW and 1997; Moutou et al. 1999; Weselak et al. 2001; Cox et al. studythecorrelationofDIBsandreddening.Weshowthat 2005;Wszolek2006;Bryndal&Wszol(cid:32)ek2007;McCall,Dros- this method is better suited for our low SNR data and re- back & Thorburn 2010; Xiang, Liu & Yang 2012), other sultsincorrelationuncertaintiesthatare2–3timessmaller. studies use the correlation with color excess or UV radia- As observing capabilities increase, hundreds of DIBs tiontoachievethetask(Chlewickietal.1986;Krelowski& needtobestudiedinhundredsorthousandsofspectra.This Walker 1987; Josafatsson & Snow 1987; Krelowski & West- makesgroupingandclassificationviatraditionalmethodsan erlund 1988; Vos et al. 2011; Kos & Zwitter 2013). More- intractable problem. Machine Learning was specifically de- over, DIBs with complex structure were studied and their signedtorevealpatternsinlargedatasetsandreducethedi- structureswerecompared(Jenniskens&Desert1993).DIBs mensionality of such problems. We construct an algorithm strength is the result of a combination of the properties of that divides the DIBs to spectroscopic families based on every physical environment. As the line of sight toward an their pairwise correlations. The core of the algorithm is the object usually samples several intervening clouds, studies Hierarchical Clustering algorithm, which divides objects to typicallyconcentrateonobservingobjectswithonlyonein- groups based on the distance between them, as defined in tervening cloud with a known extinction and UV radiation some space. field (Cami et al. 1997). We present the data and our method in section 2, we then study the correlation between DIBs and dust in sec- It has been proposed that DIBs spectroscopic families tion 3. We develop an algorithm which computes the pair- are composed of few strong and a number of weak DIBs wise correlation matrix of the DIBs and divides them into (Moutou et al. 1999; Fulara & Krelowski 2000). Recent spectroscopic families in section 4, where we also test the studies have divided DIBs into three spectroscopic fami- lies: (1) DIBs 5797.1˚A, 5849.8˚A, 6196.2˚A, 6379˚A and algorithmonknownatomicandmolecularabsorptionlines. 6613.7˚A which have narrow profiles with a substructure, We review and discuss our findings in section 5. We dis- (2) DIBs 5780.6˚A, 6284.3˚A and 6204.3˚A which show no cuss the extensive simulations we perform to quantify our apparent substructure and (3) DIBs 4501.8˚A, 5789.1˚A, detection limits and uncertainties in appendices A and B. 6353.5˚A and 6792.5˚A which are weak DIBs for which a substructure is not certain yet (Josafatsson & Snow 1987; Krelowski&Walker1987;Camietal.1997;Porceddu,Ben- 2 STACKING SPECTRA venuti & Krelowski 1991; Moutou et al. 1999; Wszol(cid:32)ek & God(cid:32)lowski2003).DIBsinthesefamiliescorrelatereasonably OurreferencelistofDIBsincludes197lineswhichwecom- well among the members in the group but not with those pile using the catalogs of Jenniskens & Desert (1994), Jen- from other families (see Cox et al. 2005 for a summary). niskens et al. (1996), Krelowski, Sneden & Hittgen (1995), The pair 6196.2˚A and 6613.7˚A shows a particularly good Hobbs et al. (2008) and Jenniskens et al. (1996). correlation of 0.98±0.18 and was studied by Moutou et al. The SDSS 9th data release (DR9) comprises of extra- (1999)andMcCall,Drosback&Thorburn(2010),although galactic spectra that cover an area of more than one forth Galazutdinov et al. (2002), using high resolution and high of the sky, most of which are at high galactic latitudes signal-to-noise ratio (SNR) spectra, argue that the ratio of (30◦ < b < 90◦). Since SDSS spectra are too noisy and EW of the two is not always constant. havelowspectralresolutiontoshowtheDIBswestackthem Most of the DIBs studies are based on observations of intobinsofthousandsofspectra.Thedetailedstackingpro- a small number of stellar spectra, usually at low Galactic cess is described in paper1. Briefly, we use the galaxies and latitude and high extinction. Several studies have also used quasars with redshift z > 0.005 from the 9th data release extragalacticsources,suchasnearbygalaxiesorsupernovae, of the SDSS which results in more than 1.5M spectra. The to study the DIBs in the Milky-way (MW) (Welty et al. spectra are interpolated to a uniform grid between 3817˚A 2006; Cox et al. 2006; Cordiner et al. 2008, 2011; van Loon and9206˚A withresolutionof0.1˚A.Wethennormalizeeach et al. 2013). In recent years there has been an increase in spectrum using the Savitzky Golay algorithm (Savitzky & the use of large samples (hundreds and more) of objects in Golay 1964) and group spectra by different parameters and ordertodrawmoregeneralconclusionsregardingDIBs(see combine them by calculating the median at every wave- for example Friedman et al. 2011,Xiang, Liu & Yang 2012, length. Kos et al. 2013, Zasowski et al. 2014, Ma´ız Apella´niz et al. ForthecorrelationoffluxwithE(B−V)insection3we 2014, Lan, M´enard & Zhu 2014 and Weselak et al. 2014). group the 1.5M spectra by their E(B−V) values using the (cid:13)c 0000RAS,MNRAS000,000–000 Using ML to classify DIBs 3 dustmapsofSchlegel,Finkbeiner&Davis(1998)asdonein the lack of difference between Pearson correlation coeffi- paper1.Eachofthe24stackedspectraweobtainisbasedon cients and Spearman rank correlation coefficients (which stacking about 70000 spectra, and has a SNR greater than tests for non linear relations) that we measure throughout 240.Forthepairwisecorrelationbetweenabsorptionlinesin ourentireanalysis.Thefluxofanabsorptionlineisgivenby: section4wegroupthespectrabytheircoordinatesandtheir E(B−V) (see paper1 for details) and result in 250 stacks I(λ)=I (λ)e−τ(λ) (2) spectra, each spectrum is a median of 5000 spectra with c a SNR near 180. We test different bin sizes for the latter, Where I (λ) is the flux of the continuum and τ is the c ranging from 1000 to 40000, and find that the results are opticaldepth.Intheopticallythinregimetheopticaldepth consistent. Furthermore, we find that stacking the spectra obeys τ (cid:28)1 and the flux can be expanded: by coordinates only, regardless of the reddening, yields the same results. I(λ)∼=Iλ(c)(1−τ(λ)) (3) The EW in the optically thin regime is proportional to the optical depth: 3 FLUX CORRELATION WITH E(B-V) 3.1 Method (cid:90) ∞ I(λ) EW = 1− dλ∼τ(λ) (4) I (λ) DIB strength is usually represented by the EW of the ab- −∞ c sorptionline(seeforexampleFriedmanetal.2011andKos Assumingthatthefluxofthecontinuumisconstantweob- & Zwitter 2013) or by its CD (Moutou et al. 1999; Wese- tain: lak et al. 2001). Both of the methods have advantages and disadvantages that vary according to the particular study. ρ(EW,E(B−V))=−ρ(I(λ0,E(B−V)) (5) Asshowninpaper1,inSDSSspectra,EWcanbemeasured only for EW > 5m˚A1, and its use imposes significant un- Whereλ isthecentralwavelength.Linesthatgetstronger 0 certaintiesduetothedifficultytodeterminethecontinuum with extinction will have a lower flux and hence a negative level.Furthermore,theEWisusuallymeasuredbyintegrat- correlation between flux and E(B−V). For clarity we call ingoverthefluxorafittedprofile(Gaussian,Lorentzianor thissimplya‘strongcorrelation’omittingthefactthatitis Voightfunction)betweenselectedendpoints:differentmeth- technically negative. Although we do not resolve the DIBs, ods of EW measurement may yield different results, thus the CD is linearly related to the EW if the point spread preventing a straightforward comparison between studies. function of the system is approximately Gaussian, which it While fitting a profile to uncorrelated blended lines intro- is. ducessystematiccorrelationbetweenthemandthereforebi- Thenormalizationofthespectradescribedinsection2 ases their pairwise correlation, simple integration over the is expected to remove any correlation between the contin- fluxdoesnotallowtheseparationofthelinesandablended uumandthereddeningandalsoresultinaconstantcontin- pairmustbetreatedasasingleline.Hereweusethefluxat uum level. On the other hand, the entire stacking process the central wavelength of the DIB to represent its strength may affect the initial correlation of DIBs and reddening. andmeasurethecorrelationbetweenCDandreddening.We We therefore simulate synthetic absorption lines with var- show through extensive simulations that this allows us to ious correlations with reddening to quantify the effects of study weak DIBs (EW(cid:62)6 m˚A) with the same precision as the entire process on the correlation. The simulations are strongDIBs(typicaluncertaintyofupto10%)andinsome presented in appendix A. Using the hundreds of thousands cases enables us to separate blended lines. oflinesthatwesimulate,wederiveafunctionthatconverts We use the Pearson correlation coefficient to study the between the measured correlation with reddening and the correlation of CD and E(B−V), ‘true’correlation,priortothestackingprocess.Wefindthat cov(x,y) the ability to detect the correlation of flux with reddening ρ= . (1) σ σ for an isolated absorption line depends only on the slope of x y the flux–E(B−V) relation and the width of the line, and We assume that all the DIBs are in the optically thin that the initial correlation can be recovered for DIBs with absorption regime since we use objects at high Galactic a slope greater than 10−2.9 1 and FWHM greater than mag latitudesandlowdustcolumndensities.Thisassumptionis 0.5˚A. We find that the initial correlation of a blended pair justified by the linear relations between EW and reddening of lines depends on both of the measured correlations and for the strongest DIBs that were found in paper1, and by can be recovered above the same slope threshold. Since we lacktheresolutiontomeasurethewidthoftheDIBsweuse theaveragewidthvaluefromDIBcatalogstoexcludeDIBs 1 WegenerallyconsideraDIBasbroadwhenFWHM(cid:38)2˚A and which have widths that are below our threshold. However, narrow when FWHM (cid:46) 1.5˚A. We refer to strong bands when we measure the correlation of DIBs with borderline widths EW∼100 m˚A . values(e.gthemeasuredFWHMofthe6196˚A DIBisinthe E(B−V) (cid:13)c 0000RAS,MNRAS000,000–000 4 Baron et al. range 0.2−0.8˚A according to different studies) and these ones out of the 142 bands that exhibit such strong nega- should be treated with caution. In this case, low measured tivecorrelationwithdustextinction,theyshowcorrelations correlation can indicate a non detection. We mark the bor- that are consistent with being similar and they are highly derline DIBs in the online material. blended,thustheyarelikelytooriginatefromthesamecar- We use the 24 spectra, stacked using the Schlegel, rier, perhaps one that is easily depleted on dust. Finkbeiner & Davis (1998) maps, as described in section We find that the DIBs 4428˚A, 5512˚A, 5850˚A, and 2, and measure the correlation coefficient between flux and 6445˚A areabsentinourdataandtheDIBs4762˚A,4964˚A, reddeningforeverywavelengthinourwavelengthgrid.The 5487˚A,6090˚A,and6379˚A havealowcorrelationwithdust result is a continuous correlation spectrum. (lowerthan0.5).Theaboveareamongthestrongestand/or most studied DIBs, most of which exhibit medium to high correlations with dust in the Galactic plane (see for exam- 3.2 Results ple:Friedmanetal.2011andKos&Zwitter2013).Sincewe measurestrongandsignificantcorrelationswithdustextinc- Forpresentation,wecolorasyntheticspectrumbasedonthe tion for other DIBs with similar strengths and correlations DIB catalog according to the correlation we measure. We weconcludethattheaboveindeedbehavedifferentlyinour clipthecolorsinfigures1–4tostrongcorrelationvalues,for data. clarity.Redregionsinfigures1and2markhighcorrelation offluxwithcolorexcess(negative;sinceinabsorption),while Established by Krelowski & Walker (1987) and later yellow coloring implies little to no correlation. Blue regions studied and adjusted by a number of studies (Chlewicki in figures 3 and 4 mark high correlation (positive), while et al. 1986; Krelowski & Westerlund 1988; Krelowski, Sne- green coloring implies little to no correlation. Clearly some den & Hittgen 1995; Cami et al. 1997; Weselak et al. 2001; lines show strong correlation (e.g. 5780.6˚A and 5797.1˚A) Kos & Zwitter 2013), the difference in DIBs behavior in while others, often as strong, do not (e.g. 4428˚A). Further- different environments has led to the suggestion that UV more, one can see the correlation with dust of interstellar radiation leads to the weakening of some DIBs while not absorption lines which are not DIBs. affecting other DIBs or even strengthening them. The ma- Our simulations show that we can robustly recover the jor classes that are said to exist are σ (not UV shielded) true correlation only above a threshold in the slope of the and ζ (UV shielded) clouds. This partition was made us- flux-reddening relation. 154 DIBs are above the threshold, ingthestrengthsofthe5780.6˚A and5797.1˚A DIBs.While thus their correlation with reddening can in principle be the 5780.6˚A DIB appear stronger where the UV radiation measured. Of the 154 DIBs, 132 are isolated and 22 are is stronger (σ type), the 5797.1˚A appears stronger in UV blendedin11pairs.4pairsarediscardedsincetheyexhibit shieldedenvironments(ζ type).Followingthis,Kos&Zwit- correlations with opposite signs and 2 pairs are discarded ter (2013) divide their sight lines into σ and ζ type clouds since they exhibit correlations that are lower than 0.5 (see basedontheratioofthe5797˚A and5780˚A DIBsstrength appendix A for details), thus 142 DIBs remain. A sample andfindthatthispartitionshowsatleasttwodifferenttypes of the correlations we measure is given in table 1, the full ofDIBs:typeIDIBswherethebehavioroftheDIBEWand table can be found in the online version of this paper. The reddening is similar for σ and ζ sight lines (5705˚A, 5780˚A, correlation uncertainties we deduce for the isolated DIBs 6196˚A, 6202˚A and 6270˚A) and type II DIBs where it dif- varyfromanaverageof5%forhighcorrelations(0.8)upto fers significantly (4964˚A, 5797˚A, 5850˚A, 6090˚A, 6379˚A, 15% for low correlations above 0.3. We find in paper1 that 6613˚A and 6660˚A). The DIBs we find to be absent or to the uncertainty of the EW for the strongest DIB, 5780.6˚A have low correlation with E(B −V) are inconsistent with is17%(andlargerforsmallerDIBs)andtheuncertaintyin these groups and with similar groups that were established the correlation is of order 20%. These results are therefore by other studies using the σ/ζ partition i.e., some of the 2–3 times better than in paper1. type I DIBs from Kos & Zwitter (2013) are absent in our Of the 142 DIBs, only 15 isolated DIBs and 2 pairs of data, some show low correlation with dust and some show blendedDIBshaveastatisticallysignificantcorrelationwith the opposite. The same applies to type II DIBs. E(B−V)thatishigherthan0.7.Whilethehighcorrelation McIntosh&Webster(1993)studiedthestrengthofthe oftheDIBs:5705˚A (pair),5780˚A(pair),5797.1˚A,6284.3˚A, DIBs4428˚A,5780.6˚Aand5797.1˚AasafunctionofGalac- 6613.7˚A and8621.1˚A isaconfirmationofwhatwasalready tic latitude. They found that the strength of 4428˚A rela- established in previous studies (see for example Friedman tive to the others is greatest at low latitude and decreases et al. 2011, Kos & Zwitter 2013 and Kos et al. 2013), we with increasing latitude whereas the strength of 5797.1˚A is findhighcorrelationswithdustextinctionformuchweaker, greatest at high latitude. van Loon et al. (2013) find that rarelystudiedDIBs:5975.6˚A,6308.9˚A,6317.6˚A,6325.1˚A, thegrowthoftheEWofthe4428.1˚A appearstoslowdown 6353.5˚A, 6362.4˚A, 6491.9˚A, 6494.2˚A, 6993.2˚A, 7223.1˚A as the EW of the 5780˚A DIB grows. We show in paper1 and 7558.2˚A. that the 4428˚A is absent in our spectra while the 5780.6˚A Moreover,wefindablendedpairofDIBs,5925.9˚A and and 5797.1˚A DIBs tend to have higher EW per reddening 5927.5˚A, which exhibit a high positive correlation between unit compared to the more UV shielded Galactic plane, in their flux and E(B−V), i.e., negative correlation between agreement with the findings of McIntosh & Webster (1993) theirstrengthanddustextinction.TheseDIBsaretheonly and van Loon et al. (2013). We therefore suggest that the (cid:13)c 0000RAS,MNRAS000,000–000 Using ML to classify DIBs 5 Following these findings we wish to compute the pair- Table 1.DIBcorrelationwithreddening wisecorrelationofDIBsafterremovingthecorrelationwith dust,basicallyusingdustasatracerforgascolumndensity, Wavelength∗[˚A] Deducedcorrelation whichisanuisanceparameter.Theresultoftheprocessisa 5704.7∗ 0.99±0.50 197X197 matrix that represents the pairwise correlation of 5705.1∗ 0.99±0.54 all the DIBs. We then wish to divide the DIBs to spectro- 5779.5∗ 1.0±0.055 scopic families based on this correlation matrix. The algo- 5780.6∗ 1.0±0.055 rithm,whichtakesspectraasaninputandreturnsgroupsof 5797.1 0.87±0.02 lines(DIBsorothers)asanoutput,isdescribedindepthin 5925.9∗ −0.9991±0.0004 section 4.1. We test the algorithm, find its detection limits 5927.5∗ −0.9997±0.0002 and characterize its precision and accuracy in appendix B. 5975.6 0.72±0.03 We further test the algorithm on known atomic and molec- 6203.2∗ 0.99±0.30 6204.3∗ 0.99±0.31 ular absorption lines in section 4.2, and discuss the results 6280.5∗ 0.99±0.11 for DIBs in section 4.3. 6281.1∗ 0.99±0.12 6284.3 0.976±0.029 6308.9 0.762±0.025 4.1 Algorithm 6317.6 0.841±0.022 For clarity, we describe the algorithm as it works with an 6325.1 0.713±0.025 input of 250 binned spectra, each spectrum has a median 6353.5 0.839±0.023 6362.4 0.702±0.025 extinctionvalueand8000wavelengthelementsfrom4000˚A 6491.9 0.808±0.025 to 8000˚A with a resolution of 0.1˚A. In order to compute 6494.2 0.904±0.024 thecorrelationbetweenpairsoffluxesonemustfirstremove 6613.7 0.801±0.025 thecorrelationbetweenfluxandreddening,whichwedoby 6993.2 0.704±0.025 computing the partial correlation: 7223.1 0.706±0.025 ρ −ρ ρ 7558.2 0.777±0.026 ρ = f1f2 f1Av Avf2 (6) 8621.1 0.841±0.022 f1f2,Av (cid:113)1−ρ2 (cid:113)1−ρ2 f1Av Avf1 Where the extinction A is the nuisance parameter, f and v 1 f isthepairoffluxeswewishtocomputepartialcorrelation CorrelationbetweenCDandreddeningfortheDIBsthat 2 exhibitastrongcorrelation.Thefulltableisavailableinthe for, and ρab is the Pearson Correlation Coefficient between onlineversionofthispaper. the a and b variables. Computing the partial correlation between every indi- vidualpairoffluxesinthewavelengthgridreturnsa8000X ∗WemarkDIBsthatarepartofablendedpair. 8000correlationmatrix.The[i,j]elementofthecorrelation matrix represents the partial correlation between the fluxes absenceandweakeningofsomeofthestrongest,moststud- intheithandjthwavelengthindices.Therefore,thecorrela- ied, DIBs is not due to UV radiation but highly connected tion matrix is symmetric, positive-definite, and its primary to the unique environment we probe in this study. Our re- diagonal is equal to one. sultsinpaper1reinforcethefindingsofMcIntosh&Webster Diagonals which are far away from the primary diag- (1993) and our current results add more DIBs that follow onal follow a Normal distribution with zero mean. This is the same patterns. The group 4428˚A, 5512˚A, 5850˚A, and expectedsincerandomlychosendistantwavelengthsshould 6445˚A tends to disappear in high Galactic latitudes while notbecorrelated.Diagonalswhichareclosetotheprimary thegroup4762˚A,4964˚A,5487˚A,6090˚Aand6379˚A exhibit diagonal measure the flux-flux correlation of nearby wave- almost no correlation with dust extinction (0−0.4). lengths and their mean is shifted towards 1 as the diagonal gets closer to the primary diagonal. This bias is the result of our limited resolution and some systematic effects that 4 CLUSTERING LINES are due to the calibration and analysis process. We wish to remove it. Figure 5 presents a sample of correlation dis- As we show above, while some DIBs correlate fairly well tributions for different diagonals. In order to remove the with color excess, many more do not, and there is substan- spuriouscorrelationthatiscausedbywavelengthproximity tialintrinsicscatter(seealsovanLoonetal.2009,Friedman we normalize each element in the correlation matrix in the et al. 2011, Kos & Zwitter 2013, van Loon et al. 2013, and following way: paper1). Furthermore, recent studies have measured DIBs ρ−µ strength in low column densities environments and found ρ = k (7) norm σˆ that the scale height of the DIBs carriers may be more ex- k tended than the distribution of dust (paper1 for 5780.6˚A Where µ is the median correlation for the kth diagonal k and 5797.1˚A DIBs and Kos et al. 2014 for 8621.1˚A). and σˆ is the median absolute deviation (MAD) of the kth k (cid:13)c 0000RAS,MNRAS000,000–000 6 Baron et al. 1.02 x1.01 u1.00 fl 3800 3820 3840 3860 3880 3900 3920 3940 3960 3980 4000 4020 4040 4060 4080 1.02 x1.01 u1.00 fl 4100 4120 4140 4160 4180 4200 4220 4240 4260 4280 4300 4320 4340 4360 4380 1.02 ux0.92 fl 0.83 4400 4420 4440 4460 4480 4500 4520 4540 4560 4580 4600 4620 4640 4660 4680 1.02 x0.99 u fl0.96 4700 4720 4740 4760 4780 4800 4820 4840 4860 4880 4900 4920 4940 4960 4980 1.02 x1.01 u1.00 fl 5000 5020 5040 5060 5080 5100 5120 5140 5160 5180 5200 5220 5240 5260 5280 1.02 x0.99 u fl 0.96 5300 5320 5340 5360 5380 5400 5420 5440 5460 5480 5500 5520 5540 5560 5580 1.02 ux0.86 fl 0.71 5600 5620 5640 5660 5680 5700 5720 5740 5760 5780 5800 5820 5840 5860 5880 1.02 x0.97 u fl 0.92 5900 5920 5940 5960 5980 6000 6020 6040 6060 6080 6100 6120 6140 6160 6180 1.02 ux0.85 fl 0.68 6200 6220 6240 6260 6280 6300 6320 6340 6360 6380 6400 6420 6440 6460 6480 wavelength(A˚) Figure 1. Correlation spectrum for the wavelength range 3800−6500˚A. The spectrum is a simulated spectrum based on the DIBs catalog and is colored by the correlation coefficient between flux and E(B-V), red for strong (negative; since in absorption) correlation (-1)andyellowforacorrelationthatisclosetozero.Thecolorsareclippedtocorrelationvaluesof(0.5,1)forclarity. (cid:13)c 0000RAS,MNRAS000,000–000 Using ML to classify DIBs 7 1.02 x0.92 u fl 0.81 6500 6520 6540 6560 6580 6600 6620 6640 6660 6680 6700 6720 6740 6760 6780 1.02 x0.96 u fl 0.89 6800 6820 6840 6860 6880 6900 6920 6940 6960 6980 7000 7020 7040 7060 7080 1.02 x0.99 u fl0.96 7100 7120 7140 7160 7180 7200 7220 7240 7260 7280 7300 7320 7340 7360 7380 1.02 x0.99 u fl0.96 7400 7420 7440 7460 7480 7500 7520 7540 7560 7580 7600 7620 7640 7660 7680 1.02 x1.00 u fl0.98 7700 7720 7740 7760 7780 7800 7820 7840 7860 7880 7900 7920 7940 7960 7980 1.02 x0.99 u fl 0.95 8000 8020 8040 8060 8080 8100 8120 8140 8160 8180 8200 8220 8240 8260 8280 1.02 x1.01 u1.00 fl 8300 8320 8340 8360 8380 8400 8420 8440 8460 8480 8500 8520 8540 8560 8580 1.02 x0.98 u fl0.94 8600 8620 8640 8660 8680 8700 8720 8740 8760 8780 8800 8820 8840 8860 8880 1.02 x1.01 u1.00 fl 8900 8920 8940 8960 8980 9000 9020 9040 9060 9080 9100 9120 9140 9160 9180 wavelength(A˚) Figure 2.SameasFigure1,forthewavelengthrange 6500−9200˚A. (cid:13)c 0000RAS,MNRAS000,000–000 8 Baron et al. 1.02 x1.01 u1.00 fl 3800 3820 3840 3860 3880 3900 3920 3940 3960 3980 4000 4020 4040 4060 4080 1.02 x1.01 u1.00 fl 4100 4120 4140 4160 4180 4200 4220 4240 4260 4280 4300 4320 4340 4360 4380 1.02 ux0.92 fl 0.83 4400 4420 4440 4460 4480 4500 4520 4540 4560 4580 4600 4620 4640 4660 4680 1.02 x0.99 u fl0.96 4700 4720 4740 4760 4780 4800 4820 4840 4860 4880 4900 4920 4940 4960 4980 1.02 x1.01 u1.00 fl 5000 5020 5040 5060 5080 5100 5120 5140 5160 5180 5200 5220 5240 5260 5280 1.02 x0.99 u fl 0.96 5300 5320 5340 5360 5380 5400 5420 5440 5460 5480 5500 5520 5540 5560 5580 1.02 ux0.86 fl 0.71 5600 5620 5640 5660 5680 5700 5720 5740 5760 5780 5800 5820 5840 5860 5880 1.02 x0.97 u fl 0.92 5900 5920 5940 5960 5980 6000 6020 6040 6060 6080 6100 6120 6140 6160 6180 1.02 ux0.85 fl 0.68 6200 6220 6240 6260 6280 6300 6320 6340 6360 6380 6400 6420 6440 6460 6480 wavelength(A˚) Figure 3. Correlation spectrum for the wavelength range 3800−6500˚A. The spectrum is a simulated spectrum based on the DIBs catalogandiscoloredbythecorrelationcoefficientbetweenfluxandE(B-V),blueforstrongcorrelation(1)andgreenforacorrelation thatisclosetozero.Thecolorsareclippedtocorrelationvaluesof(-0.5,-1)forclarity. (cid:13)c 0000RAS,MNRAS000,000–000 Using ML to classify DIBs 9 1.02 x0.92 u fl 0.81 6500 6520 6540 6560 6580 6600 6620 6640 6660 6680 6700 6720 6740 6760 6780 1.02 x0.96 u fl 0.89 6800 6820 6840 6860 6880 6900 6920 6940 6960 6980 7000 7020 7040 7060 7080 1.02 x0.99 u fl0.96 7100 7120 7140 7160 7180 7200 7220 7240 7260 7280 7300 7320 7340 7360 7380 1.02 x0.99 u fl0.96 7400 7420 7440 7460 7480 7500 7520 7540 7560 7580 7600 7620 7640 7660 7680 1.02 x1.00 u fl0.98 7700 7720 7740 7760 7780 7800 7820 7840 7860 7880 7900 7920 7940 7960 7980 1.02 x0.99 u fl 0.95 8000 8020 8040 8060 8080 8100 8120 8140 8160 8180 8200 8220 8240 8260 8280 1.02 x1.01 u1.00 fl 8300 8320 8340 8360 8380 8400 8420 8440 8460 8480 8500 8520 8540 8560 8580 1.02 x0.98 u fl0.94 8600 8620 8640 8660 8680 8700 8720 8740 8760 8780 8800 8820 8840 8860 8880 1.02 x1.01 u1.00 fl 8900 8920 8940 8960 8980 9000 9020 9040 9060 9080 9100 9120 9140 9160 9180 wavelength(A˚) Figure 4.SameasFigure3,forthewavelengthrange 6500−9200˚A. (cid:13)c 0000RAS,MNRAS000,000–000 10 Baron et al. 5940 1.0 1.0 1A˚ 0.8 5A˚ 0.6 0.8 500A˚ 5917 0.4 n ˚(A) 0.2 atio function0.6 wavelength5895 �0.00.2rtialcorrel y 0.4a sit0.4 5872 � p n 0.6 e � D 0.8 � 5850 1.0 0.2 5850 5872 5895 5917 5940 � wavelength(A˚) 0.0 5940 1.0 0.5 0.0 0.5 1.0 − − Partialcorrelationcoefficient 6 Fwaraiagvtueeldreenbgy5th.1.C˚ATohr(erbeflllaautcxiko-)flnusdhxioscwtorrirabeulmatiteoidoninaanosfcawoarfrvueenllacettniigootnnhsooffth0da.i6ts5taa,rnewcaseevpien-- ˚h(A)5918 24 rrelation loefn0g.t1hatnhdatwaarveesleepnagrtahtsedthbayt5ar˚Aes(ebpluaera)tsehdowbya5m0e0d˚Aian(rceodr)rehlaavtieona elengt5895 0 sedco mediancorrelationof0. wav �2mali r 5872 o 4n diagonal. We thus obtain a 8000 X 8000 normalized corre- � lation matrix in which the [i, j] element represent the devi- 6 � ation in σ units from the median correlation. We consider 5850 5850 5872 5895 5918 5940 |ρnorm|>3σ as high correlation (see discussion below). We wavelength(A˚) present a cutout of the partial correlation matrix and nor- malized correlation matrix around the NaID wavelengths Figure 6. Partial correlation matrix cutout around the NaID in figure 6. Despite the proximity of the doublet lines one wavelengths (top) and the normalized correlation matrix (bot- can see that the strong positive correlation between them tom). remains in the normalized matrix. It is also apparent that thecorrelatedlinesproduceasecondaryeffectofstrongneg- ative correlation that fades away as the element is farther away from the NaID wavelength. This does not affect our 7presentsthe197X197normalizedcorrelationmatrixthat final results. contains the pairwise correlation of the DIBs. The correla- We then reduce the normalized correlation matrix to a tionmatrixcontains19306uniqueelements,whichprevents matrix that contains only the elements of the pairwise cor- us from dividing the DIBs into groups manually. We use a relation between absorption lines we wish to study, using Hierarchical Clustering algorithm to achieve this goal. their central wavelength. We tested several methods to ob- HierarchicalClusteringisaMachineLearningalgorithm tainthereducedmatrix.Thesimplestistakingtheelements thatseekstobuildahierarchyofclusters(i.e.,groups)based [i,j]thatmatchtheithandjthcentralwavelengths.Wealso on the distance between objects. We use the agglomerative tried integrating over the wavelength range of a given line type of clustering, the bottom-up approach, in which each usingitsmeasuredprofileasaweightfunction.Wefindthat object (absorption line in our case) starts as a one sized the latter method smears the resulting correlation, some of cluster, and pairs of clusters are merged according to their the smearing is caused by the secondary effect of the neg- distance as one moves up the hierarchy. In order to convert ative correlation. For the NaID doublet, for example, we correlationsbetweenlinesintodistances,weperformthefol- measure a correlation significance of 4.3σ while in the bot- lowing linear transformation on the normalized correlation tomoffigure6thesignificanceatthecentralwavelengthsis matrix, such that a strong positive correlation transforms 7.8σ.Wethereforeusetheexactwavelengthstoextractthe toashortdistance,andastrongnegativecorrelationtrans- correlation element and obtain the reduced matrix. Figure forms to a long distance: (cid:13)c 0000RAS,MNRAS000,000–000

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.