Astronomy&Astrophysicsmanuscriptno.pca-in-OrionB˙arXiv (cid:13)c ESO2017 January17,2017 Dissecting the molecular structure of the OrionB cloud: Insight from Principal Component Analysis(cid:63) PierreGratier1,EmericBron2,3,MaryvonneGerin4,Je´roˆmePety5,4,VivianaV.Guzman6,JanOrkisz5,4,Se´bastien Bardeau5,JavierR.Goicoechea2,FranckLePetit3,HarveyLiszt7,KarinO¨berg6,NicolasPeretto8,EvelyneRoueff3, AlbrechSievers5,andPascalTremblin9 1 Laboratoired’astrophysiquedeBordeaux,Univ.Bordeaux,CNRS,B18N,alle´eGeoffroySaint-Hilaire,33615Pessac,France 2 ICMM,ConsejoSuperiordeInvestigacionesCientificas(CSIC).E-28049.Madrid,Spain. 3 LERMA,ObservatoiredeParis,PSLResearchUniversity,CNRS,SorbonneUniversite´s,UPMCUniv.Paris06,F-92190,Meudon, France 7 4 LERMA, Observatoire de Paris, PSL Research University, CNRS, Sorbonne Universite´s, UPMC Univ. Paris 06, E´cole normale 1 supe´rieure,F-75005,Paris,France 0 5 IRAM,300ruedelaPiscine,38406SaintMartind’He`res,France 2 6 Harvard-SmithsonianCenterforAstrophysics,60GardenStreet,Cambridge,MA,02138,USA. n 7 NationalRadioAstronomyObservatory,520EdgemontRoad,Charlottesville,VA,22903,USA a 8 SchoolofPhysicsandAstronomy,CardiffUniversity,Queen’sbuildings,CardiffCF243AA,UK. J 9 MaisondelaSimulation,CEA-CNRS-INRIA-UPS-UVSQ,USR3441,Centred’e´tudedeSaclay,F-91191Gif-Sur-Yvette,France. 6 1 ] ABSTRACT A Context.Thecombinationofwidebandreceiversandspectrometerscurrentlyavailablein(sub-)millimeterobservatoriesdeliverwide- G fieldhyperspectralimagingoftheinterstellarmedium.Tensofspectrallinescanbeobservedoverdegreewidefieldsinaboutfifty . hours.Thiswealthofdatacallsforrestatingthephysicalquestionsabouttheinterstellarmediuminstatisticalterms. h Aims.Weaimatgaininginformationonthephysicalstructureoftheinterstellarmediumfromastatisticalanalysisofmanylines p fromdifferentspeciesoveralargefieldofview,withoutrequiringdetailedradiativetransferorastrochemicalmodeling. - o Methods.Wecoupledanonlinearrescalingofthedatawithoneofthesimplestmultivariateanalysismethods,namelythePrincipal r ComponentAnalysis,todecomposetheobservedsignalintocomponentsthatweinterpretfirstqualitativelyandthenquantitatively t s basedonourdeepknowledgeoftheobservedregionandoftheastrochemistryatplay. a Results.Weidentify3principalcomponents,linearcompositionsoflinebrightnesstemperatures,thatarecorrelatedatvariouslevels [ withthecolumndensity,thevolumedensityandtheUVradiationfield. Conclusions.Whensamplingasufficientlydiversemixtureofphysicalparameters,itispossibletodecomposethemolecularemission 1 inordertogainphysicalinsightontheobservedinterstellarmedium.Thisopensanewavenueforfuturestudiesoftheinterstellar v medium. 5 0 Keywords.ISM:molecules,ISM:clouds,ISM:photon-dominatedregion(PDR),Object:OrionB,Method:statistical 2 4 0 1. Introduction self: astrochemical modeling (Le Bourlot et al. 2012; Agu´ndez . 1 &Wakelam2013).Furthercomplexityariseswhenconsidering 0 Molecularcloudshaveacomplexstructure,withfilamentshost- radiativetransfertoderivelineintensitiesfromthelocalchemi- 7 ingdensecoresandimmersedinalowdensitydiffuseenvelope. calcompositionandphysicalstructure. 1 Large scale dust continuum maps obtained with Herschel have The last few years have seen the installation of new : provided a breakthrough, by showing the tight relationship be- v wideband receivers and spectrometer at millimeter and sub- tween the filaments and the dense cores. These maps however i millimeterradiotelescopes.Withtheseinstruments,linesurveys X donotprovideinformationonthegasdynamicsoritschemical of several GHz bandwidth and several tens of thousands of r composition. Furthermore the relationship between the submil- a limeterdustemissionandthegascolumndensityisaffectedby spectral channels are the new default mode of observations. Combined with wide field imaging capabilities both for single thedusttemperatureandpossiblevariationsofthedustemissiv- dishandinterferometers,hyperspectralimagingisnowroutinely ity. Molecular line emission maps provide alternative means to carriedoutwiththeseinstruments. studymolecularcloudstructureandrelateittotheflowkinemat- Theanalysisandinterpretationoftheselargedatasets,con- ics.Molecularlineemissionislinkedtotheunderlyingphysical sistingofthousandsofspatialpositionsandtensofthousandsof propertiesoftheISM,suchasdensity,gasanddusttemperatures, spectral channels, will benefit from the use of statistical tools. UVradiationfield,andcosmicrayionizationrate.Butthesere- PrincipalComponentAnalysis(PCA)isoneofthemostwidely lationships are complex and their detailed study is a field in it- used multivariate analysis method. It has been used to study (cid:63) Based on observations carried out at the IRAM-30m single- the Interstellar Medium (ISM) using molecular emission maps dish telescope. IRAM is supported by INSU/CNRS (France), MPG (Ungerechts et al. 1997; Neufeld et al. 2007; Lo et al. 2009; (Germany)andIGN(Spain). Melnicketal.2011;Jonesetal.2012). 1 PierreGratieretal.:DissectingthemolecularstructureoftheOrionBcloud:InsightfromPrincipalComponentAnalysis Table 1. Properties of the observed spectral lines. The last six columns show the statistics of the data before and after asinh reparametrization. Originaldata Afterasinhreparametrization Molecule Transitions Frequency Noise Min. Median Max. Std. Min. Median Max. Std. (MHz) (K) (K) (K) (K) (K) (K) (K) (K) (K) 12CO J=1→0 115271.202 0.09 −0.39 13.40 57.11 10.18 −0.37 2.39 3.32 0.91 13CO J=1→0 110201.354 0.04 −0.19 1.38 36.43 3.27 −0.19 0.97 3.03 0.74 CS J=2→1 97980.953 0.06 −0.36 0.06 15.53 0.48 −0.35 0.06 2.48 0.23 HCN J=1→0,F=2→1 88631.848 0.10 −0.58 0.15 10.32 0.39 −0.52 0.15 2.22 0.25 HCO+ J=1→0 89188.525 0.09 −0.45 0.26 8.07 0.47 −0.42 0.25 2.07 0.30 SO N=3→2,J=2→1 99299.870 0.06 −0.43 0.04 6.46 0.24 −0.40 0.04 1.92 0.17 CN N=1→0,J=3/2→1/2,F=5/2→3/2 113490.970 0.09 −0.58 0.10 6.33 0.27 −0.52 0.09 1.91 0.20 HNC J=1→0 90663.568 0.08 −0.49 0.07 6.01 0.27 −0.45 0.07 1.88 0.19 CCH N=1→0,J=3/2→1/2,F=2→1 87316.898 0.12 −0.62 0.08 5.72 0.22 −0.55 0.08 1.85 0.18 C18O J=1→0 109782.173 0.06 −0.30 0.06 5.55 0.42 −0.29 0.06 1.83 0.26 N2H+ J=1→0,F1=2→1,F=3→2 93173.764 0.08 −0.44 0.00 4.53 0.13 −0.41 0.00 1.70 0.10 CH3OH J=2→1,K=0→0,(A+) 96741.375 0.06 −0.34 0.01 2.24 0.08 −0.32 0.01 1.26 0.08 In this paper, we address the following question: can PCA telescope at almost the same time, as the observed bandwidth provide a method to study the underlying physics of the ISM wascoveredinonlytwofrequencytunings.Thenoiseforeach when applied to a large dataset of molecular emission, without map,alongwiththeminimum,median,maximum,andvariance performingeitherradiativetransferorastrochemicalmodeling? values are listed in Table 1. The noise is computed by fitting a Thearticleisdividedasfollows.Section2presentsthedata gaussian function to the negative part of the histogram of pixel usedinthisstudy.Section3describesthestatisticalmethodused brightnesses.Thisenablestocomputethenoisewithoutneeding in this paper and its implementation. Results are presented in tomaskouttheemission. Sect 4 first by analyzing the output of the PCA, and further by comparing theses outputs with independent maps of physi- 3. PrincipalComponentAnalysis cal conditions in OrionB in Sect. 5. The last section discusses theseresults. 3.1. Principle We use the following standard statistical terms: the dataset is 2. Data composed of samples each described by individual features. In our case, each spatial pixel is a sample and each line intensity The data used in this paper is selected from the ORION-B isafeature(thefulldatasetthuscorrespondstoadatamatrixof project (PI: J. Pety), which aims at mapping with the IRAM- 132300samplestimes12features). 30mtelescopealargefractionoftheSouth-Westernedgeofthe PrincipalComponentAnalysis(PCA)isawidelyusedmulti- OrionB molecular cloud over a field of view of 1.5 square de- dimensionalanalysistechnique(Jolliffe2002),whichcanbede- greesinthefull3mmatmosphericwindowat200kHzspectral finedinseveralmathematicallyequivalentways.Itaimsatfind- resolution. Pety et al. (2016) describe in detail the data acqui- inganeworthogonalbasisofthefeaturespace(whoseaxesare sition and reduction strategies. Table 1 lists the 12 transitions calledprincipalcomponents,orPCs),sothatforeachk,thepro- selectedinthispaperfromthealreadyobservedfrequencyrange jectionontothehyperplanedefinedbythefirstkaxesisoptimal (from 84 to 116GHz), based on the inspection of the full data inthesensethatitpreservesmostofthevarianceofthedataset cube. (orequivalentlythattheerrorcausedbythisprojectionismini- For each line, we focus on emission coming from a lim- mal).PCAthusdefinessuccessiveapproximationsofthedataset ited1.5kms−1-velocityrangecenteredonthepeakvelocity(i.e., byhyperplanesofincreasingdimension. 10.5kms−1) of the main velocity component along the line of This is equivalent to the diagonalization of the covariance sight.Averagingthethree0.5kms−1velocitychannelsallowsus matrix, so that the principal components are naturally uncorre- to get a consistent dataset from the radiative transfer and kine- lated.Itcanbethoughtofasfindingtheprincipalaxesofinertia maticsviewpoints.Inparticular,weavoidtheneedtodisentan- gle1)theeffectsofhyperfinestructuresofsomelines,and2)the ofthecloudofsamplesabouttheirmeaninfeaturespace,andis thus a way to analyze the covariance structure of the data. The complexvelocitystructureofthesource(Orkiszetal.subm.). principalcomponentsareorderedbydecreasingprojectedvari- The observed field of view covers 0.81deg×1.10deg and ance.AsaresultPC1istheaxisoflargestvarianceinthedata. containstheHorseheadnebula,andtheHiiregionsNGC2023, PC2isthentheaxisoflargestvarianceatconstantPC1(orthog- NGC2024, IC434, and IC435. The angular resolution ranges onaltoPC1)andsoforth.Neglectingtheaxesoflowestvariance from22.5to30.5(cid:48)(cid:48).The12resultingmapshaveacommonpixel thenallowstodefinealow-dimensionalhyperplaneinwhichthe sizeof9(cid:48)(cid:48)thatcorrespondstoaNyquistsamplingforthehighest dataset is approximately embedded. An important property to frequencylineobserved(12CO(1–0)at115.27GHz).Themaps keepinmindisthelinearityofPCA(itdefineslow-dimensional thuscontains315×420pixels.Atadistanceof∼400pc(Menten hyperplanes,andnotgenerallow-dimensionalhypersurfaces). etal.2007),themapsgiveusaccesstophysicalscalesbetween Acommonvariant1 intheapplicationofPCAistonormal- ∼50mpcand10pc. ize the variations of the dataset around the mean by the stan- Figure1showsthe12mapsoftheresultingbrightnesstem- dard deviation of each features, before applying the PCA. This perature multiplied by an ad-hoc factor in order that they can amounts to diagonalizing the correlation matrix instead of the share the same color look-up table, even though the intrinsic covariancematrix.Thecorrelation-basedvariantallowstoavoid brightnesstemperaturesofthedifferentlinesdifferbymorethan one order of magnitude. The relative calibration of the differ- 1 ItactuallygoesbacktooneofthetwoearliestdescriptionsofPCA: entlinesisexcellentbecausetheywereobservedwiththesame Hotelling(1933) 2 PierreGratieretal.:DissectingthemolecularstructureoftheOrionBcloud:InsightfromPrincipalComponentAnalysis 12CO 13CO 2 CS 5 HCN 10 × × × HCO+ 10 SO 10 CN 10 HNC 10 × × × × CCH 10 C18O 15 N H+ 15 CH OH 15 2 3 × × × × 40 30 20 ) (0 y 10 δ 0 10 − 20 − 40 30 20 10 0 δx( ) 0 0 6 12 18 24 30 36 42 48 T (K) mb Fig.1.MapsofmolecularemissioninKelvinmainbeamtemperatures. having one feature dominating the variance, and is indicated if moment, elemental abundances,...) that are not relevant for our therelativescalesofthefeaturesarenotrelevantforthepurpose analysisofthechemicalvariationsacrossthemap,weusehere of the analysis. As the relative intensity scales of the different thecorrelation-basedversionofPCA. molecules used here are largely affected by properties (dipole 3 PierreGratieretal.:DissectingthemolecularstructureoftheOrionBcloud:InsightfromPrincipalComponentAnalysis of the N H+(1−0) is Gaussian to first order, the histogram of 101 101 2 n 13CO(1−0) exhibits heavy tails similar to power laws. As a o ncti 100 100 result, extreme intensity values might dominate the covariance utionfu10−1 10−1 slotrwuecrtuirneteonfsitthyevdaaltuae,sh.idingthevariationsatthemorecommon Distrib1100−−32 10−2 brigFhrtnoemssthteemppheyrasitcuarel visieawlspooidnet,sitraakbilneg. PtCheAloisgaariltihnmearotfecthhe- 0 10 20 30 40 2 1 0 1 2 3 niquewhichdecomposethedataasasumofuncorrelatedcom- − − 101 101 ponents.Applyingittothelogarithmofthedataallowsadecom- nfunction101−001 100 pttuoorstehiteiinountneadrsmearslypoirnofgdruartcaitdooisa,ftifpvarecotdtorurascn,tssafneadrndathnpudoscwdheeesrmclraiicwbaesls,etmffheeocrdtesa.taaTdsaatkpriutnecgd- utio10−2 10−1 thelogarithmsofthedataistheequivalentinastrochemistryto Distrib1100−−43 10−2 dstouidnigesc.oPloert-ycoeltoarl.m(a2g0n1i6tu)dsehodwiagtrhaamt tahnealliynseisininteogpratitceadlborrigUhVt- 0 10 20 30 40 2 1 0 1 2 3 nesstemperaturesofourdatasetistofirstordercorrelatedtothe − − Brightnesstemperature(K) Reparametrizedbrightnesstemperature(K) columndensityofmatteralongthelineofsight.Weexpectthis aspecttoappearinourPCAanalysis,andsecondorderchemi- Fig.2.Effectoftheasinhrenormalizationontheintensitydistri- calvariationsaroundthistrend,whichwouldberevealedbyline butions.Leftcolumn:beforerenormalization,rightcolumn:after ratios, are thus better described as multiplicative –rather than renormalization,Toprow: N H+,bottomrow: 13CO. 2 additive–factors. 4.0 3.2.2. Impactofnoise 3.5 Thepresenceofnoisehowevercausesthepossibilityofnegative valuesinpixelswheresomelinesareundetected.Thelogarithm 3.0 transform cannot be applied to these negative noise values. In addition,asthelogarithmstretchesthelowestvaluescompared 2.5 tothelargestones,itwillalsotendtostretchthepositivenoise y2.0 valuesoftheundetectedpixels,thusgivingthemmoreweightin thecovarianceofthedata. 1.5 There may be two different reasons for a non-detection: ei- 1.0 ther the measurement is not sensitive enough to detect the line y=aasinh(x/a) ortheregionjustdoesnotcontainthespeciesthatemitstheline. 0.5 y=x ThelattercasehappensinparticularinHiiregions(e.g.,IC434), y=aln(2x/a) where 12CO(1–0) is photo-dissociated by the far UV photons. 0.0 0 1 2 3 4 5 6 Inthisparticularcase,wecouldremovefromourdatasetallthe x samples (pixels) where no 12CO(1–0) is detected. This just as- sumesthatno 12CO(1–0)detectionsathighsensitivityimplythe Fig.3.Plotofthe asinhfunction(solidline)showingtheasymp- toteswhen x → 0(dashdottedline)and x → +∞(dottedline). absenceofmoleculargas.However,thismethodisnotgeneric. Theparametera(herea=1)istracedwithathinverticalline For instance, N2H+(1–0) is only detected in dense cores (Pety et al. 2016), and restricting ourselves to these regions would drasticallylimitthescopeofourstudy.Ifwewishtousethisim- portanttracerofthemoleculargaswhilestillcoveringtherange In this work, weused the PCA implementation available in of different chemical regimes present in our full map, we thus the Python package scikit-learn (Pedregosa et al. 2011), need to find an alternative. Moreover, in the 3mm band, radio whichusesasingularvaluedecompositiontocomputetheprin- recombinationlines,whichemitinHiiregions,couldinprinci- cipalcomponentaxes. plebeaddedtoourdatasettostudytheformationofmolecular gas. 3.2. Reparametrizationofinputdata Addingathresholdingstepbeforetakingthelogarithmwill onlyworsenthescopeofundefinedvalues.WhiletherearePCA 3.2.1. Ontheneedofareparametrization methods(seee.g.Ilin&Raiko2010)thatcantakemissingdat- As seen in Table 1, some of the tracers have large dynamical apoints into account, they rely on the fact that these missing ranges(2ordersofmagnitudefor 13CO(1–0)and 12CO(1–0)). points have the same statistics than the measured points (i.e., Figure 2 shows the histogram of the brightnesses temper- when a value is missing, it is independent of the actual value). atures of two lines with contrasting behavior in our dataset, This is clearly not the case here as the missing values (unde- namely 13CO(1−0), and N H+(1−0). As the dynamic range tected lines) are missing because they are below the sensitivity 2 is large both in intensity and number of pixels per bin, these threshold.Thesearecalledcensoredvaluesinstatistics.Wethus histograms use the Bayesian Blocks algorithm (Scargle et al. searchforafunctionthatislineararound0(inthenoisedomi- 2013, using here the Python implementation from AstroML, nateddomain),andwhichisasymptoticallyequaltoalogarithm seeVanderplasetal.2012;Ivezic´ etal.2014),whichadaptsthe forlargevalues(comparedtothenoiselevel).Theinversehyper- bin width to the underlying distribution. While the histogram bolicsinusfunction, asinh(x),fulfillstheseconditions.Wethus 4 PierreGratieretal.:DissectingthemolecularstructureoftheOrionBcloud:InsightfromPrincipalComponentAnalysis 12CO 13CO CS HCN HCO+ SO CN HNC CCH C18O N H+ CH OH 2 3 40 30 20 ) (0 y 10 δ 0 10 − 20 − 40 30 20 10 0 δx( ) 0 0.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 − Fig.4.Mapsofmolecularemissionafterreparametrizationbytheasinhfunction. usedthefollowingfunctiontoreparametrizethedatabeforeap- where the parameter a is the typical value for which the func- plyingPCA tionbehaviorchangesfromalineartoalogarithmicregime(see Fig.3). T(x)=aasinh(x/a), (1) 5 PierreGratieretal.:DissectingthemolecularstructureoftheOrionBcloud:InsightfromPrincipalComponentAnalysis Thefirstprincipalcomponentexplainsthemajority(60%)of 100 thetotalcorrelationpresentintheoriginaldataset.Thusalarge part of the variations in the dataset occur along a single axis (i.e. all lines are strongly correlated to each other). The second 80 principalcomponentaccountsforabout10%ofthecorrelation. %) Itissignificantlylessthanthefirstcomponent,butmorethanany ( n atio 60 othercomponents.PC3,4and5correspondtosimilaramounts el of correlations (around 5% each) and PC6 slightly less (3.3%). corr PC1 to 6 collectively explain more than 90% of the correlation d ne 40 inourdataset.TheremainingPCshavesimilarlowamountsof plai explainedcorrelation(from2%forPC7to0.9%forPC11). x E 20 4.2. Discussionoftheprincipalcomponents ThePCsdefinedbyouranalysisrepresentnewaxesinthefea- 0 2 4 6 8 10 12 ture space (the full 12 PCs are simply a rotation of the initial Numberofcomponents basis of the feature space), deduced from the data itself. They can thus be expressed in terms of the original axes, as a linear Fig.5. Percentage of the explained correlation as a function of combination of the (transformed) line intensities. Figure 6 dis- thenumberofcomponentsinthePrincipalComponentAnalysis, plays the quantitative contribution of each initial feature (line) dottedline:cumulativepercentage. toeachPC.Analternativeviewoftherelationshipbetweenthe PCs and the line intensities, namely the correlation wheels, is presentedinFig 7. The only free parameter in the method is the threshold a. Eachsample(pixelbrightness)canthenbeprojectedonthe Appendix A discusses our choice, i.e., a = 8×0.08 = 0.64K new axes, providing new coordinates commonly called com- equal to 8 times the median noise of the dataset. In short, we ponent scores. The PCA method considers the pixels as inde- selectthevalueofathatmaximizesthecorrelationsofthefirst pendent samples, and thus ignores the spatial structure of the 3principalcomponentswithindependentknownmeasurements molecularemission.Itisneverthelesspossibletoreconstructthe ofthecolumndensity,volumedensity,andUVillumination(see maps of the component scores. Figure 8 shows these projected Section5).Thisappendixalsodemonstratesthatourresultsare maps. The chosen color look-up table emphasizes that positive quiteinsensitivetotheexactvalueofa. (red-colored)andnegative(blue-colored)valuesoftheprojected TherightcolumnofFig.2showstheresultoftheasinhtrans- maps–correspondingtovariationsaboveandbelowtheaverage formation on the intensity distributions of two transitions rep- alongtheconsideredaxis–clearlyextractadifferentspatialpat- resentativeofbright(13CO(1–0))andweak(N H+(1–0))aver- 2 ternperprincipalcomponent. aged lines. In the case of the bright lines, the dynamic range is The first principal component is a linear combination of all drasticallyreducedwiththeheavytailbeingtransformedintoa tracers,withsimilarpositiveweightsforalllines(Fig.6).Itthus secondpeakinthedistributionbutwithnovaluesabove3.Inthe describes correlated variations of all molecules, and these ac- case of N H+ the distributions before and after reparametriza- 2 count for most of the variations in the dataset. It is a natural tionareverysimilar.Figure4showsthe12mapsofthemolec- consequence of the fact that all lines are well correlated (posi- ularemissionafterreparameterizationbytheasinhfunction,but tively) to each other. Pety et al. (2016) show that the emission before the normalization step of the PCA analysis. The bright- of all lines is correlated to first order with the column density nesstemperaturesofallthemapshavebeencompressedbetween ofmatteralongthelineofsight.Thiscomponentisthusproba- about-0.5and3.5,withlowsignal-to-noisebrightnesstempera- blyrelatedtothetotalcolumndensity,whoseincreasecausesto turesbetween-0.5and0.5beingmostlyuntransformed. firstorderanincreaseinalllines(inthelinearapproximationof PCA,non lineareffectssuch assaturationofthe 12COlineare 4. Results not captured). The corresponding component map (Fig. 8) in- deedresemblesamapofcolumndensity.Thisrelationbetween The PCA method exposes the correlations between the line PC1andthetotalcolumndensityofmatterwillbeinvestigated brightness temperatures. The derived PCs give the main axes morequantitativelyinSect.5.NotethatasPC1hasonlypositive of correlated variations in the data set. As such, it does not di- coefficientsforalllines,orthogonalityensuresthatallotherPCs rectlyyieldphysicalinformationunderlyingthedataset.Inthis willrepresentcontrastsbetweendifferentlines. section, we describe the results of the application of this sta- The second principal component represents the axis of tistical method to our dataset and we start to discuss their pos- largestvariationatconstantPC1(orthogonaltoPC1).Thisaxis sible physic interpretation based on our a priori astrochemical ofvariationsisdominatedbypositivecontributionsof N H+and 2 knowledge.ThepossiblerelationsbetweenthesePCsandphys- CH OH,andnegativecontributionof 12COand 13CO.Thefirst 3 icalvariablesareinvestigatedinalatersection. two tracers are chemically associated with dense and cold re- gions of the interstellar medium. For instance, N H+ is easily 2 destroyedbyCO,itcanthusonlybeabundantinthegasphase 4.1. Correlationfractionexplainedbythedifferentprincipal when CO has been depleted on the grain surfaces (Pety et al. components 2016). The component map shows strong positive values high- Figure 5 shows the percentage of the correlation explained by lighting known dense cores, including the clumps in the head eachPC(asafunctionoftheprincipalcomponentnumber)along andintheneckoftheHorsehead(Ward-Thompsonetal.2006). with the cumulative explained correlation as a function of the The third principal component shows positive contribution numberofprincipalcomponentskeptinthedecomposition. ofCCHandCNthatareknowntobesensitivetoUVillumina- 6 PierreGratieretal.:DissectingthemolecularstructureoftheOrionBcloud:InsightfromPrincipalComponentAnalysis PC1 60.1% PC2 10.9% PC3 6.5% PC4 5.9% 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.5 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5 − − − − 1.0 1.0 1.0 1.0 − OOSN+ ONCHO+ H − OOSN+ ONCHO+ H − OOSN+ ONCHO+ H − OOSN+ ONCHO+ H 12C13CCHCHCOSCHNCC18CNH2HO3 12C13CCHCHCOSCHNCC18CNH2HO3 12C13CCHCHCOSCHNCC18CNH2HO3 12C13CCHCHCOSCHNCC18CNH2HO3 C C C C PC5 5.0% PC6 3.3% PC7 2.0% PC8 1.9% 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.5 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5 − − − − 1.0 1.0 1.0 1.0 − OOSN+ ONCHO+ H − OOSN+ ONCHO+ H − OOSN+ ONCHO+ H − OOSN+ ONCHO+ H 12C13CCHCHCOSCHNCC18CNH2HO3 12C13CCHCHCOSCHNCC18CNH2HO3 12C13CCHCHCOSCHNCC18CNH2HO3 12C13CCHCHCOSCHNCC18CNH2HO3 C C C C PC9 1.7% PC10 1.4% PC11 0.9% PC12 0.4% 1.0 1.0 1.0 1.0 0.5 0.5 0.5 0.5 0.0 0.0 0.0 0.0 0.5 0.5 0.5 0.5 − − − − 1.0 1.0 1.0 1.0 − OOSN+ ONCHO+ H − OOSN+ ONCHO+ H − OOSN+ ONCHO+ H − OOSN+ ONCHO+ H 12C13CCHCHCOSCHNCC18CNH2HO3 12C13CCHCHCOSCHNCC18CNH2HO3 12C13CCHCHCOSCHNCC18CNH2HO3 12C13CCHCHCOSCHNCC18CNH2HO3 C C C C Fig.6.Barplotsshowingthecontributionofeachlineintensitytoeachprincipalcomponent(withthefractionofthetotalcorrelation accountedforbyeachPCgivenasapercentage).Theuncertainties(standarddeviations)showninredareobtainedbybootstrapping asdescribedinSect.4.3. tion,andnegativecontributionof N H+, CH OHandoftheCO PC2(butwithnegativevalueswherePC2isverylarge),andthis 2 3 isotopologues, that trace gas shielded from the UV field. This PCcouldthustracethechemistryofmoderatelydensegas. componentthusprobablytracesthechemicalspecificitiesofUV Thesixthprincipalcomponentshowsnegativecontributions illuminatedgas.Thecomponentmapclearlyshowspositiveval- ofHCN,HCO+ andCN,whichcanalloriginateinphotochem- uesattheeasternedgeofthecloud,illuminatedbyσOri,andin istry,andpositivecontributionsfromCCH, C18O 13COandSO, thestar-formingregionNGC2024. which are usually associated with more shielded gas although CCH can also be bright in UV illuminated regions. Its compo- Thefourthprincipalcomponentisparticularinthefactthat nentmapshowsawideblueregionaroundNGC2024,similarto itsmapalmostonlyshowslarge(positiveornegative)valuesin thelargewarmdustregionseeninthedusttemperaturemapof theregionsoflargepositivevaluesofPC2.Itthushighlightfur- the region (Schneider et al. 2013). It could thus also be related ther chemical variations inside dense cores. This component is totheradiationfield,buttraceadifferentaspectfromPC3,char- completelydominatedbyoppositecontributionsof CH OHand N H+ and thus traces variations in the ratio of these tw3o lines. acterizedbylowerCCHintensitiesrelativetotheotherlines. 2 The remaining components are more difficult to interpret, The component map seems to highlight smaller size cores em- but tend to describe opposite variations in pairs of lines that beddedinsomeoftheclumpsrevealedbyPC2,andthusprob- ablyhighlightsthechemistryofthedensestcores(larger N H+ varied together in previous PC. PC7, 9 and 10 display oppo- 2 sitevariationsinpairsoflineofthegroupHCN,HNC,CNand to CH3OHratios). HCO+,whosevariationswerecorrelatedinthepreviousPCsin The fifth principal component shows positive contributions whichtheyhadlargeweights(PC1and6).PC8showsanticorre- ofsulfurspecies(CSandSO),and C18O,andnegativecontribu- latedvariationsofSOand C18O,anditscomponentmapshows tionsof 12CO,CCH,and CH OH.Itslargepositivevalueshigh- a striking spatial pattern with negative values (high SO/C18O 3 lightlargerscaleregionsembeddingthedensecloudsshownby ratios) in the Horsehead, the molecular gas at the base of the 7 PierreGratieretal.:DissectingthemolecularstructureoftheOrionBcloud:InsightfromPrincipalComponentAnalysis 1.0 1.0 1.0 %) 0.5 N2H+ %) 0.5 CCH %) 0.5 CH3OH C2(10.9 00..05 HHNCCO+ C3(6.5 00..05 12CO NC2HH3+OH C4(5.9 00..05 13CO CCH P− 12CO P− P− N2H+ 1.0 1.0 1.0 − − − 1.5 1.0 0.50.0 0.5 1.0 1.5 1.5 1.0 0.50.0 0.5 1.0 1.5 1.5 1.0 0.50.0 0.5 1.0 1.5 − − − − − − − − − PC1(60.1%) PC2(10.9%) PC3(6.5%) 1.0 1.0 1.0 0%) 0.5 SO 3%) 0.5 CCH CSO18O 0%) 0.5 CN C18O 5. 0.0 3. 0.0 2. 0.0 CCH 5( N2H+ 6( CH3OH 7( HCN SO PC−0.5 CCH CH3OH PC−0.5 CN PC−0.5 HNC 1.0 1.0 1.0 − − − 1.5 1.0 0.50.0 0.5 1.0 1.5 1.5 1.0 0.50.0 0.5 1.0 1.5 1.5 1.0 0.50.0 0.5 1.0 1.5 − − − − − − − − − PC4(5.9%) PC5(5.0%) PC6(3.3%) 1.0 1.0 1.0 %) 0.5 %) 0.5 %) 0.5 C8(1.9 00..05 HNHCCNSO C18CON C9(1.7 00..05 SOCHNCNHCN18CO C10(1.4 00..05 HHCCCNO18+OSO HNC P− P− P− 1.0 1.0 1.0 − − − 1.5 1.0 0.50.0 0.5 1.0 1.5 1.5 1.0 0.50.0 0.5 1.0 1.5 1.5 1.0 0.50.0 0.5 1.0 1.5 − − − − − − − − − PC7(2.0%) PC8(1.9%) PC9(1.7%) 1.0 1.0 9%) 0.5 CS 4%) 0.5 logNH2 C11(0. 00..05 HCO+ C18O HCN C12(0. 00..05 C1138COO H12CCOCO+S llooggnUH/U¯ P− P− 1.0 1.0 (cid:0) (cid:1) − − 1.5 1.0 0.50.0 0.5 1.0 1.5 1.5 1.0 0.50.0 0.5 1.0 1.5 − − − − − − PC10(1.4%) PC11(0.9%) Fig.7.Correlationwheels,showingtheinitiallineintensitiesasvectorhavingascoordinatestheircorrelationcoefficientstoeach PC,representedintheplanesofsuccessivepairsofPCs.Uncertaintiesfromourbootstrapinganalysis(seeSect.4.3)arepresentedas thinblackcontoursaroundthearrows’heads(isodensitycontourscontaining68%ofthedistribution).Alsorepresentedincolored (cid:16) (cid:17) arrowsarethecorrelationsofourindependentphysicalparameterswiththePCs(red:logN ,green:logn ,blue:log U/U¯ ). H2 H Horsehead, and the small scale clumps in NGC2024, and pos- lightedregion,thushavinglittleweightinthecorrelationmatrix. itive values (low SO/C18O ratios) in a dense filament stretch- PC12iscompletelyconstrainedbyorthogonalitytotheprevious ing away from NGC2024. PC11 is strongly dominated by CS, PCsandisthusonlyaconsequence. and thus shows specific variations of CS, mostly uncorrelated with the other lines (somewhat anticorrelated with C18O), and thatwerenotdescribedbythepreviousPCs.Itscomponentmap 4.3. Studyingtheeffectofnoise shows small scale spots of positive values, mostly surrounding NGC2024andNGC2023.Thefactthatitappearssolateinthe Thenoisynatureofourdatacanhavetwokindsofeffects.Itcan decomposition can be explained by the small size of the high- firstinducevariabilityinourresults(theresultswouldvaryfora differentrealizationoftherandomnoise).Weverifiedthestabil- 8 PierreGratieretal.:DissectingthemolecularstructureoftheOrionBcloud:InsightfromPrincipalComponentAnalysis PC1 60.1% PC2 10.9% PC3 6.5% PC4 5.9% -14.3 0.0 14.3 -9.2 0.0 9.2 -3.0 0.0 3.0 -5.1 0.0 5.1 PC5 5.0% PC6 3.3% PC7 2.0% PC8 1.9% -3.4 0.0 3.4 -1.9 0.0 1.9 -1.5 0.0 1.5 -1.8 0.0 1.8 PC9 1.7% PC10 1.4% PC11 0.9% PC12 0.4% -1.4 0.0 1.4 -1.1 0.0 1.1 -1.6 0.0 1.6 -0.6 0.0 0.6 40 30 20 ) (0 y 10 δ 0 10 − 20 − 40 30 20 10 0 δx( ) 0 Fig.8.Principalcomponentmaps.Thesemapsrepresentthevalueofeachobservedpixelwhentheyareprojectedinthespaceof theprincipalcomponents. 9 PierreGratieretal.:DissectingthemolecularstructureoftheOrionBcloud:InsightfromPrincipalComponentAnalysis ityofourresultsbyusingabootstrappingmethod.Bootstrapping definedyoungmassivestars(seediscussioninPetyetal.2016). is a method of choice to compute uncertainties on an estima- This allowed us to infer a link between the first three princi- tor (here the PCA components) when the distribution of esti- palcomponentsandphysicalquantitiessuchasthecolumnden- matorvaluescannotbeassumedtofollowasimpledistribution sity,thevolumedensity,andUVillumination.Inthissection,we (Feigelson & Babu 2012). The idea of bootstrapping is to use willquantitativelyassertthesepotentialrelationsbystudyingthe a Monte Carlo method to create new resampled datasets of the correlationofeachcomponentmapwithasetofindependently samesizeastheoriginaldatasetbysamplingwithreplacement measuredmapsofphysicalparameters. fromtheoriginaldataset.Weconstruct5000suchbootstrapped datasetsandrunthePCAalgorithmoneach.BecausethePCA 5.1. Independentmeasureofthephysicalquantities is invariant through the change of sign of the PCs, we ensure thatthesignsareconsistentbeforecomputingthedistributionof The goal of this section is to find the principal component that thePCcoefficientstoavoidoverestimatingtheuncertainty.The is best associated to each of the physical parameters, not to as- resultsarepresentedbothfortheeigenspectrainFig.6,andthe sign an absolute physical meaning to some of the components. correlationwheelsinFig.7. Itisthereforenotnecessarytohaveabsolutevaluesoftheinde- ThePCcoefficientsappearoverallverystable,thevariances pendentlymeasuredphysicalparametersmaps.Onlytherelative beingcompletelynegligibleforPC1and2,andverysmallfor variationofeachphysicalparameterisrequiredtocomputethe PC 3 to 6. Only PC 7 and 8 show significant variability, with correlationcoefficient.Figure9showsthedifferentmapsofthe somecoefficientschangingsign.ThiscanbeunderstoodasPCA physical quantities that we will correlate with the first 3 prin- results are particularly sensitive to noise when two PCs corre- cipalcomponents. Thissection describeshow thesemaps were spondtoverycloseeigenvalues,andPC7and8havetheclosest obtained. eigenvalueswithrespectively2%and1.9%ofthetotalcorrela- tion. Indeed, PCs with equal eigenvalues are degenerate in the sensethatanybasisofthesubspacetheydefinesatisfiesthedef- 5.1.1. Columndensity inition of PCA. As a result, when eigenvalues are not exactly The dust column density map is from the Hershel Gould Belt equalbutveryclose,thenoisecanresultinarandomrotationof Survey(PI:P.Andre)OrionBmap(Andre´etal.2010;Schneider thisgroupofPCsinsidetheirsubspace.Ourresults,whichfocus etal.2013)2.Thismapwasobtainedbyfittingthefarinfrared onthefirstfewPCs,arethusunaffectedbynoisevariability. Spectral Energy distribution by greybodies. We apply a loga- Thesecondpossibleeffectofnoiseistobiastheresults.PCA rithmic scaling to the data to reduce the dynamical range, the isunbiasedifthenoiseisspherical(i.e.,hasequalvarianceinall resultingmapsisplottedintheleftpanelofFig.9. directions)inthefinaldatasetonwhichPCAisapplied(i.e.af- terstandardizationinourcorrelation-basedvariant).Inthiscase, thenoisecanonlyhidethelowestPCs(thatdescribevariations 5.1.2. Volumedensity smallerthanthenoiselevel)andmakethemdegenerate.Inour casehowever,thenoiselevelsonthedifferentmolecularlinesare Volume density is a difficult quantity to measure because one needs both a mass estimate and an associated volume. Density initiallyclosebutnotequal(variationsbyafactorof3atmost, isthusdependentonthescalethatitiscomputedat.Weusethe seeTable1).Thenon-linearreparametrizationkeepstheserela- catalogofcoresidentifiedandcharacterizedinKirketal.(2016). tive variances. Finally, the last normalization step (by the stan- We compute masses from each cloud’s 850µm flux using their dard deviation) gives final noise variances proportional to the equation (3). To do this, we assume a common temperature of ratios of noise standard deviation to total standard deviation of the reparametrized intensities. These differ by up to a factor of 17Kforallclouds.Fromthismassandtheirobservedsizeesti- mateswecomputeavolumedensityforeachofthedensecores 14.8betweenthelines,andpossiblebiasesmaybepresentinour inourobservedfieldofview.Inthiscasecorrelationcannotbe results, giving higher weight to the lines with the largest ratios carried out over the full map but we correlate the density mea- of noise variance to total variance. However, it wasn’t possible sured for each core with the value of the principal components withthePCAmethodtobothavoidgivinghigherweighttothe measuredinthenearestpixel.Thedataisshownasascatterplot brightestlines(whichledustousethecorrelation-basedPCA), inthemiddlepanelofFig.9. andatthesametimeensureequalnoiseonallvariables.Wenote thatthepreviousPCAstudiesofmolecularcloudswerelesscon- cerned by noise-induced bias as they only used lines that were 5.1.3. UVradiationfield clearlydetectedinallpixels.Ourchoiceallowedustoperform theprincipalcomponentanalysisonthefullregion,whichledto We compute the UV radiation field by using the fact that PAH the identification of two PCs associated with dense core chem- emissivityisroughlyconstantperunitHandunitradiationfield istry. (Draine & Li 2007). In practice, we use the WISE (Meisner & Finkbeiner 2014) 12µm maps divided by the column density clipped to a maximum value of 1022cm−2. We do not claim to 5. Correlationoftheprincipalcomponentmaps have an absolute value of the UV radiation field but a quantity withindependentlymeasured physical that should be proportional to it. The quantity log(U/U¯) where U¯ is the mean value of U is shown in the rightmost panel of parametersmaps Fig.9. In the previous section, we combined two sources of informa- TheproperwaytocomputetheUVradiationfieldfromPAH tion to interpret the main principal components: 1) astrochem- emission would be to divide by the volume density but as we istryteachesusthatsomemoleculestracecertainphysicalcon- discussedinthepreviousparagraph,itisnotpossibletogetafull ditions,2)OrionBisanextremelywellstudiedsource,implying map of volume density. We choose to use column density as a that the spatial structure of the source is well known. For in- stance,themolecularcloudisknowntobeilluminatedbywell- 2 http://www.herschel.fr/cea/gouldbelt/en/ 10