An analysis of feature relevance in the classification of astronomical transients with machine learning methods

Mon. Not. R. Astron. Soc. 000, 1–?? (2015)   Printed January 18, 2016   (MN LaTeX style file v2.2)

arXiv:1601.03931v1 [astro-ph.IM] 15 Jan 2016

A. D’Isanto1,2*, S. Cavuoti3, M. Brescia3, C. Donalek4, G. Longo1, G. Riccio3, S. G. Djorgovski4,5

1 Department of Physical Sciences, University of Napoli Federico II, via Cinthia 9, 80126 Napoli, ITALY
2 Heidelberg Institute for Theoretical Studies (HITS), Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, GERMANY
3 INAF - Astronomical Observatory of Capodimonte, via Moiariello 16, 80131 Napoli, ITALY
4 Center for Data Driven Discovery, California Institute of Technology, 1200 E. California Blvd., 91125 Pasadena, USA
5 Department of Astronomy, California Institute of Technology, 1216 East California Blvd., Pasadena, CA 91125, USA

* E-mail: [email protected]

Accepted. Received; in original form

© 2015 RAS

ABSTRACT

The exploitation of present and future synoptic (multi-band and multi-epoch) surveys requires an extensive use of automatic methods for data processing and data interpretation. In this work, using data extracted from the Catalina Real Time Transient Survey (CRTS), we investigate the classification performance of some well tested methods: Random Forest, MLPQNA (Multi Layer Perceptron with Quasi Newton Algorithm) and K-Nearest Neighbors, paying special attention to the feature selection phase. In order to do so, several classification experiments were performed. Namely: identification of cataclysmic variables, separation between galactic and extra-galactic objects and identification of supernovae.

Key words: methods: data analysis - stars: novae, cataclysmic variables - stars: supernovae: general - stars: variable: general - stars: variables: RR Lyrae

1 INTRODUCTION

The advent of a new generation of multi-epoch and multi-band (synoptic) surveys has opened a new era in astronomy, making it possible to study the physical properties of variable sources with unprecedented accuracy. The potential of these new digital surveys, both in terms of new discoveries and of a better understanding of already known phenomena, is huge. For instance, the Catalina Real-Time Transient Survey (CRTS, Drake et al. 2009), in less than 8 years of operation, enabled the discovery of ∼2400 SN, ∼1200 CV and ∼2800 AGN, as well as the identification of brand new phenomena such as binary black holes (Graham et al. 2015) and peculiar types of supernovae (Drake et al. 2010). This discovery trend is expected to continue, and even to increase, when new observing facilities such as the Large Synoptic Survey Telescope (LSST, Closson Ferguson 2015) and the Square Kilometer Array (SKA, Yahya et al. 2015) become operational.

With these new instruments, however, both the size of the data and the event discovery rates are expected to increase, from the current $\sim 10 - 10^{2}$ events per night up to $\sim 10^{5} - 10^{7}$. Only a small fraction of these events will be targeted by dedicated follow-ups, and therefore it will become crucial to disentangle potentially interesting events from lesser ones. With data volumes already in the terabyte and petabyte domain, the discrimination of time-critical information has already exceeded the capabilities of human operators, and even crowds of citizen scientists cannot match the task. A viable approach is therefore to automate each step of the data acquisition, processing and understanding tasks. In this work, by "data understanding" we mean the identification of transients and their classification into broad classes, such as periodic vs non periodic, supernovae, Cataclysmic Variable (CV) stars, etc.

Many efforts have been made to apply a variety of machine learning (ML) methods to classification problems (du Buisson et al. 2014; Goldstein et al. 2015; Rebbapragada 2014; Wright et al 2015).

Real time analysis can be performed using different methods, among which we shall just recall those based on Random Forest (RF; Breiman 2001) and on Hierarchical Classification (Kitty et al. 2014).

Off-line classification, being less critical in terms of computing time, can be performed with many different types of classifiers. It is common practice to distinguish between supervised and unsupervised methods, depending on whether or not a previously classified sample is used for the training phase. In the supervised category we have, for instance, Bayesian Networks (Castillo et al. 1997), Support Vector Machines (SVM, Chang & Lin 2011), K-nearest neighbors (KNN, Hastie et al. 2001), Random Forest (Breiman 2001), and Neural Networks (McCulloch and Pitts 1943). In the unsupervised family we mention Gaussian Mixture Modeling (GMM, McLachlan and Peel 2001) and Self-Organizing Maps (SOM, Kohonen 2007).

In this work we shall focus on off-line classification, making use of three different machine learning methods, namely: the Multi Layer Perceptron with Quasi-Newton Algorithm (MLPQNA, Brescia et al. 2012), the Random Forest (RF, Breiman 2001) and the K-Nearest Neighbors (KNN, Hastie et al. 2001). Most of the presented work was performed in the framework of the Data Mining & Exploration Web Application REsource (DAMEWARE, Brescia et al. 2014) infrastructure and the PhotoRaptor public tool (Cavuoti et al. 2015).

The paper is structured as follows: in Section 2 we present the data and introduce the features extracted for the analysis. In Section 3 we briefly describe the machine learning methods used for the experiments detailed in Section 4. Results are discussed in Section 5.

2 THE DATA

In what follows we shall divide objects according to a simplified version (see Fig. 1) of the semantic tree described in Eyer and Mowlavi (2007). From this scheme the need to split the classification task into at least three steps emerges quite naturally (e.g. Dubath 2012). In the first step, variable objects (the transients) are disentangled from normal, non variable stars. In the second step, periodic objects are separated from non-periodic objects and, finally, in the third and last step, one can proceed to the final classification of the objects.

Figure 1. An adapted version of the scheme presented in Dubath (2012) for a general classification of variable objects.

In this work we make use of 1,619 light curves extracted from the Catalina Real-Time Transient Survey (CRTS, Drake et al. 2009) public archive. CRTS is a synoptic astronomical survey that repeatedly covers thirty three thousand square degrees of the sky with the main goal of discovering rare and interesting transient phenomena. The survey utilizes data taken in only one band (V) by the three dedicated telescopes of the highly successful Catalina Sky Survey (CSS) NEO project, and detects and openly publishes all transients within minutes of observation so that all astronomers may follow ongoing events.

The sample used in the present work consists of the light curves of objects whose nature was confirmed with spectroscopic or photometric follow-ups, and it is composed of:

• Cataclysmic Variables - CV (461 objects);
• Supernovae - SN (536 objects);
• Blazar - Bl (124 objects);
• Active Galactic Nuclei - AGN (140 objects);
• Flare Stars - Fl (66 objects);
• RR Lyrae - RRL (292 objects).

2.1 Photometric features

The ability to recognize and quantify the differences between light curves with ML methods requires many instances of light curves for each class of interest. As extensively discussed (cf. Donalek et al. 2013; Bloom and Richards 2011; Graham et al. 2012; Wright et al 2015), in analysing astronomical time series it is crucial to extract a proper set of features from the light curves. Since light curves are usually unevenly sampled, and not all instances of a certain class are observed with the same number of epochs and S/N ratio, the use of the light curves themselves for classification purposes is challenging, both conceptually and computationally. Therefore, the data need to be homogenized by transforming each light curve into a vector of real-number features generated using statistical and/or model-specific fitting procedures.

In this work we used the Caltech Time Series Characterization Service (CTSCS), a publicly offered web service (Graham et al. 2012), to derive from a given light curve a rather complete set of features able to characterize both periodic (Richards et al. 2011; Debosscher et al. 2007) and non periodic behaviors.

Among the many possible features provided by the service, we used those listed below.

• Amplitude (ampl): half the difference between the maximum and minimum magnitudes;

\[ ampl = \frac{mag_{max} - mag_{min}}{2} \qquad (1) \]

• Beyond1std (b1std): the fraction (≤ 1) of photometric points lying more than one standard deviation above or below the average weighted by the photometric errors;

\[ b1std = P(|mag - \overline{mag}| > \sigma) \qquad (2) \]

• Flux Percentage Ratio (fpr): the percentile is the value of a variable under which there is a certain percentage of light curve data points. The flux percentile $F_{n,m}$ was defined as the difference between the flux values at percentiles n and m. The following flux percentile ratios have been used:

\[ fpr20 = F_{40,60} / F_{5,95} \]
\[ fpr35 = F_{32.5,67.5} / F_{5,95} \]
\[ fpr50 = F_{25,75} / F_{5,95} \]
\[ fpr65 = F_{17.5,82.5} / F_{5,95} \]
\[ fpr80 = F_{10,90} / F_{5,95} \]

• Standard deviation (std): the standard deviation of the fluxes.
• Lomb-Scargle Periodogram (ls): the period obtained from the peak frequency of the Lomb-Scargle periodogram (Scargle 1982);

• Linear Trend (lt): the slope of the linear fit to the light curve, that is to say the parameter a in the following linear relation:

\[ mag = a \cdot t + b \qquad (3) \]

\[ lt = a \qquad (4) \]

• Median Absolute Deviation (mad): the median of the deviation of the fluxes from the median flux;

\[ mad = median_i(|x_i - median_j(x_j)|) \qquad (5) \]

• Median Buffer Range Percentage (mbrp): the fraction of data points which are within 10% of the median flux;

\[ mbrp = P(|x_i - median_j(x_j)| < 0.1 \cdot median_j(x_j)) \qquad (6) \]

• Magnitude Ratio (mr): an index used to estimate whether the object spends most of the time above or below the median of the magnitudes;

\[ mr = P(mag > median(mag)) \qquad (7) \]

• Maximum Slope (ms): the maximum difference obtained by measuring magnitudes at successive epochs;

\[ ms = \max\left(\left|\frac{mag_{i+1} - mag_i}{t_{i+1} - t_i}\right|\right) = \frac{\Delta mag}{\Delta t} \qquad (8) \]

• Percent Amplitude (pa): the maximum percentage difference between the maximum or minimum flux and the median;

\[ pa = \max(|x_{max} - median(x)|, |x_{min} - median(x)|) \qquad (9) \]

• Percent Difference Flux Percentile (pdfp): the difference between the second and the 98th percentile flux, converted to magnitudes. It is calculated as the ratio of $F_{5,95}$ to the median flux;

\[ pdfp = \frac{mag_{95} - mag_{5}}{median(mag)} \qquad (10) \]

• Pair Slope Trend (pst): the percentage of the last 30 pairs of consecutive flux measurements that show a positive slope;

\[ pst = P(x_{i+1} - x_i > 0, \; i = n-30, \ldots, n) \qquad (11) \]

• R Cor Bor (rcb): the fraction of magnitudes that are more than 1.5 mag fainter than the median;

\[ rcb = P(mag > (median(mag) + 1.5)) \qquad (12) \]

• Small Kurtosis (sk): the kurtosis represents the departure of a distribution from normality and it is given by the ratio between the 4th order moment and the square of the variance. "Small kurtosis" denotes a kurtosis estimate that remains reliable on a small number of epochs;

\[ sk = \frac{\mu_4}{\sigma^4} \qquad (13) \]

• Skew (skew): the skewness is an index of the asymmetry of a distribution. It is given by the ratio between the 3rd order moment and the third power of the standard deviation σ;

\[ skew = \frac{\mu_3}{\sigma^3} \qquad (14) \]

3 THE METHODS

As stated above, this work aims to classify transients using a machine learning approach based on three methods: MLPQNA, RF and KNN.

MLPQNA stands for the classical Multi-Layer Perceptron model implemented with a Quasi Newton Approximation (QNA) as learning rule (Byrd et al. 1994). This model has already been used to deal with astrophysical problems and it is extensively described elsewhere (Brescia et al. 2012; Cavuoti et al. 2014).

RF stands instead for Random Forest, a widely known ensemble method (Breiman 2001), which uses random subsets of the data features to build an ensemble of decision trees. Our implementation makes use of the public library scikit-learn (Pedregosa et al. 2011). This method has been chosen mainly because it provides, for each input feature, a score of importance (rank) measured in terms of its percentage contribution to the classification results.

KNN is the well known k-Nearest Neighbors method (Hastie et al. 2001), widely used both for classification and regression. In the case of classification, an object is classified by a majority vote of its neighbors, i.e. it is assigned to the most common class among its k nearest neighbors.

The analysis of the results of the experiments is based on the so-called confusion matrix (Provost et al. 1998), a widely used classification performance visualization matrix, where columns represent the instances in a predicted class, and rows give the expected instances in the known classes. In a confusion matrix defined as in Tab. 1 the quantities are: TP: true positive, TN: true negative, FP: false positive, FN: false negative.

                         OUTPUT
                         class 1    class 2
    TARGET    class 1    TP         FN
              class 2    FP         TN

Table 1. Structure of the confusion matrix for a two-class experiment. The interpretation of the symbols is self explanatory. For instance, TP denotes the number of objects belonging to class 1 which are correctly classified.

By combining such terms, it is then possible to derive the following statistical parameters (in brackets the label that will be used in the tables):

• overall Efficiency (Eff): the ratio between the number of correctly classified objects and the total number of objects in the data set;

\[ Eff = \frac{TP + TN}{TP + FP + FN + TN} \qquad (15) \]

• class Purity (Pur1 and Pur2): the ratio between the number of correctly classified objects of a class and the number of objects classified in that class, also known as efficiency of a class;

\[ Pur1 = \frac{TP}{TP + FP} \qquad (16) \]

\[ Pur2 = \frac{TN}{FN + TN} \qquad (17) \]

• class Completeness (Comp1 and Comp2): the ratio between the number of correctly classified objects in that class and the total number of objects of that class in the data set;

\[ Comp1 = \frac{TP}{TP + FN} \qquad (18) \]

\[ Comp2 = \frac{TN}{FP + TN} \qquad (19) \]

• class Contamination: it is the dual of the purity, namely the ratio between the number of misclassified objects in a class and the number of objects classified in that class. Since it is easily derivable from the purity percentages, it is not explicitly listed in the results;

• Matthews Correlation Coefficient (MCC): an index used as a quality measure for a two-class classification. It takes into account values derived from the confusion matrix, and can be used even if the classes are very unbalanced. It can be regarded as a correlation coefficient between the observed and predicted binary classification, returning a value between -1 and 1, where -1 indicates total disagreement between prediction and observation, 0 indicates random prediction, and 1 stands for a perfect prediction (Matthews 1975).

\[ MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (20) \]

These parameters can be used to describe completely the distribution of the blind test patterns after training.

Moreover, in order to compare the three classifiers used, we also derived the Receiver Operating Characteristic (ROC) curve plots for the most significant experiments. A ROC curve is a graphical diagram showing the classification performance trend by plotting the true positive rate against the false positive rate as the classification threshold is varied (Hanley and McNeil 1982). The overall effectiveness of the algorithm is measured by the area under the ROC curve, where an area of 1 represents a perfect classification, while an area of 0.5 indicates a useless result.

4 CLASSIFICATION EXPERIMENTS

We performed the following classification experiments:

• multi-class (six-class), in which the whole catalog, with all six classes considered separately, was used in order to investigate the capability to correctly disentangle all the given categories of variable objects at once;
• Cataclysmic Variables (CV) vs ALL, where the category ALL includes the AGN, SN, Fl and Bl types. Here the RRL type was not considered;
• Extra-Galactic (AGN and Bl types) vs Galactic (CV, SN and Fl types), to search for an improvement with respect to the previous separation. The inclusion of the SN type in the Galactic class is motivated by the fact that, even though mainly observed in external galaxies, supernovae are stars and therefore represent a completely different category with respect to active galactic nuclei;
• SN vs ALL, where ALL includes the AGN, Bl, CV, Fl and RRL types.

For each classification experiment we adopted the same strategy. First of all, we ran a RF experiment using all 20 features described in Sec. 2.1, in order to obtain a feature importance ranking (i.e. the relevance of each feature to the classification expressed in terms of information entropy). The results of the RF experiment allowed us to select different groups of features (ordered by ranking), to be used for a second set of binary classification experiments performed with MLPQNA, RF and KNN. Finally, using the best set of features, we performed a heuristic optimization of the MLPQNA parameters (i.e. the complexity of the network topology as well as the Quasi-Newton learning decay factor), aimed at improving the classification results.

We then froze the topology of the MLPQNA using 1 hidden layer, while for the RF we chose a configuration with 10,000 trees, and finally for the KNN we chose k = 5. Moreover, we always applied a 10-fold cross validation (Geisser 1975), in order to obtain statistically more robust results (i.e. to limit the risk of overfitting in the training phase). In terms of performance evaluation, it is important to underline that we were mostly interested in the classification purity percentages. Therefore these indicators have been primarily evaluated to assign the best results.

4.1 Multi-class

We performed the multi-class classification experiment to understand the behavior of the classifiers in the most complex situation, i.e. considering simultaneously all six available variable object categories. Therefore, as explained above, we performed a preliminary experiment using the RF model with all available input features, thus obtaining the feature importance ranking for this type of classification (Fig. 2). The feature ranking, in fact, is automatically provided by the RF classifier, which assigns a score to all input features, corresponding to their relevance in building the decision rules of the trees during the training phase. Such information is suitable to judge the weight of each individual feature in the decision process and to evaluate its possible redundancy in terms of contribution to the learning. One useful way to exploit the feature ranking is to engage in a training/test campaign, sequentially adding features to the training parameter space (in order of their importance) and evaluating the training results, until the classification performance reaches a plateau. The final outcome of such a campaign is the best compromise between the parameter space dimension and the classification performance. After this preliminary analysis, we then submitted the dataset to the RF, MLPQNA and KNN classifiers, using respectively all, the first 5 and the first 3 features of the ranking list in order of importance. A statistical evaluation of the classification results is reported in Tables A4, A5 and A6, while the ROC curves for each class are shown in Fig. 8. From these results the worse behavior of the KNN model with respect to the other classifiers is evident. In terms of class purity, the best behavior is obtained by the RF model using all available features.
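The training/test campaign described above (adding features in order of importance until the performance reaches a plateau) can be sketched in a few lines. What follows is only a minimal illustration under stated assumptions, not the pipeline used in this work: the ranking is taken as given (the hypothetical list `ranking` stands in for the RF output), the data are a toy two-class sample generated on the fly, and each group of top-ranked features is scored with a plain k-nearest-neighbors vote (k = 5) under 10-fold cross validation. The function names, the `tolerance` parameter and the synthetic data are all introduced here for the example.

```python
import random


def knn_predict(train_X, train_y, x, k=5):
    """Assign x to the most common class among its k nearest training points."""
    nearest = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(sorted(votes), key=votes.get)


def cv_efficiency(X, y, columns, n_folds=10, k=5):
    """Overall efficiency (fraction correct) under n-fold cross validation,
    using only the feature columns listed in `columns`."""
    data = [[row[c] for c in columns] for row in X]
    correct = 0
    for fold in range(n_folds):
        test_idx = set(range(fold, len(data), n_folds))
        train_X = [data[i] for i in range(len(data)) if i not in test_idx]
        train_y = [y[i] for i in range(len(data)) if i not in test_idx]
        correct += sum(
            knn_predict(train_X, train_y, data[i], k) == y[i] for i in test_idx
        )
    return correct / len(data)


def select_by_plateau(X, y, ranking, tolerance=0.02):
    """Score the first 1, 2, ... ranked features and return all scores plus the
    smallest group whose efficiency is within `tolerance` of the best one."""
    scores = [cv_efficiency(X, y, ranking[:n]) for n in range(1, len(ranking) + 1)]
    best = max(scores)
    n_selected = next(n for n, s in enumerate(scores, start=1) if s >= best - tolerance)
    return scores, n_selected


# Toy data (assumption): feature 0 separates the classes well, feature 1 weakly,
# features 2 and 3 are pure noise.
rng = random.Random(0)
y = [rng.randrange(2) for _ in range(120)]
X = [
    [rng.gauss(3.0 * label, 1.0), rng.gauss(1.0 * label, 1.0),
     rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
    for label in y
]
ranking = [0, 1, 2, 3]  # hypothetical importance order, most relevant first

scores, n_selected = select_by_plateau(X, y, ranking)
for n, s in enumerate(scores, start=1):
    print(f"first {n} feature(s): efficiency = {s:.3f}")
print("selected group size:", n_selected)
```

In this sketch the plateau is identified as the smallest group of features whose efficiency lies within `tolerance` of the best score; other stopping criteria are equally plausible.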
Figure 2. Feature importance list obtained by the RF in the case of the six-class experiment (panel title: MULTI-CLASS CLASSIFICATION), with the importance percentage for each feature. Features in order of decreasing importance: lt, std, ampl, ls, skew, pa, mbrp, ms, pdfp, mad, sk, fpr20, fpr50, fpr35, fpr65, b1std, fpr80, mr, pst, rcb.

4.2 Cataclysmic Variables vs ALL

We started by performing an experiment using the RF model and all selected features. The dataset was composed of 461 CV and 866 ALL objects. Results are shown in Tab. A5, while the feature ranking is given in Fig. 3.

Figure 3. Feature importance list obtained by the RF, with the importance percentage for each feature, for the CV vs ALL classification. Features in order of decreasing importance: ampl, mbrp, std, lt, pa, ls, pdfp, ms, fpr20, fpr50, fpr35, skew, fpr65, sk, fpr80, mad, b1std, pst, mr, rcb.

Following the feature ranking evaluation strategy, we performed a series of experiments with the MLPQNA, RF and KNN models using different groups of features taken in order of importance: respectively, the first 3, 5, 6, 9, 10 and 11 features, and all the 20 features listed in Fig. 3.

In most cases, groups differing by a small number of features (e.g. 5 and 6) led to results with similar performance and, in these cases, we retained the smaller group as representative, assuming that most of the information is already contained in it. Therefore, in the following description of experiments we explicitly report the results only for these relevant cases (see Tables A4, A5 and A6, as well as the related ROC curves in Fig. 9).

From this series of experiments, it appears clear that, regarding MLPQNA, the best configuration is achieved using only 5 features (ampl, mbrp, std, lt and pa) after the optimization of the model parameters, while, for the RF, the best results were obtained by retaining all 20 features. Finally, the KNN, which is also the classifier with the worst performance, gives its best result using 6 features only.

4.3 Extra-Galactic vs Galactic

Also in the case of the classification experiment related to 264 EXTRA-GALACTIC patterns, hereafter called X-GAL (AGN + Bl as class 1), vs 1,063 GALACTIC patterns, hereafter named GAL (CV + SN + Fl as class 2), we first performed a feature ranking evaluation with the RF model, using all available features (see Fig. 4). Again, using the ranking list and the same feature selection strategy described above, we performed a reduced number of experiments using the first 5, the first 10, and all features, applying all three ML models.

Figure 4. Feature importance list obtained by the RF, with the importance percentage for each feature, for the X-GAL vs GAL classification. Features in order of decreasing importance: ls, lt, ms, b1std, pa, skew, sk, fpr20, std, mbrp, pdfp, fpr50, fpr35, ampl, fpr80, fpr65, mad, mr, pst, rcb.

In addition, we performed one further experiment, using the 5 features which were selected as most relevant for the CV vs ALL classification case. Results are presented in Tables A7, A8 and A9, while the related ROC curves are shown in Fig. 9. The best classification performance was obtained with, respectively, 5 features for MLPQNA (ls, lt, ms, b1std and pa) and 10 features for the RF and KNN models (ls, lt, ms, b1std, pa, skew, sk, fpr20, std, mbrp).

4.4 Supernovae vs ALL

Finally, we performed experiments for Supernovae (class 1) versus ALL (all other classes, labeled as class 2), but in this case we also added to the second group the sixth class containing RR Lyrae, thus obtaining a sample of 536 SN and 1,083 ALL class objects. Again, we started from the feature importance evaluation shown in Fig. 5.

As already done in the previous cases, we performed the classification experiments with the RF, MLPQNA and KNN models. We report here the results obtained in the cases of, respectively, the first 3, 5 and 10 features in the ranking list. Moreover, we performed additional experiments using the best group of 5 features obtained from the CV vs ALL experiment (see Fig. 3). Results for the three experiments are reported in Tables A10, A11 and A12, and ROC curves in Fig. 9. The best classification performance was obtained with, respectively, 10 features for the RF model (lt, ls, pa, skew, ampl, ms, std, mr, fpr20, fpr35) and only 3 features for the MLPQNA and KNN classifiers (lt, ls, pa).

5 DISCUSSION

From the experiments previously described, we can notice that, in this context (as imposed by the structure of the parameter space and the size of the data), the Random Forest performs on average slightly better than MLPQNA and objectively better than KNN.

The results presented in the previous section show that, at least in the presence of such a limited training set, the six-class experiment is outperformed by the binary classification experiments. The performance achieved by the RF and MLPQNA models for the classes which are more relevant for our work, for instance the SN and CV categories, led us to investigate two cases of binary classification, respectively SN vs ALL and CV vs ALL. Furthermore, we also explored the possibility of enclosing Blazars and AGN in a single class to be compared with the other categories, thus obtaining a third binary classification experiment, named X-GAL vs GAL. We removed the RR Lyrae category from the binary classification experiments because of its periodic behavior, which introduces a very well defined signature in the data. This was also suggested by the multi-class experiment results, which show that the RR Lyrae objects are easy to classify, so that their inclusion is not required. Only in the case of the SN vs ALL experiment, in order to be as general as possible, did we re-introduce the RR Lyrae category.

A first interesting result is that, in spite of the ranking orders obtained for the different experiments and of the results assigned as best, in all cases an efficiency above 80% is obtained using the same 5 most relevant features of the CV vs ALL experiment (ampl, mbrp, std, lt and pa).
a third binary classification experiment, named X-GAL vs Thiscanbeunderstoodbycomparingthefirstfivepositions GAL. We removed the RR Lyrae category from the binary of the ranking list obtained from the RF for all classifica- (cid:13)c 2015RAS,MNRAS000,1–?? Classification of astronomical transients 7 SNvsALLCLASSIFICATION 23 21 19 e c 17 n a t 15 r o p 13 m i 11 e ur 9 t a 7 e F 5 3 1 lt ls pa skew ampl ms std mr fpr20 fpr35 fpr50 fpr65 pdfp sk fpr80 b1std mad pst mbrp rcb Features Figure 5. Feature importance list obtained by the RF, with the importance percentage for each feature and for the SN vs ALL classification. tioncases,asreportedinFigures3,4and5.Infact,wecan CVvsALL Size Fraction notice that among the first five features of Fig. 3, there are two (lt and pa) in common with other cases, while the two Totaltestobjects 266 - features ampl and ls are in common between two groups of MLPQNAEff 224 84% features(Figures3and5).Moreover,thefeaturestd isoften RFEff 231 87% presentwithinthebestgroupsamongdifferentexperiments. KNNEff 199 75% ConcerningtheMCC,thisvalueisalmostalwaysabove 0.50fortheMLPQNAandRF.Infact,justoneexperiment (MLPQNA&RF&KNN)equallyclassified 189 71% shows an MCC below this value, while the best one is 0.74. (MLPQNA&RF)Eff 216 89% Therefore, we can conclude that the observed classification (MLPQNA&KNN)Eff 177 90% withthesethreeclassifiers,isclosetotheexpectedone,and that the model shows a proper behavior. (RF&KNN)Eff 184 90% The three classifiers perform differently on different (MLPQNA&RF&KNN)Eff 174 92% types of objects and, as usual in classification experiments, this implies that the overall performance can be increased Table 2. Statistical analysis on the test output for the best ex- bycombiningtheoutputofthethreemodels.Toverifythis periments of CV vs ALL classification for the three models (5∗ in Tab. A4 for the MLPQNA, 20 in Tab. 
A5 for the RF, and 6 hypothesis we analyzed the overall efficiency variation by inTab.A6fortheKNN).Thefirstrowreportsthetotalamount taking into account the objects classified by single models of test objects. Second, third and fourth rows indicate the over- andthoseequallyclassifiedbythecombinationofMLPQNA all efficiency obtained by the three models. While the fifth row andRF,MLPQNAandKNN,RFandKNN,andbyallthree reportsthenumberofobjectsequallyclassifiedbythethreemod- classifiers together. els (i.e. only the objects for which the three models provide the For this analysis, shown in Tables 2, 3 and 4, we per- same classification). Finally the last four rows report the overall formedexperimentsbyrandomlysplittingthecatalogueinto efficienciesreferredonlytotheequallyclassifiedobjects. a training and a blind test set, containing respectively the 80% and the 20% of the data. The increase in performance is quite evident. These results are also visualized as Venn- fail to separate unequivocally the classes, thus confirming diagrams in Fig. 6. that their combination is needed to achieve a proper clas- Therelevanceofthevariousfeaturesintheexperiments sification. Nevertheless the different roles played by the std can be better investigated by looking at their distributions. (panels d and e) in the experiments SN vs ALL and CV vs For the sake of clarity in Fig. 7 we show a few relevant ex- ALL (cf. figures 5 and 3, respectively) is confirmed by the amples. In panels a, b and c we show the distribution of histograms. the features lt, pa and ls for the SN vs ALL experiment GiventhepeculiarshapeoftheSN lightcurves,itisnot while in panel d and e, we show instead the distribution a surprise that in the experiment SN vs ALL, the lt has a the parameter std in the SN versus ALL and in the CV vs relevanceof24%followedinthirdpositionbypa witharel- ALL experiments. 
Finally, in panel f, we show the distribu- evanceof7.7%.ThefactthatinthisexperimenttheLomb- tion of the ampl feature in the CV vs ALL experiment. In Scargleindex(ls)isrankedsecond,mightseemstrangesince all cases, what appears evident is that individual features it is used as an indication of periodic behavior. The his- (cid:13)c 2015RAS,MNRAS000,1–?? 8 D’Isanto et al. 2015 (a) CV vs ALL (b) X-GAL vs GAL (c) SN vs ALL Figure6.Venndiagramsshowingalltheobjects(leftcolumn)andthecorrectlyclassifiedobjects(rightcolumn),basedonefficiency,for thethreedifferenttypesofclassificationinthethreeexperimenttypes.Theintersectionareasthenshowtheobjectsthatareclassified inthesamewaybydifferentmethods.ValuesaretakenfromTables2,3and4respectively. X-GALvsGAL Size Fraction SNvsALL Size Fraction Totaltestobjects 266 - Totaltestobjects 325 - MLPQNAEff 236 89% MLPQNAEff 278 85% RFEff 243 91% RFEff 288 89% KNNEff 224 84% KNNEff 241 74% (MLPQNA&RF&KNN)equallyclassified 223 84% (MLPQNA&RF&KNN)equallyclassified 238 73% (MLPQNA&RF)Eff 233 92% (MLPQNA&RF)Eff 271 90% (MLPQNA&KNN)Eff 211 92% (MLPQNA&KNN)Eff 220 89% (RF&KNN)Eff 216 93% (RF&KNN)Eff 229 90% (MLPQNA&RF&KNN)Eff 210 94% (MLPQNA&RF&KNN)Eff 218 91% Table3.Statisticalanalysisonthetestoutputforthebestexper- Table 4. Statistical analysis on the test output for the best ex- imentsofX-GAL vs GALclassificationforthethreemodels(5∗ perimentsofSNvsALLclassificationforthethreemodels(3∗in inTab.A7fortheMLPQNA,10inTab.A8fortheRF,and10 Tab.A10fortheMLPQNA,10inTab.A11fortheRF,and3in inTab.A9fortheKNN).Thefirstrowreportsthetotalamount Tab. A12 for the KNN). The first row reports the total amount of test objects. Second, third and fourth rows indicate the over- of test objects. Second, third and fourth rows indicate the over- all efficiency obtained by the three models. While the fifth row all efficiency obtained by the three models. 
While the fifth row reportsthenumberofobjectsequallyclassifiedbythethreemod- reportsthenumberofobjectsequallyclassifiedbythethreemod- els (i.e. only the objects for which the three models provide the els (i.e. only the objects for which the three models provide the same classification). Finally the last four rows report the overall same classification). Finally the last four rows report the overall efficienciesreferredonlytotheequallyclassifiedobjects. efficienciesreferredonlytotheequallyclassifiedobjects. (cid:13)c 2015RAS,MNRAS000,1–?? Classification of astronomical transients 9 (a) (b) (c) (d) (e) (f) Figure 7.Distributionofthelt (panela),pa (panelb),ls (panelc)andstd (paneld)inthecaseSNvsALLexperiment.Thediagram showsazoomedportionofthedistributiontobettervisualizetheregionofinterest.RedcolorisrelatedtoSN objects,darkgraycolor toALLclassobjects,whiledarkbrownshowstheoverlayareaofthehistogram.Panels(e)and(f):distributionofthe,respectively,std andampl featuresinthecaseCVvsALLexperiment.PurplecolorisrelatedtoCV objects,darkgrayrepresenttheALLclassobjects, whileindarkpurpleisshowntheoverlayareaofthehistogram. togram in panel c shows, however, that this is due to the space covered by the training sample, which as it has been fact that on average objects in the SN class (being non pe- discussedbefore,isstrictlydependingonthespecificsurvey. riodic) have a ls much smaller than the ALL class. 
ThecapabilitytodisentangleSN classobjectsthrough In the specific context of the CRTS, a completeness of themostrelevantselectedfeaturesappearsevidentbycom- ∼96% and a purity of 84% in the SN vs ALL classification paringthemamongeachother.Inparticularfromfigures10 experiment imply that the sample of candidate SNs pro- and11itispossibletolocatesub-regionsentirelypopulated ducedwithourmethod,wouldcorrectlyidentify∼2520out bySN typeobjects(thoselabeledasAintheplots),aswell of the 2631 confirmed SNs and would produce a sample of as regions characterized by a weak (labeled as B) or strong ∼420possiblyspuriousobjects.Theseresults,howevercan- (labeledasD)densityofSNtypeobjects.Thisimpliesthat, notbeeasilyextrapolatedtoothersurveys,sincetheperfor- besides the particular choice of the classifier, in the param- mance of the method depends drastically on the parameter eter space defined by the most relevant features there are (cid:13)c 2015RAS,MNRAS000,1–?? 10 D’Isanto et al. 2015 1.0 1.0 0.8 0.8 e e e Rat0.6 e Rat0.6 e Positiv e Positiv Tru0.4 Tru0.4 0.2 ROC curve for MLPQNA (area = 0.90) 0.2 ROC curve for MLPQNA (area = 0.93) ROC curve for RF (area = 0.94) ROC curve for RF (area = 0.95) ROC curve for knn (area = 0.83) ROC curve for knn (area = 0.84) 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate (a) CV (b) SN 1.0 1.0 0.8 0.8 e e e Rat0.6 e Rat0.6 e Positiv e Positiv Tru0.4 Tru0.4 0.2 ROC curve for MLPQNA (area = 0.89) 0.2 ROC curve for MLPQNA (area = 0.86) ROC curve for RF (area = 0.94) ROC curve for RF (area = 0.98) ROC curve for knn (area = 0.72) ROC curve for knn (area = 0.88) 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate (c) Bl (d) AGN 1.0 1.0 0.8 0.8 e e e Rat0.6 e Rat0.6 e Positiv e Positiv Tru0.4 Tru0.4 0.2 ROC curve for MLPQNA (area = 0.92) 0.2 ROC curve for MLPQNA (area = 0.99) ROC curve for RF (area = 0.92) ROC curve for RF (area = 1.00) ROC curve for knn (area = 
0.84) ROC curve for knn (area = 0.95) 0.0 0.0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 False Positive Rate False Positive Rate (e) Fl (f) RRL Figure 8. ROCcurves for the six-class classification for the three models used. In the case of the KNN model the curve was obtained bytakingintoaccountthelimitationsimposedbythealgorithm,whicharedeterminedbythechoiceofthenumberofnearestneighbors (inthiscase5neighborsinduce20%ofquantization). (cid:13)c 2015RAS,MNRAS000,1–??
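The construction of a ROC curve, and the quantization mentioned in the caption of Fig. 8 for the KNN model, can be illustrated with a short sketch. It is only an example under stated assumptions and does not reproduce the curves of Fig. 8: the vote counts and labels are invented, the KNN score of an object is taken to be the fraction of its k = 5 neighbors voting for the positive class, and the function names `roc_points` and `area_under_curve` are introduced here. With k = 5 the score can only take the six values 0, 0.2, ..., 1, so the curve has at most six threshold steps.

```python
def roc_points(scores, labels):
    """Return the (false positive rate, true positive rate) points obtained by
    lowering the decision threshold over the distinct score values."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Area under a piecewise-linear curve, computed with the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Invented example: number of the k = 5 nearest neighbors voting "class 1"
# for eight test objects, and their true labels (1 = class 1, 0 = other).
k = 5
votes = [5, 5, 4, 3, 3, 2, 1, 0]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
knn_scores = [v / k for v in votes]  # quantized in steps of 1/k = 20%

knn_curve = roc_points(knn_scores, labels)
knn_auc = area_under_curve(knn_curve)
print("distinct thresholds:", len(set(knn_scores)))
print("ROC points:", knn_curve)
print("area under the curve:", knn_auc)
```

A classifier with continuous output scores would yield one step per test object instead, which is why the KNN curves appear coarser than those of the other two models.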
