
Research in Astronomy and Astrophysics manuscript no. (LaTeX: HB.tex; printed on January 11, 2016; 1:22)
arXiv:1601.01739v1 [astro-ph.IM] 8 Jan 2016

Photometric Redshift Estimation for Quasars by Integration of KNN and SVM *

Bo Han^1, Hongpeng Ding^1, Yanxia Zhang^2, Yongheng Zhao^2

1 International School of Software, Wuhan University, Wuhan, 430072, P.R. China
2 Key Laboratory of Optical Astronomy, National Astronomical Observatories, Chinese Academy of Sciences, 20A Datun Road, Chaoyang District, 100012, Beijing; [email protected]

Received 2015 June 12; accepted

Abstract  The massive photometric data collected from multiple large-scale sky surveys offer significant opportunities for measuring distances of celestial objects by photometric redshifts. However, catastrophic failure remains a long-standing, unsolved problem in current photometric redshift estimation approaches (such as k-nearest-neighbor). In this paper, we propose a novel two-stage approach that integrates the k-nearest-neighbor (KNN) and support vector machine (SVM) methods. In the first stage, we apply the KNN algorithm to photometric data and estimate the corresponding z_phot. By analysis, we find two dense regions with catastrophic failure, one in the range z_phot ∈ [0.3, 1.2], the other in the range z_phot ∈ [1.2, 2.1]. In the second stage, we map the photometric input pattern of points falling into these two ranges from the original attribute space into a high-dimensional feature space with the Gaussian kernel function in SVM. In the high-dimensional feature space, many outlier points produced by catastrophic failure under the simple Euclidean distance computation of KNN can be identified by an SVM classification hyperplane and subsequently corrected. Experimental results based on SDSS (the Sloan Digital Sky Survey) quasar data show that the two-stage fusion approach can significantly mitigate catastrophic failure and improve the estimation accuracy of photometric redshifts of quasars. The percentages in different |∆z| ranges and the rms (root mean square) error obtained by the integrated method are 83.47%, 89.83%, 90.90% and 0.192, respectively, compared with the KNN results of 71.96%, 83.78%, 89.73% and 0.204.

Keywords: catalogs - galaxies: distances and redshifts - methods: statistical - quasars: general - surveys - techniques: photometric

* Supported by the National Natural Science Foundation of China.

1 INTRODUCTION

Photometric redshifts are obtained from images or photometry. Compared to spectroscopic redshifts, they offer the advantages of high efficiency and low cost. In particular, with multiple ongoing multi-band photometric surveys in operation, such as SDSS (the Sloan Digital Sky Survey), UKIDSS (the UKIRT Infrared Deep Sky Survey) and WISE (the Wide-Field Infrared Survey Explorer), a huge volume of photometric data is being collected, larger than the spectroscopic data by two or three orders of magnitude. These massive photometric data offer significant opportunities for measuring distances of celestial objects by photometric redshifts. However, photometric redshifts are less accurate than spectroscopic redshifts and require more sophisticated estimation algorithms to overcome this problem. Researchers worldwide have investigated photometric redshift estimation techniques in recent years. Basically, these techniques fall into two categories: template-fitting models and data mining approaches. Template fitting is the traditional approach for estimating photometric redshifts in astronomy. It extracts features from celestial observational information, such as multiband values, and then matches them with designed templates constructed from theoretical models or real observations.
With feature matching, researchers can estimate photometric redshifts. For example, Bolzonella et al. (2000) estimated photometric redshifts through a standard SED fitting procedure, where the SEDs (spectral energy distributions) were obtained from broad-band photometry. Wu et al. (2004) estimated the photometric redshifts of a large sample of quasars with the χ^2 minimization technique, using derived theoretical color-redshift relation templates. Rowan-Robinson et al. (2008) proposed an approach using fixed galaxy and quasar templates applied to data at 0.36-4.5 µm, and a set of four infrared emission templates fitted to infrared excess data at 3.6-170 µm. Ilbert et al. (2009) applied a template-fitting method (Le Phare) to calculate photometric redshifts in the 2-deg^2 COSMOS field. Experimental results from the above template-fitting methods showed that their estimation accuracy relied on the templates constructed from either simulation or real observational data.

Data mining approaches apply statistics and machine learning algorithms to a set of training samples and automatically learn complicated functional correlations between multiband photometric observations and their corresponding high-confidence redshift parameters. These algorithms are data-driven rather than template-driven. Experimental results showed that they achieved much more accurate photometric estimations in many applications. For example, Ball et al. (2008) applied a nearest neighbor algorithm to estimate photometric redshifts for galaxies and quasars using SDSS and GALEX (the Galaxy Evolution Explorer) datasets. Abdalla et al. (2008) estimated photometric redshifts using a neural network method. Freeman et al. (2009) proposed a non-linear spectral connectivity analysis for transforming photometric colors to a simpler, more natural coordinate system in which they applied regression to make redshift estimations. Gerdes et al. (2010) developed a boosted decision tree method, called ArborZ, to estimate photometric redshifts for galaxies. Way et al. (2012) proposed an approach based on Self-Organizing Mapping (SOM) to estimate photometric redshifts. Bovy et al. (2012) presented the extreme deconvolution technique for simultaneous classification and redshift estimation of quasars and demonstrated that the addition of information from UV and NIR bands was of great importance to photometric quasar-star separation and essentially resolved the redshift degeneracies for quasars. Carrasco et al. (2013) presented an algorithm using prediction trees and random forest techniques for estimating photometric redshifts, incorporating measurement errors into the calculation while also efficiently dealing with missing values in the photometric data. Brescia et al. (2013) applied the Multi Layer Perceptron with Quasi Newton Algorithm (MLPQNA) to evaluate photometric redshifts of quasars with a dataset drawn from four different surveys (SDSS, GALEX, UKIDSS, and WISE).
Though template-fitting approaches and data mining approaches can roughly estimate photometric redshifts, both suffer from the catastrophic failure problem when estimating photometric redshifts of quasars with spectroscopic redshift less than 3 (Richards et al. 2001; Weinstein et al. 2004; Wu et al. 2004). Zhang et al. (2013) demonstrated in practice that with cross-matched multiband data from multiple surveys, such as SDSS, UKIDSS and WISE, the k-nearest neighbor (KNN) algorithm can largely solve the catastrophic failure problem and improve photometric redshift estimation accuracy. The method becomes more important with the development of multiple large photometric sky surveys and the coming of the age of astronomical big data. However, during the data preparation process we need to cross-match multiband information of quasars from multiple photometric surveys, and the number of matched quasar records is far smaller than the original quasar number in a single survey. For example, there are 105,783 quasar samples available in SDSS DR7, whereas the number of cross-matched samples from SDSS, WISE and UKIDSS is only 24,089, around one fourth of the SDSS quasar data. This shortcoming greatly limits the application scope of the estimation approach to only the small portion of cross-matched quasars observed by all surveys.

In this paper, we propose a novel two-stage photometric redshift estimation approach, i.e., the integration of the KNN (k-nearest neighbor) and SVM (support vector machine) approaches, to mitigate catastrophic failure for quasars using relatively few band attributes from only a single survey. The paper is organized as follows. Section 2 describes the data used. Section 3 presents a brief overview of KNN, SVM and KNN+SVM. Section 4 gives the experimental results of KNN+SVM. The conclusions and discussions are summarized in Section 5.

2 DATA

Our experiments are based on a dataset generated from the Sloan Digital Sky Survey (SDSS; York et al. 2000), which provides highly reliable spectroscopic redshifts and has been widely used in photometric redshift estimation. The dataset was constructed by Zhang et al. (2013) for estimating photometric redshifts of quasars. They used the samples of the Quasar Catalogue V (Schneider et al. 2010) in SDSS DR7, which includes 105,783 spectrally confirmed quasars. In each quasar record, the five band features u, g, r, i, z are provided. Similar to Zhang et al. (2013), in our experiments we use the five attributes u-g, g-r, r-i, i-z, r (short for 4C+r) as the input and the corresponding spectroscopic redshift as the regression output.
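As a minimal sketch of how the 4C+r input pattern above could be assembled, the snippet below derives the four colors plus the r magnitude from a quasar catalog; the file name sdss_quasars.csv and the column names (u, g, r, i, z, z_spec) are hypothetical placeholders for illustration, not part of the published dataset.

import pandas as pd

# Hypothetical catalog of SDSS quasars with the five band magnitudes and the
# spectroscopic redshift; column names are assumed for illustration only.
catalog = pd.read_csv("sdss_quasars.csv")

# 4C+r input pattern: four colors plus the r-band magnitude (Section 2).
X = pd.DataFrame({
    "u-g": catalog["u"] - catalog["g"],
    "g-r": catalog["g"] - catalog["r"],
    "r-i": catalog["r"] - catalog["i"],
    "i-z": catalog["i"] - catalog["z"],
    "r":   catalog["r"],
})
y = catalog["z_spec"]  # regression target: spectroscopic redshift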
3 METHODOLOGY

Firstly, we study the characteristics of catastrophic failure for quasars and observe that the outlier points produced by KNN cluster into two groups: one group's spectroscopic redshift z_spec is between 0.2 and 1.1 while its photometric redshift z_phot is between 1.2 and 2.1, and the other group's z_spec is between 1.4 and 2.3 while its z_phot is between 0.3 and 1.2 (shown in Figure 1).

Fig. 1  Photometric redshift estimation by KNN. The points in Group 1 and Group 2 are outlier points. µ is the parameter representing the error-tolerant scope. The slanted dotted lines give the corrected estimated range of photometric redshifts; the horizontal dotted lines mark the zones of Group 1 and Group 2.

Some points with z_phot falling into Group 1 actually have z_spec close to the range of Group 2, but they are wrongly estimated by KNN and become mixed into Group 1, and vice versa. The two groups look almost 180-degree rotationally symmetric about the 45-degree diagonal line in the z_phot vs. z_spec diagram. The two outlier clusters show that the KNN method cannot effectively distinguish outlier points from well-estimated points using Euclidean distance in these two regions. We therefore propose a two-stage integration approach that fuses the k-nearest neighbor (KNN) and support vector machine (SVM) methods. In the first stage, we apply the KNN algorithm to photometric data and estimate the corresponding z_phot. In the second stage, we map the photometric multiband input pattern of points falling into the two ranges z_phot ∈ [0.3, 1.2] and z_phot ∈ [1.2, 2.1] from the original attribute space into a high-dimensional feature space with the Gaussian kernel function in SVM. In the high-dimensional feature space, many outlier points can be identified by an SVM classification hyperplane and subsequently corrected. Since most points of catastrophic failure are identified and corrected, the integration approach improves the photometric redshift estimation accuracy.

The KNN algorithm generally uses the Euclidean distance over the attributes (shown in Equation 1) to compute the distance between point m and point n,

d_{m,n} = sqrt( (f_{m,1} - f_{n,1})^2 + (f_{m,2} - f_{n,2})^2 + ... + (f_{m,k} - f_{n,k})^2 )    (1)

where f_{m,j} (f_{n,j}) denotes the jth attribute of the 4C+r input pattern for the mth (nth) point, and k is the total number of attributes. The points in Group 1 and Group 2 show that those outlier quasars cannot be correctly identified in a Euclidean space. In other words, we cannot find a simple plane that serves as a useful separating criterion between points in Group 1 and Group 2; based on the present data, the provided information is not enough to give a good estimation of the outlier points. The question is then whether those outlier points can be made linearly separable in a high-dimensional non-Euclidean feature space. We therefore explore the kernel function in SVM, map the features into a high-dimensional space, and test whether the outlier points in Group 1 and Group 2 can be correctly classified there. From the analysis above, we propose a two-stage integration approach that fuses estimation with KNN and classification with SVM.

3.1 Estimation with K-Nearest-Neighbor

The K-Nearest-Neighbor (KNN) algorithm is a lazy predictor which requires a training set for learning. It first finds the nearest neighbors by comparing distances between a test sample and the training samples that are similar to it in a feature space. Next, it assigns the average value of the nearest neighbors to the test sample as its prediction value. In general, the distance is computed as the Euclidean distance described in Equation 1. In the era of big data, we have been collecting more data than ever before and KNN achieves much more accurate predictions (Zhang et al. 2013). We therefore also use KNN in our research. One disadvantage of KNN is its high computation cost, so we use a KD-tree to implement the KNN algorithm efficiently.
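A minimal sketch of this first-stage estimator, written with scikit-learn's KD-tree-backed nearest-neighbor regressor (an assumption for illustration; the paper does not name its KNN implementation). The variables X_train, y_train and X_validation stand for the 4C+r patterns and spectroscopic redshifts of the random splits described in Section 3.3, and k = 17 follows the value used there.

from sklearn.neighbors import KNeighborsRegressor

# First-stage estimator: average the spectroscopic redshifts of the 17 nearest
# neighbors in the 4C+r attribute space, with a KD-tree index to keep the
# neighbor search efficient (Section 3.1).
knn = KNeighborsRegressor(n_neighbors=17, algorithm="kd_tree")
knn.fit(X_train, y_train)                      # X_train: 4C+r, y_train: z_spec
z_phot_validation = knn.predict(X_validation)  # z_phot estimates for the validation set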
3.2 Classification with Support Vector Machine

The Support Vector Machine (SVM) is an effective classification algorithm based on the structural risk minimization principle proposed by Vapnik (1995). Given a training dataset with n records, each a pattern (x_i, y_i) for i = 1, 2, ..., n, we aim to build a linear classifier of the form given in Equation 2,

f(x) = w · x + b    (2)

Here, w and b are the weight vector and the bias, respectively. Figure 2 illustrates that several lines can separate two categories of points. In SVM, to minimize the classification error risk on other test datasets, we aim to find the line (shown as the dashed line in Figure 2) with the maximized margin separating the two classes of points. This principle gives SVM better classification accuracy than many competing machine learning models in many classification tasks.

Fig. 2  Maximizing the classification margin is the goal of SVM. The points on the dotted lines (w · x + b = ±1) are called support vectors, and the distance between the two dotted lines, 2/||w||, is the margin. When the margin is maximized, the classification accuracy is best.

Sometimes a classification task is hard and not linearly solvable; the left graph in Figure 3 shows one such case. In such a case, following Vapnik-Chervonenkis dimension theory, SVM applies a kernel function to lift the original flat space, equipped with the ordinary inner product, into a richer feature space. By the theory of reproducing kernels, we can map the original Euclidean feature space to a high-dimensional non-Euclidean feature space in the SVM classification algorithm. Thereby, some non-linear problems in the original low-dimensional feature space ℜ^d become linearly solvable in the high-dimensional space ℜ^D. The right graph in Figure 3 shows a dimension mapping by a kernel function that solves the problem.

Fig. 3  Linearly indistinguishable points in a low-dimensional space (ℜ^d) can be separated in a high-dimensional space (ℜ^D) by applying a kernel function (φ) in SVM. f(x) = 0 represents the hyperplane that separates the two classes.

Therefore, Equation 2 can be transformed into the following form by the feature mapping function φ,

f(x) = w · φ(x) + b    (3)

In this way, we have the following objective function and constraints for an SVM classifier: minimize

||w||^2 + C Σ_{i=1}^{n} ε_i

subject to

y_i (<w, φ(x_i)> + b) ≥ 1 - ε_i    (4)

where C is a regularization parameter and ε_i is a slack variable. By the representer theorem, we have

f(x) = Σ_{i=1}^{n} α_i y_i φ(x_i)^T φ(x) + b    (5)

where α_i is a parameter with the constraint α_i ≥ 0. For solving Equation 5, SVM introduces a kernel function defined as

K(x_i, x) = φ(x_i)^T φ(x)    (6)

In this paper, we apply the Gaussian kernel (shown in Equation 7) to achieve the non-linear classification,

K(x_1, x_2) = exp( -||x_1 - x_2||^2 / (2σ^2) )    (7)

where x_1, x_2 represent vectors of multiband attributes or input patterns observed from a single survey and σ is a free parameter.

In this way, we aim to apply an SVM classifier to distinguish the mixture points in Group 1 from the other points around them on the diagonal with z_phot ∈ [1.2, 2.1] in the z_phot vs. z_spec diagram. Similarly, we can distinguish the points in Group 2 from the other points with z_phot ∈ [0.3, 1.2].
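As a sketch of this second-stage classifier, the snippet below trains an RBF-kernel SVM on the validation points whose first-stage estimate falls in the Group 1 range. Using scikit-learn's SVC (a LIBSVM wrapper) is an assumption for illustration, since the paper calls LIBSVM directly; the outlier labels follow the error-tolerant scope µ introduced with Equation 8 in Section 3.3, mu = 0.3 is only the value reported as optimal in Section 4, and X_validation, y_validation and z_phot_validation are the illustrative variables from the previous sketches.

import numpy as np
from sklearn.svm import SVC

mu = 0.3  # error-tolerant scope; 0.3 is the value reported as optimal in Sect. 4

# Validation points whose first-stage KNN estimate falls in the Group 1 range.
mask = (z_phot_validation >= 1.2) & (z_phot_validation <= 2.1)

# Classifier input is the 4C+r pattern plus z_phot; label 1 marks an outlier
# (|z_spec - z_phot| > mu), label 0 a good estimate (cf. Equation 8).
features = np.column_stack([X_validation[mask], z_phot_validation[mask]])
labels = (np.abs(np.asarray(y_validation)[mask] - z_phot_validation[mask]) > mu).astype(int)

svm1 = SVC(kernel="rbf", C=2, gamma=8)  # (C, gamma) values found by the grid search in Sect. 4
svm1.fit(features, labels)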
3.3 Integration of KNN and SVM for Photometric Redshift Estimation

The photometric redshift estimation algorithm integrating KNN and SVM is presented in the following. To obtain a robust accuracy measure for our integration approach, we repeat the experiments for num rounds. In each round, the data sets go through an initialization step, a KNN step, an SVM training step, an SVM test step, a correction step and an evaluation step. In the initialization step, we randomly divide the SDSS data set into separate training, validation and test sets. In the KNN step, we apply the KNN algorithm (k = 17) to estimate z_phot-validation and z_phot-test based on the training set and on the union of the training and validation sets, respectively. In the SVM training step, we build two SVM classifiers, SVM1 and SVM2, to distinguish good estimations from outlier points with z_phot-validation ∈ [1.2, 2.1] and z_phot-validation ∈ [0.3, 1.2], respectively. A good estimation or an outlier point is defined by Equation 8,

|z_spec - z_phot| ≤ µ : good
|z_spec - z_phot| > µ : bad    (8)

where µ is the parameter giving the error-tolerant scope, derived from the validation set. Visually, the good estimation points fall into the area close to the 45-degree diagonal line in the diagram, while the outlier points fall into Group 1 and Group 2 in Figure 1.

Specifically, we use those outlier points with z_phot-validation ∈ [1.2, 2.1] and z_phot-validation ∈ [0.3, 1.2] to construct the datasets Group1_trainingdata and Group2_trainingdata, respectively. In the two datasets, the inputs are the 4C+r pattern and z_phot directly from KNN, and the output is z_spec.

In the SVM test step, we apply the classifiers SVM1 and SVM2 to identify outlier points.

In the correction step, we use the KNN algorithm based on Group1_trainingdata to recompute z_phot for those outlier points flagged by SVM1 in the test data. Since Group1_trainingdata and those outlier points have similar patterns while the output of Group1_trainingdata is z_spec, the KNN algorithm can improve the z_phot estimation. Similarly, we use Group2_trainingdata to correct the outlier points flagged by SVM2 in the test data.

In the evaluation step, we use the percentages in different |∆z| ranges and the root mean square (rms) error of ∆z to test our photometric redshift estimation approach. The definition of ∆z is given in Equation 9,

∆z = (z_spec - z_phot) / (1 + z_spec)    (9)

The detailed steps of the two-stage method are as follows. For clarity, the flowchart of the whole process is shown in Figure 4.

LoopId = 1;
Do while LoopId ≤ num;

Initialization Step:
Randomly select 1/3 of the SDSS quasar sample as the training set, another 1/3 as the validation set, and the remaining 1/3 as the test set.

KNN Step:
1. Based on the training set, apply the KNN (k = 17) algorithm to estimate z_phot-validation for each sample in the validation set;
2. Based on the union of the training set and the validation set, apply the KNN (k = 17) algorithm to estimate z_phot-test for each sample in the test set.

SVM Training Step:
1. For the samples with z_phot-validation ∈ [1.2, 2.1] in the validation set, train a classifier SVM1 with the Gaussian kernel, which distinguishes good estimations from outlier points by Equation 8. With those outlier points, build a dataset Group1_trainingdata composed of 4C+r and z_phot as the input and z_spec as the output;
2. Similarly, for the samples with z_phot-validation ∈ [0.3, 1.2] in the validation set, train a classifier SVM2 with the Gaussian kernel, which distinguishes good estimations from outlier points. With those outlier points, build a dataset Group2_trainingdata composed of 4C+r and z_phot as the input and z_spec as the output.

SVM Test Step:
1. For the samples with z_phot-test ∈ [1.2, 2.1] in the test set, apply the classifier SVM1 to distinguish good estimations from outlier points;
2. Similarly, for the samples with z_phot-test ∈ [0.3, 1.2] in the test set, apply the classifier SVM2 to distinguish good estimations from outlier points.

Correction Step:
1. For the outlier points with z_phot-test ∈ [1.2, 2.1] in the test set, apply the KNN algorithm based on the dataset Group1_trainingdata;
2. For the outlier points with z_phot-test ∈ [0.3, 1.2] in the test set, apply the KNN algorithm based on the dataset Group2_trainingdata.

Evaluation Step:
By comparing z_phot-test and z_spec for all samples in the test set, compute the popular accuracy measures for redshift estimation and the rms error of ∆z.

LoopId = LoopId + 1;
End do.

Output the mean and standard error of the percentages in different |∆z| ranges and of the rms error of ∆z to evaluate the accuracy of our proposed integrated approach KNN+SVM.
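The evaluation step can be made concrete with a short helper implementing Equation 9; this is a minimal sketch with illustrative names, computing the fractions within the |∆z| thresholds reported in Section 4 and the rms error of ∆z.

import numpy as np

def evaluate(z_spec, z_phot):
    # Delta-z as defined in Equation 9.
    dz = (z_spec - z_phot) / (1.0 + z_spec)
    # Fractions of test samples within the |Delta-z| thresholds used in Sect. 4.
    fractions = {t: float(np.mean(np.abs(dz) < t)) for t in (0.1, 0.2, 0.3)}
    # Root mean square error of Delta-z.
    rms = float(np.sqrt(np.mean(dz ** 2)))
    return fractions, rms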
4 EXPERIMENTAL RESULTS

In the experiments, we adopt the input pattern 4C+r as attributes, which is widely accepted in recent research on photometric redshift estimation. In our algorithm, we set num = 10 and thus repeat the experiments 10 times.

For classification, we apply the widely used tool LIBSVM (Chang, 2011). Using the Gaussian kernel function, we train the classifiers SVM1 and SVM2 for the samples with z_phot-validation ∈ [1.2, 2.1] and z_phot-validation ∈ [0.3, 1.2] in the validation set, separately. To optimize the estimation accuracy, we adjust two parameters controlling the Gaussian kernel in SVM: a cost coefficient C that accounts for the data imbalance and a factor γ that sets the shape of the high-dimensional feature space. The other parameters are set to the default values in LIBSVM. To obtain the best model parameters, a grid search is adopted; the grid search for SVM1 and SVM2 is shown in Figure 5. For SVM1, the optimal model parameters are C = 2 and γ = 8, with a classification accuracy of 94.12%. For SVM2, the best model parameters are C = 128 and γ = 0.5, with a classification accuracy of 90.04%.

With the optimized parameters and the union of the training set and the validation set as a new training set, we compare the estimation accuracy of the original KNN (k = 17) algorithm and our integration approach KNN+SVM. The parameter µ determines whether a point is a good estimation or not, and we vary µ to check its influence on the estimation accuracy. The results are listed in Table 1. For KNN, the proportions of |∆z| < 0.1, 0.2, 0.3 and the rms error of the predicted photometric redshifts are 71.96%, 83.78%, 89.73% and 0.204, respectively; for KNN+SVM, these optimal measures are 83.47%, 89.83%, 90.90% and 0.192, respectively, when µ = 0.3 (marked in bold in Table 1). Obviously, these criteria for photometric redshift estimation are all significantly improved with the new method. This suggests that the integration approach can effectively correct the outlier points with z_phot ∈ [1.2, 2.1] and z_phot ∈ [0.3, 1.2], and can thereby significantly mitigate catastrophic failure and improve the estimation accuracy of photometric redshifts.

The experimental results also show that, without cross-matching multiband observations from multiple surveys, we can effectively apply the Gaussian kernel function in SVM to identify the outlier points in Group 1 and Group 2 caused by catastrophic failure, by mapping attributes from a single data source into a high-dimensional feature space. The identification helps us correct those outlier points and thereby improve the estimation accuracy.

Fig. 4  The flowchart of photometric redshift estimation by the integration of KNN and SVM.

To compare the performance of photometric redshift estimation by the KNN algorithm with that by the KNN+SVM approach, the photometric redshift estimations with the two methods are shown in Figure 6 and Figure 7, respectively. As indicated by Figures 6-7, we can see clearly that the outlier points in both Group 1 and Group 2 are significantly reduced by adopting the new method KNN+SVM, which intuitively demonstrates that the proposed approach is effective.
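The grid search described above could be sketched as follows. This uses scikit-learn's GridSearchCV for brevity rather than the LIBSVM grid tool, and the candidate values for C and gamma are illustrative assumptions, not the exact grid used in the paper (which reports optima of (C, γ) = (2, 8) for SVM1 and (128, 0.5) for SVM2); features and labels are the illustrative arrays from the SVM sketch in Section 3.2.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Illustrative grid over powers of two for the RBF-kernel SVM parameters.
param_grid = {
    "C":     [2.0 ** k for k in range(-3, 11)],   # assumed candidate range
    "gamma": [2.0 ** k for k in range(-7, 5)],    # assumed candidate range
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      scoring="accuracy", cv=5, n_jobs=-1)
search.fit(features, labels)
print(search.best_params_, search.best_score_)    # best (C, gamma) and its accuracy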
