Chemometrics: Classification of spectra
Vladimir Bochko, Jarmo Alander
University of Vaasa
November 1, 2010

Vladimir Bochko, Chemometrics: Classification, 1/36

Contents
• Terminology
• Introduction
• Big picture
• Support Vector Machine
  • Introduction
  • Linear SVM classifier
  • Nonlinear SVM classifier
• KNN classifier
• Cross-validation
• Performance evaluation

Terminology
The task of pattern recognition is to classify objects into a number of classes.
• Objects are called patterns or samples.
• Measurements of particular object parameters are called features, components, or variables.
• The classifier computes the decision curve, which divides the feature space into regions corresponding to classes.
• A class is a group of objects characterized by similar features.
• The decision may not be correct. In this case a misclassification occurs.
• The patterns used to design the classifier are called training (or calibration) patterns.
• The patterns used to test the classifier are called test patterns.

Introduction
• If training data is available, we speak of supervised pattern recognition.
• If training data is not available, we speak of unsupervised pattern recognition, or clustering.
• We consider only supervised pattern recognition. In this case the training set consists of data X and class labels Y.
• When we test the classifier using the test data X, the classifier predicts the class labels Y.
• Thus, classification requires training (calibration) and testing. The SOLO GUI [3] has buttons for calibration and for test/validation.

Abbreviations
• KNN - K-nearest neighbor classifier. The KNN classifier requires labels.
• SVM - Support Vector Machines. The SVM classifier requires labels.
• PLS - Partial Least Squares. A mapping (compression), regression, and classification technique. PLS requires labels.
• PCA - Principal Component Analysis. A mapping (compression) technique. Labels are not needed.
• DA - Discriminant Analysis, e.g. PLSDA, SVMDA. DA means that classification is used.
• MSC - Multiplicative Scatter Correction. A preprocessing technique.
• SNV - Standard Normal Variate transformation. A preprocessing technique.

Big picture
[Figure: block diagram of the classification system. Block labels: Measured spectra; Preprocessing (MSC, SNV, smoothing, derivatives; PCA, PLS); Classifier (KNN, SVM, PLS); Classifier design, e.g. SVMDA, with inputs Training X and Training Y; Model (training/validation); Classification/prediction, with input Test X and output Predicted Y; System evaluation.]

Example
We have green, yellow, orange, and red tomatoes. From the salesman's viewpoint, the orange and red tomatoes are suitable for sale. Therefore the tomatoes are divided into two classes: green/yellow and orange/red.

Measurement
• A typical measurement system is shown in the figure.
• Important! Write a MEMO during measurement. The MEMO includes the name of the file, a physical or chemical parameter of the object (e.g. the fat content of cheese), and the class labels.
[Figure: measurement system consisting of a computer, a light source, a spectrometer, a light probe, a sensor probe, and the measured object. The MEMO is shown as a table with the columns N, File name, Parameters, and Class labels.]

Spectral data
Spectra measured by a spectrometer are usually arranged as follows:
• The first row contains the wavelengths.
• The first column contains the sample numbers.
• The measured spectrum values in the table cells correspond to the wavelengths (in nanometers) and the spectrum numbers.
• Some spectrum values corrupted by noise are negative. The beginning and the end of the spectra contain mostly noise.

Spectral data
• The spectral values are obtained at intervals of about 0.27 nm in the range 195-1118 nm. The number of measurement points is 3648, which is unnecessarily high.
• The example shows how data may be arranged in the data file after measurement with a spectrometer: (a) data including wavelengths and sample numbers; (b) data without wavelengths and sample numbers. In case (b) the vector of wavelengths should be kept in a separate file.
[Figure: panel (a) shows a table whose first row holds the wavelength points 1, 2, ..., 3648 and whose first column holds the sample numbers 1, 2, ..., with Spectrum 1, Spectrum 2, ... in the corresponding rows. Panel (b) shows a matrix whose entries are the spectrum values (data X), together with the labels Y.]
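The layout in panel (a) can be split into a wavelength vector, sample numbers, and the data matrix X with a few lines of NumPy. The following is only a minimal sketch on synthetic data, not part of the original slides: the function names `split_spectral_table` and `trim_and_bin`, the 400-1000 nm window, the bin size of 10, the evenly spaced wavelength grid, and the label vector `y` are all assumptions chosen for illustration. Trimming the ends and averaging adjacent points is one possible way to discard the noisy ends and reduce the 3648 measurement points; the slides do not prescribe a specific method.

```python
import numpy as np

def split_spectral_table(table):
    """Split a table laid out as in panel (a): first row = wavelengths,
    first column = sample numbers, remaining cells = spectrum values."""
    table = np.asarray(table, dtype=float)
    wavelengths = table[0, 1:]                 # first row, skipping the corner cell
    sample_numbers = table[1:, 0].astype(int)  # first column, skipping the corner cell
    X = table[1:, 1:]                          # one spectrum per row
    return wavelengths, sample_numbers, X

def trim_and_bin(wavelengths, X, lo_nm, hi_nm, bin_size):
    """Keep wavelengths in [lo_nm, hi_nm], then average every
    `bin_size` adjacent points to reduce the number of variables."""
    keep = (wavelengths >= lo_nm) & (wavelengths <= hi_nm)
    w, Xk = wavelengths[keep], X[:, keep]
    n_bins = w.size // bin_size                # leftover points at the end are dropped
    used = n_bins * bin_size
    w_binned = w[:used].reshape(n_bins, bin_size).mean(axis=1)
    X_binned = Xk[:, :used].reshape(X.shape[0], n_bins, bin_size).mean(axis=2)
    return w_binned, X_binned

# Synthetic stand-in for a measured file (random values, NOT real spectra).
rng = np.random.default_rng(0)
wl = np.linspace(195.0, 1118.0, 3648)          # assumed evenly spaced grid
spectra = rng.normal(size=(4, wl.size))
table = np.zeros((5, wl.size + 1))
table[0, 1:] = wl
table[1:, 0] = np.arange(1, 5)
table[1:, 1:] = spectra

wavelengths, sample_numbers, X = split_spectral_table(table)
y = np.array([0, 0, 1, 1])                     # hypothetical class labels Y, one per spectrum

# Assumed window and bin size; choose them from the actual noise level.
w_small, X_small = trim_and_bin(wavelengths, X, 400.0, 1000.0, 10)
print(X.shape, X_small.shape)
```

In case (b), where the file holds only the matrix, `X` would be read directly and `wavelengths` loaded from its separate file; the trimming and binning step stays the same.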
Description: