ebook img

Online Learning with Regularized Kernel for One-class Classification PDF

0.83 MB·
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Online Learning with Regularized Kernel for One-class Classification

PAPERSUBMITTEDTOIEEETRANS.ONSMC 1 Online Learning with Regularized Kernel for One-Class Classification Chandan Gautam*, Aruna Tiwari, Sundaram Suresh, Senior Member, IEEE, and Kapil Ahuja Abstract—This paper presents an online learning with regu- neural network. Two types of SVM based one-class classifier larized kernel based one-class extreme learning machine (ELM) have been proposed, namely, One-Class SVM (OCSVM) [7] 7 classifierandisreferredas“onlineRK-OC-ELM”.Thebaseline and Support Vector Domain Description (SVDD) [8]. After 1 kernelhyperplanemodelconsiderswholedatainasinglechunk surveyingvariouspapers,ithasbeenobservedthatSVMbased 0 with regularized ELM approach for offline learning in case of one-classclassifierhasbeenexploredmorecomparedtoother 2 one-class classification (OCC). Further, the basic hyper plane model is adapted in an online fashion from stream of training oneclassmethods[2].SVMbasedone-classclassifierhasalso n samplesinthispaper.Twoframeworksviz.,boundaryandrecon- been developed for incremental and online learning [9]. For a struction are presented to detect the target class in online RK- various one-class classifiers like SVDD, autoencoder etc., a J OC-ELM. Boundary framework based one-class classifier con- toolbox is available in [10]. Based on different nature of data 7 sists of single node output architecture and classifier endeavors usedduringlearninghyperplaneofOCC,itcanbeclassifiedas 1 to approximate all data to any real number. However, one-class classifier based on reconstruction framework is an autoencoder [11] (i) with positive examples only, (ii) with the positive and ] architecture, where output nodes are identical to input nodes smallamountofnegativeexamplesonly,and(iii)withpositive G and classifier endeavor to reconstruct input layer at the output and unlabeled data. Online learning has attracted researchers layer. Both these frameworks employ regularized kernel ELM L in recent years due to its capability to handle high volume of based online learning and consistency based model selection . streamingdata.Thisisbecauseitiscomputationallyexpensive s has been employed to select learning algorithm parameters. c The performance of online RK-OC-ELM has been evaluated on to handle as well as costly to store large volumes of data. [ standard benchmark datasets as well as on artificial datasets Neural network with backpropagation has been explored and the results are compared with existing state-of-the art one- in past for OCC task [12] [13]. The learning algorithm in 1 classclassifiers.Theresultsindicatethattheonlinelearningone- v these one-class classifiers uses iterative and computationally class classifier is slightly better or same as batch learning based 8 intensive gradient descent based approach. This issue has approaches. As, base classifier used for the proposed classifiers 0 are based on the ELM, hence, proposed classifiers would also been addressed by single layer feed-forward network in past 5 inheritthebenefitofthebaseclassifieri.e.itwillperformfaster decades.Singlelayerfeed-forwardnetworkgotquiteattention 4 computationcomparedtotraditionalautoencoderbasedone-class byresearchersformulti-classclassificationandregressiondue 0 classifier. to its fast training capability as it does not adjust the weights . 1 Index Terms—One-Class Classification, Kernel, Regulariza- by back propagation. Huang et al. [14] [15] also developed a 0 tion, Online Sequential, Extreme Learning Machine (ELM), singlelayerfeed-forwardnetwork,calledasExtremeLearning 7 Outliers Detection. Machine (ELM). Onlineversion of ELM hasalso been devel- 1 : oped for multi-class classification and regression tasks with v I. INTRODUCTION random[16]andkernelfeaturemapping[17][18][19].Inpast i X decades, ELM has been well expanded in both dimensions: THEterm‘One-ClassClassification’(OCC)wascoinedby r Moya et al. [1]. The OCC has been generally employed theory [20] [21] [22] [23] and application [24] [18] [25] [26]. a AdetailedsurveyonELManditsapplicationcanbefoundin to solve the problem of novelty, outlier, intrusion or fault [27]. Recently, Leng et al. [28] developed ELM for OCC due detection [2]. These problems are also solved by multi-class toitsfastlearningspeed,whichsupportsonlyofflinelearning. classificationwhensamplesofbothclasses,normalandoutlier Leng et al. [28] tested their model with only one type of class, are available [3] [4] [5] [6]. However, the OCC has threshold deciding criteria, i.e. rejection of few percentages been applied when either data of only one class is available of most deviant training samples after completion of training. or the data belonging to other classes is very rare. One- Later,Iosifidis[29]improvisedELMbasedone-classclassifier class classifier simply describes the data, therefore it is also based on geometric information of class for offline learning. called as a data descriptor. Most of the research in OCC Most recently, Gautam et al. [30]1 [31] explored further ELM has been based upon Support Vector Machine (SVM) and basedone-classclassifierformainlyofflinelearningandtested *CorrespondingAuthor it with three different threshold criteria. Gautam et al. [30] ChandanGautamiswiththeIndianInstituteofTechnologyIndore,(e-mail: have shown particularly two aspects: (i) random feature map- [email protected]). ping based offline methods have been outperformed by kernel Aruna Tiwari is with the Indian Institute of Technology Indore, (e-mail: [email protected]). feature mapping based offline methods (ii) the possibility of Sundaram Suresh is with the Nanyang Technological University, 639798 Singapore,(e-mail:[email protected]). 1youcandownloadthesubmittedversionofacceptedpaperformthelink: Kapil Ahuja is with the Indian Institute of Technology Indore, (e-mail: http://goo.gl/XhkVpE . We are providing this url because paper is accepted [email protected]). butnotpublishedyet. PAPERSUBMITTEDTOIEEETRANS.ONSMC 2 development of online learning with ELM based one-class the tth sample and c is the class label of the target class, t classifier.However,developedonlineone-classclassifierspro- which is same for all the training data. Input layer takes duce very inferior performance compared to offline one-class data for tth input sample is coded as (xt,Rt) because model classifiers. Hence, there is a need to develop an online one- has to approximate all data to any real number R. Target class classifier with fast learning approach, which is required output vector R is represented as [R1,R2....Rt,...], however, to provide performance similar to batch learning algorithm value of R will be same for all samples. Here, value of R t t andhandlestreamofdata.Therefore,regularizedkernelbased is considered as 1 for all the experiments. Further, kernel online-sequential one-class classifiers have been proposed in featuremappinghasbeenemployedbetweeninputandhidden this paper, which can perform similar to offline one-class layer φ. Different symbol has been used for kernel matrix classifiers. Proposed one-class classifiers are developed for to avoid confusion between kernel φ and random feature two types of framework: (i) boundary and (ii) reconstruction. mapping H. During training, hidden layer output or kernel Boundaryframeworkbasedproposedmethodisdevelopedasa matrix φ will be a square symmetric matrix of size [t t]. × fast online single output node architecture with kernel feature Output weight β for any t samples in sequence of data is mapping and reconstruction based approach is developed as a represented as [β ,β ....β ,β ]. R=[R ,R ...R ,...] 11 21 (t 1)1 tt 1 2 t − fastonlineautoencoderwithkernelfeaturemapping.Boundary is the predicted output vector and R is the predicted output t framework based proposed method is explored with one type for tth sample. c = [c ,c ...c ,...] isbthe pbredbictedbclass 1 2 t of threshold and reconstruction framework based proposed vector, where, c is the predicted clabss for tth sample. Online t methodisexploredwithtwotypeofthresholdcriteria.Outof learning algorithmb of Obnlibne bRK-OC-ELM for boundary twothresholdcriteria,onerejectsfewsamplesfromtargetdata framework basebd approach is as discussed below. Result of to decide threshold point, however, another threshold criteria breast cancer and ecoli dataset is provided jointly in the doesn’t reject any sample from target data to decide threshold table as both have well separated classes. Similarly, result point. Data description capability of the proposed methods of diabetes and liver datasets have provided jointly because is tested on synthetic or artificial datasets and compared the both have overlapped classes, and, sonar and spectf datasets outcomes with random feature mapping based online one- arealsoprovidedjointlyastheyarehighdimensionaldatasets. class classifier. Results on six benchmark data exhibit that performance of the proposed online one-class classifiers is For initial chunk of N dataset X ,R , objective is 0 0 0 either similar or better compared to traditional offline autoen- to minimize the output weight β {as well a}s error E be- 0 0 coderandincremetalSVDD.Amongtwotypesofframework, tween expected (R ) and predicted value (φ(X )β ). For 0 0 0 reconstruction framework based proposed methods perform regularization, λ is used as a regularization parameter with better than boundary framework based proposed methods. minimization problem. Hence, minimization problem can be The rest of the paper is organized as follows: Section II written as follows: discussed about the proposed work. Performance evaluation has been discussed in Section III. Section IV contains the 1 1 conclusion of the work. Minimize:L= β 2+λ E 2 0 0 2k k 2k k (1) Subject to: φ(X )β =R E II. ONLINELEARNINGWITHREGULARIZEDKERNELFOR 0 0 0− 0 OCC After solving above minimization problem using Karush- Inthissection,ELMbasedonlinesequentialone-classclas- Kuhn-Tucker (KKT) theorem [33], it yields the following sifierismodeledbytakingbothfactorsviz.,regularizationand solution: kernelization, into account. This is called as online learning with regularized kernel for one-class ELM (Online RK-OC- ELM) in this paper. Regularization factor helps classifier β =P R 0 0 0 to achieve a better generalization capability for noisy data. Tikhonov regularization [32] has been employed due to its P0 =φ−01 (2) capability to handle ill-posed and singular problem, which φ0 =φ(X0) generally emerges in solving the inverse problem. Following subsectionsdiscussesfortwotypesofframeworkviz.,bound- Here, φ is a kernel matrix for initial N of size N 0 0 0 × ary and reconstruction: N .Kernelmatrixφ isdefinedbasedonMercer’scondition. 0 0 Hence, any kernel method which satisfies Mercer’s condition A. Online RK-OC-ELM: Boundary Framework Based Ap- can be adopted as the kernel for the proposed classifier. For proach initial N0 sample, kernel matrix will be defined as: In Boundary framework based Online RK-OC-ELM, model is trained by only target data X and endeavored to Ω Ω ... Ω approximate all data to any real number. Fig. 1 shows online 11 12 1N0 OstrCeCamwoitfhtrsainingilnegoduatptautXno,de(xa1rc,hc1it)e,c(txu2re,.c2I)n,.F..i,g(.x1t,,cgti)v,e.n..a, φ=φ0 = Ω...21 Ω...22 ...... Ω2...N0 + λI (3) where xt = [x1t,x2t,...,xnt] ∈{ <n is n-dimensional input o}f ΩN01 ΩN02 .... ΩN0N0      PAPERSUBMITTEDTOIEEETRANS.ONSMC 3 (x , c) t t x1 K(x, x) t t 1 x2 (xt , Rt) t K(xt, x2) (cid:2869)(cid:2869) Debcaisseiodn o Fnu snocmtieo n (cid:2870)(cid:2869)  Threshold Deciding Kernel Feature  ̂ ̂ Criteria ̂ Mapping (cid:4666)−(cid:2869)(cid:4667)(cid:2869) K(x, x ) t t-1 xn t K(x, x) (cid:2869) t t Fig. 1: Schematic Diagram of Online Sequent ial One Class Extreme Learning Machine: Boundary Based In Now, value of P will be inverse of φ in Equation (5) as 1follows: Ω11 Ω12 ... Ω1N0 − P =P0 =φ−01 = Ω...21 Ω...22 ...... Ω2...N0 + λI P =φ−1 = (φφu,uv)T φφuv,v −1 (8)    (cid:20) (cid:21) Ω Ω .... Ω    N01 N02 N0N0   Further, compute the inverse in Equation (8) using block   (4) matrix inverse formula [34]. Initially, φ and P will be equal to φ and P respectively. 0 0 Here, φ represents kernel matrix and P represents inverse 1 of this kernel matrix for all the arrived samples till now for S =D−1 = DD11 DD12 − = SS11 SS12 (9) training. Here, φ and P will be updated continuously for any (cid:20) 21 22(cid:21) (cid:20) 21 22(cid:21) upcoming new samples Xv as per Equation (5), where, Xv= where, c{h(xuv1n,kc1o)f,(thxev2,dca2t)a,...C.,u(rxrvse,nctsv)}aluanedoXf φv ⊂is rXep.reXsevnitsedjuasst nφeuxt. S11 =(D11−D12D2−21D21)−1 NHoerwe,,cualacnudlavtesφimapfltyerdeanrroivtealooldfannedwnseawmpvlaeluaessfroelslpoewcst:ively. S12 =−D1−11D12(D22−D21D1−11D12)−1 S21 =−D2−21D21(D11−D12D2−21D21)−1 φ= φu φu,v (5) S22 =(D22−D21D1−11D12)−1 (φ )T φ u,v v (cid:20) (cid:21) Hence, Equation (8) will be rewritten as follows: Here, φ is combination of four block matrices. φ is the u ocaldlcvualaluteedoafsφp.eBrlEocqkuamtioatnrix(6φ),uw,vhiacnhdisφvdisincuEsqseudatbioenlo(w5.)are P =φ−1 = PP1211 PP1222 (10) (cid:20) (cid:21) Let the number of samples processed till now be b and P , P , P and P in Equation (10) can be written as number of samples in the current chunk be s. b is initially 11 12 21 22 follows: equaltoN .Updatebandseachtimewhencalculationstarts 0 fdoerfinneedwassafmolplloews.s:The block matrices φv and φu,v can be P11 =(φu−φu,vφ−v1φTu,v)−1 P12 =−φ−u1φu,vP22 (11) K(xv,xv) ... K(xv,xv) P21 =−φ−v1φTu,vP11 1 1 1 s φv = ... ... ... + λI (6) P22 =(φv−φTu,vφ−u1φu,v)−1 K(xvs,xv1) .... K(xvs,xvs)   P11 canbeexpandedbyemployingWoodburyformula[34]    as: K(xu1,xv1) ... K(xu1,xvs) P11 =(φu−φu,vφ−v1φTu,v)−1 φu,v = ... ... ...  (7) =φ−u1−φ−u1φu,v(φTu,vφ−u1φu,v (12) K(xub,xv1) .... K(xub,xvs)  +φ−v1)−1φTu,vφ−u1   PAPERSUBMITTEDTOIEEETRANS.ONSMC 4 In a similar fashion, Equation (11) can be explored for Algorithm1OnlineRK-OC-ELM:BoundaryBasedApproach P12, P21 and P22. After processing the training dataset X, Input: Training set X= (x1,c1),(x2,c2),...,(xN0,cN0),..., output function can be written for any set of k samples (xt,ct),... Xk = x1,x2,...,xk as follows: Output: Whether Target or Outlier corresponding to each { } sample T K(X,x1) 1: Pass initial set of samples X0,R0 to the classifier as: f(Xk)= ..  PR (13) {//(xF1o,rRfi1)r,st(xc2h,uRn2k),o..f.,(Nx0N0s,aRmNp0l)e}s, following steps are  .  required   K(X,xk) 2: Employ kernel feature mapping: φ0 =φ(X0). where, K(X,x ) denotes kernel vector for ith sample x . 3: Output Weight β0 for (X0,R0): i i β P R Further, perform the following steps to decide whether any 0 0 0 sample is outlier or not: P0←←φ−01 b N (i) Calculate distance (d) between the predicted value of the ← 0 tth training sample and R as follows: //Forsecondchunkonwards,followingstepsarerequired φ φ 0 d(xt)=|f(xt)−Rt|= Rˆt−Rt (14) P ←←P0 (cid:12) (cid:12) 4: for i=1 to last chunk or block of data in X do (ii) Aftercalculatingdistancesdaspera(cid:12)boveEqu(cid:12)ation(14), (cid:12) (cid:12) 5: Size of chunk at the current stage = s sort the differences in decreasing order. Further, reject few percent of training samples based on the deviation. 6: Calculate φv for ith block by Equation (6) More deviated sample will be rejected first because they 7: Calculate φu,v for ith block by Equation (7) 8: Update the final kernel matrix φ by using Equation (5) aremostprobablyfarfromdistributionofthetargetdata. 9: Inverse of final kernel matrix φ, i.e. calculate P by The threshold will be decided based on those deviations using Equation (8) as follows: 10: Solve Equation (8) by Equation (9)-(12) 11: b=b+s θ =d( η N ) (15) 1 b ∗ c 12: end for Where 0<η 1, N is the number of training samples 13: Output Weight β =PR ≤ and η is the fraction of rejection of training samples for 14: Compute the predicted value by using output function deciding threshold value. We have considered 10% of f(Xk), which is defined in Equation (13) rejection, i.e. η = 0.1. Now, generate a decision function 15: Calculatedistances(d)betweenpredictedvalueoftraining todecidewhetheranynewsamplezbelongstothetarget sample and R as per Equation (14) or outlier, where z = [z1,z2,...,zn] as per following 16: Sort the distances in decreasing order equation: 17: Compute θ1 by Equation (15) 18: Use Equation (16) to decide whether a new sample z belongs to target or not 1, z is classified as target Sign(θ1−d(z))= 1, z is classified as outlier (cid:26)− (16) Theaboveproposedapproachissummarizedaspseudocode predicted output vector for the tth sample. c=[c ,c ...c ,...] 1 2 t in Algorithm 1. is the predicted class vector, where, c is the predicted class t for the tth sample. Formulation of OnlinebRK-ObC-bELMb for B. Online RK-OC-ELM: Reconstruction Based Approach reconstructionframeworkbasedapprobachisasdiscussednext: In reconstruction framework based Online RK-OC-ELM, model is trained by only target data X and endeavored to For initial chunk of N0 dataset X0,X0 , objective is { } to minimize the output weight β as well as error E approximate all data to itself. Fig. 2 shows the architecture 0 0 between expected (X ) and predicted values (φ(X )β ). for reconstruction framework based Online RK-OC-ELM. In 0 0 0 Fig. 2, given a stream of training data X is defined same as Minimization problem can be written as follows: discussed above in the boundary framework based one-class classifier. Input layer takes data for tth input sample as 1 1 (xt,xt) because the target is identical to the input layer. Minimize:L= 2kβ0k2+λ2kE0k2 (17) Further, kernel feature mapping is employed between input Subject to: φ(X )β =X E , and hidden layer φ. During training, hidden layer output 0 0 0− 0 or kernel matrix φ will be a square symmetric matrix of size [t t]. Output weight β till tth samples is represented After solving the above minimization problem using × as [β ,β ....β ,β ]. X = (x ,x ...x ,...) is the Karush-Kuhn-Tucker (KKT) theorem [33], it yields following 1 2 (t 1) t 1 2 t predicted output −vectors, where x = [x1,x2,...,xn] is the solution: t t t t c b b b b b b b PAPERSUBMITTEDTOIEEETRANS.ONSMC 5 𝛽 (x , c) 11 t t xt1 K(xt, x1)  𝑥̂𝑡1 (x t , xt) xt2 K ernel Feature K (xt, x2)   𝑥̂𝑡2 𝒙̂ 𝒕 Debcaisseido no nF usnocmtieo n 𝑐𝑡̂ Threshold Deciding Mapping Criteria K(x, x ) t t-1 xtn K(x, x) 𝛽(𝑡−1)𝑛  𝑥̂𝑡𝑛 t t 𝛽 𝑡𝑛  Fig. 2: Schematic Diagram of Online Sequen tial One Class Extreme Learning Machine: Reconstruction Based typeofKernels.Hence,consistencybasedmodelselectionhas β0 =P0X0 been applied for both the frameworks based OCC. There are P0 =φ−01 (18) two parameters for kernel feature mapping with a Gaussian kernel i.e. regularization parameter (λ) and kernel parameter φ =φ(X ) 0 0 (σ). Consistency based model selection [37] also employs Here, φ0 is a kernel matrix for initial N0 of size N0 N0. K-fold cross validation within it (see Equation (22) below). × Initially,φandP willbeequaltoφ0 andP0,respectively. 5-fold cross validation is used with all one-class classifiers. Here, φ represents kernel matrix and P represents inverse This model selection works mainly based on 2-sigma bound of this kernel matrix for all the arrived samples till now for of classification model. It determines a threshold error (E ) thr training. Both of these will be updated continuously for any by using Equation (22), and executes the proposed classifier upcoming new samples Xv as per Equation (5)-(12). After withdifferentparametersoftheclassifiertillityieldslesserror processing the training dataset X, output function for any set than E . thr of k samples X = x ,x ....x can be written as follows: k 1 2 k { } E =(M η+σ (η (1 η) M))/M thr thr ∗ ∗ ∗ − ∗ (22) T K(X,xk) =η+σthr∗ (η∗p(1−η)/M) . f(X )= .  PX (19) where, M is equal to p(N/f), f is number of folds, σthr is k threshold require for determining decision boundary during  .    model selection, M denotes number of samples in the K(X,x1) validation set, η denotes the fraction of the samples rejected   where, K(X,x ) denotes kernel vector for ith sample x . from the dataset which lies between 0 and 1, (M η) is the i i ∗ Afterwards, calculate error and decide whether any sample expected number of rejected samples, η (1 η) M is the ∗ − ∗ belongs to target or outlier as per pseudo code discussed in variance and after square root of this is standard deviation Algorithm 2. Algorithm 2 introduces one extra threshold and (M η +σthr (η (1 η) M)) is the maximum ∗ ∗ ∗ − ∗ decisioncriterion(θ )comparedtoboundaryframeworkbased allowed number of rejected target objects. 2 p approach. Since, θ is based on only reconstruction error 2 of input data, therefore, it is not applicable for boundary The above discussed model selection selects the optimal framework based approach. value of the parameter. If the classifier has more than one parameters then this will search for the optimal value of every parameter on all possible combinations of range of the C. ParameterSelectionforELMbasedOne-ClassClassifier value of the parameters. Since, proposed classifiers have two Parameter selection is always a crucial task in the case parameters σ and λ and the value of these parameters are of one-class classification. Various model selection criteria decided by this model selection criteria, which performs a have been proposed in literature for handling parameters of linear search on the range of all possible combinations of the kernel, however, most of them are developed especially σ and λ. When the optimal consistent boundary is obtained for Gaussian kernel only [35] [36]. Among various methods, thenfurthersearchwillbestopped.Here,amongregularization consistencybasedmodelselection[37]isonlysuitableforany parameter (λ) and kernel parameter (σ), σ is given a higher PAPERSUBMITTEDTOIEEETRANS.ONSMC 6 Algorithm 2 Online RK-OC-ELM:Reconstruction Based Approach Input: Training set X= {(x1,c1),(x2,c2),...,(xN0,cN0),...,(xt,ct),...} Output: Whether Target or Outlier corresponding to each sample 1: Pass initial set of samples X0 to the classifier as: {(x1,x1),(x2,x2),...,(xN0,xN0)} // For first chunk of N samples, following steps are required 0 2: Employ kernel feature mapping: φ0 =φ(X0). 3: Output Weight β0 for (X0,X0): β P X 0 0 0 P0←←φ−01 b N 0 ← // For second chunk onwards, following steps are required φ φ 0 ← P P 0 ← 4: for i=1 to last chunk or block of data in X do 5: Size of chunk at the current stage = s 6: Calculate φv for ith block by Equation (6) 7: Calculate φu,v for ith block by Equation (7) 8: Update the final kernel matrix φ by using Equation (5) 9: Inverse of final kernel matrix φ, i.e. calculate P by using Equation (8) 10: Solve Equation (8) by Equation (9)-(12) 11: b=b+s 12: end for 13: Output Weight β =PX 14: Compute the predicted value by using output function f(Xk), which is defined in Equation (19) 15: if Threshold deciding Criteria is θ1 then 16: Calculate sum of square error (d) between predicted value (xˆt) of training sample and actual value (xt) of tth sample as follows: n d(xt)= (xjt −ˆxjt)2 (20) j=1 X 17: Sort the distances in decreasing order 18: Compute θ1 by Equation (15) 19: Use Equation (16) to decide whether a new sample z belongs to target or not 20: else if Threshold deciding Criteria is θ2 then 21: Calculate relative error (d) between predicted (xˆjt) and actual value (xjt) for jth attribute of tth sample as follows: xj xˆj d(xj)=abs t − t 100 (21) t xjt +xˆjt!∗ 22: For any tth sample with n attributes, Number of properly reconstructed attributes (prop reconst)=0 23: for j=1 to n do 24: if d(xjt)≤50% then 25: prop reconst prop reconst+1 ← 26: end if 27: if prop reconst 0.9 n then ≥b ∗ c 28: tth sample is target 29: else 30: tth sample is Outlier 31: end if 32: end for 33: end if PAPERSUBMITTEDTOIEEETRANS.ONSMC 7 artificial dataset with and without kernel feature mapping. Incremental svdd 4 otaurtglieetr Without kernel feature mapping indicates that random feature mapping is employed instead of kernel feature mapping. Just 2 for sake of notation we have given name to the one-class Feature 2-20 colnaes-scilfiaesrswEiLthMout(OkenrlnienlefeRa-tuOrCe-mEaLpMpin).gOasnolinnleineRr-eOgCul-aEriLzMed -4 is discussed for the both frameworks in the supplementary material provided with this paper. Further, impact of thresh- -6 old on boundary creation for artificial dataset is discussed. -8 -10 -8 -6 -4 -2 0 2 4 6 Classifier name with employed threshold criteria is denoted Feature 1 as classifier threshold in this paper. For e.g., online RK- Fig. 3: Class boundaries are estimated at different values of OC-ELM θ1andonlineRK-OC-ELM θ2denotethatonline σ in Incremental SVDD. Dashed line boundary at σ =9 and RK-OC-ELM employed θ1 and θ2 respectively as threshold complete line boundary at σ =4 for Banana dataset. deciding criteria. Same naming convention for classifiers’ name have been used during whole discussion now onwards TABLE I: Dataset Description in Section III and Section IV. Hence, notation of existing ELM based classifier is also changed from their paper based Dataset Nameof #Attributes #Targets #Outliers on which threshold criteria is employed as follows: Name TargetClass Breast 9 241 458 Benign (a) ELM based offline one-class classifiers with kernel fea- Cancer Diabetes 8 500 268 Present ture mapping: OCKELM θ1 is boundary framework Ecoli 7 52 284 Periplasm based approach. AAKELM θ and AAKELM θ are 1 2 Liver 6 145 200 Healthy reconstruction framework based approach Sonar 60 111 97 Mines (b) ELM based online one-class classifiers with random fea- Spectf 44 95 254 0 ture mapping: OS-OCELM θ is is boundary framework 1 basedapproach.OS-AAELM θ andOS-AAELM θ are 1 2 priority compared to λ. Suppose the consistent boundary is reconstruction framework based approach. obtained at two different parameter combinations, (σ , λ ) 1 1 1) Impact of Threshold on Boundary Creation for Arti- and (σ , λ ) then smaller σ is preferred over larger λ as 2 2 ficial Datasets: Threshold deciding criterion is a very crucial changing in small amount of sigma impacts more compared factorinone-classclassification.Weemployonethresholdde- to small amount of change in regularization parameter. Every ciding criterion θ with boundary framework based one-class possiblecombinationofσandλhavebeenemployedtoobtain 1 classifier and two threshold deciding criteria viz., θ and θ , most complex classifier as long as classifier is consistent. 1 2 with reconstruction framework based one-class classifier. Any Range of λ taken is [10 8,10 7,...,107,108] and range of σ − − one threshold deciding criterion is employed after the output is taken as twenty values between minimum and maximum obtainedatthelastlayeroftheone-classclassifiers.Theability pairwise distance among training samples of the dataset. oftheproposedmethodsonboundarycreationhasbeentested Further, just for showing the impact of appropriate parameter ontwoartificialdatasetsviz.,Banana-shapedandRing-shaped selection, two boundaries are exhibited in Fig. 3 on banana- datasets. These are used to test the ability of the proposed shaped dataset for incremetal SVDD. Dashed line shows the one-classclassifiersforconvexandinternalboundarycreation. boundary without optimal parameter selection and complete Convexity of the proposed methods is evaluated based on the lineboundaryshowstheboundaryusingtheoptimalparameter boundary creation ability around Banana shaped dataset. Fig. obtained by consistensy based model selection. 4exhibitsthebehavioroftheproposedone-classclassifierson creation of convex decision boundary around banana-shaped III. PERFORMANCEEVALUATION dataset. One more thing can be noticed in Fig. 4e and 4f that Inthissection,firstdiscusstheperformanceoftheproposed they don’t reject any sample during boundary creation due classifiers on two artificial datasets and then discuss perfor- to threshold deciding criteria (θ ). However, classifiers with 2 mance on six benchmark datasets. All experiments for this threshold θ reject 10% samples(can be seen from Fig. 4a- 1 paper have been executed on MATLAB 2011a in Windows 7 4d). (64bit)environmentwith4GBRAM,3.10GHzprocessorand Second artificial dataset is Ring-shaped dataset. Proposed Intel i5 processor. For implementing the existing classifiers, classifiers are tested on this dataset to test the ability of two toolboxes is used, which is proposed by Tax [10] and classifiers to create an internal boundary of the ring because Gautam [30]. internal boundary differentiates it from the only circular boundary. If parameters are not chosen appropriately, then A. Performance Comparison on Artificial Datasets classifier may create only outer circular boundary of the ring In this section, the impact of online regularized kernel on as it covers all samples within it but it also covers extra boundarycreationfortwoartificialdatasetsisdiscussed.Arti- void space within it. It is experienced that the performance ficialdatasetsarecreatedwiththehelpofPRToolBox[38]and of all methods over Ring-shaped dataset is similar to Banana each dataset is of size hundred. Boundary is created around dataset except online R-OC-ELM θ2 (same can be verified PAPERSUBMITTEDTOIEEETRANS.ONSMC 8 Online R-OC-ELM Online RK-OC-ELM Online R-OC-ELM with θ1 Online RK-OC-ELM with θ1 4 4 4 4 2 2 Feature 2-022 Feature 2-02 Feature 2-022 Feature 2-02 -4 -4 -4 -4 -6 -6 -6 -6 -4 -2 0 2 4 6 -4 -2 0 2 4 6 -4 -2 0 2 4 6 Feature 1 -4 -2 0 2 4 6 Feature 1 Feature 1 Feature 1 (b) Online Learning with (d) Online Learning with (a) Online Regularized One-Class (c) Online Regularized One-Class Regularized Kernel for One-Class Regularized Kernel for One-Class ELM wFritahmθe1w:Borokundary ELM with θ1:Boundary ELM witFhraθm1:eRweocorknstruction ELM with θ1:Reconstruction Framework Framework Online RK-OC-ELM with θ Online R-OC-ELM with θ 2 2 4 4 2 Feature 2-022 Feature 2-02 -4 -4 -6 -6 -4 -2 0 2 4 6 -4 -2 0 2 4 6 Feature 1 Feature 1 (f) Online Learning with (e) Online Regularized One-Class Regularized Kernel for One-Class ELM witFhraθm2:eRweocorknstruction ELM with θ2:Reconstruction Framework Fig. 4: Performance of Proposed Online Classifiers on Synthetic Banana dataset:Boundary and Reconstruction Framework based approaches in Fig. 5). Online R-OC-ELM θ2 classifier is not able to ationabilityasofflinekernelfeaturemappingbasedclassifiers create boundary around ring dataset as you can see in Fig. presented by Gautam et al. [30] (ii) Performance of kernel (5e). However, in contrast, when RK-OC-ELM is employed feature mapping is superior over random feature mapping, withθ2 (online RK-OC-ELM θ2)thenperformancewasjust therefore, it was necessary to implement it with online ELM opposite, it creates boundary perfectly without rejecting any based one-class classifiers. amount of sample outside of its boundary. This classifier is tested online RK-OC-ELM with various other artificial B. Performance Comparison on Benchmark Datasets datasets to check its boundary creation ability and found the Theperformanceoftheproposedmethodshavebeentested same performance. By above discussion, it is clear that θ is 2 onsixbenchmarkdatasetsasmentionedinTableI.Benchmark more suitable for reconstruction framework based approach. datasets are available at UCI Machine Learning Repository 2) Impact of Online Regularized Kernel on Boundary [39] in original format and available in one-class format on Creation for Artificial Datasets: In Fig. 4 and 5, with and the website of the TU Delft [40]. The datasets are prepared without online kernel feature mapping based methods (i.e. for OCC in the same way as prepared by Leng et al. [28] online RK-OC-ELM and online R-OC-ELM, respectively) and Gautam et al. [30]. Dataset is divided according to are kept side be side for better visualization. As you can their classes and assume normal samples as target class and see in Fig. 4, just adding regularization factor without kernel remaining of the classes as outlier class. And same procedure featuremapping(i.e.onlineR-OC-ELM)doesnothelpmuch is followed for each dataset. Further, randomly select 50% of in boundary creation as they are not able to create smooth target data for training and the remaining 50% is combined boundary for Banana dataset as well as they cover some extra with the outlier class for testing. It is to be noted that optimal voidspace.Itmeansthatdataisnotdescribedproperlybyone- parameter is calculated in the first run only. Hence, it speeds classclassifiers.However,whenkernelizedfeaturemappingis up the execution and saves a lot of time. Data is normalized embedded instead of random feature mapping and tested the between0and1usingmax-minnormalization.Classifiersare proposed classifier online RK-OC-ELM on Banana dataset experimented over twenty runs and then calculate average of then performance is significantly improved. online RK-OC- accuracy (ACC), area under curve (AUC = 1/2(Sensitivity + ELM based one-class classifiers create proper and smooth Specificity))andF-measure(F1)overallrunsasfinaloutput. boundariesaroundBananadataset.Overalltwopointsarenote All three performance measures are presented in the result here:(i)Theproposedclassifiersachievesimilarboundarycre- tables(TableII-VII)but,AUC isconsideredasaperformance PAPERSUBMITTEDTOIEEETRANS.ONSMC 9 Online R-OC-ELM Online RK-OC-ELM Online R-OC-ELM with θ1 Online RK-OC-ELM with θ1 1 1 1 1 0.5 0.5 Feature 20.50 Feature 2 0 Feature 20.50 Feature 2 0 -0.5 -0.5 -0.5 -0.5 -1 -1 -1 -1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 Feature 1 -1 -0.5 0 0.5 1 Feature 1 Feature 1 Feature 1 (b) Online Learning with (d) Online Learning with (a) Online Regularized One-Class (c) Online Regularized One-Class Regularized Kernel for One-Class Regularized Kernel for One-Class ELM wFritahmθe1w:Borokundary ELM with θ1:Boundary ELM witFhraθm1:eRweocorknstruction ELM with θ1:Reconstruction Framework Framework R-OC-ELM with θ Online RK-OC-ELM with θ2 2 1 1 0.5 0.5 Feature 2 0 Feature 2-0.50 -0.5 -1 -1 -1 -0.5 0 0.5 1 -1 -0.5 0 0.5 1 Feature 1 Feature 1 (f) Online Learning with (e) Online Regularized One-Class Regularized Kernel for One-Class ELM witFhraθm2:eRweocorknstruction ELM with θ2:Reconstruction Framework Fig. 5: Performance of Proposed Online Classifiers on Synthetic Ring dataset:Boundary and Reconstruction Framework based approaches TABLE II: Boundary Framework: Performance of Various One-class Classifiers over Average of 20 Runs with their Standard Deviation in Bracket on Breast Cancer and Ecoli Dataset (Well separated classes) BreastCancer Ecoli Methods F1 ACC AUC F1 ACC AUC incsvdd[6] 94.18(1.2) 94.39(1.13) 94.36(1.14) 39.18(19.27) 72.9(20.17) 76.39(13.43) OCKELM θ1 [24] 93.22(1.89) 93.84(1.6) 93.68(1.64) 77.04(5.96) 96.35(0.82) 86.04(4.13) OS-OCELM θ1 [26] 86.96(2.19) 87.73(2.25) 87.64(2.21) 34.68(8.43) 74.35(11.34) 73.77(5.32) OnlineRK-OC-ELM θ1 94.97(1.55) 95.34(1.35) 95.22(1.39) 71.36(5.05) 94.56(1.24) 88.12(4.57) TABLE III: Reconstruction Framework: Performance of Various One-class Classifiers over Average of 20 Runs with their Standard Deviation in Bracket on Breast Cancer and Ecoli Dataset (Well separated classes) BreastCancer Ecoli Methods F1 ACC AUC F1 ACC AUC autoenc dd[6] 92.54(1.75) 93.2(1.49) 93.05(1.52) 46.61(13.28) 82.79(9.64) 80.39(6.81) AAKELM θ1 [26] 93.53(1.6) 94.1(1.38) 93.94(1.41) 20.31(1.38) 40.82(2.29) 63.25(3.5) AAKELM θ2 [26] 94.7(0.93) 95.11(0.81) 94.98(0.83) 56.98(7.78) 90.15(3.35) 83(3.5) OS-AAELM θ1 [26] 90.41(1.78) 91.49(1.43) 91.27(1.47) 50.83(15.56) 85.47(9.09) 80.54(5.37) OS-AAELM θ2 [26] 89.96(2.79) 89.55(3.23) 89.69(3.14) 54.01(9.92) 90.58(4.1) 78.09(5.67) OnlineRK-OC-ELM θ1 93.2(1.2) 93.79(1.03) 93.63(1.05) 20.03(2.25) 41.11(2.22) 62.62(5.27) OnlineRK-OC-ELM θ2 94.6(0.86) 95.01(0.75) 94.88(0.77) 55.54(8.89) 88.52(3.72) 85.26(2.9) PAPERSUBMITTEDTOIEEETRANS.ONSMC 10 TABLE IV: Boundary Framework: Performance of Various One-class Classifiers over Average of 20 Runs with their Standard Deviation in Bracket on Diabetes and Liver Dataset (Overlapping classes) Diabetes Liver Methods F1 ACC AUC F1 ACC AUC incsvdd[6] 66.51(0.73) 55.98(1.5) 57.15(1.43) 41.33(1.34) 32.17(0.91) 50.79(1.63) OCKELM θ1 [24] 68.86(1.31) 63.38(1.28) 64.07(1.28) 42.18(1.59) 42.06(1.77) 54.16(2.03) OS-OCELM θ1 [26] 61.66(2.15) 59.16(2.52) 59.46(2.45) 41.68(2.38) 48.66(4.35) 55.31(2.57) OnlineRK-OC-ELM θ1 69.32(0.89) 63.36(0.92) 64.11(0.91) 43.81(1.81) 45.83(1.45) 56.72(2.05) TABLE V: Reconstruction Framework: Performance of Various One-class Classifiers over Average of 20 Runs with their Standard Deviation in Bracket on Diabetes and Liver Dataset (Overlapping classes) Diabetes Liver Methods F1 ACC AUC F1 ACC AUC autoenc dd[6] 66.27(1.49) 57.48(2.65) 58.46(2.54) 42.26(2.03) 38.9(2.96) 53.52(2.51) AAKELM θ1 [26] 67.39(0.7) 60.52(0.98) 61.33(0.89) 41.21(2.03) 38.07(2.41) 52.24(1.59) AAKELM θ2 [26] 59.01(1.67) 59.48(1.58) 59.51(1.58) 42.36(1.38) 44.21(1.65) 54.86(1.62) OS-AAELM θ1 [26] 63.46(2.19) 63.23(1.71) 63.34(1.62) 38.92(4.68) 55.64(5.85) 55.3(3.32) OS-AAELM θ2 [26] 60.19(3.64) 55.72(3.15) 56.22(2.76) 38.16(5.54) 57.19(6.33) 55.47(2.5) OnlineRK-OC-ELM θ1 67.16(0.45) 60.45(0.99) 61.24(0.87) 41.03(1.95) 38.31(2.92) 52.12(1.41) OnlineRK-OC-ELM θ2 58.44(1.89) 59.37(1.39) 59.37(1.42) 42.63(1.37) 43.22(1.79) 54.9(1.72) TABLE VI: Boundary Framework: Performance of Various One-class Classifiers over Average of 20 Runs with their Standard Deviation in Bracket on Sonar and Spectf Dataset (High dimensional datasets) Sonar Spectf Methods F1 ACC AUC F1 ACC AUC incsvdd[6] 57.89(2.58) 55.59(2.48) 61.84(2.79) 46.2(2.26) 68.12(4.43) 75.87(1.88) OCKELM θ1 [24] 56.44(3.39) 64.18(3.02) 64.16(3.06) 53.75(3.44) 80.35(1.43) 77.48(3.17) OS-OCELM θ1 [26] 54.5(5.46) 61.25(5.22) 62.02(4.33) 30.91(7.54) 52.87(14.19) 57.25(7.54) OnlineRK-OC-ELM θ1 58.75(4.3) 70.23(2.99) 67.74(3.28) 57(3.81) 82.76(2.13) 78.95(3.8) measure criterion for further discussion as AUC provides terion θ has been employed in all existing and proposed 1 better insight of classifiers’ performance compared to ACC approaches presented in the the Table II, IV and VI. We and F1. Standard deviation of F1, ACC and AUC over comparetheproposedclassifierswithoneofthemostpopular twenty runs is also presented to get a proper understanding traditionalmethods,incrementalSVDDandrecentlyproposed of possible deviation in the performance over twenty runs. ELM based online and offline one-class classifiers. Proposed The proposed and existing ELM based online classifiers classifieronlineRK-OC-ELM θ1 istheonlineversionofthe have four parameters if use RBF kernel viz., the number of existingOCKELM θ1[28]classifier.Itisexpectedthatonline initial training data used in the initial phase of the online RK-OC-ELM θ1 will yield similar results as OCKELM θ1. sequential one-class classifier, size of blocks of data learned As you can see from the table, the proposed classifier online by online sequential one-class classifier in each iteration, RK-OC-ELM θ1 outperforms all existing online as well as regularization parameter (λ) and kernel parameter (σ). Out of offline classifier for all six datasets in term of AUC. It is ob- fourparameters,valuesoftwoparametersarefixedandvalues servedthatproposedkernelfeaturemappingbasedmethodon- fortherestofthetwoparameters(λandσ)havebeendecided line RK-OC-ELM θ1 performed better than existing random by consistency based model selection. The value of these two feature mapping based online method viz., OS-OCELM θ1 fixed parameters are taken as 10 and 5 for number of initial [30]. Superiority of kernel over random feature mapping was training data used in the initial phase and size of blocks of expected because Leng et al. [28] and [30] had also exhibited data learned by the classifier in each iteration, respectively. the same trend in their work for offline classifiers. In this section, first discuss performance comparison of 2) Performance Comparison of Reconstruction Frame- boundary framework based one-class classifiers and then pro- work Based Methods on Benchmark Datasets: Proposed vide discussion on reconstruction framework based one-class classifier online RK-OC-ELM has been employed with classifiers. two threshold criteria viz., θ1 and θ2 (i.e. online RK-OC- 1) Performance Comparison of Boundary Framework ELM θ1 andonline RK-OC-ELM θ2).Thesearetheonline Based Methods on Benchmark Datasets: Threshold cri- version of the existing offline classifiers AAKELM θ1 and

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.