ebook img

Multi-view learning for multivariate performance measures optimization PDF

0.07 MB·English
Save to my drive
Quick download
Download
Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.

Preview Multi-view learning for multivariate performance measures optimization

1 Multi-view learning for multivariate performance measures optimization Jim Jing-Yan Wang Abstract—Inthispaper,weproposetheproblemofoptimizing byformulatingtheproblemorhigh-dimensionaldata,and multivariate performance measures from multi-view data, and employingatwo-layercuttingplanealgorithmtosolveit. 5 an effective method to solve it. This problem has two features: Moreover, this method is also used in multiple-instance 1 the data points are presented by multiple views, and the target 0 of learning is to optimize complex multivariate performance learning problems. 2 measures.Weproposetolearnalineardiscriminantfunctionsfor Up to now, all these multivariate methods are limited to each view, and combinethemto constructa overall multivariate n learn from data with single view. However, in many real- mappingfunctionformult-viewdata.Tolearntheparametersof a worldmachinelearningproblems,thedatacanbepresentedby J the linear discriminant functions of different views to optimize multivariateperformancemeasures,weformulateaoptimization multipleviews.Forexample,incomputervisionproblems,we 6 problem.Inthisproblem,weproposetominimizethecomplexity can extract different types of features, such as color features, 1 of the linear discriminant functions of each view, encourage the texture features, and shape features, and each feature can be consistencesoftheresponsesofdifferentviewsoverthesamedata ] treated as a view. In scientific article classification problems, G points,andminimizetheupperboundaryofagivenmultivariate we can also learn from the views of article abstract, content, performance measure. To optimize this problem, we employ the L cutting-planemethodinaniterativealgorithm.Ineachiteration, and references. Differentviews of data may be complemental . weupdateasetofconstrains,andoptimizethemappingfunction to each other in a learning problem and using multiple views s c parameter of each view one by one. hasbeenapopularstrategyinmachinelearningcommunity.To [ IndexTerms—Multivariateperformancemeasures,Multi-view learn from multiple views of data, different multi-view learn- 2 learning, Cutting-plane algorithm ing methods have been proposed [4], [5], [6], [7]. However, v noneofthemaredesignedtooptimizedaspecificmultivariate 6 I. INTRODUCTION performance measure. 8 7 DIFFERENT multivariate performancemeasures are used To overcome this problem, in this paper, we propose the 3 to evaluate different machine learning applications [1]. problem of learning from multiple view data to optimize 0 the multivariate performance measures. Given a tuple of data For example, in problems of text classification, F1-score and 1. precision/recallbreakevenpoint(PRBEP)areusedtocompare points, each of them are presented with multiple views. The 0 true class labels against predicted class labels of a given test problem is to learn a multivariate mapping function to map 5 them to a tuple of class labels, so thatthe multivariateperfor- textset.Inimageretrievalproblems,theprecision/recallatthe 1 mance measures can be optimized. To solve this problem, we : top k returned images are used to evaluate the performance v proposed a novel method for multi-view learning to optimize of a retrieval system. Recently, the problem of multivariate i multivariate performance measures. The contribution of this X performance measures optimization are proposed to learn di- paper are of two folds: r rectlytolossesbasedonpre-definedmultivariateperformance a measures. Many algorithms have been proposed to solve this 1) We propose the problem of multi-view learning for problem, and some examples are listed as follows. multivariate performance measures. Although there are • Joachims[1] proposeda supportvectormachine(SVM)- plenty of multivariate performance measures optimiza- based for multivariate nonlinear performance measures tionmethods,theyarelimitedtosingleviewdata.There optimizationproblems.Theproposedalgorithmcan train are also lots of multi-view learning methods, however, a multivariate SVM in polynomial time for large classes none of them are proposed to optimize multivariate of potentially non-linear performance measures. More- performance measures. over,the traditionalSAM can be arised as a special case 2) We also propose a novel method to solve this problem. of this method. We proposedtolearna lineardiscriminantfunctionsfor • Li et al. [2] proposed a two-step approach to opti- eachview,andcombinethemtoconstructaoverallmul- mizemultivariateperformancemeasures,byfirsttraining tivariatemappingfunctiontopredicttheclasslabeltuple a nonlinear classifiers with existing learning methods, ofatupleofdatapoints.Tolearnthelineardiscriminant and then adapting it to optimize specific performance functions parameters of differentviews, we formulate a measures. In the seconde step, the classifier adaptation constrained minimization problem. In this problem, we problem can be solved as a quadratic program problem, proposedtominimizethecomplexityofeachlineardis- in a similar way to linear SVM. criminantfunctionsparameterbyminimizingitssquared • Mao and Tsang [3] proposed a novel feature selection ℓ2 norm, encourage the consistence among different method to optimize multivariate performance measures, views by minimizing the squared ℓ2 norm distance 2 among different linear discriminant functions response of differen views, and also minimize the losses based m 1 2 on a specific multivariate performance measures. The min kwjk2. (4) minimization problem is optimized by a cutting-plane wj|mj=1 2 j=1 X method in an alterative algorithm [8]. We maintain a 2) Encouraging consistences among different views: To activesetofconstrains,andupdateitineachiterationby encourage the consistences of different views, we pro- add a most violated class label tuple. We also optimize the multivariate mapping function parameters one by posed to minimize the squared ℓ2 norm distances of responsesofanytwolineardiscriminativefunctionsover one, and the optimization of a multivariate mapping one single data point, function parameter can be solve as a quadratic program problem. n secTtihoenrIeIs,twpearitnstroofdutcheisthpeapperropaoresedormgaentihzoedd, aasndfoinlloswecst:ioinn wmj|mji=n1 12j,j′:j′<j i=1kw⊤j xji −w⊤j′xij′k22!. (5) III, we conclude this paper. X X 3) Optimizing multivariate performance measures: To II. PROPOSED METHOD optimize a specific multivariate performance measure, A. Problem formulation we propose to minimize the upper boundary of a loss We assume we have a training data set of n data points, function∆basedonthismultivariateperformancemea- and each data point has m views. The training set is denoted sure, as {(xji|mj=1,yi)}|ni=1, where xji ∈ Rdj is the dj-dimensional feature vector of the j-th view of the i-th data point, and min ξ yi ∈{+1,−1} is the binary class label of the i-th data point. wj|mj=1,ξ Weproposetolearnamultivariatemappingfunctionhtomap s.t.∀y′ ∈Y/y : atupleofndatapointsofmviews,x=(xj|m ,··· ,xj|m ), 1 j=1 n j=1 m totupleofnclasslabels,y =(y1,··· ,yn).Toimplementthis w⊤ Ψ(xj,y)−Ψ(xj,y′) ≥∆(y′,y)−ξ, multivariate mapping function, we use a linear discriminant j j=1 function f (xj,y′) to predict the response of a view the the X (cid:2) (cid:3) (6) j data tuple, xj = (xj1,··· ,xjn), against a candidate class label whereξ isa slack variableofthe upperboundaryofthe tuple y′ =(y1′,··· ,yn′), loss function. n The overall optimization function are obtained by combin- f (xj,y′)= w⊤Ψ(xj,y′) (1) ing all the problems above, j j i=1 X where wj ∈Rdj is a parameter vector for the linear discrim- inant function of the j-th view, , and Ψ(xj,y) is a function 1 m 2 wanhdicyh′,cdanefirneetudrnass faovlleocwtos,r to describe the match between xj wjm|mj=in1,ξ2Xj=1kwjk2+C1ξ n n +C2 kw⊤xj −w⊤xj′k2 Ψ(xj,y′)= y′xj. (2) 2 j i j′ i 2  (7) i=1 i i j,jX′:j′<j Xi=1 ! X s.t.∀y′ ∈Y/y : Based on the linear discriminant functions of different views,  m we construct the multivariate mapping function as follows, w⊤ Ψ(xj,y)−Ψ(xj,y′) ≥∆(y′,y)−ξ. j j=1 m m X (cid:2) (cid:3) h(x)=argmaxy′∈Y fj(xj,y′)= w⊤j Ψ(xj,y′) In the objectiveofthis problem,the first term is to reducethe Xj=1 Xj=1  complexity of the parameter vector of each view, and second (3) term is a slack variable to represent the upper boundary of whereY ={+1,−1}nis a setof alladmissiblelabelvectors. themultivariateperformancemeasure,andthethirdterm isto Weproposetooptimizeacomplexmultivariateperformance encouragetheconsistencesoftheresponsesofdifferentviews. measure, ∆, by learning the parameter vectors wj|mj=1 for C1 and C2 are tradeoff parameters. the m views. To this end, we consider the following three problems, 1) Reducing the complexity of each linear discrimi- B. Problem optimization native function: To prevent over-fitting, we propose to reduce the complexity of the linear discriminative To solve this problem, we employ the cutting-plane al- function of each view by minimizing the squared ℓ2 gorithm. In an iterative algorithm, we update the parameter norm of its parameter, vectors w |m and an active set of constrains W alternately. j j=1 3 1) Updating w |m : When we have a given active set of To obtain the optimal points of w and ξ, we set the j j=1 j constrain W ⊆ Y/y, and optimize the parameter vectors, we gradients of Lagrangian function with respect to w and ξ j optimize wj one by one, i.e., when wj is optimized, wj′|j′6=j to zeros, and we have, is fixed. The following optimization is obtained in this case, ∂L ∂w =Ωwj −β− αy′γy′ =0 mwji,nξ21kwjk22+C1ξ j y′:Xy′∈W  ⇒wj =Ω−1β+ αy′γy′, +C2 n kw⊤xj −w⊤xj′k2 y′:Xy′∈W (12) 2 j i j′ i 2  ∂L   j′X:j′<j Xi=1 ! ∂ξ =C1− αy′ =0 s.t.∀y′ ∈W : y′:Xy′∈W  w⊤ Ψ(xj,y)−Ψ(xj,y′) ⇒ αy′ =C1. j ≥∆(cid:2)(y′,y)− w⊤j′ (cid:3)Ψ(xj′,y)−Ψ(xj′,y′) −ξ. y′:Xy′∈W j′X:j′6=j h i By substituting these results back to (11), we obtain the dual (8) problem as The Lagrange function of this problem is ⊤ L(wj,ξ,αy′|y′:y′∈W) 1 =12kwjk22+C1ξ αy′m|y′a:yx′∈W−2β+y′:Xy′∈Wαy′γy′ Ω−1β+y′:Xy′∈Wαy′γy′ n     + C22 j′:j′<j i=1kw⊤j xji −w⊤j′xji′k22! − αy′δy′ X X y′:Xy′∈W  − αy′w⊤j Ψ(xj,y)−Ψ(xj,y′) −∆(y′,y) s.t. ∀y′,y′ ∈W : αy′ ≥0, and αy′ =C1. y′:Xy′∈W (cid:2) (cid:3) y′:Xy′∈W (13)  We can solve this problem as a quadratic program problem. + w⊤ Ψ(xj′,y)−Ψ(xj′,y′) +ξ j′  2) Updating W: To update W, we first find the most j′X:j′6=j h i violated y′ and then add it the W. y′ is obtained as n  =12w⊤j Ωwj +C1ξ−w⊤j β+ C22 w⊤j′xji′xji′⊤wj′ j′X:j′<jXi=1 m − αy′ w⊤j γy′ −δy′ +ξ y′ =argmaxy′′∈Y/y∆(y′′,y)+ w⊤j Ψ(xj,y′′) y′:Xy′∈W (cid:0) (cid:1) (9)  Xj=1 (14) m where αy′ ≥ 0 is the Lagrange multiplier for the − w⊤Ψ(xj,y) . constrain w⊤ Ψ(xj,y)−Ψ(xj,y′) ≥ ∆(y′,y) − j  w⊤ jΨ(xj′,y)−Ψ(xj′,y′) −ξ, and Xj=1  j′:j′6=j j′ (cid:2) (cid:3) 3) Alterative algorithm: The proposediterative algorithm h i P is given in Algorithm 1. n Ω= I+C2 xjxj⊤ ,  i i  j′:j′<ji=1 X X III. CONCLUSION  n  β =C2 xjixji′⊤wj′, Inthispaper,weproposetheproblemofmulti-viewlearning j′:j′<ji=1 X X for multivariate performancemeasures, and a novelalgorithm γ = Ψ(xj,y)−Ψ(xj,y′) y′ to solve it. Multivariate performance measures optimization (cid:2) (cid:3) can obtain a predictor which directly optimize a desired δy′ =∆(y′,y)− w⊤j′ Ψ(xj′,y)−Ψ(xj′,y′) . performance measure. However, the existing methods are j′X:j′6=j h i limitedtosingleviewdata,andcannotutilizemulti-viewdata. (10)   We propose to learn from multi-view data to optimize the Thisoptimizationproblemcanbe transferredto its dualform, multivariateperformancemeasures,andithaspotentialinreal- max min L(wj,ξ,αy′|y′:y′∈W) worldapplications.Theproposedmethodcanbeimplemented αy′|y′:y′∈W wj,ξ (11) easily due to its employment of cutting plane method and s.t. ∀y′,y′ ∈W : αy′ ≥0. quadratic program. 4 Algorithm 1 Iterative multi-view learning algorithm for mul- tivariate performance measures. Input: Multi-view training data set {(xj|m ,y )}|n ; i j=1 i i=1 Input: Tradeoff parameters C1 and C2; Input: The maximum iteration number T. Initialize linear discriminative function parameter w0 for j each view; W ←∅; Calculate Ω as in (10); for t=1,··· ,T do Find the most violated class label tuple y′t as in (14) by fixing wt−1|m ; j j=1 Add y′t to the constrain active set W, W ←W∪{y′t}; for j =1,··· ,m do Updatewtj bysolving(13)byfixingwtj−′ 1|j′6=j andW; end for end for Output: wT|m . j j=1 REFERENCES [1] T. Joachims, “A support vector method for multivariate performance measures,” in Proceedings of the 22nd international conference on Machine learning. ACM,2005,pp.377–384. [2] N. Li, I. W. Tsang, and Z.-H. Zhou, “Efficient optimization of perfor- mancemeasuresbyclassifieradaptation,” PatternAnalysisandMachine Intelligence, IEEETransactionson,vol.35,no.6,pp.1370–1382,2013. [3] Q. Mao and I.-H. Tsang, “A feature selection method for multivariate performancemeasures,”PatternAnalysisandMachineIntelligence,IEEE Transactions on,vol.35,no.9,pp.2051–2063, 2013. [4] V. Sindhwani and D. S. Rosenberg, “An rkhs for multi-view learning andmanifoldco-regularization,” inProceedingsofthe25thinternational conference onMachinelearning. ACM,2008,pp.976–983. [5] S.Z.Li,L.Zhu,Z.Zhang,A.Blake,H.Zhang,andH.Shum,“Statistical learning ofmulti-view face detection,” inComputer VisionłECCV 2002. Springer, 2002,pp.67–81. [6] C.Christoudias, R.Urtasun, andT.Darrell, “Multi-view learning inthe presence ofview disagreement,” arXivpreprintarXiv:1206.3242,2012. [7] W.Li,L.Duan,I.W.-H.Tsang,andD.Xu,“Co-labeling: Anewmulti- view learning approach for ambiguous problems.” in ICDM, 2012, pp. 419–428. [8] J.E.Kelley,Jr,“Thecutting-planemethodforsolvingconvexprograms,” JournaloftheSocietyforIndustrial&AppliedMathematics,vol.8,no.4, pp.703–712, 1960.

See more

The list of books you might like

Most books are stored in the elastic cloud where traffic is expensive. For this reason, we have a limit on daily download.