A (short) Introduction to Support Vector Machines and Kernelbased Learning

Johan Suykens
K.U. Leuven, ESAT-SCD-SISTA
Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium
Tel: 32/16/32 18 02 - Fax: 32/16/32 19 70
Email: [email protected]
http://www.esat.kuleuven.ac.be/sista/members/suykens.html

ESANN 2003, Bruges, April 2003

Overview

• Disadvantages of classical neural nets
• SVM properties and standard SVM classifier
• Related kernelbased learning methods
• Use of the "kernel trick" (Mercer Theorem)
• LS-SVMs: extending the SVM framework
• Towards a next generation of universally applicable models?
• The problem of learning and generalization

Classical MLPs

[Figure: multilayer perceptron with inputs x_1, ..., x_n, weights w_1, ..., w_n, hidden units h(·), bias b and output y]

Multilayer Perceptron (MLP) properties:
• Universal approximation of continuous nonlinear functions
• Learning from input-output patterns; either off-line or on-line learning
• Parallel network architecture, multiple inputs and outputs

Use in feedforward and recurrent networks; use in supervised and unsupervised learning applications.

Problems: existence of many local minima! How many neurons are needed for a given task?

Support Vector Machines (SVM)

[Figure: cost function versus weights — many local minima for the MLP, a single global minimum for the SVM]

• Nonlinear classification and function estimation by convex optimization, with a unique solution and primal-dual interpretations.
• Number of neurons automatically follows from a convex program.
• Learning and generalization in huge dimensional input spaces (able to avoid the curse of dimensionality!).
• Use of kernels (e.g. linear, polynomial, RBF, MLP, splines, ...); application-specific kernels possible (e.g. textmining, bioinformatics).

SVM: support vectors

[Figure: two two-class toy problems in the (x_1, x_2) plane; the nonlinear decision boundaries are determined by the support vectors]

• Decision boundary can be expressed in terms of a limited number of support vectors (a subset of the given training data); sparseness property
• Classifier follows from the solution to a convex QP problem.

SVMs: living in two worlds ...

[Figure: the same classifier viewed in the primal space (network with feature map ϕ acting on the input space) and in the dual space (kernel expansion over the support vectors in the feature space)]

→ Primal space (large data sets). Parametric: estimate $w \in \mathbb{R}^{n_h}$ in
$$y(x) = \mathrm{sign}[w^T \varphi(x) + b]$$

→ Dual space (high dimensional inputs). Non-parametric: estimate $\alpha \in \mathbb{R}^{N}$ in
$$y(x) = \mathrm{sign}\Big[\sum_{i=1}^{\#sv} \alpha_i\, y_i\, K(x, x_i) + b\Big]$$

linked by the "kernel trick" $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$ (a numerical check of this identity is sketched after the next slide).

Standard SVM classifier (1)

• Training set $\{x_i, y_i\}_{i=1}^{N}$: inputs $x_i \in \mathbb{R}^n$; class labels $y_i \in \{-1, +1\}$
• Classifier: $y(x) = \mathrm{sign}[w^T \varphi(x) + b]$, with $\varphi(\cdot): \mathbb{R}^n \rightarrow \mathbb{R}^{n_h}$ a mapping to a high dimensional feature space (which can be infinite dimensional!)
• For separable data, assume
$$\begin{cases} w^T \varphi(x_i) + b \ge +1, & \text{if } y_i = +1 \\ w^T \varphi(x_i) + b \le -1, & \text{if } y_i = -1 \end{cases} \;\Rightarrow\; y_i\,[w^T \varphi(x_i) + b] \ge 1, \ \forall i$$
• Optimization problem (non-separable case):
$$\min_{w,b,\xi} \ \mathcal{J}(w, \xi) = \frac{1}{2} w^T w + c \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad \begin{cases} y_i\,[w^T \varphi(x_i) + b] \ge 1 - \xi_i \\ \xi_i \ge 0, \quad i = 1, \ldots, N \end{cases}$$
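As a concrete illustration of the kernel trick referenced on the "living in two worlds" slide, the sketch below (not part of the original slides) checks numerically that the degree-2 polynomial kernel $(x^T z + 1)^2$ equals the inner product of explicit feature maps in a 6-dimensional feature space; the feature ordering chosen here is one conventional possibility.

```python
import numpy as np

def phi(x):
    """Explicit feature map whose inner product reproduces the
    degree-2 polynomial kernel (x^T z + 1)^2 in two dimensions."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1**2, x2**2,
                     np.sqrt(2) * x1 * x2])

def poly_kernel(x, z, tau=1.0, d=2):
    """K(x, z) = (x^T z + tau)^d, evaluated without the feature map."""
    return (x @ z + tau) ** d

x = np.array([0.3, -1.2])
z = np.array([2.0, 0.5])

# Both routes give the same number: the kernel evaluates the inner
# product in feature space without ever constructing phi explicitly.
print(phi(x) @ phi(z))    # inner product in feature space
print(poly_kernel(x, z))  # kernel trick: same value
```

The point of the trick is the second route: for high-degree polynomials or the RBF kernel the explicit map is huge or infinite dimensional, while the kernel evaluation stays cheap.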
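The primal problem above is what off-the-shelf SVM packages solve (through its dual, derived on the next slides). A minimal sketch using scikit-learn's SVC — an assumed library choice, not something the slides prescribe — makes the sparseness property from the "SVM: support vectors" slide visible: only a subset of the training points ends up as support vectors.

```python
import numpy as np
from sklearn.svm import SVC  # assumed dependency, not from the slides

rng = np.random.default_rng(0)

# Two overlapping Gaussian clouds, labels in {-1, +1} (synthetic data,
# purely for illustration).
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

# Soft-margin SVM with an RBF kernel. C plays the role of c in the
# primal objective (1/2) w^T w + c * sum(xi_i); gamma corresponds to
# 1/sigma^2 in the slide's RBF kernel.
clf = SVC(kernel='rbf', C=1.0, gamma=0.5).fit(X, y)

# Sparseness: the decision boundary depends on these points alone.
print("support vectors:", clf.support_vectors_.shape[0], "of", len(X))
```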
Standard SVM classifier (2)

• Lagrangian:
$$\mathcal{L}(w, b, \xi; \alpha, \nu) = \mathcal{J}(w, \xi) - \sum_{i=1}^{N} \alpha_i \,\{ y_i\,[w^T \varphi(x_i) + b] - 1 + \xi_i \} - \sum_{i=1}^{N} \nu_i\, \xi_i$$
• Find the saddle point:
$$\max_{\alpha, \nu}\ \min_{w, b, \xi}\ \mathcal{L}(w, b, \xi; \alpha, \nu)$$
• One obtains
$$\frac{\partial \mathcal{L}}{\partial w} = 0 \ \rightarrow \ w = \sum_{i=1}^{N} \alpha_i y_i \varphi(x_i), \qquad \frac{\partial \mathcal{L}}{\partial b} = 0 \ \rightarrow \ \sum_{i=1}^{N} \alpha_i y_i = 0, \qquad \frac{\partial \mathcal{L}}{\partial \xi_i} = 0 \ \rightarrow \ 0 \le \alpha_i \le c, \ i = 1, \ldots, N$$

Standard SVM classifier (3)

• Dual problem: QP problem
$$\max_{\alpha}\ Q(\alpha) = -\frac{1}{2} \sum_{i,j=1}^{N} y_i y_j K(x_i, x_j)\, \alpha_i \alpha_j + \sum_{j=1}^{N} \alpha_j \quad \text{s.t.} \quad \begin{cases} \sum_{i=1}^{N} \alpha_i y_i = 0 \\ 0 \le \alpha_i \le c, \ \forall i \end{cases}$$
with the kernel trick (Mercer Theorem): $K(x_i, x_j) = \varphi(x_i)^T \varphi(x_j)$
• Obtained classifier: $y(x) = \mathrm{sign}[\sum_{i=1}^{N} \alpha_i\, y_i\, K(x, x_i) + b]$
• Some possible kernels $K(\cdot, \cdot)$:
$K(x, x_i) = x_i^T x$ (linear SVM)
$K(x, x_i) = (x_i^T x + \tau)^d$ (polynomial SVM of degree $d$)
$K(x, x_i) = \exp\{-\|x - x_i\|_2^2 / \sigma^2\}$ (RBF kernel)
$K(x, x_i) = \tanh(\kappa\, x_i^T x + \theta)$ (MLP kernel)

Kernelbased learning: many related methods and fields

[Figure: diagram relating SVMs, regularization networks, LS-SVMs, Gaussian processes, kernel ridge regression and kriging through the theory of Reproducing Kernel Hilbert Spaces (RKHS)]

Some early history on RKHS:
1910-1920: Moore
1940: Aronszajn
1951: Krige
1970: Parzen
1971: Kimeldorf & Wahba

SVMs are closely related to learning in Reproducing Kernel Hilbert Spaces.
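The kernels listed under "Standard SVM classifier (3)" translate directly into code. Here is a minimal numpy sketch (not from the slides; parameter names tau, d, sigma, kappa, theta follow the slide's notation, and the default values are arbitrary):

```python
import numpy as np

def linear_kernel(x, xi):
    return xi @ x                                      # x_i^T x

def poly_kernel(x, xi, tau=1.0, d=3):
    return (xi @ x + tau) ** d                         # (x_i^T x + tau)^d

def rbf_kernel(x, xi, sigma=1.0):
    return np.exp(-np.sum((x - xi) ** 2) / sigma**2)   # exp(-||x-x_i||^2 / sigma^2)

def mlp_kernel(x, xi, kappa=1.0, theta=0.0):
    # Caution: the tanh "MLP kernel" is positive semi-definite only for
    # some choices of kappa and theta, so Mercer's condition may fail.
    return np.tanh(kappa * (xi @ x) + theta)
```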
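Putting the pieces together, the dual QP of "Standard SVM classifier (3)" can be handed to a generic convex QP solver. The sketch below uses cvxopt (an assumed dependency; the slides do not mandate any solver) and recovers the bias b from an unbounded support vector via the KKT conditions:

```python
import numpy as np
from cvxopt import matrix, solvers  # assumed dependency: pip install cvxopt

def svm_dual_fit(X, y, kernel, c=1.0):
    """Solve the dual QP:  min_a (1/2) a^T H a - 1^T a
    s.t. 0 <= a_i <= c and y^T a = 0, where H_ij = y_i y_j K(x_i, x_j).
    (cvxopt minimizes, so the sign of Q(alpha) is flipped.)
    Assumes labels y in {-1, +1}."""
    N = len(y)
    K = np.array([[kernel(X[i], X[j]) for j in range(N)] for i in range(N)])
    H = np.outer(y, y) * K
    P, q = matrix(H), matrix(-np.ones(N))
    G = matrix(np.vstack([-np.eye(N), np.eye(N)]))        # box constraints:
    h = matrix(np.hstack([np.zeros(N), c * np.ones(N)]))  # 0 <= a_i <= c
    A = matrix(y.reshape(1, N).astype(float))             # y^T a = 0
    b = matrix(0.0)
    solvers.options['show_progress'] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    # Bias from an unbounded support vector (0 < a_k < c), for which
    # y_k [sum_i a_i y_i K(x_k, x_i) + b] = 1 holds with equality.
    sv = alpha > 1e-6
    k = np.flatnonzero(sv & (alpha < c - 1e-6))[0]
    bias = y[k] - np.sum(alpha[sv] * y[sv] * K[k, sv])
    return alpha, bias

def svm_predict(X, y, alpha, bias, kernel, x):
    """Dual classifier: y(x) = sign(sum_i alpha_i y_i K(x, x_i) + b)."""
    s = sum(a * yi * kernel(x, xi)
            for a, yi, xi in zip(alpha, y, X) if a > 1e-6)
    return np.sign(s + bias)
```

Calling svm_dual_fit(X, y, rbf_kernel) with one of the kernels sketched above reproduces, up to numerics, what dedicated SVM solvers compute far more efficiently with specialized decomposition methods.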
