SVM ( support vecotr machine) in Plain English

(1) Maximum the margin

SVM is very easy to understand on the graph,, we just need to find the a separate plane which maximum the margin. see the graph below:

(2) How to calculate/find the max Margin

Assuming hard-margin issue for the simplicity of math, the separate plane can be expressed as:

w*x -b = 0 where we can always adjust w, b so that:

w * x -b = +1, w*x – b = -1 on those supported vector ( minimized data points)

The distance will be 2/ || w||, thus we want to minimized :

min (over w,b) : 1/2 * || w || ^2 ( those 1/2 and ^2 is for easy math )

s.t: ( w * x(j) + b ) * y(j) >= 1 for any j.

(3) How to solve minimum issue with some constraints

The math for solving this kind of issue with constraints is using Lagrange multiplier method, the Lagrange function is:

L ( w, a) = 1/2 * || w || ^2 – SUM a(j) *( ( w * x(j) + b ) * y(j) -1 ) ( absorb b into w for writing simplicity )

where a(j) >=0 for any j.

The so called primal is: min(w)[ max (a) L(w, a) ]

( this can be proofed it is equivalent to the original problem , the intuition is: constraints is hard to solve, but with the Lagrange function, we can get rid of those constraints, also minus the biggest one respect to a, then find the minimized over w sounds reasonable )

normally we want to deal with so call dual problem which is: max(a) [ min(w,b) L( w, a) ] ( notice we exchange the order of max/min)

which will be the lower bound of the primal, under some condition, it is the dual gap will be 0.

and the dual is easier to solve it.

(4) Now let’s look at the dual problem:

To: max(a) [ min(w) L( w, a) ],

Let ‘s forget the max(a) first, just assuming a fixed a first, the internal function become min(w) , so to min:

L(w,b) = 1/2 * || w || ^2 – SUM a(j) *( ( w * x(j) + b ) * y(j) -1 )

we need to:

dL/dw = w – sum a(j) y(j) x(j) = 0

dL/db = sum a(j) y(j) = 0

Then we use the above conditions to feed into max(a) L(a), we got:

max L(a) = 1/2 * || w || ^2 – SUM a(j) *( ( w * x(j) + b ) * y(j) -1 ) ) ]

= max [ Sum a(i) – 1/2 Sum a(i)a(j) y(i)y(j) x(i) x(j) ] over a(i)

s.t.: Sum a(i) y(i) = 0, a(i) >=0

where w = sum a(j) y(j) x(j), b = y(k) – w* x(k)

Now the above problem become a standard quadratic programming, thus it is solvable now.

References:

https://en.wikipedia.org/wiki/Support_vector_machine

http://www.cs.cmu.edu/~guestrin/Class/10701-S07/Slides/kernels.pdf

http://cs229.stanford.edu/notes/cs229-notes3.pdf

http://www.svms.org/kkt/

SVM ( support vecotr machine) in Plain English

ByMin Wang

By Min Wang

Related Post

Q&A: Fine-Tuning and Guidance on diffusion models

Build your own GPU

How to run stable diffusion on CPU boxes

You missed

troubleshooing missing ip in k8s ( metallb-system)

Q&A: Fine-Tuning and Guidance on diffusion models

coding judge system

what is std::forward and universal reference