(1) Maximizing the margin
SVM is very easy to understand graphically: we just need to find a separating plane which maximizes the margin. See the graph below:
(2) How to calculate/find the maximum margin
Assuming the hard-margin case for simplicity of the math, the separating plane can be expressed as:
w * x + b = 0, where we can always rescale w and b so that:
w * x(j) + b = +1 or w * x(j) + b = -1 on the support vectors (the data points closest to the plane)
The margin width is then 2 / ||w||, thus we want to solve:
min (over w, b): 1/2 * ||w||^2 (the 1/2 and the ^2 are just for easier math)
s.t.: ( w * x(j) + b ) * y(j) >= 1 for every j.
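As a sanity check on this formulation, here is a minimal NumPy sketch (the dataset and the hyperplane values are my own toy illustration, not from the text) verifying the rescaled constraint ( w * x(j) + b ) * y(j) >= 1:

```python
import numpy as np

# Toy linearly separable data (made-up values), labels in {-1, +1}.
X = np.array([[ 0.0, 0.0],
              [-1.0, 0.0],
              [ 2.0, 2.0],
              [ 3.0, 2.0]])
y = np.array([-1, -1, 1, 1])

# Hand-picked hyperplane, already rescaled so the closest point of each
# class lands exactly on w * x + b = -1 / +1 (assumed values for this
# toy set, not a general recipe).
w = np.array([0.5, 0.5])
b = -1.0

margins = y * (X @ w + b)
print(margins)                 # every entry >= 1; support vectors hit exactly 1
print(2 / np.linalg.norm(w))   # margin width 2 / ||w||
```

The two points whose margin equals exactly 1 are the support vectors; the other two sit strictly outside the margin.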
(3) How to solve minimum issue with some constraints
The standard tool for solving this kind of constrained minimization is the Lagrange multiplier method. The Lagrange function is:
L(w, b, a) = 1/2 * ||w||^2 - SUM(j) a(j) * ( ( w * x(j) + b ) * y(j) - 1 )
where a(j) >= 0 for every j.
The so-called primal problem is: min(w, b) [ max(a) L(w, b, a) ]
(one can prove this is equivalent to the original problem; the intuition: if any constraint is violated, the inner max over a blows up to +infinity, so the outer min over (w, b) is forced to satisfy every constraint, and once all constraints hold the inner max is attained at a = 0, recovering the original objective 1/2 * ||w||^2)
Normally we would rather deal with the so-called dual problem: max(a) [ min(w, b) L(w, b, a) ] (notice we exchange the order of max and min),
which is a lower bound of the primal; under some conditions (which hold here) the duality gap is 0,
and the dual is easier to solve.
(4) Now let's look at the dual problem:
To solve max(a) [ min(w, b) L(w, b, a) ],
let's forget the max(a) first and just assume a fixed a; the inner problem becomes a minimization over (w, b). To minimize:
L(w, b, a) = 1/2 * ||w||^2 - SUM(j) a(j) * ( ( w * x(j) + b ) * y(j) - 1 )
we need:
dL/dw = w - SUM(j) a(j) y(j) x(j) = 0
dL/db = - SUM(j) a(j) y(j) = 0
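These stationarity conditions can be checked numerically. The sketch below uses my own toy values (one point per class, with multipliers chosen so that SUM a(j) y(j) = 0 holds), builds w from the first condition, and confirms via central finite differences that the gradient of L vanishes there:

```python
import numpy as np

# Toy problem (illustrative values): one point per class.
X = np.array([[0.0, 0.0],
              [2.0, 2.0]])
y = np.array([-1.0, 1.0])
a = np.array([0.25, 0.25])       # assumed multipliers; note sum(a * y) == 0

def L(w, b):
    # Lagrangian: 1/2 ||w||^2 - SUM(j) a(j) * ((w * x(j) + b) * y(j) - 1)
    return 0.5 * w @ w - np.sum(a * ((X @ w + b) * y - 1))

w = (a * y) @ X                  # stationarity: w = SUM(j) a(j) y(j) x(j)
b = y[1] - w @ X[1]              # b from a support vector (see step (4) below)

eps = 1e-6
for i in range(2):               # central difference of L w.r.t. each w component
    e = np.zeros(2); e[i] = eps
    print((L(w + e, b) - L(w - e, b)) / (2 * eps))   # ~ 0
print((L(w, b + eps) - L(w, b - eps)) / (2 * eps))   # ~ 0, since sum(a * y) = 0
```

Because L is quadratic in w and linear in b, the central differences here match the analytic gradient up to floating-point noise.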
Then we substitute these conditions back into L, and the outer maximization becomes:
max(a) [ SUM(i) a(i) - 1/2 * SUM(i,j) a(i) a(j) y(i) y(j) ( x(i) * x(j) ) ]
s.t.: SUM(i) a(i) y(i) = 0, a(i) >= 0 for every i
where the primal solution is recovered as w = SUM(j) a(j) y(j) x(j) and b = y(k) - w * x(k) for any support vector x(k) (any k with a(k) > 0).
Now the above problem is a standard quadratic program, so it can be handed to any QP solver.
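To see the whole pipeline end to end, here is a toy case small enough to solve the QP by hand (the data values are my own illustration). With exactly one point per class, the constraint SUM a(i) y(i) = 0 forces a(1) = a(2) = a, the dual objective collapses to 2a - 1/2 * a^2 * ||x(1) - x(2)||^2, and setting its derivative to zero gives a = 2 / ||x(1) - x(2)||^2:

```python
import numpy as np

# Two-point toy problem (illustrative values): one point per class.
x1, y1 = np.array([0.0, 0.0]), -1.0
x2, y2 = np.array([2.0, 2.0]), 1.0

# Closed-form dual maximizer for this special case (see lead-in above).
a = 2.0 / np.sum((x1 - x2) ** 2)      # = 0.25 here

# Recover the primal solution from the dual one:
w = a * y1 * x1 + a * y2 * x2         # w = SUM(j) a(j) y(j) x(j)
b = y2 - w @ x2                       # b = y(k) - w * x(k), any support vector

print(w, b)                           # [0.5 0.5] -1.0
# Both points sit exactly on the margin: ( w * x(j) + b ) * y(j) = 1
print(y1 * (w @ x1 + b), y2 * (w @ x2 + b))
# Margin width 2 / ||w|| equals the distance between the two points.
print(2 / np.linalg.norm(w), np.linalg.norm(x1 - x2))
```

The last line is the geometric sanity check: with only two points, the maximum-margin plane is the perpendicular bisector of the segment joining them, so the margin width is exactly their distance.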
References:
https://en.wikipedia.org/wiki/Support_vector_machine
http://www.cs.cmu.edu/~guestrin/Class/10701-S07/Slides/kernels.pdf
http://cs229.stanford.edu/notes/cs229-notes3.pdf