Introduction to Support Vector Machines in Machine Learning

SVMs, or Support Vector Machines, are supervised learning models that analyze data and recognize patterns in it. They are used for both classification and regression analysis.

An SVM model is a representation of the dataset as points in space, arranged so that the examples of the separate categories are divided by a clear gap that is as wide as possible.

1. Maximal Margin Classifier
2. Support Vector Classifier
3. Support Vector Machines
4. Support Vector Machine for more than two classes

Any new incoming data point is then mapped to one of these categories based on which side of the gap it falls on.

[Figure: a dataset with two categories, blue and pink, separated by a gap]

For example, in the above image, we can clearly see that there are two categories in the dataset: category blue and category pink. Our aim is to differentiate between the two categories. One simple way of doing that is to draw a line between them. But as we can see, there is an infinite number of lines that can cleanly divide the dataset into two parts.

What we actually do is choose the hyperplane that maximizes the margin between the classes. The data points (vectors) touching the two outer lines are called support vectors.

This simple two-dimensional, linear example extends to datasets with more dimensions. Each time, the idea is to draw a hyperplane that divides the data into the different categories. Now let's talk about some related mathematics and discuss the different terms related to SVM.
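A minimal sketch of this idea, assuming scikit-learn is available (the two clusters below are made-up toy data):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters: class -1 ("blue") and class 1 ("pink").
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) - 2, rng.randn(20, 2) + 2])
y = np.array([-1] * 20 + [1] * 20)

# A linear SVM; a very large C approximates a hard (maximal) margin.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

# The points touching the margin boundaries are the support vectors.
print(clf.support_vectors_)
```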

In general, the discussion of SVM is divided into three parts according to how SVM evolved.

  • Maximal Margin Classifier
  • Support Vector Classifier
  • Support Vector Machine

We will slowly work toward the Support Vector Machine, but for a proper understanding of SVMs we first have to go through the Maximal Margin Classifier and the Support Vector Classifier.

Maximal Margin Classifier

The Maximal Margin Classifier is a model that classifies observations into two classes using a separating hyperplane.

What is a Hyperplane?

Simply put, a hyperplane in $p$-dimensional space is a subspace of dimension $p - 1$. For example, in two-dimensional space the hyperplane is one-dimensional, i.e., a line. Similarly, in three dimensions it is a two-dimensional plane.

In two dimensions, the equation of the hyperplane is given by,

$\beta_0 + \beta_1X_1 + \beta_2X_2 = 0$

$where\ the\ vector\ (X_1,\ X_2)\ lies\ on\ the\ hyperplane$

We can also see some similarity between this equation and the equation of a line. It's fairly easy to extend this equation and find the equation of a hyperplane in $p$ dimensions.

$\beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p = 0$

Now if,

$\beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p > 0$

then the vector is on one side of the hyperplane, and if,

$\beta_0 + \beta_1X_1 + \beta_2X_2 + ... + \beta_pX_p < 0$

the vector is on the other side of the hyperplane.

To sum up, our main aim in the case of the Maximal Margin Classifier is to fit a hyperplane on training data given as an $n \times p$ matrix $X$, containing $n$ training observations in $p$-dimensional space, such that every observation falls into one of the two classes divided by the hyperplane.

If we represent the classes (labels) of all $n$ observations as,

$y_1, ..., y_n \in \{-1, 1\}$

Where -1 represents one class and 1 represents the other class.

Our main aim for any incoming test vector,

$x^* = (x_1^*\ ...\ x_p^*)^T$

is that our model has to assign this incoming test vector to one of the two classes. The following equation gives the class of the incoming test vector.

$f(x^*) = \beta_0 + \beta_1x_{1}^* + \beta_2x_{2}^* + ... + \beta_px_{p}^*$

If the value of this function is positive, we assign it to class 1, otherwise, we assign it to class -1.
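As a small NumPy sketch of this decision rule (the coefficients below are made-up values for illustration, not fitted ones):

```python
import numpy as np

beta_0 = -0.5                      # intercept (illustrative value)
beta = np.array([1.0, 2.0, -1.0])  # beta_1 ... beta_p (illustrative values)

def classify(x_star):
    """Return +1 or -1 depending on which side of the hyperplane x* falls."""
    f_x = beta_0 + np.dot(beta, x_star)
    return 1 if f_x > 0 else -1

print(classify(np.array([0.3, 0.4, 0.1])))  # prints the predicted class
```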

A simple issue with this approach is that there is an infinite number of hyperplanes that can divide a perfectly separable dataset.

[Figure: many possible hyperplanes dividing the same two-class dataset]

The problem reduces to choosing the best possible hyperplane dividing the observations into two parts. A natural choice is to compute the perpendicular distance of each observation from every candidate hyperplane; the one that produces the maximum margin on both sides is chosen as the hyperplane.

[Figure: the maximal margin hyperplane, with support vectors on the dashed margin lines]

Once we have the hyperplane, it is fairly easy to predict the classes of test observations. The only assumption we are making here is that a hyperplane dividing the observations in the training set will also divide the observations in the test set, which is not always true. Therefore, this model can lead to overfitting when $p$ is large.

We have already noted that the points on the dashed lines are called support vectors, and it has been found that the position of the hyperplane depends only on the support vectors, not on the other observations in the dataset.
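A quick sketch of that last claim, again assuming scikit-learn and toy data: removing an observation that is not a support vector should leave the fitted hyperplane (essentially) unchanged.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(1)
X = np.vstack([rng.randn(30, 2) - 2, rng.randn(30, 2) + 2])
y = np.array([-1] * 30 + [1] * 30)

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Drop the first observation that is NOT a support vector and refit.
not_sv = next(i for i in range(len(X)) if i not in clf.support_)
X2, y2 = np.delete(X, not_sv, axis=0), np.delete(y, not_sv)
clf2 = SVC(kernel="linear", C=1e6).fit(X2, y2)

print(clf.coef_, clf.intercept_)    # hyperplane from the full data
print(clf2.coef_, clf2.intercept_)  # near-identical without the dropped point
```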

This is how we define the Maximal Margin Classifier. However, there are a few issues with it:

  • It doesn't work on datasets where no clear separating hyperplane exists between the classes.
  • Adding even a single observation near the hyperplane can change the hyperplane considerably, making it volatile.

To cope with these disadvantages of the Maximal Margin Classifier, we introduce the concept of a soft margin.

Simply put, we create a hyperplane that almost separates the classes, rather than perfectly separating them. This brings us to the concept of the Support Vector Classifier.

Support Vector Classifier

In the case of the Support Vector Classifier, we allow a few observations to be on the wrong side of the hyperplane. This makes the model more robust to individual observations and helps us classify most of the remaining observations better.

The support vector classifier is also known as a soft margin classifier.

The observations on the wrong side of the hyperplane are obviously misclassified by the model, but tolerating them helps to improve the model's overall accuracy on new data.

[Figure: a soft-margin classifier with some observations on the wrong side of the margin or hyperplane]
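As a rough illustration, assuming scikit-learn and made-up overlapping clusters, a soft-margin fit simply tolerates some training points on the wrong side:

```python
import numpy as np
from sklearn.svm import SVC

# Overlapping clusters: no hyperplane can separate them perfectly.
rng = np.random.RandomState(2)
X = np.vstack([rng.randn(50, 2) - 1, rng.randn(50, 2) + 1])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Some training observations end up on the wrong side of the hyperplane.
print("misclassified training points:", (clf.predict(X) != y).sum())
```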

There is not much difference in the idea behind fitting this model. In the case of the support vector classifier as well, we want to maximize the margin $M$, subject to,

$y_i(\beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + ... + \beta_px_{ip}) \geq M(1 - \epsilon_{i})$

$where\ \epsilon_i \geq 0\ and\ \displaystyle \sum _{i=1}^{n} \epsilon_i \leq C$

For any given observation, the slack variable $\epsilon_i$ tells us where that observation lies relative to the hyperplane and the margin. If the $i$th observation is on the right side of the margin, then $\epsilon_i = 0$. If $\epsilon_i > 0$, the point is on the wrong side of the margin, and if $\epsilon_i > 1$, it is on the wrong side of the hyperplane.

Extending this to the tuning parameter $C$: since $C$ bounds the sum of the $\epsilon_i$, it determines the number and severity of the violations to the margin and the hyperplane that we will tolerate. The value of $C$ is generally chosen by cross-validation, and it controls the bias-variance trade-off of the model. If the value of $C$ is small, we allow fewer observations to be on the wrong side, which gives a closely fitted classifier with low bias and high variance, and vice-versa.

Again, similar to the Maximal Margin Classifier, it turns out that not all observations decide the position of the hyperplane; it depends only on the observations that lie on or inside the margin.
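A sketch of choosing the tuning parameter by cross-validation, assuming scikit-learn. Note that scikit-learn's `C` multiplies the penalty on violations, so it behaves roughly as the inverse of the budget $C$ above; the candidate grid is an arbitrary illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(3)
X = np.vstack([rng.randn(50, 2) - 1, rng.randn(50, 2) + 1])
y = np.array([-1] * 50 + [1] * 50)

search = GridSearchCV(
    SVC(kernel="linear"),
    param_grid={"C": [0.01, 0.1, 1, 10, 100]},  # arbitrary candidate values
    cv=5,                                       # 5-fold cross-validation
)
search.fit(X, y)
print(search.best_params_)  # the C with the best cross-validated accuracy
```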

If we extend these ideas a little, we arrive at Support Vector Machines. Let's discuss them in some detail.

Support Vector Machines

With the Support Vector Machine, we introduce another component called the kernel, which results from enlarging the feature space of the support vector classifier in a specific way.

Following our discussion of the support vector classifier, its equation can be re-written as,

$f(x) = \beta_0 +\displaystyle \sum_{i=1}^n \alpha_i \langle x, x_i \rangle$

$where\ \langle x, x_i \rangle\ is\ the\ inner\ product\ between\ the\ new\ point\ x\ and\ the\ training\ points\ x_i$

The details of how this inner product is computed are hidden on purpose; we can get by without knowing them.

We can directly replace all the instances of the inner product with a general term called the kernel.

$f(x) = \beta_0 +\displaystyle \sum_{i \in S} \alpha_iK(x, x_i)$

where the sum runs over the set $S$ of support vectors, as only the support vectors are responsible for the creation of the hyperplane. For $p$ predictors, the equation of one such kernel is,

$K(x_i, x_{i'}) = (1 +\displaystyle \sum_{j=1}^p x_{ij}x_{i'j} )^d$

which is known as a polynomial kernel of degree $d$. This type of kernel leads to a much more flexible decision boundary.
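A sketch with scikit-learn, assuming `gamma=1` and `coef0=1` so that its polynomial kernel matches the form above; it also reconstructs $f(x)$ by hand from the support vectors to show the kernel sum in action:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data with a non-linear (circular) class boundary.
rng = np.random.RandomState(4)
X = rng.randn(100, 2)
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1, 1, -1)

# With gamma=1 and coef0=1, scikit-learn's kernel is (1 + <x, x'>)^d.
clf = SVC(kernel="poly", degree=3, gamma=1, coef0=1).fit(X, y)

# Recompute f(x) = beta_0 + sum over S of alpha_i * K(x, x_i) by hand.
x_new = np.array([0.5, -0.2])
K = (1 + clf.support_vectors_ @ x_new) ** 3
f = clf.intercept_[0] + clf.dual_coef_[0] @ K

print(f, clf.decision_function([x_new])[0])  # the two values agree
```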

Support Vector Machine for more than two classes

In the discussion of support vector machines so far, we haven't really talked about the case where the number of possible classes is more than two.

We can solve these problems by extending the simple SVM in two ways.

  • One versus One Classification
In this type of classification, we compare the classes pairwise, one pair at a time, and try to classify each incoming vector into one of the two classes under consideration.

Finally, for a given vector, we choose the class to which it was assigned most often across all the pairwise comparisons.

  • One versus All Classification
At each step, we compare one of the $K$ classes to the remaining $K - 1$ classes. Finally, we assign any incoming test vector to the class $k$ that produces the highest value of,

$\beta_{0k} +\beta_{1k}x_1^* +\beta_{2k}x_2^* +...+\beta_{pk}x_p^*$
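Both strategies are available in scikit-learn; a minimal sketch on the classic 3-class iris dataset (`SVC` implements one-versus-one internally, and `OneVsRestClassifier` wraps it into one-versus-all):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # K = 3 classes

ovo = SVC(kernel="linear").fit(X, y)                       # one versus one
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)  # one versus all

print(ovo.predict(X[:5]))
print(ovr.predict(X[:5]))
```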

That's it for this discussion of Support Vector Machines. Feel free to express your thoughts in the comments and share this post with your friends.

