Question

My colleague and I are trying to wrap our heads around the difference between logistic regression and an SVM. Clearly they are optimizing different objective functions. Is an SVM as simple as saying it's a discriminative classifier that simply optimizes the hinge loss? Or is it more complex than that? How do the support vectors come into play? What about the slack variables? And why can't you have deep SVMs the way you can have a deep neural network with sigmoid activation functions?


Solution

They are both discriminative models, yes. The logistic regression loss is conceptually a function of all points: correctly classified points add very little to it, and points add more the closer they are to the boundary (or the more badly they are misclassified). The points near the boundary are therefore the most important to the loss, and so to deciding how good the boundary is.
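To make that concrete, here is a minimal numpy sketch (the margin values are just illustrative) of how much a single point contributes to the logistic loss as a function of its signed margin:

```python
import numpy as np

# Signed margin m = y * f(x): large and positive means the point is far
# on the correct side of the boundary; negative means it is misclassified.
margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0, 5.0])

# Per-point logistic loss log(1 + exp(-m)): every point contributes
# something, but the contribution decays smoothly with distance.
logistic = np.log1p(np.exp(-margins))

for m, loss in zip(margins, logistic):
    print(f"margin {m:+.1f} -> logistic loss {loss:.4f}")
```

Even the point at margin +5 still contributes a little, which is the key contrast with the hinge loss below.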

The SVM uses a hinge loss, which conceptually puts all the emphasis on the boundary points. Any point outside the margin contributes exactly nothing to the loss, because of the "hinge" (the max) in the function. The closest points, the ones that do contribute, are simply the support vectors. Minimizing the loss therefore reduces to picking the boundary that creates the largest margin, i.e. the largest distance to the closest point. The theory is that the boundary cases are all that really matter for generalization.
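A matching sketch (synthetic data; the dataset and C=1.0 are arbitrary choices for illustration) shows both halves of this: the hinge zeroes out away from the boundary, and a fitted SVM keeps only a handful of points as support vectors:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Hinge loss max(0, 1 - m) on the same margins as before: exactly zero
# for any point with margin >= 1, i.e. outside the margin band.
margins = np.array([-2.0, -0.5, 0.0, 0.5, 2.0, 5.0])
print(np.maximum(0.0, 1.0 - margins))   # [3.  1.5 1.  0.5 0.  0. ]

# Fit a linear SVM on easily separable synthetic data; only the few
# points nearest the boundary are retained as support vectors.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print(f"{len(clf.support_)} support vectors out of {len(X)} points")
```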

One downside is that the hinge loss is not differentiable at its kink, but that just means it takes a bit more math to discover how to optimize it; the classical route restates the problem as a constrained quadratic program and solves it via Lagrange multipliers. The plain hard-margin formulation also doesn't handle the case where the data isn't linearly separable. Slack variables are a trick that lets this possibility be incorporated cleanly into the optimization problem.
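For reference, the standard soft-margin primal makes this explicit: each slack variable $\xi_i$ measures how far point $i$ may violate the margin, and $C$ prices those violations:

```latex
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \quad i = 1, \dots, n
```

A large C approaches hard-margin behaviour; a small C tolerates more margin violations.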

You can use hinge loss with "deep learning", e.g. Tang, "Deep Learning using Linear Support Vector Machines": http://arxiv.org/pdf/1306.0239.pdf
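That paper trains deep networks with an SVM-style objective (a squared hinge, i.e. L2-SVM) at the top in place of softmax cross-entropy. A minimal PyTorch sketch of the same idea, on placeholder random data (the architecture and sizes here are arbitrary):

```python
import torch
import torch.nn as nn

# A small MLP whose raw output scores are trained with a multi-class
# (squared) hinge loss instead of the usual softmax cross-entropy.
model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 3),            # raw class scores, no softmax
)
criterion = nn.MultiMarginLoss(p=2, margin=1.0)   # multi-class squared hinge
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Placeholder data: 128 examples, 20 features, 3 classes.
X = torch.randn(128, 20)
y = torch.randint(0, 3, (128,))

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()              # autograd uses a subgradient at the kink
    optimizer.step()
```

The hinge's non-differentiability is no obstacle here either: autograd just takes a subgradient, for the same reason plain SGD works fine on ReLU networks.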
