Geometric Interpretation of Whether SVMs are performing well or not

https://datascience.stackexchange.com/questions/10457

16-10-2019
|

문제

I came across this research paper which contained this figure

which talks about the center of mass (presumably, of the training dataset's datapoints?) and represents the solution of an SVM as polygon (or is it a point?). I'm having trouble understanding this figure, and since it seems to provide a geometric interpretation of when an SVM performs poorly or not, I'm interested in it. Any thoughts?

해결책

A most interesting paper! Following Opper and Haussler^$\dagger$, the authors define a version space; the set of unit vectors which separate the training samples:

$\mathcal V \equiv \{ \mathtt w | y_i f(\mathtt x_i) >0 , i= 1, \ldots, n, \| \mathtt w \|_2 = 1 \}$

Remember that we are dealing with a classification problem where $y_i \in \{-1, +1\}$, so we want our classifier $f(x_i)$ to have the same sign as $y_i$. It might help to recall that $\mathtt w = \sum_i \alpha_i \phi(\mathtt x_i)$ and $f(\mathtt x) = \left< \mathtt w, \phi(\mathtt x) \right> = \sum_i \alpha_i k(\mathtt x_i, \mathtt x)$

Normally $y_i \left( \left< \mathtt w , \phi(\mathtt x_i) \right> + b \right) \ge 1$. What they've done is to set $b=0$ and eliminate the margin (RHS). The length constraint is to ensure uniqueness.

The version space is illustrated as a region on the sphere as shown in Figs. 5 and 6. If the version space is shaped as in Fig. 5, the SVM solution is near to the optimal point. However, if it has an elongated shape as in Fig. 6, the SVM solution is far from the optimal one.

The reason there is an inscribed sphere is that:

the SVM solution coincides with the Tchebycheff-center of the version space, which is the center of the largest sphere contained in $\mathcal V$. However, the theoretical optimal point in version space yielding a Bayes-optimal decision boundary is the Bayes point, which is known to be closely approximated by the center of mass.

In other words, they are saying that the SVM classifier is not always Bayes optimal. Please see the references for proof. The hypersphere comes about when you consider the margin $\gamma \equiv \min y_i f(\mathtt x_i)$. The support vectors are then the boundaries of the version space which are tangent to the hypersphere.

A cited paper* follows this idea to develop an algorithm to actually estimate the center of mass "by averaging over the trajectory of a billiard ball bouncing in version space." If you're interested in diagnosing SVM failures, perhaps it would help to read that paper too, since it claims to offer a better algorithm instead.

$\dagger$ M. Opper and D. Haussler, “Generalization performance of Bayes optimal classification algorithm for learning a perceptron,” Phys. Rev. Lett., vol. 66, p. 2677, 1991.

* T. Graepel, R. Herbrich, and C. Campbell, “Bayes point machines: Estimating the bayes point in kernel space,” in Proc. IJCAI Workshop Support Vector Machines, 1999, pp. 23–27

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 datascience.stackexchange