Question

I am exploring the SVM topic using SVM-light by Thorsten Joachims.

Now according to some introduction papers:

"The VC-dimension of the set of oriented hyperplanes in Rn is n+1 [...]"

"When C = inf, the optimal hyperplane will be the one that completely separates the data (assuming one exists) [...]"

I have prepared a two-dimensional, linearly separable dataset and wanted to see the 2-D hard-margin classifier that we know from so many illustrations.

So I chose the following parameters:

  • polynomial kernel (a*b+c)^d with d = 2
  • C = 999 (so as to approach inf)

I get 3 support vectors, which is fine, but the estimated VC-dimension is more than 10,000.

Now I wonder: is it possible to have such a high VC dimension if the kernel is only two-dimensional?


Solution

You seem to be confusing a few things here:

  1. The polynomial kernel is not a "2-dimensional kernel": a degree-d polynomial kernel on m-dimensional input maps into a feature space of roughly O(m^d) dimensions (see the sketch below this list).
  2. The empirical VC dimension is not the true VC dimension. The true VC dimension is an analytical object that cannot be computed directly "from the data"; it requires rigorous proofs, and one of the few existing ones says that for an n-dimensional space the VC dimension of linear classifiers is n+1, no matter how you "get" to that space.
  3. The number of support vectors is related to the generalization ability of the model, and so is the VC dimension. Unfortunately, there is no simple one-to-one dependency between the number of support vectors and the VC dimension. In fact, as far as I know, there is no known proof pinning down the VC dimension of the SVM model (we know that it is a gap-tolerant classifier, which should have a lower VC dimension, but that is far from a proof of the dimension).
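
To make point 1 concrete, here is a minimal NumPy sketch (my own illustration, not part of SVM-light or the original answer) showing that the degree-2 polynomial kernel (a*b+c)^2 on 2-dimensional inputs is an inner product in a 6-dimensional feature space:

    # Sketch: the degree-2 polynomial kernel k(a, b) = (a.b + c)^2 on 2-D
    # inputs is an inner product in a 6-dimensional feature space.
    import numpy as np

    def poly2_kernel(a, b, c=1.0):
        return (np.dot(a, b) + c) ** 2

    def poly2_features(x, c=1.0):
        # explicit feature map phi(x) with phi(a).phi(b) == k(a, b)
        x1, x2 = x
        return np.array([x1**2, x2**2,
                         np.sqrt(2) * x1 * x2,
                         np.sqrt(2 * c) * x1,
                         np.sqrt(2 * c) * x2,
                         c])

    a = np.array([0.3, -1.2])
    b = np.array([2.0, 0.5])
    print(poly2_kernel(a, b))                            # kernel value
    print(np.dot(poly2_features(a), poly2_features(b)))  # same value, in 6-D

Both printed values coincide, which is exactly the kernel trick: the classifier is linear in that 6-dimensional feature space, not in the 2-dimensional input space.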

OTHER TIPS

The VC dimension does not map to the number of support vectors of a given solution.

The VC dimension is the maximum number of data points that the model can shatter, i.e. classify correctly under every possible assignment of labels to those points. Support vectors, on the other hand, are the points that define the hyperplane.
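
As an illustration of shattering (my own sketch, using scikit-learn rather than SVM-light, with a large C as a stand-in for a hard margin): lines in R^2 can realize every labelling of 3 points in general position, but cannot shatter 4 points, which is the n+1 = 3 statement from the quoted paper.

    # Sketch with scikit-learn (an assumption of mine, not SVM-light): lines in
    # R^2 shatter 3 points in general position, but cannot shatter 4 points.
    from itertools import product
    import numpy as np
    from sklearn.svm import SVC

    def can_shatter(points):
        points = np.asarray(points, dtype=float)
        for labels in product([-1, 1], repeat=len(points)):
            if len(set(labels)) < 2:            # one-class labelings are trivial
                continue
            clf = SVC(kernel="linear", C=1e6)   # very large C ~ hard margin
            clf.fit(points, labels)
            if clf.score(points, labels) < 1.0: # this labeling cannot be realized
                return False
        return True

    print(can_shatter([(0, 0), (1, 0), (0, 1)]))          # True  -> VC dim >= 3
    print(can_shatter([(0, 0), (1, 0), (0, 1), (1, 1)]))  # False (XOR labeling fails)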

EDIT:

I am extending my answer a bit based on your comment.

First, when you say this:

"When C = inf, the optimal hyperplane will be the one that completely separates the data (assuming one exists) [...]"

That does not mean there is a direct relation between C and the VC dimension (as you suggested when you said that C = 999 would produce a VC dimension of 10,000). What it does mean is that with C = inf you enforce all constraints, hence producing a hard-margin model (a hyperplane that completely separates the data).
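
For intuition, here is a rough scikit-learn analogue of the setup described in the question; the toy dataset and the exact parameters below are my own assumptions, and SVM-light's command-line options differ.

    # Rough scikit-learn analogue of the setup in the question; the toy
    # dataset and exact parameters here are my own assumptions.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal([ 2,  2], 0.3, (20, 2)),    # positive class
                   rng.normal([-2, -2], 0.3, (20, 2))])   # negative class
    y = np.array([1] * 20 + [-1] * 20)

    # degree-2 polynomial kernel (a.b + coef0)^2 with a very large C,
    # approximating the hard-margin case
    clf = SVC(kernel="poly", degree=2, coef0=1.0, C=999).fit(X, y)
    print(clf.n_support_)   # small number of support vectors per class
    print(clf.score(X, y))  # 1.0 on separable data: all constraints hold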

"whether the fact that the VC-dimension of the set of oriented hyperplanes in R^n is n+1 still holds when applying a polynomial kernel that maps into 2 dimensions"

That would be true in the feature space, but keep in mind that the decision boundary in the input space will no longer be a hyperplane; in fact, it will be non-linear.

"polynomial Kernel (a*b+c)^d with d = 2" ... " if the kernel is only two dimensional?"

That kernel is not two-dimensional; the dimensionality of the space it maps into depends on the other parameters (the input dimension as well as the degree d).
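
Concretely, the inhomogeneous degree-d polynomial kernel on m-dimensional inputs corresponds to a feature space of dimension binomial(m+d, d), so the "dimensionality of the kernel" grows with the input dimension m as well as with d. A quick check (my own sketch):

    # Feature-space dimension of the inhomogeneous polynomial kernel
    # (a.b + c)^d on m-dimensional inputs: binomial(m + d, d).
    from math import comb

    for m in (2, 10, 100):
        print(m, comb(m + 2, 2))   # d = 2 gives 6, 66 and 5151 dimensions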

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow