It seems like it's doing the equivalent of libsvm's predict, but why does it change the sign of the decision values, if it's the equivalent of ?
These are just implementation hacks concerning the internal representation of class signs. Nothing to truly worry about.
sklearn's `decision_function` is the value of the inner product between the SVM's hyperplane `w` and your data `x` (possibly in the kernel-induced space), so you can use it, shift it, or analyze it. Its interpretation, however, is very abstract: in the case of the RBF kernel it is simply the integral of the product of a normal distribution centered at `x`, with variance equal to `1/(2*gamma)`, and the weighted sum of normal distributions centered at the support vectors (with the same variance), where the weights are the `alpha` coefficients.
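As a minimal sketch of what this means in practice (my own illustration, not part of the original answer), you can reconstruct `decision_function` for a binary RBF `SVC` by hand from its fitted attributes: it is the kernel-weighted sum over support vectors plus the intercept, where `dual_coef_` holds the signed `alpha` coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy binary problem; gamma set explicitly so we can reuse it below.
X, y = make_classification(n_samples=100, random_state=0)
clf = SVC(kernel="rbf", gamma=0.1).fit(X, y)

def manual_decision(x):
    # RBF kernel between x and each support vector:
    # K(sv, x) = exp(-gamma * ||sv - x||^2)
    k = np.exp(-clf.gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
    # dual_coef_ stores y_i * alpha_i for each support vector
    return clf.dual_coef_ @ k + clf.intercept_

# Matches sklearn's own decision_function for the same point
print(np.allclose(manual_decision(X[0]), clf.decision_function(X[:1])))
```

This only works this directly for the binary case; multiclass `SVC` combines several such one-vs-one values.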
Also, is there any way to calculate a confidence value for an SVM decision using this value or any prediction?
Platt's scaling is used not because there is some "lobby" forcing us to; it is simply the "correct" way of estimating an SVM's confidence. However, if you are not interested in confidence in the "probability sense", but rather in any value that lets you compare points qualitatively (which point is more confident), then the decision function can be used for that. It is roughly the distance between the point's image in kernel space and the separating hyperplane (up to a normalizing constant, the norm of `w`). So it is true that

    abs(decision_function(x1)) < abs(decision_function(x2))

means `x1` is less confident than `x2`.
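To make the two options concrete, here is a short sketch (assumed setup, not from the original answer): ranking points by `abs(decision_function)` for a qualitative confidence ordering, and enabling `probability=True` so sklearn fits Platt scaling internally and exposes `predict_proba`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
# probability=True makes sklearn fit Platt's scaling after training
clf = SVC(kernel="rbf", probability=True, random_state=0).fit(X, y)

scores = clf.decision_function(X[:5])   # signed distances (unnormalized)
probas = clf.predict_proba(X[:5])       # Platt-scaled probabilities

# Larger |decision_function| => qualitatively more confident prediction
order = np.argsort(-np.abs(scores))
for i in order:
    print(f"|score|={abs(scores[i]):.3f}  P(class 1)={probas[i, 1]:.3f}")
```

Note that Platt's probabilities come from a separate cross-validated sigmoid fit, so their ordering can occasionally disagree slightly with the raw decision values.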
In short: the bigger the `decision_function` value, the "deeper" the point lies on its side of the hyperplane.