First, the sigmoid function is rarely a valid kernel. In fact, for almost no parameter values is it known to induce a valid kernel (in Mercer's sense).
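To make this concrete, here is a minimal sketch that builds a sigmoid Gram matrix and checks its eigenvalues. It assumes the common form tanh(gamma * <x, y> + coef0); the two data points and the parameter values are made up for illustration. A Mercer kernel must give a positive semi-definite Gram matrix, so a single negative eigenvalue is enough to rule it out for that parameter choice.

```python
import numpy as np

def sigmoid_kernel(X, gamma=1.0, coef0=0.0):
    """Sigmoid 'kernel' tanh(gamma * <x, y> + coef0) for all pairs of rows in X."""
    return np.tanh(gamma * (X @ X.T) + coef0)

# Two made-up one-dimensional points (illustrative only).
X = np.array([[0.5], [0.1]])

# With a negative coef0, even k(x, x) is negative here,
# which a positive semi-definite Gram matrix cannot have.
K = sigmoid_kernel(X, gamma=1.0, coef0=-1.0)

eigvals = np.linalg.eigvalsh(K)  # K is symmetric, so eigvalsh applies
print("Gram matrix:\n", K)
print("eigenvalues:", eigvals)
print("positive semi-definite:", bool(eigvals.min() >= 0))
```

This only shows one failing parameter choice; it does not say anything about which settings, if any, do give a valid kernel.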
Second, coef0 is not an intercept term; it is a parameter of the kernel projection, which can be used to overcome one of the important issues with the polynomial kernel. In general, coef0=0 should work just fine, but the polynomial kernel has one issue: as the degree p -> inf, it increasingly separates pairs of points for which <x,y> is smaller than 1 from pairs <a,b> for which it is bigger than 1. This is because powers of values smaller than one get closer and closer to 0, while the same powers of values bigger than one grow to infinity. You can use coef0 to "scale" your data so that there is no such distinction: you can add 1 - min <x,y>, so that no values are smaller than 1. If you really feel the need to tune this parameter, I would suggest searching in the range [min(1 - min <x,y>, 0), max <x,y>], where the min and max are computed over the whole training set.
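As a rough illustration of the effect described above, the sketch below compares a polynomial kernel with coef0=0 against one shifted by 1 - min <x,y>. It assumes the kernel has the form (<x, y> + coef0) ** degree with gamma fixed to 1; the toy data and the helper name poly_kernel are hypothetical.

```python
import numpy as np

def poly_kernel(X, degree, coef0=0.0):
    """Polynomial kernel (<x, y> + coef0) ** degree, with gamma assumed to be 1."""
    return (X @ X.T + coef0) ** degree

# Made-up training set (illustrative only).
X = np.array([[0.5, 0.5],
              [1.0, 1.0],
              [2.0, 0.0]])

G = X @ X.T              # pairwise inner products <x, y>
shift = 1.0 - G.min()    # the "1 - min <x, y>" shift suggested above

for degree in (1, 3, 10):
    K0 = poly_kernel(X, degree, coef0=0.0)
    K1 = poly_kernel(X, degree, coef0=shift)
    # Without the shift the smallest entry collapses toward 0 as degree grows;
    # with the shift the smallest entry stays at 1.
    print(f"degree={degree:2d}  coef0=0: min={K0.min():.6f}  "
          f"coef0={shift}: min={K1.min():.6f}")

# Candidate coef0 values over the suggested search range.
coef0_grid = np.linspace(min(shift, 0.0), G.max(), 5)
print("coef0 grid:", coef0_grid)
```

The grid at the end is just one way to sample the suggested range; in practice the candidate values would be passed to whatever model selection routine you already use.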