Question

I am dealing with a text classification problem (sentiment analysis). I would like to know if there is any option in scikit-learn to add a "weight" (as a measure of importance) to a feature. I checked the documentation and found the `coef_` attribute of SVC, described below:

    coef_ : array, shape = [n_class-1, n_features]
        Weights assigned to the features (coefficients in the primal problem).
        This is only available in the case of a linear kernel. coef_ is a
        readonly property derived from dual_coef_ and support_vectors_.

However, this attribute seems to be read-only.
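
For context, here is a minimal sketch (with made-up toy data, purely for illustration) of how `coef_` is typically read after fitting a linear-kernel SVC; it is a learned attribute that you inspect rather than assign to:

    from sklearn.svm import SVC

    # Toy, linearly separable data for illustration only.
    X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
    y = [0, 0, 1, 1]

    clf = SVC(kernel="linear")
    clf.fit(X, y)

    # For a binary problem this has shape (1, n_features) and is read-only.
    print(clf.coef_)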


Solution

The coef_ vector is a view on the parameters learned by the machine learning algorithm. It does not make sense to set it by hand, as it is tuned automatically from the data during fitting. What you can do instead is:

  • set class_weight if you have prior knowledge about some classes being more important than others (see the sketch after this list)

  • set sample_weight if you have prior knowledge about some samples (rows in the dataset) being more important than others

  • rescale the features to give some of them more variance than others, for instance if you use an RBF kernel and would like to make some features more important than others (usually it's best to scale all features to unit variance, though)

  • use a custom precomputed kernel if you use kernels and want to encode special prior knowledge this way.
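
As a short sketch of the first two options (toy data and weight values invented for illustration), class_weight weights whole classes while sample_weight weights individual rows at fit time; rescaling a column is a crude way to act on the third point:

    import numpy as np
    from sklearn.svm import SVC

    # Toy data for illustration only.
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([0, 0, 1, 1])

    # Option 1: penalize mistakes on class 1 five times harder than on class 0.
    clf = SVC(kernel="linear", class_weight={0: 1.0, 1: 5.0})
    clf.fit(X, y)

    # Option 2: weight individual samples when fitting.
    clf = SVC(kernel="linear")
    clf.fit(X, y, sample_weight=np.array([1.0, 1.0, 1.0, 10.0]))

    # Option 3 (crude): inflate one feature's scale so it matters more
    # under an RBF kernel.
    X_rescaled = X.copy()
    X_rescaled[:, 0] *= 3.0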

For text classification, the data is high-dimensional and a kernel usually just wastes resources for little or no added predictive accuracy, so the last two points are probably not relevant to your specific problem.
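
A hedged sketch of what that usually looks like in practice (the tiny corpus and labels below are invented, and get_feature_names_out assumes a reasonably recent scikit-learn version): sparse TF-IDF features with a linear model, where class_weight is still available and the learned, read-only coef_ can be inspected per token:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Invented mini-corpus for illustration only.
    docs = [
        "great movie, loved it",
        "terrible plot, awful acting",
        "loved the acting",
        "awful, terrible movie",
    ]
    labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

    model = make_pipeline(TfidfVectorizer(), LinearSVC(class_weight="balanced"))
    model.fit(docs, labels)

    # The learned weight of each vocabulary term.
    vec = model.named_steps["tfidfvectorizer"]
    svc = model.named_steps["linearsvc"]
    for term, weight in zip(vec.get_feature_names_out(), svc.coef_[0]):
        print(term, round(float(weight), 3))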

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow