Why does an SVM grid search take so long?
Question
I have a dataset of 5K records and 60 features for binary classification. Please find my code for SVM parameter tuning below. It runs much longer than XGBoost, LR, and RF: those algorithms returned results within 10-15 minutes, whereas the SVM has been running for more than 45 minutes.
Questions
1) Is SVM usually slower to train than these algorithms?
2) Is there any issue with my code below?
3) How can I make the grid search faster?
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['linear', 'rbf', 'poly'],
              'class_weight': ['balanced']}
svm = SVC()
svm_cv = GridSearchCV(svm, param_grid, cv=5)
svm_cv.fit(X_train_std, y_train)
Solution
Simply put, the optimization problem behind an SVM is quadratic, so training cost grows quickly with the number of samples. The scikit-learn documentation for SVC says so directly:
"The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples."
OTHER TIPS
1) I will cite Noah Weber's answer:
"The fit time scales at least quadratically with the number of samples and may be impractical beyond tens of thousands of samples."
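2) There is nothing wrong with your code, but the search is exhaustive: every one of the 7 * 5 * 3 * 1 = 105 parameter combinations is fitted once per fold, so the total is
7 * 5 * 3 * 1 * 5 (folds) = 525
fits, which is a lot.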
3) To speed up the search you can tune on a subsample or simply reduce the search space. I normally use the following function:
def fit_cv_subsample(pipe_cv, X, y, n_max=10_000):
    '''
    Fit a CV search on a subsample made of the first n_max rows.
    Returns the fitted search object and its best estimator.
    '''
    # Note: this takes the first rows, not a random sample.
    X_sub = X[0:n_max]
    y_sub = y[0:n_max]
    pipe_cv.fit(X_sub, y_sub)
    # Optionally refit the best estimator on the full data:
    # pipe_cv.best_estimator_.fit(X, y)
    return pipe_cv, pipe_cv.best_estimator_

# n_max only helps if it is smaller than len(X) (5K rows in the question).
results, best_model = fit_cv_subsample(svm_cv, X, y)
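Both ideas from point 3 (a smaller search space and a subsample) can be combined. The following is a minimal sketch, not a tuned recipe: the synthetic data from `make_classification`, the reduced grid values, and the subsample size of 1,000 are all assumptions for illustration. It also uses the fact that `gamma` is ignored by the linear kernel, so giving each kernel its own sub-grid avoids fitting identical linear models several times. `n_jobs=-1` asks `GridSearchCV` to run the fits in parallel.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the 5K x 60 dataset in the question (assumption).
X, y = make_classification(n_samples=5000, n_features=60, random_state=0)

# gamma has no effect on the linear kernel, so each kernel gets its own
# sub-grid. The values are a deliberately small, illustrative selection.
param_grid = [
    {"kernel": ["linear"], "C": [0.01, 0.1, 1]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10, 100], "gamma": [0.1, 0.01, 0.001]},
]

# Count the candidates before launching the search: 3 + 4 * 3 = 15.
n_candidates = len(ParameterGrid(param_grid))
print(f"{n_candidates} candidates x 5 folds = {n_candidates * 5} fits")

# Random, class-stratified subsample instead of the first n rows.
X_sub, _, y_sub, _ = train_test_split(
    X, y, train_size=1000, stratify=y, random_state=0
)

search = GridSearchCV(
    SVC(class_weight="balanced"),
    param_grid,
    cv=5,
    n_jobs=-1,  # run the fits in parallel on all available cores
)
search.fit(X_sub, y_sub)
print(search.best_params_)

# The search only saw the subsample, so fit a fresh model with the
# selected parameters on the full data.
best_model = SVC(class_weight="balanced", **search.best_params_).fit(X, y)
```

On the real data, `X` and `y` would be `X_train_std` and `y_train` from the question. Parameters chosen on a subsample may not be the best ones for the full data, so it is worth checking the final model on a held-out set.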