Question

I would like to know how sklearn's LassoCV performs cross-validation. In particular, how are the samples divided into folds? Is it a random or a deterministic process?

For example, suppose I have 100 samples and use 10-fold cross-validation, and let F be the function that sends each sample to its fold.

Is it F(1:10)=1, F(11:20)=2, ..., or is it a random process (for example F(1)=8, F(2)=7, ...)?

Let me know if the question is not clear.

Thanks :)

Ok this is the solution:

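from sklearn.linear_model import LassoCV
# Older module path; in scikit-learn 0.18 and later KFold lives in sklearn.model_selection
from sklearn.cross_validation import KFold

# x, y: feature matrix and target vector
kf = KFold(len(y), n_folds=10, shuffle=True)  # shuffle=True randomises the fold assignment
cv = LassoCV(cv=kf).fit(x, y)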

Answer

I assume you're passing in the keyword arg cv=10 to the LassoCV constructor?

If this is the case, then this will create a KFold object with 10 folds: take a look at where check_cv is called in LinearModelCV (LassoCV's parent).
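A quick way to see what an integer cv turns into is to call check_cv directly. This is only a sketch, and it assumes a recent scikit-learn release where check_cv and KFold are exposed from sklearn.model_selection (not the older sklearn.cross_validation module linked in the sources):

```python
from sklearn.model_selection import KFold, check_cv

# For a regressor such as LassoCV, an integer cv is resolved to a plain KFold.
resolved = check_cv(10)

is_kfold = isinstance(resolved, KFold)
n_splits = resolved.get_n_splits()
shuffles = resolved.shuffle  # False unless a shuffling splitter is passed explicitly

print(resolved)
```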

KFold takes a random_state keyword argument, which defaults to None (so numpy.random will seed itself from /dev/urandom or something similar). However, if shuffle is False, which it is by default, random_state has no effect: the folds are built from adjacent samples in the data set, so the split is deterministic.
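In terms of the F function from the question, the default behaviour can be checked with a small sketch. It assumes a recent scikit-learn release (sklearn.model_selection, with n_splits in place of the older n_folds) and uses 100 dummy samples; the name fold_of is only illustrative:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(100).reshape(-1, 1)  # 100 dummy samples, one feature

# fold_of[i] is the index of the fold in which sample i is held out (F in the question,
# but 0-based).
fold_of = np.empty(100, dtype=int)
for fold, (_, test_idx) in enumerate(KFold(n_splits=10).split(X)):
    fold_of[test_idx] = fold

# With shuffle=False each fold is a contiguous block of 10 samples.
print(fold_of[:10], fold_of[10:20])
```

So with the default settings the first ten samples form the first fold, the next ten the second, and so on.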

If you want to randomise the folds, you should create a KFold object with shuffle=True, and use that object as the cv keyword argument, instead of 10.
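A minimal sketch of that approach follows, assuming a recent scikit-learn release (KFold imported from sklearn.model_selection, n_splits instead of n_folds). The data comes from make_regression purely for illustration, and random_state=0 is an arbitrary choice that makes the shuffled split reproducible:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold

# Synthetic stand-in for the real data: 100 samples, 20 features.
X, y = make_regression(n_samples=100, n_features=20, noise=1.0, random_state=0)

# shuffle=True assigns samples to folds at random; fixing random_state
# makes that random assignment repeatable between runs.
kf = KFold(n_splits=10, shuffle=True, random_state=0)

model = LassoCV(cv=kf).fit(X, y)
print(model.alpha_)           # regularisation strength chosen by cross-validation
print(model.mse_path_.shape)  # one column of errors per fold
```

Leaving random_state at None instead gives a different shuffle on each run.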

Sources:

  1. https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/coordinate_descent.py
  2. https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cross_validation.py
License: CC-BY-SA with attribution