Found input variables with inconsistent numbers of samples

https://datascience.stackexchange.com/questions/16658

16-10-2019

Question

How can I resolve the following error? My code:

import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

X = np.array(pd.read_csv('my_X_table1-1c.csv',header=None).values)
y = np.array(pd.read_csv('my_y_table1-1c.csv',header=None).values.ravel())
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

def Ridgecv(alpha):
    return cross_val_score(Ridge(alpha=float(alpha), random_state=2),
                           X_train, y_train, 'mae', cv=5).mean()

The error is related to X_train, y_train:

ValueError: Found input variables with inconsistent numbers of samples: [1052, 1052, 3]

regards,

Solution

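The "scoring" keyword was missing. Because 'mae' was passed positionally, cross_val_score did not treat it as the scoring method; it took it as the fourth positional argument (groups) and checked its length against X_train and y_train. The extra 3 in the error message is simply the number of characters in 'mae'. Passing the metric by keyword fixes this. Note that 'mae' is not itself one of scikit-learn's scoring names; the mean absolute error scorer is called 'neg_mean_absolute_error'.

def Ridgecv(alpha):
    return cross_val_score(Ridge(alpha=float(alpha), random_state=2),
                           X_train, y_train, scoring='neg_mean_absolute_error', cv=5).mean()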
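To try the fix without the original CSV files, here is a minimal sketch on synthetic data. The random data, its shape, the coefficients, and the name ridge_cv are assumptions made for illustration and are not from the question. It also shows where the stray 3 comes from: a string has a length, so it can be counted as if it held three samples.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the CSV data (assumption: sizes are illustrative only).
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.5, -2.0, 0.5, 3.0]) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=7
)

def ridge_cv(alpha):
    """Mean 5-fold cross-validated score of a Ridge model with the given alpha."""
    scores = cross_val_score(
        Ridge(alpha=float(alpha)),
        X_train,
        y_train,
        scoring="neg_mean_absolute_error",  # passed by keyword, not positionally
        cv=5,
    )
    return scores.mean()

# The lengths that a positional 'mae' would have been compared against.
sample_counts = [len(X_train), len(y_train), len("mae")]
mean_score = ridge_cv(1.0)
print(sample_counts, mean_score)
```

The score is a negated error, so values closer to zero are better.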

Other tips

The calls should be made in this order. Split the data first:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, Y, random_state=101, test_size=0.3)

Then pass only the training split to the model's fit method: fit(X_train, y_train).
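As a rough illustration of that order, the sketch below splits first and then fits on the training portion only. The synthetic data and the choice of Ridge as the model are assumptions; the tip itself does not name a model.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Synthetic data standing in for X and Y (assumption: illustrative only).
rng = np.random.default_rng(101)
X = rng.normal(size=(100, 3))
Y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Step 1: split, keeping features and targets aligned row by row.
X_train, X_test, y_train, y_test = train_test_split(
    X, Y, random_state=101, test_size=0.3
)

# Step 2: fit on the training split; both arrays must have the same number of rows.
model = Ridge(alpha=1.0)
model.fit(X_train, y_train)

# Step 3: evaluate on the held-out split (R^2 for regressors).
test_r2 = model.score(X_test, y_test)
print(len(X_train), len(X_test), test_r2)
```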

License: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange