How does the fit function in SimpleImputer find the mean of the Salary column as well when only the Age column is passed as its argument?
12-12-2020
Question
The only argument passed to the fit function of SimpleImputer is the 'Age' column. Yet the output shows that the 'Salary' column was imputed as well. That is what I am unable to understand.
Here is my code (assuming all the necessary libraries are imported):
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan)
imputer = imputer.fit(df[['Age']])
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
print(X)
Dataset:
Country Age Salary Purchased
0 France 44.0 72000.0 No
1 Spain 27.0 48000.0 Yes
2 Germany 30.0 54000.0 No
3 Spain 38.0 61000.0 No
4 Germany 40.0 NaN Yes
5 France 35.0 58000.0 Yes
6 Spain NaN 52000.0 No
7 France 48.0 79000.0 Yes
8 Germany 50.0 83000.0 No
9 France 37.0 67000.0 Yes
Output:
[['France' 44.0 72000.0]
['Spain' 27.0 48000.0]
['Germany' 30.0 54000.0]
['Spain' 38.0 61000.0]
['Germany' 40.0 63777.77777777778]
['France' 35.0 58000.0]
['Spain' 38.77777777777778 52000.0]
['France' 48.0 79000.0]
['Germany' 50.0 83000.0]
['France' 37.0 67000.0]]
Solution
imputer = imputer.fit(df[['Age']])
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
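You are applying "fit" a second time by calling fit_transform. fit_transform refits the imputer on whatever it is given, so the earlier fit on 'Age' alone is discarded and the means are recomputed from both columns in X[:, 1:3]. That is why 'Salary' is imputed as well. Try with "transform" only.
Because the imputer was then fit on a different number of columns than you pass to transform, you will get a ValueError such as: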
ValueError: X has 3 features per sample, expected 2
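To make the difference concrete, here is a minimal sketch that rebuilds only the Age and Salary columns from the question as a float array (an assumption, since the question does not show how X was created). It fits on Age alone, shows that transform on both columns fails, and then shows that fit_transform silently refits on both columns. The exact wording of the error message depends on the scikit-learn version.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Age and Salary columns copied from the question's dataset
# (assumption: X in the question was built from these columns).
X = np.array([
    [44.0, 72000.0],
    [27.0, 48000.0],
    [30.0, 54000.0],
    [38.0, 61000.0],
    [40.0, np.nan],
    [35.0, 58000.0],
    [np.nan, 52000.0],
    [48.0, 79000.0],
    [50.0, 83000.0],
    [37.0, 67000.0],
])

imputer = SimpleImputer(missing_values=np.nan)  # default strategy is "mean"

# Fit on the Age column only: a single mean is learned.
imputer.fit(X[:, [0]])
stats_after_fit = imputer.statistics_.copy()
print("after fit on Age:", stats_after_fit)

# transform on two columns does not match the one-column fit.
try:
    imputer.transform(X)
    transform_error = None
except ValueError as exc:
    transform_error = exc
    print("transform failed:", exc)

# fit_transform refits on both columns, replacing the earlier fit.
filled = imputer.fit_transform(X)
stats_after_refit = imputer.statistics_.copy()
print("after fit_transform:", stats_after_refit)
print(filled)
```

If only Age should be imputed, one option is to call fit_transform (or fit followed by transform) on the Age column alone and assign the result back to that column only.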
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange