How does the fit function in SimpleImputer find the mean of the Salary column as well when only the Age column is passed as its argument?

datascience.stackexchange https://datascience.stackexchange.com/questions/77653

  •  12-12-2020

Question

The only argument I pass to the fit function of SimpleImputer is the 'Age' column. Yet the returned output has the 'Salary' column imputed as well. That is what I am unable to understand.

Here is my code (assuming all the necessary libraries are imported):

from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan)
imputer = imputer.fit(df[['Age']])    
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])
print(X)

Dataset:

   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes

Output:

[['France' 44.0 72000.0]
 ['Spain' 27.0 48000.0]
 ['Germany' 30.0 54000.0]
 ['Spain' 38.0 61000.0]
 ['Germany' 40.0 63777.77777777778]
 ['France' 35.0 58000.0]
 ['Spain' 38.77777777777778 52000.0]
 ['France' 48.0 79000.0]
 ['Germany' 50.0 83000.0]
 ['France' 37.0 67000.0]]

Solution

imputer = imputer.fit(df[['Age']])    
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

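The second line calls fit_transform, which fits the imputer again, this time on both columns of X[:, 1:3], so the earlier fit on 'Age' alone is overwritten. Try it with transform only.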

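You will then get a ValueError reporting a feature-count mismatch (the exact numbers depend on the columns you pass), along the lines of: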

ValueError: X has 3 features per sample, expected 2
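The behaviour can be reproduced with a minimal sketch. It assumes the Age and Salary columns from the question are typed in by hand as a NumPy array (the names X_num, age_only_stats and both_stats are only illustrative), and it relies on the default strategy of SimpleImputer being the mean:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Age and Salary columns copied from the question's dataset (assumed layout).
X_num = np.array([
    [44.0, 72000.0],
    [27.0, 48000.0],
    [30.0, 54000.0],
    [38.0, 61000.0],
    [40.0, np.nan],
    [35.0, 58000.0],
    [np.nan, 52000.0],
    [48.0, 79000.0],
    [50.0, 83000.0],
    [37.0, 67000.0],
])

imputer = SimpleImputer(missing_values=np.nan)  # strategy defaults to "mean"

# Fit on Age only: a single mean is learned.
imputer.fit(X_num[:, [0]])
age_only_stats = imputer.statistics_.copy()

# transform() on two columns fails, because the imputer was fitted on one.
try:
    imputer.transform(X_num)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# fit_transform() refits on both columns, discarding the Age-only fit,
# which is why Salary gets imputed as well.
X_filled = imputer.fit_transform(X_num)
both_stats = imputer.statistics_

print(age_only_stats.shape, mismatch_raised, both_stats.shape)
```

If only Age should be imputed, one option is to call fit and transform on the same single column; if both columns should be imputed, a single fit_transform on both is enough and the earlier fit call can be dropped.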

Licensed under: CC-BY-SA with attribution