Problem

I am working on the Boston house price prediction problem. I have a column named GarageYrBlt that holds the year the garage was built for each house. My assumption is that the garage was most likely built at the same time as the house, so I want to fill the missing values in GarageYrBlt with the median of GarageYrBlt grouped by the YearBuilt column.

To explain my idea further: when I was working on the Titanic problem, I filled the missing values in the Age column with the median age grouped by the Sex column. So every female passenger with a missing age got the median age of all female passengers.
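
The Titanic-style fill described above can be sketched as follows. This is a minimal illustration on made-up data, not the real Titanic file; the frame name `titanic` and its values are assumptions chosen only to show the pattern.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic data (values are made up for illustration).
titanic = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male", "male"],
    "Age": [20.0, 30.0, np.nan, 40.0, np.nan, 50.0],
})

# transform("median") returns a Series aligned with the original rows,
# holding each row's group median (NaNs are skipped when computing it).
group_median = titanic.groupby("Sex")["Age"].transform("median")

# fillna without inplace returns a new Series, which is assigned back.
titanic["Age"] = titanic["Age"].fillna(group_median)
print(titanic)
```

Here the missing female age becomes the median of the known female ages, and likewise for the male row.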

This is what I did:

train['GarageYrBlt'] = (
    train['GarageYrBlt']
    .fillna(train.groupby('YearBuilt')['GarageYrBlt'].transform("median"), inplace=True)
)

And when I do print(train['GarageYrBlt']), this is my output:

0       None
1       None
2       None
3       None
4       None
5       None
6       None
7       None
8       None
9       None
10      None
11      None
12      None
13      None
14      None
15      None
16      None
17      None
18      None
19      None
20      None
21      None
22      None
23      None
24      None
25      None
26      None
27      None
28      None
29      None
        ... 
1430    None
1431    None
1432    None
1433    None
1434    None
1435    None
1436    None
1437    None
1438    None
1439    None
1440    None
1441    None
1442    None
1443    None
1444    None
1445    None
1446    None
1447    None
1448    None
1449    None
1450    None
1451    None
1452    None
1453    None
1454    None
1455    None
1456    None
1457    None
1458    None
1459    None
Name: GarageYrBlt, Length: 1460, dtype: object
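
A column full of None is what one would expect if fillna(..., inplace=True) returns None and that return value is then assigned back to the column, which is how pandas versions producing this output behave. A minimal sketch of the likely fix, on made-up data, is to drop inplace=True and assign the returned Series. The toy values and the final fallback to YearBuilt (for groups where every garage year is missing) are assumptions, the latter following the idea stated above that the garage was built with the house.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the training data (values are made up for illustration).
train = pd.DataFrame({
    "YearBuilt":   [2000, 2000, 2000, 1990, 1990, 1985],
    "GarageYrBlt": [2000.0, 2002.0, np.nan, 1995.0, np.nan, np.nan],
})

# Median garage year per YearBuilt group, aligned with the original rows.
group_median = train.groupby("YearBuilt")["GarageYrBlt"].transform("median")

# No inplace=True: fillna returns the filled Series, which is assigned back.
train["GarageYrBlt"] = train["GarageYrBlt"].fillna(group_median)

# Assumption: if a whole YearBuilt group has no garage year at all,
# its median is NaN, so fall back to the year the house was built.
train["GarageYrBlt"] = train["GarageYrBlt"].fillna(train["YearBuilt"])
print(train)
```

Either use inplace=True without assigning the result, or assign the result without inplace=True; combining both is what overwrites the column.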

No correct solution

License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange