Pandas Groupby makes kernel die in Jupyter notebook/Python

https://datascience.stackexchange.com/questions/51554

01-11-2019

Question

I have a groupby in a Jupyter notebook that takes ages to run; after about 10 minutes it reports 'kernel died...'.

The groupby looks like this:

df1.groupby(['date', 'unit', 'company', 'city'])[['col1',
'col2',
'col3',
'col4',
  ...
'col20']].mean()

All of the 'col' columns are float values. I am running everything locally. Any ideas?

UPDATE:

The shape of df1 is:

(1360, 24)

Memory and dtypes:

dtypes: category(3), datetime64[ns](2), float64(17), int64(2)
memory usage: 266.9 KB

The number of unique values in date, unit, company and city:

len(df1.date.unique()) = 789
len(df1.unit.unique()) = 76
len(df1.company.unique()) = 205
len(df1.city.unique()) = 237

I have 16 GB of memory on a MacBook Pro.
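
For scale, the unique counts above can be multiplied out and compared with the 1360 rows in df1. The following is only a rough back-of-the-envelope sketch: it assumes the result would hold one row for every possible combination of the four keys (which pandas may do when grouping keys are categorical), and the variable names are made up for illustration.

```python
import math

# Unique counts reported in the question for each grouping key.
unique_counts = {"date": 789, "unit": 76, "company": 205, "city": 237}
n_rows = 1360       # rows in df1, from its reported shape
n_value_cols = 20   # col1 .. col20, all float64 (8 bytes per value)

# Number of key combinations if every value of each key were paired
# with every value of all the other keys.
n_combinations = math.prod(unique_counts.values())

# Rough size of a dense float64 result with one row per combination
# (assumption: nothing is stored sparsely).
dense_bytes = n_combinations * n_value_cols * 8

print(f"possible combinations: {n_combinations:,}")
print(f"combinations that can actually occur: at most {n_rows:,}")
print(f"dense result size: about {dense_bytes / 1e9:,.0f} GB")
```

Under that assumption there are roughly 2.9 billion possible combinations, and a dense result would need hundreds of gigabytes, far more than 16 GB, even though at most 1360 combinations can actually occur in the data.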

UPDATE 2:

It works only if date and unit are the only two groupby columns. If I add either company or city, it no longer works and keeps running indefinitely.
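
One possible explanation, offered as a sketch rather than a confirmed answer: the dtypes listing shows three category columns, and if those are unit, company and city (an assumption, the question does not say which), then in pandas versions where groupby defaults to observed=False the result gets a row for every combination of categories, not only the combinations present in the data. The miniature frame below is hypothetical and only illustrates the difference that the observed parameter makes.

```python
import pandas as pd

# Hypothetical miniature of df1: two categorical keys and one float column.
df = pd.DataFrame({
    "company": pd.Categorical(["a", "b", "c"]),
    "city": pd.Categorical(["x", "y", "z"]),
    "col1": [1.0, 2.0, 3.0],
})

# observed=False produces one row per combination of categories (3 * 3 = 9),
# even though only 3 combinations occur in the data; the rest are NaN.
all_combos = df.groupby(["company", "city"], observed=False)["col1"].mean()

# observed=True keeps only the combinations that actually occur.
seen_only = df.groupby(["company", "city"], observed=True)["col1"].mean()

print(len(all_combos), len(seen_only))
```

If this is indeed the cause, passing observed=True to the groupby in the question, or converting the category columns to plain strings with astype(str) before grouping, should keep the result at no more than 1360 rows.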

No correct solution

License: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange