Question

I am working with a very large Series of floats in pandas 0.12.0. I am trying to set extreme outliers in this series to NaN; the series is a standardized feature vector (mean 0, standard deviation 1).

I have no trouble making a boolean mask of the feature vector to find extreme outliers:

mask = feature_series > 10 | feature_series < 10
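One way to see how Python reads the line above is to parse it with the standard-library ast module without evaluating it. This is a minimal sketch that only inspects the parse tree (no pandas involved), assuming any Python 3 interpreter; the parenthesized variant with -10 is included only for contrast.

```python
import ast

# Parse the unparenthesized mask expression from the question without running it.
expr = ast.parse("feature_series > 10 | feature_series < 10", mode="eval").body

# `|` binds more tightly than `>` and `<`, so the top-level node is one chained
# comparison: feature_series > (10 | feature_series) < 10
print(type(expr).__name__)                     # Compare
print([type(op).__name__ for op in expr.ops])  # ['Gt', 'Lt']
print(type(expr.comparators[0]).__name__)      # BinOp (the 10 | feature_series part)

# With parentheses, the top-level node is an elementwise OR of two comparisons.
fixed = ast.parse("(feature_series > 10) | (feature_series < -10)", mode="eval").body
print(type(fixed).__name__, type(fixed.op).__name__)  # BinOp BitOr
```

Python evaluates a chained comparison a > b < c as (a > b) and (b < c), so the unparenthesized line is not an elementwise OR of two boolean masks.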

This takes minimal resources. However, when I actually use this mask, memory usage explodes and I have to kill the process before it crashes. This happens with:

feature_series[mask] = np.nan

It's not limited to this operation either. I also get a memory explosion with:

mask.any()

What's causing this? I suspect it may be a bug, but I'm still relatively new to pandas and can't be sure.


Solution

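You probably need parentheses. In Python, | binds more tightly than > and <, so the original line is parsed as the chained comparison feature_series > (10 | feature_series) < 10 rather than as an OR of two masks. Since the series is standardized, the lower bound should presumably also be -10:

mask = (feature_series > 10) | (feature_series < -10)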
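A minimal, self-contained sketch of the corrected approach, assuming a small made-up series in place of the real data and a symmetric ±10 cutoff. It does not reproduce the memory problem from the question; it only shows the parenthesized mask, an equivalent abs() form, and two ways to write NaN into the outliers (in-place assignment and the non-mutating Series.mask).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's standardized feature vector.
feature_series = pd.Series([0.3, -1.2, 12.5, -15.0, 2.0, 10.0])

# Parenthesize each comparison, since `|` binds more tightly than `>` and `<`.
# Assumption: outliers are values beyond +/-10 standard deviations.
mask = (feature_series > 10) | (feature_series < -10)

# The same mask written as a single comparison on the absolute value.
assert mask.equals(feature_series.abs() > 10)

print(mask.any())  # True

# Non-mutating alternative: Series.mask replaces values where the condition
# holds, with NaN by default, and returns a new Series.
cleaned = feature_series.mask(feature_series.abs() > 10)

# In-place boolean-mask assignment, as in the question, now touches only the outliers.
feature_series[mask] = np.nan
print(feature_series.isna().sum())  # 2
```

The abs() form avoids the operator-precedence trap entirely, because it needs only one comparison.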
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow