Question

I have the following program in Python:

# input
import pandas as pd
import numpy as np
data = pd.DataFrame({'a':pd.Series([1.,2.,3.]), 'b':pd.Series([4.,np.nan,6.])})

Here the data is:

In: print(data)

   a   b
0  1   4
1  2 NaN
2  3   6

Now I want an `isnull` column indicating whether the row contains any NaN:

# create data
data['isnull'] = np.zeros(len(data))
data['isnull'][pd.isnull(data).any(axis=1)] = 1

The output is not correct (the value in the second row should be 1):

In: print(data)

   a   b  isnull
0  1   4       0
1  2 NaN       0
2  3   6       0

However, if I execute the exact same command again, the output is correct:

data['isnull'][pd.isnull(data).any(axis=1)] = 1
print(data)

   a   b  isnull
0  1   4       0
1  2 NaN       1
2  3   6       0

Is this a bug with pandas or am I missing something obvious?

My Python version is 2.7.6, pandas is 0.12.0, and NumPy is 1.8.0.


Solution

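You're using chained indexing (`data['isnull'][mask] = 1`), which doesn't give reliable results in pandas: the first `[...]` may return a copy rather than a view, in which case the second assignment modifies a temporary object instead of `data`. I would build the column in a single step instead:

data['isnull'] = pd.isnull(data).any(axis=1).astype(int)
print(data)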

   a   b  isnull
0  1   4       0
1  2 NaN       1
2  3   6       0
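If you want to keep the original two-step approach (initialise the column, then flag the rows with NaN), a minimal sketch that avoids chained indexing is to pass the row mask and the column label to a single `.loc` call. This assumes a pandas version that provides `.loc` and reuses the toy `data` frame from the question; the name `mask` is just for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'a': pd.Series([1., 2., 3.]),
                     'b': pd.Series([4., np.nan, 6.])})

# Boolean mask: True for rows that contain at least one NaN.
# It is computed before the new column exists, so only 'a' and 'b' are checked.
mask = pd.isnull(data).any(axis=1)

# Initialise the flag column, then set the flagged rows in ONE indexing call.
# data.loc[rows, column] = value writes into `data` itself rather than into
# an intermediate object, which is the risk with chained indexing.
data['isnull'] = 0
data.loc[mask, 'isnull'] = 1

print(data)
```

Both this and the one-liner above give 0, 1, 0 in the `isnull` column; the one-liner is shorter because `astype(int)` maps `False`/`True` directly to `0`/`1`.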

For more on the problems with chained indexing, see here:

http://pandas-docs.github.io/pandas-docs-travis/indexing.html#indexing-view-versus-copy
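A quick way to see the problem is a case where the intermediate object is always a copy: selecting rows with a boolean mask returns a new DataFrame, so assigning into a column of that result never reaches the original. The sketch below uses a small hypothetical frame `df`; depending on your pandas version the chained line may emit a `SettingWithCopyWarning` or a similar warning, which is silenced here only to keep the demo output tidy.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1., 2., 3.], 'b': [4., np.nan, 6.]})
rows_with_nan = df.isnull().any(axis=1)

with warnings.catch_warnings():
    # pandas may warn about setting a value on a copy; ignore it for this demo
    warnings.simplefilter('ignore')
    # Chained: df[rows_with_nan] builds a NEW DataFrame (boolean row selection
    # copies), and ['b'] = 0.0 modifies that temporary object, not df.
    df[rows_with_nan]['b'] = 0.0

# Count NaNs left in 'b': the chained assignment did not touch df
nan_left_after_chained = int(df['b'].isnull().sum())
print(nan_left_after_chained)

# Single indexing call: modifies df in place
df.loc[rows_with_nan, 'b'] = 0.0
print(df)
```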

License: CC-BY-SA with attribution
Not affiliated with StackOverflow