"The truth value of an array with more than one element is ambiguous" - searching pandas dataframe for NaNs

https://stackoverflow.com/questions/20571923

01-09-2022

Question

I'm trying to iterate through all the rows of a pandas DataFrame and find the first instance of NaN in a particular column. E.g.:

import pandas as pd

d = {'one' : pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two' : pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

for row_index, row in df[:].iterrows():
    if pd.isnull(df.ix[:,'one']) == True:
        break

But I get: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I thought I was only inspecting one element of the DataFrame at a time, so I don't know what the problem is. Any help is much appreciated.

Cheers.

Solution

df.index[ df.one.isnull( ) ][ 0 ]

This gives the index label of the first row that has a null in column "one".
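As a minimal sketch of the one-liner above, using the DataFrame from the question: indexing with [0] raises IndexError when the column contains no nulls, so this version checks mask.any() first. The names mask and first_nan are illustrative and not part of the original answer.

```python
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

# Boolean Series: True where column 'one' is NaN.
mask = df['one'].isnull()

# Take the first matching index label, or None if the column has no NaN.
first_nan = df.index[mask][0] if mask.any() else None
print(first_nan)
```

Here row 'd' exists only in column 'two', so column 'one' is NaN at that label.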

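A potentially more efficient way to find the first such row is to define a generator and take its first element, so that the search stops at the first match instead of scanning the whole column:

import numpy as np

gen = (idx for idx in df.index if np.isnan(df.one[idx]))

Now next(gen) returns the first such index label (it raises StopIteration if the column contains no NaN).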
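The generator approach can be wrapped so that a column without NaN does not raise StopIteration. This is only a sketch: first_nan_index is a hypothetical helper name, and passing None as the default to next() is one possible choice for the "not found" case.

```python
import numpy as np
import pandas as pd


def first_nan_index(series):
    """Return the first index label whose value is NaN, or None if there is none.

    Hypothetical helper, not part of pandas.
    """
    # Lazy scan: evaluation stops as soon as the first NaN is found.
    gen = (idx for idx in series.index if np.isnan(series.loc[idx]))
    return next(gen, None)


df = pd.DataFrame({'one': [1., 2., 3., np.nan]}, index=['a', 'b', 'c', 'd'])
first_idx = first_nan_index(df['one'])
print(first_idx)
```

Whether this beats the vectorized isnull() lookup depends on the data: the generator does per-element work in Python, so it tends to pay off only when the first NaN appears early in a long column.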

Other tips

Just to be clear with respect to the question: if you debug, you will see:

ipdb> pd.isnull(df.ix[:,'one'])
a    False
b    False
c    False
d     True

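So the expression you are testing is a Series with multiple elements, not a single value, and its truth value is ambiguous. With .any() you would get True (at least one element is null); with .all() you would get False (not every element is null).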
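The behavior can be reproduced in isolation. The sketch below builds the same boolean Series by hand (the variable names are illustrative) and shows that using it directly in a boolean context raises ValueError, while .any() and .all() each reduce it to a single value.

```python
import pandas as pd

# Same values as the debugger output: only row 'd' is null.
mask = pd.Series([False, False, False, True], index=['a', 'b', 'c', 'd'])

message = None
try:
    # This is what `if <Series>:` does implicitly.
    bool(mask)
except ValueError as err:
    message = str(err)

has_any_null = mask.any()   # True if at least one element is True
all_null = mask.all()       # True only if every element is True

print(message)
print(has_any_null, all_null)
```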

The comments addressed the error in the code; I just want to state the any()/all() behavior clearly for anyone who finds this topic by its title.
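The comments are not reproduced here, so the following is only an assumption about what a corrected loop might look like: it tests the single value row['one'] from the current row instead of the whole column. The variable first_nan_row is illustrative. Note also that .ix was removed in pandas 1.0, so recent versions need .loc or .iloc for that kind of lookup.

```python
import pandas as pd

d = {'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
     'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d)

first_nan_row = None
for row_index, row in df.iterrows():
    # row['one'] is a scalar, so pd.isnull returns a single bool here.
    if pd.isnull(row['one']):
        first_nan_row = row_index
        break

print(first_nan_row)
```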

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow