Question

Good morning all,

I have a pandas DataFrame containing multiple Series. A given Series within the DataFrame holds a mix of unicode strings, NaN, and int/float values. I want to count the NaNs in the Series, but I cannot use the built-in numpy.isnan function because it cannot safely cast the unicode data into a type it can interpret. I have come up with a workaround, but I'm wondering if there is a better/more Pythonic way of accomplishing this task.

Thanks in advance, Myles

import pandas as pd
import numpy as np

test = pd.Series(data=[np.nan, 2, u'string'])
np.isnan(test).sum()
# Raises TypeError: isnan is not supported for object (mixed-type) data

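# Workaround: drop the strings first, then count the NaNs
# (on Python 3, str is the unicode text type; on Python 2 this check was isinstance(x, unicode))
test2 = [x for x in test if not isinstance(x, str)]
numNaNs = np.isnan(test2).sum()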

Solution

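Use pandas.isnull, which works on mixed-type (object) data and returns a boolean mask (summing the mask gives the NaN count):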

In [24]: test = pd.Series(data=[np.nan, 2, u'string'])

In [25]: pd.isnull(test)
Out[25]: 
0     True
1    False
2    False
dtype: bool

Note, however, that pd.isnull also returns True for None:

In [28]: pd.isnull([np.nan, 2, u'string', None])
Out[28]: array([ True, False, False,  True], dtype=bool)
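
To get the actual count the question asks for, the mask can be summed. The following is a minimal sketch, assuming Python 3 and a Series that may also contain None; the variable names num_missing and num_nan_only are only illustrative. The second count is one possible way to exclude None if only float NaN values should be counted.

```python
import numpy as np
import pandas as pd

# Mixed-type (object dtype) Series like the one in the question, plus a None.
test = pd.Series([np.nan, 2, "string", None])

# pd.isnull returns a boolean mask; summing it counts the True entries.
# This counts NaN and None together.
num_missing = int(pd.isnull(test).sum())

# Assumption: if None should NOT be counted, check each element explicitly.
# Only float values are passed to np.isnan, so strings never reach it.
num_nan_only = sum(isinstance(x, float) and np.isnan(x) for x in test)

print(num_missing, num_nan_only)
```

If the Series never contains None, the two counts are the same and pd.isnull(test).sum() alone is enough.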
Licensed under: CC-BY-SA with attribution