The best way to do this is with the count method of DataFrame objects:
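# The bare names below assume: from numpy import nan; from numpy.random import rand, randn; from pandas import DataFrame
In [18]: data = randn(1000, 3)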
In [19]: data
Out[19]:
array([[ 0.1035,  0.9239,  0.3902],
       [ 0.2022, -0.1755, -0.4633],
       [ 0.0595, -1.3779, -1.1187],
       ...,
       [ 1.3931,  0.4087,  2.348 ],
       [ 1.2746, -0.6431,  0.0707],
       [-1.1062,  1.3949,  0.3065]])
In [20]: data[rand(len(data)) > 0.5] = nan
In [21]: data
Out[21]:
array([[ 0.1035,  0.9239,  0.3902],
       [ 0.2022, -0.1755, -0.4633],
       [    nan,     nan,     nan],
       ...,
       [ 1.3931,  0.4087,  2.348 ],
       [ 1.2746, -0.6431,  0.0707],
       [-1.1062,  1.3949,  0.3065]])
In [22]: df = DataFrame(data, columns=list('abc'))
In [23]: df.head()
Out[23]:
        a       b       c
0  0.1035  0.9239  0.3902
1  0.2022 -0.1755 -0.4633
2     NaN     NaN     NaN
3     NaN     NaN     NaN
4     NaN     NaN     NaN

[5 rows x 3 columns]
In [24]: df.count()
Out[24]:
a    498
b    498
c    498
dtype: int64
In [26]: df.notnull().sum()
Out[26]:
a    498
b    498
c    498
dtype: int64
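The session above relies on names pulled into the namespace implicitly. As a minimal, self-contained sketch of the same comparison, the script below uses explicit imports and a seeded generator; the seed and the variable names (rng, mask, non_null, via_mask, expected) are my own choices for illustration, not part of the original session, so the counts will differ from the 498 shown above.

```python
import numpy as np
import pandas as pd

# Seed chosen arbitrarily so the example is repeatable.
rng = np.random.default_rng(0)

data = rng.standard_normal((1000, 3))

# Blank out whole rows at random, mirroring the session above.
mask = rng.random(len(data)) > 0.5
data[mask] = np.nan

df = pd.DataFrame(data, columns=list("abc"))

non_null = df.count()          # per-column count of non-NaN values
via_mask = df.notnull().sum()  # same counts, computed from the boolean mask

# Rows that were not blanked out; every column should report this number.
expected = len(df) - int(mask.sum())

print(non_null)
print(non_null.equals(via_mask))
```

Both approaches give one count per column; count simply skips building the intermediate boolean frame by hand.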
Like many pandas methods, this also works on Series objects:
In [27]: df.a.count()
Out[27]: 498
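To make the Series case concrete, here is a small sketch on a hand-written Series, so the result can be checked by eye. The values and the names s, n_valid, n_total and n_missing are illustrative assumptions, not taken from the session above.

```python
import numpy as np
import pandas as pd

# Five entries, two of them missing.
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0], name="a")

n_valid = s.count()           # non-NaN entries only
n_total = len(s)              # every entry, NaN included
n_missing = s.isnull().sum()  # NaN entries only

print(n_valid, n_total, n_missing)  # prints: 3 5 2
```

Note that len counts every entry, so it is not a substitute for count when the data contains NaN.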