How to count the number of missing values in each row in Pandas dataframe?
Question
How can I get the number of missing values in each row of a Pandas dataframe? I would like to split the dataframe into separate dataframes whose rows all have the same number of missing values.
Any suggestions?
Solution
You can apply a count over the rows like this (note that count() returns the number of non-missing values in each row):
test_df.apply(lambda x: x.count(), axis=1)
test_df:
A B C
0: 1 1 3
1: 2 nan nan
2: nan nan nan
output:
0: 3
1: 1
2: 0
You can add the result as a column like this:
test_df['full_count'] = test_df.apply(lambda x: x.count(), axis=1)
Result:
A B C full_count
0: 1 1 3 3
1: 2 nan nan 1
2: nan nan nan 0
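The same per-row counts can also be obtained without apply. The following is a minimal sketch, assuming test_df is rebuilt by hand from the printed frame above (the construction and the names full_count and missing_count are my own, not from the original answer). It also shows one way to turn the non-missing count into a missing count:

```python
import numpy as np
import pandas as pd

# Rebuild the example frame shown above (this construction is an assumption).
test_df = pd.DataFrame({
    "A": [1, 2, np.nan],
    "B": [1, np.nan, np.nan],
    "C": [3, np.nan, np.nan],
})

# Non-missing values per row, same result as the apply version.
full_count = test_df.count(axis=1)

# Missing values per row: number of columns minus non-missing values.
missing_count = test_df.shape[1] - full_count

print(full_count.tolist())     # [3, 1, 0]
print(missing_count.tolist())  # [0, 2, 3]
```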
Other tips
When using pandas, try to avoid performing operations in a loop, including apply, map, applymap, etc. That's slow!
If you want to count the missing values in each column, try:
df.isnull().sum()
or df.isnull().sum(axis=0)
On the other hand, you can count in each row (which is your question) by:
df.isnull().sum(axis=1)
It's roughly 10 times faster than Jan van der Vegt's solution (by the way, he counts valid values rather than missing values):
In [18]: %timeit -n 1000 df.apply(lambda x: x.count(), axis=1)
1000 loops, best of 3: 3.31 ms per loop
In [19]: %timeit -n 1000 df.isnull().sum(axis=1)
1000 loops, best of 3: 329 µs per loop
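The question also asks how to split the dataframe by the number of missing values per row. One possible approach, sketched here with hypothetical example data and a hypothetical variable name parts, is to group the rows by the per-row missing count:

```python
import numpy as np
import pandas as pd

# Hypothetical example data; any DataFrame should work the same way.
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4],
    "B": [1, np.nan, np.nan, np.nan],
    "C": [3, np.nan, np.nan, np.nan],
})

# Number of missing values in each row.
n_missing = df.isnull().sum(axis=1)

# One sub-frame per distinct missing-value count, keyed by that count.
parts = {int(k): group for k, group in df.groupby(n_missing)}

for k, part in parts.items():
    print(f"{k} missing value(s):")
    print(part)
```

Here parts[2] holds the rows with exactly two missing values, and so on. A dict is used only for convenience; iterating over the groupby object directly would avoid building all the sub-frames up front.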
The simplest way:
df.isnull().sum(axis=1)
Or, you could simply make use of the info method for dataframe objects:
df.info()
which provides counts of non-null values for each column.
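Note that info() prints its summary and returns None, so the counts cannot be reused directly. As a rough sketch on a hypothetical two-column frame, the text can be captured through the buf parameter, while df.count() gives the same per-column non-null counts as a Series:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"A": [1, 2, np.nan], "B": [1, np.nan, np.nan]})

# info() writes to stdout by default; pass buf= to capture the text instead.
buffer = io.StringIO()
df.info(buf=buffer)
summary = buffer.getvalue()
print(summary)

# The same non-null counts per column, as a Series usable in further code.
non_null_per_column = df.count()
print(non_null_per_column)
```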
>>> df = pd.DataFrame([[1, 2, np.nan],
... [np.nan, 3, 4],
... [1, 2, 3]])
>>> df
0 1 2
0 1 2 NaN
1 NaN 3 4
2 1 2 3
>>> df.count(axis=1)
0 2
1 2
2 3
dtype: int64
If you want the count of missing values in each row instead:
df.isnull().sum(axis=1)
Null values per column:
df.isnull().sum(axis=0)
Blank values (empty strings) per column:
c = (df == '').sum(axis=0)
Null values per row:
df.isnull().sum(axis=1)
Blank values (empty strings) per row:
c = (df == '').sum(axis=1)
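If both nulls and empty strings should count as missing, the two checks can be combined into a single mask. This is a minimal sketch on a hypothetical frame; the column names and the variable names are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing NaN and empty strings.
df = pd.DataFrame({
    "name": ["a", "", np.nan],
    "city": ["", "", "x"],
})

null_mask = df.isnull()
blank_mask = df == ""

# A cell is treated as missing if it is either null or an empty string.
missing_mask = null_mask | blank_mask

missing_per_row = missing_mask.sum(axis=1)
missing_per_column = missing_mask.sum(axis=0)

print(missing_per_row.tolist())     # [1, 2, 1]
print(missing_per_column.tolist())  # [2, 2]
```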