How to count the number of missing values in each row in Pandas dataframe?

https://datascience.stackexchange.com/questions/12645

16-10-2019

Question

How can I get the number of missing values in each row of a Pandas dataframe? I would like to split the dataframe into separate dataframes, each containing the rows that have the same number of missing values.

Any suggestions?

Solution

You can apply a count over the rows like this (note that count() returns the number of non-missing values in each row):

test_df.apply(lambda x: x.count(), axis=1)

test_df:

    A   B   C
0:  1   1   3
1:  2   nan nan
2:  nan nan nan

output:

0:  3
1:  1
2:  0

You can add the result as a column like this:

test_df['full_count'] = test_df.apply(lambda x: x.count(), axis=1)

Result:

    A   B   C   full_count
0:  1   1   3   3
1:  2   nan nan 1
2:  nan nan nan 0
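Since count() gives the non-missing values, the number of missing values per row can be derived by subtracting it from the number of columns. The following is a minimal sketch that rebuilds the test_df shown above; the names n_cols and missing_count are only illustrative and do not appear in the original answer.

```python
import numpy as np
import pandas as pd

# Same toy data as test_df above
test_df = pd.DataFrame({
    "A": [1, 2, np.nan],
    "B": [1, np.nan, np.nan],
    "C": [3, np.nan, np.nan],
})

# Compute both counts before adding any helper columns,
# so that the helper columns are not counted themselves.
n_cols = test_df.shape[1]             # number of data columns
full_count = test_df.count(axis=1)    # non-missing values per row
missing_count = n_cols - full_count   # missing values per row

test_df["full_count"] = full_count
test_df["missing_count"] = missing_count
print(test_df)
```

Computing the counts before assigning the new columns matters: once full_count is part of the frame, it would be included in any later row-wise count.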

Other tips

When using pandas, try to avoid row-by-row operations such as explicit loops, apply, map, and applymap. They are slow compared with vectorized methods.

If you want to count the missing values in each column, try:

df.isnull().sum() or df.isnull().sum(axis=0)

On the other hand, you can count in each row (which is your question) by:

df.isnull().sum(axis=1)

It's roughly 10 times faster than Jan van der Vegt's solution (note that his solution counts valid values rather than missing values):

In [18]: %timeit -n 1000 df.apply(lambda x: x.count(), axis=1)
1000 loops, best of 3: 3.31 ms per loop

In [19]: %timeit -n 1000 df.isnull().sum(axis=1)
1000 loops, best of 3: 329 µs per loop
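To reproduce this kind of comparison outside IPython, something like the sketch below could be used. The random data, the 30% missing rate, and the variable names are assumptions made for illustration, and the timings will differ by machine and pandas version, so no particular speed-up is claimed here.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical test data: 1000 rows, 5 columns, roughly 30% missing
rng = np.random.default_rng(0)
values = rng.random((1000, 5))
values[rng.random((1000, 5)) < 0.3] = np.nan
df = pd.DataFrame(values, columns=list("ABCDE"))

# apply-based: count valid values per row, then convert to missing counts
via_apply = df.shape[1] - df.apply(lambda x: x.count(), axis=1)
# vectorized: count missing values per row directly
vectorized = df.isnull().sum(axis=1)

# Both approaches should agree on every row
same = bool((via_apply == vectorized).all())
print("results agree:", same)

# Rough timing of each approach (5 repetitions each)
t_apply = timeit.timeit(lambda: df.apply(lambda x: x.count(), axis=1), number=5)
t_vec = timeit.timeit(lambda: df.isnull().sum(axis=1), number=5)
print(f"apply: {t_apply:.4f} s, vectorized: {t_vec:.4f} s")
```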

The simplest way:

df.isnull().sum(axis=1)
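The question also asks how to split the dataframe by the number of missing values per row. One possible approach, sketched here with made-up data, is to pass the per-row counts to groupby and collect the groups in a dict. The names n_missing and parts are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up example data
df = pd.DataFrame({
    "A": [1, 2, np.nan, 4, np.nan],
    "B": [1, np.nan, np.nan, np.nan, 5],
    "C": [3, np.nan, np.nan, 6, np.nan],
})

# Missing values per row
n_missing = df.isnull().sum(axis=1)

# Group rows by their missing-value count; each group is its own dataframe
parts = {int(k): group for k, group in df.groupby(n_missing)}

for k, group in parts.items():
    print(f"rows with {k} missing value(s):")
    print(group)
```

Here parts[2] holds the rows with exactly two missing values, and the original row labels are preserved in each sub-dataframe.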

Or, you could simply make use of the info method for dataframe objects:

df.info()

which provides counts of non-null values for each column.
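Note that info() prints a summary rather than returning values. If the per-column numbers are needed for further computation, count() returns them as a Series. A small sketch on made-up data (the buffer is only used here to capture the printed text):

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan],
    "B": [1, np.nan, np.nan],
    "C": [3, np.nan, np.nan],
})

# info() writes its summary to stdout by default; buf redirects it
buf = io.StringIO()
df.info(buf=buf)
summary = buf.getvalue()
print(summary)

# The same per-column information as usable numbers
non_null_per_column = df.count()
missing_per_column = len(df) - non_null_per_column
print(missing_per_column)
```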

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, np.nan],
...                    [np.nan, 3, 4],
...                    [1, 2,      3]])

>>> df
    0  1   2
0   1  2 NaN
1 NaN  3   4
2   1  2   3

>>> df.count(axis=1)
0    2
1    2
2    3
dtype: int64

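If you want the count of missing values in each row instead:

df.isnull().sum(axis=1)

By contrast, np.logical_not(df.isnull()).sum() counts the non-missing values in each column.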

Null values in each column:

df.isnull().sum(axis=0)

Blank (empty-string) values in each column:

c = (df == '').sum(axis=0)

Null values in each row:

df.isnull().sum(axis=1)

Blank (empty-string) values in each row:

c = (df == '').sum(axis=1)
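If both nulls and empty strings should be treated as missing, the two boolean masks can be combined before summing. This is a sketch on made-up data; whether blanks count as missing depends on your dataset, so treat that as an assumption.

```python
import numpy as np
import pandas as pd

# Made-up data mixing NaN/None and empty strings
df = pd.DataFrame({
    "name": ["a", "", None],
    "city": ["", "", "x"],
    "score": [1.0, np.nan, 3.0],
})

is_null = df.isnull()   # True for NaN / None
is_blank = df == ""     # True for empty strings

null_per_row = is_null.sum(axis=1)
blank_per_row = is_blank.sum(axis=1)

# Count a cell as missing if it is either null or blank
missing_per_row = (is_null | is_blank).sum(axis=1)
missing_per_column = (is_null | is_blank).sum(axis=0)
print(missing_per_row)
```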

License: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange