Question

I have a DataFrame in which some rows and columns sum to 0.

    A   B   C    D
0   1   1   0    1
1   0   0   0    0 
2   1   0   0    1
3   0   1   0    0  
4   1   1   0    1 

The end result should be

    A   B    D
0   1   1    1
2   1   0    1
3   0   1    0  
4   1   1    1 

Notice the rows and columns that only had zeros have been removed.


Solution

df.loc[row_indexer, column_indexer] allows you to select rows and columns using boolean masks:

In [88]: df.loc[(df.sum(axis=1) != 0), (df.sum(axis=0) != 0)]
Out[88]: 
   A  B  D
0  1  1  1
2  1  0  1
3  0  1  0
4  1  1  1

[4 rows x 3 columns]

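df.sum(axis=1) != 0 is True if and only if the row does not sum to 0.

df.sum(axis=0) != 0 is True if and only if the column does not sum to 0.

With non-negative values, as in this example, a row or column sums to 0 exactly when every entry in it is 0.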
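To see what is being passed to df.loc, it can help to build the two masks separately and inspect them. The following is a small sketch using the data from the question; the names row_mask, col_mask and result are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1, 0, 1],
                   'B': [1, 0, 0, 1, 1],
                   'C': [0, 0, 0, 0, 0],
                   'D': [1, 0, 1, 0, 1]})

# One boolean per row, indexed by row label.
row_mask = df.sum(axis=1) != 0
# One boolean per column, indexed by column name.
col_mask = df.sum(axis=0) != 0

print(row_mask.tolist())  # [True, False, True, True, True]
print(col_mask.tolist())  # [True, True, False, True]

# Rows and columns whose mask entry is True are kept.
result = df.loc[row_mask, col_mask]
print(result)
```

Row 1 and column C are the only entries marked False, so they are the only ones dropped.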

OTHER TIPS

Building on the answers to "Drop rows with all zeros in pandas data frame", you can avoid sum() by testing for nonzero entries directly:

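import pandas as pd

df = pd.DataFrame({'A': [1, 0, 1, 0, 1],
                   'B': [1, 0, 0, 1, 1],
                   'C': [0, 0, 0, 0, 0],
                   'D': [1, 0, 1, 0, 1]})

# Keep rows and columns that contain at least one nonzero value.
# Pass axis by keyword; recent pandas versions reject the positional form any(1).
df.loc[(df != 0).any(axis=1), (df != 0).any(axis=0)]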

   A  B  D
0  1  1  1
2  1  0  1
3  0  1  0
4  1  1  1
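The difference between the sum-based and any-based masks shows up once negative values are present, because a row can then sum to 0 without being all zeros. A minimal sketch with made-up data (df_neg is a hypothetical frame, not the one from the question):

```python
import pandas as pd

# Hypothetical data: row 0 sums to 0 but is not all zeros.
df_neg = pd.DataFrame({'A': [1, 0, 2],
                       'B': [-1, 0, 3]})

# Sum-based mask: drops every row whose sum is 0, including row 0.
by_sum = df_neg.loc[df_neg.sum(axis=1) != 0]

# any-based mask: drops only rows in which every entry is 0.
by_any = df_neg.loc[(df_neg != 0).any(axis=1)]

print(by_sum.index.tolist())  # [2]
print(by_any.index.tolist())  # [0, 2]
```

For the 0/1 data in the question both approaches give the same result.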

This is my way to do it:

import pandas as pd

df = pd.read_csv("my.csv")

# Collect the names of the columns whose values do not sum to 0.
keep = []
for col in df.columns:
    if df[col].sum() != 0:
        keep.append(col)

df2 = df[keep]

To write the reduced data:

df2.to_csv("my_reduced_data.csv")

This checks only the columns; rows that contain only zeros are left in place.
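If the rows should be filtered as well, one option is to wrap the boolean-mask approach in a small helper. This is only a sketch: drop_all_zero is a hypothetical name, and the in-memory frame below stands in for the contents of my.csv, assuming it holds numeric data.

```python
import pandas as pd


def drop_all_zero(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop rows and columns that contain only zeros."""
    nonzero = df != 0
    return df.loc[nonzero.any(axis=1), nonzero.any(axis=0)]


# Stand-in for pd.read_csv("my.csv"); assumed to be all numeric.
df = pd.DataFrame({'A': [1, 0, 1, 0, 1],
                   'B': [1, 0, 0, 1, 1],
                   'C': [0, 0, 0, 0, 0],
                   'D': [1, 0, 1, 0, 1]})

df2 = drop_all_zero(df)
print(df2.shape)  # (4, 3)
```

The result can then be written out with df2.to_csv(...) as before.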

Licensed under: CC-BY-SA with attribution