Question

I have a DataFrame in which some rows and columns sum to 0.

    A   B   C    D
0   1   1   0    1
1   0   0   0    0 
2   1   0   0    1
3   0   1   0    0  
4   1   1   0    1 

The end result should be:

    A   B    D
0   1   1    1
2   1   0    1
3   0   1    0  
4   1   1    1 

Notice that the rows and columns containing only zeros have been removed.


Solution

df.loc[row_indexer, column_indexer] allows you to select rows and columns using boolean masks:

In [88]: df.loc[(df.sum(axis=1) != 0), (df.sum(axis=0) != 0)]
Out[88]: 
   A  B  D
0  1  1  1
2  1  0  1
3  0  1  0
4  1  1  1

[4 rows x 3 columns]

df.sum(axis=1) != 0 is True if and only if the row does not sum to 0.

df.sum(axis=0) != 0 is True if and only if the column does not sum to 0.
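
To make the two masks concrete, here is a small sketch that builds them separately for the question's data before passing them to df.loc. The names row_mask and col_mask are my own and do not appear in the answer above.

```python
import pandas as pd

# The DataFrame from the question
df = pd.DataFrame({'A': [1, 0, 1, 0, 1],
                   'B': [1, 0, 0, 1, 1],
                   'C': [0, 0, 0, 0, 0],
                   'D': [1, 0, 1, 0, 1]})

# One boolean per row: True where the row does not sum to 0
row_mask = df.sum(axis=1) != 0
# One boolean per column: True where the column does not sum to 0
col_mask = df.sum(axis=0) != 0

print(row_mask.tolist())  # [True, False, True, True, True]
print(col_mask.tolist())  # [True, True, False, True]

# Select only the rows and columns whose mask entry is True
result = df.loc[row_mask, col_mask]
print(result)
```

Row 1 and column C are the only entries marked False, so they are the ones dropped.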

Other tips

Building on the answer to "Drop rows with all zeros in pandas data frame", this approach avoids using sum():

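import pandas as pd

df = pd.DataFrame({'A': [1,0,1,0,1],
                   'B': [1,0,0,1,1],
                   'C': [0,0,0,0,0],
                   'D': [1,0,1,0,1]})

# Pass axis by keyword: recent pandas versions no longer accept it positionally
df.loc[(df!=0).any(axis=1), (df!=0).any(axis=0)]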

   A  B  D
0  1  1  1
2  1  0  1
3  0  1  0
4  1  1  1
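
One case where the two approaches can differ is data with negative values: a row such as 1, -1 sums to 0 even though it is not all zeros. The sketch below illustrates this on made-up data. The helper name drop_all_zero and the DataFrame mixed are hypothetical, introduced only for this example.

```python
import pandas as pd

def drop_all_zero(df):
    """Hypothetical helper: drop rows and columns whose values are all 0."""
    nonzero = df != 0
    return df.loc[nonzero.any(axis=1), nonzero.any(axis=0)]

# Made-up data: row 0 contains 1 and -1, which cancel when summed
mixed = pd.DataFrame({'A': [1, 0, 2],
                      'B': [-1, 0, 0],
                      'C': [0, 0, 0]})

# Sum-based masks treat row 0 as "zero" because 1 + (-1) + 0 == 0
by_sum = mixed.loc[mixed.sum(axis=1) != 0, mixed.sum(axis=0) != 0]
# any-based masks keep row 0 because it has non-zero entries
by_any = drop_all_zero(mixed)

print(list(by_sum.index))  # [2]
print(list(by_any.index))  # [0, 2]
```

For data that is known to be non-negative, as in the question, both approaches should give the same result.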

This is my way to do it:

import pandas as pd

df = pd.read_csv("my.csv")

# Collect the names of the columns whose values do not sum to 0
hl = []
for col in df.columns:
    if df[col].sum() != 0:
        hl.append(col)

# Keep only those columns
df2 = df[hl]

To write the reduced data:

df2.to_csv("my_reduced_data.csv")

Note that this only checks columns; rows that sum to 0 are not removed.
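
If rows should be filtered as well, the same idea can be applied a second time over the index. This is only a sketch: it uses the question's data inline instead of a CSV file, assumes the index labels are unique, and the names keep_cols, keep_rows and df3 are my own.

```python
import pandas as pd

# The question's data, built inline so the example runs without a file
df = pd.DataFrame({'A': [1, 0, 1, 0, 1],
                   'B': [1, 0, 0, 1, 1],
                   'C': [0, 0, 0, 0, 0],
                   'D': [1, 0, 1, 0, 1]})

# Step 1: keep the columns that do not sum to 0
keep_cols = [col for col in df.columns if df[col].sum() != 0]
df2 = df[keep_cols]

# Step 2: keep the rows that do not sum to 0 (assumes unique index labels)
keep_rows = [idx for idx in df2.index if df2.loc[idx].sum() != 0]
df3 = df2.loc[keep_rows]

print(df3)
```

Looping in Python is likely slower than the boolean-mask answers on large frames, so this is mainly useful for seeing the logic step by step.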

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow