Question

I have two questions based on the following Pandas DataFrame.

(1.) Each binary value represents the presence (1) or absence (0) of a data item (p1,p2,p3,p4) in a session. I want to count the 1s in each row of the DataFrame.

import pandas as pd

df = pd.DataFrame([[1,1,1,0,1],[2,1,1,0,1],[3,1,1,1,1],[4,0,1,0,1]])
df.columns = ['session','p1','p2','p3','p4']

Output:

   session  p1  p2  p3  p4
0        1   1   1   0   1
1        2   1   1   0   1
2        3   1   1   1   1
3        4   0   1   0   1

Here is what I have tried:

print ([df[col].value_counts() for col in df.columns])

But my expected output is as follows. Any help getting this output would be appreciated.

        count
0        3
1        3
2        4
3        2

(2.) I want to get the intersections of sessions, that is, the data items (p1,p2,p3,p4) common to sessions 1 and 2, 1 and 3, 1 and 4, 2 and 3, 2 and 4, and 3 and 4. I am not sure what the expected output should be, but I think it should look like the following.

  1,2,3,4
1,3,3,3,2
2,3,3,3,2
3,3,3,4,2
4,2,2,2,2

Solution

(1)

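I assume that you want to set session as the index. Because every remaining value is 0 or 1, summing across the columns (axis=1) gives the number of 1s in each row.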

In [86]: df = df.set_index('session')

In [87]: df.sum(axis=1)
Out[87]: 
session
1          3
2          3
3          4
4          2
dtype: int64
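
If you would rather keep the default 0..3 index and get a single column named count, as in the expected output of the question, something like the following should work. This is only a sketch: the item_cols list and the counts name are my own choices, and it assumes every column other than session holds only 0/1 values.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 0, 1], [2, 1, 1, 0, 1], [3, 1, 1, 1, 1], [4, 0, 1, 0, 1]],
    columns=["session", "p1", "p2", "p3", "p4"],
)

# Assumption: these are the 0/1 indicator columns; 'session' is left out
# so that the session number is not added into the sum.
item_cols = ["p1", "p2", "p3", "p4"]

# Summing across the columns (axis=1) counts the 1s in each row.
counts = df[item_cols].sum(axis=1).to_frame("count")
print(counts)
```

Leaving session out of the sum matters here: calling df.sum(axis=1) on the original frame without set_index would add the session number to each count.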

(2)

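Use dot. With session still set as the index, multiplying the DataFrame by its transpose gives, for each pair of sessions, the number of items that are 1 in both. The diagonal holds each session's own count.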

In [90]: df.dot(df.T)
Out[90]: 
session  1  2  3  4
session            
1        3  3  3  2
2        3  3  3  2
3        3  3  4  2
4        2  2  2  2
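
If you also need the names of the shared items and not just how many there are, a small helper along these lines might do. It is a sketch under the same 0/1 assumption; common_items and common_counts are names I made up for illustration, not pandas functions.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 0, 1], [2, 1, 1, 0, 1], [3, 1, 1, 1, 1], [4, 0, 1, 0, 1]],
    columns=["session", "p1", "p2", "p3", "p4"],
).set_index("session")

# Entry (i, j) is the number of items that are 1 in both session i and j,
# because the product of two 0/1 values is 1 only when both are 1.
common_counts = df.dot(df.T)


def common_items(frame, a, b):
    """Return the column names that are 1 in both session a and session b."""
    both = (frame.loc[a] == 1) & (frame.loc[b] == 1)
    return both[both].index.tolist()


print(common_counts)
print(common_items(df, 1, 4))
```

For sessions 1 and 4 the helper returns p2 and p4, which agrees with the 2 in the matrix at row 1, column 4.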
Licensed under: CC-BY-SA with attribution