Question

Hey, I have the following dataset:

import pandas as pd
df = pd.DataFrame({'column1': [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]})

I want to be able to count the number of consecutive 1s and 0s and generate 2 columns as such:

consec0: 1,2,_,1,_,1,2,_,_,1,_,_,_
consec1: _,_,1,_,1,_,_,1,2,_,1,2,3

I then want to take the max of each run of consecutive values and create two lists:

max_consec0: 2,1,2,1
max_consec1: 1,1,2,3

My dataset in the end will be just max_consec0 and max_consec1


Solution

To check whether the value has changed from one row to the next, you can use .diff and test for non-zero with .ne(0) (the NaN in the first row counts as different from zero), and then number the runs by counting the changes with .cumsum, like this:

df['counter'] = df['column1'].diff().ne(0).cumsum()
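On the example data, the two intermediate steps look like this (a quick sketch you can run to inspect them):

```python
import pandas as pd

# same example data as in the question
df = pd.DataFrame({'column1': [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]})

changed = df['column1'].diff().ne(0)  # True on the first row and wherever the value flips
counter = changed.cumsum()            # numbers the runs: 1, 1, 2, 3, 4, 5, 5, 6, 6, 7, 8, 8, 8
print(counter.tolist())
```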

Afterward, you can create a second dataframe, where the indices are the groups of consecutive values, and the column values are the value (0 or 1, in your case) and length (which is what you ultimately want):

df2 = df.groupby('counter')['column1'].min().to_frame(name='value').join(
    df.groupby('counter')['column1'].count().rename('number'))

The resulting max_consec0 and max_consec1 are then just the values of the 'number' column, filtered by the 'value' column:

max_consec0 = df2[df2['value']==0]['number'].tolist()
max_consec1 = df2[df2['value']==1]['number'].tolist()

You can verify that the result is [2, 1, 2, 1] and [1, 1, 2, 3], as desired.
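Putting the whole solution together as a runnable check (the named-aggregation spelling of df2 here is an equivalent, more compact way to build the same frame):

```python
import pandas as pd

df = pd.DataFrame({'column1': [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]})
df['counter'] = df['column1'].diff().ne(0).cumsum()

# one row per run: the run's value and its length
df2 = df.groupby('counter')['column1'].agg(value='min', number='count')

max_consec0 = df2.loc[df2['value'] == 0, 'number'].tolist()
max_consec1 = df2.loc[df2['value'] == 1, 'number'].tolist()
print(max_consec0, max_consec1)  # [2, 1, 2, 1] [1, 1, 2, 3]
```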

OTHER TIPS

For this sort of problem you can use np.where with a couple of boolean expressions to get your answer.

First, we test whether the column is equal to the target value, 0 or 1.

Second, we build a group key that increments whenever the column value differs from the value in the row above.

Third, for any row that does not match the target value we return NaN, as it's easier to work with alongside numeric values.

import numpy as np

df['col2'] = np.where(
    df["column1"].eq(0),
    df.groupby(df.column1.ne(df.column1.shift()).cumsum()).cumcount() + 1,
    np.nan,
)

df['col3'] = np.where(
    df["column1"].eq(1),
    df.groupby(df.column1.ne(df.column1.shift()).cumsum()).cumcount() + 1,
    np.nan,
)

print(df)

    column1  col2  col3
0         0   1.0   NaN
1         0   2.0   NaN
2         1   NaN   1.0
3         0   1.0   NaN
4         1   NaN   1.0
5         0   1.0   NaN
6         0   2.0   NaN
7         1   NaN   1.0
8         1   NaN   2.0
9         0   1.0   NaN
10        1   NaN   1.0
11        1   NaN   2.0
12        1   NaN   3.0
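As a side note, the grouping expression above is written out twice; a sketch that computes the run id once and uses Series.where in place of np.where produces the same two columns:

```python
import pandas as pd

df = pd.DataFrame({'column1': [0, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]})

# run id: increments whenever the value changes
run = df['column1'].ne(df['column1'].shift()).cumsum()
# 1-based position within each run
pos = df.groupby(run).cumcount() + 1

df['col2'] = pos.where(df['column1'].eq(0))  # NaN on the 1-rows
df['col3'] = pos.where(df['column1'].eq(1))  # NaN on the 0-rows
print(df)
```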

Next, we need a surrogate key that changes at each NaN, so that every run of numbers gets its own group, and then we take the max of each group.

df.assign(
    key1=df["col2"].isnull().cumsum()
).groupby("key1")["col2"].max().dropna().tolist()

[2.0, 1.0, 2.0, 1.0]

df.assign(
    key2=df["col3"].isnull().cumsum()
).groupby("key2")["col3"].max().dropna().tolist()

[1.0, 1.0, 2.0, 3.0]

You can try this implementation:

num0 = 0
num1 = 0
consec0 = []
consec1 = []
for i in range(len(df)):
    if df.iloc[i, 0] == 0:
        num0 += 1
        num1 = 0
    else:
        num0 = 0
        num1 += 1
    consec0.append(num0)
    consec1.append(num1)
df['consec0'] = consec0
df['consec1'] = consec1

# The running counts peak at the end of each run, so keep each entry
# that is about to reset to 0 (or that ends the series):
max_consec0 = [c for i, c in enumerate(consec0)
               if c > 0 and (i == len(consec0) - 1 or consec0[i + 1] == 0)]
max_consec1 = [c for i, c in enumerate(consec1)
               if c > 0 and (i == len(consec1) - 1 or consec1[i + 1] == 0)]
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange