How to implement accurate counts or sums when you have different numbers of days of the week?

datascience.stackexchange https://datascience.stackexchange.com/questions/11407

16-10-2019

Question

I have a dataset of transaction data for a retail outlet. I am using pandas and want to analyse revenue by day of the week, but the days of the week are not equally represented in the dataset (i.e. there is an extra weekend). I have used df['Date'].dt.dayofweek to create an integer value for the day of the week, and grouped the data by that integer value using df.groupby("Day_int").sum()

So I have a 'Total' column at the end that I am interested in, but I want to create a new column, something like 'adj_total', that divides the total by 3 for the days Monday to Friday and by 4 for Saturday and Sunday. Is the best way to loop through the dataframe or is it better

Here is the df that I am working with (most of the column values are nonsensical; only Total is of interest). Day_int is the groupby key, so it is the index and its label should appear one row lower than the other column names.

Day_int  Section  Prod_name  Cashier  Date  Time  Receipt  Total
0        ...      ...        ...      ...   ...   ...      91341
1        ...      ...        ...      ...   ...   ...      82262
2        ...      ...        ...      ...   ...   ...      84145
3        ...      ...        ...      ...   ...   ...      90115
4        ...      ...        ...      ...   ...   ...      115497
5        ...      ...        ...      ...   ...   ...      151971
6        ...      ...        ...      ...   ...   ...      109210
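
For reference, a table of this shape could be produced roughly as follows. This is only a sketch: the transaction rows are invented for illustration, and the column names `Date` and `Total` are assumed from the header above. It also counts how many distinct dates fall on each weekday, which is the figure the adjustment needs.

```python
import pandas as pd

# Hypothetical transactions; the values are invented for illustration only.
raw = pd.DataFrame({
    "Date": pd.to_datetime([
        "2016-04-23", "2016-04-23", "2016-04-24",
        "2016-04-25", "2016-04-26", "2016-04-30",
    ]),
    "Total": [10.0, 5.0, 8.0, 4.0, 6.0, 12.0],
})

# .dt is a Series accessor: Monday is 0 and Sunday is 6.
raw["Day_int"] = raw["Date"].dt.dayofweek

# Sum revenue per weekday; selecting "Total" avoids summing unrelated columns.
by_day = raw.groupby("Day_int")[["Total"]].sum()

# Number of distinct calendar dates observed for each weekday.
days_seen = raw.groupby("Day_int")["Date"].nunique()
```

In this toy data Saturday appears on two dates and every other weekday on one, so `days_seen` plays the role of the 3 and 4 mentioned in the question.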

Solution

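I would add a column that is 3 if it's a weekday and 4 if it's not, using an apply. If Day_int is the index after the groupby, call df = df.reset_index() first so that it is available as a column. Something like this: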

df['divide_by'] = df.apply(lambda x: 3 if x['Day_int']<5 else 4, axis=1)

This assumes days run from Monday = 0 to Sunday = 6, which is what dt.dayofweek returns. Then you can add the adjusted column as follows:

df['adj_total'] = df.apply(lambda x: x['Total']/x['divide_by'], axis=1)

You can then remove the divide_by column, for example with df = df.drop(columns='divide_by'), and you have the result.
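
The row-wise apply calls work, but on larger frames a vectorised version avoids the per-row Python call and the temporary column. A minimal sketch, assuming `Day_int` is a regular column (after `reset_index()`) and using `numpy.where`, which is a substitute for the apply approach rather than part of the original answer:

```python
import numpy as np
import pandas as pd

# Grouped totals from the question, with Day_int as a regular column.
df = pd.DataFrame({
    "Day_int": [0, 1, 2, 3, 4, 5, 6],
    "Total": [91341, 82262, 84145, 90115, 115497, 151971, 109210],
})

# 3 occurrences for Monday to Friday (0-4), 4 for Saturday and Sunday (5-6).
divisor = np.where(df["Day_int"] < 5, 3, 4)

# Element-wise division; no helper column has to be dropped afterwards.
df["adj_total"] = df["Total"] / divisor
```

If the number of occurrences per weekday is not known in advance, it may be safer to count the distinct dates per `Day_int` in the raw transactions and divide by that instead of hard-coding 3 and 4.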

Licensed under: CC-BY-SA with attribution