Question

I have a data.table with many variables that I want to use to forecast daily sales for the next 6 weeks. The whole table is ordered by date, as you can see here. Note that I only show some of the variables.

> Data_train[order(Date)]
         Store DayOfWeek       Date Sales Customers Open Promo StateHoliday SchoolHoliday
      1:     1         2 2013-01-01     0         0    0     0            a             1
      2:     2         2 2013-01-01     0         0    0     0            a             1
      3:     3         2 2013-01-01     0         0    0     0            a             1
      4:     4         2 2013-01-01     0         0    0     0            a             1
      5:     5         2 2013-01-01     0         0    0     0            a             1
     ---                                                                                 
1017205:  1111         5 2015-07-31  5723       422    1     1            0             1
1017206:  1112         5 2015-07-31  9626       767    1     1            0             1
1017207:  1113         5 2015-07-31  7289       720    1     1            0             1
1017208:  1114         5 2015-07-31 27508      3745    1     1            0             1
1017209:  1115         5 2015-07-31  8680       538    1     1            0             1

My question is about how to arrange the data for this forecasting goal. My problem is specifically with the Date variable. I propose the following steps:

  1. Sum the sales over all stores for each date (because I have many stores).
  2. Order the table by Date in ascending order.
  3. Delete the duplicated rows for each date, since I only need one row per date.

Here is the resulting table for the variables under consideration.

> Data_train[,SumSaleseachDay:=sum(Sales),by=c('Date')][order(Date)][!duplicated(Date)][,-c('Sales','Customers'),with=FALSE]
     Store DayOfWeek       Date Open Promo StateHoliday SchoolHoliday SumSaleseachDay
  1:     1         2 2013-01-01    0     0            a             1           97235
  2:     1         3 2013-01-02    1     0            0             1         6949829
  3:     1         4 2013-01-03    1     0            0             1         6347820
  4:     1         5 2013-01-04    1     0            0             1         6638954
  5:     1         6 2013-01-05    1     0            0             1         5951593
 ---                                                                                 
938:     1         1 2015-07-27    1     1            0             1        10707292
939:     1         2 2015-07-28    1     1            0             1         9115073
940:     1         3 2015-07-29    1     1            0             1         8499962
941:     1         4 2015-07-30    1     1            0             1         8798854
942:     1         5 2015-07-31    1     1            0             1        10109742
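
As a minimal sketch of the three steps above, the same daily totals can also be obtained with a single grouped aggregation. The table `toy` and its values are made up for illustration and are not taken from the real `Data_train`; only the column names follow the question.

```r
library(data.table)

# Toy stand-in for Data_train (hypothetical values: three stores over two days)
toy <- data.table(
  Store = rep(1:3, times = 2),
  Date  = as.Date(rep(c("2013-01-01", "2013-01-02"), each = 3)),
  Sales = c(0, 0, 0, 5530, 4422, 6823),
  Promo = c(0, 0, 0, 0, 1, 0)
)

# One row per date: total sales over all stores; keyby also sorts by Date
daily <- toy[, .(SumSaleseachDay = sum(Sales)), keyby = Date]
print(daily)
```

Unlike the `!duplicated(Date)` approach, this form keeps only `Date` and the daily total. In the output above, the remaining store-level columns (`Store`, `Open`, `Promo`, and so on) appear to come from a single store, Store 1 in the rows shown, so they may not describe the whole day.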

ADDED INFORMATION: The table has 1017209 rows. For each Store, I have its sales history from 2013-01-01 to 2015-07-31. I also have 17 variables available to build the model.

The steps above only lead to a forecast of total sales by day.

If I want to forecast by day for each Store, what should I do?
Thank you in advance!
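
For the per-store case, one possible arrangement (a sketch under assumptions, not a confirmed solution) is to skip the summing step, keep one row per Store and Date, sort by Store and then Date, and build lagged features within each store. The table `toy`, its values, and the column name `Sales_lag1` are hypothetical.

```r
library(data.table)

# Toy stand-in for Data_train (hypothetical values: two stores over three days)
toy <- data.table(
  Store = rep(1:2, each = 3),
  Date  = as.Date(rep(c("2013-01-02", "2013-01-03", "2013-01-04"), times = 2)),
  Sales = c(5530, 4327, 4486, 6064, 5567, 6402)
)

# Keep every (Store, Date) row and sort so each store's history is contiguous
setorder(toy, Store, Date)

# Previous day's sales within each store; the first day of a store has no lag (NA)
toy[, Sales_lag1 := shift(Sales, n = 1L, type = "lag"), by = Store]
print(toy)
```

Grouping with `by = Store` keeps one store's history from leaking into another's lagged values, which is the main difference from the aggregated, date-only table.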

No correct solution

Licensed under: CC-BY-SA with attribution