Question

I need to calculate the correlation and covariance of my daily sales values over an event window. The event window is 45 days long, and my data looks like this:

store_id    date       sales
5927    12-Jan-07    3,714.00
5927    12-Jan-07    3,259.00
5927    14-Jan-07    3,787.00
5927    14-Jan-07    3,480.00
5927    17-Jan-07    3,646.00
5927    17-Jan-07    3,316.00
4978    18-Jan-07    3,530.00
4978    18-Jan-07    3,103.00
4978    18-Jan-07    3,026.00
4978    21-Jan-07    3,448.00

Now, for every store_id/date combination, I need to go back 45 days (my original data set has more data for each combination) and calculate the correlation between sales and lag(sales), i.e. the lag-1 autocorrelation. As you can see, the date column is not continuous, so something like (date - 45) is not going to work.

I have gotten this far:

data ds1;
  set ds;
  by store_id;
  LAG_SALE = lag(sales);
      IF FIRST.store_id THEN DO;
      LAG_SALE = .;
      END;
run;

To calculate the correlations and covariances:

proc corr data=ds1 outp=Corr cov;   /* cov: include covariances */
  by store_id date;
  var sales lag_sale;
run;

But how do I apply the event window for each store_id/date combination? My final output should look something like this:

id    date     corr cov
5927 12-Jan-07 ... ...
5927 14-Jan-07 ... ...

Solution

Here is what I've come up with:

First I convert the date to a SAS date, which is the number of days since Jan. 1 1960:

data ds;
    set ds (rename=(date=old_date));
    date = input(old_date, date11.);
    drop old_date;
run;

Then compute lag_sale. I am using the same calculation you used in the question, but make sure it is what you want: lag() returns the value from the previous observation, so for some rows the lagged sale comes from the previous recorded date, while for others it comes from another observation with the same store_id and date.

proc sort data=ds; by store_id; run;

data ds;
    set ds;
    by store_id;
    lag_sale = lag(sales);
    if first.store_id then lag_sale = .;
run;
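
If what you actually want is the lag from the previous recorded date rather than the previous row, one option is to collapse the data to one row per store_id and date before taking the lag. The following is only a sketch under assumptions that are not in the question: sales is numeric, summing is the right way to combine same-day rows, and the data set name daily is made up for illustration.

```sas
/* Hypothetical alternative: one row per store_id/date before lagging.   */
/* Assumes sales is numeric and that same-day rows should be summed.     */
proc sql;
    create table daily as
    select store_id, date, sum(sales) as sales
    from ds
    group by store_id, date
    order by store_id, date;
quit;

data daily;
    set daily;
    by store_id;
    lag_sale = lag(sales);                 /* previous recorded date's sales */
    if first.store_id then lag_sale = .;   /* no lag across stores */
run;
```

If you go this way, the remaining steps would read from daily instead of ds.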

Then set up the final data set:

data final;
    length store_id 8 date 8 cov 8 corr 8;
    if _n_ = 0;
run;

Then create a macro which takes a store_id and a date. The first step keeps only the observations for that store_id that fall within the 45 days up to and including the date. The second step runs proc corr on that subset. The last steps reshape the proc corr output into the layout you want and append the result to the "final" data set.

%macro corr(store_id, date);
data ds2;
    set ds;
    where store_id = &store_id and %eval(&date-45) <= date <=&date 
        and lag_sale ne .;
run;

proc corr noprint data=ds2 cov outp=corr;
    by store_id;
    var sales lag_sale;
run;

data corr2;
    set corr;
    where _type_ in ('CORR', 'COV') and _name_ = 'sales';
    retain cov;
    date = &date;
    if _type_ = 'COV' then cov = lag_sale;
    else do;
        corr = lag_sale;
        output;
    end;
    keep store_id date corr cov;
run;

proc append base=final data=corr2 force; run;

%mend corr;

Finally run the macro for each store_id/date combination.

proc sort data=ds out=ds3 nodupkey;
    by store_id date;
run;

data _null_;
    set ds3;
    /* cats() strips the blanks that numeric-to-character conversion adds */
    call execute(cats('%corr(', store_id, ',', date, ');'));
run;

proc sort data=final;
    by store_id date;
run;
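
If there are many store_id/date combinations, calling the macro once per combination can be slow. As a rough sketch of an alternative, the windows can be built with a single self-join and passed to one proc corr call with a by statement. This assumes ds already has the numeric date and lag_sale computed above; the names windows, anchor_date, corr_all and final_sql are made up for illustration, and the join repeats each observation once for every window it falls into, so the intermediate table can get large.

```sas
/* Sketch: pair every distinct store_id/date with the rows of the same   */
/* store that fall in the 45 days up to and including that date.         */
proc sql;
    create table windows as
    select a.store_id,
           a.date as anchor_date format=date9.,
           b.sales,
           b.lag_sale
    from (select distinct store_id, date from ds) as a
         inner join ds as b
           on  a.store_id = b.store_id
           and b.date between a.date - 45 and a.date
    where b.lag_sale is not missing
    order by a.store_id, a.date;
quit;

/* One proc corr call; each by group is one 45-day window. */
proc corr noprint data=windows cov outp=corr_all;
    by store_id anchor_date;
    var sales lag_sale;
run;

/* Same reshaping as in the macro: within each by group the COV rows     */
/* come before the CORR rows, so cov is retained until corr is read.     */
data final_sql;
    set corr_all;
    where _type_ in ('CORR', 'COV') and _name_ = 'sales';
    retain cov;
    if _type_ = 'COV' then cov = lag_sale;
    else do;
        corr = lag_sale;
        output;
    end;
    keep store_id anchor_date corr cov;
run;
```

The result should have one row per store_id and anchor_date, which you can compare against the "final" data set from the macro approach on a small sample before relying on it.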
Licensed under: CC-BY-SA with attribution