Question

I trying to figure out how to create a running total, if all we have is the total sum in the .LAST variable. I created a sample dataset that should make it a little clearer what I am after:

DATA SALES_DATA; 
    INPUT REGION_ID STORE_ID YEAR SALES; 
    DATALINES; 
        1   1   2000    .
        1   1   2001    .
        1   1   2002    .
        1   1   2003    40
        1   2   1977    .
        1   2   1978    .
        1   2   1979    .
        1   2   1980    .
        1   2   1981    12
        2   3   1999    .
        2   3   2000    .
        2   3   2001    .
        2   4   2002    17
        3   4   1956    .
        3   4   1957    22

;

so, as you can see we only have data for the last time the store was in business, which includes all the sales from previous years. Assuming sales have been completely linear and added up year over year, how would I tell SAS to grab the STORE_ID.LAST value then divide it by count of years we have data for to put it into the STORE_ID.FIRST's SALES field? Once I figure out how to get the value from the last field into the first I am planning on just running the usual running total (after dividing by the count, that can be created by something like:

DATA SALES; 
    SET SALES; 
    BY REGION_ID STORE_ID; 
    IF FIRST.STORE = 1 THEN 
        COUNT =0;
        COUNT+1; 
run;

So, ideally the final table would start of like:

DATA SALES_DATA; 
INPUT REGION_ID STORE_ID YEAR SALES; 
DATALINES; 
    1   1   2000    10
    1   1   2001    20
    1   1   2002    30
    1   1   2003    40

...

I looked into PROC EXPAND, but I could not get it to work for my case. Any suggestions are greatly welcomed!

Was it helpful?

Solution 2

Just a little improvement of Chris J's code, simply extrapolating in proc sql. Do not forget to give the full credit to him :)

proc sql;
    create table filled(rename=(sales_filled=sales)) as
    select REGION_ID, STORE_ID, YEAR, max(SALES)/(max(YEAR)-min(YEAR)+1)*(YEAR-min(YEAR)+1) as sales_filled
    from sales_data
    group by REGION_ID, STORE_ID
    order by REGION_ID, STORE_ID, Year;
quit;

Note: this works even if the years were not consecutive (e.g. year 2001 is somehow missing) because the sales value are filled based on the "ratio of timespan" instead of "counts".

OTHER TIPS

/* First calculate the sales per year */
proc sql ;
  create table meansales as
  select region_id, store_id, sum(sales) / count(year) as YEARLY_SALES
  from sales_data
  group by region_id, store_id 
  order by region_id, store_id ;
quit ;

/* Merge back into original data */
data yearcalc ;
  merge sales_data
        meansales ;
  by region_id store_id ;
  if first.store_id then n = 0 ;
  n + 1 ;
  NEW_SALES = n * YEARLY_SALES ;

  drop n ;
run ;
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top