Question

I have a dataset which looks like this:

cust date 1 2 3... 600
1    1    5 . . ... .
1    2    5 . . ... .
1    2    . 4 . ... .
1    2    . . 6 ... .
2    1    1 . . ... .
2    1    . 5 . ... .
2    2    . . . ... 10

I want to collapse variables 1 to 600 for each date by customer (cust), so that the dataset looks like this:

cust date 1 2 3... 600
1    1    5 . . ... .
1    2    5 4 6 ... .
2    1    1 5 . ... .
2    2    . . . ... 10

I started with the following code (maybe it's a bit complicated), and it doesn't work:

data want ;
set have;
array vars &list.; *stored array of variables 1-600;
retain count vars;
by cust date;
if first.date then do;
do _i=1 to dim(vars);
vars[_i]=.; 
end;
count=0;
end;
count=count+1;
vars[_1]=vars;
if last.date then do;
output;
end;
drop count;
run;

Do you have any ideas? I also considered proc expand, but it doesn't work either because the dates contain duplicates.

Thanks so much for your help!!


Solution

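There's a neat trick to achieve this using the UPDATE statement. The first reference to the existing table (with obs=0) creates an empty master table with the required structure; the second reference applies every row as a transaction. By default, UPDATE does not overwrite existing values with missing values from the transaction data, so the non-missing values accumulate within each BY group. The BY statement ensures that only one record is output per BY value. Hope this makes sense.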

data have;
input cust date v1 v2 v3 v600;
datalines;
1    1    5 . . .
1    2    5 . . .
1    2    . 4 . .
1    2    . . 6 .
2    1    1 . . .
2    1    . 5 . .
2    2    . . . 10
;
run;

data want;
update have (obs=0) have;
by cust date;
run;
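
UPDATE needs its input sorted by the BY variables, and it relies on the default handling of missing values. As a minimal sketch, assuming the data may not already be in cust/date order, the same step can be written with an explicit sort and the default mode spelled out (have_sorted is a name made up for this illustration):

```sas
/* Assumption: HAVE may not be sorted, so sort it first. */
/* HAVE_SORTED is a hypothetical name used only here.    */
proc sort data=have out=have_sorted;
  by cust date;
run;

data want;
  /* UPDATEMODE=MISSINGCHECK is the default: missing values in the  */
  /* transaction data do not overwrite values already accumulated. */
  update have_sorted(obs=0) have_sorted updatemode=missingcheck;
  by cust date;
run;
```

If a variable has more than one non-missing value within a cust/date group, the last one read is the one that is kept.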

OTHER TIPS

You can't usefully apply RETAIN to variables that come in from the dataset on the SET statement. More accurately, you can write it, but it has no effect: variables read by a SET statement are already retained automatically. They are also, however, overwritten on the next iteration of the data step when the SET statement executes again.

You can either use a temporary array to store the retained values and copy them back into the dataset variables when last.date is true (temporary arrays are also retained automatically, FYI), or you can use a different technique entirely - hash tables, SQL, whatever you're most familiar with.
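
The temporary-array approach could look roughly like this. It is only a sketch, assuming the four sample variables v1, v2, v3 and v600 and input sorted by cust and date; the array name held is made up for the illustration. With the real data you would list all 600 variables (or use &list.) and size the temporary array to match.

```sas
data want;
  set have;
  by cust date;
  /* Assumption: variable names taken from the sample data */
  array vars {*} v1 v2 v3 v600;
  /* HELD is a hypothetical name; its size must be a constant */
  /* equal to the number of elements in VARS                  */
  array held {4} _temporary_;

  /* Start each cust/date group with an empty store */
  if first.date then do _i = 1 to dim(held);
    held[_i] = .;
  end;

  /* Remember every non-missing value seen on this row */
  do _i = 1 to dim(vars);
    if not missing(vars[_i]) then held[_i] = vars[_i];
  end;

  /* On the last row of the group, copy the stored values back and write one record */
  if last.date then do;
    do _i = 1 to dim(vars);
      vars[_i] = held[_i];
    end;
    output;
  end;
  drop _i;
run;
```

Because the values are held in the temporary array rather than in the SET variables, they survive the next read from have.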

For example,

proc sql;
create table want as 
  select cust, date, sum(var1) as var1, sum(var2) as var2, ... 
  from have
  group by cust,date;
quit;

You would want to construct the sum(var1) as var1 in a macro variable, something like

%macro sumvar(var=);
sum(&var.) as &var.
%mend sumvar;
proc sql noprint;
select cats('%sumvar(var=',name,')') 
  into :sumlist separated by ','
  from dictionary.columns
  /* libname and memname are stored in upper case; name keeps its original case */
  where libname='WORK' and memname='HAVE' and not (upcase(name) in ('CUST','DATE'))
;
quit;

and then use that &sumlist. in the sql above.

select cust, date, &sumlist.
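
Put together, the final query would look something like the sketch below, assuming &sumlist. has already been created by the step above:

```sas
proc sql;
  create table want as
    /* &sumlist. expands to sum(v1) as v1, sum(v2) as v2, ... */
    select cust, date, &sumlist.
    from have
    group by cust, date;
quit;
```

Note that SUM adds the values together if a variable has more than one non-missing value within a cust/date group, and returns missing when every value in the group is missing.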

This is probably the easiest to code; it's probably not as efficient as other options if you have really large data.

You could do something like the following (because of the BY statement, have needs to be sorted by cust and date):

proc means data=have noprint;
  by cust date;
  var &list;
  output out=want(drop=_:) sum=;
run;
Licensed under: CC-BY-SA with attribution