SAS collapse dates

Question 1

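There's a neat trick to achieve this using the UPDATE statement. The first reference to the existing table (with obs=0) acts as an empty master table with the required structure; the second reference is the transaction table that updates it with the values. By default UPDATE does not overwrite existing values with missing ones, so the non-missing values accumulate within each BY group, and the BY statement ensures only one record is output per BY value. The table needs to be sorted by the BY variables. Hope this makes sense.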

data have;
input cust date v1 v2 v3 v600;
datalines;
1    1    5 . . .
1    2    5 . . .
1    2    . 4 . .
1    2    . . 6 .
2    1    1 . . .
2    1    . 5 . .
2    2    . . . 10
;
run;

data want;
update have (obs=0) have;
by cust date;
run;
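
As a quick check, printing the result should show one row per cust/date combination. The rows in the comment are what I would expect from the sample data above, worked out by hand from how UPDATE treats missing values rather than copied from a log.

```sas
/* Print the collapsed table to confirm there is one row per cust/date */
proc print data=want noobs;
run;

/* Expected rows for the sample data above:
   cust date v1 v2 v3 v600
     1    1   5  .  .    .
     1    2   5  4  6    .
     2    1   1  5  .    .
     2    2   .  .  .   10
*/
```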

Question 2

You can't use RETAIN with the variables coming in from the dataset on the SET statement; or more accurately, you can, but it has no effect: variables read by a SET statement are already retained automatically. They are, however, overwritten on the next iteration of the data step when the SET statement executes again.

You can either use a temporary array to store the retained values and copy them back over when last.date is true (temporary arrays are also retained automatically, FYI), or you can use a different technique entirely: hash tables, SQL, whatever you're most familiar with.
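
A minimal sketch of the temporary-array route, assuming HAVE is sorted by cust and date and that the variables to collapse are the v1 v2 v3 v600 from the sample data in Question 1. The array names vals and held and the index i are just illustrative. Note that this keeps the last non-missing value seen in each group rather than summing.

```sas
data want;
  set have;
  by cust date;
  array vals {*} v1 v2 v3 v600;      /* variables read from HAVE (assumed list) */
  array held {4} _temporary_;        /* retained automatically across iterations */

  /* start each cust/date group with an empty buffer */
  if first.date then call missing(of held{*});

  /* remember every non-missing value seen in this group */
  do i = 1 to dim(vals);
    if not missing(vals{i}) then held{i} = vals{i};
  end;

  /* on the last row of the group, copy the buffer back and write one record */
  if last.date then do;
    do i = 1 to dim(vals);
      vals{i} = held{i};
    end;
    output;
  end;
  drop i;
run;
```

With 600 variables you would want a variable list in the array statement and a matching dimension for the temporary array instead of typing the names out.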

For example, with SQL:

proc sql;
create table want as 
  select cust, date, sum(var1) as var1, sum(var2) as var2, ... 
  from have
  group by cust,date;
quit;

You would want to generate the sum(var1) as var1 terms into a macro variable rather than typing them all out, something like

%macro sumvar(var=);
sum(&var.) as &var.
%mend sumvar;
proc sql;
select cats('%sumvar(var=',name,')') 
  into :sumlist separated by ','
  from dictionary.columns
  where libname='WORK' and memname='HAVE' and not (upcase(name) in ('CUST','DATE'))
;
quit;

and then use that &sumlist. in the sql above.

select cust, date, &sumlist.
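
Put together, the full query would presumably look like the sketch below. It assumes the %sumvar macro and the &sumlist. macro variable have already been created by the step above.

```sas
proc sql;
  create table want as
    /* &sumlist. expands to sum(v1) as v1, sum(v2) as v2, and so on */
    select cust, date, &sumlist.
    from have
    group by cust, date;
quit;
```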

This is probably the easiest to code, though it's likely not as efficient as other options if you have really large data.

Question 3

You could do something like the following, where &list holds the names of the variables to collapse and the data is sorted by cust and date:

proc means data=have noprint;
  by cust date;
  var &list;
  output out=want(drop=_:) sum=;
run;
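
The answer does not show where &list comes from. One way to build it, sketched here as an assumption rather than the original author's method, is to pull the variable names from dictionary.columns before running PROC MEANS, assuming the table is WORK.HAVE and the BY variables are cust and date.

```sas
proc sql noprint;
  /* space-separated list of every variable except the BY variables */
  select name
    into :list separated by ' '
    from dictionary.columns
    where libname='WORK' and memname='HAVE'
      and upcase(name) not in ('CUST','DATE');
quit;

/* show the resulting list in the log */
%put &=list;
```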