Question

I have a dataset with from and to dates of registration for a group of users. I would like to programmatically find which months lie in between those dates for each user, without having to hard code in any months, etc. I only want a summary of numbers registered in each month, so if that makes it quicker, so much the better.

E.g. I have something like

User-+-From-------+-To-----------------
A    + 11JAN2011  + 15MAR2011
A    + 16JUN2011  + 17AUG2011
B    + 10FEB2011  + 12FEB2011
C    + 01AUG2011  + 05AUG2011

And I want something like

Month---+-Registrations
JAN2011 + 1 (A)
FEB2011 + 2 (AB)
MAR2011 + 1 (A)
APR2011 + 0
MAY2011 + 0
JUN2011 + 1 (A)
JUL2011 + 1 (A)
AUG2011 + 2 (AC)

Note I don't need the bit in brackets; that was just to try and clarify my point.

Thanks for any help.

Solution

One easy way is to build an intermediate dataset with one row per user per month covered, and then run PROC FREQ on it.

data have;
informat from to DATE9.;
format from to DATE9.;
input user $ from to;
datalines;
A     11JAN2011   15MAR2011
A     16JUN2011   17AUG2011
B     10FEB2011   12FEB2011
C     01AUG2011   05AUG2011
;;;;
run;

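data int;
  set have;
  /* Number of month boundaries crossed between from and to (0 = same month). */
  /* 'd' (discrete) counts each 1st of the month as the start of a new month. */
  _mths = intck('month', from, to, 'd');
  do _i = 0 to _mths;                       /* start with the from month, then step forward */
    month = intnx('month', from, _i, 'b');  /* first day of the month _i months after from */
    output;                                 /* one row per user per month covered */
  end;
  format month MONYY7.;
run;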

proc freq data=int;
  tables month / out=want(keep=month count rename=(count=registrations));
run;
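
Note that PROC FREQ only lists months that actually occur in int, so months with no registrations (APR2011 and MAY2011 in the example) will not appear in want. If the zero rows are needed, one possible approach is to build a calendar of every month in the range and merge the counts onto it. This is only a sketch: the dataset names range, all_months and want_full and the variables first_m and last_m are made up for illustration, and it assumes int and want exist as created above.

```sas
/* Find the first and last month present in the intermediate dataset. */
proc sql;
  create table range as
  select min(month) as first_m, max(month) as last_m
  from int;
quit;

/* Generate one row for every month from first_m to last_m. */
data all_months;
  set range;
  do _i = 0 to intck('month', first_m, last_m);
    month = intnx('month', first_m, _i, 'b');
    output;
  end;
  format month MONYY7.;
  keep month;
run;

/* Attach the counts; months missing from want get 0. */
/* Both datasets are already in ascending month order. */
data want_full;
  merge all_months want;
  by month;
  if missing(registrations) then registrations = 0;
run;
```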

You can eliminate the separate _mths assignment by calling INTCK directly in the DO loop's upper bound.
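
As a sketch of that shorter form, assuming the same have dataset as above: SAS evaluates the loop's stop value once, before the first iteration, so INTCK is not recomputed on every pass. The drop statement is an optional addition, not part of the original answer.

```sas
data int;
  set have;
  /* Upper bound computed inline; 'd' (discrete) is INTCK's default method. */
  do _i = 0 to intck('month', from, to);
    month = intnx('month', from, _i, 'b');  /* first day of each covered month */
    output;
  end;
  format month MONYY7.;
  drop _i;  /* optional: remove the loop counter from the output */
run;
```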

Licensed under: CC-BY-SA with attribution