Question

I am trying to identify the IDs that had three or more services performed within a 90-day period. I have these columns: service date, ID, service, and other types of demographic info. Could you please help me with this?

Thank you!


Solution

Sketch of one possible solution:

  • Sort your dataset by ID and date
  • For each ID, work through all services in sequence, using LAG / RETAIN / a DOW-loop to compare the dates of the two previous services to the current one.
  • If both differences are less than 90 days, then output that ID.

This might give you some IDs more than once, but you can easily get rid of any duplicates with a second pass, or by skipping to the next ID if the current one is output.
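As a rough sketch of the LAG route from the outline, assuming the HAVE dataset created further down in this answer: once the records are sorted by ID and date, the two previous services are both within the window exactly when the service two records back is, so a single LAG2 comparison is enough. The dataset names SORTED and HITS and the counter N_IN_ID are my own, chosen for illustration. I've used a strict "< 90" as in the outline; change it to "<= 90" if the window should be inclusive.

```sas
/* Sketch only: assumes HAVE exists with ID and SERVICE_DATE (a SAS date) */
proc sort data=have out=sorted;
  by id service_date;
run;

data hits;
  set sorted;
  by id;
  /* LAG2 must run on every record; it looks back across ID boundaries */
  prev2_date = lag2(service_date);
  if first.id then n_in_id = 0;
  n_in_id + 1;  /* sum statement, so N_IN_ID is retained automatically */
  /* From the 3rd record of an ID onward, PREV2_DATE belongs to the same ID */
  if n_in_id >= 3 and service_date - prev2_date < 90 then output;
  keep id;
run;

/* Second pass: keep each qualifying ID only once */
proc sort data=hits nodupkey;
  by id;
run;
```

With the example data, ID 1 is flagged twice in HITS (at its 3rd and 4th services) and the NODUPKEY sort collapses that to a single row; ID 2 is never flagged.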

Here's a stab at this using a DOW-loop. I think it does what you want for the example data I've used below; let me know if you find any cases where it doesn't work as expected.

data have;
  format service_date date9.;
  informat id 8. service_date date9. service $1.;
  input id service_date service;
  datalines;
1 01jan2013 a
1 01feb2013 b
1 14feb2013 c
1 15feb2013 d
2 01mar2013 a
2 01mar2013 a
2 01oct2013 a
2 01oct2013 a
;
run;

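proc sort data=have;
  by ID service_date;
run;

data want;
  array dates[3] _temporary_;  /* ring buffer: last 3 service dates of the current ID */
  found = 0;                   /* one DATA step iteration per ID, so this resets per ID */
  do _n_ = 1 by 1 until (last.ID);
    set have;
    by ID;
    dates[mod(_n_,3)+1] = service_date;
    if _n_ >= 3 and not found then do;
      /* is the earlier of the 2 previous dates inside the 3-month window? */
      if intnx('month', dates[mod(_n_,3)+1], -3) <= min(dates[mod(_n_-1,3)+1], dates[mod(_n_-2,3)+1]) then do;
        output;     /* the explicit OUTPUT also suppresses the implicit one at the end */
        found = 1;  /* read past the rest of this ID; DELETE here would restart the loop mid-ID */
      end;
    end;
  end;
  drop found;
run;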

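I seem to have inadvertently set this up to spot instances where you have 3 or more services within a 3-month period rather than a 90-day period. Strictly, it is a bit more than 3 months: intnx('month', date, -3) uses its default 'beginning' alignment and returns the first day of the month three months earlier (01NOV2012 for 15FEB2013). This can easily be changed: SAS dates are stored as day counts, so you can subtract the dates directly and compare the result with 90.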
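A sketch of that 90-day change, assuming HAVE is already sorted by ID and service_date: the INTNX month arithmetic is replaced by a plain day difference between the current date and the date two records back (the oldest of the three when sorted). The WANT90 name and the FOUND flag are my own additions; FOUND writes only the first qualifying record per ID, so each ID appears at most once. As above, I've assumed "less than 90 days"; use "<= 90" for an inclusive window.

```sas
/* Sketch only: fixed 90-day window, assumes HAVE is sorted by ID and SERVICE_DATE */
data want90;
  array dates[3] _temporary_;  /* ring buffer of the last 3 service dates */
  found = 0;                   /* has this ID already been output? */
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
    dates[mod(_n_,3)+1] = service_date;
    /* slot mod(_n_-2,3)+1 holds the date from two records back */
    if _n_ >= 3 and not found then do;
      if dates[mod(_n_,3)+1] - dates[mod(_n_-2,3)+1] < 90 then do;
        found = 1;
        output;
      end;
    end;
  end;
  drop found;
run;
```

On the example data this keeps a single row for ID 1 (the 14FEB2013 service, 44 days after 01JAN2013) and nothing for ID 2, whose services are 214 days apart.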

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow