Question

My data looks like:

ID      YEAR    A   B
1078    1989    1   0
1078    1999    1   1
1161    1969    0   0
1161    2002    1   1
1230    1995    0   0
1230    2002    0   1
1279    1996    0   0
1279    2003    0   1
1447    1993    1   0
1447    2001    1   1
1487    1967    0   0
1487    2008    1   1
1487    2008    1   0
1487    2009    0   1
1678    1979    1   0
1678    2002    1   1
1690    1989    1   0
1690    1993    0   1
1690    1993    0   0
1690    1996    0   1
1690    1996    0   0
1690    1997    1   1

I'd like to create two dummy variables, new and X, the scenarios are as follows: within each ID-B pair (a pair is 2 observations one with B=0 and the other B=1 with YEAR closet together in sequence)

  1. if the observation with B=1 has a value of 1 for A then new=1 for both observations in that pair, otherwise it is 0 for both observations in that pair, and
  2. if the pair has the same value in A then X=0 and if they have different values then X=1.

Therefore, the output would be:

ID      YEAR    A   B   new X
1078    1989    1   0   1   0
1078    1999    1   1   1   0
1161    1969    0   0   1   1
1161    2002    1   1   1   1
1230    1995    0   0   0   0
1230    2002    0   1   0   0
1279    1996    0   0   0   0
1279    2003    0   1   0   0
1447    1993    1   0   1   1
1447    2001    1   1   1   1
1487    1967    0   0   1   1
1487    2008    1   1   1   1
1487    2008    1   0   0   1
1487    2009    0   1   0   1
1678    1979    1   0   1   0
1678    2002    1   1   1   0
1690    1989    1   0   0   1
1690    1993    0   1   0   1
1690    1993    0   0   0   0
1690    1996    0   1   0   0
1690    1996    0   0   1   1
1690    1997    1   1   1   1

My codes are

data want;
 set have;
 by ID;
 if B=1 and A=1 then new=1;
 else new=0;
run;

proc sql;
    create table out as
    select a.*,max(a.B=a.A & a.B=1) as new,^(min(A)=max(A)) as X
    from have a
    group by ID;quit;

The first one doesn't work and the second one reordered variable B. I am stuck here. Any help will be greatly appreciated.

Was it helpful?

Solution

Very manual solution, I just used the retain statement to identify the pairs (dataset already in the required order).

    data start;
        set start;
        retain pair 0;
        if B=0 then pair=pair+1;
    run;

    data ForNew;
        set start(where=(B=1));
        New=(A=B);  /*Boolean variable=1 if the condition in brackets is true*/
        keep pair New;
    run;
/*if A has equal values mean will be 0 or 1*/
    proc means data=start NWAY NOPRINT;
        class pair;
        var A;
        output out=ForX(drop=_: where=(media in (0,1)) keep=pair media) mean(A)=media;
    run;

    data end;
        merge start ForNew ForX(in=INX drop=media);
        by pair;
        X=(^INX);
    run;

OTHER TIPS

You need to do some research into first./last. processing and the lag function.

The helpful guys here have already gotten you to this point, maybe take this as an opportunity to read the documentation at SAS' Support Site.

At a high level:

  • You need a conditional statement to step through each observation in an ID group
  • Find out how many observations are in that group (let's say N obs)
  • Flag up if any obs match the logic you mentioned
  • Lag back N obs and set your new to 1 or 0 respectively
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top