Question

I am trying to create categorical variables in SAS. I have written the following macro, but I get an error, "Invalid symbolic variable name xxx", when I try to run it. I am not sure this is even the correct way to accomplish my goal.

Here is my code:

%macro addvars;
proc sql noprint;
select distinct coverageid 
into :coverageid1 - :coverageid9999999
from save.test;

%do i=1 %to &sqlobs;
%let n=coverageid&i;
%let v=%superq(&n);
%let f=coverageid_&v;
%put &f;
data save.test;
 set save.test;
%if coverageid eq %superq(&v)
  %then &f=1;
  %else &f=0;
run;
%end; 
%mend addvars;
%addvars;

Solution

You're combining macro code with data step code in a way that isn't correct. %if is macro language, so you are actually evaluating whether the text "coverageid" is equal to the text that %superq(&v) resolves to, not whether the value of the coverageid variable equals the value in &v. The error message most likely comes from that same line: %superq expects the name of a macro variable, so %superq(&v) looks for a macro variable named after the coverage ID value itself, and a value such as 123 is not a valid macro variable name. You could just convert %if to if, but even if you got that to work properly it would be hideously inefficient: you're rewriting the dataset N times, so if you have 1500 values for coverageid, you rewrite the entire 500MB dataset (or whatever size it is) 1500 times instead of just once.
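For comparison with the original macro, here is a minimal sketch of the "convert %if to if" fix done in a single pass over the data. It assumes coverageid is a character variable whose values contain no quotes, & or %; the sample data set test, the macro variables cov1, cov2, ..., and the parameters ds= and out= are hypothetical. The %do loop only writes one assignment statement per distinct value; the comparison itself runs inside the data step on each row. The new variables are named by position (coverageid_1, coverageid_2, ...) rather than by value, which also avoids values that are not valid in a variable name.

```sas
/* Hypothetical sample data in WORK; coverageid is assumed to be character */
data test;
  input coverageid $ @@;
  datalines;
A10 B20 A10 C30
;
run;

%macro addvars(ds=test, out=test_fin);
  %local i ncov;

  /* Read the distinct values once, into macro variables cov1, cov2, ... */
  proc sql noprint;
    select distinct coverageid
      into :cov1 - :cov9999
      from &ds;
  quit;
  %let ncov = &sqlobs;

  /* A single pass over the data: the macro %DO loop only generates the
     statements, and each generated statement tests the value of the
     coverageid variable on every row (data step logic, not %IF) */
  data &out;
    set &ds;
    %do i = 1 %to &ncov;
      coverageid_&i = (coverageid = "&&cov&i");
    %end;
  run;
%mend addvars;

%addvars(ds=test, out=test_fin)
```

Each row of test_fin ends up with exactly one of coverageid_1 to coverageid_3 set to 1.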

If what you want is to take the variable coverageid and convert it into a set of 1/0 indicator variables, one for each possible value of coverageid, there are a number of ways to do it. I'm fairly sure the ETS module has a procedure that does exactly this, but I don't recall it off the top of my head; if you posted this to the SAS mailing list, one of the guys there would undoubtedly come up with it quickly.

The simplest way, for me, is to do this entirely in data step code. First determine how many distinct values COVERAGEID takes, then map each one to a consecutive number, then use that number to set the correct variable.

If the COVERAGEID values are consecutive (i.e., 1 to some number with no gaps, or you don't mind extra all-zero variables for the gaps), this is easy: set up an array and iterate over it. I will assume they are NOT consecutive.
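A minimal sketch of that easy case, assuming coverageid is numeric and runs from 1 to 4 with no gaps; the sample data, the macro variable ncov and the array name cov are illustrative only:

```sas
/* Hypothetical data: coverageid already runs 1..4 with no gaps */
data test;
  input coverageid @@;
  datalines;
2 4 1 3 2
;
run;

/* Assumed known here; in practice you could compute it, e.g. max(coverageid) in PROC SQL */
%let ncov = 4;

data want;
  set test;
  /* One 1/0 indicator variable per possible value */
  array cov{&ncov} coverageid1-coverageid&ncov;
  do _t = 1 to &ncov;
    cov{_t} = (coverageid = _t);
  end;
  drop _t;
run;
```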

*First, get the distinct values of coverageid.  There are a dozen ways to do this.  This one works as well as any;
proc freq data=save.test;
tables coverageid/out=coverage_values(keep=coverageid);
run;

*Then save them into an informat.  This maps each value to a consecutive number (the lowest value becomes 1, the next lowest 2, and so on).  This is useful here, and the mapping may also come in handy later for converting back.;

data coverage_values_fmt;
set coverage_values;
start=coverageid;
label=_n_;
fmtname='COVERAGEF';
type='i';
call symputx('CoverageCount',_n_);
run;
*Build the informat from the control data set;
proc format cntlin=coverage_values_fmt;
quit;

*Now use the created informat.  If you already had consecutive values, you could skip straight to this step and drop the INPUT function - just use the value itself;
data save.test_fin;
set save.test;
array coverageids coverageid1-coverageid&coveragecount.;
do _t = 1 to &coveragecount.;
  if input(coverageid,COVERAGEF.) = _t then coverageids[_t]=1;
  else coverageids[_t]=0;
end;
drop _t;
run;

OTHER TIPS

Here's another way that doesn't use formats, and may be easier to follow.

First, just make some test data:

data test;
    input coverageid @@;
    cards;
3 27 99 105
;
run;

Next, create a data set with no observations but one variable for each level of coverageid. Note that this approach allows arbitrary values here.

proc transpose data=test out=wide(drop=_name_);
    id coverageid;
run;

Finally, create a new data set that combines the initial data set and the wide one. Then, for each observation, look at each indicator variable and decide whether to turn it "on".

data want;
    set test wide;
    array vars{*} _:;
    do i=1 to dim(vars);
        vars{i} = (coverageid = substr(vname(vars{i}),2));
    end;
    drop i;
run;

The line

vars{i} = (coverageid = substr(vname(vars{i}),2));

may require more explanation. vname returns the name of the variable, and since we didn't specify a prefix in proc transpose, the variables are named after the coverageid values with a leading underscore (_3, _27, _99 and _105 for the test data). So we take the substring of the variable name starting at the second position and compare it to coverageid (SAS converts the character substring to a number automatically, with a note in the log); if they're the same, the comparison evaluates to 1, otherwise to 0.
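A variant of the same idea, not part of the original answer: giving proc transpose a PREFIX= makes the new names more readable (cov_3, cov_27, ...), and converting the substring with INPUT makes the character-to-numeric comparison explicit instead of relying on automatic conversion. The prefix cov_ is an arbitrary choice; if you change it, change the start position in substr (length of the prefix plus 1) to match.

```sas
data test;
  input coverageid @@;
  datalines;
3 27 99 105
;
run;

/* PREFIX= replaces the default leading underscore: cov_3, cov_27, cov_99, cov_105 */
proc transpose data=test out=wide(drop=_name_) prefix=cov_;
  id coverageid;
run;

data want;
  set test wide;
  array vars{*} cov_:;
  do i = 1 to dim(vars);
    /* Skip the 4-character prefix "cov_" and read the rest as a number */
    vars{i} = (coverageid = input(substr(vname(vars{i}), 5), best32.));
  end;
  drop i;
run;
```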

Licensed under: CC-BY-SA with attribution