Question

I have a dataset with over 250 variables. I've created several new variables that need to be placed in specific positions (e.g., as the 37th variable, or the 113th variable), but they are being added to the very end of the list.

I've researched the retain statement, but it requires me to list ALL the variables in the order I want. Can anyone suggest a shortcut to this? Here's some code:

data &CRF._1 (drop= studyParticipantCode                
                formid 
                participantID
                formStatusID 
                contactItemID
                lastTab
                phaseID  
                notCompleted 
                notCompletedReasonID 
                notCompletedReasonOther);
retain patid cycleID OwnerTypeID &Qn._MM -- &Qn._YYYY &Qn._MDY &Qn._INTERVIEWER -- &Qn._TIMEENDED &Qn._TIMETOTAL
        &Qn._1 -- &Qn._12AYYYY &Qn._12MDY &Qn._13 -- &Qn._13AYYYY &Qn._13MDY &Qn._14 -- &Qn._14AYYYY &Qn._14MDY
        &Qn._14b1 -- &Qn._15AYYYY &Qn._15aMDY &Qn._15B -- &Qn._15BYYYY &Qn._15bMDY &Qn._15C -- &Qn._15CYYYY &Qn._15cMDY
        &Qn._15D -- &Qn._15DYYYY &Qn._15dMDY  &Qn._16 -- &Qn._31A3YYYY &Qn._31aMDY &Qn._31A4A -- &Qn._31B3YYYY
        &Qn._31bMDY &Qn._31B4A -- &Qn._31C3YYYY &Qn._31cMDY &Qn._31C4A -- &Qn._31D3YYYY &Qn._31dMDY &Qn._31D4A -- &Qn._31E3YYYY
        &Qn._31eMDY &Qn._31E4A -- &Qn._31F3YYYY &Qn._31fMDY &Qn._31F4A -- &Qn._31G3YYYY &Qn._31gMDY &Qn._31G4A -- &Qn._31H3YYYY
        &Qn._31hMDY &Qn._31H4A -- &Qn._31I3YYYY &Qn._31iMDY;
set &CRF.;
Site            = substr(patid,6,4);
Sitecycle       = strip(Site)||strip(put(&byvar.,5.));
%inc labels;
%inc formats;

I tried the varN -- varM range notation because there may be anywhere from 3 to 20 variables between the two, and I don't want to type them all out (I will be repeating this for multiple datasets). Here is the error I'm getting:

ERROR: Variable Q11_MM cannot be found on the list of previously defined variables.
ERROR: Variable Q11_INTERVIEWER cannot be found on the list of previously defined variables.
ERROR: Variable Q11_1 cannot be found on the list of previously defined variables.
etc.

Any help would be greatly appreciated.

-Brandon


Solution

You can't use the -- (double-dash) notation here. The RETAIN trick only works because the RETAIN statement is compiled before the SET statement brings the dataset's variables into the PDV; SAS places variables in the PDV in the order it first encounters them, and once a variable has a position you can't change it. Double-dash notation, however, resolves a range from variables that are already in the PDV, so the two concepts (reordering variables with RETAIN, and double-dash ranges) conflict. That is what the "cannot be found on the list of previously defined variables" error is telling you.
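A minimal sketch of the conflict, assuming a toy dataset named HAVE with variables a, b and c (all names are hypothetical). Explicit names in RETAIN work because RETAIN itself puts them into the PDV; a name range does not, because its endpoints are not defined yet.

```sas
/* Hypothetical toy dataset: variables created in the order a, b, c */
data have;
  a = 1; b = 2; c = 3;
run;

/* Works: RETAIN places c and a in the PDV first, SET then appends b */
data ok;
  retain c a;
  set have;
run;
/* Variable order in OK: c a b */

/* Fails: a--b is a range over variables that are not in the PDV yet, */
/* so this step stops with the same kind of error shown in the question */
data fails;
  retain a--b;
  set have;
run;
```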

There isn't a great solution to what you're trying to do entirely within SAS. The simplest approach is to use PROC CONTENTS output, or similarly DICTIONARY.COLUMNS in PROC SQL, to get the list of existing variables; but you will still have to insert your new variables into that list somehow.
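One way to do that insertion for a single new variable is to split the existing list at the target position using DICTIONARY.COLUMNS. This is a sketch under stated assumptions: the input is WORK.HAVE, the new variable is Site (built as in the question's code), the target position is 37, and HAVE has at least 37 variables; all of these names and numbers are hypothetical placeholders.

```sas
/* Hypothetical: WORK.HAVE is the input, Site the new variable, 37 the target */
%let pos = 37;
%let before = ;   /* initialise so an empty SELECT leaves a defined value */
%let after  = ;

proc sql noprint;
  /* Existing variables in positions 1 .. pos-1 (names stored in uppercase) */
  select name into :before separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE' and varnum < &pos.
    order by varnum;
  /* Existing variables in positions pos and later */
  select name into :after separated by ' '
    from dictionary.columns
    where libname = 'WORK' and memname = 'HAVE' and varnum >= &pos.
    order by varnum;
quit;

data want;
  retain &before. Site &after.;  /* Site lands between the two lists */
  length Site $4;                /* after RETAIN, so Site's position is already fixed */
  set have;
  Site = substr(patid, 6, 4);
run;
```

To place several new variables (say at positions 37 and 113), split the list at each target position in the same way, remembering that every insertion shifts the later variables down by one.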

The solution I recommend is to create an Excel spreadsheet (or a CSV or similar) that lists your variables in the order you want them. You can produce it initially from PROC CONTENTS with the VARNUM option, which lists the variables in their current order in the dataset (rather than alphabetically).
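A sketch of producing that starting spreadsheet, assuming the dataset is WORK.HAVE and the file name mydatadictionary.xlsx (both hypothetical). It uses the OUT= data set of PROC CONTENTS, which carries a VARNUM column, and adds the var_order and active columns used by the import step. DBMS=XLSX assumes SAS/ACCESS Interface to PC Files is available; DBMS=CSV with a .csv file avoids that requirement.

```sas
/* Hypothetical names: WORK.HAVE and mydatadictionary.xlsx */
proc contents data=have out=varinfo(keep=name varnum) noprint;
run;

/* The OUT= data set is not guaranteed to be in creation order, so sort by VARNUM */
proc sort data=varinfo;
  by varnum;
run;

data datadict_start;
  set varinfo;
  var_order = varnum;  /* edit this column in the spreadsheet to reorder */
  active    = 1;       /* set to 0 or blank to leave a variable out of the list */
  drop varnum;
run;

proc export data=datadict_start outfile="mydatadictionary.xlsx" dbms=xlsx replace;
run;
```

You would then add rows for the new variables and renumber var_order in the spreadsheet before importing it.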

Then import that spreadsheet, build a space-separated list of names from it, and use that list in the RETAIN statement:

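/* Read the data dictionary; the DBMS= value depends on your setup */
proc import datafile="mydatadictionary.xlsx" out=datadict dbms=excel replace;
run;

/* Build a space-separated list of active variable names in the desired order */
proc sql noprint;
  select name into :orderlist separated by ' '
    from datadict
    where active=1
    order by var_order;
quit;

/* RETAIN before SET fixes the variable positions in the PDV */
data want;
  retain &orderlist.;
  set have;
run;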

The above assumes that your data dictionary spreadsheet (that's what I call it; mine also contains information about the variables' formats, etc.) has the columns name (the variable name), var_order (its position in the dataset), and active (1 for active variables; 0 or blank for variables that are no longer in use).
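To try the approach without a spreadsheet, here is a self-contained sketch in which a DATALINES step stands in for the imported data dictionary. The dataset HAVE, its values and the variable oldvar are hypothetical; Site is built as in the question. Note that a variable left out of the RETAIN list (oldvar, which is inactive here) is not dropped: SET simply appends it after the listed variables.

```sas
/* Hypothetical sample input standing in for the real dataset */
data have;
  length patid $12;
  patid   = 'ABCDE0001XYZ';
  cycleID = 1;
  oldvar  = 99;
run;

/* Stand-in for the imported spreadsheet: name, var_order, active */
data datadict;
  length name $32;
  input name $ var_order active;
  datalines;
cycleID 1 1
Site    2 1
patid   3 1
oldvar  4 .
;
run;

proc sql noprint;
  select name into :orderlist separated by ' '
    from datadict
    where active = 1
    order by var_order;
quit;

%put orderlist=&orderlist.;

data want;
  retain &orderlist.;
  length Site $4;               /* after RETAIN, so Site keeps position 2 */
  set have;
  Site = substr(patid, 6, 4);   /* '0001' for the sample row */
run;
/* Variable order in WANT: cycleID Site patid oldvar */
```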

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow