Question

I have a similar situation to the question asked here. However, I don't want to list my 300 variable names in the var statement since they are all unique. Is there a way to use proc means or proc summary to output summary statistics for all the numeric variables in one data set?

I've tried:

proc means data=my_data min median max;
    output out=summary_data min=min median=median max=max;
run;

But this only outputs the summary statistics for the first variable. I have also tried with the help of ods trace:

proc means data=my_data min median max;
    ods output Summary=summary_data;
run;

Which gives me the summary statistics for all the variables, but still in one row:

VName_VAR1 VAR1_Minimum VAR1_Median VAR1_Maximum VName_VAR2 VAR2_Minimum etc...
VAR1       3            3           3            VAR2       3         

Where my VAR names are all unique. Is there some other way to use proc means or proc summary to output summary statistics for all the numeric variables in one data set?

UPDATE:

When I removed min=min median=median max=max:

proc means data=my_data min median max;
    output out=summary_data;
run;

The code then produces the output:

 Obs  _TYPE_ _FREQ_ _STAT_   VAR_1    VAR_2 ... etc

 1    0      91     N          91.00  91    ... etc
 2    0      91     MIN      2005.00  13         .
 3    0      91     MAX      2014.00  13         .
 4    0      91     MEAN     2009.34  13         .
 5    0      91     STD         3.02   0

However, it still doesn't give me the MEDIAN.

Solution

When I transpose the data before using proc means I get the desired output.

proc sort data=sashelp.cars out=cars;
  by _character_;
run;

proc transpose data=cars out=cars_t;
  var _numeric_;
  by _character_;
run;

proc sort data=cars_t;
  by _name_;
run;

proc means data=cars_t noprint;
  output out=cars_summary(drop = _type_ _freq_) min=min median=median max=max;
  by _name_;
run;

The code then produces the output:

Obs    _NAME_             min     median         max

 1    Cylinders          3.0        6.0        12.0
 2    EngineSize         1.3        3.0         8.3
 3    Horsepower        73.0      210.0       500.0
 4    Invoice         9875.0    25294.5    173560.0
 5    Length           143.0      187.0       238.0
 6    MPG_City          10.0       19.0        60.0
 7    MPG_Highway       12.0       26.0        66.0
 8    MSRP           10280.0    27635.0    192465.0
 9    Weight          1850.0     3474.5      7190.0
10    Wheelbase         89.0      107.0       144.0

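This works only if the BY variables (here, all the character variables) uniquely identify each row of your original data, so that proc transpose produces a single COL1 column for proc means to summarize.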
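If the character variables do not uniquely identify each row, one possible workaround is to create a row identifier first. The following is a minimal sketch of that idea, not part of the original answer: `row_id`, `cars_id`, and the use of a CLASS statement instead of a second sort are assumptions made for illustration. The identifier is stored as a character value so that `var _numeric_` does not pick it up.

```sas
/* Hypothetical variant: add a character row id so every BY group has one row. */
data cars_id;
  set sashelp.cars;
  row_id = put(_n_, z8.);  /* zero-padded, so the data stay sorted by row_id */
run;

/* One output row per (row_id, variable); the value lands in COL1. */
proc transpose data=cars_id out=cars_t;
  by row_id;
  var _numeric_;
run;

/* CLASS avoids a second sort; NWAY keeps only the per-variable rows. */
proc means data=cars_t nway noprint;
  class _name_;
  var col1;
  output out=cars_summary(drop=_type_ _freq_) min=min median=median max=max;
run;
```

The result should have the same layout as the table above: one row per original numeric variable, with min, median, and max columns.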

OTHER TIPS

If you're just after min / median / max, then the following will work without having to name the variables:

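ods output quantiles = quantiles;
proc univariate data = sashelp.cars;
  var _numeric_;
run;

/* Name the dataset explicitly rather than relying on the last one created */
proc sort data = quantiles;
  by varname;
run;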

proc transpose data = quantiles out = quan_tran (drop=_name_ rename=(_100__max = max _50__median = median _0__min = min));
  by varname;
  var estimate;
  id quantile;
  where quantile in: ('100', '50', '0');
run;

If you want other types of measures (mean, std, etc.), proc univariate outputs them in separate datasets, meaning you'd have to merge tables, and it turns into a pain again.

The output datasets from SAS procedures can be puzzlingly bad, with proc means being, for me, the most egregious example.

Why not use the stackods option in the means statement?

ods listing close;
ods output summary=s;
proc means data=my_data stackods min median max;
run;
ods output close;
ods listing;
proc print;
run;

UPDATED

Here is a macro-based solution, with new step-by-step comments added. It uses metadata from the SAS dictionary.columns table to discover all numeric variables in a dataset. Basically, I take the MIN, MEDIAN, and MAX of all the numeric variables, outputting the results to three separate datasets. I then concatenate the datasets, using the IN= data set option to tell which dataset each row comes from and label it with the appropriate statistic name. The output is then three rows, with one column per numeric variable plus the statistic label.

As the OP demonstrated in his answer, the whole macro / meta-data thing to get the numeric variables can all be replaced by simply using the special _NUMERIC_ variable. I will leave my current approach in place in case someone is interested in using it for other things.

Furthermore, the OP's answer is a macro-free solution that uses PROC TRANSPOSE to get to the same place as this one, without any concatenation of separate result sets. I urge all readers to review it as it is more "SAS-like".

%GLOBAL 
    var_names 
    dsn_temp_min
    dsn_temp_median
    dsn_temp_max
; 
%LET dsn_temp_min = min_summary ;
%LET dsn_temp_median= med_summary;
%LET dsn_temp_max= max_summary;

/* Identify dataset */
%LET lib_name = WORK ;  /* change to your library */
%LET dsn = my_data ;

/* Retrieve numeric variable names from SAS metadata and store in `var_names` */
/* macro variable. Library and dataset name must be upper-case since that is */
/* how they are stored in `dictionary.columns`. */
/* UPDATE: this all can be avoided by just using the _NUMERIC_ special variable */
/* but I am leaving this in here in case anyone is interested in querying */
/* meta-data for other purposes. */

%LET lib_name = %UPCASE (&lib_name);
%LET dsn = %UPCASE (&dsn);

PROC SQL NOPRINT;
    SELECT name
    INTO :var_names SEPARATED BY ' '
    FROM dictionary.columns
    WHERE libname = "&lib_name"
    AND memname = "&dsn"
    AND type ^= "char"
;
QUIT;

/* Take the MIN of all numeric variables and store in a separate dataset */
PROC MEANS DATA = &lib_name..&dsn NOPRINT ;
    OUTPUT OUT=&dsn_temp_min (DROP = _TYPE_ _FREQ_)
        MIN (&var_names) = 
    ;
RUN;

/* Take the MEDIAN of all numeric variables and store in a separate dataset */    
PROC MEANS DATA = &lib_name..&dsn NOPRINT ;
    OUTPUT OUT=&dsn_temp_median (DROP = _TYPE_ _FREQ_)
        MEDIAN (&var_names) = 
    ;
RUN;

/* Take the MAX of all numeric variables and store in a separate dataset */        
PROC MEANS DATA = &lib_name..&dsn NOPRINT ;
    OUTPUT OUT=&dsn_temp_max (DROP = _TYPE_ _FREQ_)
        MAX (&var_names) = 
    ;
RUN;


/* Concatenate the three separate datasets into one.  Use IN to figure out */
/* where each row is coming from, and label appropriately */
DATA summary_data;
    LENGTH stat $6 ;

    RETAIN
        stat &var_names
    ;

    SET 
        &dsn_temp_min (IN=s1)
        &dsn_temp_median (IN=s2)
        &dsn_temp_max (IN=s3)
    ;

    IF (s1) THEN DO;
        stat = "MIN" ;
    END;
    ELSE IF (s2) THEN DO;
        stat = "MEDIAN" ;
    END;
    ELSE IF (s3) THEN DO;
        stat = "MAX" ;
    END;

    LABEL stat = "Statistic";
RUN;
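
As noted above, the metadata lookup can be replaced by the special _NUMERIC_ variable list. A minimal sketch of that simplification follows, assuming the dataset is WORK.my_data as in the question. Using one PROC MEANS step with three OUTPUT statements is a choice made here for brevity and is not taken from the original answer. Leaving the right-hand side of `min=` (and the others) empty keeps the original variable names in each output dataset.

```sas
/* Sketch: same three-row result without querying dictionary.columns. */
proc means data=my_data noprint;
  var _numeric_;
  output out=min_summary(drop=_type_ _freq_) min=;
  output out=med_summary(drop=_type_ _freq_) median=;
  output out=max_summary(drop=_type_ _freq_) max=;
run;

/* Stack the three one-row datasets and label each row by its source. */
data summary_data;
  length stat $6;
  set min_summary(in=s1) med_summary(in=s2) max_summary(in=s3);
  if s1 then stat = "MIN";
  else if s2 then stat = "MEDIAN";
  else if s3 then stat = "MAX";
  label stat = "Statistic";
run;
```

Declaring `stat` with LENGTH before the SET statement places it first in the output dataset, so no RETAIN is needed just to order the columns.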
Licensed under: CC-BY-SA with attribution