Question

I have a dataset as follows:

1 16.60 4923 1198 29663 1927 Davis California
2 10.09 3055 883 30282 1989 Palo Alto California 
3 9.61 5128 1096 53388 2489 Boulder Colorado 
4 8.85 4674 1210 52815 2600 Berkeley California 
5 7.28 4793 1051 65794 3408 Eugene Oregon 
6 6.64 5112 1215 76972 3579 Fort Collins Colorado 
7 6.59 3125 1608 47451 2701 Santa Barbara California 
8 6.41 2433 1078 37968 2671 Chico California 

Here the variables are bike_share, total_biker, error_biker, total_worker, error_worker, city, and state, respectively (the leading number is just the row number). For this dataset, I want to calculate the percentage of Californians who bike to work; that is, I want to sum total_biker over the rows where state is California and divide that by the sum of total_worker over the same rows.

First I want to identify the people who are Californians, so I wrote the following DO loop to add up total_bike for the rows from California:

total_bike_cali=0;
do i=1 to 8;
if state="California" then total_bike_cali=total_bike_cali+total_bike;
else total_bike_cali=0;
end;
proc print; run;

However, this gave me the following error for almost every line of the loop:

 Statement is not valid or it is used out of proper order.

Can someone please tell me what I did wrong?


Solution

You can do this a number of ways. Here are two.

proc sql;
select sum(total_biker)/sum(total_worker) 
  from your_data
  where state='California';
quit;
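If you want every state rather than just California, the same query works with GROUP BY in place of the WHERE clause. Note also that the query above returns a proportion (about 0.092 for California with this data), while bike_share is a percent, so this sketch multiplies by 100. It is self-contained: COMMUTE, STATE_PCT, and bike_pct are hypothetical names, and the input keeps only the three columns the calculation needs.

```sas
/* Hypothetical minimal input: state and the two counts from the question's data */
data commute;
  length state $20;
  input state $ total_biker total_worker;
  datalines;
California 4923 29663
California 3055 30282
Colorado 5128 53388
California 4674 52815
Oregon 4793 65794
Colorado 5112 76972
California 3125 47451
California 2433 37968
;
run;

/* One row per state: sum of bikers over sum of workers, as a percent */
proc sql;
  create table state_pct as
  select state,
         100 * sum(total_biker) / sum(total_worker) as bike_pct format=6.2
    from commute
    group by state;
quit;

proc print data=state_pct noobs;
run;
```

Computing the ratio of the sums (rather than averaging each city's bike_share) weights every city by its number of workers, which is what "percent of Californians" asks for.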

or

data have;
length city state $20;
/* Comma-separated values, so that two-word city names such as Palo Alto
   are read as one value instead of being split at the blank */
infile datalines dlm=',';
input bike_share total_biker error_biker total_worker error_worker city $ state $;
datalines;
16.60,4923,1198,29663,1927,Davis,California
10.09,3055,883,30282,1989,Palo Alto,California
9.61,5128,1096,53388,2489,Boulder,Colorado
8.85,4674,1210,52815,2600,Berkeley,California
7.28,4793,1051,65794,3408,Eugene,Oregon
6.64,5112,1215,76972,3579,Fort Collins,Colorado
6.59,3125,1608,47451,2701,Santa Barbara,California
6.41,2433,1078,37968,2671,Chico,California
;;;;
run;

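data want;
set have end=eof;
if state='California' then do;
  /* sum statements: the totals are retained across rows and start at 0 */
  total_worker_cali+total_worker;
  total_biker_cali+total_biker;
  put _all_; /* optional: writes the running totals to the log */
end;
if eof then do;
  /* after the last row: a proportion (multiply by 100 for a percent) */
  total_cali_pct = total_biker_cali/total_worker_cali;
  output;
end;
run;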
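As for the error in the question: assignment, DO, and IF statements are only valid inside a DATA step, so when they are submitted on their own, SAS rejects each one with "Statement is not valid or it is used out of proper order." A DATA step already reads the dataset one row per iteration, so the DO i=1 to 8 loop is unnecessary (inside a DATA step it would add the current row's value 8 times), and the ELSE branch would throw away the running total whenever a non-California row is read. This sketch keeps the question's x = x + y style and the name total_bike_cali; the RETAIN statement is what makes that style work. COMMUTE, CALI_TOTAL, total_worker_cali, and pct_cali are hypothetical names.

```sas
/* Hypothetical minimal input: state and the two counts from the question's data */
data commute;
  length state $20;
  input state $ total_biker total_worker;
  datalines;
California 4923 29663
California 3055 30282
Colorado 5128 53388
California 4674 52815
Oregon 4793 65794
Colorado 5112 76972
California 3125 47451
California 2433 37968
;
run;

data cali_total;
  set commute end=eof;            /* the DATA step itself loops over the rows */
  /* Without RETAIN, variables assigned in the step are reset to missing
     at the start of each row; RETAIN keeps them and starts them at 0 */
  retain total_bike_cali 0 total_worker_cali 0;
  if state = 'California' then do;
    total_bike_cali   = total_bike_cali   + total_biker;
    total_worker_cali = total_worker_cali + total_worker;
  end;                            /* no ELSE: other states leave the totals unchanged */
  if eof then do;                 /* after the last row, compute the percent once */
    pct_cali = 100 * total_bike_cali / total_worker_cali;
    output;
  end;
  keep total_bike_cali total_worker_cali pct_cali;
run;

proc print data=cali_total noobs;
run;
```

With the question's data this gives 18210 bikers out of 198179 workers, about 9.19%.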

You could also set the data up so that PROC MEANS or PROC TABULATE produces the table for you; that is the better approach if you want a result for each state, not just one specific state.

Here's an example with PROC MEANS. It is slightly less accurate, because bike_share is rounded to two decimals; recalculating the share from total_biker and total_worker gives a more accurate result.

proc means data=have;
class state;
weight total_worker;
var bike_share;
run;

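Here total_worker is used as the weight, which makes the summarized dataset behave like an unsummarized one with one row per worker. The weighted mean of bike_share for each state is therefore approximately 100 * sum(total_biker) / sum(total_worker), i.e. the PROC SQL result expressed as a percent.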
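To avoid the rounding in bike_share, PROC MEANS can instead sum the original counts by state, and a short DATA step can then form the ratio; this is the exact figure the weighted mean approximates. A sketch, self-contained with hypothetical names (COMMUTE, STATE_SUMS, STATE_PCT, bike_pct):

```sas
/* Hypothetical minimal input: state and the two counts from the question's data */
data commute;
  length state $20;
  input state $ total_biker total_worker;
  datalines;
California 4923 29663
California 3055 30282
Colorado 5128 53388
California 4674 52815
Oregon 4793 65794
Colorado 5112 76972
California 3125 47451
California 2433 37968
;
run;

/* NWAY keeps only the per-state rows; SUM= with no names
   reuses the analysis variable names for the sums */
proc means data=commute noprint nway;
  class state;
  var total_biker total_worker;
  output out=state_sums sum=;
run;

/* Ratio of the summed counts, as a percent */
data state_pct;
  set state_sums;
  bike_pct = 100 * total_biker / total_worker;
  drop _type_ _freq_;
run;

proc print data=state_pct noobs;
run;
```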

Licensed under: CC-BY-SA with attribution