AWK program to find the average rainfall of three states

https://stackoverflow.com/questions/3950778

awk
gawk

08-10-2019
Question

I want to find the average rainfall for any three states, say CA, TX and AX, for a particular month from Jan to Dec. The input file is tab-delimited and has the format: city name, state, the average rainfall amounts for January through December, and then an annual average over all months. For example, it may look like:

AVOCA   PA  30  2.10    2.15    2.55    2.97    3.65    3.98    3.79    3.32     3.31   2.79    3.06    2.51    36.18
BAKERSFIELD CA  30  0.86    1.06    1.04    0.57    0.20    0.10    0.01    0.09    0.17    0.29    0.70    0.63    5.72

What I want to do is get the sum of the average rainfall for a particular month, say Feb, over n years, and then find its average for the states CA, TX and AX.

I have written the awk script below to do this, but it doesn't give me the expected output:

/^CA$/ {CA++; CA_SUM+= $5} # ^CA$ - Regular Expression to match the word CA only 
/^TX$/ {TX++; TX_SUM+= $5} # ^TX$ - Regular Expression to match the word TX only  
/^AX$/ {AX++; AX_SUM+= $5} # ^AX$ - Regular Expression to match the word AX only 
END {
     CA_avg = CA_SUM/CA;
     TX_avg = TX_SUM/TX;
     AX_avg = AX_SUM/AX; 
     printf("CA Rainfall: %5.2f",CA_avg);
     printf("CA Rainfall: %5.2f",TX_avg);
     printf("CA Rainfall: %5.2f",AX_avg);
    }

I invoke the program with the command awk 'FS="\t"'-f awk1.awk rainfall.txt and see no output.

Question: Where am I slipping? Any suggestions and corrected code would be appreciated.

Solution

Your regexp should be:

/ CA / {CA++; CA_SUM+= $5} # / CA / - matches CA with a space on each side, anywhere in the line
/ TX / {TX++; TX_SUM+= $5} # / TX / - matches TX with a space on each side, anywhere in the line
/ AX / {AX++; AX_SUM+= $5} # / AX / - matches AX with a space on each side, anywhere in the line

/^AX$/ matches only if AX is the only text on the line.

HTH!

EDIT

/ CA / {CA++; CA_SUM+= $5} # count the CA rows and add up their $5 values
/ TX / {TX++; TX_SUM+= $5} # count the TX rows and add up their $5 values
/ AX / {AX++; AX_SUM+= $5} # count the AX rows and add up their $5 values
END {
 # Skip any state with no matching rows to avoid dividing by zero.
 if(CA!=0){CA_avg = CA_SUM/CA;     printf("CA Rainfall: %5.2f\n",CA_avg);}
 if(TX!=0){TX_avg = TX_SUM/TX;     printf("TX Rainfall: %5.2f\n",TX_avg);}
 if(AX!=0){AX_avg = AX_SUM/AX;     printf("AX Rainfall: %5.2f\n",AX_avg);}
}

OTHER TIPS

The pattern /^CA$/ means the characters "C" and "A" are the only characters on the line. You want:

$2 == "CA" {CA++; CA_SUM+= $5}
# etc.
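
To see the difference between the whole-line regex and the field comparison, here is a minimal sketch. It assumes tab-delimited input and uses the two rows from the question, truncated to five fields; the temporary file and the variable names are illustrative only.

```shell
# Build a two-row, tab-delimited sample (rows from the question, truncated).
sample=$(mktemp)
printf 'AVOCA\tPA\t30\t2.10\t2.15\n' > "$sample"
printf 'BAKERSFIELD\tCA\t30\t0.86\t1.06\n' >> "$sample"

# /^CA$/ is tested against the whole record, so no row matches.
regex_hits=$(awk -F '\t' '/^CA$/ { n++ } END { print n + 0 }' "$sample")

# $2 == "CA" compares only the state field, so one row matches.
field_hits=$(awk -F '\t' '$2 == "CA" { n++ } END { print n + 0 }' "$sample")

echo "whole-line regex: $regex_hits, field comparison: $field_hits"
rm -f "$sample"
```

The `n + 0` in the END block forces a numeric 0 when nothing matched, instead of an empty string.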

However, this is DRYer:

{ count[$2]++; sum[$2] += $5 }
END {
    for (state in count) {
        printf("%s Rainfall: %5.2f\n", state, sum[state]/count[state])
    }
}
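
As a rough illustration of the array version, the sketch below runs it on three rows. The city names and rainfall values here are made up for the example, not taken from the question's data, and the input is assumed to be tab-delimited.

```shell
# Three tab-delimited rows; names and values are hypothetical.
sample=$(mktemp)
printf 'CITY_A\tCA\t30\t1.00\t2.00\n' > "$sample"
printf 'CITY_B\tCA\t30\t1.00\t4.00\n' >> "$sample"
printf 'CITY_C\tTX\t30\t1.00\t1.50\n' >> "$sample"

# Group by state ($2) and average column $5.
# The order of for-in is unspecified in awk, so the output is sorted afterwards.
averages=$(awk -F '\t' '
    { count[$2]++; sum[$2] += $5 }
    END {
        for (state in count)
            printf("%s Rainfall: %5.2f\n", state, sum[state] / count[state])
    }
' "$sample" | sort)

echo "$averages"
rm -f "$sample"
```

Because the arrays are keyed by whatever appears in $2, no state code has to be hard-coded in the script.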

Also, this looks wrong: awk 'FS="\t"'-f awk1.awk rainfall.txt
try: awk -F '\t' -f awk1.awk rainfall.txt
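
In the original command there is no space between the closing quote and -f, so the shell joins them into the single word FS="\t"-f. awk then takes that word as the entire program and treats awk1.awk and rainfall.txt as input files. The sketch below shows two common ways to set a tab separator correctly; the row is hypothetical and uses a city name containing a space, so the effect of the separator is visible.

```shell
# One tab-delimited row; the city and values are hypothetical.
sample=$(mktemp)
printf 'SAN DIEGO\tCA\t30\t1.00\t2.00\n' > "$sample"

# Default splitting on any whitespace breaks the city name in two.
default_split=$(awk '{ print $2 }' "$sample")

# Option 1: pass the separator with -F, before the program text or -f file.
with_flag=$(awk -F '\t' '{ print $2 }' "$sample")

# Option 2: set FS in a BEGIN block inside the program itself.
with_begin=$(awk 'BEGIN { FS = "\t" } { print $2 }' "$sample")

echo "default: $default_split, -F: $with_flag, BEGIN: $with_begin"
rm -f "$sample"
```

Setting FS in BEGIN keeps the separator with the script, which can be convenient when the program lives in a file such as awk1.awk.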

Response to comments:

awk -F '\t' -v month=2 -v states="CA,AZ,TX" '
    BEGIN {
        month_col = month + 3  # assume January is month 1 (column 4)
        n = split(states, wanted_states, /,/)
    }
    { count[$2]++; sum[$2] += $month_col }
    END {
        # split() fills wanted_states[1..n]; the state codes are the values
        for (i = 1; i <= n; i++) {
            state = wanted_states[i]
            if (state in count)
                printf("%s Rainfall: %5.2f\n", state, sum[state]/count[state])
            else
                print state " Rainfall: no data"
        }
    }
' rainfall.txt

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow