Split a column into periods and print the min and max of each in awk

https://stackoverflow.com/questions/22963993

30-06-2023

Question

I have a data file that contains two columns. The second column varies periodically, and its max and min differ from one period to the next:

In the 1st period (from a to i), max = 5 and min = 1. In the 2nd period (from i to u), max = 6 and min = 0.

Using awk, I can print the max and min of the whole second column, but I cannot print the min and max of each period. I would like to obtain results like this:

period   min   max
1        1     5
2        0     6

Here is what I did:

{
nb_lignes = 21
period = 9
nb_periodes = int(nb_lignes/period)
}

{
for (j = 0; j <= nb_periodes; j++)
   {   if (NR == (1 + period*j)) {{max=$2 ; min=$2}}
       for (i = (period*j); i <= (period*(j+1)); i++)
           {
               if (NR == i) 
                  { 
                     if ($2 >= max) {max = $2} 
                     if ($2 <= min) {min = $2} 
                     {print "Min: "min,"Max: "max,"Ligne: " NR}
                  }
           }
   }
}
#END { print "Min: "min,"Max: "max }

However, the result is far from what I am looking for:

Min: 3 Max: 3 Ligne: 1
Min: 3 Max: 4 Ligne: 2
Min: 3 Max: 5 Ligne: 3
Min: 3 Max: 5 Ligne: 4
Min: 3 Max: 5 Ligne: 5
Min: 2 Max: 5 Ligne: 6
Min: 1 Max: 5 Ligne: 7
Min: 1 Max: 5 Ligne: 8
Min: 1 Max: 5 Ligne: 9
Min: 1 Max: 5 Ligne: 9
Min: 4 Max: 4 Ligne: 10
Min: 4 Max: 5 Ligne: 11
Min: 4 Max: 6 Ligne: 12
Min: 4 Max: 6 Ligne: 13
Min: 4 Max: 6 Ligne: 14
Min: 3 Max: 6 Ligne: 15
Min: 2 Max: 6 Ligne: 16
Min: 1 Max: 6 Ligne: 17
Min: 0 Max: 6 Ligne: 18
Min: 0 Max: 6 Ligne: 18
Min: 1 Max: 1 Ligne: 19
Min: 1 Max: 2 Ligne: 20
Min: 1 Max: 3 Ligne: 21

Thank you in advance for your help.

Solution

Try something like:

$ awk '
BEGIN{print "period", "min", "max"}
!f{min=$2; max=$2; ++f; next}
{max = ($2>max)?$2:max; min = ($2<min)?$2:min; f++}
f==9{print ++a, min, max; f=0}' file
period min max
1 1 5
2 0 6

The flag f is really a line counter. When f is zero (the start of a period), the second column is assigned to both min and max, f is incremented, and awk skips to the next line.
For every other line, the second column is compared with max and min. If it is bigger than max, it becomes the new max; likewise, if it is smaller than min, it becomes the new min. f is incremented again.
Once f reaches 9, the period number, min and max are printed and f is reset to 0, so the next line starts a fresh period. A trailing group of fewer than 9 lines is never printed.
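
A minimal sketch of the same counter idea, generalized so that the period length is passed in rather than hard-coded as 9, and so that a trailing incomplete period is still reported. The function names make_sample and period_minmax, the variables n and p, the "(partial)" marker, and the sample rows are all assumptions made for illustration; the sample values are invented and are not the asker's file.

```shell
#!/bin/sh
# Hypothetical sample data: 21 rows, label in column 1, value in column 2.
# These values are invented for illustration only.
make_sample() {
    printf '%s\n' \
        'a 3' 'b 4' 'c 5' 'd 4' 'e 3' 'f 2' 'g 1' 'h 2' 'i 3' \
        'j 4' 'k 5' 'l 6' 'm 5' 'n 4' 'o 3' 'p 2' 'q 1' 'r 0' \
        's 1' 't 2' 'u 3'
}

# period_minmax N: read rows on stdin, print min and max of column 2
# for every block of N lines (N is an assumed parameter).
period_minmax() {
    awk -v period="$1" '
        BEGIN { print "period", "min", "max" }
        n == 0 { min = max = $2 }      # first line of a period seeds both
        {
            if ($2 < min) min = $2
            if ($2 > max) max = $2
            n++                        # lines seen in the current period
        }
        n == period { print ++p, min, max; n = 0 }
        # Flush a trailing period that has fewer than "period" lines.
        END { if (n > 0) print ++p, min, max, "(partial)" }
    '
}

make_sample | period_minmax 9
```

With the sample rows above, this prints 1 1 5 and 2 0 6 for the two full periods, followed by 3 1 3 (partial) for the three leftover rows. Dropping the END rule restores the behaviour of ignoring an incomplete period.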

OTHER TIPS

I've started, so I'll finish. I chose to create an array which contains the minimum and maximum for each period:

awk -v period=9 '
BEGIN { print "period", "min", "max" } 
NR % period == 1 { ++i } 
!(i in min) || $2 < min[i] { min[i] = $2 }   # unlike !min[i], this still works when the minimum is 0
$2 > max[i] { max[i] = $2 } 
END { for (p = 1; p <= i; p++) print p, min[p], max[p] }' input

The index i increases once every period lines (9 in this case). If no value has been set yet for the current period, or a new minimum/maximum has been found, the array is updated. Unlike the solution above, a trailing incomplete period is printed too.

edit: an unset max[i] compares as 0, so $2 > max[i] is already true for any positive value and there is no need to check !max[i]. This relies on the values being positive.
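
The remark about an unset max[i] can be checked directly. The sketch below is only an illustration: the function name unset_compare_demo and the test values 5, 0 and -5 are assumptions, and the variable max is deliberately never assigned.

```shell
#!/bin/sh
# Show how awk compares numbers against a variable that was never assigned.
unset_compare_demo() {
    awk 'BEGIN {
        # max is uninitialized here, so it acts as 0 in a numeric comparison.
        if (5 > max)      print "5 > unset is true"
        if (!(0 > max))   print "0 > unset is false"
        if (!(-5 > max))  print "-5 > unset is false"
    }'
}

unset_compare_demo
```

All three lines are printed, which is why the shortcut only holds when every period contains at least one positive value; for zero or negative data an explicit test such as (i in max) is needed.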

awk 'BEGIN{print "Period","min","max"}
     NR%9==1{mi=ma=$2}                      # first line of each 9-line period
     {if($2<mi)mi=$2; if($2>ma)ma=$2}
     NR%9==0{print ++i,mi,ma}' your_file


Licensed under: CC-BY-SA with attribution
