Selecting elements of two columns whose difference is less than some given value using awk

StackOverflow https://stackoverflow.com/questions/23467874

  •  15-07-2023

Question

While post-processing the output of a numerical analysis, I ran into the following data-selection problem:

time_1     result_1              time_2       result_2
1          10                    1.1          10.1
2          20                    1.6          15.1
3          30                    2.1          20.1
4          40                    2.6          25.1
5          50                    3.1          30.1
6          60                    3.6          35.1
7          70                    4.1          40.1
8          80                    4.6          45.1
9          90                    5.1          50.1
10         100                   5.6          55.1
                                 6.1          60.1
                                 6.6          65.1
                                 7.1          70.1
                                 7.6          75.1
                                 8.1          80.1
                                 8.6          85.1
                                 9.1          90.1
                                 9.6          95.1
                                 10.1         100.1

This file has 4 columns. The first column (time_1) holds the instants computed by program 1, and the second column (result_1) holds the result computed at each of those instants.

The third column (time_2) holds the instants computed by another program, program 2, and the fourth column (result_2) holds the result computed at each instant of program 2.
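The time_1 and result_1 columns end after ten rows, so awk does not split every line into the same number of fields. The last rows have only two fields, which means time_2 is $1 there rather than $3. Below is a quick sketch to check this. It assumes the data sits in a file with the hypothetical name timings.txt, filled here with an excerpt of the question's data (the header, the first row and the first two-column row):

```shell
#!/bin/sh
# Write an excerpt of the question's data to a hypothetical file timings.txt.
cat > timings.txt <<'EOF'
time_1     result_1              time_2       result_2
1          10                    1.1          10.1
                                 6.1          60.1
EOF

# Print how many fields awk sees on each line (NF), followed by the line.
awk '{ print NF ": " $0 }' timings.txt
```

The first two lines report 4 fields and the last one reports 2. A script for this file therefore has to look at NF before deciding which fields hold time_2 and result_2.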

Now I want to select only those instants of the third column (time_2) that are very close to an instant of the first column (time_1). The allowed difference is at most 0.1. For example:

For the instant 1 of the time_1 column, I want to select the instant 1.1 of the time_2 column, because (1.1 - 1) = 0.1. I do not want to select the other instants of the time_2 column, because (1.6 - 1) > 0.1, (2.1 - 1) > 0.1, and so on.

For the instant 2 of the time_1 column, I want to select the instant 2.1 of the time_2 column, because (2.1 - 2) = 0.1. I do not want to select the other instants of the time_2 column, because (2.6 - 2) > 0.1, (3.1 - 2) > 0.1, and so on.
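The selection rule itself can be written as a small test. Below is a minimal sketch as a shell function around awk. It assumes the one-sided window implied by the examples: time_2 may be later than time_1 by at most 0.1, but never earlier. It also adds a hypothetical slack of 1e-9 so that rounding does not reject a difference that should be exactly 0.1. The name near is illustrative.

```shell
#!/bin/sh
# near T1 T2 [TOL]: prints 1 if T1 <= T2 <= T1 + TOL (TOL defaults to 0.1),
# up to a slack EPS = 1e-9 assumed for floating-point rounding; otherwise 0.
near() {
    awk -v t1="$1" -v t2="$2" -v tol="${3:-0.1}" 'BEGIN {
        eps = 1e-9
        d = t2 - t1
        print ((d >= -eps && d <= tol + eps) ? 1 : 0)
    }'
}

# The examples from the question.
for t2 in 1.1 1.6 2.1; do echo "time_1=1 time_2=$t2 -> $(near 1 "$t2")"; done
for t2 in 2.1 2.6 3.1; do echo "time_1=2 time_2=$t2 -> $(near 2 "$t2")"; done
```

Dropping the d >= -eps condition gives a symmetric window, |time_2 - time_1| <= 0.1, instead.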

In the end, I would like to obtain the following data:

time_1     result_1              time_2       result_2
1          10                    1.1          10.1
2          20                    2.1          20.1
3          30                    3.1          30.1
4          40                    4.1          40.1
5          50                    5.1          50.1
6          60                    6.1          60.1
7          70                    7.1          70.1
8          80                    8.1          80.1
9          90                    9.1          90.1
10         100                   10.1         100.1

I would like to use awk, but I am not familiar with it. I do not know how to fix an element of the first column and then compare it with all elements of the third column, in order to select the right value from the third column. If I simply do the following, I can print only the first line:

{if (($3>=$1) && (($3-$1) <= 0.1)) {print  $2, $4}}
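The rule above only ever compares $1 and $3 from the same line. awk reads the input one record at a time, and $1 to $4 always refer to the current line. One way around this is to keep every value in arrays and do the comparison in an END block, once the whole file has been read.

Below is a minimal sketch. It assumes the question's layout (a header, then 4-field rows, then 2-field rows), the one-sided window [time_1, time_1 + 0.1], and a hypothetical rounding slack of 1e-9. The names select_pairs and timings.txt are illustrative.

```shell
#!/bin/sh
# select_pairs FILE: for every time_1 row, print each time_2 row that lies in
# [time_1, time_1 + 0.1]. Assumes the layout of the question: a header line,
# 4-field rows, then 2-field rows that carry only time_2 and result_2.
select_pairs() {
    awk '
    NR == 1 { print; next }                       # copy the header
    NF == 4 { n1++; t1[n1] = $1; r1[n1] = $2      # store time_1/result_1
              n2++; t2[n2] = $3; r2[n2] = $4      # and time_2/result_2
              next }
    NF == 2 { n2++; t2[n2] = $1; r2[n2] = $2 }    # rows with only time_2
    END {
        eps = 1e-9                                # assumed rounding slack
        # Compare every stored time_1 with every stored time_2.
        for (i = 1; i <= n1; i++)
            for (j = 1; j <= n2; j++) {
                d = t2[j] - t1[i]
                if (d >= -eps && d <= 0.1 + eps)
                    print t1[i], r1[i], t2[j], r2[j]
            }
    }' "$1"
}

# Demo on the data from the question (timings.txt is a hypothetical name).
cat > timings.txt <<'EOF'
time_1     result_1              time_2       result_2
1          10                    1.1          10.1
2          20                    1.6          15.1
3          30                    2.1          20.1
4          40                    2.6          25.1
5          50                    3.1          30.1
6          60                    3.6          35.1
7          70                    4.1          40.1
8          80                    4.6          45.1
9          90                    5.1          50.1
10         100                   5.6          55.1
                                 6.1          60.1
                                 6.6          65.1
                                 7.1          70.1
                                 7.6          75.1
                                 8.1          80.1
                                 8.6          85.1
                                 9.1          90.1
                                 9.6          95.1
                                 10.1         100.1
EOF

select_pairs timings.txt
```

This version reads the file once and prints the matches in time_1 order. The price is that every time_1 is compared with every time_2, which is fine for files of this size.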

Thank you in advance for your help!


Solution 2

One thing to be aware of: because of the way floating-point numbers are represented, comparing a difference directly with 0.1 is unlikely to give you the results you are looking for:

awk 'BEGIN {x=1; y=x+0.1; printf "%.20f\n", y-x}'
0.10000000000000008882

Here y = x + 0.1, but y - x > 0.1.
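A common alternative to an exact comparison is to accept differences up to 0.1 plus a small tolerance. The sketch below is a shell function named within (a hypothetical name) that uses an arbitrarily chosen tolerance of 1e-9. It prints both tests side by side for a few pairs from the question:

```shell
#!/bin/sh
# within T1 T2: test T2 - T1 <= 0.1 exactly and with a tolerance EPS.
# EPS = 1e-9 is an arbitrary choice: far below the 0.1 resolution of the
# data, but far above the rounding error of doubles at this magnitude.
within() {
    awk -v t1="$1" -v t2="$2" 'BEGIN {
        eps = 1e-9
        d = t2 - t1
        printf "%s - %s: exact=%d tolerant=%d\n", t2, t1, (d <= 0.1), (d <= 0.1 + eps)
    }'
}

within 1 1.1     # exact=0: 1.1 - 1 comes out slightly above 0.1
within 10 10.1   # exact=1: 10.1 - 10 comes out slightly below 0.1
within 1 1.6     # rejected by both tests
```

Which pairs the exact test accepts depends on how each decimal happens to round. That is why some allowance for rounding is needed, whether a tolerance like this one or a rescaling of the values.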

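So instead, we compute the difference on values scaled by 10, diff = 10*y - 10*x, and check that it lies between -1 and 1 rather than comparing it with 0.1.

Also, I'm going to process the file twice: once to collect all the time_1/result_1 values, and a second time to extract the "matching" time_2/result_2 values.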

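awk '
    # Pass 1, line 1: print the header once
    NR==1   {print; next}
    # Pass 1: remember result_1 for each time_1 (only 4-field lines have them)
    NR==FNR {if (NF==4) r1[$1]=$2; next}
    # Pass 2: skip the header
    FNR==1  {next}
    {
        # time_2/result_2 are fields 3-4 on 4-field lines, fields 1-2 otherwise
        if (NF == 4) {t2=$3; r2=$4} else {t2=$1; r2=$2}
        for (t1 in r1) {
            # Scale by 10 so the tolerance is 1 instead of 0.1
            diff = 10*t1 - 10*t2
            if (-1 <= diff && diff <= 1) {
                print t1, r1[t1], t2, r2
                break
            }
        }
    }
' ~/tmp/timings.txt ~/tmp/timings.txt | column -t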
time_1  result_1  time_2  result_2
1       10        1.1     10.1
2       20        2.1     20.1
3       30        3.1     30.1
4       40        4.1     40.1
5       50        5.1     50.1
6       60        6.1     60.1
7       70        7.1     70.1
8       80        8.1     80.1
9       90        9.1     90.1
10      100       10.1    100.1
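The for (t1 in r1) loop visits the array indices in an unspecified order and stops at the first match. That is harmless here, because each time_2 lies within 0.1 of at most one time_1. If the windows could overlap, one option is to keep the closest time_1 instead.

Below is a sketch of that variant, using the same two-pass structure. The function name nearest_pairs, the 1e-9 slack and the small input file overlap.txt are all made up for illustration.

```shell
#!/bin/sh
# nearest_pairs FILE: two-pass variant that prints, for each time_2, the
# closest time_1 within 0.1 (plus an assumed 1e-9 slack), if there is one.
nearest_pairs() {
    awk '
    NR == 1   { print; next }                      # header, printed once
    NR == FNR { if (NF == 4) r1[$1] = $2; next }   # pass 1: time_1 -> result_1
    FNR == 1  { next }                             # pass 2: skip the header
    {
        if (NF == 4) { t2 = $3; r2 = $4 } else { t2 = $1; r2 = $2 }
        best = ""; bestd = 0
        for (t1 in r1) {
            d = t2 - t1
            if (d < 0) d = -d                      # absolute difference
            if (d <= 0.1 + 1e-9 && (best == "" || d < bestd)) {
                best = t1; bestd = d
            }
        }
        if (best != "") print best, r1[best], t2, r2
    }' "$1" "$1"
}

# Made-up input where 1.1 is within 0.1 of both 1 and 1.05.
cat > overlap.txt <<'EOF'
time_1 result_1 time_2 result_2
1      10       1.1    10.1
1.05   15       1.6    15.1
EOF

nearest_pairs overlap.txt
```

Here 1.1 is paired with 1.05 whatever order the loop visits the indices in, and 1.6 gets no partner.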

Other suggestions

You can try the following Perl script, which reads the data from a file named "file":

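#! /usr/bin/perl

use strict;
use warnings;
use autodie;
use File::Slurp qw(read_file);   # File::Slurp is a CPAN module, not core Perl

my @lines=read_file("file");

shift @lines; # skip the header line

# First pass: collect the (time_1, result_1) pairs from the 4-column rows
my @a;

for (@lines) {
    my @fld=split;
    if (@fld == 4) {
        push (@a,{id=>$fld[0], val=>$fld[1]});
    }
}

# Second pass: attach each (time_2, result_2) pair to the first time_1 it matches
for (@lines) {
    my @fld=split;
    my $id; my $val;
    if (@fld == 4) {
        $id=$fld[2]; $val=$fld[3];
    } elsif (@fld == 2) {
        $id=$fld[0]; $val=$fld[1];
    }
    next unless defined $id;   # skip blank or malformed lines
    my $ind=checkId(\@a,$id);
    if ($ind>=0) {
        $a[$ind]->{sel}=[] if (! exists($a[$ind]->{sel}));
        push(@{$a[$ind]->{sel}},{id=>$id,val=>$val});
    }
}

# Print every time_1 entry next to each time_2 entry selected for it
for my $item (@a) {
    if (exists $item->{sel}) {
        my $s= $item->{sel};
        for (@$s) {
            print $item->{id}."\t".$item->{val}."\t";
            print $_->{id}."\t".$_->{val}."\n";
        }
    }
}


# Return the index of the first time_1 entry within 0.1 of $id, or -1.
# The extra 1e-10 absorbs floating-point rounding (1.1 - 1 is slightly above 0.1).
sub checkId { 
    my ($a,$id) = @_;

    my $dif=0.1+1e-10;

    for (my $i=0; $i<=$#$a; $i++) {
        return $i if (abs($a->[$i]->{id}-$id)<=$dif)
    }
    return -1;
}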

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow