Question

I'm drawing a blank and could use a tip. I have two files; I need to check the values of two fields, compare them and, if they match, write the result to a third file:

File one:

tail -n 1 file_A

09/03/2013:11:55:49 S 7187187187@2.3.4.5 9999999@6.7.8.9 ThisPlace:Washington 0 09/03/2013:12:05:27 578

File two:

head -n 2 file_B
7187187187,"OfficeA"
9999999,"OfficeB"

Desired result:

more desired_result
09/03/2013:11:55:49 S 7187187187@OfficeA 9999999@OfficeB ThisPlace:Washington 0 09/03/2013:12:05:27 578

I thought about a shell script that loops and matches each instance, but I am sure there is a way to match two fields per line using awk.

awk -F"@" 'NR==FNR{a[$1]=$2;next}{if ($2 in a)print a[$2]";"$0}'  fileA fileB 

Nope. I have tried a variety of different ORS, FS and NR combinations and I am stumped; I am sure I am overlooking something.

EDITED

@jotne

$ more fileA
08/22/2013:09:21:33 E 9876543210@10.25.50.33 3333444488@10.100.10.3 EProv:EastProvidence_RI 1 08/22/2013:09:21:33 0
09/03/2013:05:09:58 S 5556666777@10.30.239.18 8877887788@10.50.25.1 Tacoma:Washington 0 09/03/2013:13:29:31 29973
09/03/2013:10:46:19 S 3333444488@10.11.12.13 7777777777@10.17.19.2 Boston:MA 0 09/03/2013:12:01:28 4509
09/03/2013:10:49:54 S 1111122222@10.20.30.1 99999888888@10.20.30.1 Gaith:MD 0 09/03/2013:12:09:26 4772
09/03/2013:10:49:54 S 1111122222@10.20.30.1 57778889999@10.20.30.1 Balt:MD 1 09/03/2013:12:09:26 4772

$ more fileB
3333444488,"Providence_Route"
5556666777,"Kenosha_Route"
9999988888,"Chitown_Route"
7778889999,"Chitown_Route"

Here is the gist of it. These are telephone numbers (CDR) I am trying to match up. The numbers are listed in fileB and are structured as:

telnumber,"Which_Session_Border_Controller_Its_Routing_Through"

I am trying to say: for every line in fileA, look up its numbers in fileB; if there is a match on fileA's $3 or $4, replace whatever comes after the @ sign with the name of the route.

While it may seem easier to just run perl -pi -e 's:10.20.30.1:ChitownRoute:g' fileA, the addresses I used are sanitized and they fluctuate, so even attempting to fix those is a headache in itself. I would post more examples, but fileA is 1GB and fileB has about 44k lines.

Solution

Try this:

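awk -F'[, ]' '
NR==FNR{                        # first file (fileB): build number -> route map
    gsub(/"/,"",$2);            # strip the quotes around the route name
    a[$1]=$2;
    next
}
{                               # second file (fileA): rewrite fields 3 and 4
    split($3,t,/@/);
    if (t[1] in a) $3=t[1]"@"a[t[1]];   # leave the field alone if the number is unknown
    split($4,t,/@/);
    if (t[1] in a) $4=t[1]"@"a[t[1]]
}1' fileB fileA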
09/03/2013:11:55:49 S 7187187187@OfficeA 9999999@OfficeB ThisPlace:Washington 0 09/03/2013:12:05:27 578
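
To see what the first block of that script does, it may help to run the same field splitting on a single fileB line. This is only an illustrative sketch: the show_pair function name is made up here, and it assumes route names contain no spaces or commas, as in the samples from the question.

```shell
# show_pair is a hypothetical helper name, used only for this illustration.
# It reads fileB-style lines on stdin and prints "number -> route".
show_pair() {
    awk -F'[, ]' '{
        gsub(/"/, "", $2)      # drop the double quotes around the route name
        print $1 " -> " $2     # $1 is the number, $2 the cleaned route name
    }'
}

# One line taken from the fileB sample in the question.
printf '%s\n' '7187187187,"OfficeA"' | show_pair
```

With -F'[, ]' the comma separates the number from the quoted name, and the same separator set still splits fileA on spaces, which is why one -F setting can serve both files.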

OTHER TIPS

Here is an awk version:

awk 'FNR==NR {split($1,f,"[,\"]");a[f[1]]=f[3];next} {for (i in a) for (j=1;j<=NF;j++) if ($j~i) $j=i"@"a[i]}1' fileB fileA
09/03/2013:11:55:49 S 7187187187@OfficeA 9999999@OfficeB ThisPlace:Washington 0 09/03/2013:12:05:27 578

For each line of fileA, this solution loops through every number from fileB and tests it against every field with a regex match (~).

More readable:

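awk '
FNR==NR {                        # first file (fileB)
    split($1,f,"[,\"]")          # f[1] = number, f[3] = route name
    a[f[1]]=f[3]
    next
}
{                                # second file (fileA)
    for (i in a)                 # every number from fileB
        for (j=1;j<=NF;j++)      # every field of the current line
            if ($j~i)            # regex match on the field
                $j=i"@"a[i]
}
1
' fileB fileA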

With fileA:

09/03/2013:11:55:49 S 7187187187@2.3.4.5 9999999@6.7.8.9 ThisPlace:Washington 0 09/03/2013:12:05:27 578

and fileB

7187187187,"OfficeA"
9999999,"OfficeB"

This gives:

09/03/2013:11:55:49 S 7187187187@OfficeA 9999999@OfficeB ThisPlace:Washington 0 09/03/2013:12:05:27 578

Result from new files:

08/22/2013:09:21:33 E 9876543210@10.25.50.33 3333444488@Providence_Route EProv:EastProvidence_RI 1 08/22/2013:09:21:33 0
09/03/2013:05:09:58 S 5556666777@Kenosha_Route 8877887788@10.50.25.1 Tacoma:Washington 0 09/03/2013:13:29:31 29973
09/03/2013:10:46:19 S 3333444488@Providence_Route 7777777777@10.17.19.2 Boston:MA 0 09/03/2013:12:01:28 4509
09/03/2013:10:49:54 S 1111122222@10.20.30.1 9999988888@Chitown_Route Gaith:MD 0 09/03/2013:12:09:26 4772
09/03/2013:10:49:54 S 1111122222@10.20.30.1 7778889999@Chitown_Route Balt:MD 1 09/03/2013:12:09:26 4772
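
Note that in the last two lines fileA's 99999888888 and 57778889999 come out as 9999988888 and 7778889999. That happens because $j~i is a regex match, so a fileB number also matches when it is a substring of a longer number, and the field is then rebuilt from the fileB number. If only exact matches are wanted, a direct array lookup on the part before the @ is one option. The following is a sketch, not a drop-in replacement: map_routes is a hypothetical wrapper name, and it assumes the numbers are always in fields 3 and 4 and that route names contain no spaces.

```shell
# map_routes is a hypothetical wrapper name.
# Arguments: $1 = lookup file (fileB format), $2 = CDR file (fileA format).
map_routes() {
    awk '
    FNR == NR {                        # first file: number,"route" pairs
        split($1, f, "[,\"]")          # f[1] = number, f[3] = route name
        route[f[1]] = f[3]
        next
    }
    {
        for (j = 3; j <= 4; j++) {     # only the two number@address fields
            n = split($j, part, "@")
            if (n == 2 && (part[1] in route))   # exact match on the number
                $j = part[1] "@" route[part[1]]
        }
        print
    }' "$1" "$2"
}
```

It would be called as map_routes fileB fileA > desired_result. Since it does one array lookup per field instead of looping over every fileB entry for every field, it should also cope better with a 1GB fileA and 44k entries in fileB.
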
Licensed under: CC-BY-SA with attribution