How do I extract data from strings in a file on Linux?

https://stackoverflow.com/questions/20397275

29-08-2022

Question

I have a file with lines that look like this:

   IBACS6XX P24 ( .PADM(TEST_3), .QC(P1_87P_Z_3) );
   OBAXXCSXX08 P77 ( .A(P1_158P_N1_PROBE_SEL), .PADM(N1_SELECT) );
   inv0_p U99 ( .A(P1_P1_2P_P1_P1_19P_Z_0), .Q(n00) );
   IBACS6XX P25 ( .PADM(TBUSREQN), .QC(tbusreqn) );
   IBACS6XX P26 ( .PADM(NX_N2N), .QC(P1_177P_Z_0) );
   OBAXXCSXX08 P27 ( .A(P1_158P_N2G6PC), .PADM(N2G6PCC) );
   OBAXXCSXX08 P28 ( .A(P1_158P_N1G6PC), .PADM(N1G6PCC) );
   IOACS3P6CSXE04 P46 ( .A(P1_158P_DOUT_7), .EN(FE_OFN21_P1_158P_DATA_OUTN), 
      .PADM(DATA_7), .MA(LTIEHI_5_NET), .MB(P1_87P_Z_0_INV), .QC(P1_49P_ZI_7) );
   IOACS3P6CSXE04 P47 ( .A(P1_158P_DOUT_6), .EN(FE_OFN21_P1_158P_DATA_OUTN), 
      .PADM(DATA_6), .MA(LTIEHI_5_NET), .MB(P1_87P_Z_0_INV), .QC(P1_49P_ZI_6) );

I want to extract three items of data from each line and put them into a new file, separated by a space character:

(1) The first item, e.g. IBACS6XX in the first line.

(2) The second item, which starts with P followed by 2 digits and is usually 3 characters long. It is always followed by an opening bracket, e.g. P24 in the first line.

(3) The item between .PADM( and the closing bracket ), e.g. TEST_3 in the first line.

How do I do this in Linux? Do you have a better way?

The problems are:

(1) Some lines are broken into two lines, so the .PADM( may end up on the second line instead, as can be seen in the last 2 examples.

(2) The .PADM( does not always appear at the same place in the line, as can be seen in the second example.

(3) Not all lines are of interest, only those that start with IBA, OBA or IOA, as can be seen above. If a line does not start with these characters, then it can be ignored. This is a portion of a netlist file.

Every statement is "closed" with a ';' symbol; otherwise it continues on the next line of the text file.

I assume that awk and sed are to be used in some combination, but I am not sure how.

EDIT:

It works perfectly. Now a small additional step is to pick these cells out of the netlist as well:

 ggppxbp P74 (  );
 ggppxbp P74VDD (  );
 ggppxbg P75 (  );
 ggppxbg P75VSS (  );

I just want to discard the last bracket and semicolon. These cells always start with ggppxb, and the last letter tells whether it is a 5 V or GND connection, so only the last letter changes.

I think that I can put ggppxbp into the if statement after the || symbol. But how do I discard the bracket and the semicolon and include the remaining two items in the output file?

Solution

Try this awk program. It assumes that there is at most one continuation line, but it can be changed to handle more if needed, probably by replacing the first if with a while.

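{
    # A statement without ';' continues on the next line: join the two.
    if (! /;/ ) {
        L=$0
        getline
        $0=L $0
    }
    # Only cells whose name starts with IBA, OBA or IOA are of interest.
    if ($1 ~ /^IBA/ || $1 ~ /^OBA/ || $1 ~ /^IOA/) {
        A=$1                   # first item (cell type)
        B=$2                   # second item, e.g. P24
        gsub(".*PADM\\(","")   # drop everything up to and including "PADM("
        gsub("\\).*","")       # drop the closing ")" and everything after it
        print A,B,$0
    }
}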
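
The idea of replacing the if with a while, mentioned above, might look roughly like the following sketch. The file names netlist_sample.v and pads.txt and the variable nxt are made up for illustration. The sample lines come from the question, except that the P46 statement has been re-wrapped onto three lines here to exercise the loop. A space is added at each join so that tokens from two lines cannot run together.

```shell
# Hypothetical sample file; the P46 statement is deliberately split over 3 lines.
cat > netlist_sample.v <<'EOF'
   IBACS6XX P24 ( .PADM(TEST_3), .QC(P1_87P_Z_3) );
   inv0_p U99 ( .A(P1_P1_2P_P1_P1_19P_Z_0), .Q(n00) );
   IOACS3P6CSXE04 P46 ( .A(P1_158P_DOUT_7),
      .EN(FE_OFN21_P1_158P_DATA_OUTN),
      .PADM(DATA_7), .MA(LTIEHI_5_NET), .MB(P1_87P_Z_0_INV), .QC(P1_49P_ZI_7) );
EOF

awk '
{
    # Keep appending lines until the statement contains its closing ";".
    # (getline nxt) returns 1 on success, 0 at end of file, -1 on error,
    # so the loop also stops cleanly if the file ends mid-statement.
    while ($0 !~ /;/ && (getline nxt) > 0)
        $0 = $0 " " nxt

    if ($1 ~ /^IBA/ || $1 ~ /^OBA/ || $1 ~ /^IOA/) {
        A = $1
        B = $2
        gsub(".*PADM\\(", "")
        gsub("\\).*", "")
        print A, B, $0
    }
}' netlist_sample.v > pads.txt
```

With this sample, pads.txt should contain:

    IBACS6XX P24 TEST_3
    IOACS3P6CSXE04 P46 DATA_7

Like the original program, this treats any line without a ';' as the start of a statement, so a stray line such as a bare keyword would be merged with the statement that follows it.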

To handle the additional items try:

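{
    if (! /;/ ) {
        L=$0
        getline
        $0=L $0
    }
    if ($1 ~ /^IBA/ || $1 ~ /^OBA/ || $1 ~ /^IOA/) {
        A=$1
        B=$2
        gsub(".*PADM\\(","")
        gsub("\\).*","")
        print A,B,$0
    } else if ($1 ~ /^ggppxb/) {
        # These cells have no .PADM pin: print only the first two items,
        # which drops the "(  );" at the end of the line.
        print $1,$2
    }
}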

If you want to learn more about awk, read the wonderful book Gawk: Effective AWK Programming.

Other tips

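sed -n '
# Only statements that start with an OBA, IBA or IOA cell are of interest.
/^[[:blank:]]*OBA[A-Z0-9]\{5\}/ b treat
/^[[:blank:]]*IBA[A-Z0-9]\{5\}/ b treat
/^[[:blank:]]*IOA[A-Z0-9]\{5\}/ b treat
b

: treat
   {
   # A complete statement ends with ";". Otherwise append the next line.
   /;[[:blank:]]*$/ b full
   N
: full
   s/^[[:blank:]]*\([A-Z0-9]\{1,\}\)[[:blank:]]\{1,\}\(P[0-9]\{2\}\).*[.]PADM(\([^)]*\)).*/\1 \2 \3/p
   }' YourFile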

This handles OBA, IBA and IOA cells generically.
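
A different technique from the two answers above, offered only as a sketch: because every statement ends with ';', awk can be told to use ';' as its record separator, so a statement that spans several lines arrives as a single record and no explicit joining is needed. With the default field separator, newlines inside a record count as whitespace, so $1 and $2 are still the first two items. The file names netlist_sample.v and pads_rs.txt are made up for illustration; the sample lines come from the question.

```shell
# Hypothetical sample file built from lines in the question.
cat > netlist_sample.v <<'EOF'
   IBACS6XX P24 ( .PADM(TEST_3), .QC(P1_87P_Z_3) );
   OBAXXCSXX08 P77 ( .A(P1_158P_N1_PROBE_SEL), .PADM(N1_SELECT) );
   inv0_p U99 ( .A(P1_P1_2P_P1_P1_19P_Z_0), .Q(n00) );
   IOACS3P6CSXE04 P47 ( .A(P1_158P_DOUT_6), .EN(FE_OFN21_P1_158P_DATA_OUTN), 
      .PADM(DATA_6), .MA(LTIEHI_5_NET), .MB(P1_87P_Z_0_INV), .QC(P1_49P_ZI_6) );
EOF

awk 'BEGIN { RS = ";" }   # one record per statement, however many lines it spans
# match() sets RSTART and RLENGTH to the position and length of ".PADM(name)".
$1 ~ /^(IBA|OBA|IOA)/ && match($0, /\.PADM\([^)]*\)/) {
    # Skip the 6 characters of ".PADM(" and leave out the closing ")".
    print $1, $2, substr($0, RSTART + 6, RLENGTH - 7)
}' netlist_sample.v > pads_rs.txt
```

With this sample, pads_rs.txt should contain:

    IBACS6XX P24 TEST_3
    OBAXXCSXX08 P77 N1_SELECT
    IOACS3P6CSXE04 P47 DATA_6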

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow