Question

How can I represent the position of 1 (considering only the 1s after the colon) in the word from field5 and above; counting from right to left.

Input:

TT-124 06-03-14 08-02-10 FAS   CAT1:10
TT-125-1 05-03-14 10-06-08 CAS   CAT2:1010 FAT1:10000
TT-125-3 07-03-14 11-02-06 FAS   FAT1:1101
SS-120-1 05-03-14 09-04-07 FAS   CAT3:100000
AA-121-0 06-03-14 08-03-06 CAS   FAT2:11 CAT1:101100

Output:

TT-124 06-03-14 08-02-10 FAS   CAT1:1
TT-125-1 05-03-14 10-06-08 CAS   CAT2:3 CAT2:1 FAT1:4
TT-125-3 07-03-14 11-02-06 FAS   FAT1:0 FAT1:2 FAT1:3
SS-120-1 05-03-14 09-04-07 FAS   CAT3:5
AA-121-0 06-03-14 08-03-06 CAS   FAT2:0 FAT2:1 CAT1:2 CAT1:3 CAT1:5

I have tried below script (as suggested by someone) which works in cygwin, but does not work on Solaris 10 using nawk or /usr/xpg4/bin/awk How can I implement in solaris?

awk '{for(i = 5; i <= NF; i++) {split($i, a, ":"); $i = ""; split(a[2], b, "");
  for(j = 1; j <= length(b); j++) {if(b[j] == 1)
    {$i = ($i == "") ? (a[1] ":" length(b) - j) : ($i FS a[1] ":" length(b) - j)}}};
  print $0}' file
Was it helpful?

Solution

The drawback of using a one-liner that looks like that (quite illegible) written by someone else is that it is incredibly hard for you to fix when it breaks. I've written some Perl code that should be easier to understand and easier for you to maintain.

It basically splits your string using a regex which is hardcoded to first ignore (match and print) 4 space separated fields of data, then extract the relevant data after it. This could be made a lot more specific, but you have failed to specify what format your data has, if any, so this is the most specific I would make it for now.

Then it takes the first element of the @- array $-[0] to find the position of each 1 in the numeric string of each target string, and build new strings based on the name, and each numeric match.

use strict;
use warnings;

while (<DATA>) {
    my ($pre, $data) = /^((?:\S+\s+){4})(.+)/;   # capture using regex
    print $pre;                                  # print original prefix string
    my @list;
    for (split ' ', $data) {                     # the list of CAT2:11 pairs
        my ($name, $num) = split /:/;            # separate name and number
        $num = reverse $num;                     # reverse order of numbers
        while ($num =~ /1/g) {                   # extract position of each "1" 
            push @list, "$name:$-[0]";           # make the new string with name
        }
    }
    print "@list\n";                             # print list separated by space
}

__DATA__
TT-124 06-03-14 08-02-10 FAS   CAT1:10
TT-125-1 05-03-14 10-06-08 CAS   CAT2:1010 FAT1:10000
TT-125-3 07-03-14 11-02-06 FAS   FAT1:1101
SS-120-1 05-03-14 09-04-07 FAS   CAT3:100000
AA-121-0 06-03-14 08-03-06 CAS   FAT2:11 CAT1:101100

Output:

TT-124 06-03-14 08-02-10 FAS   CAT1:1
TT-125-1 05-03-14 10-06-08 CAS   CAT2:1 CAT2:3 FAT1:4
TT-125-3 07-03-14 11-02-06 FAS   FAT1:0 FAT1:2 FAT1:3
SS-120-1 05-03-14 09-04-07 FAS   CAT3:5
AA-121-0 06-03-14 08-03-06 CAS   FAT2:0 FAT2:1 CAT1:2 CAT1:3 CAT1:5

Usage:

To try the code out, change the file handle <DATA> to <> and use:

perl script.pl input.txt > output.txt

Redirecting to output file is optional, of course.

OTHER TIPS

Try this snippet.

#!/usr/bin/perl

use strict;
use warnings;

foreach my $line (<DATA>) {
  my @args = split /\s+/, $line;

  foreach my $arg (@args) {
    if (not $arg =~ /([A-Z]+\d+)\:([0-1]+)/) {
      print "$arg ";
      next;
    }

    my @bits = split //, $2;
    my $j = $#bits;
    for my $i (0 .. $#bits) {
      print "$1:$j " if $bits[$i];
      $j--;
    }

  }
  print "\n";
}

__DATA__
TT-124 06-03-14 08-02-10 FAS   CAT1:10
TT-125-1 05-03-14 10-06-08 CAS   CAT2:1010 FAT1:10000
TT-125-3 07-03-14 11-02-06 FAS   FAT1:1101
SS-120-1 05-03-14 09-04-07 FAS   CAT3:100000
AA-121-0 06-03-14 08-03-06 CAS   FAT2:11 CAT1:101100

Output:

TT-124 06-03-14 08-02-10 FAS CAT1:1 
TT-125-1 05-03-14 10-06-08 CAS CAT2:3 CAT2:1 FAT1:4 
TT-125-3 07-03-14 11-02-06 FAS FAT1:3 FAT1:2 FAT1:0 
SS-120-1 05-03-14 09-04-07 FAS CAT3:5 
AA-121-0 06-03-14 08-03-06 CAS FAT2:1 FAT2:0 CAT1:5 CAT1:3 CAT1:2 

Your input data isn't consistent with your required output. You say you want CAT2:3 CAT2:1 for the second line, but that is not in right-to-left order, and doesn't match the other lines of output, so I assume that is a mistake.

I think this program is what you need. It expects a path to the input file as a parameter on the command line, and prints the output to STDOUT.

use strict;
use warnings;

while (<>) {

  my @fields = split;

  for (my $i = $#fields; $i >= 4; --$i) {
    my ($prefix, $suffix) = split /:/, $fields[$i];
    my @positions;
    push @positions, length($suffix) - $-[0] - 1 while $suffix =~ /1/g;
    my @expansions = map "$prefix:$_", reverse @positions;
    splice @fields, $i, 1, @expansions;
  }

  print "@fields\n";
}

output

TT-124 06-03-14 08-02-10 FAS CAT1:1
TT-125-1 05-03-14 10-06-08 CAS CAT2:1 CAT2:3 FAT1:4
TT-125-3 07-03-14 11-02-06 FAS FAT1:0 FAT1:2 FAT1:3
SS-120-1 05-03-14 09-04-07 FAS CAT3:5
AA-121-0 06-03-14 08-03-06 CAS FAT2:0 FAT2:1 CAT1:2 CAT1:3 CAT1:5
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top