Question

I need some help @ a script in perl

I have the following file "etichete":

0.000000 8.700000 speech_L1
8.710000 27.300000 speech_L2 
27.310000 27.600000 speech_L3 
31.210000 37.210000 speech_L4 
37.220000 61.210000 speech_L5

I need to create "etichete.rttm" file using "etichete" like this:

SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L1 <NA>
SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L2 <NA>
SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L3 <NA>
SPEAKER etichete 1 0.000 8.556 <NA> <NA> speech_L1 <NA>
SPEAKER etichete 1 8.556 21.063 <NA> <NA> speech_L2 <NA>
SPEAKER etichete 1 32.304 9.515 <NA> <NA> speech_L3 <NA>
SPEAKER etichete 1 42.049 0.767 <NA> <NA> speech_L1 <NA>

This is my code (I think my mistakes are near creating the matrix @rttm):

#!/usr/bin/perl -w

use List::MoreUtils qw(uniq);
use File::Path qw(make_path);
use File::Copy "cp";
use warnings;
use autodie;  


open my $fh, "etichete" or die $!;

$nume="etichete"; 
my @file_array;
while (my $line = <$fh>) {
    chomp $line;
    my @line_array = split(/\s+/, $line);
    push (@file_array, \@line_array);
}

my @arr=@file_array;
my $arrSize = @arr;


@speakers=$arr[0][2];
$j=0;
while ($j < $arrSize)
{   
    push(@speakers, $arr[$j][2]);
    $j++;   
}

my @uniq;
foreach my $x (@speakers){
        push @uniq, $x if !grep{/^$x$/}@uniq;
}

my $s1= @uniq;
my @rttm=();

$contorlinie1=0;
while ($contorlinie1 < $s1){
     $rttm[$contorlinie1][0]="SPKR-INFO";
     $rttm[$contorlinie1][1]="$nume";
     $rttm[$contorlinie1][2]="1";   
     $rttm[$contorlinie1][3]="<NA>";
     $rttm[$contorlinie1][4]="<NA>";
     $rttm[$contorlinie1][5]="<NA>";
     $rttm[$contorlinie1][6]="unknown";
     $rttm[$contorlinie1][7]="$uniq[$contorlinie1]";
     $rttm[$contorlinie1][8]="<NA>";
    $contorlinie1++;
    }
$contorlinie2=$s1;
while ($contorlinie2 < $arrSize)
     {
        $rttm[$contorlinie2][0]="SPEAKER";
     $rttm[$contorlinie2][1]="$nume";
     $rttm[$contorlinie2][2]="1";   
     $rttm[$contorlinie2][3]="$arr[$contorlinie2][0]";
     $rttm[$contorlinie2][4]="$arr[$contorlinie2][1]";
     $rttm[$contorlinie2][5]="<NA>";
     $rttm[$contorlinie2][6]="<NA>";
     $rttm[$contorlinie2][7]="$arr[$contorlinie2][2]";
     $rttm[$contorlinie2][8]="<NA>";
    $contorlinie2++;
}


open my $fh1,">etichete.rttm" or die $!;
foreach(@rttm)
    {
    print $fh1 "$-\n";
    }
close $fh1; 

When I run the script it creates an file filled with zeros on every line and when i put to print matrix is something like this:

ARRAY(0x10b13d8)
ARRAY(0x10b14e0)
ARRAY(0x10b15e8)
ARRAY(0x1038f78)
ARRAY(0x1039080)
Was it helpful?

Solution

Your example code doesn't produce the output you showed. You probably changed $_ to $-.

The reason is @rttm is an array of arrays. You can't print an array reference directly to get the inner array, you have to dereference it first:

print $fh1 "@$_\n";

BTW, when you use autodie, there's no need to add or die after open.

OTHER TIPS

This alternative program may help you.

As far as I ca tell, what you need is an output record for each unique speaker labelled SPKR-INFO, followed by a reformatted version of the original lines labelled SPEAKER.

The input data you show doesn't seem to correspond with your required output. My program below uses this input

 0.000  8.556 speech_L1
 8.556 21.063 speech_L2
32.304  9.515 speech_L3
42.049  0.767 speech_L1

The biggest change is that I have abandoned the @rttm array, as on the face of it you can just print each line to the output file as you come to it.

I have also removed the awkward while loops that iterate over the array indices. Because there is no need for the value of the index except to access the array element it is simpler and clearer just to interate over the array values directly.

Note also that, if you have autodie in place, there is no need to test the success of open calls with an or die....

Since you have included the List::MoreUtils module, I have used the uniq function instead of coding it using a @uniq array

use strict;
use warnings;
use autodie;

use List::MoreUtils qw(uniq);

open my $fh, '<', 'etichete';

my $nume = 'etichete';

my @file;
while (<$fh>) {
  push @file, [ split ];
}

my @unique_speakers = sort { $a cmp $b } uniq map $_->[2], @file;

open my $out, '>', 'etichete.rttm';

for my $speaker (@unique_speakers) {
  print $out join(' ', 'SPKR-INFO', $nume, '1', '<NA>', '<NA>', '<NA>', 'unknown', $speaker, '<NA>'), "\n";
}

for my $line (@file) {
  print $out join(' ', 'SPEAKER', $nume, '1', $line->[0], $line->[1], '<NA>', '<NA>', $line->[2], '<NA>'), "\n";
}

close $out;

output

SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L1 <NA>
SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L2 <NA>
SPKR-INFO etichete 1 <NA> <NA> <NA> unknown speech_L3 <NA>
SPEAKER etichete 1 0.000 8.556 <NA> <NA> speech_L1 <NA>
SPEAKER etichete 1 8.556 21.063 <NA> <NA> speech_L2 <NA>
SPEAKER etichete 1 32.304 9.515 <NA> <NA> speech_L3 <NA>
SPEAKER etichete 1 42.049 0.767 <NA> <NA> speech_L1 <NA>
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top