Question

I'm parsing a CSV file in which each line looks something like this:

10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,GO:0005524,GO:0016874,GO:0047747,GO:0004467,GO:0015245,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

There seem to be trailing commas at the end of each line.

I want to get the first term, in this case "10998", and the number of GO terms related to it. So my output in this case should be:

Output:

10998,7

But instead it shows 299. I realized there are 303 commas in each line overall, and I can't figure out an easy way to remove the trailing commas. Can anyone help me solve this issue?

Thanks!

My Code:

use strict;
use warnings;

open my $IN, '<', 'test.csv' or die "can't find file: $!";
open(CSV, ">GO_MF_counts_Genes.csv") or die "Error!! Cannot create the file: $!\n";
my @genes = ();

my $mf;
foreach my $line (<$IN>) {
    chomp $line;
    my @array = split(/,/, $line);
    my @GO = splice(@array, 4);
    my $GO = join(',', @GO);
    $mf = count($GO);
    print CSV "$array[0],$mf\n";
}

sub count {
    my $go = shift @_;
    my $count = my @go = split(/,/, $go);
    return $count;
}

Solution

I'd use juanrpozo's solution for counting, but if you still want to go your way, remove the trailing commas with a regex substitution:

$line =~ s/,+$//;
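As a minimal sketch of how that substitution could fit into the counting step: the helper name count_go_terms is hypothetical, and the assumption that the GO terms start at the fifth field is taken from the sample line in the question. The pattern used here is slightly broader than the one above; it also strips trailing whitespace such as a stray carriage return from Windows line endings, which chomp may leave behind.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the number of fields after the fourth one,
# once trailing commas and whitespace have been stripped.
sub count_go_terms {
    my ($line) = @_;
    $line =~ s/[,\s]+\z//;          # drop trailing commas, "\r", "\n", spaces
    my @fields = split /,/, $line;
    my @go     = splice @fields, 4; # assumes GO terms start at the fifth field
    return scalar @go;
}

my $sample = "10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,,,,\r\n";
print count_go_terms($sample), "\n";   # prints 2
```

Because the substitution removes the line ending as well, a separate chomp is not needed in this variant.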

OTHER TIPS

I suggest this more concise way of coding your program.

Note that the line my @data = split /,/, $line discards trailing empty fields (@data has only 11 fields with your sample data), so it will produce the same result whether or not trailing commas are removed beforehand.

use strict;
use warnings;

open my $in, '<', 'test.csv' or die "Cannot open file for input: $!";
open my $out, '>', 'GO_MF_counts_Genes.csv' or die "Cannot open file for output: $!";

foreach my $line (<$in>) {
  chomp $line;
  my @data = split /,/, $line;
  printf $out "%s,%d\n", $data[0], scalar grep /^GO:/, @data;
}
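The note above about split discarding trailing empty fields can be checked with a small sketch. The shortened sample line is made up for illustration; a negative limit as the third argument tells split to keep the empty fields.

```perl
use strict;
use warnings;

my $line = "10998,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403,,,,";

my @default  = split /,/, $line;      # trailing empty fields are dropped
my @keep_all = split /,/, $line, -1;  # negative limit keeps them

print scalar(@default), "\n";   # prints 6
print scalar(@keep_all), "\n";  # prints 10
```

Only empty trailing fields are dropped. If a line ends in something non-empty, such as a carriage return left over after chomp on a file with Windows line endings, the last field is not empty and every field is kept. That may be why the original script counted hundreds of fields, though this is an assumption and not something the thread confirms.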

You can apply grep to @array:

my $mf = grep { /^GO:/ } @array;

This assumes $array[0] never matches /^GO:/.
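If that assumption is a concern, one possible variation is to run grep over an array slice that skips the leading fields. This is only a sketch: the helper name count_go_in_tail is hypothetical, and the offset of 4 assumes the layout of the sample line in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: counts GO terms only among the fields after the fourth.
sub count_go_in_tail {
    my @array = @_;
    # The range 4 .. $#array is empty when there are fewer than five fields.
    my $count = grep { /^GO:/ } @array[4 .. $#array];
    return $count;
}

# Made-up line whose first field happens to look like a GO term.
my @array = split /,/, "GO:9999,4499,SLC27A5,Q9Y2P5,GO:0000166,GO:0032403";
print count_go_in_tail(@array), "\n";   # prints 2
```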

For each line:

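foreach my $line (<$IN>) {
    # Capture the leading numeric ID (anchored to the start of the line).
    my ($first_term) = ($line =~ /^(\d+),/);
    # Splitting on 'GO:' yields one more piece than there are GO terms;
    # the padding spaces keep the first and last pieces non-empty.
    # Matching 'GO:' rather than 'GO' avoids counting gene names containing "GO".
    my @tmp = split('GO:', " $line ");
    my $nr_of_GOs = @tmp - 1;
    print CSV "$first_term,$nr_of_GOs\n";
}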
Licensed under: CC-BY-SA with attribution