Question

Please help me improve the following code. I am not able to print the sequence on one single line. I would like the output printed on four lines, each with the frequency of one of the four nucleotide characters. Thanks in advance.

#!/usr/bin/perl
use strict;
use warnings;
my $A;    
my $T;
my $G;
my $C;
my $fileIN;
my $fileOUT;

my $seq ;
open ($fileIN, "basecount.nfasta") or die "can't open file ";
open ($fileOUT, ">basecount.out") or die "can't open file ";

while (<$fileIN>)
{

             if ($_ =~/^>/)  #ignore header line
             {next;}

             else
                   {
                    $seq  = $_; #copy the all line with only nucleotide characters ATGC
                   }
            $seq  =~ s/\n//g; #create one single line containing all ATGC characters

             print "$seq\n"; # verify previous step

             my @dna = split ("",$seq); #create an array to include each nucleotide as array element

             foreach my $element (@dna)

            {
            if ($element =~/A/) # match nucleotide pattern and count
                            {
                             $A++;
                            }
             if ($element =~/T/)
                            {
                             $T++;
                            }
             if ($element =~/G/)
                            {
                             $G++;
                            }
             if ($element =~/C/)
                            {
                             $C++;
                            }

            }

            print $fileOUT "A=$A\n";
            print $fileOUT "T=$T\n";
            print $fileOUT "G=$G\n";
            print $fileOUT "C=$C\n";
}

close ($fileIN);
close ($fileOUT);

Solution

First, I would use some shortcuts; they make the code easier to read:

use strict;
use warnings;
use feature 'say';

# Start every counter at 0 so a base that never occurs prints 0 instead of a warning.
my $A = 0;
my $T = 0;
my $G = 0;
my $C = 0;
my $fileIN;
my $fileOUT;

open $fileIN,  '<', 'basecount.nfasta' or die "can't open file basecount.nfasta for reading: $!";
open $fileOUT, '>', 'basecount.out'    or die "can't open file basecount.out for writing: $!";

while ( my $seq = <$fileIN> ) {

  next if $seq =~ /^>/;   # skip FASTA header lines
  chomp $seq;             # drop the trailing newline
  say $seq;               # verify previous step

  my @dna = split //, $seq;   # one array element per nucleotide

  foreach my $element ( @dna ) {
    $A++ if $element eq 'A';
    $T++ if $element eq 'T';
    $G++ if $element eq 'G';
    $C++ if $element eq 'C';
  }
}

# Write the totals once, after all lines are read, so the output has exactly four lines.
say $fileOUT "A=$A";
say $fileOUT "T=$T";
say $fileOUT "G=$G";
say $fileOUT "C=$C";

close $fileIN;
close $fileOUT;

Using the three-argument form of open is also recommended, along with a descriptive die message.
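
As a rough illustration of that advice, the sketch below wraps a three-argument open in a small helper whose die message names the file and includes the operating system's error text from $!. The helper name open_for_reading and the file name no-such-file.nfasta are made up for this example and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): open a file for
# reading with three-argument open, or die with the file name and the OS error.
sub open_for_reading {
    my ($path) = @_;
    open my $fh, '<', $path
        or die "can't open '$path' for reading: $!\n";
    return $fh;
}

# Assumed file name, chosen so that the open fails and the message is shown.
my $fh = eval { open_for_reading('no-such-file.nfasta') };
print "error: $@" unless $fh;
```

The explicit '<' mode keeps a file name that happens to begin with '>' or '|' from being interpreted as a mode, and $! tells you whether the file is missing or merely unreadable.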

EDIT: I used use feature 'say' here because all of your print statements end with a newline. say does exactly the same as print, except that it appends a newline to the output.
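
The question asks for exactly four output lines, one per nucleotide. One possible way to get that, sketched here under some assumptions, is to accumulate the counts over every sequence line and print them only once, after the loop has finished. The helper name count_bases and the in-memory sample data are invented for illustration; a real run would pass a filehandle opened on basecount.nfasta instead. The counting uses tr/// with an empty replacement list, which returns the number of matching characters without modifying the string.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): count A, T, G and C
# across all sequence lines read from a filehandle, skipping FASTA headers.
sub count_bases {
    my ($fh) = @_;
    my %count = ( A => 0, T => 0, G => 0, C => 0 );   # start at 0, not undef
    while ( my $line = <$fh> ) {
        next if $line =~ /^>/;    # skip header lines
        chomp $line;
        # tr/// with an empty replacement list only counts; $line is unchanged
        $count{A} += ( $line =~ tr/A// );
        $count{T} += ( $line =~ tr/T// );
        $count{G} += ( $line =~ tr/G// );
        $count{C} += ( $line =~ tr/C// );
    }
    return %count;
}

# Demo on a made-up in-memory sample instead of basecount.nfasta.
my $sample = ">seq1 example header\nATGCAT\nGGCA\n";
open my $in, '<', \$sample or die "can't open in-memory sample: $!";
my %count = count_bases($in);
close $in;

# Print once, after the whole input has been read: exactly four lines.
print "$_=$count{$_}\n" for qw(A T G C);
```

For the sample above this prints A=3, T=2, G=3 and C=2, each on its own line. Counting per line and summing avoids having to join the whole sequence into one string first, which may matter for large files.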

Licensed under: CC-BY-SA with attribution