Question

I have written the following code in Perl. I want to iterate through a string 3 positions (characters) at a time. If TAA, TAG, or TGA (stop codons) appear, I want to print till the stop codons and remove the rest of the characters.


Example:

data.txt

ATGGGTAATCCCTAGAAATTT

ATGCCATTCAAGTAACCCTTT

Answer:

ATGGGTAATCCCTAG (last 6 characters removed)

ATGCCATTCAAGTAA (last 6 characters removed)

(Each sequence begins with ATG).


Code:

#!/usr/bin/perl -w

open FH, "data.txt";
@a=<FH>;

foreach $tmp(@a)
{
  for (my $i=0; $i<(length($tmp)-2); $i+=3)
  {
    if ($tmp=~/(ATG)(\w+)(TAA|TAG|TGA)\w+/)
    {
      print "$1$2$3\n";
    }
    else 
    { 
      print "$tmp\n"; 
    }
    $tmp++;
  }
}
exit;

However, my code is not giving the correct result. There should not be any overlaps in the characters (I want to move every 3 characters).

Can someone suggest how to fix the error?

Thanks!

Was it helpful?

Solution

Script:

#!/usr/bin/perl

use strict;
use warnings;

open FH, "data.txt";
my @a = <FH>;

foreach (@a) {
  print /^(ATG(...)*?(TAA|TAG|TGA))/? $1 : $_, "\n";
}

Output:

ATGGGTAATCCCTAG
ATGCCATTCAAGTAA

OTHER TIPS

I think this code will do. It uses \w{3} - three-symbol codons as you need.

#!/usr/bin/perl -w
open FH, "data.txt";
@a=<FH>;
foreach $tmp(@a) {
  if ($tmp=~ /^(ATG(?:\w{3})*(?:TAA|TAG|TGA)).*/) {
    print "$1\n";
  } else {
    print "$tmp\n";
  }
}

You say you want to remove everything after the first stop codon. If so, all you need is

while (<FH>) {
   s/(?<=TAA|TAG|TGA).*//;
   print;
}

But then there's the mystical "I want to iterate through a string 3 positions (characters) at a time" requirement. That doesn't make any sense. Perhaps you want the match to occur at a position that's divisible by three? If so, you'd use

s/^(?:.{3})*?(?:TAA|TAG|TGA)\K.*//;    # Requires 5.10+
s/^((?:.{3})*?(?:TAA|TAG|TGA)).*/$1/;  # Backwards compatible

May I suggest a reading of perlretut (about 4 paragraphs down from here)? It actually covers almost exactly this situation with avoiding overlaps and finding stop codons.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top