Question

My code:

use strict;  
use warnings;

my $seq = "ATGGT[TGA]G[TA]GC";  
print "The sequence is $seq\n";  
my %regex = (  
   AG => "R",  
   TC => "Y",  
   GT => "K",  
   AC => "M",  
   GC => "S",  
   AT => "M",  
   CGT => "B",  
   TGA => "D",  
   ACT => "H",  
   ACG => "V",  
   ACGT => "N"  
);  

$seq =~ s/\[(\w+)\]/$regex{$1}/g;  
print "$seq\n";  

My ideal output is ATGGTDGMGC. In the scenario above, however, the group [TA] is not replaced: my hash has the key AT but not TA, so the lookup finds nothing. One way to solve this would be to add another key-value pair, TA => "M". But I cannot do this for every key-value pair, as there are too many possible orderings.

So is there a better way to address this issue?

Thanks..


Solution

I'm guessing you mean that the order of the stuff in brackets is unimportant, so AT is equivalent to TA, and TAG equivalent to TGA, etc.

[ Note that the other Eric made a different guess. You weren't very clear on what you wanted. ]

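You could sort the letters, both in the hash keys and in each bracketed group, so that every ordering of the same letters maps to the same key.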

sub key { join '', sort split //, $_[0] }

my @subs = (
   AG => "R",
   TC => "Y",
   GT => "K",
   AC => "M",
   GC => "S",
   AT => "M",
   CGT => "B",
   TGA => "D",
   ACT => "H",
   ACG => "V",
   ACGT => "N",
);  

my %subs;
while (@subs) {
    my $key = shift(@subs);
    my $val = shift(@subs);
    $subs{ key($key) } = $val;
}

# Die on unrecognized
$seq =~ s/\[(\w+)\]/ $subs{ key($1) } or die $1 /ge;

or

# Do nothing on unrecognized
$seq =~ s/\[(\w+)\]/ $subs{ key($1) } || "[$1]" /ge;
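
For reference, here is a minimal, self-contained sketch of the sorted-key idea applied to the sequence from the question. The names canonical_key, %code_for, %canonical_code and collapse_groups are made up for this illustration, and the table values are simply the ones the asker posted. It takes the "do nothing on unrecognized" route and leaves an unknown group in place with its brackets.

```perl
use strict;
use warnings;

# Canonical form of a group: its letters in sorted order,
# so "TGA", "GAT" and "AGT" all become "AGT".
sub canonical_key { return join '', sort split //, $_[0] }

# Table exactly as posted in the question.
my %code_for = (
    AG  => 'R', TC  => 'Y', GT  => 'K', AC  => 'M',
    GC  => 'S', AT  => 'M', CGT => 'B', TGA => 'D',
    ACT => 'H', ACG => 'V', ACGT => 'N',
);

# Re-key the table by canonical form.
my %canonical_code =
    map { ( canonical_key($_), $code_for{$_} ) } keys %code_for;

# Replace every [GROUP] whose letters match a table entry in any order.
sub collapse_groups {
    my ($seq) = @_;
    $seq =~ s/\[(\w+)\]/ $canonical_code{ canonical_key($1) } || "[$1]" /ge;
    return $seq;
}

print collapse_groups('ATGGT[TGA]G[TA]GC'), "\n";   # prints ATGGTDGMGC
```

Because both sides of the lookup are normalized the same way, the table only needs one entry per set of letters, however many orderings show up in the input.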

OTHER TIPS

Perl has no way of knowing that the key AT means the same thing as TA unless you tell it in some way. If all of your sequences can be reversed, then you could do something like:

for (keys %regex) {
   $regex{reverse $_} = $regex{$_}
}

You should probably also check that you are not overwriting any existing keys.
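
A rough sketch of that check, using the table from the question: a reversed key is added only when it is not already present, and any reversed key that already exists with a different value is collected instead of being overwritten. The @conflicts array is a made-up name for this illustration.

```perl
use strict;
use warnings;

# Table exactly as posted in the question.
my %regex = (
    AG  => 'R', TC  => 'Y', GT  => 'K', AC  => 'M',
    GC  => 'S', AT  => 'M', CGT => 'B', TGA => 'D',
    ACT => 'H', ACG => 'V', ACGT => 'N',
);

my @conflicts;
for my $key (keys %regex) {          # keys() builds its list before the loop starts
    my $reversed = reverse $key;     # scalar context: reverses the string
    if (exists $regex{$reversed}) {
        # Already present: never overwrite, but remember real disagreements.
        push @conflicts, $reversed if $regex{$reversed} ne $regex{$key};
        next;
    }
    $regex{$reversed} = $regex{$key};
}

print "Conflicting reversed keys: @conflicts\n" if @conflicts;
print scalar(keys %regex), " keys after adding reversals\n";
```

With this particular table no reversed key collides with an existing one, so nothing is reported and the hash simply doubles in size.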

Alternatively, you could modify the regex:

$seq =~ s/\[(\w+)\]/$regex{$1} or $regex{reverse $1}
        or die "pattern $1 not found"/ge;  

Again, both of these examples assume that all of your keys can be reversed. If not, then you will have to either enter the reversals manually or develop some sort of selection criteria for reversal.
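
To make the scope of the reversal approach concrete, here is a small sketch of the fallback lookup pulled out into a helper (lookup is a made-up name for this illustration). It finds a group written as a key or as the exact reverse of a key, but not other orderings of a three- or four-letter key.

```perl
use strict;
use warnings;

# Table exactly as posted in the question.
my %regex = (
    AG  => 'R', TC  => 'Y', GT  => 'K', AC  => 'M',
    GC  => 'S', AT  => 'M', CGT => 'B', TGA => 'D',
    ACT => 'H', ACG => 'V', ACGT => 'N',
);

# Try the group as written, then its exact reverse.
# Returns undef when neither spelling is a key.
sub lookup {
    my ($group) = @_;
    my $reversed = reverse $group;   # scalar context: reverses the string
    return $regex{$group} || $regex{$reversed};
}

for my $group (qw(TA AGT TAG)) {
    my $code = lookup($group);
    print "$group => ", (defined $code ? $code : 'not found'), "\n";
}
```

Here TA and AGT are found because they are the reverses of AT and TGA, while TAG is neither a key nor the reverse of one, so it is reported as not found.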

Licensed under: CC-BY-SA with attribution