Question

I have a file (sequences.txt) with 3 FASTA sequences like this:

>Line40_Chr2L
AAAA
>Line41_Chr2L
CCCC
>Line42_Chr2L
TTTT

I have written code that stores the sequences (without the header lines starting with >) in a variable called $sequence.

open INFILE, '<', $infile or die "Can't open $infile: $!";

my $sequence = '';  # This sequence variable stores the sequences from the .fasta file
my $line;           # This reads the input file one-line-at-a-time

while ($line = <INFILE>) {
    chomp $line;

    if ($line =~ /^\s*$/) {            # Skip blank lines (empty or whitespace only)
        next;
    } elsif ($line =~ /^\s*#/) {       # Skip comment lines (optional whitespace, then '#')
        next;
    } elsif ($line =~ /^>/) {          # Skip .fasta header lines starting with '>'
        next;
    } else {
        $sequence .= $line;            # Append the sequence line to what was read so far
    }

    $sequence =~ s/\s//g;              # Whitespace characters are removed
}

Now I want to compare the same position in each sequence (that is, column by column). For example, I want to compare the first position of the 3 sequences. The goal is to check whether the sequences have the same base at the same position. But I am having trouble because I don't know how to index the columns when I have a single variable holding the 3 sequences with no separator.

So I was thinking of a two-dimensional array (i,j), but I am just starting with Perl and need some help. Or do you know an easier way?

Can someone help me?

Thank you very much!

Solution

You can easily convert a string to an array using 'split'.

In your case you could do:

my @sequence = split //,$sequence;

If $sequence was, for example, "ADQLTEEQ", then @sequence will be an array with 8 elements:

0|1|2|3|4|5|6|7
A|D|Q|L|T|E|E|Q
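
To compare columns across several sequences, the same 'split' idea can be applied to each sequence separately, which gives the two-dimensional array mentioned in the question. The following is only a rough sketch under some assumptions: the sequences are kept as separate strings in a hypothetical @sequences array (for example by pushing each one onto the array instead of concatenating), and all sequences have the same length. The names @matrix and @conserved are made up for this illustration.

```perl
use strict;
use warnings;

# Assumed input: the three sequences from the question, kept as
# separate strings instead of one concatenated scalar.
my @sequences = ('AAAA', 'CCCC', 'TTTT');

# Two-dimensional array: one row per sequence, one column per base.
my @matrix = map { [ split //, $_ ] } @sequences;

# Assumption: every sequence is as long as the first one.
my $length = scalar @{ $matrix[0] };

my @conserved;    # one flag per column: 1 if all rows share the same base
for my $col (0 .. $length - 1) {
    # Collect the distinct bases found in this column.
    my %seen;
    $seen{ $_->[$col] } = 1 for @matrix;

    my $same = (keys %seen == 1) ? 1 : 0;
    push @conserved, $same;

    printf "Position %d: %s (%s)\n",
        $col + 1,
        join(' ', map { $_->[$col] } @matrix),
        $same ? 'same' : 'different';
}
```

A single base is then available as $matrix[$i][$j], where $i is the sequence and $j is the position, both counted from 0.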

Hope it helps.

Licensed under: CC-BY-SA with attribution