Question

My multi-FASTA file is in this format:

>miRNA65 dvex2345
CGATGCTAGATGCTATGACAACGATGCCTCG-G
>miRNA60 dvex1234
T-TAA-ACTCATCATCATCATACTCATCATCATCATCAGCATATTAACAAG
>miRNA65 dvex2345
T-TAA-ACTTATCATCATCATACTCATCATCATCATCAGCATATTAACAAG

I am new to Perl and I need to find the records whose '>' header lines are identical and concatenate their sequence lines into a single sequence.

I'm expecting the following output for the above file:

>miRNA60 dvex1234
T-TAA-ACTCATCATCATCATACTCATCATCATCATCAGCATATTAACAAG
>miRNA65 dvex2345
T-TAA-ACTTATCATCATCATACTCATCATCATCATCAGCATATTAACAAG.CGATGCTAGATGCTATGACAACGATGCCTCG-G

What is the best way to get this done?


Solution

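use strict;
use warnings;

my %hash;    # miRNA id => [ header line, [ sequences ] ]
while (<DATA>) {
        if (/^>(miRNA\d+)/) {
                my $id = $1;
                $hash{$id}[0] = $_;                  # keep the header line (with its newline)
                chomp(my $n = <DATA>);               # the sequence is on the next line
                unshift @{ $hash{$id}[1] }, $n;      # later sequences go first
        }
}

for my $k (sort keys %hash) {
        # join with '.' to match the expected output
        print $hash{$k}[0], join('.', @{ $hash{$k}[1] }), "\n";
}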
__DATA__
>miRNA65 dvex2345
CGATGCTAGATGCTATGACAACGATGCCTCG-G
>miRNA60 dvex1234
T-TAA-ACTCATCATCATCATACTCATCATCATCATCAGCATATTAACAAG
>miRNA65 dvex2345
T-TAA-ACTTATCATCATCATACTCATCATCATCATCAGCATATTAACAAG
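
If the real file wraps sequences over several lines, or the headers do not all start with miRNA, a variant along these lines may be easier to adapt. It is only a sketch under a few assumptions: merge_fasta is a hypothetical helper name, records are grouped by the whole header line rather than by the miRNA id alone, and the '.' separator and the "latest sequence first" order are taken from the expected output in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: merge FASTA records that share an identical header line.
# Takes a list of lines and returns the merged text as one string.
sub merge_fasta {
    my @lines = @_;
    my (%seqs, $header);
    for my $line (@lines) {
        $line =~ s/\r?\n\z//;                  # strip the line ending (LF or CRLF)
        next if $line eq '';                   # skip blank lines
        if ($line =~ /^>/) {
            $header = $line;                   # a new record starts here
            unshift @{ $seqs{$header} }, '';   # newest sequence goes first
        }
        elsif (defined $header) {
            $seqs{$header}[0] .= $line;        # append wrapped sequence lines
        }
    }
    # One header per group, followed by its sequences joined with '.'
    return join '',
        map { "$_\n" . join('.', @{ $seqs{$_} }) . "\n" } sort keys %seqs;
}

# The example data from the question
my @example = (
    ">miRNA65 dvex2345\n",
    "CGATGCTAGATGCTATGACAACGATGCCTCG-G\n",
    ">miRNA60 dvex1234\n",
    "T-TAA-ACTCATCATCATCATACTCATCATCATCATCAGCATATTAACAAG\n",
    ">miRNA65 dvex2345\n",
    "T-TAA-ACTTATCATCATCATACTCATCATCATCATCAGCATATTAACAAG\n",
);
print merge_fasta(@example);
```

To run it on a file instead of the inline example, replace the last statement with print merge_fasta(<>); and pass the file name on the command line.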
Licensed under: CC-BY-SA with attribution