Question

I am trying to reverse engineer a Perl script. One of the lines contains a matching operator that reads:

$line =~ /^\s*^>/ 

The input is just FASTA sequences with header information. The script is looking for a particular pattern in the header, I believe.

Here is an example of the files the script is applied to:

>mm9_refGene_NM_001252200_0 range=chr1:39958075-39958131 5'pad=0 3'pad=0 strand=+ 
repeatMasking=none
ATGGCGAACGACTCTCCCGCGAAGAGCCTGGTGGACATTGACCTGTCGTC
CCTGCGG

>mm9_refGene_NM_001252200_1 range=chr1:39958354-39958419 5'pad=0 3'pad=0 strand=+ 
repeatMasking=none
GACCCTGCTGGGATTTTTGAGCTGGTGGAAGTGGTTGGAAATGGCACCTA
TGGACAAGTCTATAAG

This is a matching operator asking whether the line, from its beginning, contains zero or more whitespace characters, but after that I lose its meaning.

This is how I have parsed the regex so far:

from the beginning [ /^... ], contains whitespace [ ...\s... ], zero or more times [ ...*... ].


Solution

Using RegexBuddy (or, as r3mus said, regex101.com, which is free):

Assert position at the beginning of the string «^»
Match a single character that is a “whitespace character” (spaces, tabs, and line breaks) «\s*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
Assert position at the beginning of the string «^»
Match the character “>” literally «>»
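
The breakdown above can be checked with a small sketch. Without the /m flag, the second ^ can only succeed at position 0, so \s* is forced to match nothing, and a line with whitespace before the > should not match. The helper name matches_original and the sample lines are made up for illustration; only the pattern comes from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $line matches the pattern from the
# question (no /m flag), 0 otherwise.
sub matches_original {
    my ($line) = @_;
    return $line =~ /^\s*^>/ ? 1 : 0;
}

# Illustrative inputs, not taken from the original script.
my @samples = (
    ">mm9_refGene_NM_001252200_0 range=chr1:39958075-39958131",  # header at column 0
    "  >indented_header",                                        # whitespace before '>'
    "ATGGCGAACGACTCTCCCGCGAAGAGCCTGGTGGACATTGACCTGTCGTC",         # sequence line
);

printf "%d  %s\n", matches_original($_), $_ for @samples;
```

Only the first sample should report 1. The indented header fails because the second ^ cannot match after the whitespace has been consumed.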

EDIT: Birei's answer is probably more correct if the regex in question is actually wrong.

OTHER TIPS

You have to get rid of the second ^ character. It is a metacharacter that means the beginning of the string (or the beginning of a line when the /m flag is used), and the first ^ already asserts that.

The character > will match at the beginning of the line without the second ^ because the initial whitespace is optional (* quantifier). So, use:

$line =~ /^\s*>/ 
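
As a rough sketch of how the simplified pattern behaves, the snippet below classifies a few lines as header or sequence. The helper name is_header and the indented sample line are assumptions for illustration. Note that this version also accepts whitespace before the >, which may or may not be what the original script intended.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $line looks like a FASTA header,
# allowing optional leading whitespace before '>'.
sub is_header {
    my ($line) = @_;
    return $line =~ /^\s*>/ ? 1 : 0;
}

# Illustrative lines; the first and third are shortened from the question's sample.
my @lines = (
    ">mm9_refGene_NM_001252200_0 range=chr1:39958075-39958131",
    "  >indented_header",
    "ATGGCGAACGACTCTCCCGCGAAGAGCCTGGTGGACATTGACCTGTCGTC",
);

for my $line (@lines) {
    print is_header($line) ? "header:   " : "sequence: ", $line, "\n";
}
```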

It is much easier to reverse engineer a Perl script with a debugger: "perl -d script.pl" or, if you have ddd on Linux, "ddd script.pl &".

In multiline mode (the /m flag), this regex matches an empty line, possibly containing whitespace, followed by the beginning of the next FASTA record. http://www.rexfiddle.net/c6locQg
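
A minimal sketch of the multiline case, assuming the whole file has been read into one string (the original script may well process one line at a time instead). The helper name count_record_starts and the two-record input are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: counts how many times the original pattern matches
# in $text when ^ is allowed to match at the start of every line (/m).
sub count_record_starts {
    my ($text) = @_;
    # "() =" forces list context, so the assignment yields the number of /g matches.
    my $count = () = $text =~ /^\s*^>/mg;
    return $count;
}

# Two made-up records separated by a blank line.
my $fasta = ">seq1\nACGT\n\n>seq2\nGGCC\n";
print count_record_starts($fasta), "\n";
```

Here the first match is the > of the first header, and the second match consumes the blank line together with the > that starts the next record.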

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow