Question

I am working on a preprocessor that is analyzing a DSL. My goal is to remove the comments. The block comment facility is demarcated by %% before and after. I do not have to worry about %% being in strings, by the definition of the language.

I am using this s/// regex. Unfortunately, it seems to match everything and wipe it out:

#Remove multiline comments.
$text_string =~ s/%%.*%%//msg;

What am I doing wrong?


Solution

The first thing you can do is make the quantifier non-greedy:

.*?

Otherwise,

%% some text %%

real content

%% other text %%

will all be wiped out, because the greedy .* runs from the first %% to the last %%.
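As a minimal sketch of the difference, assuming the DSL source is held in a variable like the question's $text_string (the sample input here is made up for illustration), the two forms can be compared side by side:

```perl
use strict;
use warnings;

# Hypothetical sample input: two block comments around real content.
# The second comment spans two lines.
my $text_string = "%% some text %%\nreal content\n%% other\ntext %%\n";

# Greedy: .* runs to the last %%, so the real content is removed too.
(my $greedy = $text_string) =~ s/%%.*%%//sg;

# Non-greedy: .*? stops at the nearest closing %%.
# /s lets . match newlines, so a comment may span lines. /m is not needed
# because the pattern contains no ^ or $ anchors.
(my $lazy = $text_string) =~ s/%%.*?%%//sg;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
```

With this input the greedy version leaves only the trailing newline, while the non-greedy version keeps the "real content" line.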

OTHER TIPS

From perlfaq6: What does it mean that regexes are greedy? How can I get around it?


Most people mean that greedy regexes match as much as they can. Technically speaking, it's actually the quantifiers (?, *, +, {}) that are greedy rather than the whole pattern; Perl prefers local greed and immediate gratification to overall greed. To get non-greedy versions of the same quantifiers, use (??, *?, +?, {}?).

An example:

$s1 = $s2 = "I am very very cold";
$s1 =~ s/ve.*y //;      # I am cold
$s2 =~ s/ve.*?y //;     # I am very cold

Notice how the second substitution stopped matching as soon as it encountered "y ". The *? quantifier effectively tells the regular expression engine to find a match as quickly as possible and pass control on to whatever is next in line, like you would if you were playing hot potato.

Assuming that you have read the entire source into the variable $str, and that a single % can never occur between the opening %% and the closing %%, you could use this:

$str =~ s/%%([^%]+)%%//g;
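A short sketch of how this negated-class pattern behaves, using made-up inputs. Because [^%] also matches newlines, no /s modifier is needed, and the capturing parentheses are dropped here since the captured text is not used. Note the stated assumption matters: a lone % inside a comment prevents the match, and because of the + an empty comment (%%%%) would not be removed either.

```perl
use strict;
use warnings;

# Hypothetical inputs: one that satisfies the "no single %" assumption,
# and one that violates it.
my $ok   = "a %% one\ntwo %% b %% three %% c";
my $lone = "a %% 50% done %% b";

# [^%]+ cannot run past a %, so each comment is matched separately
# without relying on a non-greedy quantifier.
(my $ok_out   = $ok)   =~ s/%%[^%]+%%//g;

# The lone % inside the comment stops [^%]+ before the closing %%,
# so nothing matches and the string is left unchanged.
(my $lone_out = $lone) =~ s/%%[^%]+%%//g;

print "$ok_out\n";
print "$lone_out\n";
```

If comments in the DSL may contain a single %, the non-greedy .*? form with /s is the safer choice.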

Licensed under: CC-BY-SA with attribution