Simple answer
Not what you asked for, but I believe this is what you actually want to do. All you need is to catch the comments and remove them. To do that, use:
~(?<!\\)//[^\n\r]*|(?<!\\)\(\*.*?(?<!\\)\*\)~sg
This pattern selects all text following // up to the end of the line, as well as all text enclosed in (* *), even across several lines (the s flag lets . match newlines). Afterwards, just replace every match with the empty string "".
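The answer doesn't name a language, so here is a minimal sketch assuming Python's re module. The ~ delimiters are dropped, the s flag becomes re.DOTALL, and re.sub already replaces every match, which covers g. The names COMMENT_RE and strip_comments are just illustrative.

```python
import re

# The "simple answer" pattern translated to Python (assumption: Python's re flavor).
COMMENT_RE = re.compile(
    r"(?<!\\)//[^\n\r]*"            # // comment up to end of line, unless written as \//
    r"|(?<!\\)\(\*.*?(?<!\\)\*\)",  # (* ... *) comment; \*) does not close it
    re.DOTALL,                      # "s" flag: . also matches newlines
)

def strip_comments(source: str) -> str:
    """Replace every comment match with the empty string."""
    return COMMENT_RE.sub("", source)

code = "a := 1; // set a\nb := (* multi\nline *) 2;\n"
print(repr(strip_comments(code)))  # 'a := 1; \nb :=  2;\n'
```

Whitespace around a removed comment is left in place; trim it afterwards if it matters.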
For info, the (?<!\\) groups are negative lookbehinds: they make sure the comment delimiters aren't escaped with a backslash. So \//I wanna keep this code won't be matched, and in code (*foo\*)bar*) the match is (*foo\*)bar*), because the escaped \*) doesn't close the comment.
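To check the two escape cases, here is a short sketch, again assuming Python's re module (COMMENT_RE is an illustrative name), that lists what the simple pattern matches in each string:

```python
import re

# Same pattern as the simple answer, in Python syntax.
COMMENT_RE = re.compile(r"(?<!\\)//[^\n\r]*|(?<!\\)\(\*.*?(?<!\\)\*\)", re.DOTALL)

# \// is escaped, so nothing is matched.
print(COMMENT_RE.findall(r"\//I wanna keep this code"))  # []
# \*) is escaped, so the lazy .*? keeps going until the next unescaped *).
print(COMMENT_RE.findall(r"code (*foo\*)bar*)"))  # ['(*foo\\*)bar*)']
```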
Crazy overkill [shouldn't use]
For the record, because it is too damn tempting to go for the monstrous regex when there's a simple, obvious answer, and because I didn't see that simple answer for way too long... You shouldn't use this.
~(?:^//.*$|\(\*.*?\*\)|([^(\n]+)|(\())~mg
might capture what you want in groups \1 and \2.
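A sketch of how the groups could be collected, assuming Python's re module: re.MULTILINE stands in for the m flag, and re.finditer visits every match, which covers g. The helper code_pieces is hypothetical. Comment matches leave both groups empty, so they simply drop out.

```python
import re
from typing import List

# The overkill pattern; re.MULTILINE makes ^ and $ match at every line start/end.
OVERKILL_RE = re.compile(r"(?:^//.*$|\(\*.*?\*\)|([^(\n]+)|(\())", re.MULTILINE)

def code_pieces(source: str) -> List[str]:
    """Keep group 1 (runs of code) and group 2 (a lone "("); skip comment matches."""
    pieces = []
    for m in OVERKILL_RE.finditer(source):
        if m.group(1) is not None:
            pieces.append(m.group(1))
        elif m.group(2) is not None:
            pieces.append(m.group(2))
    return pieces

src = "x := f(1); (* note *)\n// whole-line comment\ny := 2;"
print(code_pieces(src))  # ['x := f', '(', '1); ', 'y := 2;']
```

No alternative ever consumes a newline, so each line of code comes back as a separate piece and "".join(...) glues the lines together.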
^//.*$ catches lines beginning with // (though in a line like cool code //this was cool code the trailing comment is not caught: the whole line ends up in \1, so you might want to capture only the code before the //).
\(\*.*?\*\) catches anything between (* *) (though not across a newline, since . doesn't match one without the s flag... You could use (?s:\(\*.*?\*\)) if your regex flavor supports it. And it probably isn't speed-optimized).
([^(\n]+) looks for (and selects) anything ON THIS LINE that isn't an opening parenthesis. This means that multiline code, unsprinkled with comments, will be cut into lines. You may change this behavior with something like (?s:((?:(?!\n/|\().)+)).
(\() matches the opening parenthesis that stopped the previous pattern, provided it isn't the start of a (* *) comment (that alternative is tried first).
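The two (?s:...) tweaks from the breakdown can be tried out with a rough sketch, assuming Python 3.6+ (which supports scoped inline flags). SINGLE_LINE, MULTI_LINE, TWEAKED and keep_code are illustrative names, and this is not a recommendation to use the overkill approach.

```python
import re

# Original comment alternative: . stops at newlines, so a multiline (* *) is missed.
SINGLE_LINE = re.compile(r"\(\*.*?\*\)")
# Scoped (?s:...) flag: . may cross newlines inside this group only.
MULTI_LINE = re.compile(r"(?s:\(\*.*?\*\))")

text = "x := 1; (* first\nsecond *) y := 2;"
print(SINGLE_LINE.findall(text))  # []
print(MULTI_LINE.findall(text))   # ['(* first\nsecond *)']

# Overkill pattern with both tweaks. Group 1 may now span lines, stopping before
# "(" or before a newline followed by "/"; group 2 is a lone "(".
TWEAKED = re.compile(
    r"(?:^//.*$|(?s:\(\*.*?\*\))|(?s:((?:(?!\n/|\().)+))|(\())",
    re.MULTILINE,
)

def keep_code(source: str) -> str:
    # Comment matches capture nothing, so they contribute "".
    return "".join(m.group(1) or m.group(2) or "" for m in TWEAKED.finditer(source))

print(keep_code("a := 1;\n// note\nb := (* multi\nline *) 2;"))
```

With the tweaks, the newline before b := survives inside group 1 instead of being dropped. The newline ending a // line is still lost, because group 1 stops just before it.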
You can see it in action here: http://regex101.com/r/aX6sF7, but I do believe it can be greatly simplified.