Question

I'm playing around with RegEx but I'm by no means a pro and I can't quite get this to work properly in http://www.regexr.com/ (Being in a rush doesn't help ...)

I have two dozen ".ST" files, basically PLC code which seems to be similar to C syntax. So all the comments are // or (* ... *)

I'm a translator and I'm supposed to translate ONLY the comments, so my thought was to use Find/Replace in Notepad++ and find everything which was NOT a comment, replace it with blank, in order to, in the end, have a document with only comments. So I'm not sure what to do with a RegEx that matches comments, because I don't want to delete those and can't "replace" it with anything... Make any sense???

Thanks so much for your help!

Solution

Simple answer

Not what you asked for, but I believe this is what you want to do. All you need to do is catch the comments and remove them. To do that:

~(?<!\\)//[^\n\r]*|(?<!\\)\(\*.*?(?<!\\)\*\)~sg

will select all the text following // on a line, and all text (possibly multiline) enclosed in (* *). Afterwards you just need to replace every match with the empty string "". (The ~ characters are only delimiters; s lets . match newlines and g means "replace all".)

For info, the (?<! ) patterns are negative lookbehinds: they make sure the comment delimiters aren't escaped. \//I wanna keep this code shouldn't be matched, and in code (*foo\*)bar*) the match should be (*foo\*)bar*).
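To see the simple pattern at work outside a regex tester, here is a minimal Python sketch. It assumes the ~...~sg notation translates to re.DOTALL plus re.sub (which replaces every match by default); the names COMMENT_RE, strip_comments and the sample text are made up for illustration and are not taken from the asker's files.

```python
import re

# The answer's pattern, with the ~...~sg delimiters and flags translated
# to Python: "s" becomes re.DOTALL, and re.sub already replaces all matches.
COMMENT_RE = re.compile(
    r"(?<!\\)//[^\n\r]*"            # unescaped // up to the end of the line
    r"|(?<!\\)\(\*.*?(?<!\\)\*\)",  # unescaped (* ... *), possibly multiline
    re.DOTALL,
)


def strip_comments(source: str) -> str:
    """Return source with every matched comment replaced by the empty string."""
    return COMMENT_RE.sub("", source)


# Hypothetical sample input.
sample = "x := 1; // set x\n(* block\ncomment *)y := 2;\n"
print(repr(strip_comments(sample)))  # 'x := 1; \ny := 2;\n'

# Escaped delimiters are left alone, as described above.
print(strip_comments("\\//keep"))  # \//keep
```

Note that the whitespace before a removed // comment stays behind, so a trailing-space cleanup may still be wanted afterwards.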

Crazy overkill [shouldn't use]

For the record, because it is too damn tempting to go for the monstrous regex when there's a simple, obvious answer, and because I didn't see the simple one until way too late... You shouldn't use this.

~(?:^//.*$|\(\*.*?\*\)|([^(\n]+)|(\())~mg

might catch what you want in capture groups \1 and \2.

^//.*$ catches lines beginning with // (though you might want to also catch the code before the // in a line resembling cool code //this was cool code)

\(\*.*?\*\) catches anything between (* *) (though not if there's a newline... You could use (?s:\(\*.*?\*\)) if your regex flavor supports it. And it probably isn't speed-optimized)

([^(\n]+) looks for (and captures) anything ON THIS LINE that isn't an opening parenthesis. This means that multiline code, unsprinkled with comments, will be cut into lines. You may change this behavior with something like (?s:((?:(?!\n/|\().)+)).

(\() matches the open parenthesis that stopped the previous pattern, only if it isn't the beginning of a (* comment.

You can see it in action here: http://regex101.com/r/aX6sF7, but I do believe it can be greatly simplified.
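As a rough illustration of what ends up in \1 and \2, the sketch below runs the same pattern through Python's re module, assuming the m flag maps to re.MULTILINE. The helper captured_code and the sample strings are hypothetical.

```python
import re

# The answer's second pattern; the "m" flag becomes re.MULTILINE.
PATTERN = re.compile(r"(?:^//.*$|\(\*.*?\*\)|([^(\n]+)|(\())", re.MULTILINE)


def captured_code(source: str) -> list:
    """Collect what groups 1 and 2 capture; comment matches capture nothing."""
    pieces = []
    for match in PATTERN.finditer(source):
        piece = match.group(1) or match.group(2)
        if piece:
            pieces.append(piece)
    return pieces


# Hypothetical sample input.
sample = "// header\nfoo(1); (* note *)\n"
print(captured_code(sample))  # ['foo', '(', '1); ']

# The caveat about trailing comments: this // is not at the start of a line,
# so the whole line lands in group 1.
print(captured_code("a := 1; // trailing\n"))  # ['a := 1; // trailing']
```

The second call shows why the answer warns about lines such as cool code //this was cool code: the comment is captured together with the code.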

OTHER TIPS

This will match any line starting with //:

^\/\/.*$


This will match anything between * and *, as long as the text in between contains no other *:

\*[^\*]*\*
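Since the goal in the question is to keep only the comments, these two patterns can also be used to pull the matches out instead of replacing anything. A small Python sketch, with made-up names and sample text, assuming ^ and $ are meant to work per line (re.MULTILINE):

```python
import re

LINE_COMMENT = re.compile(r"^\/\/.*$", re.MULTILINE)  # lines starting with //
STAR_SPAN = re.compile(r"\*[^\*]*\*")                 # text between * and *

# Hypothetical sample input.
sample = "// first line\nx := 1; (* set x *)\ny := 2; // trailing\n"

print(LINE_COMMENT.findall(sample))  # ['// first line']
print(STAR_SPAN.findall(sample))     # ['* set x *']
```

The output shows two limits of these patterns: a // comment that follows code on the same line is not found, and the second pattern returns the text without the surrounding parentheses.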

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow