Find multiple Objective-C comments per file, in certain format, with Ruby Regex

https://stackoverflow.com/questions/8946503

07-11-2019

Question

I'm writing a Ruby script that uses regex to find all comments of a specific format in Objective-C source code files.

The format is

/* <Headline_in_caps> <#>:
    <Comment body>
**/

I want to capture the headline in caps, the number and the body of the comment.

With the regex below I can find one comment in this format within a larger body of text.

My problem is that if there is more than one comment in the file, I end up with all the text, including code, between the first /* and the last **/. I don't want it to capture everything in between, only what is inside each /* and **/ pair.

The body of the comment can include any characters except **/ and */, which both signify the end of a comment. Am I correct in assuming that a regex can find multiple whole matches while processing the text only once?

/\/\*\s*([A-Z]+). (\d)\:([\w\d\D\W]+)\*{2}\//x

Broken apart the regex does this:

\/\* —finds the start of a comment

\s* —finds whitespace

([A-Z]+) —captures caps word

.<space> —find the space in between caps word and digit

(\d) —capture the digit

\: —find the colon

([\w\W\d\D]+) —captures the body of a message which can include all valid characters, except **/ or */

\*{2}\/ —finds the end of a comment
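The overshoot most likely comes from the ([\w\W\d\D]+) step: that class matches every character, including * and /, and + is greedy. The sketch below reproduces the effect on a small made-up input. The constant names and the sample text are assumptions for illustration, and the pattern is a trimmed version of the one above, not the exact original.

```ruby
# Illustrative input only; not taken from a real project.
TWO_COMMENTS = [
  "/* FIRST 1:",
  "   First body.",
  "**/",
  "int code_between_comments = 0;",
  "/* SECOND 2:",
  "   Second body.",
  "**/",
  ""
].join("\n")

# [\w\W]+ matches ANY character, so it first runs to the end of the input
# and then backtracks only as far as the LAST "**/".
GREEDY = %r{/\*\s*([A-Z]+)\s(\d):([\w\W]+)\*{2}/}

GREEDY_MATCHES = TWO_COMMENTS.scan(GREEDY)

puts GREEDY_MATCHES.length                        # 1
puts GREEDY_MATCHES.first[2].include?("SECOND")   # true
```

The single match starts at the first /* and its body group contains the code line and the whole second comment, which is the behaviour described above.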

Here is a sample; everything from the first /* to the second **/ is captured:

/*

 HEADLINE 1:

 Comment body.

 **/

- (BOOL)application:(UIApplication *)application didFinishLaunchingWithOptions:(NSDictionary *)launchOptions
{
// This text and method declaration are captured
// The regex captures from HEADLINE to the end of the comment "meddled in." inclusively.

/*
       HEADLINE 2:

       Should be captured separately and without Objective-C code meddled in. 
 **/

}
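One possible way to get one match per comment on a sample like this is a lazy body group combined with String#scan. This is only a sketch under assumptions: COMMENT_PATTERN and extract_comments are hypothetical names, the number is allowed to have several digits (\d+ instead of \d), any whitespace is accepted between headline and number, and both */ and **/ are treated as terminators.

```ruby
# Hypothetical pattern; a sketch, not the regex from the question.
COMMENT_PATTERN = %r{
  /\*\s*          # opening marker plus optional whitespace or newlines
  ([A-Z]+)        # headline in capitals
  \s+(\d+):       # number followed by a colon
  (.*?)           # body, lazy: stops at the first terminator it can reach
  \s*\*{1,2}/     # closing marker with one or two stars
}mx

# Returns one hash per formatted comment found in the source string.
def extract_comments(source)
  source.scan(COMMENT_PATTERN).map do |headline, number, body|
    { :headline => headline, :number => number.to_i, :body => body.strip }
  end
end

sample = "/*\n HEADLINE 1:\n Comment body.\n **/\nint x;\n/*\n HEADLINE 2:\n Second body.\n **/\n"
extract_comments(sample).each do |c|
  puts "#{c[:headline]} #{c[:number]}: #{c[:body]}"
end
# HEADLINE 1: Comment body.
# HEADLINE 2: Second body.
```

The m flag lets the dot match newlines, and the lazy .*? is what keeps each match inside its own comment. Ordinary comments that do not start with a capitalised headline and a number are skipped by this pattern.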

Here is the sample on Rubular: http://rubular.com/r/4EoXXotzX0

I'm using gsub to apply the regex to a string containing the whole file, running Ruby 1.9.3. Another issue is that gsub gives me text that Rubular ignores. Is this a regression, or is Rubular using a different method that gives what I want?

In the question "Regex matching multiple occurrences per file and per line", the answer is to use g for the global option, but that is not valid in a Ruby regex.
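For context, Ruby has no g flag because the choice between "first match" and "all matches" is made by the String method rather than by the regex. The short sketch below contrasts match, scan and gsub on a made-up string; the constant names and the sample text are assumptions for illustration.

```ruby
LINES = "ALPHA 1: one\nBETA 2: two\n"
PAIR  = /([A-Z]+) (\d):/

# match stops at the first occurrence.
FIRST_ONLY = LINES.match(PAIR).captures

# scan walks the string once and collects every non-overlapping match.
ALL_PAIRS = LINES.scan(PAIR)

# gsub also visits every match, but its return value is the whole string
# with each match replaced; text outside the matches is kept as-is.
REWRITTEN = LINES.gsub(PAIR) { "#{$1.downcase}-#{$2}:" }

p FIRST_ONLY   # ["ALPHA", "1"]
p ALL_PAIRS    # [["ALPHA", "1"], ["BETA", "2"]]
p REWRITTEN    # "alpha-1: one\nbeta-2: two\n"
```

Because gsub returns the unmatched text along with the replacements, it may look as if it "gives" more than a tool that only lists the matches; scan is usually the closer fit when only the captured groups are wanted.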

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow