Question

I'm currently editing my javascript.lang file to highlight function names. Here is the GtkSourceView expression that I am currently using:

<define-regex id="function-regex" >
(?&lt;=([\.|\s]))
([a-z]\w*)
(?=([\(].*))(?=(.*[\)]))
</define-regex>

Here's the regex by itself:

(?<=([\.|\s]))([a-z]\w*)(?=([\(].*))(?=(.*[\)]))

It appears to work for cases such as foo(A), which I am satisfied with. Where I am having trouble is highlighting a function name inside the parentheses of another function call:

  foo(bar(A))

Or, to put it more rigorously:

  foo{N}(foo{N-1}(...(foo{2}(foo{1}(A))...))

So with the example,

  foo(bar(baz(A)))

my goal is for it to highlight foo, bar, baz and nothing else.

I don't know how to handle the bar function. I have read about a way of doing regex recursively with (?R) or (?0) but I have not had any success using that to highlight functions recursively in gedit.

P.S. Here are the tests that I am currently using to determine success.

initialDrawGraph(toBeSorted);   
$(element).removeClass(currentclass);
myFrame.popStack();
context.outputCurrentSortOrder(V);
myFrame.nextFunction = sorter.Sort.;
context.outputToDivConsole(formatStr(V),1);

Solution 2

Ok, looks like I was making this more complicated than it needed to be.

I was able to achieve what I needed with this simpler regex. I just told it to stop looking for the close parenthesis.

([a-zA-Z0-9][a-zA-Z0-9]*)(?=\()
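
A quick way to sanity-check this against the test lines from the question is a small Python sketch. Python's re module is only a stand-in here (GtkSourceView's regex engine may differ in details), and the names SIMPLE, ORIGINAL and function_names are made up for this example. It also suggests why the original pattern skipped bar: its lookbehind only accepts a ".", "|" or whitespace before the name, and bar is preceded by "(".

```python
import re

# Simplified pattern from this answer: an alphanumeric run directly followed by "(".
SIMPLE = re.compile(r"([a-zA-Z0-9][a-zA-Z0-9]*)(?=\()")

# Original pattern from the question, kept only for comparison.
ORIGINAL = re.compile(r"(?<=([\.|\s]))([a-z]\w*)(?=([\(].*))(?=(.*[\)]))")


def function_names(pattern, group, line):
    """Return the text of capture group `group` for every match in `line`."""
    return [m.group(group) for m in pattern.finditer(line)]


# Test lines taken from the question, plus the nested example.
tests = [
    "foo(bar(baz(A)))",
    "initialDrawGraph(toBeSorted);",
    "$(element).removeClass(currentclass);",
    "myFrame.popStack();",
    "context.outputCurrentSortOrder(V);",
    "myFrame.nextFunction = sorter.Sort.;",
    "context.outputToDivConsole(formatStr(V),1);",
]

for line in tests:
    print(line, "->", function_names(SIMPLE, 1, line))
```

For foo(bar(baz(A))) the simplified pattern picks out foo, bar and baz, while the original pattern (group 2) only finds a name that follows whitespace or a dot.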

OTHER TIPS

Balanced parentheses are not a regular language, since matching them needs memory (see: Can regular expressions be used to match nested patterns?). Some regex implementations, however, support recursion:

Matching Balanced Constructs

The main purpose of recursion is to match balanced constructs or nested constructs. The generic regex is b(?:m|(?R))*e where b is what begins the construct, m is what can occur in the middle of the construct, and e is what can occur at the end of the construct. For correct results, no two of b, m, and e should be able to match the same text. You can use an atomic group instead of the non-capturing group for improved performance: b(?>m|(?R))*e.

A common real-world use is to match a balanced set of parentheses. \((?>[^()]|(?R))*\) matches a single pair of parentheses with any text in between, including an unlimited number of parentheses, as long as they are all properly paired. If the subject string contains unbalanced parentheses, then the first regex match is the leftmost pair of balanced parentheses, which may occur after unbalanced opening parentheses. If you want a regex that does not find any matches in a string that contains unbalanced parentheses, then you need to use a subroutine call instead of recursion. If you want to find a sequence of multiple pairs of balanced parentheses as a single match, then you also need a subroutine call.
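
To illustrate the "needs memory" point without relying on recursion support: a plain depth counter is enough to check balance, which is exactly the state a classic regular expression lacks. This is only a sketch in Python (whose built-in re module does not support (?R)); is_balanced is a name invented for this example.

```python
def is_balanced(text: str) -> bool:
    """Return True if every "(" in `text` has a matching ")" after it."""
    depth = 0  # number of currently open parentheses: the "memory"
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                # A ")" appeared before any matching "(".
                return False
    # Balanced only if nothing is left open at the end.
    return depth == 0


print(is_balanced("foo(bar(baz(A)))"))
print(is_balanced("foo(bar(A)"))
```

The first call reports a balanced string and the second an unbalanced one, since one "(" is never closed.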

The following regex works for nested functions (note: this is the Python flavor of regex; you may need to make some syntax tweaks, but hopefully you'll get the idea):

[OBSOLETED] '(\w+\()+[^\)]*\)+'

[UPDATED] (Should Work. Hopefully)

(\w+\()+([^\)]*\)+)*
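
For what it's worth, here is a small Python check of the updated pattern (a sketch; NESTED and first_call are names made up for this example). As far as I can tell, it matches the whole nested call as one span rather than the individual function names, so highlighting only the names would still need something like a lookahead for "(".

```python
import re

# Updated pattern from above: one or more "name(" prefixes, then runs of
# non-")" text, each followed by one or more ")".
NESTED = re.compile(r"(\w+\()+([^\)]*\)+)*")


def first_call(line):
    """Return the first span matched by NESTED in `line`, or None."""
    m = NESTED.search(line)
    return m.group(0) if m else None


print(first_call("foo(bar(baz(A)))"))
print(first_call("context.outputToDivConsole(formatStr(V),1);"))
```

On foo(bar(baz(A))) the match is the entire expression, and on the last test line from the question it is outputToDivConsole(formatStr(V),1).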

Licensed under: CC-BY-SA with attribution