Question

Hi, I'm writing a simple lexer based on regular expressions.

One lexer token is CHARLITERAL, which is any character enclosed in single quotes, i.e.:

'A'
'.'

even

'''

is allowed.

The only time this is not allowed is in a situation like this:

somerandomcontext'('"')

In this case only the CHARLITERAL within the parentheses is valid, and the lexer should ignore the first single quote. I'm looking for a regular expression that returns '"' instead of '(' when I feed it the above string. Obviously '[^\n\r]' doesn't cut it. Unfortunately, I'm not very familiar with assertions in regular expressions.


Solution

One way to do this is to use a negative lookahead assertion. The following regular expression should behave the way you describe.

'(?![()]).'

This expression first matches an apostrophe and then applies the negative lookahead, which asserts that the next character is not an opening or closing parenthesis. If it is, the match attempt at that position fails and the engine moves on to the next position. If the next character is anything other than a parenthesis, the expression matches whatever that character actually is (now guaranteed not to be a parenthesis), followed by a closing apostrophe.

Given the following input string, this expression will return the listed matches. It should remain robust even when there are no spaces between potential literals.

Input:    
'A' '.' '''somerandomcontext'('"')'B''C''''''' sadfasdf'(')'L')

Matches:
1: 'A'
2: '.'
3: '''
4: '"'
5: 'B'
6: 'C'
7: '''
8: '''
9: 'L'
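
To check the match list above, here is a minimal sketch in Python, assuming the `re` module's regex flavor (the question does not say which engine the lexer uses). The names CHARLITERAL, NAIVE, question_input and answer_input are only illustrative.

```python
import re

# Pattern from the answer: apostrophe, one character that is not a
# parenthesis (enforced by the negative lookahead), apostrophe.
CHARLITERAL = re.compile(r"'(?![()]).'")

# The pattern the question says "doesn't cut it", for comparison.
NAIVE = re.compile(r"'[^\n\r]'")

question_input = "somerandomcontext'('\"')"
answer_input = "'A' '.' '''somerandomcontext'('\"')'B''C''''''' sadfasdf'(')'L')"

# The naive pattern grabs the quote-paren-quote sequence first.
print(NAIVE.findall(question_input))        # ["'('"]

# The lookahead version skips it and finds the quoted double quote.
print(CHARLITERAL.findall(question_input))  # ['\'"\'']

# Print the matches for the longer input, numbered as in the list above.
for number, literal in enumerate(CHARLITERAL.findall(answer_input), start=1):
    print(f"{number}: {literal}")
```

Note that, as written, the pattern never accepts '(' or ')' as a character literal in any context. That matches the case described in the question, but it may be stricter than a full lexer needs.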

http://www.regular-expressions.info/lookaround.html

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow