How to capture a string without quote characters
05-03-2021
Problem
I'm trying to capture quoted strings without the quotes. I have this terminal
%token <string> STRING
and this production
constant:
| QUOTE STRING QUOTE { String($2) }
along with these lexer rules
| '\'' { QUOTE }
| [^ '\'']* { STRING (lexeme lexbuf) } //final regex before eof
It seems to be interpreting everything leading up to a QUOTE as a single lexeme, which doesn't parse. So maybe my problem is elsewhere in the grammar; I'm not sure. Am I going about this the right way? It was parsing fine before I tried to exclude quotes from strings.
Update
I think there may be some ambiguity with the following lexer rules
let name = alpha (alpha | digit | '_')*
let identifier = name ('.' name)*
The following rule comes before the STRING rule
| identifier { ID (lexeme lexbuf) }
Is there any way to disambiguate these without including quotes in the STRING regex?
Solution
It's pretty normal to do semantic analysis in the lexer for constants like strings and numeric literals, so you might consider a lex rule for your string constants like
| '\'' [^ '\'']* '\''
{ STRING (let s = lexeme lexbuf in s.Substring(1, s.Length - 2)) }
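The quote stripping in that action can be tried outside the lexer. The following is a minimal sketch in plain F#; `stripQuotes` is a hypothetical helper name, not part of fslex, and it assumes the input has the shape the rule above matches (one quote on each side).

```fsharp
// Hypothetical helper mirroring the lexer action above: drop the first and
// last character of a lexeme that is known to be wrapped in single quotes.
let stripQuotes (s: string) : string =
    // The rule '\'' [^ '\'']* '\'' only matches text with a quote on each
    // side, so the length is at least 2 and Substring stays in range.
    s.Substring(1, s.Length - 2)

printfn "%s" (stripQuotes "'hello world'")   // hello world
printfn "[%s]" (stripQuotes "''")            // []
```

Because the whole quoted literal is matched by one rule, the identifier rule never gets a chance to split the string's contents into ID tokens.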
Other tips
You can keep the quotes in the lexeme and trim them in the parser instead.
Lexer:
let constant = ("'" ([^ '\''])* "'")
...
| constant { STRING(lexeme lexbuf) }
Parser:
%token <string> STRING
...
constant:
| STRING { $1.Trim([|'\''|]) }
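As a rough comparison, here is how `Trim` behaves next to the `Substring` approach. This is a sketch in plain F#; the names `trimQuotes` and `sliceQuotes` are made up for illustration. For a lexeme produced by the `constant` regex the two agree, but `Trim` leaves unquoted input untouched while `Substring` removes the first and last character regardless of what they are.

```fsharp
// Hypothetical helpers for comparing the two ways of removing quotes.
let trimQuotes (s: string) : string =
    // Removes every leading and trailing single quote, if any.
    s.Trim([| '\'' |])

let sliceQuotes (s: string) : string =
    // Removes exactly one character from each end, whatever it is.
    s.Substring(1, s.Length - 2)

printfn "%s" (trimQuotes "'it works'")    // it works
printfn "%s" (sliceQuotes "'it works'")   // it works
printfn "%s" (trimQuotes "abc")           // abc
printfn "%s" (sliceQuotes "abc")          // b
```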
Or, if you want the quotes to be separate tokens:
Lexer:
let name = alpha (alpha | digit | '_')*
let identifier = name ('.' name)*
...
| '\'' { QUOTE }
| identifier { ID (lexeme lexbuf) }
| _ { STRING (lexeme lexbuf) }
The identifier rule will take characters away from STRING, so your token stream can look like QUOTE ID STRING ID .. QUOTE, and you have to handle this in the parser:
Parser:
constant:
| QUOTE content QUOTE { String($2) }
content:
| ID content { $1+$2 }
| STRING content { $1+$2 }
| ID { $1 }
| STRING { $1 }
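To see what the `content` rule does with such a token stream, here is a minimal sketch in plain F# that imitates it by hand. The `Token` type and the `content` and `constant` functions are hypothetical stand-ins for what fslex and fsyacc would generate, and the sample stream assumes no whitespace-skipping rule runs before `_`.

```fsharp
// Hypothetical token type standing in for the tokens the lexer would emit.
type Token =
    | QUOTE
    | ID of string
    | STRING of string

// Mirrors the `content` rule: concatenate ID/STRING payloads until something
// else (such as QUOTE) is seen. Returns the text and the remaining tokens.
let rec content (tokens: Token list) : string * Token list =
    match tokens with
    | ID s :: rest
    | STRING s :: rest ->
        let tail, remaining = content rest
        s + tail, remaining
    | _ -> "", tokens

// Mirrors `constant`: QUOTE content QUOTE, with content required to be
// non-empty, as in the grammar above.
let constant (tokens: Token list) : string option =
    match tokens with
    | QUOTE :: rest ->
        (match content rest with
         | text, QUOTE :: _ when text <> "" -> Some text
         | _ -> None)
    | _ -> None

// `_` matches a single character, so each STRING token carries one character.
let tokens = [ QUOTE; ID "foo.bar"; STRING " "; ID "baz"; STRING "!"; QUOTE ]
printfn "%A" (constant tokens)   // Some "foo.bar baz!"
```

Note that with this grammar an empty literal (two adjacent QUOTE tokens) does not parse, since `content` needs at least one ID or STRING.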
I had a similar problem. I capture strings in the "lexic.l" file using lexer states, as described in my own answer.