Question

I'm trying to modify a flex+bison generator to allow the inclusion of code snippets denoted by surrounding '{{' and '}}'. Unlike the multi-line comment case, I must capture all of the content.

My attempts either fail in the case where the '{{' and the '}}' are on the same line or they are painfully slow.

My first attempt was something like this:

%{
#include <stdio.h>
// sscce implementation of a growing string buffer
char codeBlock[4096];
int codeOffset;
const char* curFilename = "file.l";
extern int yylineno;

void add_code_line(const char* yytext)
{
    codeOffset += sprintf(codeBlock + codeOffset, "#line %d \"%s\"\n\t%s\n", yylineno, curFilename, yytext);
}

%}

%option stack
%option yylineno

%x CODE_FRAG

%%

"{{"[ \n]*          { codeOffset = 0; yy_push_state(CODE_FRAG); }
<CODE_FRAG>"}}"     { codeBlock[codeOffset] = 0; printf("// code\n%s\n", codeBlock); yy_pop_state(); }
<CODE_FRAG>[^\n]*   { add_code_line(yytext); }
<CODE_FRAG>\n

\n
.

Note: the "codeBlock" implementation is a contrivance for the purpose of an SSCCE only. It's not what I'm actually using.

This works for a simple test case:

{{ from line 1
from line 2
}}

{{

from line 7
}}

Outputs

// code
#line 1 "file.l"
    from line 1
#line 2 "file.l"
    from line 2

// code
#line 7 "file.l"
    from line 7

But it can't handle

{{ hello }}

The two solutions I can think of are:

    /* capture character-by-character */
    <CODE_FRAG>.  { add_code_character(yytext[0]); }

And

    <INITIAL>"{{".*?"}}" { int n = strlen(yytext); yytext[n - 2] = 0; add_code(yytext + 2); }

The former seems likely to be slow, and the latter just feels wrong.

Any ideas?

--- EDIT ---

The following appears to achieve the result desired, but I'm not sure if it's a "good" Flex way to do this:

"{{"[ \n]*          { codeOffset = 0; yy_push_state(CODE_FRAG); }
<CODE_FRAG>"}}"     { codeBlock[codeOffset] = 0; printf("// code\n%s\n", codeBlock); yy_pop_state(); }
<CODE_FRAG>.*?/"}}" { add_code_line(yytext); }
<CODE_FRAG>.*?      { add_code_line(yytext); }
<CODE_FRAG>\n

Solution

  1. Flex doesn't implement non-greedy matches, so .*? won't work the way you expect it to. (Flex reads it as an optional .*, which is indistinguishable from .* because .* can already match the empty string.)

  2. Here's a regular expression which will match from {{ as far as possible without a }}:

    "{{"([}]?[^}])*

  3. That might not be what you want, since it won't allow nested {{...}} within your code blocks. However, you didn't mention that as a requirement and none of your examples work that way.

  4. The above regular expression does not match the closing }}, which appears to be what you want since it lets you call add_code(yytext+2) without modifying the temporary buffer. However, you do need to deal with the }} in your action. See below.

  5. The regular expression above will match to the end of the file if there is no matching }}. You probably want to treat that as an error; the simplest way is to check whether EOF is encountered while you are skipping over the }}:

    "{{"([}]?[^}])*   { add_code(yytext+2);
                        if (input() == EOF || input() == EOF) {
                          /* Produce an error, unclosed {{ */
                        }
                      }
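
If you want to check what that pattern captures without running flex, here is a minimal hand-written C sketch of the same matching logic. The function names code_block_match_length and code_block_is_closed are hypothetical helpers invented for this illustration; they are not part of flex, and the generated scanner does this work with a DFA rather than a loop like this one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of flex): returns how many characters the
 * pattern "{{"([}]?[^}])* would match at the start of text, or 0 if text
 * does not begin with "{{".
 */
size_t code_block_match_length(const char *text)
{
    assert(text != NULL);
    if (strncmp(text, "{{", 2) != 0)
        return 0;

    size_t i = 2;
    while (text[i] != '\0') {
        if (text[i] != '}') {
            i += 1;              /* [^}] with the optional } absent */
        } else if (text[i + 1] != '\0' && text[i + 1] != '}') {
            i += 2;              /* a lone } followed by a non-} character */
        } else {
            break;               /* "}}" or a trailing "}": the match ends */
        }
    }
    return i;
}

/*
 * Hypothetical helper: nonzero if the matched text is followed by the
 * closing "}}". This plays the role of the two input() calls in the action.
 */
int code_block_is_closed(const char *text, size_t match_length)
{
    return strncmp(text + match_length, "}}", 2) == 0;
}
```

For the input {{ hello }} this reports a match of nine characters ({{ hello followed by a space) with the closing }} right after it, so a lone } inside the code is kept while the first }} ends the block. For input with no closing }} the match runs to the end of the text and code_block_is_closed returns 0, which is the unclosed {{ error case.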
    
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow