How to write a regex for a C-type integer in lex?

https://stackoverflow.com/questions/22556476

18-06-2023

Question

I am trying to write a C parser in lex:

   %{
     /* this program does the job for identifying C type integer and floats*/
   %}
   %%

   [\t ]+      /* ignore whitespace */ ;

   [0-9][5]+       { printf ("\"%s\" out of range \n", yytext); }
   [-+][0-9][5]        { printf ("\"%s\" is a C integers\n", yytext); }
   [-+]?[0-9]*\.?[0-9]+        { printf ("\"%s\" is a float\n", yytext); }

   \n      ECHO; /* which is the default anyway */
   %%

I am having trouble identifying C integers because they have a limit, i.e. 32767. So I used a regex that should report an "out of range" error when the number has more than 5 digits, but that is a hack and not a perfect solution.

Solution 2

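This is working :) Note that the integer rule has to come before the float rule: the float pattern also matches plain digits, and on equal-length matches lex picks the rule listed first.

%{
 /* this program does the job for identifying C type integer and floats*/
 #include <stdio.h>
 #include <stdlib.h>   /* strtol */
 #include <errno.h>    /* errno, ERANGE */
 long c;
%}
%%

[\t ]+      /* ignore whitespace */ ;

[-+]?[0-9]+ {
        /* strtol reports overflow through errno; atoi's behaviour on overflow is undefined */
        errno = 0;
        c = strtol(yytext, NULL, 10);
        if (errno == ERANGE || c < -32767 || c > 32767)
            printf("OUT OF RANGE INTEGER\n");
        else
            printf("INTEGER\n");
    }
[-+]?[0-9]*\.?[0-9]+        { printf ("\"%s\" is a float\n", yytext); }

\n      ECHO; /* which is the default anyway */
%%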

Other tips

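Doing this well with a regex alone is impractical. Regular expressions are a fairly simple kind of recognizer (a finite state machine with no memory beyond its current state) and are really only good for tokenization/lexical analysis. A fixed numeric bound can be spelled out as a pattern, but only by enumerating digit ranges, and the result is hard to read and tied to one limit. What you are really trying to do is type-checking, which requires a lot more horsepower.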
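To illustrate how awkward a pattern-only check gets, here is a minimal sketch that spells out the range -32767..32767 digit by digit. It assumes a POSIX system (`regcomp`/`regexec` from `<regex.h>`). The names `RANGE_16BIT` and `in_range_by_regex` are made up for this example.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Every value from 0 to 32767, enumerated by digit position:
   32760-32767 | 32700-32759 | 32000-32699 | 30000-31999 | 10000-29999 | 0-9999 */
#define RANGE_16BIT \
    "^[-+]?(3276[0-7]|327[0-5][0-9]|32[0-6][0-9]{2}|3[01][0-9]{3}|[12][0-9]{4}|[0-9]{1,4})$"

/* Hypothetical helper: true if text is a decimal integer in -32767..32767. */
bool in_range_by_regex(const char *text)
{
    regex_t re;
    /* POSIX extended syntax; REG_NOSUB because only match/no-match is needed. */
    if (regcomp(&re, RANGE_16BIT, REG_EXTENDED | REG_NOSUB) != 0)
        return false;
    bool ok = regexec(&re, text, 0, NULL, 0) == 0;
    regfree(&re);
    return ok;
}
```

The pattern works for this one bound, but changing the limit means rewriting every alternative, and it rejects in-range values written with extra leading zeros (such as 00123).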

I'd save this for the parser itself. It is far easier there: with the symbol table filled in, you know what kind of variable you are assigning to, and you can check the actual value of the integer and compare it to the upper and lower bounds for that type.
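The value check itself can be sketched in plain C. This is only an illustration, not a full parser: `check_int_literal` and the `int_status` enum are hypothetical names, and the caller passes in whatever bounds apply to the target type.

```c
#include <errno.h>
#include <stdlib.h>

typedef enum { INT_OK, INT_OUT_OF_RANGE, INT_MALFORMED } int_status;

/* Hypothetical helper: convert a decimal literal and compare it to [lo, hi].
   On INT_OK the value is stored in *out (if out is not NULL). */
int_status check_int_literal(const char *text, long lo, long hi, long *out)
{
    char *end;
    errno = 0;
    long v = strtol(text, &end, 10);

    /* No digits consumed, or trailing characters after the number. */
    if (end == text || *end != '\0')
        return INT_MALFORMED;

    /* ERANGE means the literal did not even fit in a long. */
    if (errno == ERANGE || v < lo || v > hi)
        return INT_OUT_OF_RANGE;

    if (out)
        *out = v;
    return INT_OK;
}
```

A parser action could then call something like check_int_literal(yytext, -32767, 32767, &value), with the bounds looked up from the type of the variable being assigned.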

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow