How to use indentation as a block delimiter with Bison and Flex

https://stackoverflow.com/questions/1413204

06-07-2019

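Question

I am wondering how to implement indentation as a block delimiter in Bison + Flex, just like in Python. I am writing my own programming language (mostly for fun, but I intend to use it together with a game engine). I am trying to come up with something special that minimizes boilerplate and maximizes development speed.

I have already written a compiler (actually a `Langtoy` to NASM translator) in C. For some reason it could only handle one string in the whole source file (well, I had been awake for more than 48 hours).

I don't know whether curly brackets and/or begin -> end are easier to implement (I have no problem doing that) or whether it is just my brain locking up.

Thanks in advance!

Update: Okay, I have no clue how to do it with Flex. I have problems returning multiple DEDENT tokens to the parser. Flex/Bison are relatively new to me.

Update 2: This is the Flex file I have come up with so far. It does not quite get it right: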

%x t
%option noyywrap

%{
  int lineno = 0, ntab = 0, ltab = 0, dedent = 0;
%}

%%

<*>\n  { ntab = 0; BEGIN(t); }
<t>\t  { ++ntab; }
<t>.   { int i; /* my compiler complains not c99 if i use for( int i=0... */
         if( ntab > ltab )
           printf("> indent >\n");
         else if( ntab < ltab )
           for( i = 0; i < ltab - ntab; i++ )
             printf("< dedent <\n");
         else
           printf("=        =\n");

         ltab = ntab; ntab = 0;
         BEGIN(INITIAL);
         /* move to next rule */
         REJECT;}
.    /* ignore everything else for now */

%%

main()
{
  yyin = fopen( "test", "r" );
  yylex();
}

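You can try playing around with it; maybe you will see what I am missing. Returning multiple dedents would be easy in Haxe (return t_dedent(num);).

This code does not always match the indents/dedents correctly.

Update 3: I think I will give up hope on Flex and do it my own way. If anyone knows how to do it in Flex, I would be happy to hear it anyway.

Solution

What you need to do is have Flex count the amount of whitespace at the beginning of every line and insert the appropriate number of INDENT/UNINDENT tokens for the parser to use to group things. One question is what you want to do about tabs versus spaces: do you want tabs to be equivalent to spaces with fixed tab stops, or do you want to require indentation to be consistent (so that if one line begins with a tab and the next with a space, you signal an error, which is probably a little harder)?

Assuming you want fixed 8-column tab stops, you can use something like the following: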

%{
/* globals to track current indentation */
int current_line_indent = 0;   /* indentation of the current line */
int indent_level = 0;          /* indentation level passed to the parser */
%}

%x indent /* start state for parsing the indentation */
%s normal /* normal start state for everything else */

%%
<indent>" "      { current_line_indent++; }
<indent>"\t"     { current_line_indent = (current_line_indent + 8) & ~7; }
<indent>"\n"     { current_line_indent = 0; /*ignoring blank line */ }
<indent>.        {
                   unput(*yytext);
                   if (current_line_indent > indent_level) {
                       indent_level++;
                       return INDENT;
                   } else if (current_line_indent < indent_level) {
                       indent_level--;
                       return UNINDENT;
                   } else {
                       BEGIN normal;
                   }
                 }

<normal>"\n"     { current_line_indent = 0; BEGIN indent; }
... other flex rules ...

Note that you must start parsing in the indent state (to pick up the indentation of the first line).
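
The tab rule above rounds the column up to the next tab stop with `(current_line_indent + 8) & ~7`, which relies on the tab width being a power of two. As a minimal sketch outside Flex, the same measurement can be written as a plain C function that works for any positive tab width. The names `indent_width` and `tab_width` are hypothetical and introduced here only for illustration; they are not part of the answer.

```c
#include <assert.h>

/* Hypothetical helper: width, in columns, of the leading whitespace of
   `line`. A space advances one column; a tab advances to the next
   multiple of tab_width (fixed tab stops). */
static int indent_width(const char *line, int tab_width)
{
    int col = 0;

    assert(tab_width > 0);
    for (; *line == ' ' || *line == '\t'; ++line) {
        if (*line == ' ')
            col += 1;
        else
            col = (col / tab_width + 1) * tab_width;
    }
    return col;
}
```

With 8-column tab stops, a line that starts with a tab followed by one space measures 9 columns, and a line that starts with three spaces followed by a tab measures 8.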

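Other tips

Chris's answer goes a long way towards a usable solution. Unfortunately, it is missing a few important aspects that I needed:

Multiple outdents (unindents) at once. The following code should emit two outdents after the call to baz: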
```
def foo():
  if bar:
    baz()
```
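Outdents emitted when the end of the file is reached while some indentation levels are still open.
Indentation levels of different sizes. Chris's current code only works correctly for 1-space indents.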
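
The three requirements above are commonly met by keeping a stack of the column widths of the blocks that are currently open, which is also how Python's tokenizer is documented to work. The following is a minimal sketch in plain C, independent of Flex. The names `indent_stack`, `indent_init`, `indent_feed` and `MAX_DEPTH` are assumptions made for this illustration.

```c
#include <assert.h>

#define MAX_DEPTH 64  /* arbitrary nesting limit chosen for this sketch */

/* Hypothetical indentation tracker; levels[0] is always column 0. */
struct indent_stack {
    int levels[MAX_DEPTH];
    int depth;            /* index of the innermost open level */
};

static void indent_init(struct indent_stack *s)
{
    s->levels[0] = 0;
    s->depth = 0;
}

/* Feed the indentation width of the next non-blank line (or 0 at end of
   file). Returns +1 for a single INDENT, -n for n OUTDENTs, and 0 when
   the level is unchanged. */
static int indent_feed(struct indent_stack *s, int width)
{
    int outdents = 0;

    assert(width >= 0);
    if (width > s->levels[s->depth]) {
        /* Deeper than the innermost block: open exactly one new level
           and remember how wide it is. */
        assert(s->depth + 1 < MAX_DEPTH);
        s->levels[++s->depth] = width;
        return 1;
    }
    /* Close every level that is deeper than this line. */
    while (width < s->levels[s->depth]) {
        --s->depth;
        ++outdents;
    }
    /* If width is still greater than the top here, the line sits at a
       column that was never opened; a real lexer would probably report
       an error, which this sketch does not do. */
    return -outdents;
}
```

Feeding the widths 0, 2, 4 for the three lines of the example and then 0 for the end of the file gives 0, +1, +1 and -2, that is, two outdents after the call to baz.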

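Based on Chris's code, I came up with a solution that works in all the cases I have come across so far. I have created a template project for parsing indentation-based text using Flex (and Bison) on GitHub (https://github.com/lucasb-eyer/flex-bison-intentation). It is a fully working (CMake-based) project that also tracks the line position and column range of the current token.

Just in case the link should break for whatever reason, here is the meat of the lexer: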

%{
#include <stack>

int g_current_line_indent = 0;
std::stack<size_t> g_indent_levels;
int g_is_fake_outdent_symbol = 0;

static const unsigned int TAB_WIDTH = 2;

#define YY_USER_INIT { \
    g_indent_levels.push(0); \
    BEGIN(initial); \
}
#include "parser.hh"

%}

%x initial
%x indent
%s normal

%%
    int indent_caller = normal;

 /* Everything runs in the <normal> mode and enters the <indent> mode
    when a newline symbol is encountered.
    There is no newline symbol before the first line, so we need to go
    into the <indent> mode by hand there.
 */
<initial>.  { set_yycolumn(yycolumn-1); indent_caller = normal; yyless(0); BEGIN(indent); }
<initial>\n { indent_caller = normal; yyless(0); BEGIN(indent); }    

<indent>" "     { g_current_line_indent++; }
<indent>\t      { g_current_line_indent = (g_current_line_indent + TAB_WIDTH) & ~(TAB_WIDTH-1); }
<indent>\n      { g_current_line_indent = 0; /* ignoring blank line */ }
<indent><<EOF>> {
                    // When encountering the end of file, we want to emit an
                    // outdent for all indents currently left.
                    if(g_indent_levels.top() != 0) {
                        g_indent_levels.pop();

                        // See the same code below (<indent>.) for a rationale.
                        if(g_current_line_indent != g_indent_levels.top()) {
                            unput('\n');
                            for(size_t i = 0 ; i < g_indent_levels.top() ; ++i) {
                                unput(' ');
                            }
                        } else {
                            BEGIN(indent_caller);
                        }

                        return TOK_OUTDENT;
                    } else {
                        yyterminate();
                    }
                }

<indent>.       {
                    if(!g_is_fake_outdent_symbol) {
                        unput(*yytext);
                    }
                    g_is_fake_outdent_symbol = 0;
                    // -2: -1 for putting it back and -1 for ending at the last space.
                    set_yycolumn(yycolumn-1);

                    // Indentation level has increased. It can only ever
                    // increase by one level at a time. Remember how many
                    // spaces this level has and emit an indentation token.
                    if(g_current_line_indent > g_indent_levels.top()) {
                        g_indent_levels.push(g_current_line_indent);
                        BEGIN(indent_caller);
                        return TOK_INDENT;
                    } else if(g_current_line_indent < g_indent_levels.top()) {
                        // Outdenting is the most difficult, as we might need to
                        // outdent multiple times at once, but flex doesn't allow
                        // emitting multiple tokens at once! So we fake this by
                        // 'unput'ting fake lines which will give us the next
                        // outdent.
                        g_indent_levels.pop();

                        if(g_current_line_indent != g_indent_levels.top()) {
                            // Unput the rest of the current line, including the newline.
                            // We want to keep it untouched.
                            for(size_t i = 0 ; i < g_current_line_indent ; ++i) {
                                unput(' ');
                            }
                            unput('\n');
                            // Now, insert a fake character indented just so
                            // that we get a correct outdent the next time.
                            unput('.');
                            // Though we need to remember that it's a fake one
                            // so we can ignore the symbol.
                            g_is_fake_outdent_symbol = 1;
                            for(size_t i = 0 ; i < g_indent_levels.top() ; ++i) {
                                unput(' ');
                            }
                            unput('\n');
                        } else {
                            BEGIN(indent_caller);
                        }

                        return TOK_OUTDENT;
                    } else {
                        // No change in indentation, not much to do here...
                        BEGIN(indent_caller);
                    }
                }

<normal>\n    { g_current_line_indent = 0; indent_caller = YY_START; BEGIN(indent); }
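
The lexer above produces several TOK_OUTDENT tokens by pushing fake lines back into the input with unput. A different approach that is often used, and not the one taken in this answer, is to count the outdents still owed to the parser and hand them out one per call before reading any further input. The sketch below shows the idea in plain C. The names `g_pending_outdents`, `owe_outdents`, `next_token`, `TOK_OTHER` and the stand-in `raw_lex_stub` are assumptions made for this illustration. In a Flex file the check would probably sit in the indented code at the top of the rules section, which runs every time yylex() is entered.

```c
#include <assert.h>

enum token { TOK_EOF = 0, TOK_INDENT, TOK_OUTDENT, TOK_OTHER };

/* Hypothetical state: number of OUTDENT tokens still owed to the parser. */
static int g_pending_outdents = 0;

/* The indentation logic calls this when it finds that n levels closed. */
static void owe_outdents(int n)
{
    assert(n >= 0);
    g_pending_outdents += n;
}

/* Stand-in for the real scanner; it always reports an ordinary token. */
static enum token raw_lex_stub(void)
{
    return TOK_OTHER;
}

/* Wrapper the parser would call instead of the raw scanner: owed
   OUTDENTs are drained one per call before any new input is scanned. */
static enum token next_token(enum token (*raw_lex)(void))
{
    if (g_pending_outdents > 0) {
        --g_pending_outdents;
        return TOK_OUTDENT;
    }
    return raw_lex();
}
```

After `owe_outdents(2)`, the next two calls to `next_token(raw_lex_stub)` return TOK_OUTDENT and the third falls through to the underlying scanner. The design trades the unput bookkeeping for one extra piece of state that has to be reset when a new file is scanned.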

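Curly brackets (and the like) are only simpler if you use a tokenizer that strips out all whitespace (using it only to separate tokens). See this page (the section "How does the compiler parse the indentation?") for some ideas on Python tokenizing.

If you are not tokenizing before parsing, there may be additional work to do; it depends on how you are building the parser.

You need a rule that looks similar to this (assuming you use tabs for indentation):

\t    { return TABDENT; }

Frankly, I have always found braces (or begin/end) easier to write and to read, both as a human and as a lexer/parser writer.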

License: CC-BY-SA with attribution
