ANTLRWorks, whitespaces, memory leaks, and crashing

https://stackoverflow.com/questions/16342384

14-04-2022
|

質問

I wanted to try this tool, antlr, so that I could eventually arrive to parse some code and refactor it. I tried some small grammars, everything was ok, so I took the next step and started parsing a sort of simple C#.
The good news: it takes like 10 minutes to understand the basics.
The extrememly bad news: it takes hours to understand how to parse two spaces instead of just one. Really. This things hates whitespaces, and has no shame in telling you that. Honestly I started to think it was unable to parse them, but then something went the right way... Or at least I thought so.

Now the problem of spaces comes after the fact that ANTLRWorks tries to allocate half a GB of ram and cannot really parse anything.

The grammar is not very hard, being I a beginner:

grammar newEmptyCombinedGrammar;

TokenEndCmd : ';' ;
TokenGlobImport : 'import' ;
TokenGlobNamespace : 'namespace' ;
TokenClass : 'class' ;

TokenSepFloat : ',' ;
TokenSepNamespace : '.' ;

fragment TokenEmptyString : '' ;
TokenUnderscore : '_' ;

TokenArgsSep : ',' ;
TokenArgsOpen : '(' ;
TokenArgsClose : ')' ;
TokenBlockOpen : '{' ;
TokenBlockClose : '}' ;

// --------------------

Digit : [0-9] ;
numberInt : Digit+ ;
numberFloat : numberInt TokenSepFloat numberInt ;

WordCI : [a-zA-Z]+ ;
WordUP : [A-Z]+ ;
WordLW : [a-z]+ ;

// -----------------

keyword : (WordCI | TokenUnderscore+) (numberInt | WordCI | TokenUnderscore)* ;

// ---------------------

spaces : (' ' | '\t')+ ;
spaceLNs : (' ' | '\t' | '\r' | '\n')+ ;

spacesOpt : spaces* ;
spaceLNsOpt : spaceLNs* ;

// ---------------------

// tipo "System" o "System.Net.Socket"
namepaceNameComposited : keyword (TokenSepNamespace keyword)* ;

// import System; import System.IO;
globImport : TokenGlobImport spaces namepaceNameComposited spacesOpt TokenEndCmd ;

// class class1 {}
namespaceClass : TokenClass spaces keyword spaceLNsOpt TokenBlockOpen spaceLNsOpt TokenBlockClose ;

// "namespace ns1 {}", "namespace ns1.sns2{}"
globNamespace : TokenGlobNamespace spaces namepaceNameComposited spaceLNsOpt TokenBlockOpen spaceLNsOpt namespaceClass spaceLNsOpt TokenBlockClose ;

globFile : (globImport | spaceLNsOpt)* (globNamespace | spaceLNsOpt)* ;

but still when globFile or globNamespace are added the ide starts to allocate memory like there's no tomorrow, and that's obviously a problem.

So

-is this way of capturing the whitespaces right? (I don't want to skip them, that's the point)
-is the memory leaking for a recursion I don't see?

The code that this thing is able to parse is something like:

import System;

namespace aNamespace{
    class aClass{
    }
}

globFile is the main rule, by the way.

解決 2

The problem is effectively the last rule

globFile : (globImport | spaceLNsOpt)* (globNamespace | spaceLNsOpt)* ;

I changed it like this:

globFile : (globImport spaceLNsOpt)* (globNamespace spaceLNsOpt)* ;

and it seems that adding a EOF apparently helps:

globFile : (globImport spaceLNsOpt)* (globNamespace spaceLNsOpt)* EOF ;

but this is not sufficient, the rule cannot function in any case.

他のヒント

You should define a lexer token to treat whitespaces the way you need it to. If you want a group of consecutive space or tab characters to form a single token, use a definition like the following. In this case, you would reference whitespace in the parser rules as Whitespace (required) or Whitespace? (optional).

// ANTLR 3:
Whitespace : (' ' | '\t')+;

// ANTLR 4:
Whitespace : [ \t]+;

If you want every individual whitespace character to be its own token, use something like the following. In this case, you would reference whitespace in the parser rules as Whitespace+ (required) or Whitespace* (optional).

// ANTLR 3:
Whitespace : ' ' | '\t';

// ANTLR 4:
Whitespace : [ \t];

The question about memory leaks probably belongs on the ANTLRWorks issue tracker.

ANTLRWorks 1 issue tracker: https://github.com/antlr/antlrworks/issues
ANTLRWorks 2 issue tracker: https://bitbucket.org/sharwell/antlrworks2/issues

ライセンス： CC-BY-SA と帰属

所属していません StackOverflow