I wanted to try this tool, antlr, so that I could eventually arrive to parse some code and refactor it. I tried some small grammars, everything was ok, so I took the next step and started parsing a sort of simple C#.
The good news: it takes like 10 minutes to understand the basics.
The extrememly bad news: it takes hours to understand how to parse two spaces instead of just one. Really. This things hates whitespaces, and has no shame in telling you that. Honestly I started to think it was unable to parse them, but then something went the right way... Or at least I thought so.
Now the problem of spaces comes after the fact that ANTLRWorks tries to allocate half a GB of ram and cannot really parse anything.
The grammar is not very hard, being I a beginner:
grammar newEmptyCombinedGrammar;
TokenEndCmd : ';' ;
TokenGlobImport : 'import' ;
TokenGlobNamespace : 'namespace' ;
TokenClass : 'class' ;
TokenSepFloat : ',' ;
TokenSepNamespace : '.' ;
fragment TokenEmptyString : '' ;
TokenUnderscore : '_' ;
TokenArgsSep : ',' ;
TokenArgsOpen : '(' ;
TokenArgsClose : ')' ;
TokenBlockOpen : '{' ;
TokenBlockClose : '}' ;
// --------------------
Digit : [0-9] ;
numberInt : Digit+ ;
numberFloat : numberInt TokenSepFloat numberInt ;
WordCI : [a-zA-Z]+ ;
WordUP : [A-Z]+ ;
WordLW : [a-z]+ ;
// -----------------
keyword : (WordCI | TokenUnderscore+) (numberInt | WordCI | TokenUnderscore)* ;
// ---------------------
spaces : (' ' | '\t')+ ;
spaceLNs : (' ' | '\t' | '\r' | '\n')+ ;
spacesOpt : spaces* ;
spaceLNsOpt : spaceLNs* ;
// ---------------------
// tipo "System" o "System.Net.Socket"
namepaceNameComposited : keyword (TokenSepNamespace keyword)* ;
// import System; import System.IO;
globImport : TokenGlobImport spaces namepaceNameComposited spacesOpt TokenEndCmd ;
// class class1 {}
namespaceClass : TokenClass spaces keyword spaceLNsOpt TokenBlockOpen spaceLNsOpt TokenBlockClose ;
// "namespace ns1 {}", "namespace ns1.sns2{}"
globNamespace : TokenGlobNamespace spaces namepaceNameComposited spaceLNsOpt TokenBlockOpen spaceLNsOpt namespaceClass spaceLNsOpt TokenBlockClose ;
globFile : (globImport | spaceLNsOpt)* (globNamespace | spaceLNsOpt)* ;
but still when globFile
or globNamespace
are added the ide starts to allocate memory like there's no tomorrow, and that's obviously a problem.
So
-is this way of capturing the whitespaces right? (I don't want to skip them, that's the point)
-is the memory leaking for a recursion I don't see?
The code that this thing is able to parse is something like:
import System;
namespace aNamespace{
class aClass{
}
}
globFile
is the main rule, by the way.