I am writing a parser for the output of Clasp with ANTLR 4. The typical output looks like the following:
clasp version 3.0.3
Reading from stdin
Solving...
Answer: 1
bird(a) bird(b) bird(c) penguin(d) bird(d)
Optimization: 7 0
Answer: 2
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(b) flies(b)
Optimization: 6 5
Answer: 3
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(c) flies(c)
Optimization: 2 5
Answer: 4
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(a) flies_abd(c) flies(a) flies(c)
Optimization: 1 10
Answer: 5
bird(a) bird(b) bird(c) penguin(d) bird(d) flies_abd(a) flies_abd(b) flies_abd(c) flies(a) flies(b) flies(c)
Optimization: 0 15
OPTIMUM FOUND
Models : 5
Optimum : yes
Optimization : 0 15
Calls : 1
Time : 0.002s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time : 0.000s
I have to check that clasp is version 3, so I am writing a grammar like the following:
/**
* Define a grammar for Clasp 3's output.
*/
grammar Output;
@header {package ac.bristol.clasp.parser;}
output:
version source solving answer* result separation statistics NEWLINE* EOF;
version: 'clasp version 3.' INT '.' INT NEWLINE;
source: 'Reading from stdin' NEWLINE # sourceSTDIN
| 'Reading from ' path NEWLINE # sourceFile;
path:
DRIVE? folder ( BSLASH folder )* filename # pathWindows
| FSLASH? folder ( FSLASH folder )* filename # pathNIX;
folder:
LETTER+ # genericFolder
| DOTDOT # parentFolder
| DOT # currentFolder;
solving: 'Solving...' NEWLINE;
filename:
LETTER+ extension?;
extension:
DOT LETTER*;
answer: 'Answer: ' INT NEWLINE //
model? NEWLINE //
'Optimization: ' INT ( SPACE INT )* NEWLINE;
model:
fact ( SPACE fact )*;
fact:
groundPredicate;
groundTermList:
groundTerm ( COMMA groundTerm )*;
groundTerm:
groundCompound | STRING | number | atom; // literal?
groundCompound:
groundPredicate
| groundExpression;
groundPredicate:
IDENTIFIER ( LROUND groundTermList RROUND )?;
groundExpression:
groundBits AND groundBits
| groundBits OR groundBits
| groundBits XOR groundBits;
groundBits:
groundCompare GT groundCompare
| groundCompare GE groundCompare
| groundCompare LT groundCompare
| groundCompare LE groundCompare;
groundCompare:
groundItem EQ groundItem
| groundItem NE groundItem;
groundItem:
groundFactor PLUS groundFactor
| groundFactor MINUS groundFactor;
groundFactor:
groundUnary TIMES groundUnary
| groundUnary DIVIDE groundUnary
| groundUnary MOD groundUnary;
groundUnary:
TILDE groundTerm
| MINUS groundTerm;
atom:
IDENTIFIER
| QUOTED;
number:
INT
| FLOAT;
//------------------------------------------------------------------------------
result: 'OPTIMUM FOUND' NEWLINE
| 'SATISFIABLE' NEWLINE
| 'UNKNOWN' NEWLINE;
separation:
NEWLINE;
statistics:
models optimum? optimization calls time cputime;
models: 'Models : ' INT SPACE* NEWLINE;
optimum: ' Optimum : yes' NEWLINE
| ' Optimum : no' NEWLINE;
optimization: 'Optimization : ' INT ( SPACE INT )* NEWLINE;
calls: 'Calls : ' INT NEWLINE;
time: 'Time : ' FLOAT 's (Solving: ' FLOAT 's 1st Model: ' FLOAT 's Unsat: ' FLOAT 's)' NEWLINE;
cputime: 'CPU Time : ' FLOAT 's';
//------------------------------------------------------------------------------
AND: '&';
BSLASH: '\\';
COLON: ':';
COMMA: ',';
DIVIDE: '/';
DOT: '.';
DOTDOT: '..';
EQ: '==';
FSLASH: '/';
GE: '>=';
GT: '>';
LE: '<=';
LROUND: '(';
LT: '<';
MINUS: '-';
MOD: '%';
NE: '!=';
OR: '?';
PLUS: '+';
RROUND: ')';
SEMICOLON: ';';
SPACE: ' ';
TILDE: '~';
TIMES: '*';
XOR: '^';
DRIVE: ( LOWER | UPPER ) COLON BSLASH?;
IDENTIFIER: LOWER FOLLOW*;
INT: DIGIT+;
FLOAT: DIGIT+ DOT DIGIT+;
NEWLINE: '\r'? '\n';
QUOTED: '\'' ( ~[\'\\] | ESCAPE )+? '\'';
STRING: '"' ( ~["\\] | ESCAPE )+? '"';
fragment DIGIT: [0] | NONZERO;
fragment ESCAPE: '\\' [btnr"\\] | '\\' [0-3]? [0-7]? [0-7] | '\\' 'u' [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F];
fragment FOLLOW: LOWER | UPPER | DIGIT | UNDERSCORE;
fragment LETTER: LOWER | UPPER | DIGIT | SPACE;
fragment LOWER: [a-z];
fragment NONZERO: [1-9];
fragment UNDERSCORE: [_];
fragment UPPER: [A-Z];
Notice that there is no rule to skip parts of the input stream, because I want to check every single character.
Also notice that I have one terminal rule for INT and one for FLOAT: INT is defined before FLOAT, and FLOATs are defined as in Prolog.
The rule that parses the first line of the above example is the following:
version: 'clasp version 3.' INT '.' INT NEWLINE;
because I have to check that the clasp major version number being used is 3, and then consume the rest of the line by reading the minor version number, a dot, the build number and the newline (without spaces or anything else in between).
Unfortunately, I get the following warning message, which makes me think that ANTLR is recognizing the minor version number, the dot and the build number as a single FLOAT:
line 1:16 mismatched input '0.3' expecting INT
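To illustrate my suspicion, here is a minimal plain-Java sketch. It is not ANTLR itself. It assumes that the lexer runs independently of the parser and, at each position, keeps the longest match among all token rules, with ties going to the rule declared first. The class name LongestMatchDemo and the regular expressions are hypothetical, simplified stand-ins for a few of the tokens in my grammar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestMatchDemo {
    // Simplified stand-ins for some tokens of the grammar, in declaration order.
    // These regexes are assumptions made for illustration, not ANTLR output.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("'clasp version 3.'", Pattern.compile("clasp version 3\\."));
        RULES.put("DOT", Pattern.compile("\\."));
        RULES.put("INT", Pattern.compile("[0-9]+"));
        RULES.put("FLOAT", Pattern.compile("[0-9]+\\.[0-9]+"));
        RULES.put("NEWLINE", Pattern.compile("\\r?\\n"));
    }

    // Splits the input into "RULE:text" entries using longest-match-wins.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the current position;
                // the strict '>' lets the earlier rule win on equal length.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at index " + pos);
            }
            tokens.add(bestName + ":" + input.substring(pos, bestEnd));
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("clasp version 3.0.3\n")) {
            System.out.println(token.replace("\n", "\\n"));
        }
    }
}
```

Under that assumption, the first line is split into the 'clasp version 3.' literal, a single FLOAT with text 0.3, and a NEWLINE. The parser would then see a FLOAT at column 16 where the version rule expects an INT, which matches the message above, even though INT is declared before FLOAT.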
Could you please explain to me what is going on?
Am I assuming something that I shouldn't?
Or is ANTLR applying an unneeded optimization?