Question

I have a parser and lexer written in ocamlyacc and ocamllex. If the file to parse ends prematurely, as in I forget a semicolon at the end of a line, the application doesn't raise a syntax error. I realize it's because I'm raising and catching EOF and that is making the lexer ignore the unfinished rule, but how should I be doing this to raise a syntax error?

Here is my current parser (simplified),

%{
    let parse_error s = Printf.ksprinf failwith "ERROR: %s" s
%}

%token COLON
%token SEPARATOR
%token SEMICOLON
%token <string> FLOAT
%token <string> INT
%token <string> LABEL

%type <Conf.config> command
%start command
%%
  command:
      | label SEPARATOR data SEMICOLON    { Conf.Pair ($1,$3)     }
      | label SEPARATOR data_list         { Conf.List ($1,$3)     }
      | label SEMICOLON                   { Conf.Single ($1)      }
  label :
      | LABEL                             { Conf.Label $1         }
  data :
      | label                             { $1                    }
      | INT                               { Conf.Integer $1       }
      | FLOAT                             { Conf.Float $1         }
  data_list :
      | star_data COMMA star_data data_list_ending
                                          { $1 :: $3 :: $4        }
  data_list_ending:
      | COMMA star_data data_list_ending  { $2 :: $3              }
      | SEMICOLON                         { []                    }

and lexxer (simplified),

{
    open ConfParser
    exception Eof
}

rule token = parse
    | ['\t' ' ' '\n' '\010' '\013' '\012']
                        { token lexbuf   }
    | ['0'-'9']+ ['.'] ['0'-'9']* ('e' ['-' '+']? ['0'-'9']+)? as n
                        { FLOAT n        }
    | ['0'-'9']+ as n   { INT n          }
    | '#'               { comment lexbuf }
    | ';'               { SEMICOLON      }
    | ['=' ':']         { SEPARATOR      }
    | ','               { COMMA          }
    | ['_' 'a'-'z' 'A'-'Z']([' ']?['a'-'z' 'A'-'Z' '0'-'9' '_' '-' '.'])* as w
                        { LABEL w        }
    | eof               { raise Eof      }

and comment = parse
    | ['#' '\n']        { token lexbuf   }
    | _                 { comment lexbuf }

example input file,

one = two, three, one-hundred;
single label;
list : command, missing, a, semicolon

One solution, is to add a recursive call in the command rule to itself at the end, and adding an empty rule, all of which build a list to return to the main program. I think I maybe interpreting Eof as a expectation, and ending condition, rather then an error in the lexer, is this correct?

Was it helpful?

Solution

ocamlyacc does not necessarily consume the whole input. If you want to force it to fail if the whole input is not parse-able, you need to match EOF in your grammar. Instead of raising Eof in you lexer, add a token EOF and change your start symbol to

%type <Conf.config list> main

main:
    EOF { [] }
  | command main { $1::$2 }
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top