Question

I am writing a front-end to parse a set of txt files. Each file contains a set of procedures; for instance, one txt file looks like:

Sub procedure1
...
End Sub

Sub procedure2
...
End Sub

...

syntax.ml contains:

type ev = procedure_declaration list
type procedure_declaration = 
  { procedure_name : string; procedure_body : procedure_body }
type procedure_body = ...
...

parser.mly looks like:

%start main
%type <Syntax.ev> main
%%
main: procedure_declarations EOF { List.rev $1 }

procedure_declarations:
  /* empty */ { [] }
| procedure_declarations procedure_declaration { $2 :: $1 }

procedure_declaration:
SUB name = procedure_name EOS
body = procedure_body
END SUB EOS
{ { procedure_name = name; procedure_body = body } }
...

Now, I would like to retrieve the parsing of procedure_declaration (for the purpose of exception handling). That is, I want to create parser_pd.mly and lexer_pd.mll, and have parser.mly call parser_pd.main. parser_pd.mly therefore looks like:

%start main
%type <Syntax.procedure_declaration> main
%%
main: procedure_declaration EOF { $1 };
...

Since most of the content of the previous parser.mly will move into parser_pd.mly, parser.mly should now be much lighter than before and look like:

%start main
%type <Syntax.ev> main
%%
main: procedure_declarations EOF { List.rev $1 }

procedure_declarations:
  /* empty */ { [] }
| procedure_declarations procedure_declaration { $2 :: $1 }

procedure_declaration:
SUB name = procedure_name EOS
??????
END SUB EOS
{ { procedure_name = name; 
    procedure_body = Parser_pd.main (Lexer_pd.token ??????) } }

The problem is that I don't know how to write the ?????? part, or the new lexer.mll, which should also be light (it only needs to read the tokens END, SUB and EOS, and leave the rest of the contents to lexer_pd.mll). Maybe some functions from the Lexing module are needed?

Hope my question is clear... Could anyone help?


Solution

You write that you want to retrieve the parsing of procedure_declaration, but in your code, you only want to retrieve a procedure_body, so I'm assuming that's what you want.

To put it in my own words: you want to compose grammars without telling the embedding grammar which grammar is embedded. The catch with this (not a problem in your case, because you luckily have a very friendly grammar) is that an LALR(1) parser needs one token of lookahead to decide which rule to take. Your grammar looks like this:

procedure_declaration:
  SUB procedure_name EOS
  procedure_body
  END SUB EOS

You can combine procedure_name and procedure_body, so your rule and semantic action will look like:

procedure_declaration:
  SUB combined = procedure_name EOS /* nothing here */ EOS
  { { procedure_name = fst combined; procedure_body = snd combined; } }

procedure_name:
  id = IDENT {
    let lexbuf = _menhir_env._menhir_lexbuf in
    (id, Parser_pd.main Lexer_pd.token lexbuf)
  }
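For reference, the outer parser would be driven as usual from a single lexbuf. The following is only a sketch: the module names Parser and Lexer are taken from the question's file names, and parse_file is a hypothetical helper. The point is that the lexbuf created here is the same one the embedded Parser_pd.main call picks up via _menhir_env._menhir_lexbuf, so both parsers stay in sync on the input position.

```ocaml
(* Sketch: driving the outer parser. Parser and Lexer are the modules
   generated from parser.mly / lexer.mll; parse_file is hypothetical. *)
let parse_file (filename : string) : Syntax.ev =
  let ic = open_in filename in
  let lexbuf = Lexing.from_channel ic in
  (* Record the file name so error positions are meaningful. *)
  lexbuf.Lexing.lex_curr_p <-
    { lexbuf.Lexing.lex_curr_p with Lexing.pos_fname = filename };
  Fun.protect ~finally:(fun () -> close_in ic)
    (fun () -> Parser.main Lexer.token lexbuf)
```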

Parser_pd will contain this rule:

main: procedure_body END SUB { $1 }

You will very likely want END SUB in Parser_pd, because procedure_body is likely not self-delimiting.

Note that you call the sub-parser before parsing the first EOS after the procedure name identifier, because that EOS is your lookahead. If you called it after the EOS had been parsed, it would be too late: the parser would already have pulled a token from the body. The second EOS is the one after END SUB.
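Since the question's stated motivation is exception handling, the call into the sub-parser can be wrapped so that a failure inside one procedure body is caught and reported with a source position. A sketch, assuming menhir (which generates an `exception Error` in each parser module); parse_body is a hypothetical helper:

```ocaml
(* Sketch: catching a parse error from the embedded parser. With menhir,
   Parser_pd exposes `exception Error`; with ocamlyacc it would be
   Parsing.Parse_error instead. *)
let parse_body lexbuf =
  try Parser_pd.main Lexer_pd.token lexbuf
  with Parser_pd.Error ->
    let pos = lexbuf.Lexing.lex_curr_p in
    Printf.eprintf "Syntax error in procedure body at line %d, column %d\n"
      pos.Lexing.pos_lnum
      (pos.Lexing.pos_cnum - pos.Lexing.pos_bol);
    raise Parser_pd.Error
```

The semantic action in procedure_name would then call parse_body on _menhir_env._menhir_lexbuf instead of invoking Parser_pd.main directly.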

The _menhir_env thing is obviously a hack that only works with menhir. You may need another hack to make menhir --infer work (if you use that), because --infer does not expect user code to refer to _menhir_env, so the symbol won't be in scope during type inference. That hack would be:

%{
  type menhir_env_hack = { _menhir_lexbuf : Lexing.lexbuf }
  let _menhir_env = { _menhir_lexbuf = Lexing.from_function
    (* Make sure this lexbuf is never actually used. *)
    (fun _ _ -> assert false) }
%}
License: CC-BY-SA with attribution