Question

When I try sexplib, I get the following:

Sexp.of_string " a";; works.

Sexp.of_string "a ";; fails.


Is trailing whitespace forbidden in sexp?

Why?


Solution

According to an informal grammar specification, whitespace should be ignored on both sides of an atom:

{2 Syntax Specification of S-expressions}

{9 Lexical conventions of S-expression}

Whitespace, which consists of space, newline, carriage return, horizontal tab and form feed, is ignored unless within an OCaml-string, where it is treated according to OCaml-conventions. The semicolon introduces comments. Comments are ignored, and range up to the next newline character. The left parenthesis opens a new list, the right parenthesis closes it again. Lists can be empty. The double quote denotes the beginning and end of a string following the lexical conventions of OCaml (see OCaml-manual for details). All characters other than double quotes, left- and right parentheses, and whitespace are considered part of a contiguous string.
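The conventions quoted above can be sketched as a toy tokenizer. This is only an illustration written against the quoted text, not sexplib's actual lexer: the names `token`, `is_whitespace` and `tokenize` are made up here, quoted strings are deliberately not handled, and ending an atom at a semicolon is an assumption.

```ocaml
(* Toy tokenizer for the quoted lexical conventions; NOT sexplib's lexer. *)
type token = LParen | RParen | Atom of string

(* Space, newline, carriage return, horizontal tab and form feed. *)
let is_whitespace = function
  | ' ' | '\n' | '\r' | '\t' | '\012' -> true
  | _ -> false

let tokenize (s : string) : token list =
  let n = String.length s in
  (* A comment ranges up to the next newline character. *)
  let rec skip_comment i =
    if i < n && s.[i] <> '\n' then skip_comment (i + 1) else i
  in
  (* Assumption: an atom also ends at ';' (start of a comment). *)
  let rec atom_end i =
    if i < n
       && not (is_whitespace s.[i])
       && s.[i] <> '(' && s.[i] <> ')' && s.[i] <> '"' && s.[i] <> ';'
    then atom_end (i + 1)
    else i
  in
  let rec go i acc =
    if i >= n then List.rev acc
    else
      match s.[i] with
      | c when is_whitespace c -> go (i + 1) acc   (* whitespace is ignored *)
      | ';' -> go (skip_comment i) acc
      | '(' -> go (i + 1) (LParen :: acc)
      | ')' -> go (i + 1) (RParen :: acc)
      | '"' -> invalid_arg "tokenize: quoted strings are not handled in this sketch"
      | _ ->
        let j = atom_end i in
        go j (Atom (String.sub s i (j - i)) :: acc)
  in
  go 0 []
```

Under these rules `tokenize " a"` and `tokenize "a "` yield the same single atom, which is the behaviour the specification leads one to expect from `Sexp.of_string` as well.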

Indeed, you can read an atom with trailing whitespace from a file without any errors.

The error is raised by the function Pre_sexp.of_string_bigstring when a parser returns successfully but something is left in the buffer. So the main question is why something was left in the buffer. It seems that there are several parsers, and that files and strings are parsed with different ones.

I examined the parse_atom rule defined at pre_sexp.ml:699 (all locations refer to this commit) and found that when the trailing whitespace is hit, bump_found_atom is called. Then, if something is on the stack, the position indicator is incremented and parsing continues. Otherwise, parsing finishes, but the position is not incremented. This can be fixed with a simple patch:

diff --git a/lib/pre_sexp.ml b/lib/pre_sexp.ml
index 86603f3..9690c0f 100644
--- a/lib/pre_sexp.ml
+++ b/lib/pre_sexp.ml
@@ -502,7 +502,7 @@ let mk_cont_parser cont_parse = (); fun _state str ~max_pos ~pos ->
     let pbuf_str = Buffer.contents pbuf in \
         let atom = MK_ATOM in \
     match GET_PSTACK with \
-    | [] -> Done (atom, mk_parse_pos state pos) \
+    | [] -> Done (atom, mk_parse_pos state (pos + 1)) \
     | rev_sexp_lst :: sexp_stack -> \
         Buffer.clear pbuf; \
         let pstack = (atom :: rev_sexp_lst) :: sexp_stack in \
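
The off-by-one can be reproduced in a small standalone model. This is a sketch of the mechanism as described in this answer, not code from sexplib: `parse_atom`, `of_string_strict` and the `consume_terminator` flag are hypothetical names, with the flag standing in for the `pos` versus `pos + 1` difference in the patch.

```ocaml
(* Toy model of the reported behaviour; NOT sexplib's code. *)
let is_whitespace = function
  | ' ' | '\n' | '\r' | '\t' | '\012' -> true
  | _ -> false

(* Parse one atom and report the position where parsing stopped.
   [consume_terminator] models the patch: when true, the whitespace
   character that ended the atom is counted as consumed. *)
let parse_atom ~consume_terminator s =
  let n = String.length s in
  let rec skip i = if i < n && is_whitespace s.[i] then skip (i + 1) else i in
  let start = skip 0 in
  let rec scan i =
    if i < n && not (is_whitespace s.[i]) then scan (i + 1) else i
  in
  let stop = scan start in
  let atom = String.sub s start (stop - start) in
  let pos = if consume_terminator && stop < n then stop + 1 else stop in
  (atom, pos)

(* Strict wrapper: fail if the parser returned before the end of input. *)
let of_string_strict ~consume_terminator s =
  let atom, pos = parse_atom ~consume_terminator s in
  if pos <> String.length s then
    Error (Printf.sprintf "atom followed by data at position %d" pos)
  else Ok atom
```

With `~consume_terminator:false` the input `"a "` is rejected because the reported position is one short of the string length, while `" a"` is accepted. With `~consume_terminator:true` both `"a "` and `" a "` are accepted.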

After this patch, the following code produces the expected 'a', 'a', 'a' output:

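(* Assumes the sexplib library is linked, so that Sexplib.Sexp is available. *)
open Sexplib

let () =
  let s1 = Sexp.of_string " a" in   (* leading whitespace *)
  let s2 = Sexp.of_string "a " in   (* trailing whitespace *)
  let s3 = Sexp.of_string " a " in  (* both *)
  Printf.printf "'%s', '%s', '%s'\n"
    (Sexp.to_string s1)
    (Sexp.to_string s2)
    (Sexp.to_string s3)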
Licensed under: CC-BY-SA with attribution