Question

PEG-based parser generators usually provide limited error reporting on invalid inputs. From what I have read, the PARSE dialect of Rebol is inspired by PEG grammars extended with regular expressions.

For example, typing the following in JavaScript:

d8> function () {}

gives the following error, because no identifier was provided in declaring a global function:

(d8):1: SyntaxError: Unexpected token (
function () {}
         ^

The parser is able to pinpoint exactly the position during parsing where an expected token is missing. The character position of the expected token is used to position the arrow in the error message.

Does the PARSE dialect in Rebol provide built-in facilities to report the line and column of errors on invalid inputs?

If not, are there examples of hand-rolled parse rules that provide such error reporting?


Solution

I've done very advanced Rebol parsers which manage live and mission-critical TCP servers, and doing proper error reporting was a requirement. So this is important!

Probably one of the most distinctive aspects of Rebol's PARSE is that you can include direct evaluation within the rules. So you can set variables to track the parse position, or the error messages, etc. (It's very easy because treating code and data as the same thing is a core idea in Rebol.)

So here's the way I did it. Before each match rule is attempted, I save the parse position into "here" (by writing here:) and then also save an error into a variable using code execution (by putting (error: {some error string}) in parentheses so that the parse dialect runs it). If the match rule succeeds, we don't need to use the error or position...and we just go on to the next rule. But if it fails we will have the last state we set to report after the failure.

Thus the pattern in the parse dialect is simply:

; use PARSE dialect handling of "set-word!" instances to save parse
; position into variable named "here"

here:

; escape out of the parse dialect using parentheses, and into the DO 
; dialect to run arbitrary code.  Here we run code that saves an error
; message string into a variable named "error"

(error: "<some error message relating to rule that follows>")

; back into the PARSE dialect again, express whatever your rule is,
; and if it fails then we will have the above to use in error reporting

what: (ever your) [rule | {is}]

That's basically what you need to do. Here is an example for phone numbers:

digit: charset "0123456789"

phone-number-rule: [
    here:
    (error: "invalid area code")
    ["514" | "800" | "888" | "916" | "877"]

    here:
    (error: "expecting dash")
    "-"

    here:
    (error: "expecting 3 digits")
    3 digit

    here:
    (error: "expecting dash")
    "-"

    here:
    (error: "expecting 4 digits")
    4 digit

    (error: none)
]

Then you can see it in action. Notice that we set error to none if we reach the end of the parse rules. PARSE will return false if there is still more input to process, so if we notice there is no error set but PARSE returns false anyway... we failed because there was too much extra input:

input: "800-22r2-3333"

if not parse input phone-number-rule [
    if none? error [
        error: "too much data for phone number"
    ]
]

either error [
    ; number of characters consumed before the failing rule
    column: length? copy/part input here
    print rejoin ["error at position:" space column]
    print error
    print input
    ; copy the empty string so repeated runs do not accumulate spaces
    print rejoin [head insert/dup copy "" space column "^^"]
    print newline
][
    print {all good}
]

The above will print the following:

error at position: 4
expecting 3 digits
800-22r2-3333
    ^

Obviously, you could do much more potent stuff, since whatever you put in parens will be evaluated just like normal Rebol source code. It's really flexible. I even have parsers which update progress bars while loading huge datasets... :-)
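
The question asks for line and column, while the example above reports only a character offset within a single line. Here is a minimal sketch of one way to convert a saved position such as here into a line/column pair, assuming the input uses the newline character as its line separator. Note that line-and-column is a hypothetical helper name introduced for illustration, not a Rebol built-in:

```rebol
; Hypothetical helper (not a built-in): walks from the head of the input
; to a saved parse position and counts lines and columns, both 1-based.
line-and-column: func [
    start [string!] "head of the input string"
    pos [string!] "saved parse position, e.g. HERE"
    /local line column text
][
    line: 1
    column: 1
    text: start
    while [(index? text) < (index? pos)] [
        either text/1 = newline [
            ; a line break starts a new line and resets the column
            line: line + 1
            column: 1
        ][
            column: column + 1
        ]
        text: next text
    ]
    reduce [line column]
]
```

With the phone number example, line-and-column input here should give [1 5]: the column is 1-based here, whereas the column value computed earlier is a 0-based offset used to indent the caret.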

OTHER TIPS

Here is a simple example of finding the failure position while parsing a string, which could be used to do what you ask.

Let us say that our code is only valid if it contains a and b characters, anything else would be illegal input.

code-rule: [
    some [
        "a" |
        "b"
    ] 
    [ end | mark: (print [ "Failed at position" index? mark ]) ]
]

Let's check that with some valid code

>> parse "aaaabbabb" code-rule
== true

Now we can try again with some invalid input

>> parse "aaaabbXabb" code-rule
Failed at position 7
== false

This is a rather simplified example language, but it should be easy to extend to a more complex example.
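
As one possible extension, the rule can record the failure position in a variable instead of printing from inside the rule, so the caller decides how to report it. This is only a sketch: fail-pos and check-code are names made up for this example, and the position is also saved before the first match so that input failing on its very first character is still reported.

```rebol
; FAIL-POS and CHECK-CODE are hypothetical names used for illustration.
fail-pos: none

code-rule: [
    ; remember the start in case the very first character is invalid
    mark: (fail-pos: index? mark)
    some ["a" | "b"]
    ; either we are at the end, or we record where matching stopped
    [end | mark: (fail-pos: index? mark)]
]

check-code: func [text [string!]] [
    either parse text code-rule [
        print "all good"
    ][
        print ["Failed at position" fail-pos]
        print text
        ; indent the caret so it sits under the offending character
        print rejoin [head insert/dup copy "" space fail-pos - 1 "^^"]
    ]
]
```

Calling check-code "aaaabbXabb" should report position 7 and place the caret under the X.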

Licensed under: CC-BY-SA with attribution