PEG grammar not working as expected

https://stackoverflow.com/questions/22750463

24-06-2023
|

Question

I'm working on a PEG grammar that takes code in a Music Programming Language and creates a parse tree of musical events (notes, chords, volume/tempo changes, etc.). A feature of my MPL is that it supports voices, i.e. different sequences of events happening at the same time. I'm having a hard time getting my Instaparse grammar to parse this correctly... what I want is a voices tag that consists of one or more voices, each of which consists of a voice definition (e.g. V1:) and then any number of events. The voices tag should end either with V0: (which means the end of the split voices, and we're back to just one voice, or "voice zero"), or the end of the file.

Here is an excerpt of my grammar in progress (I'm leaving out the definitions of note, chord, etc. for the sake of clarity):

part                    = <ows> event+
<event>                 = chord | note | rest | octave-change |
                          attribute-change | voices |
                          marker | at-marker

voices                  = voice+ 
voice                   = !voices voice-number voice-events? 
                          (<voice-zero> | #"\z")
voice-number            = <"V"> #"[1-9]\d*" <":"> <ows>
<voice-zero>            = <"V0:"> <ows>
voice-events            = !voices event+ 

...

ows                     = #"\s*"

Given the following code:

V1: o2 b1/>b o2 g+/>g+ o2 g/>g 
V0: e8 f+ g+ a b2

Running the parser gives the following output:

[:part 
  [:voices 
    [:voice [:voice-number "1"] 
            [:voice-events 
              [:octave-change "2"] [:chord [:note [:pitch "b"] 
              [:duration "1"]] [:octave-change ">"] [:note [:pitch "b"]]] 
              [:octave-change "2"] [:chord [:note [:pitch "g+"]] 
              [:octave-change ">"] [:note [:pitch "g+"]]] 
              [:octave-change "2"] [:chord [:note [:pitch "g"]]
              [:octave-change ">"] [:note [:pitch "g"]]]]]] 
  [:note [:pitch "e"] [:duration "8"]] 
  [:note [:pitch "f+"]] 
  [:note [:pitch "g+"]] 
  [:note [:pitch "a"]] 
  [:note [:pitch "b"] [:duration "2"]]]

Which is exactly what I want. The V0: signals the end of the voices tag, and the last 5 notes are on their own inside the part tag.

However, when I change the V0 to V2, I get this:

[:part 
  [:voices 
    [:voice [:voice-number "1"] 
            [:voice-events 
              [:octave-change "2"] [:chord [:note [:pitch "b"] [:duration "1"]] 
              [:octave-change ">"] [:note [:pitch "b"]]] [:octave-change "2"] 
              [:chord [:note [:pitch "g+"]] [:octave-change ">"] 
              [:note [:pitch "g+"]]] [:octave-change "2"] 
              [:chord [:note [:pitch "g"]] [:octave-change ">"] 
              [:note [:pitch "g"]]] 
              [:voices 
                [:voice [:voice-number "2"] 
                [:voice-events 
                  [:note [:pitch "e"] [:duration "8"]] [:note [:pitch "f+"]] 
                  [:note [:pitch "g+"]] [:note [:pitch "a"]] 
                  [:note [:pitch "b"] [:duration "2"]]]]]]]]]

For some reason, either the voice 1 tag or its voice-events tag is not terminating like it's supposed to, and the second voice is swallowed up as part of the first voice's voice-events. I also don't want there to be a second voices tag; voice 2 should be within the primary voices tag.

What I want is this:

[:part 
  [:voices 
    [:voice [:voice-number "1"] 
            [:voice-events 
              [:octave-change "2"] [:chord [:note [:pitch "b"] [:duration "1"]] 
              [:octave-change ">"] [:note [:pitch "b"]]] [:octave-change "2"] 
              [:chord [:note [:pitch "g+"]] [:octave-change ">"] 
              [:note [:pitch "g+"]]] [:octave-change "2"] 
              [:chord [:note [:pitch "g"]] [:octave-change ">"] 
              [:note [:pitch "g"]]]]]
    [:voice [:voice-number "2"] 
            [:voice-events 
              [:note [:pitch "e"] [:duration "8"]] [:note [:pitch "f+"]] 
              [:note [:pitch "g+"]] [:note [:pitch "a"]] 
              [:note [:pitch "b"] [:duration "2"]]]]]]

I can't figure out what I'm doing wrong, but I think it has something to do with how I'm defining the voice tag and/or the voice-events tag. It may have something to do with how I'm using negative lookahead, which I don't think I fully understand yet. Can anyone figure out how I can fix my grammar?

Thanks! :)

Solved!

Thanks, @DanielNeal! I've re-worked my grammar to this, which works exactly the way I want it to:

part                    = <ows> (voices | event)+
<event>                 = chord | note | rest | octave-change |
                          attribute-change | marker | at-marker

voices                  = voice+ (<voice-zero> | <#"\z">)
voice                   = voice-number event*
voice-number            = <"V"> #"[1-9]\d*" <":"> <ows>
<voice-zero>            = <"V0:"> <ows>

...

ows                     = #"\s*"

The big change was in how I defined part and event; before, I had these terms defined such that voices is an event, so any subsequent voices were being consumed and lumped into the previous voice's events. By pulling voices out of the definition of an event and redefining part to be a variable number of voices groupings or events, I eliminated the ambiguity and got the grammar to behave the way I want it to.

After that, the events within a voice were grouping properly, but I was still having a problem with each voice being within its own separate voices tag, when I need them to all be within the same voices grouping. I fixed this by specifying that a voices tag ends either with a "V0:" or the end of the file (\z), in other words being more specific about how much code I want the voices tag to consume.

The moral of the story is, if you're writing a PEG grammar and you're having problems, you probably need to make your definitions less ambiguous! I also ended up not using negative lookahead at all, which I think helped a lot to simplify / de-ambiguify my grammar.

Solution

I think you're right - it's the negative lookahead that's causing the problem. Without your full grammar, I can't test properly, but this line:

voice-events = !voices event+

Represents something that does not match voices followed by one or more events.

I'm assuming that voice-events should not contain voices within it in a recursive way, but at the moment it does - indirectly. Eachevent can contain voices within it, and in turn, voice-events can contain events.

In the sample above, the first event in V1 is an octave shift (which matches the not-a-voice condition). This allows the subsequent voice that occurs to get consumed within the event definition. If that makes sense.

To fix this you could (perhaps) define it the other way round:

voice-event = chord | note | rest | octave-change | attribute-change | marker | at-marker
event       = voice-event | voices

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow