Question

I'm trying to write a parser with Treetop that turns some LaTeX commands into HTML markup. With the grammar below, the generated code goes into a dead spin. I've built the parser source with tt and stepped through it, but that doesn't really elucidate the underlying issue (it just spins in _nt_paragraph).

Test input: "\emph{hey} and some more text."

grammar Latex
  rule document
    (paragraph)* {
      def content
        [:document, elements.map { |e| e.content }]
      end
    }
  end

  # Example: These aren't the \emph{droids you're looking for} \n\n.
  rule paragraph
    ( text / tag )* eop {
      def content
        [:paragraph, elements.map { |e| e.content } ]
      end
    }
  end

  rule text
    ( !( tag_start / eop) . )* {
      def content
        [:text, text_value ]
      end
    }
  end

  # Example: \tag{inner_text}
  rule tag
    "\\emph{" inner_text '}' {
      def content
        [:tag, inner_text.content]
      end
    }
  end 

  # Example: \emph{inner_text}
  rule inner_text
    ( !'}' . )* {
      def content
        [:inner_text, text_value]
      end
    }
  end

  # End of paragraph.
  rule eop
    newline 2.. {
      def content
        [:newline, text_value]
      end
    }
  end

  rule newline
    "\n"
  end

  # You know, what starts a tag
  rule tag_start
    "\\"
  end

end

Solution

For anyone curious, Clifford over at the Treetop dev Google group figured this out.

The problem was with paragraph and text.

text matches zero or more characters, and a paragraph allows zero or more texts. So wherever text could not consume anything, it still succeeded with an empty match, and the enclosing repetition kept accepting zero-length matches at the same position without ever advancing, which made the parser dead spin. The fix was to adjust text to be:

( !( tag_start / eop) . )+

So that it must have at least one character to match.
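
To see why the zero-length match matters, here is a minimal sketch in plain Ruby. It does not use Treetop, and it is not the code Treetop generates: match_text, match_tag and repeat_text_or_tag are hypothetical hand-written stand-ins for the rules above, and the max_steps cap exists only so the demonstration terminates instead of spinning.

```ruby
# Hypothetical stand-ins for the grammar's rules (not Treetop's generated code).

# text: consume characters until a backslash (tag_start) or "\n\n" (eop).
# Returns the new position, or nil on failure.
# min = 0 behaves like ( ... )* and may succeed without consuming anything;
# min = 1 behaves like ( ... )+ and must consume at least one character.
def match_text(input, pos, min)
  start = pos
  pos += 1 while pos < input.size && input[pos] != "\\" && input[pos, 2] != "\n\n"
  (pos - start) >= min ? pos : nil
end

# tag: "\emph{" inner_text "}"
def match_tag(input, pos)
  return nil unless input[pos, 6] == "\\emph{"
  close = input.index("}", pos + 6)
  close ? close + 1 : nil
end

# ( text / tag )* with a step cap so a non-advancing loop is reported
# instead of running forever.
def repeat_text_or_tag(input, min, max_steps = 50)
  pos = 0
  steps = 0
  loop do
    # Ordered choice: try text first, then tag. Note that 0 is truthy in Ruby,
    # so an empty match at position 0 counts as success here.
    nxt = match_text(input, pos, min) || match_tag(input, pos)
    break if nxt.nil?
    pos = nxt
    steps += 1
    return [:stuck, pos, steps] if steps >= max_steps
  end
  [:done, pos, steps]
end

INPUT = "\\emph{hey} and some more text."
STAR_RESULT = repeat_text_or_tag(INPUT, 0) # text written with *
PLUS_RESULT = repeat_text_or_tag(INPUT, 1) # text written with +

puts "text as ( ... )*: #{STAR_RESULT.inspect}"
puts "text as ( ... )+: #{PLUS_RESULT.inspect}"
```

With the star version, text succeeds with an empty match in front of the backslash, so the repetition never moves past position 0 and hits the step cap. With the plus version, text fails there, tag gets its turn, and the loop reaches the end of the input in two steps.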

Licensed under: CC-BY-SA with attribution