Parsing a simple extension of latex: grammar, recursive descent, pyParsing?

https://stackoverflow.com/questions/15729778

30-03-2022

Question

I would like to write a small extension of LaTeX syntax.
There are pure LaTeX ways to avoid this parsing exercise, and I know them.
The goal of this question is to solve the following parsing problem:

If \ep is small                    --> If \epsilon is small  

\theorem                           --> \begin{theorem}  
(tab) lorem ipsum                  --> (tab) lorem ipsum  
(tab) lorem ipsum                  --> (tab) lorem ipsum  
(no tab) Some text                 --> \end{theorem}  
                                       Some text 

A function \oldFunction{x}{y}      --> A function \newFunction{x}{y}

Some other text with latex construct like \frac{1}{2} (not part of the grammar)

So I have a few keywords, such as ep and oldFunction, that I want to convert to new keywords.
They can be nested:

\oldFunction{\ep}{\ep}

I also have tab-delimited keywords, like theorem, which enclose the tab-indented content that follows them.
These tab-delimited keywords can be nested:

\theorem  
(tab) \lemma  
(tab) (tab) \oldFunction{\ep}{\ep}

Also, the \ep and \theorem keywords can be mixed, as in the previous example.

Then there are all the other LaTeX constructs, which I don't touch and just leave in place.

I looked into pyParsing and codeTalker.
codeTalker handles context-free grammars, but I don't know whether the grammar I describe is context-free.
pyParsing can do it; I read the documentation, but I don't understand how to apply it.
This is my first encounter with a parsing problem.

Solution

It seems like you could get away with not using a parsing library at all. I'm thinking about:

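import re

# Regex pattern -> replacement. The backslash is doubled because it is
# special in both the pattern and the replacement template.
newstuff = {r'\\ep\b': r'\\epsilon', r'\\other\b': r'\\notherthings'}
fixed = []
intheorem = False
for line in source:  # source: any iterable of lines, e.g. an open file
    line = line.rstrip('\n')
    for k, v in newstuff.items():
        line = re.sub(k, v, line)
    # A line that is not tab-indented closes the open theorem block.
    if not line.startswith('\t') and intheorem:
        fixed.append(r'\end{theorem}')
        intheorem = False
    # Raw strings: in a normal string '\t' is a tab and '\b' a backspace.
    if line.startswith(r'\theorem'):
        line = r'\begin{theorem}'
        intheorem = True
    fixed.append(line)
if intheorem:
    fixed.append(r'\end{theorem}')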

Does that make sense? In each line, do a regex replace for all your special names, and keep track of indents for the special "\theorem" block.
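
The loop above tracks a single \theorem block, while the question also asks for nested blocks such as a \lemma inside a \theorem. One possible way to extend the same line-by-line idea is to keep a stack of open blocks keyed by tab depth. The following is only a sketch under some assumptions: the names RENAMES, BLOCKS, CONTROL_WORD, rename and convert are made up for illustration, a block keyword is assumed to stand alone on its line, and blank lines are assumed not to close a block.

```python
import re

# Hypothetical tables (not from the original answer); extend as needed.
RENAMES = {"ep": "epsilon", "oldFunction": "newFunction"}
BLOCKS = {"theorem", "lemma"}

# A backslash followed by a run of letters, i.e. a LaTeX control word.
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)")


def rename(match):
    """Swap a control word if it is in RENAMES, otherwise keep it."""
    name = match.group(1)
    return "\\" + RENAMES.get(name, name)


def convert(lines):
    out = []
    stack = []  # (tab depth, block name) of each open block, innermost last
    for raw in lines:
        line = raw.rstrip("\n")
        body = line.lstrip("\t")
        depth = len(line) - len(body)  # number of leading tabs
        # Close every block that this non-blank line is no longer inside.
        while stack and body and depth <= stack[-1][0]:
            d, name = stack.pop()
            out.append("\t" * d + "\\end{" + name + "}")
        m = CONTROL_WORD.fullmatch(body.strip())
        if m and m.group(1) in BLOCKS:
            # A block keyword alone on its line opens a new environment.
            stack.append((depth, m.group(1)))
            out.append("\t" * depth + "\\begin{" + m.group(1) + "}")
        else:
            out.append(CONTROL_WORD.sub(rename, line))
    # Close whatever is still open at the end of the input.
    while stack:
        d, name = stack.pop()
        out.append("\t" * d + "\\end{" + name + "}")
    return out
```

Calling convert(open("input.tex")) would return the converted lines without trailing newlines (the file name is just an example). Matching whole control words means \epsilon is never mistaken for \ep, and anything not listed in RENAMES, such as \frac, passes through unchanged. One known limitation of this sketch: a LaTeX line break \\ followed directly by letters would be misread as a control word.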

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow