REGEX parsing commands from latex lines - Python

https://stackoverflow.com/questions/23482689

16-07-2023

Question

I'm trying to parse each line loaded from a .tex file and remove any \command (\textit, etc.) from it. The same should work for commands in LilyPond files, such as \clef, \key, and \time.

How could I do that?

What I've tried

import re
f = open('example.tex')
lines = f.readlines()
f.close()

pattern = '^\\*([a-z]|[0-9])' # this is the wrong regex!!
clean = []
for line in lines:
    remove = re.match(pattern, line)
    if remove:
        clean.append(remove.group())

print(clean)

Example

Input

#!/usr/bin/latex

\item More things
\subitem Anything

Expected output

More things
Anything

Solution

You could use a simple regex substitution with the pattern ^\\[^\s]*, which matches a backslash at the start of a line followed by any run of non-whitespace characters:

Sample code in Python:

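import re

# Match a backslash at the start of a line plus the non-whitespace run after it.
p = re.compile(r"^\\[^\s]*", re.MULTILINE)

# Raw string, so that \i and \s are not read as escape sequences.
text = r'''
\item More things
\subitem Anything
'''

subst = ""

print(re.sub(p, subst, text))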

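The result would be the following (each line keeps the space that followed the command; call lstrip() on it if that space is unwanted):

 More things
 Anything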
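For the line-by-line case described in the question, the same substitution can be applied to each line in turn. This is only a minimal sketch: the helper name strip_leading_command is hypothetical, the list of lines stands in for the result of f.readlines(), and the extra [ \t]* (which also drops the blanks after the command) is an assumption about the desired output, not part of the pattern above.

```python
import re

# Backslash at line start, the non-whitespace run after it, then any blanks.
# The trailing [ \t]* is an addition to the answer's pattern (assumption).
LEADING_COMMAND = re.compile(r"^\\\S*[ \t]*")


def strip_leading_command(line):
    """Return the line without a leading backslash command and the blanks after it."""
    return LEADING_COMMAND.sub("", line)


# Stand-in for f.readlines() on the example input from the question.
lines = ["#!/usr/bin/latex\n", "\n", "\\item More things\n", "\\subitem Anything\n"]
clean = [strip_leading_command(line) for line in lines]
print("".join(clean))
```

Lines that do not start with a backslash, such as the first line of the example input, pass through unchanged, so they would need separate handling if they should be dropped as well.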

Other tips

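This will also work, as long as it is written as a raw string so that Python passes the backslashes to the regex engine unchanged:

r'\\\w+\s'

It matches a backslash, then one or more word characters (letters, digits, or underscores), and then a single whitespace character. Because it is not anchored, it removes commands anywhere in a line, not only at the start.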
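To see how this pattern behaves next to one anchored at the start of the line, here is a small sketch. The sample LilyPond line and the variable names are made up for illustration.

```python
import re

# Command followed by one whitespace character, anywhere in the line.
anywhere = re.compile(r"\\\w+\s")
# Command at the very start of the line only.
at_start = re.compile(r"^\\\S*")

# Hypothetical LilyPond input with two commands on one line.
line = "\\clef treble \\time 4/4 c4 d e f"

print(anywhere.sub("", line))   # both \clef and \time are removed
print(at_start.sub("", line))   # only the leading \clef is removed
```

Which of the two is appropriate depends on whether commands can occur in the middle of a line, as they often do in LilyPond files.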

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow