Question

I am writing a Python script that takes plain text as input and produces LaTeX code as output. At some point the script has to quote all the characters that have a special meaning in TeX, such as %, &, \, and so on.

This is more difficult than I expected. Currently I have this:

import re

def ltx_quote(s):
    s = re.sub(r'[\\]', r'\\textbackslash{}', s)
    # s = re.sub(r'[{]', r'\\{{}', s)
    # s = re.sub(r'[}]', r'\\}{}', s)
    s = re.sub(r'[&]', r'\\&{}', s)
    s = re.sub(r'[$]', r'\\${}', s)
    s = re.sub(r'[%]', r'\\%{}', s)
    s = re.sub(r'[_]', r'\\_{}', s)
    s = re.sub(r'[\^]', r'\\^{}', s)
    s = re.sub(r'[~]', r'\\~{}', s)
    s = re.sub(r'[|]', r'\\textbar{}', s)
    s = re.sub(r'[#]', r'\\#{}', s)
    s = re.sub(r'[<]', r'\\textless{}', s)
    s = re.sub(r'[>]', r'\\textgreater{}', s)
    return s

The problem is the { and } characters: they may have been produced by an earlier substitution (\ -> \textbackslash{}), in which case they shouldn't be substituted again. I think the solution would be to make all the substitutions in one step, but I don't know how to do that.

Solution

Perhaps try using the undocumented re.Scanner:

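import re

# String actions are appended to the output verbatim (unlike re.sub
# replacement templates), so each one uses a single backslash.
scanner = re.Scanner([
    (r"[\\]", r'\textbackslash{}'),
    (r"[{]", r'\{{}'),
    (r"[}]", r'\}{}'),
    (r".", lambda s, t: t)  # any other character passes through unchanged
])

tokens, remainder = scanner.scan("\\foo\\{bar}")
print(''.join(tokens))

yields

\textbackslash{}foo\textbackslash{}\{{}bar\}{}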

Unlike the code you posted, re.Scanner.scan makes only one pass through the string (you can confirm this in the source code). Once a match is made, the next match begins where the last one ended, so text produced by an action is never scanned again.
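To see why rescanning matters, here is a minimal sketch of the multi-pass behaviour described in the question. The function name two_pass is hypothetical and only two of the substitutions are included; it is meant as an illustration, not as the asker's exact code.

```python
import re

def two_pass(s):
    # Pass 1: replace every backslash (the template r"\\" yields one backslash).
    s = re.sub(r"\\", r"\\textbackslash{}", s)
    # Pass 2: quote braces. This also hits the braces that pass 1 just added.
    s = re.sub(r"[{}]", lambda m: "\\" + m.group() + "{}", s)
    return s

print(two_pass("\\"))  # \textbackslash\{{}\}{}
```

The braces belonging to \textbackslash{} were quoted by the second pass, which is the corruption a single-pass approach avoids.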

The first argument to re.Scanner is a lexicon: a list of 2-tuples. Each 2-tuple pairs a regex pattern with an action. The action may be a string, a callable (function), or None (the matched text is consumed and nothing is emitted).
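As a small sketch of the three kinds of action, the lexicon below is made up for illustration and has nothing to do with LaTeX quoting beyond the % entry. Since re.Scanner is undocumented, treat the behaviour shown as what the current CPython implementation does rather than as a guaranteed API.

```python
import re

# Hypothetical lexicon with one entry per kind of action.
scanner = re.Scanner([
    (r"%", r"\%{}"),                   # string: appended to the tokens as-is
    (r"[0-9]+", lambda s, t: int(t)),  # callable: called with (scanner, matched text)
    (r"\s+", None),                    # None: whitespace is consumed and dropped
])

tokens, remainder = scanner.scan("12 % 7")
print(tokens)     # [12, '\\%{}', 7]
print(remainder)  # empty string: everything was matched
```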

The patterns are all compiled into one compound pattern and tried as alternatives in the order listed, so the order of the lexicon matters: the first pattern to match wins.
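The effect of ordering can be sketched with two lexicons that differ only in the position of the catch-all entry. The names specific_first and catch_all_first are hypothetical.

```python
import re

# Same two entries, different order.
specific_first = re.Scanner([
    (r"\\", r"\textbackslash{}"),
    (r".", lambda s, t: t),
])
catch_all_first = re.Scanner([
    (r".", lambda s, t: t),        # matches the backslash too, so it always wins
    (r"\\", r"\textbackslash{}"),  # never reached
])

text = "a\\b"  # the three characters a, \, b
quoted, _ = specific_first.scan(text)
unquoted, _ = catch_all_first.scan(text)
print("".join(quoted))    # a\textbackslash{}b
print("".join(unquoted))  # a\b
```

This is why the catch-all pattern belongs at the end of the lexicon.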

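If a match is made, the action is called with the scanner and the matched text if it is callable, or used as-is if it is a string. The results are collected in the list tokens. Any text left unscanned because no pattern matched at that point is returned as remainder.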
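Putting this together, a complete ltx_quote might look like the sketch below. The replacement table is assembled from the substitutions in the question, written with a single backslash because Scanner emits string actions verbatim. The names LATEX_SPECIALS, _ordinary and _scanner are assumptions made for this example. A negated character class is used for the pass-through entry instead of "." because "." does not match a newline by default, which would stop the scan early and leave the rest of the text in remainder.

```python
import re

# Character -> LaTeX text, taken from the substitutions in the question.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{{}",
    "}": r"\}{}",
    "&": r"\&{}",
    "$": r"\${}",
    "%": r"\%{}",
    "_": r"\_{}",
    "^": r"\^{}",
    "~": r"\~{}",
    "|": r"\textbar{}",
    "#": r"\#{}",
    "<": r"\textless{}",
    ">": r"\textgreater{}",
}

# Runs of ordinary characters (anything not in the table, newlines included).
_ordinary = "[^" + "".join(re.escape(c) for c in LATEX_SPECIALS) + "]+"

# One entry per special character, then the pass-through entry last.
_scanner = re.Scanner(
    [(re.escape(ch), repl) for ch, repl in LATEX_SPECIALS.items()]
    + [(_ordinary, lambda scanner, text: text)]
)

def ltx_quote(s):
    """Quote TeX special characters in a single pass over s."""
    tokens, remainder = _scanner.scan(s)
    # Every character matches some entry, so remainder is expected to be empty.
    return "".join(tokens) + remainder

print(ltx_quote("\\foo{bar} costs 5% & more"))
# \textbackslash{}foo\{{}bar\}{} costs 5\%{} \&{} more
```

Because each replacement is emitted as a token and never fed back into the scanner, the braces in \textbackslash{} are left alone.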

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow