How to prevent table regeneration in PLY

https://stackoverflow.com/questions/12644902

04-07-2021

Question

I am using PLY in a command-line application that I package as a Python egg to be installed via pip. Every time I run my script from the command line, I see the following message:

"Generating LALR tables"

Additionally, parser.out and parsetab.py files are written to the directory from which the script is invoked. Is there some way to ship these files with the application so that it does not regenerate the tables each and every time?

Solution 3

What I ultimately wound up doing was turning off optimization. I was going through the PLY 3.4 source and I found this little nugget in the lexer code:

# If in optimize mode, we write the lextab
if lextab and optimize:
    lexobj.writetab(lextab,outputdir)

return lexobj

By changing the code that builds the lexer and parser to:

self.lexer = lex.lex(module=self, optimize=False, debug=False, **kwargs)

and

self.parser = yacc.yacc(module=self, optimize=False, debug=False, **kwargs)

I avoided all file write-outs. The debug flag is what writes the .out files into the directory, and the Python files are the result of the optimize flag.

While this works for the time being, I cannot say I am entirely happy with this approach. Presumably, some way to keep optimization on while also keeping the working directory clean would be a superior solution and would result in better performance. If someone else has a better methodology, I am more than open to it.

OTHER TIPS

Use:

yacc.yacc(debug=0, write_tables=0)

You want to use optimized mode, by calling lex as:

lexer = lex.lex(optimize=1)

It's worth emphasising (from the same link):

On subsequent executions, lextab.py will simply be imported to build the lexer. This approach substantially improves the startup time of the lexer and it works in Python's optimized mode.

When running in optimized mode, it is important to note that lex disables most error checking. Thus, this is really only recommended if you're sure everything is working correctly and you're ready to start releasing production code.

Since this is production code, this sounds like exactly what you want.

In looking into this issue, I came across the miscellaneous Yacc notes:

Since the generation of the LALR tables is relatively expensive, previously generated tables are cached and reused if possible. The decision to regenerate the tables is determined by taking an MD5 checksum of all grammar rules and precedence rules. Only in the event of a mismatch are the tables regenerated.

Looking deeper into the yacc function inside yacc.py, we see that optimize causes this mismatch to be ignored in the following snippet:

if optimize or (read_signature == signature):
    try:
        lr.bind_callables(pinfo.pdict)
        parser = LRParser(lr,pinfo.error_func)
        parse = parser.parse
        return parser
    # (excerpt ends here; the except clause is not shown)

where signature is compared to the checksum stored in parsetab.py (as _lr_signature).
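To make the caching rule concrete, here is a rough standalone sketch of the same decision, written without PLY. The function names (grammar_signature, should_regenerate) and the way the rules are hashed are my own assumptions for illustration; PLY's real signature computation differs in detail. Only the shape of the test, "optimize or signatures match", is taken from the snippet above.

```python
import hashlib


def grammar_signature(rules, precedence):
    """Return an MD5 hex digest over the grammar and precedence rules.

    Hypothetical helper: PLY builds its own signature internally.
    """
    digest = hashlib.md5()
    for item in list(rules) + list(precedence):
        digest.update(repr(item).encode("utf-8"))
    return digest.hexdigest()


def should_regenerate(stored_signature, rules, precedence, optimize=False):
    """Decide whether cached tables must be rebuilt."""
    if stored_signature is None:
        return True   # no cached tables at all
    if optimize:
        return False  # optimized mode trusts the cache without comparing
    return stored_signature != grammar_signature(rules, precedence)
```

With this sketch, an unchanged grammar reuses the cache, an edited grammar triggers regeneration, and optimize=True skips the comparison entirely, which is why stale tables can go unnoticed in optimized mode.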

This is an old question, but I ran into a similar problem with PLY when I tried to use the outputdir keyword argument of yacc to place the generated parser tables in specific directories within my project: it would place them there, but regenerate them every time regardless. I found this patch on GitHub, which solved the regeneration issue with no noticeable ill effects. Basically, all it does is modify the read_table method in yacc.py to take an extra parameter, the outputdir, and search that directory before regenerating. To make that work, the sole call site of read_table (in the yacc function) also needs to be modified to pass the outputdir keyword argument.

Apparently, there are arguments for this in ply.yacc:

def yacc(method='LALR', debug=yaccdebug, module=None, tabmodule=tab_module, start=None, 
     check_recursion=1, optimize=0, write_tables=1, debugfile=debug_file,outputdir='',
     debuglog=None, errorlog = None, picklefile=None):

So you just pass a different errorlog and debuglog (objects whose debug() etc. methods do not print to stdout/stderr) and specify a fixed outputdir. That's all you need to do.
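As a sketch of what such a silent log object could look like, the standard logging module already provides debug(), info(), warning(), error() and critical() methods, so a logger with only a NullHandler attached should do. The helper name make_quiet_logger and the logger name "ply.quiet" are made up for this example; whether a given PLY version routes the "Generating LALR tables" message through errorlog or debuglog is something I have not verified.

```python
import logging


def make_quiet_logger(name="ply.quiet"):
    """Return a logger that accepts log calls but never prints anything."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        # NullHandler swallows every record it receives.
        logger.addHandler(logging.NullHandler())
    # Do not hand records to the root logger, which might print them.
    logger.propagate = False
    return logger
```

Assuming the signature shown above, it would be passed as yacc.yacc(errorlog=make_quiet_logger(), debuglog=make_quiet_logger(), ...).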

UPDATE: I just checked and this is the correct setting:

yacc.yacc(
    debug=False,                         # do not create parser.out
    outputdir=r"c:\temp\aaa" # instruct to place parsetab here
)

Actually, you need to use an outputdir that already contains parsetab.py. This eliminates the message, and your program will no longer write out parsetab.py; it will just use the existing one.
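For a pip-installed package, a hard-coded path like the one above will not travel well. One possible variation, offered as an untested sketch, is to derive outputdir from the location of the module that defines the grammar, so that a parsetab.py shipped inside the package is the one that gets found. The helper names tables_dir and yacc_kwargs are hypothetical, and as noted in an earlier answer, some PLY versions may still regenerate the tables even when outputdir is set.

```python
import os


def tables_dir(module_file):
    """Absolute directory containing module_file (where parsetab.py would ship)."""
    return os.path.dirname(os.path.abspath(module_file))


def yacc_kwargs(module_file):
    """Keyword arguments for yacc.yacc() that keep the working directory clean."""
    return {
        "debug": False,                        # do not create parser.out
        "outputdir": tables_dir(module_file),  # look for parsetab.py here
    }
```

Inside the grammar module this would be used as yacc.yacc(module=self, **yacc_kwargs(__file__)).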

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow