Replacing macro-style class method with a decorator?
11-12-2019
Question
I'm having a lot of trouble getting a good grasp on decorators despite having read many an article on the subject (including [this][1] very popular one on SO). I'm suspecting I must be stupid, but with all the stubbornness that comes with being stupid, I've decided to try to figure this out.
That, and I suspect I have a good use case...
Below is some code from a project of mine that extracts text from PDF files. Processing involves three steps:
- Set up PDFMiner objects needed for processing of PDF file (boilerplate initializations).
- Apply a processing function to the PDF file.
- No matter what happens, close the file.
I recently learned about context managers and the with statement, and this seemed like a good use case for them. As such, I started by defining the PDFMinerWrapper class:
class PDFMinerWrapper(object):
    '''
    Usage:
        with PDFMinerWrapper('/path/to/file.pdf') as doc:
            doc.dosomething()
    '''
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd
    def __enter__(self):
        self.pdf = open(self.pdf_doc, 'rb')
        parser = PDFParser(self.pdf)  # create a parser object associated with the file object
        doc = PDFDocument()  # create a PDFDocument object that stores the document structure
        parser.set_document(doc)  # connect the parser and document objects
        doc.set_parser(parser)
        doc.initialize(self.pdf_pwd)  # pass '' if no password required
        return doc
    def __exit__(self, type, value, traceback):
        self.pdf.close()
        # if we have an error, catch it, log it, and return the info
        # (returning a truthy value from __exit__ suppresses the exception)
        if isinstance(value, Exception):
            self.logError()
            print(traceback)
            return value
Now I can easily work with a PDF file and be sure that it will handle errors gracefully. In theory, all I need to do is something like this:
with PDFMinerWrapper('/path/to/pdf') as doc:
    foo(doc)
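To make the "no matter what happens, close the file" guarantee concrete, here is a minimal sketch using a stand-in class (`TrackedResource` is hypothetical and not part of PDFMiner). It shows that `__exit__` runs even when the body raises, and that returning a truthy value from `__exit__`, as the `return value` line above does, swallows the exception.

```python
class TrackedResource(object):
    """Hypothetical stand-in for PDFMinerWrapper: records what happened."""

    def __init__(self, suppress):
        self.suppress = suppress
        self.closed = False
        self.seen = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.closed = True      # cleanup always runs
        self.seen = exc_value   # None if the body finished normally
        return self.suppress    # truthy => the exception is swallowed


# Exception swallowed: execution simply continues after the with block.
quiet = TrackedResource(suppress=True)
with quiet:
    raise ValueError("bad page")

# Exception propagates, but cleanup has still run.
loud = TrackedResource(suppress=False)
try:
    with loud:
        raise ValueError("bad page")
    propagated = False
except ValueError:
    propagated = True
```

Whether swallowing the exception is desirable depends on the caller: code after the `with` block cannot tell that the body failed unless the wrapper records it somewhere.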
This is great, except that I need to check that the PDF document is extractable before applying a function to the object returned by PDFMinerWrapper. My current solution involves an intermediate step.
I'm working with a class I call Pamplemousse which serves as an interface to work with the PDFs. It, in turn, uses PDFMinerWrapper each time an operation must be performed on the file to which the object has been linked.
Here is some (abridged) code that demonstrates its use:
class Pamplemousse(object):
    def __init__(self, inputfile, passwd='', enc='utf-8'):
        self.pdf_doc = inputfile
        self.passwd = passwd
        self.enc = enc
    def with_pdf(self, fn, *args):
        result = None
        with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:  # This is the test I need to perform
                # apply function and return result
                result = fn(doc, *args)
        return result
    def _parse_toc(self, doc):
        toc = []
        try:
            toc = [(level, title) for level, title, dest, a, se in doc.get_outlines()]
        except PDFNoOutlines:
            pass
        return toc
    def get_toc(self):
        return self.with_pdf(self._parse_toc)
Any time I wish to perform an operation on the PDF file, I pass the relevant function to the with_pdf method along with its arguments. The with_pdf method, in turn, uses the with statement to exploit the context manager of PDFMinerWrapper (thus ensuring graceful handling of exceptions) and executes the check before actually applying the function it has been passed.
My question is as follows:
I would like to simplify this code such that I do not have to explicitly call Pamplemousse.with_pdf. My understanding is that decorators could be of help here, so:
- How would I implement a decorator whose job would be to call the with statement and execute the extractability check?
- Is it possible for a decorator to be a class method, or must my decorator be a free-form function or class?
Solution
The way I interpreted your goal, you want to be able to define multiple methods on your Pamplemousse class without constantly having to wrap them in that call. Here is a really simplified version of what it might be:
def if_extractable(fn):
    # this expects to be wrapping a method of a Pamplemousse object
    def wrapped(self, *args):
        print("wrapper(): Calling %s with" % fn, args)
        result = None
        with PDFMinerWrapper(self.pdf_doc) as doc:
            if doc.is_extractable:
                result = fn(self, doc, *args)
        return result
    return wrapped

class Pamplemousse(object):
    def __init__(self, inputfile):
        self.pdf_doc = inputfile
    # get_toc will only get called if the wrapper check
    # passes the extractable test
    @if_extractable
    def get_toc(self, doc, *args):
        print("get_toc():", self, doc, args)
The decorator if_extractable is defined as just a function, but it expects to be used on instance methods of your class.
The decorated get_toc, which used to delegate to a private method, now simply expects to receive a doc object and the args, provided the check passed. Otherwise it doesn't get called and the wrapper returns None.
With this, you can keep defining your operation functions to expect a doc.
You could even add some type checking to make sure it's wrapping the expected class:
def if_extractable(fn):
    def wrapped(self, *args):
        if not hasattr(self, 'pdf_doc'):
            raise TypeError('if_extractable() is wrapping '
                            'a non-Pamplemousse object')
        ...
OTHER TIPS
A decorator is just a function that takes a function and returns another. You can do anything you like:
def my_func():
    return 'banana'

def my_decorator(f):  # see, it takes a function as an argument
    def wrapped():
        res = None
        with PDFMinerWrapper(pdf_doc, passwd) as doc:
            res = f()
        return res
    return wrapped  # see, I return a function that also calls f
Now if you apply the decorator:
@my_decorator
def my_func():
    return 'banana'
The wrapped function will replace my_func, so the extra code will be called.
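A stripped-down illustration of that replacement, without the PDF machinery (the `shout` decorator and the `calls` list are made up for this sketch):

```python
calls = []


def shout(f):
    def wrapped():
        calls.append(f.__name__)  # the "extra code"
        return f().upper()
    return wrapped


@shout
def my_func():
    return 'banana'


# The @ line is shorthand for: my_func = shout(my_func)
result = my_func()
replacement_name = my_func.__name__
```

After decoration the name `my_func` refers to `wrapped`, which is why `replacement_name` is no longer `'my_func'`; `functools.wraps` can be used to copy the original name across if that matters.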
You might want to try along the lines of this:
def with_pdf(self, fn, *args):
    def wrappedfunc(*args):
        result = None
        with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:  # This is the test I need to perform
                # apply function and return result
                result = fn(doc, *args)
        return result
    return wrappedfunc
and when you need to wrap the function, just do this:
@pamplemousseinstance.with_pdf
def foo(doc, *args):
    print('I am doing stuff with', doc)
    print('I also got some good args. Take a look!', args)
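Here is a runnable sketch of that idea, using a bound method of one particular instance as the decorator. `FakeDoc` and the stub `PDFMinerWrapper` are assumptions standing in for the real PDFMiner objects, and `foo` returns its inputs instead of printing so the result can be inspected.

```python
class FakeDoc(object):
    """Hypothetical stand-in for a PDFMiner document."""
    is_extractable = True


class PDFMinerWrapper(object):
    """Stub context manager that hands back the object it was given."""
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, exc_type, exc_value, tb):
        return False


class Pamplemousse(object):
    def __init__(self, inputfile, passwd=''):
        self.pdf_doc = inputfile
        self.passwd = passwd

    def with_pdf(self, fn):
        # self is captured by the closure, so the decorated
        # function is tied to this one instance.
        def wrappedfunc(*args):
            result = None
            with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
                if doc.is_extractable:
                    result = fn(doc, *args)
            return result
        return wrappedfunc


pamplemousseinstance = Pamplemousse(FakeDoc())


@pamplemousseinstance.with_pdf
def foo(doc, *args):
    return (type(doc).__name__, args)


outcome = foo(1, 2)
```

The trade-off is that `foo` is bound to `pamplemousseinstance` at decoration time, so it cannot be reused for a different PDF without decorating it again.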
Here is some demonstration code:
#!/usr/bin/env python3
class Doc(object):
    """Dummy PDFParser Object"""
    is_extractable = True
    text = ''

class PDFMinerWrapper(object):
    '''
    Usage:
        with PDFMinerWrapper('/path/to/file.pdf') as doc:
            doc.dosomething()
    '''
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd
    def __enter__(self):
        return self.pdf_doc
    def __exit__(self, type, value, traceback):
        pass

def safe_with_pdf(fn):
    """
    This is the decorator; it gets passed the fn we want
    to decorate.
    It is called once, when the decorated function is defined,
    with the undecorated function as its only argument.
    """
    print("---- Decorator ----")
    print("safe_with_pdf: First arg (fn):", fn)
    def wrapper(self, *args, **kargs):
        """
        This will get passed the function's arguments and kargs,
        which means that we can intercept them here.
        """
        print("--- We are now in the wrapper ---")
        print("wrapper: First arg (self):", self)
        print("wrapper: Other args (*args):", args)
        print("wrapper: Other kargs (**kargs):", kargs)
        # fn is accessible here because this function is
        # a closure, and thus still has access to the decorator's
        # local variables.
        print("wrapper: The function we run (fn):", fn)
        # This wrapper is now pretending to be the original function
        # Perform all the checks and stuff
        with PDFMinerWrapper(self.pdf, self.passwd) as doc:
            if doc.is_extractable:
                # Now call the original function with its
                # arguments and pass it the doc
                result = fn(doc, *args, **kargs)
            else:
                result = None
        print("--- End of the Wrapper ---")
        return result
    # Decorators are expected to return a function; this
    # function is then run instead of the decorated function.
    # So instead of returning the original function we return the
    # wrapper. The wrapper will be run with the original function's
    # arguments.
    # Now by using closures we can still access the original
    # function by looking up fn (the argument that was passed
    # to this function) inside of the wrapper.
    print("--- Decorator ---")
    return wrapper

class SomeKlass(object):
    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        print('')
        print("-- The Function --")
        # This function is now passed the doc from the wrapper.
        print('The contents of the pdf:', doc.text)
        print('some_argument', some_argument)
        print("-- End of the Function --")
        print('')

doc = Doc()
doc.text = 'PDF contents'
klass = SomeKlass()
klass.pdf = doc
klass.passwd = ''
klass.pdf_thing('arg')
I recommend running that code to see how it works. There are some interesting points to look out for, though:
First, you will notice that we only pass a single argument to pdf_thing(), but if you look at the method, it takes two arguments:
@safe_with_pdf
def pdf_thing(doc, some_argument):
    print('')
    print("-- The Function --")
This is because, if you look at the wrapper where we call the function:
with PDFMinerWrapper(self.pdf, self.passwd) as doc:
    if doc.is_extractable:
        # Now call the original function with its
        # arguments and pass it the doc
        result = fn(doc, *args, **kargs)
We generate the doc argument and pass it in, along with the original arguments (*args, **kargs). This means that every method or function that is wrapped with this decorator receives an additional doc argument that the caller does not supply, so its declaration must list doc ahead of the caller's arguments (def pdf_thing(doc, some_argument):).
Another thing to note is that the wrapper:
def wrapper(self, *args, **kargs):
    """
    This will get passed the function's arguments and kargs,
    which means that we can intercept them here.
    """
also captures the self argument and does not pass it to the method being called. You could change this behaviour by modifying the function call from:
        result = fn(doc, *args, **kargs)
    else:
        result = None
To:
        result = fn(self, doc, *args, **kargs)
    else:
        result = None
and then changing the method itself to:
def pdf_thing(self, doc, some_argument):
Hope that helps, feel free to ask for more clarification.
EDIT:
To answer the second part of your question.
Yes, the decorator can live inside the class (as a plain function in the class body rather than a @classmethod). Just place safe_with_pdf inside of SomeKlass, above any calls to it, e.g. as the first method in the class.
Also here is a reduced version of the above code, with the decorator in the class.
class SomeKlass(object):
    def safe_with_pdf(fn):
        """The decorator which will wrap the method"""
        def wrapper(self, *args, **kargs):
            """The wrapper, which will call the method with a doc"""
            with PDFMinerWrapper(self.pdf, self.passwd) as doc:
                if doc.is_extractable:
                    result = fn(doc, *args, **kargs)
                else:
                    result = None
            return result
        return wrapper
    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        """The method to decorate"""
        print('The contents of the pdf:', doc.text)
        print('some_argument', some_argument)
        return '%s - Result' % doc.text

print(klass.pdf_thing('arg'))
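One detail worth spelling out about a decorator in the class body: it is called as an ordinary function while the class is being defined, so it receives only the function being decorated and no self or cls. The sketch below uses made-up names (`Widget`, `logged`, `events`) to show this in isolation.

```python
events = []


class Widget(object):
    def logged(fn):
        # Runs at class-definition time with the raw function;
        # there is no self or cls at this point.
        def wrapper(self, *args, **kwargs):
            events.append(fn.__name__)
            return fn(self, *args, **kwargs)
        return wrapper

    @logged
    def double(self, x):
        return 2 * x


value = Widget().double(21)
```

Because `logged` stays behind as an attribute of the class, looking it up on an instance would bind that instance as `fn`. Defining the decorator at module level, or ending the class body with `del logged`, avoids that surprise.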