Question

I'm having a lot of trouble getting a good grasp on decorators despite having read many an article on the subject (including [this][1] very popular one on SO). I suspect I must be stupid, but with all the stubbornness that comes with being stupid, I've decided to try to figure this out.

That, and I suspect I have a good use case...

Below is some code from a project of mine that extracts text from PDF files. Processing involves three steps:

  1. Set up the PDFMiner objects needed to process the PDF file (boilerplate initializations).
  2. Apply a processing function to the PDF file.
  3. No matter what happens, close the file.
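Without a context manager, those three steps map onto a plain try/finally block. A minimal sketch, assuming a hypothetical `process_file` helper and using an in-memory `io.BytesIO` buffer as a stand-in for the real PDF file so it runs without PDFMiner:

```python
import io


def process_file(fobj, process):
    """Apply `process` to an open file object, closing it no matter what."""
    try:
        # Step 1 (the PDFMiner boilerplate) would go here; this sketch skips it.
        # Step 2: apply the processing function.
        return process(fobj)
    finally:
        # Step 3: runs whether process() returned normally or raised.
        fobj.close()


# An in-memory file stands in for open('/path/to/file.pdf', 'rb').
buf = io.BytesIO(b"%PDF-1.4 fake contents")
header = process_file(buf, lambda f: f.read(8))
```

Here `header` is `b'%PDF-1.4'` and `buf.closed` is True afterwards. A context manager packages exactly this try/finally so that every caller does not have to repeat it.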

I recently learned about context managers and the with statement, and this seemed like a good use case for them. As such, I started by defining the PDFMinerWrapper class:

import traceback

from pdfminer.pdfparser import PDFParser, PDFDocument  # older PDFMiner API; newer releases moved PDFDocument to pdfminer.pdfdocument

class PDFMinerWrapper(object):
    '''
    Usage:
    with PDFMinerWrapper('/path/to/file.pdf') as doc:
        doc.dosomething()
    '''
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        self.pdf = open(self.pdf_doc, 'rb')
        parser = PDFParser(self.pdf)  # create a parser object associated with the file object
        doc = PDFDocument()  # create a PDFDocument object that stores the document structure
        parser.set_document(doc)  # connect the parser and document objects
        doc.set_parser(parser)
        doc.initialize(self.pdf_pwd)  # pass '' if no password required
        return doc

    def __exit__(self, exc_type, exc_value, tb):
        self.pdf.close()
        # if we have an error, catch it, log it, and print the traceback
        if isinstance(exc_value, Exception):
            self.logError()
            traceback.print_exception(exc_type, exc_value, tb)
            # a truthy return value tells the with statement to swallow the exception
            return True
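What `__exit__` returns decides whether the error is "handled": a truthy value makes the with statement swallow the exception, while None or False lets it propagate after cleanup has run. A minimal sketch with a hypothetical `Resource` class standing in for PDFMinerWrapper:

```python
class Resource(object):
    """Hypothetical stand-in for PDFMinerWrapper: records cleanup and errors."""

    def __init__(self, suppress):
        self.suppress = suppress
        self.closed = False
        self.error = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.closed = True          # cleanup always runs
        if exc_value is not None:
            self.error = exc_value  # "log" the error
        # A truthy return suppresses the exception; a falsy one re-raises it.
        return self.suppress


quiet = Resource(suppress=True)
with quiet:
    raise ValueError("bad PDF")
# Execution continues here: the ValueError was swallowed.

loud = Resource(suppress=False)
try:
    with loud:
        raise ValueError("bad PDF")
except ValueError:
    pass  # the exception propagated out of the with block
```

One consequence worth keeping in mind: when the wrapper suppresses errors, the `with_pdf` method shown next simply returns None if `fn` raises, which the caller cannot tell apart from a non-extractable file.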

Now I can easily work with a PDF file and be sure that it will handle errors gracefully. In theory, all I need to do is something like this:

with PDFMinerWrapper('/path/to/pdf') as doc:
    foo(doc)

This is great, except that I need to check that the PDF document is extractable before applying a function to the object returned by PDFMinerWrapper. My current solution involves an intermediate step.

I'm working with a class I call Pamplemousse which serves as an interface to work with the PDFs. It, in turn, uses PDFMinerWrapper each time an operation must be performed on the file to which the object has been linked.

Here is some (abridged) code that demonstrates its use:

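from pdfminer.pdfparser import PDFNoOutlines  # older PDFMiner API; newer releases moved it to pdfminer.pdfdocument

class Pamplemousse(object):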
    def __init__(self, inputfile, passwd='', enc='utf-8'):
        self.pdf_doc = inputfile
        self.passwd = passwd
        self.enc = enc

    def with_pdf(self, fn, *args):
        result = None
        with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:  # This is the test I need to perform
                # apply function and return result
                result = fn(doc, *args)

        return result

    def _parse_toc(self, doc):
        toc = []
        try:
            toc = [(level, title) for level, title, dest, a, se in doc.get_outlines()]
        except PDFNoOutlines:
            pass
        return toc

    def get_toc(self):
        return self.with_pdf(self._parse_toc)

Any time I wish to perform an operation on the PDF file, I pass the relevant function to the with_pdf method along with its arguments. The with_pdf method, in turn, uses the with statement to exploit the context manager of PDFMinerWrapper (thus ensuring graceful handling of exceptions) and executes the check before actually applying the function it has been passed.

My question is as follows:

I would like to simplify this code such that I do not have to explicitly call Pamplemousse.with_pdf. My understanding is that decorators could be of help here, so:

  1. How would I implement a decorator whose job would be to call the with statement and execute the extractability check?
  2. Is it possible for a decorator to be a class method, or must my decorator be a free-form function or class?

Solution

The way I interpreted your goal was that you want to define multiple methods on your Pamplemousse class without constantly having to wrap them in that call. Here is a really simplified version of what that might look like:

def if_extractable(fn):
    # this expects to be wrapping a Pamplemousse object
    def wrapped(self, *args):
        print "wrapper(): Calling %s with" % fn, args
        result = None
        with PDFMinerWrapper(self.pdf_doc) as doc:
            if doc.is_extractable:
                result = fn(self, doc, *args)
        return result
    return wrapped


class Pamplemousse(object):

    def __init__(self, inputfile):
        self.pdf_doc = inputfile

    # get_toc will only get called if the wrapper check
    # passes the extractable test
    @if_extractable
    def get_toc(self, doc, *args):
        print "get_toc():", self, doc, args
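A runnable version of the same decorator, as a sketch: `FakeDoc` and `FakePDFMinerWrapper` are hypothetical stand-ins so it runs without PDFMiner, and `functools.wraps` (not used in the answer) keeps the original method's name and docstring on the wrapper. It also forwards keyword arguments and the password.

```python
import functools


class FakeDoc(object):
    """Hypothetical stand-in for a PDFMiner document."""
    def __init__(self, extractable, outlines):
        self.is_extractable = extractable
        self.outlines = outlines


class FakePDFMinerWrapper(object):
    """Hypothetical stand-in: hands back the 'document' it was given."""
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, exc_type, exc_value, tb):
        return False  # let exceptions propagate


def if_extractable(fn):
    @functools.wraps(fn)  # keep fn's __name__ and __doc__ on the wrapper
    def wrapped(self, *args, **kwargs):
        with FakePDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:
                return fn(self, doc, *args, **kwargs)
        return None  # check failed: fn is never called
    return wrapped


class Pamplemousse(object):
    def __init__(self, inputfile, passwd=''):
        self.pdf_doc = inputfile
        self.passwd = passwd

    @if_extractable
    def get_toc(self, doc):
        """Return (level, title) pairs from the outline."""
        return [(level, title) for level, title in doc.outlines]


ok = Pamplemousse(FakeDoc(True, [(1, 'Intro'), (2, 'Details')]))
locked = Pamplemousse(FakeDoc(False, [(1, 'Secret')]))
```

Callers write `ok.get_toc()` with no `doc` argument; the decorator supplies it, and `locked.get_toc()` returns None because the check fails.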

The decorator if_extractable is defined as an ordinary function, but it expects to be used on instance methods of your class.

The decorated get_toc, which used to delegate to a private method, now simply expects to receive a doc object and the args if the check passes. Otherwise it is never called and the wrapper returns None.

With this, you can keep defining your operation methods so that they expect a doc argument.

You could even add some type checking to make sure it's wrapping the expected class:

def if_extractable(fn):
    def wrapped(self, *args):
        if not hasattr(self, 'pdf_doc'):
            raise TypeError('if_extractable() is wrapping '
                            'a non-Pamplemousse object')
        ...
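A runnable sketch of just the check. The PDF handling is left out on purpose: this hypothetical version forwards `self.pdf_doc` as the "doc", and `HasPdf`/`NoPdf` are made-up classes for the demonstration.

```python
import functools


def if_extractable(fn):
    @functools.wraps(fn)
    def wrapped(self, *args):
        # Duck-typing check: anything with a pdf_doc attribute is accepted.
        if not hasattr(self, 'pdf_doc'):
            raise TypeError('if_extractable() is wrapping '
                            'a non-Pamplemousse object')
        # The real wrapper would open the PDF and test is_extractable here;
        # this sketch just forwards pdf_doc as the "doc".
        return fn(self, self.pdf_doc, *args)
    return wrapped


class HasPdf(object):
    pdf_doc = 'fake.pdf'  # hypothetical path, never opened

    @if_extractable
    def name(self, doc):
        return doc


class NoPdf(object):
    @if_extractable
    def name(self, doc):
        return doc
```

`hasattr` accepts any object with a `pdf_doc` attribute. `isinstance(self, Pamplemousse)` would be stricter, and it works inside `wrapped` because that code only runs at call time, after the class has been created.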

OTHER TIPS

A decorator is just a function that takes a function and returns another. You can do anything you like:

def my_func():
    return 'banana'

def my_decorator(f): # see, it takes a function as an argument
    def wrapped():
        res = None
        # pdf_doc and passwd are assumed to be defined elsewhere
        with PDFMinerWrapper(pdf_doc, passwd) as doc:
            res = f()
        return res
    return wrapped # see, I return a function that also calls f

Now if you apply the decorator:

@my_decorator
def my_func():
    return 'banana'

The function returned by the decorator (wrapped) will replace my_func, so calling my_func() runs the extra code around the original body.
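The `@` line is only shorthand for calling the decorator and rebinding the name. A minimal sketch with a hypothetical `shout` decorator that shows both spellings side by side:

```python
def shout(f):
    """Decorator: upper-cases whatever the wrapped function returns."""
    def wrapped():
        return f().upper()
    return wrapped


@shout
def fruit():
    return 'banana'


# The @ syntax above is shorthand for this manual reassignment:
def fruit_manual():
    return 'banana'


fruit_manual = shout(fruit_manual)
```

Both `fruit()` and `fruit_manual()` return `'BANANA'`. Note that `fruit.__name__` is now `'wrapped'`, because the name really does point at the inner function; `functools.wraps` is the usual fix for that.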

You might want to try something along these lines:

def with_pdf(self, fn):  # a Pamplemousse method; the @ syntax passes it only fn
    def wrappedfunc(*args):
        result = None
        with PDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
            if doc.is_extractable:  # This is the test I need to perform
                # apply function and return result
                result = fn(doc, *args)
        return result
    return wrappedfunc

and when you need to wrap the function, just do this:

@pamplemousseinstance.with_pdf
def foo(doc, *args):
    print 'I am doing stuff with', doc
    print 'I also got some good args. Take a look!', args
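A self-contained sketch of this bound-method decorator, again with hypothetical `FakeDoc` and `FakePDFMinerWrapper` classes in place of PDFMiner. `with_pdf` takes only `fn`, since that is all the `@` syntax passes to it:

```python
class FakeDoc(object):
    """Hypothetical stand-in for a PDFMiner document."""
    is_extractable = True


class FakePDFMinerWrapper(object):
    """Hypothetical stand-in: hands back the 'document' it was given."""
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, exc_type, exc_value, tb):
        return False


class Pamplemousse(object):
    def __init__(self, inputfile, passwd=''):
        self.pdf_doc = inputfile
        self.passwd = passwd

    def with_pdf(self, fn):
        # Called once, at decoration time; self is already bound.
        def wrappedfunc(*args):
            result = None
            with FakePDFMinerWrapper(self.pdf_doc, self.passwd) as doc:
                if doc.is_extractable:
                    result = fn(doc, *args)
            return result
        return wrappedfunc


pamplemousseinstance = Pamplemousse(FakeDoc())


@pamplemousseinstance.with_pdf
def foo(doc, *args):
    return (doc, args)
```

The trade-off is that `foo` is tied to one particular instance when it is defined, so this style suits standalone functions that work on a single file, rather than methods shared by every Pamplemousse.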

Here is some demonstration code:

#! /usr/bin/python

class Doc(object):
    """Dummy PDFParser Object"""

    is_extractable = True
    text = ''

class PDFMinerWrapper(object):
    '''
    Usage:
    with PDFMinerWrapper('/path/to/file.pdf') as doc:
        doc.dosomething()
    '''
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc
        self.pdf_pwd = pdf_pwd

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, type, value, traceback):
        pass

def safe_with_pdf(fn):
    """
    This is the decorator, it gets passed the fn we want
    to decorate.

    It is a plain module-level function, so it receives only
    the function being decorated. The instance (self) shows up
    later, as the first argument of wrapper.
    """
    print "---- Decorator ----"
    print "safe_with_pdf: First arg (fn):", fn
    def wrapper(self, *args, **kargs):
        """
        This will get passed the functions arguments and kargs,
        which means that we can intercept them here.
        """
        print "--- We are now in the wrapper ---"
        print "wrapper: First arg (self):", self
        print "wrapper: Other args (*args):", args
        print "wrapper: Other kargs (**kargs):", kargs

        # fn is still accessible here because wrapper is a
        # closure: it keeps access to the decorator's local
        # variables after safe_with_pdf has returned.
        print "wrapper: The function we run (fn):", fn

        # This wrapper is now pretending to be the original function

        # Perform all the checks and stuff
        with PDFMinerWrapper(self.pdf, self.passwd) as doc:
            if doc.is_extractable:
                # Now call the original function with its
                # arguments and pass it the doc
                result = fn(doc, *args, **kargs)
            else:
                result = None
        print "--- End of the Wrapper ---"
        return result

    # Decorators are expected to return a function; this
    # function is then run instead of the decorated function.
    # So instead of returning the original function we return the
    # wrapper. The wrapper will be called with the original function's
    # arguments.

    # Thanks to the closure, the wrapper can still reach the original
    # function by looking up fn (the argument that was passed
    # to this function).
    print "--- End of Decorator ---"
    return wrapper


class SomeKlass(object):

    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        print ''
        print "-- The Function --"

        # This function is now passed the doc from the wrapper.

        print 'The contents of the pdf:', doc.text
        print 'some_argument', some_argument
        print "-- End of the Function --"
        print ''

doc = Doc()
doc.text = 'PDF contents'
klass = SomeKlass()  
klass.pdf = doc
klass.passwd = ''
klass.pdf_thing('arg')

I recommend running that code to see how it works. Some interesting points to look out for, though:

First, you will notice that we pass only a single argument to pdf_thing(), yet the method is declared with two parameters:

@safe_with_pdf
def pdf_thing(doc, some_argument):
    print ''
    print "-- The Function --"

This is because of the wrapper, where we call the function:

with PDFMinerWrapper(self.pdf, self.passwd) as doc:
    if doc.is_extractable:
        # Now call the original function with its
        # arguments and pass it the doc
        result = fn(doc, *args, **kargs)

We generate the doc argument and pass it in, along with the original arguments (*args, **kargs). This means that every method or function wrapped with this decorator receives the doc from the wrapper in addition to the arguments the caller supplies, which is why doc must be the first parameter in its declaration (def pdf_thing(doc, some_argument):).

Another thing to note is that the wrapper:

def wrapper(self, *args, **kargs):
    """
    This will get passed the functions arguments and kargs,
    which means that we can intercept them here.
    """

also captures the self argument and does not pass it on to the method being called. You could change this behaviour by modifying the function call from:

if doc.is_extractable:
    result = fn(doc, *args, **kargs)
else:
    result = None

To:

if doc.is_extractable:
    result = fn(self, doc, *args, **kargs)
else:
    result = None

and then changing the method itself to:

def pdf_thing(self, doc, some_argument):
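Put together, that variant looks like the sketch below. `Doc` and `PDFMinerWrapper` are dummies in the spirit of the demonstration code above, and the `__init__` on SomeKlass is a hypothetical convenience replacing the attribute assignments:

```python
class Doc(object):
    """Dummy document, as in the demonstration code."""
    is_extractable = True
    text = 'PDF contents'


class PDFMinerWrapper(object):
    """Dummy wrapper: hands back the document it was given."""
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, exc_type, exc_value, tb):
        return False  # do not suppress exceptions


def safe_with_pdf(fn):
    def wrapper(self, *args, **kargs):
        with PDFMinerWrapper(self.pdf, self.passwd) as doc:
            if doc.is_extractable:
                # forward self as well as doc
                result = fn(self, doc, *args, **kargs)
            else:
                result = None
        return result
    return wrapper


class SomeKlass(object):
    def __init__(self, pdf, passwd=''):
        self.pdf = pdf
        self.passwd = passwd

    @safe_with_pdf
    def pdf_thing(self, doc, some_argument):
        # self is available again, so other instance state can be used
        return '%s | %s | %s' % (self.passwd or 'no password', doc.text, some_argument)


klass = SomeKlass(Doc())
out = klass.pdf_thing('arg')
```

Here `out` is `'no password | PDF contents | arg'`: the method sees both the instance and the doc supplied by the wrapper.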

Hope that helps, feel free to ask for more clarification.

EDIT:

To answer the second part of your question.

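Yes, it can live in the class. Just place safe_with_pdf inside SomeKlass, above the methods that use it (e.g. as the first method in the class). Strictly speaking it is then a plain function in the class body rather than a @classmethod: it is called while the class body is executing, before SomeKlass exists, so it receives only the function being decorated.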
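The question also asked whether a decorator can be a class. Any callable can act as a decorator, so yes. A sketch with a hypothetical `IfExtractable` class and dummy document classes: because the decorator instance replaces the method, it needs a `__get__` method so that the Pamplemousse instance is bound as the first argument when the method is looked up.

```python
import functools


class FakeDoc(object):
    """Hypothetical stand-in for a PDFMiner document."""
    def __init__(self, extractable, text):
        self.is_extractable = extractable
        self.text = text


class FakePDFMinerWrapper(object):
    """Hypothetical stand-in: returns the fake document unchanged."""
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, exc_type, exc_value, tb):
        return False


class IfExtractable(object):
    """Class-based decorator: the instance replaces the decorated method."""
    def __init__(self, fn):
        self.fn = fn
        functools.update_wrapper(self, fn)  # copy __name__, __doc__, ...

    def __get__(self, instance, owner=None):
        # obj.method goes through here; bind obj as the first argument.
        if instance is None:
            return self
        return functools.partial(self.__call__, instance)

    def __call__(self, obj, *args, **kwargs):
        with FakePDFMinerWrapper(obj.pdf_doc) as doc:
            if doc.is_extractable:
                return self.fn(obj, doc, *args, **kwargs)
        return None


class Pamplemousse(object):
    def __init__(self, inputfile):
        self.pdf_doc = inputfile

    @IfExtractable
    def get_text(self, doc):
        return doc.text
```

For a simple wrapper like this the function-based decorator is shorter; a class mainly pays off when the decorator needs to carry its own state or configuration.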

Also, here is a reduced version of the earlier demonstration code, with the decorator inside the class.

class SomeKlass(object):
    def safe_with_pdf(fn):
        """The decorator which will wrap the method"""
        def wrapper(self, *args, **kargs):
            """The wrapper which will call the method with a doc"""
            with PDFMinerWrapper(self.pdf, self.passwd) as doc:
                if doc.is_extractable:
                    result = fn(doc, *args, **kargs)
                else:
                    result = None
            return result
        return wrapper

    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        """The method to decorate"""
        print 'The contents of the pdf:', doc.text
        print 'some_argument', some_argument
        return '%s - Result' % doc.text

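# rebuild the instance from the new SomeKlass, reusing doc from the demonstration code
klass = SomeKlass()
klass.pdf = doc
klass.passwd = ''
print klass.pdf_thing('arg')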
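A self-contained sketch of the in-class version, with dummy `Doc` and `PDFMinerWrapper` classes. The `del` at the end of the class body is an optional addition, not part of the answer: it removes the helper once the class is built, so it is not left behind as a method that would misbehave if someone called it on an instance.

```python
class Doc(object):
    """Dummy document."""
    is_extractable = True
    text = 'PDF contents'


class PDFMinerWrapper(object):
    """Dummy wrapper: hands back the document it was given."""
    def __init__(self, pdf_doc, pdf_pwd=''):
        self.pdf_doc = pdf_doc

    def __enter__(self):
        return self.pdf_doc

    def __exit__(self, exc_type, exc_value, tb):
        return False


class SomeKlass(object):
    def safe_with_pdf(fn):
        """Plain function here: it runs while the class body executes."""
        def wrapper(self, *args, **kargs):
            with PDFMinerWrapper(self.pdf, self.passwd) as doc:
                if doc.is_extractable:
                    return fn(doc, *args, **kargs)
            return None
        return wrapper

    @safe_with_pdf
    def pdf_thing(doc, some_argument):
        return '%s - Result' % doc.text

    # Remove the helper so it is not left behind as a (broken) method.
    del safe_with_pdf


klass = SomeKlass()
klass.pdf = Doc()
klass.passwd = ''
result = klass.pdf_thing('arg')
```

`result` is `'PDF contents - Result'`, and `SomeKlass` no longer has a `safe_with_pdf` attribute.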
Licensed under: CC-BY-SA with attribution