Question

In my current work environment, we produce a large number of Python packages for internal use (tens, if not hundreds). Each package has some dependencies, usually on a mixture of internal and external packages, and some of these dependencies are shared.

As we approach dependency hell, updating dependencies becomes a time-consuming process. While we care about the functional changes a new version might introduce, of equal (if not greater) importance are the API changes that break the code.

Although running unit/integration tests against newer versions of a dependency helps us to catch some issues, our coverage is not close enough to 100% to make this a robust strategy. Release notes and a changelog help identify major changes at a high level, but these rarely exist for internally developed tools, or go into enough detail to understand the implications the new version has on the (public) API.

I am looking at other ways to automate this process.

I would like to be able to automatically compare two versions of a Python package and report the API differences between them. In particular this would include backwards-incompatible changes such as removing functions/methods/classes/modules, adding positional arguments to a function/method/class, and changing the number of items a function/method returns. As a developer, based on the report this generates, I should have a greater understanding of the code-level implications this version change will introduce, and so the time required to integrate it.

Elsewhere, we use the C++ abi-compliance-checker and are looking at the Java api-compliance-checker to help with this process. Is there a similar tool available for Python? I have found plenty of lint/analysis/refactor tools but nothing that provides this level of functionality. I understand that Python's dynamic typing will make a comprehensive report impossible.

If such a tool does not exist, are there any libraries that could help with implementing a solution? For example, my current approach would be to use an ast.NodeVisitor to traverse the package and build a tree where each node represents a module/class/method/function, and then compare this tree to that of another version of the same package.

Edit: since posting the question I have found pysdiff, which covers some of my requirements, but I'm still interested to see alternatives.

Edit: also found Upstream-Tracker, which is a good example of the sort of information I'd like to end up with.


OTHER TIPS

What about using the AST module to parse the files?

import ast

with open("test.py") as f:  # Python 3: open() rather than the old file()
    python_src = f.read()

node = ast.parse(python_src)  # note: parses the source, doesn't compile it
print(ast.dump(node))

There's the walk() function in the ast module (described at http://docs.python.org/2/library/ast.html)

The astdump package (available on PyPI) might also work

There's this out-of-date pretty printer: http://code.activestate.com/recipes/533146-ast-pretty-printer/

The documentation tool Sphinx also extracts the information you are looking for. Perhaps give that a look.

So walk the AST and build a tree with the information you want in it. Once you have a tree you can pickle it and diff later or convert the tree to a text representation in a text file you can diff with difftools, or some external diff program.
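As a rough sketch of that idea (the source string and module name below are made up for illustration), you could walk the AST, flatten the public definitions into sorted lines, and then feed two such dumps to difflib or an external diff program:

```python
import ast

# toy source standing in for one version of a package module
SRC = """
def greet(name, punctuation="!"):
    return "Hello " + name + punctuation

class Greeter:
    def shout(self, name):
        return greet(name).upper()
"""

def api_lines(source, modname):
    """Flatten a module's defs into sorted, diff-friendly lines.

    Note: ast.walk() visits nested nodes too, so methods appear without
    their class context; a real tool would use a NodeVisitor to keep it.
    """
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            args = [a.arg for a in node.args.args]
            lines.append("{}:{}({})".format(modname, node.name, ", ".join(args)))
        elif isinstance(node, ast.ClassDef):
            lines.append("{}:class {}".format(modname, node.name))
    return sorted(lines)

for line in api_lines(SRC, "mymod"):
    print(line)
```

Running api_lines() over the old and new source and passing the results to difflib.unified_diff() gives you a first-cut API diff without importing (or compiling) either version.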

The ast module has a parse() function, and there's also the built-in compile(). The only thing is I'm not entirely sure how much information is available to you after parsing alone (as you don't want to compile()).

Perhaps you can start by using the inspect module

import inspect
import types

def genFunctions(module):
    moduleDict = module.__dict__
    for name in dir(module):
        if name.startswith('_'):
            continue
        element = moduleDict[name]
        if isinstance(element, types.FunctionType):
            # inspect.signature() replaces the deprecated inspect.getargspec()
            argList = list(inspect.signature(element).parameters)
            print("{}.{}({})".format(module.__name__, name, ", ".join(argList)))

That will give you a list of "public" (not starting with underscore) functions with their argument lists. You can add more stuff to print the kwargs, classes, etc.
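For instance, a sketch that also picks up classes and renders defaults and keyword arguments (using the stdlib json module as a stand-in for one of your internal packages):

```python
import inspect
import json

def describe(module):
    """Yield one signature line per public function or class in module."""
    for name, obj in vars(module).items():
        if name.startswith('_'):
            continue
        if inspect.isfunction(obj) or inspect.isclass(obj):
            try:
                # str(Signature) includes defaults, *args, **kwargs, etc.
                sig = str(inspect.signature(obj))
            except (TypeError, ValueError):
                sig = "(...)"  # some callables have no retrievable signature
            yield "{}.{}{}".format(module.__name__, name, sig)

# example: dump the public API of the json module
for line in sorted(describe(json)):
    print(line)
```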

Once you run that on all the packages/modules you care about, in both old and new versions, you'll have two lists like this:

myPackage.myModule.myFunction1(foo, bar)
myPackage.myModule.myFunction2(baz)

Then you can either just sort and diff them, or write some smarter tooling in Python to actually compare all the names, e.g. to permit additional optional arguments but reject new mandatory arguments.
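A minimal sketch of that smarter comparison, assuming you have captured each version's API as a {name: inspect.Signature} mapping (old_f/new_f/new_g below are toy stand-ins for functions from the two package versions):

```python
import inspect

def breaking_changes(old_funcs, new_funcs):
    """Report removals and new mandatory arguments between two
    {name: inspect.Signature} mappings (hypothetical shape)."""
    problems = []
    for name, old_sig in old_funcs.items():
        if name not in new_funcs:
            problems.append("removed: {}".format(name))
            continue
        old_params = set(old_sig.parameters)
        for pname, param in new_funcs[name].parameters.items():
            # a new parameter is breaking only if it has no default
            # and isn't *args/**kwargs
            if (pname not in old_params
                    and param.default is inspect.Parameter.empty
                    and param.kind not in (inspect.Parameter.VAR_POSITIONAL,
                                           inspect.Parameter.VAR_KEYWORD)):
                problems.append("new mandatory argument: {}({})".format(name, pname))
    return problems

# toy old/new versions of an API
def old_f(a): pass
def new_f(a, b): pass       # b has no default -> breaking
def new_g(a, c=1): pass     # c is optional -> allowed

old = {"f": inspect.signature(old_f), "g": inspect.signature(old_f)}
new = {"f": inspect.signature(new_f), "g": inspect.signature(new_g)}
print(breaking_changes(old, new))  # → ['new mandatory argument: f(b)']
```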

Check out zope.interface (you can get it from PyPI). Then you can incorporate tests that modules support their interfaces into your unit tests. It may take a while to retrofit, however - and it's not a silver bullet.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow