Coverage on a frozen executable

Question 1

This isn't a fully formulated answer, but here is what I have found so far.

My understanding of how PyInstaller works is that the binary is built from a small C program that embeds a Python interpreter and bootstraps loading a script. The EXE that PyInstaller constructs has an archive appended after the end of the actual binary, and that archive contains the resources for the Python code. This is explained here: http://www.pyinstaller.org/export/develop/project/doc/Manual.html#pyinstaller-archives.
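For illustration only, here is a minimal sketch of how such an appended archive can be located. It assumes the archive ends with a "cookie" that begins with the magic bytes MEI\014\013\012\013\016, as in PyInstaller's CArchive format; the exact layout may differ between PyInstaller versions, and find_cookie_offset and has_appended_archive are hypothetical helpers, not PyInstaller APIs.

```python
# ASSUMPTION: the appended archive ends with a cookie starting with this
# magic value, as in PyInstaller's CArchive format; versions may differ.
PYINSTALLER_MAGIC = b'MEI\014\013\012\013\016'


def find_cookie_offset(exe_bytes):
    """Return the offset of the last archive cookie in exe_bytes, or -1."""
    # The cookie sits after the real executable image, near the end of the
    # file, so search backwards from the end.
    return exe_bytes.rfind(PYINSTALLER_MAGIC)


def has_appended_archive(path):
    """Hypothetical helper: True if the file at path contains the magic."""
    with open(path, 'rb') as f:
        return find_cookie_offset(f.read()) >= 0
```

Searching from the end matters because the bytes of the executable itself come first; the archive and its cookie are only ever appended after them.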

There is also iu.py (Pyinstaller/loader/iu.py; see the docs). With it, you should be able to create an import hook that imports modules from the binary. Googling for "pyinstaller disassembler" turned up https://bitbucket.org/Trundle/exetractor/src/00df9ce00e1a/exetractor/pyinstaller.py, which looks like it might extract the necessary parts.
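A rough sketch of the import-hook idea, written for Python 3's importlib rather than PyInstaller's iu.py (whose API is not shown here). The module sources live in a plain dict as a stand-in for code read out of the binary; InMemoryFinder and hello_frozen are hypothetical names.

```python
import importlib.abc
import importlib.util
import sys


class InMemoryFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Import hook serving modules from a dict of source strings.

    A real hook for a frozen EXE would read module code out of the archive
    appended to the binary; the dict here is only a stand-in.
    """

    def __init__(self, sources):
        self.sources = sources  # module name -> Python source text

    def find_spec(self, fullname, path, target=None):
        if fullname in self.sources:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the normal import machinery handle everything else

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        source = self.sources[module.__name__]
        # The filename given to compile() becomes co_filename, which is the
        # path a tracer such as coverage.py would see for this module.
        code = compile(source, '<in-memory %s>' % module.__name__, 'exec')
        exec(code, module.__dict__)


sys.meta_path.insert(0, InMemoryFinder({'hello_frozen': 'GREETING = "hi"\n'}))
import hello_frozen

print(hello_frozen.GREETING)  # hi
```

Placing the finder first in sys.meta_path lets it answer before the regular path-based finders, which is how an archive-backed hook would take priority.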

The other part of this is that all of the resources in the binary archive will be compiled Python code (bytecode), not source. coverage.py will most likely give you unhelpful output, just as it does under normal conditions when it hits any other module that exists only in compiled form.
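To see why frozen code gives coverage.py so little to work with, this sketch uses the standard library's compile() and marshal as a stand-in for what a freezer stores: only bytecode, with the original filename baked into co_filename. The _MEI path is hypothetical.

```python
import linecache
import marshal

# Compile a module as a freezer would, recording a path inside a
# hypothetical _MEI temp directory as its filename.
frozen_name = r'C:\Temp\_MEI12345\mymodule.py'
code = compile('x = 1\ny = x + 1\n', frozen_name, 'exec')

# Only the marshalled bytecode would be stored in the archive, not the source.
blob = marshal.dumps(code)
restored = marshal.loads(blob)

# The filename recorded at compile time travels with the code object...
print(restored.co_filename)
# ...but no source file exists at that path, so line lookups come back empty.
print(repr(linecache.getline(restored.co_filename, 1)))  # ''
```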

Question 2

Highlight: use cover_pylib=True.

I know this is long after you asked the question, but I'm just getting around to needing the answer. :)

Using the current Bitbucket source for coverage.py, I'm able to successfully collect coverage data from a PyInstaller-generated EXE file.

In my application's main source file, I conditionally tell coverage to start collecting data like this:

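import os
usingCoverage = False  # defined up front so the exit check below never raises NameError
if os.environ.get('COVERAGE'):  # set and non-empty
    usingCoverage = True
    import coverage
    import time
    # version is my application's own module; GetFullString() returns its version number
    cov = coverage.coverage(data_file='.coverage.' + version.GetFullString(),
                            data_suffix=time.strftime(".%Y_%m_%d_%H_%M.%S", time.localtime()),
                            cover_pylib=True)
    cov.start()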

This starts coverage collection only when I want it. Using data_suffix gives every run its own data file, which makes it easier to merge them later with cov.combine(). version.GetFullString() just returns my application's version number.
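A small sketch of the two naming pieces passed to the constructor, so it's clear why runs don't overwrite each other: the base name stays fixed per version while the timestamp suffix changes per run. coverage_data_name is a hypothetical helper, and version_string stands in for version.GetFullString(); how coverage.py joins the two into a final file name is not shown here.

```python
import time


def coverage_data_name(version_string, when=None):
    """Return (data_file, data_suffix) as built in the startup snippet.

    version_string stands in for version.GetFullString(); `when` is a
    time.struct_time and defaults to the current local time.
    """
    if when is None:
        when = time.localtime()
    data_file = '.coverage.' + version_string
    data_suffix = time.strftime(".%Y_%m_%d_%H_%M.%S", when)
    return data_file, data_suffix


when = time.strptime('2012-03-04 05:06:07', '%Y-%m-%d %H:%M:%S')
print(coverage_data_name('1.2.3', when))  # ('.coverage.1.2.3', '.2012_03_04_05_06.07')
```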

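cover_pylib is set to True here because the __file__ attribute of every standard Python library module looks like ...\_MEIXXXXX\random.pyc, which makes those modules indistinguishable (on a path basis) from any of your own code that doesn't live inside a package. Without it, coverage.py would lump your own top-level modules in with the standard library and skip measuring them.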
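A simplified illustration of the problem. This is not coverage.py's actual code; it only assumes that standard-library files are recognized by the directory they live in. looks_like_stdlib and the paths are hypothetical, and ntpath is used so the Windows-style paths behave the same on any OS.

```python
import ntpath


def looks_like_stdlib(filename, pylib_dirs):
    """Treat a file as standard library if it lives under one of pylib_dirs."""
    filename = ntpath.normcase(filename)
    for d in pylib_dirs:
        d = ntpath.normcase(d).rstrip('\\') + '\\'
        if filename.startswith(d):
            return True
    return False


# In a frozen EXE, os.__file__ points into the _MEI temp directory...
pylib_dirs = [ntpath.dirname(r'C:\Temp\_MEI12345\os.pyc')]
# ...so a stdlib module and your own top-level module look the same:
print(looks_like_stdlib(r'C:\Temp\_MEI12345\random.pyc', pylib_dirs))      # True
print(looks_like_stdlib(r'C:\Temp\_MEI12345\myapp_main.pyc', pylib_dirs))  # True
```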

When the application is ready to exit I have this little snippet:

if usingCoverage:
   cov.stop()
   cov.save()
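If the application can exit from several places, one option (not what I do above) is to register the stop/save pair with the standard atexit module right after cov.start(). register_coverage_shutdown is a hypothetical helper; cov is the coverage object created at startup.

```python
import atexit


def register_coverage_shutdown(cov):
    """Arrange for cov.stop() and then cov.save() to run at interpreter exit.

    `cov` is any object with stop() and save() methods, such as the
    coverage object created at startup.
    """
    def _shutdown():
        cov.stop()
        cov.save()

    atexit.register(_shutdown)
    return _shutdown  # returned so it can also be called (or tested) directly
```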

Even after the application has run, coverage.py still can't generate its HTML report right away. The coverage data first needs to be cleaned up so that the ...\_MEIXXXX\... file references are transformed into absolute paths to the real source code.

I do this by running this snippet of code:

import sys
import os.path
from glob import glob

from coverage.data import CoverageData
from coverage import coverage

import version  # my application's own version module

def cleanupLines(data):
    """
    The coverage data collected from the PyInstaller EXE needs to be fixed up
    so that coverage.py's report generation code can analyze the source code.
    PyInstaller's __file__ attributes on code objects all point into
    subdirectories of the _MEIXXXX temporary directory. We need to replace
    the _MEIXXXX temp directory prefix with the correct prefix for each
    source file.
    """
    prefix = None
    # items() returns a list copy in Python 2, so data.lines can safely be
    # modified inside the loop.
    for origFile, lines in data.lines.items():
        filename = origFile
        if prefix is None:
            # Take everything up to the first backslash after '_MEI' as the prefix.
            index = filename.find('_MEI')
            if index >= 0:
                pathSepIndex = filename.find('\\', index)
                if pathSepIndex >= 0:
                    prefix = filename[:pathSepIndex + 1]
        if prefix is not None and prefix in filename:
            relative = filename.replace(prefix, "", 1)
            # Use the first sys.path entry that contains the real source file.
            for path in sys.path:
                if os.path.isdir(path):
                    candidate = os.path.join(path, relative)
                    if os.path.isfile(candidate):
                        filename = candidate
                        break
            if origFile != filename:
                del data.lines[origFile]
                data.lines[filename] = lines

for dataFile in glob('.coverage.' + version.GetFullString() + '*'):
    print "Cleaning up: ", dataFile
    data = CoverageData(dataFile)
    data.read()
    cleanupLines(data)
    data.write()

The for loop simply makes sure that every coverage file that will be combined later gets cleaned up.
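cleanupLines finds the prefix by taking everything up to the first backslash after '_MEI'. An alternative sketch does the same split with a regular expression; it assumes the temp directory is named _MEI followed by digits, and split_mei_path is a hypothetical helper.

```python
import re

# Everything up to and including the "\_MEI<digits>\" directory.
# ASSUMPTION: the temp directory is "_MEI" followed by digits.
_MEI_PREFIX = re.compile(r'^(.*?\\_MEI\d+\\)')


def split_mei_path(filename):
    """Return (prefix, relative) for a path inside a _MEI temp dir, else None."""
    m = _MEI_PREFIX.match(filename)
    if m is None:
        return None
    prefix = m.group(1)
    return prefix, filename[len(prefix):]
```

The relative part (for example mypkg\util.pyc) is what gets joined onto each sys.path entry when looking for the real source file.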

NOTE: The only coverage data this code does not clean up is for PyInstaller's own files, whose __file__ attributes don't contain the _MEIXXX prefix.
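To check what is still left over after cleanup, a small sketch that lists measured files with no matching file on disk. unresolved_files is a hypothetical helper; "measured" would be something like the keys of data.lines after cleanupLines has run.

```python
import os


def unresolved_files(measured):
    """Return the measured file names that don't exist on disk, sorted.

    Whatever remains here (such as PyInstaller's own files) cannot be
    shown in a source-based report.
    """
    return sorted(f for f in measured if not os.path.isfile(f))
```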

You can now generate an HTML or XML (or whatever) coverage.py report the normal way.

In my case it looks like this:

from coverage import coverage

import version  # my application's own version module

cov = coverage(data_file='.coverage.' + version.GetFullString(), data_suffix='.combined')
cov.load()
cov.combine()
cov.save()
cov.load()
cov.html_report(ignore_errors=True, omit=[r'c:\python27\*', r'..\3rdParty\PythonPackages\*'])

Passing data_file to the constructor ensures that load() and combine() recognize all of my cleaned-up coverage files correctly.
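Roughly, combining picks up every file in the data directory whose name starts with the data_file name followed by a dot. The sketch below is a simplified illustration of that matching; the rule is an assumption based on coverage.py 3.x, and parallel_data_files is a hypothetical helper, not coverage.py's code.

```python
import os


def parallel_data_files(data_file, names):
    """Return the names a combine step would pick up, sorted.

    ASSUMPTION: a file matches when its name starts with the base name of
    data_file followed by a dot.
    """
    local = os.path.basename(data_file) + '.'
    return sorted(n for n in names if n.startswith(local))
```

This is why the data_file passed here must match the one used when collecting: files written under a different base name are simply not picked up.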

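The omit patterns in the html_report call tell coverage.py to ignore the standard Python library (and the Python packages checked into my version control tree) so the report focuses on my application code alone. ignore_errors=True keeps the report from failing on files whose source can't be found.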
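A rough illustration of how shell-style omit patterns like the ones above match Windows paths, using the standard fnmatch module. It is not coverage.py's matcher; both sides are lower-cased to mimic case-insensitive Windows paths, relative patterns are not resolved to absolute ones, and is_omitted is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# The omit patterns from the html_report call.
OMIT = [r'c:\python27\*', r'..\3rdParty\PythonPackages\*']


def is_omitted(filename, patterns=OMIT):
    """True if filename matches any omit pattern, ignoring case."""
    name = filename.lower()
    return any(fnmatchcase(name, p.lower()) for p in patterns)


print(is_omitted(r'C:\Python27\Lib\os.py'))  # True
print(is_omitted(r'C:\src\myapp\main.py'))   # False
```

The trailing * matches across backslashes, so a single pattern covers the whole directory tree beneath it.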

I hope this helps.