Question

I have a script that I'm trying to run to check the encoding of the files in the newest commit. When I run it manually, it behaves as expected, but when I perform a commit, it doesn't. I can print variables just fine if they're outside of my functions, so I suspect that it has something to do with the way I'm retrieving the modified/added files. Is there a way to do it that Git can handle better?

#!/usr/bin/env python

import chardetect, subprocess, os
from sys import stdin, exit
from chardet.universaldetector import UniversalDetector

confidenceLevel = 0.8
allowedEncoding = ('ascii', 'utf-8')

# Get the current path and modify it to be the path to the repo
filePath = os.path.dirname(os.path.realpath(__file__))
filePath = filePath.replace('.git/hooks', '')

# Get all files that have been added or modified (filter is missing 'D' so that deleted files don't come through)
pr = subprocess.Popen(['/usr/bin/git', 'diff', '--diff-filter=ACMRTUXB', '--cached', '--name-only'],                  
       cwd=os.path.dirname('../../'), 
       stdout=subprocess.PIPE, 
       stderr=subprocess.PIPE, 
       shell=False) # Take note: Using shell=True has significant security implications.
(out, error) = pr.communicate()

# Create a list of files to check
out = out.split('\n')
out = [item for item in out if item != '']
out = [filePath + item for item in out]

messageList = [] # Keep this global

# If no paths are provided, it takes its input from stdin.
def description_of(file, name='stdin'):
    #Return a string describing the probable encoding of a file.
    u = UniversalDetector()
    for line in file:
        u.feed(line)
    u.close()
    result = u.result
    if result['encoding']:
        itPasses = ''
        if result['encoding'] in allowedEncoding and result['confidence'] >= confidenceLevel:
            pass
        else:
            messageList.append('%s: FAILS encode test %s with confidence %s\nYou must convert it before committing.' % (name, result['encoding'], result['confidence']))
    else:
        return '%s: no result' % name


def main():
    if len(out) <= 0:
        exit()
    else:
        for path in out:
            description_of(open(path, 'rb'), path)
        for item in messageList:
            print item
    if len(messageList) == 0:
        exit()
    else:
        exit(1)

if __name__ == '__main__':
    main()

Solution

The problem in your script is this line:

cwd=os.path.dirname('../../'), 

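Git does not run hooks from the hooks directory. A pre-commit hook is started in the root of the working tree (hooks triggered by a push, such as post-receive, are started in the .git directory instead). Either way, the relative path '../../' takes you out of the repo. More details on this can be found here in this answer. So you do not need to change the cwd for git diff --cached.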
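
If you want to confirm where your hook actually starts, a small debugging helper along these lines may help. It is only a sketch: hook_context is a hypothetical name, and it assumes Git exports GIT_DIR to the hook process, which you should verify on your Git version.

```python
import os
import sys


def hook_context():
    """Return the directory the hook was started in and GIT_DIR, if set.

    Assumption: Git exports GIT_DIR to hooks; the value is None when it
    is not set (for example when the script is run by hand).
    """
    return {"cwd": os.getcwd(), "git_dir": os.environ.get("GIT_DIR")}


def report_context():
    # Write to stderr so the message shows up during `git commit`
    # without mixing into any output the hook produces on stdout.
    sys.stderr.write("hook context: %r\n" % (hook_context(),))
```

Calling report_context() at the top of the hook and then committing should show whether a relative cwd such as '../../' still points inside the repo.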

You might want to change the cwd so that the paths you feed into UniversalDetector resolve correctly. But what you are doing there is wrong anyway: you should not check the files in the working directory, but the files in the index, because those are what is actually going to be committed.

You should use git ls-files --staged and git show to get the contents of the index. A shorthand for that is git show :filename, but that might cause trouble with weird filenames.
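
A rough sketch of that approach follows, written for Python 3. The function names (parse_ls_files, staged_entries, blob_content) are made up for illustration, and the -z flag is used here so that paths come back unquoted. Note that git ls-files --staged lists every file in the index, not only the changed ones, so you would still filter the result against the git diff --cached list.

```python
import subprocess


def parse_ls_files(raw):
    """Parse `git ls-files --staged -z` output into (object_id, path) pairs.

    Each NUL-terminated entry has the form
    <mode> SPACE <object> SPACE <stage> TAB <path>.
    """
    entries = []
    for record in raw.split(b"\0"):
        if not record:
            continue  # the output ends with a NUL, leaving one empty piece
        meta, path = record.split(b"\t", 1)
        mode, object_id, _stage = meta.split(b" ")
        if mode == b"160000":
            continue  # submodule entry: points at a commit, not file content
        entries.append((object_id.decode("ascii"), path))
    return entries


def staged_entries():
    """List (object_id, path) for every file currently in the index."""
    raw = subprocess.check_output(["git", "ls-files", "--staged", "-z"])
    return parse_ls_files(raw)


def blob_content(object_id):
    """Return the staged bytes of a blob, independent of the working tree."""
    return subprocess.check_output(["git", "show", object_id])
```

The bytes returned by blob_content can be passed to UniversalDetector.feed in place of the lines read from the working-directory file. Addressing the blob by its object id avoids putting the filename on the command line at all.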

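Also add -z to the parameters for git diff, so that filenames are separated by NUL bytes instead of newlines and you can handle more filenames (then split the output on '\0' rather than '\n').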
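
As a minimal illustration of the -z suggestion (Python 3, with the hypothetical helper names split_nul and staged_paths), the file list could be collected like this. No cwd is passed, so git diff runs wherever Git started the hook.

```python
import os
import subprocess


def split_nul(raw):
    """Split NUL-terminated git output into a list of path strings."""
    return [os.fsdecode(piece) for piece in raw.split(b"\0") if piece]


def staged_paths():
    """Paths added or modified in the index (same filter as the question)."""
    raw = subprocess.check_output(
        ["git", "diff", "--cached", "--name-only", "-z",
         "--diff-filter=ACMRTUXB"]
    )
    return split_nul(raw)
```

Splitting on NUL keeps names that contain spaces or even newlines intact, which the split('\n') in the question cannot do.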

Licensed under: CC-BY-SA with attribution