Question

I am writing a Python program to back up files from a series of watched directories. I am using Watchdog to detect changes to the file system. All that gives me is a list of changed files and folders.

The application has the option of include and exclude folder lists, but I can't work out how to match the paths to see whether they should be excluded or included.

The issue arises when you have a tree and the user chooses to include a folder that is inside an excluded one.

Example file tree

/folder1/folder2/folder3/folder4/folder5

Includes

/folder1
/folder1/folder2/folder3/folder4

Excluded

/folder1/folder2

I thought about using startswith() to compare the start of the path string returned by Watchdog, but then /folder1/folder2/folder3/folder4/folder5 would match on both the include and the exclude folder lists.

If someone could suggest the best way to approach this I would be very grateful. I could make it work easily if I were using os.walk to recurse through the directories, but given just a list I can't work out how to do it. It's driving me nuts.


Solution

If I understand what you're saying, you want to give priority to the most deeply nested rule. So '/folder1/folder2/folder3/folder4/folder5' is included.

I would get your data into a lookup table like this:

lookup = {'/folder1/folder2/folder3/folder4': 'include', '/folder1/folder2': 'exclude', '/folder1': 'include'}

Then just loop over your query in reverse order, stripping off one directory at a time until you get a match:

folder = '/folder1/folder2/folder3/folder4/folder5'.split('/')
# Walk from the full path up towards the root, one directory at a time.
for i in range(len(folder), 0, -1):
    check = '/'.join(folder[:i])
    action = lookup.get(check)
    if action:
        print('{}: {}'.format(check, action))
        break

# Output: /folder1/folder2/folder3/folder4: include
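The same idea can be wrapped in a reusable function. This is only a minimal sketch assuming POSIX-style paths; the names `resolve_action` and `LOOKUP` are made up for illustration and are not part of Watchdog.

```python
from pathlib import PurePosixPath

# Hypothetical rule table built from the question's include/exclude lists.
LOOKUP = {
    '/folder1': 'include',
    '/folder1/folder2': 'exclude',
    '/folder1/folder2/folder3/folder4': 'include',
}


def resolve_action(path, lookup, default=None):
    """Return the action of the deepest rule that contains `path`."""
    current = PurePosixPath(path)
    # Check the path itself first, then each ancestor from deepest to shallowest.
    for candidate in (current, *current.parents):
        action = lookup.get(str(candidate))
        if action is not None:
            return action
    return default


print(resolve_action('/folder1/folder2/folder3/folder4/folder5', LOOKUP))
print(resolve_action('/folder1/folder2/folder3', LOOKUP))
print(resolve_action('/elsewhere/file.txt', LOOKUP))
```

Because the lookup is done on whole path components, a sibling such as /folder10 is not mistaken for something inside /folder1. What to do when no rule matches (the `default` argument here) is left to the application.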

OTHER TIPS

Assuming a path f, and if I understand your question correctly, this may work:

f.startswith(tuple(includes)) and not f.startswith(tuple(excludes))
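It may be worth checking this one-liner against the question's own lists. The sketch below wraps it in a hypothetical helper, `simple_match`, purely for illustration. Any matching exclude is treated as final, so it seems to cover only trees where no include is nested inside an exclude.

```python
includes = ['/folder1', '/folder1/folder2/folder3/folder4']
excludes = ['/folder1/folder2']


def simple_match(f):
    # True only when f starts with some include and with no exclude.
    return f.startswith(tuple(includes)) and not f.startswith(tuple(excludes))


print(simple_match('/folder1/notes.txt'))                        # True
print(simple_match('/folder1/folder2/folder3/folder4/folder5'))  # False
```

The second path is rejected even though /folder1/folder2/folder3/folder4 is explicitly included, which is exactly the case the question is asking about.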

As another possibility, the action (i.e. include or exclude) that should apply to any given path is that of the most specific rule matching it. So, you might approach the problem by placing your configuration in a structure like:

rules = [("/folder1", "include"), ("/folder1/folder2/...", "exclude"), ...]

You can then determine which action to apply for a given path using a function such as:

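import os

def get_action(path, rules):
    action = None
    depth = None
    # `prefix` replaces the original name `filter`, which shadows the builtin.
    for prefix, prefix_action in rules:
        # Match the folder itself or anything below it, not just a shared string prefix.
        if path == prefix or path.startswith(prefix.rstrip(os.sep) + os.sep):
            prefix_depth = prefix.count(os.sep)
            # Keep the deepest (most specific) matching rule seen so far.
            if depth is None or prefix_depth > depth:
                depth = prefix_depth
                action = prefix_action
    return action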

This will then return the action, that is "include" or "exclude", or None if no rule matches the path. The definition I've given is fairly inefficient, and there are many ways it could be improved, but the basic idea is to look for the most specific rule for a given path and follow that action.
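One possible improvement, sketched here under assumptions rather than offered as the definitive version, is to sort the rules once from longest to shortest folder so that the first match is already the most specific. The names `sort_rules`, `is_within` and `get_action_sorted` are hypothetical, and the separator is assumed to be '/'.

```python
SEP = '/'  # assumption: POSIX-style paths; swap in os.sep if needed


def sort_rules(rules):
    """Order rules from longest to shortest folder; do this once at startup."""
    return sorted(rules, key=lambda rule: len(rule[0]), reverse=True)


def is_within(path, folder):
    """True if `path` is `folder` itself or lies somewhere beneath it."""
    folder = folder.rstrip(SEP)
    return path == folder or path.startswith(folder + SEP)


def get_action_sorted(path, sorted_rules):
    # Every matching folder is an ancestor of `path`, so the longest one
    # is the deepest; with pre-sorted rules the first hit wins.
    for folder, action in sorted_rules:
        if is_within(path, folder):
            return action
    return None


rules = sort_rules([
    ('/folder1', 'include'),
    ('/folder1/folder2', 'exclude'),
    ('/folder1/folder2/folder3/folder4', 'include'),
])
print(get_action_sorted('/folder1/folder2/folder3/folder4/folder5', rules))
print(get_action_sorted('/folder1/folder2/other', rules))
```

With the question's lists, the first call should report include and the second exclude. The sort happens once, so each changed path reported by Watchdog costs only a single pass over the rules.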

Licensed under: CC-BY-SA with attribution