Question

My Python function is given a (long) list of path arguments, each of which may be a glob pattern. I make a pass over this list using glob.glob to extract all the matching filenames, like this:

files = [filename for pattern in patterns for filename in glob.glob(pattern)]

That works, but the filesystem I'm on has very poor performance for directory listing operations, and currently this operation adds about a minute(!) to the start-up time of my program. So I would like to only perform glob expansion for non-trivial glob patterns (i.e. those that aren't just normal pathnames) to speed this up. I.e.

def cheapglob(pattern):
    return [pattern] if istrivial(pattern) else glob.glob(pattern)
files = [filename for pattern in patterns for filename in cheapglob(pattern)]

Since glob.glob basically does a set of directory listings coupled with fnmatch.fnmatch, I thought it should be possible to somehow ask fnmatch whether a given string is a non-trivial pattern or not, but I can't see how to do that.

As a fallback, I guess I could attempt to identify these patterns in the string myself, though that feels a lot like reinventing the wheel, and would be error prone. But this feels like the sort of thing there should be an elegant solution for.

Solution

According to the fnmatch source code, the only special characters it recognizes are *, ?, [ and ]. Hence any pattern that does not contain any of these will only match itself. We can therefore implement the cheapglob mentioned in the question as

def cheapglob(s): return glob.glob(s) if re.search("[][*?]", s) else [s]

This will only hit the file system for patterns which include special characters. This differs subtly from a plain glob.glob: for a pattern with no special characters like "foo.txt", this function will return ["foo.txt"] regardless of whether that file exists, while glob.glob will return [] if the file isn't there. So the calling function will need to handle the possibility that some of the returned files might not exist.
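
For reference, here is a slightly more spelled-out sketch of the same idea, with the imports included. The names is_trivial and check_exists are my own additions for illustration, not part of glob or fnmatch. The check_exists option is an assumption about what a caller might want: it trades the "file may not exist" caveat for one existence check per literal path, which may or may not be cheap on a slow filesystem.

```python
import glob
import os
import re

# Characters that fnmatch treats as special, per the answer above.
_MAGIC = re.compile(r"[][*?]")


def is_trivial(pattern):
    """Return True if pattern contains no glob metacharacters."""
    return _MAGIC.search(pattern) is None


def cheapglob(pattern, check_exists=False):
    """Expand pattern with glob.glob only if it contains metacharacters.

    check_exists is a hypothetical option: when True, a literal path that
    does not exist yields [] instead of [pattern], at the cost of one
    existence check on the filesystem.
    """
    if not is_trivial(pattern):
        return glob.glob(pattern)
    if check_exists and not os.path.lexists(pattern):
        return []
    return [pattern]
```

With this, cheapglob("foo.txt") returns ["foo.txt"] without touching the disk, while cheapglob("foo.txt", check_exists=True) returns [] when the file is missing. CPython's glob module also has an undocumented helper, glob.has_magic, that performs a similar check; since it is not part of the documented API, I would not rely on it staying put.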

OTHER TIPS

I don't think you'll find much, as your idea of a trivial pattern might not be mine. Also, from a comp-sci point of view, it might be impossible to tell from inspection whether a pushdown automaton is going to run in a set amount of time given the inputs you're running it against, without actually running it against those inputs.

I strongly suspect you'd be better off here loading the directory listing once and then applying fnmatch against that list manually.
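
A minimal sketch of that approach, assuming all patterns are plain basenames (no directory separators) that refer to one directory. The function name glob_in_listing and its parameters are made up for this example.

```python
import fnmatch
import os


def glob_in_listing(patterns, directory="."):
    """Match basename patterns against one cached listing of directory.

    Assumes every pattern is a bare filename pattern with no path
    separators; patterns spanning several directories would need one
    cached listing per directory.
    """
    names = os.listdir(directory)  # the only directory listing performed
    matches = []
    for pattern in patterns:
        # fnmatch.filter tests the pattern against each cached name.
        for name in fnmatch.filter(names, pattern):
            matches.append(os.path.join(directory, name))
    return matches
```

Note that this does not behave exactly like glob.glob: fnmatch lets "*" match names starting with a dot, whereas glob skips those, and a name matched by two patterns will appear twice in the result.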

Licensed under: CC-BY-SA with attribution