Domanda

My python function is given a (long) list of path arguments, each of which can possibly be a glob. I make a pass over this list using glob.glob to extract all the matching filenames, like this:

files  = [filename for pattern in patterns for filename in glob.glob(pattern)]

That works, but the filesystem I'm on has very poor performance for directory listing operations, and currently this operation adds about a minute(!) to the start-up time of my program. So I would like to only perform glob expansion for non-trivial glob patterns (i.e. those that aren't just normal pathnames) to speed this up. I.e.

def cheapglob(pattern):
    return [pattern] if istrivial(pattern) else glob.glob(pattern)
files  = [filename for pattern in patterns for filename in cheapglob(pattern)]

Since glob.glob basically does a set of directory listings coupled with fnmatch.fnmatch, I thought it should be possible to somehow ask fnmatch whether a given string is a non-trivial pattern or not, but I can't see how to do that.

As a fallback, I guess I could attempt to identify these patterns in the string myself, though that feels a lot like reinventing the wheel, and would be error prone. But this feels like the sort of thing there should be an elegant solution for.

È stato utile?

Soluzione

According to the fnmatch source code, the only special characters it recognizes are *, ?, [ and ]. Hence any pattern that does not contain any of these will only match itself. We can therefore implement the cheapglob mentioned in the question as

def cheapglob(s): return glob.glob(s) if re.search("[][*?]", s) else [s]

This will only hit the file system for patterns which include special characters. This differs subtly from a plain glob.glob: For a pattern with no special characters like "foo.txt", this function will return ["foo.txt"] regardless of whether that file exists, while glob.glob will return [] if the file isn't there. So the calling function will need to handle the possibility that some of the returned files might not exist.

Altri suggerimenti

I don't think you'll find much, as your idea of a trivial pattern might not be mine. Also, from a comp-sci point of view, it might be impossible to tell from inspection whether a pushdown automata is going to run in a set amount of time given the inputs you're running it against, without actually running it against those inputs.

I strongly suspect you'd be better off here loading the directory listing once and then applying fnmatch against that list manually.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top