Question

I'm using Python 3.3.

If I'm manipulating potentially infinite files in a directory (bear with me; just pretend I have a filesystem that supports that), how do I do that without encountering a MemoryError? I only want the string name of one file to be in memory at a time. I don't want them all in an iterable as that would cause a memory error when there are too many.

Will os.walk() work just fine, since it returns a generator? Or, do generators not work like that?

Is this possible?


Solution

If the file names follow a scheme that can be computed, you can do something like this. The loop walks any number of numbered .txt files with only one name in memory at a time; you could switch to another computable naming scheme to get shorter file names for large numbers:

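import os

def infinite_files(path):
    num = 0
    while True:
        # Build the next expected name: 0.txt, 1.txt, 2.txt, ...
        filename = os.path.join(path, str(num) + ".txt")
        if not os.path.exists(filename):
            break  # stop at the first missing number
        # perform operations on the file named by filename here
        num += 1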
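
The same idea can also be written as a generator, so the caller gets one name at a time and decides what to do with it. This is only a minimal sketch: the name iter_numbered_files and the ext parameter are made up for illustration, and it still assumes the files are numbered consecutively from 0 with no gaps.

```python
import os

def iter_numbered_files(path, ext=".txt"):
    """Yield path/0.txt, path/1.txt, ... and stop at the first missing number."""
    num = 0
    while True:
        filename = os.path.join(path, str(num) + ext)
        if not os.path.exists(filename):
            return  # first gap in the numbering ends the iteration
        yield filename  # only this one name is held at a time
        num += 1

if __name__ == "__main__":
    # Hypothetical directory; prints nothing if it contains no 0.txt.
    for filename in iter_numbered_files("/home/me/Desktop"):
        print(filename)
```

Because the names are computed rather than listed, memory use does not grow with the number of files; the cost is one os.path.exists() call per file.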



[My old inapplicable answer is below]

glob.iglob seems to do exactly what the question asks for. [EDIT: It doesn't. It actually seems less efficient than listdir(), but see my alternative solution above.] From the official documentation:

glob.glob(pathname, *, recursive=False)
Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).


glob.iglob(pathname, *, recursive=False)
Return an iterator which yields the same values as glob() without actually storing them all simultaneously.

iglob returns an "iterator which yields" values one at a time or, put loosely, something you can loop over like a generator.

Since glob.iglob has the same behavior as glob.glob, you can search with wildcard characters:

import glob
for x in glob.iglob("/home/me/Desktop/*.txt"):
    print(x) #prints all txt files in that directory

I don't see a way for it to differentiate between files and directories without doing it manually. That is certainly possible, however.
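
Doing it manually could look like the sketch below, which filters the iglob results with os.path.isfile(). The function name iter_matching_files is invented for this example, and the filtering does not change how iglob lists the directory internally, so it does nothing for the memory concern noted in the edit above.

```python
import glob
import os

def iter_matching_files(pattern):
    """Yield only the regular files among the paths matching pattern."""
    for candidate in glob.iglob(pattern):
        # Skip directories (and anything else that is not a regular file).
        if os.path.isfile(candidate):
            yield candidate

if __name__ == "__main__":
    for x in iter_matching_files("/home/me/Desktop/*.txt"):
        print(x)  # prints the txt files in that directory, not directories
```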

Licensed under: CC-BY-SA with attribution