Question

I've written a script to crawl directories on my system and record file metadata. I've used os.walk to do this.

It has worked for the most part, but when run on different machines it returns a different list of files.

Right now I'm testing on my Dropbox folder. On my MacBook Pro (Lion) it crawls the folder and returns the correct number of files. On my iMac (Mountain Lion) it does not, typically skipping 1-3 files per run. Additional crawls will pick up a straggler, but it usually continues to ignore a few files in the directory.

Here's a short snippet of the code:

directory = '/Users/user/Dropbox/'
for dirname, dirnames, filenames in os.walk(directory):
  for subdirname in dirnames:
    for filename in filenames:
      if os.path.isfile(filename):
        # collect file info using os.path and os.stat

I obviously want to ignore directories. Is there a better way to do this? Preferably something that will be os agnostic.

Solution

The trick is as @MartijnPieters suggested: it is unnecessary to loop over the sub-directories as well, because os.walk already visits each of them in a later iteration of the loop. Worse, with the nested loop, a directory containing no sub-directories has an empty dirnames, so the inner loop never runs and that directory's files are skipped entirely. This was the cause of the discrepancies between my two machines.
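A minimal sketch (using a hypothetical temporary directory layout, not the original Dropbox folder) showing that os.walk reaches files in sub-directories on its own, with no inner loop over dirnames:

```python
import os
import tempfile

# Build a small throwaway tree: one file at the top level, one in a sub-directory.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'sub'))
for name in ('top.txt', os.path.join('sub', 'nested.txt')):
    open(os.path.join(root, name), 'w').close()

found = []
for dirname, dirnames, filenames in os.walk(root):
    # Each iteration yields one directory's files; sub-directories
    # arrive in later iterations, so no loop over dirnames is needed.
    found.extend(filenames)

print(sorted(found))
```

Both files appear, one per walk iteration, which is why the extra loop over dirnames is redundant.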

It is also worth noting that OS X has a very odd way of counting the files in a given directory. You can see this by running df on a directory and comparing the result with Finder's 'Get Info'.

import os

directory = '/Users/user/Dropbox/'
for dirname, dirnames, filenames in os.walk(directory):
    for filename in filenames:
        # filenames are bare names; join with dirname to get a usable path
        path = os.path.join(dirname, filename)
        if os.path.isfile(path):
            # collect file info using os.path and os.stat
            pass
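For completeness, here is one way the metadata-collection step might look as a self-contained helper. The function name and the choice to record only file sizes are illustrative assumptions, not from the original post; os.path.join keeps the paths OS-agnostic, as the question asked.

```python
import os

def collect_file_info(directory):
    """Walk `directory` and return a dict mapping each file's path to its
    size in bytes. Illustrative helper, not the original script."""
    info = {}
    for dirname, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            path = os.path.join(dirname, filename)
            if os.path.isfile(path):  # skip broken symlinks and the like
                info[path] = os.stat(path).st_size
    return info
```

os.stat exposes more than size (st_mtime, st_mode, etc.), so the same loop extends naturally to whatever metadata the crawler needs.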
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow