Question

I have this line of code in my Python script. It searches a particular directory for all files whose names match *cycle*.log.

for searchedfile in glob.glob("*cycle*.log"):

This works perfectly. However, when I run my script against a network location, it does not go through the files in order and instead picks them up in what looks like random order.

Is there a way to force the code to search by date order?

This question has been asked for PHP, but I am not sure of the differences.

Thanks

Solution

To sort files by modification date:

import glob
import os

files = glob.glob("*cycle*.log")
files.sort(key=os.path.getmtime)
print("\n".join(files))
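Applied to the loop from the question, a minimal sketch might look like the following. The helper name logs_by_mtime and its directory and newest_first parameters are illustrative assumptions, not part of the original answer.

```python
import glob
import os


def logs_by_mtime(directory=".", pattern="*cycle*.log", newest_first=False):
    """Return paths matching pattern in directory, ordered by modification time."""
    # Escape the directory so characters such as "[" in its name are taken
    # literally, then join it with the pattern so the search does not depend
    # on the current working directory.
    full_pattern = os.path.join(glob.escape(directory), pattern)
    matches = glob.glob(full_pattern)
    return sorted(matches, key=os.path.getmtime, reverse=newest_first)


if __name__ == "__main__":
    # Same loop as in the question, now oldest file first.
    for searchedfile in logs_by_mtime():
        print(searchedfile)
```

Passing newest_first=True reverses the order, which can be handy when only the latest log is of interest.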

See also Sorting HOW TO.

OTHER TIPS

Essentially the same as @jfs's answer, but in one line using sorted:

import glob
import os
searchedfiles = sorted(glob.glob("*cycle*.log"), key=os.path.getmtime)

No, glob cannot do this by itself. It is built on os.listdir (os.scandir in newer Python versions), whose documentation says:

"Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order. It does not include the special entries '.' and '..' even if they are present in the directory."

So you are actually lucky that you got it sorted. You need to sort it yourself.

This works for me:

import glob
import os
import time

matches = glob.glob("*.cpp")
files = sorted(matches, key=os.path.getctime)

for path in files:
    print("{} - {}".format(path, time.ctime(os.path.getctime(path))))

Also note that getctime returns the creation time only on Windows; on Unix it returns the time of the last metadata change. If you want to sort by modification time, use getmtime instead.
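To see the two timestamps side by side for a single file, a small sketch along these lines may help. The helper name describe_times is an assumption made for illustration.

```python
import os
import time


def describe_times(path):
    """Return (modification time, ctime) of path as human-readable strings."""
    info = os.stat(path)
    # st_mtime: time of the last content modification on every platform.
    # st_ctime: creation time on Windows, last metadata change on Unix.
    return time.ctime(info.st_mtime), time.ctime(info.st_ctime)
```

If the two values differ for files that were only copied or had their permissions changed, that is a hint that getmtime is the safer sort key.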

If your paths are in sortable order then you can always sort them as strings (as others have already mentioned in their answers).
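For instance, names that embed a zero-padded year-month-day date sort chronologically as plain strings. A minimal sketch, assuming such a naming scheme (the helper name logs_by_name is hypothetical):

```python
import glob
import os


def logs_by_name(directory="."):
    """Return *cycle*.log paths in directory, sorted as plain strings."""
    pattern = os.path.join(glob.escape(directory), "*cycle*.log")
    # sorted() with no key compares the path strings character by character,
    # which is chronological only for zero-padded year-month-day names.
    return sorted(glob.glob(pattern))
```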

However, if your paths use a datetime format like %d.%m.%Y, it becomes a bit more involved. Since strptime does not support wildcards, we developed a module, datetime-glob, to parse the date/times from paths that include wildcards.

Using datetime-glob, you could walk through the tree, list a directory, parse the date/times and sort them as tuples (date/time, path).

From the module's test cases:

import pathlib
import tempfile

import datetime_glob

def test_sort_listdir(self):
    with tempfile.TemporaryDirectory() as tempdir:
        pth = pathlib.Path(tempdir)
        (pth / 'some-description-20.3.2016.txt').write_text('tested')
        (pth / 'other-description-7.4.2016.txt').write_text('tested')
        (pth / 'yet-another-description-1.1.2016.txt').write_text('tested')

        matcher = datetime_glob.Matcher(pattern='*%-d.%-m.%Y.txt')
        subpths_matches = [(subpth, matcher.match(subpth.name)) for subpth in pth.iterdir()]
        dtimes_subpths = [(mtch.as_datetime(), subpth) for subpth, mtch in subpths_matches]

        subpths = [subpth for _, subpth in sorted(dtimes_subpths)]

        # yapf: disable
        expected = [
            pth / 'yet-another-description-1.1.2016.txt',
            pth / 'some-description-20.3.2016.txt',
            pth / 'other-description-7.4.2016.txt'
        ]
        # yapf: enable

        self.assertListEqual(subpths, expected)
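If adding a dependency is not an option, a similar result can be sketched with only the standard library. This swaps datetime-glob for a hand-written regular expression, so it handles only the one day.month.year layout shown above; the names DATE_RE and sort_by_embedded_date are assumptions made for illustration.

```python
import re
from datetime import datetime
from pathlib import Path

# Matches a trailing day.month.year, e.g. "some-description-20.3.2016.txt".
DATE_RE = re.compile(r"(\d{1,2})\.(\d{1,2})\.(\d{4})\.txt$")


def sort_by_embedded_date(paths):
    """Sort paths by the d.m.Y date in their file names, skipping other names."""
    dated = []
    for path in paths:
        match = DATE_RE.search(Path(path).name)
        if match:
            day, month, year = map(int, match.groups())
            dated.append((datetime(year, month, day), Path(path)))
    # Tuples compare by date first, then by path when two dates are equal.
    return [path for _, path in sorted(dated)]
```

Unlike the module, this sketch silently drops names that do not match and raises ValueError for impossible dates such as 31.2.2016.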

One can do that now with just the pathlib module:

import pathlib
found = pathlib.Path.cwd().glob('*.py')
found = sorted(found, key=lambda path: path.lstat().st_mtime)

Not with glob alone. As you are using it, glob.glob returns the complete list of matching files at once and offers no option for ordering them. If only the final result matters, you can add a second step that looks up each file's date and reorders the list accordingly. If the order in which the files are discovered matters, glob is probably not the best tool for this.

You can sort the list of files that come back using os.path.getmtime or os.path.getctime. See this other SO answer and note the comments as well.
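Since the question mentions a network location, another option is os.scandir, whose entries can reuse file information gathered while listing the directory on some platforms (notably Windows). Whether that actually saves round trips on a given share is an assumption to verify; the helper name scan_by_mtime is hypothetical.

```python
import fnmatch
import os


def scan_by_mtime(directory=".", pattern="*cycle*.log"):
    """Return matching file paths in directory, oldest first, via os.scandir."""
    with os.scandir(directory) as entries:
        dated = [
            (entry.stat().st_mtime, entry.path)
            for entry in entries
            if entry.is_file() and fnmatch.fnmatch(entry.name, pattern)
        ]
    # Tuples sort by modification time first, then by path to break ties.
    return [path for _, path in sorted(dated)]
```

Note that fnmatch.fnmatch normalizes case according to the operating system, so matching may be case-insensitive on Windows.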

Licensed under: CC-BY-SA with attribution