Question

What's the most efficient way to get a list of files added after a given date in PHP, or perhaps using a system call?

I have full control over how the files are stored as I receive them, so I thought storing them in a folder structure like year/month/day/filename would be best. Then all I have to do is use scandir to find the directories greater than or equal to the date I want, casting the directory names to int values for the comparison. I'm interested in the most efficient way of doing this, as a lot of files will build up over time and I don't want to rescan old directories. The directory structure should lend itself well to efficient manual filtering, but I wanted to check whether I'm missing something that would make this easier or faster.

Simple example usage:

'2012/12/1' => test1.txt, test2.txt
'2012/12/2' => test3.txt, test4.txt
'2011/11/1' => test5.txt
'2011/11/2' => test6.txt

If I search for files on or after 2011/11/2, then I want everything except test5.txt to be returned.

Thanks in advance for any insight!

Edit: the storing and the actual processing of files are two separate processes, so I can't just process them as they come in, which would obviously be the best solution.


Solution

Generally speaking, I create directories like YYYY/MM/DD to store my files, often with another level for different sources. Sometimes I'll use YYYY-MM/DD or something similar. Note that there are only about 3,650 days in a decade, so you could even use a single level like YYYY-MM-DD without ending up with a directory too large to work with. If your filesystem indexes directories, you can comfortably keep tens of thousands of files in one directory; otherwise, about a thousand should probably be your upper limit.
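As a rough illustration of the storage side, here is a minimal sketch that moves an incoming file into a YYYY/MM/DD directory. The function name storeByDate() and the 0775 permission mode are assumptions made for this example, not something taken from the answer above.

```php
<?php
// Hypothetical helper: move an incoming file into $root/YYYY/MM/DD/.
function storeByDate(string $root, string $source, DateTimeInterface $received): string
{
    $dir = $root . '/' . $received->format('Y/m/d');

    // Create all missing levels at once. The second is_dir() check tolerates
    // another process creating the directory between the test and mkdir().
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }

    $target = $dir . '/' . basename($source);
    if (!rename($source, $target)) {
        throw new RuntimeException("Cannot move $source to $target");
    }

    return $target;
}
```

Because the date is formatted with zero padding, the directory names sort chronologically as plain strings.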

To process the files, I don't bother doing any actual searching of directory names. Since I know what date I'm interested in, I can simply generate the paths and scan only the directories containing files in the proper date range.

For example, let's say I want to process all files for the past week:

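$end = new DateTimeImmutable('today');
for ($date = $end->modify('-7 days'); $date <= $end; $date = $date->modify('+1 day')) {
    $path = $date->format('Y/m/d');            // e.g. 2012/12/01
    foreach (getFiles($path) as $filename) {   // getFiles() and processFile() are placeholders
        processFile($path, $filename);
    }
}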
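The loop above can be fleshed out into a self-contained function. This is only a sketch: listFilesByDate() is a hypothetical name, and it assumes zero-padded directory names such as 2012/12/01 under a single root.

```php
<?php
// Sketch: list files stored under $root/YYYY/MM/DD from $since through $until, inclusive.
// Only directories inside the date range are ever touched.
function listFilesByDate(string $root, DateTimeImmutable $since, DateTimeImmutable $until): array
{
    $found = [];
    $day = $since->setTime(0, 0);
    $last = $until->setTime(0, 0);

    for (; $day <= $last; $day = $day->modify('+1 day')) {
        $dir = $root . '/' . $day->format('Y/m/d');
        if (!is_dir($dir)) {
            continue; // no files arrived on this day
        }
        foreach (scandir($dir) as $name) {
            if (is_file($dir . '/' . $name)) { // also skips "." and ".."
                $found[] = $dir . '/' . $name;
            }
        }
    }

    return $found;
}
```

If the directories are named without zero padding, as in the question's '2012/12/1', the format string would need to be 'Y/n/j' instead.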

OTHER TIPS

It looks like you are on either Linux or macOS, based on how you wrote your path.

The find command can return a list of files modified (or accessed) within a given time window.

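// find files that were modified less than 30 minutes ago
// exec() collects every output line into $filelist; system() would print the
// output and return only its last line
exec("find /path/to/files -type f -mmin -30", $filelist);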

I think system calls should be used sparingly since they reduce portability.
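If portability matters, the same kind of modification-time filter can be written without a system call. The following is a sketch using PHP's SPL directory iterators; modifiedSince() is a hypothetical name chosen for this example.

```php
<?php
// Sketch: recursively collect files whose modification time is at or after $since.
function modifiedSince(string $root, DateTimeInterface $since): array
{
    $cutoff = $since->getTimestamp();
    $found = [];

    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $file) {
        // $file is an SplFileInfo; getMTime() returns a Unix timestamp.
        if ($file->isFile() && $file->getMTime() >= $cutoff) {
            $found[] = $file->getPathname();
        }
    }

    sort($found); // iteration order is filesystem-dependent, so sort for stable output
    return $found;
}
```

Note that this walks the entire tree and relies on modification times, which change whenever a file is rewritten, so it does not avoid rescanning old directories the way the dated directory layout does.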

Storing in directories as you mentioned makes sense as it will reduce the search space.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow