Question

I have a utility which goes through and processes a set of files in a directory. The processing is relatively slow (and there are a lot of files), so I've tried to optimise it by only processing files that have a "last modified" date later than the last processing date.
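A minimal sketch of the timestamp filter described above. The class name `NaiveScan`, the method `FilesModifiedSince`, and the use of a UTC `lastRunUtc` value are illustrative assumptions, not the asker's actual code. It only compares `LastWriteTimeUtc` against the previous run:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper illustrating the "last modified later than last run" filter.
public static class NaiveScan
{
    // Returns the full paths of files in `directory` whose last write time is
    // later than the previous processing run. A file that arrives with an older
    // LastWriteTime (e.g. because it was copied in) is not returned.
    public static List<string> FilesModifiedSince(string directory, DateTime lastRunUtc)
    {
        return new DirectoryInfo(directory)
            .EnumerateFiles()
            .Where(f => f.LastWriteTimeUtc > lastRunUtc)
            .Select(f => f.FullName)
            .ToList();
    }
}
```

Using UTC timestamps on both sides of the comparison avoids surprises around daylight-saving changes.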

Usually this works well; however, I've found that copying a file doesn't change its last modified date, so there are various scenarios involving copying in which files that have changed are skipped by the process. For example:

  1. The user processes the directory at 9:00.
  2. A file is then copied out of this directory and modified, so that it has a last modified date of 9:30.
  3. The directory is then processed again at 10:00.
  4. The modified file is then copied back into the directory at 10:30.
  5. Finally, the directory is processed again at 11:00.

Because the file's last modified date is 9:30 and the directory was last processed at 10:00, the file is skipped at 11:00 when it shouldn't be.

Unfortunately the above happens far too often in certain situations (such as in a collaborative environment with source control). Clearly my logic is flawed: what I really need is a "last modified or copied" date. Does such a thing exist?

Failing that, is there another way to quickly determine with reasonable reliability if a given file has changed?


Solution

Have you thought of computing MD5 checksums of the files and storing them for later comparison? If you're always processing the same directory, this might be feasible.
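A minimal sketch of the checksum idea, assuming the previous run's hashes are kept in a dictionary keyed by full path. The `ChecksumTracker` class and its method names are hypothetical, and how the dictionary is persisted between runs (a file, a database) is left to the caller:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

// Hypothetical helper: detects changed files by content hash rather than timestamp.
public static class ChecksumTracker
{
    // Hashes the full contents of a file and returns the MD5 digest as uppercase hex.
    public static string ComputeMd5(string path)
    {
        using (MD5 md5 = MD5.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
        }
    }

    // Returns the files whose hash differs from the one recorded in previousHashes
    // (or that have no recorded hash yet) and records their new hashes.
    // The caller is responsible for saving previousHashes between runs.
    public static List<string> ChangedFiles(string directory, IDictionary<string, string> previousHashes)
    {
        var changed = new List<string>();
        foreach (string path in Directory.EnumerateFiles(directory))
        {
            string hash = ComputeMd5(path);
            if (!previousHashes.TryGetValue(path, out string oldHash) || oldHash != hash)
            {
                changed.Add(path);
                previousHashes[path] = hash;
            }
        }
        return changed;
    }
}
```

Because a hash depends only on the file's contents, a copy carrying an old timestamp is still detected. The trade-off is that every file must be read in full on each run, which is cheaper than the real processing but not free.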

OTHER TIPS

You might want to look at using the FileSystemWatcher class. This class lets you monitor a directory for changes and will fire an event when something is modified. Your code can then handle the event and process the file.

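Adapted from the MSDN example:

```csharp
using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        // Create a FileSystemWatcher for the directory given as the first argument.
        using var watcher = new FileSystemWatcher(args[0]);
        /* Watch for changes in LastAccess and LastWrite times, and
           the renaming of files or directories. */
        watcher.NotifyFilter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
           | NotifyFilters.FileName | NotifyFilters.DirectoryName;
        // Only watch text files.
        watcher.Filter = "*.txt";

        // Add event handlers.
        watcher.Changed += OnChanged;
        watcher.Created += OnChanged;
        watcher.Deleted += OnChanged;
        watcher.Renamed += (sender, e) => Console.WriteLine($"{e.OldFullPath} renamed to {e.FullPath}");

        // No events are raised until EnableRaisingEvents is set to true.
        watcher.EnableRaisingEvents = true;
        Console.WriteLine("Watching; press Enter to quit.");
        Console.ReadLine();
    }

    static void OnChanged(object sender, FileSystemEventArgs e) =>
        Console.WriteLine($"{e.FullPath}: {e.ChangeType}");
}
```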

You can use the FileInfo class to get the required change information (which you may already be using). You need to check two properties of a file: LastWriteTime and CreationTime. If either of them is later than your last processing date, you need to process the file. It is a common misconception that CreationTime is always earlier than LastWriteTime. It's not: if a file is copied to a new file, the new file retains the LastWriteTime of the source, but its CreationTime will be the time of the copy.
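A minimal sketch of that two-property check. The `ChangeCheck` class and `NeedsProcessing` method are hypothetical names, and comparing in UTC is an assumption made to sidestep daylight-saving issues:

```csharp
using System;
using System.IO;

// Hypothetical helper: flags a file if it was written to OR created after the last run.
public static class ChangeCheck
{
    // A copied-in file keeps the source's LastWriteTime but gets a fresh
    // CreationTime, so checking both catches the copy case.
    public static bool NeedsProcessing(FileInfo file, DateTime lastRunUtc)
    {
        file.Refresh(); // FileInfo caches timestamps; re-read them from disk
        return file.LastWriteTimeUtc > lastRunUtc
            || file.CreationTimeUtc > lastRunUtc;
    }
}
```

Two caveats: overwriting an existing destination file keeps that file's CreationTime, so this check can still miss that case; and the "fresh CreationTime on copy" behaviour describes Windows file systems, so on other platforms CreationTime may not reflect when the copy was made.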

Have you considered adding a process that watches your directory instead, using a FileSystemWatcher? That would move you from a batch process to a real-time system for monitoring your files.

As you've observed, copying a file over an existing destination file keeps the existing file's CreationTime and sets LastWriteTime to the source file's LastWriteTime, rather than to the current system time at which the copy is made. Two possible solutions:

  1. Do a delete-and-copy, ensuring that the destination's CreationTime will be the system's current time.
  2. Also check the file's Archive attribute, and clear it while processing. When copying source to destination, the Archive (+A) attribute is set on the destination.
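A minimal sketch of both options, using only `File.Delete`, `File.Copy`, `File.GetAttributes` and `File.SetAttributes`. The `CopyAndArchive` class and its method names are hypothetical. The Archive attribute is a Windows concept, so on other platforms it may never be set. Also note that on NTFS, file system tunneling can give a recreated file the old CreationTime if it reuses the deleted file's name within a short window (15 seconds by default), which can undermine option 1.

```csharp
using System;
using System.IO;

// Hypothetical helpers for the two approaches described above.
public static class CopyAndArchive
{
    // Option 1: delete the destination first so the copy is a newly created
    // file, giving it a CreationTime equal to the time of the copy.
    public static void DeleteThenCopy(string source, string destination)
    {
        if (File.Exists(destination))
            File.Delete(destination);
        File.Copy(source, destination);
    }

    // Option 2: treat the Archive attribute as a "changed since last processed" flag.
    public static bool HasArchiveFlag(string path) =>
        (File.GetAttributes(path) & FileAttributes.Archive) != 0;

    // Clear the Archive attribute once the file has been processed.
    public static void ClearArchiveFlag(string path)
    {
        FileAttributes cleared = File.GetAttributes(path) & ~FileAttributes.Archive;
        // If no other attributes remain, use Normal rather than passing zero.
        File.SetAttributes(path, cleared == 0 ? FileAttributes.Normal : cleared);
    }
}
```

In the processing loop, a file would be handled when `HasArchiveFlag` is true and then passed to `ClearArchiveFlag`, so the next copy or write over it sets the flag again.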
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow