Monitoring files - how to know when a file is complete

https://stackoverflow.com/questions/30074

09-06-2019
|

Question

We have several .NET applications that monitor a directory for new files, using FileSystemWatcher. The files are copied from another location, uploaded via FTP, etc. When they come in, the files are processed in one way or another. However, one problem that I have never seen a satisfactory answer for is: for large files, how does one know when the files being monitored are still being written to? Obviously, we need to wait until the files are complete and closed before we begin processing them. The event args in the FileSystemWatcher events do not seem to address this.

Solution

Have you tried getting a write lock on the file? If it's being written to, that should fail, and you know to leave it alone for a bit...

OTHER TIPS

If you are in control on the program that is writing the files into the directory, you can have the program write the files to a temporary directory and then move them into the watched directory. The move should be an atomic operation, so the watcher shouldn't see the file until it is fully in the directory.

If you are not in control of what is writing to the watched directory, you can set a time in the watcher where the file is considered complete when it has remained the same size for the given time. If immediate processing isn't a concern, setting this timer to something relatively large is a fairly safe way to know that either the file is complete or it never will be.

The "Changed" event on the FileSystemWatcher should shouldn't fire until the file is closed. See my answer to a similar question. There is a possibility that the FTP download mechanism closes the file multiple times during download as new data comes in, but I would think that is a little unlikely.

Unless the contents of a file can be verified for completion (it has a verifiable format or includes a checksum of the contents) only the sender can verify that a whole file has arrived.

I have used a locking method for sending large files via FTP in the past.

File is sent with an alternative extension and is renamed once the sender is happy it is all there.

The above is obviously combined with a process which periodically tidies up old files with the temporary extension.

An alternative is to create a zero length file with the same name but with an additonal .lck extension. Once the real file is fully uploaded the lck file is deleted. The receiving process obviously ignores files which have the name of a lock file.

Without a system like this the receiver can never be sure that the whole file has arrived.

Checking for files that haven't been changed in x minutes is prone to all sorts of problems.

The following method tries to open a file with write permissions. It will block execution until a file is completely written to disk:

/// <summary>
/// Waits until a file can be opened with write permission
/// </summary>
public static void WaitReady(string fileName)
{
    while (true)
    {
        try
        {
            using (System.IO.Stream stream = System.IO.File.Open(fileName, FileMode.Open, FileAccess.ReadWrite, FileShare.ReadWrite))
            {
                if (stream != null)
                {
                    System.Diagnostics.Trace.WriteLine(string.Format("Output file {0} ready.", fileName));
                    break;
                }
            }
        }
        catch (FileNotFoundException ex)
        {
            System.Diagnostics.Trace.WriteLine(string.Format("Output file {0} not yet ready ({1})", fileName, ex.Message));
        }
        catch (IOException ex)
        {
            System.Diagnostics.Trace.WriteLine(string.Format("Output file {0} not yet ready ({1})", fileName, ex.Message));
        }
        catch (UnauthorizedAccessException ex)
        {
            System.Diagnostics.Trace.WriteLine(string.Format("Output file {0} not yet ready ({1})", fileName, ex.Message));
        }
        Thread.Sleep(500);
    }
}

(from my answer to a related question)

You probably have to go with some out of band signaling: have the producer of "file.ext" write a dummy "file.ext.end".

+1 for using a file.ext.end signaler if possible, where the contents of file.ext.end is a checksum for the larger file. This isn't for security so much as it is to make sure nothing got garbled along the way. If someone can insert their own file into the large stream they can replace the checksum as well.

A write lock doesn't help if the file upload failed part way through and the sender hasn't tried resending (and relocking) the file yet.

The way I check in Windows if a file has been completely uploaded by ftp is to try to rename it. If renaming fails, the file isn't complete. Not very elegant, I admit, but it works.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow