Need elegant way to move orphan files from folder after paired files are moved with FileInfo class

https://stackoverflow.com/questions/5399171

c#
fileinfo

28-10-2019
|

Question

I have a folder from which I'm moving pairs of related files (xml paired with pdf). Additional files could be deposited into this folder at any time, but the utility runs every 10 minutes or so. We could use the FileSystemWatcher class but for internal reasons we don't for this utility.

I'm using the System.IO.FileInfo class to read all the files in the folder (will only be xml and pdf) during each run. Once I have the files in the FileInfo object, I iterate through the files, moving matches to a working folder. Once that is done, I want to move any files that were not paired, but are in the FileInfo object, to a failure folder.

Since I can't seem to remove items from the FileInfo object (or I am missing something), would it be easier to (1) use a string array from Directory class .GetFiles, (2) create a Dictionary from the FileInfo object and remove values from that during iteration, or (3) is there a more elegant approach using LINQ or something else?

Here is the code so far:

internal static bool CompareXMLandPDFFileNames(FileInfo[] xmlFiles, FileInfo[] pdfFiles, string xmlFilePath)
    {
        string workingFilePath = xmlFilePath + @"\WORKING";            

        if (xmlFiles.Length > 0)
        {
            foreach (var xmlFile in xmlFiles)
            {
                string xfn = xmlFile.Name; //xml file name
                string pdfName = xfn.Substring(0,xfn.IndexOf('_')) + ".pdf"; //parsed pdf file name contained in xml file name

                foreach (var pdfFile in pdfFiles)
                {
                    string pfn = pdfFile.Name; //pdf file name
                    if (pfn == pdfName)
                    {
                        //move xml and pdf files to working folder...
                        FileInfo xmlInfo = new FileInfo(xmlFilePath + xfn);
                        FileInfo pdfInfo = new FileInfo(xmlFilePath + pfn);
                        if (!File.Exists(workingFilePath + xfn))
                        {
                            xmlInfo.MoveTo(workingFilePath + xfn);                                
                        }

                        if (!File.Exists(workingFilePath + pfn))
                        {
                            pdfInfo.MoveTo(workingFilePath + pfn);
                        }                            
                    }
                }
            }

            //all files in the file objects should now be moved to working folder, if not, fix orphans...
        }

        return true;
    }

Solution

To be honest I think the question is a bit poor. The problem is stated in a very complicated fashion. I think the workflow be designed to be more robust and deterministic. (e.g. why not upload file pairs in zipped sets in the first place?)

(And no "Someone" most likely "must not have been here before")

Here are some random improvements:

using System;
using System.Linq;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;
using System.Collections.Generic;

namespace O
{
    static class X
    {
        private static readonly Regex _xml2pdf = new Regex("(_.*).xml$", RegexOptions.Compiled | RegexOptions.IgnoreCase);

        internal static void MoveFileGroups(string uploadFolder)
        {
            string workingFilePath = Path.Combine(uploadFolder, "PROGRESS");

            var groups = new DirectoryInfo(uploadFolder)
                .GetFiles()
                .GroupBy(fi => _xml2pdf.Replace(fi.Name, ".pdf"), StringComparer.InvariantCultureIgnoreCase)
                .Where(group => group.Count() >1);

            foreach (var group in groups)
            {
                if (!group.Any(fi => File.Exists(Path.Combine(workingFilePath, fi.Name))))
                    foreach (var file in group)
                        file.MoveTo(Path.Combine(workingFilePath, file.Name));
            }
        }

        public static void Main(string[]args)
        {
        }
    }
}

use readable names (say what you mean)
IndexOf returns -1 if filename contains no "_"; random upload filenames could make procedure fail
Handle filenames case insensitive on Windows
Don't manually do the path concats (you could accidentally manufacture UNC paths, and your code is less portable)
don't assume one xml will map to one pdf: the naming scheme implies that many xmls map to the same pdf name. This implementation allows that (or you could detect the situation by rejecting groups.Where(g => g.Count()>2)
Move groups atomically only (!): if any one of the files in a group exist in the target dir, don't move any (or you will have a race condition, where part of a group get's moved before the last file was (completely) uploaded and it will never get moved because the group is no longer detected

Other items (todo)

Don't pass redundant parameters. You might pass a FI[] instead of the raw GetFiles() call if you want filtering.
Do error handling, notably:
- handle IO exceptions
- locking errors are expectable while uploads in progress (test it or end up with corrupted files); you need to atomically handle these (i.e. not move any files in a group unless all could be moved; this will be somewhat tricky)
  - test your code (none of my sample was tested; it just compiled on linux with mono)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow