Question

We are transferring lots of documents and images, and before actually saving these documents to SQL Server I want to compare two lists of files:

  1. My list of file paths (a text file with a list of file paths in it, converted to a HashSet)

  2. Their list of file paths (read on the fly to produce a HashSet)

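    using System.Collections.Generic;
    using System.IO;

    public static class DirectoryExtensions // illustrative name; extension methods need a static class
    {
        public static HashSet<string> ToHashSet(this string rootDirectory)
        {
            const string searchPattern = "*.*";
            string[] files = Directory.GetFiles(rootDirectory, searchPattern, SearchOption.AllDirectories);
            return new HashSet<string>(files);
        }
    }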
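The first list (the text file of file paths) can be loaded into a HashSet<string> in a similar way. This is a minimal sketch that assumes the file holds one path per line; the class and method names (FileListLoader, LoadPathList) are hypothetical, and blank lines and surrounding whitespace are skipped.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileListLoader
{
    // Hypothetical helper: reads a text file that is assumed to contain
    // one file path per line and returns the paths as a HashSet<string>.
    public static HashSet<string> LoadPathList(string listFile)
    {
        IEnumerable<string> paths = File.ReadLines(listFile)
            .Select(line => line.Trim())      // drop stray leading/trailing whitespace
            .Where(line => line.Length > 0);  // skip blank lines
        // Duplicate lines collapse into a single entry in the set.
        return new HashSet<string>(paths);
    }
}
```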
    

So I am comparing myHashSet against theirHashSet.

I'm getting a bit paranoid here and just want to double-check that Except does what I think it does.

My understanding of Except: given two hash sets, compare all the file paths and produce as the result the ones in theirList that are not found in myList.
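That reading matches the documented behaviour of LINQ's Except: first.Except(second) yields the elements of first that are not in second, so swapping the arguments answers a different question. A small sketch with made-up file names (ExceptDirection and its methods are illustrative names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ExceptDirection
{
    // Paths that exist in "theirs" but are missing from "mine".
    public static List<string> MissingFromMine(HashSet<string> mine, HashSet<string> theirs)
    {
        return theirs.Except(mine).ToList();
    }

    // Paths that exist in "mine" but are missing from "theirs".
    public static List<string> MissingFromTheirs(HashSet<string> mine, HashSet<string> theirs)
    {
        return mine.Except(theirs).ToList();
    }
}
```

With mine = { a.txt, b.txt } and theirs = { a.txt, b.txt, c.txt }, MissingFromMine returns only c.txt, while MissingFromTheirs returns an empty list.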

I have written a small test that shows Except does find the difference.

Is this correct, and is it the best way to compare large files?

Dummy proof of concept:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        const string rootDirectory = @"C:\Tests";
        // Dummy sets: "mine" holds File0..File9, "theirs" holds File0..File11.
        HashSet<string> myHashSet = CreateDummyHashSet(rootDirectory, 10);
        HashSet<string> theirHashSet = CreateDummyHashSet(rootDirectory, 12);

        // Paths in theirHashSet that are missing from myHashSet.
        IEnumerable<string> result = theirHashSet.Except(myHashSet);

        foreach (var file in result)
        {
            Console.WriteLine(file);
        }
        Console.Read();
    }

    public static HashSet<string> CreateDummyHashSet(string rootDirectory, int numberOfFiles)
    {
        var dummyHashSet = new HashSet<string>();
        const string extension = ".txt";
        const string fileName = "File";
        for (int i = 0; i < numberOfFiles; i++)
        {
            string fullfileName = string.Format("{0}{1}{2}", fileName, i, extension);
            string path = Path.Combine(rootDirectory, fullfileName);
            dummyHashSet.Add(path);
        }
        return dummyHashSet;
    }
}

Solution

Is this correct, and is it the best way to compare large files?

You are not comparing large files; you are just comparing their names. HashSet is perfectly suited to this kind of set operation.
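One caveat worth checking: HashSet<string> compares strings case-sensitively by default, while Windows paths are usually case-insensitive, so C:\Tests\File1.txt and c:\tests\file1.txt would count as different entries. If the two lists may differ only in letter casing (an assumption about this data), a comparer can be passed explicitly. Note that LINQ's Except does not pick up the sets' own comparer, so it needs the comparer as well. PathSets and its methods are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PathSets
{
    // Builds a set that treats paths differing only by letter case as equal,
    // which roughly matches how Windows file systems usually resolve paths.
    public static HashSet<string> Create(IEnumerable<string> paths)
    {
        return new HashSet<string>(paths, StringComparer.OrdinalIgnoreCase);
    }

    // Paths in "theirs" with no case-insensitive match in "mine".
    public static List<string> Missing(HashSet<string> mine, HashSet<string> theirs)
    {
        // Except uses the comparer passed here, not the comparers of the sets.
        return theirs.Except(mine, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```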

I would not advise using what sbrauen proposes:

var result = theirHashSet.Where(x => !myHashSet.Contains(x));

because it performs n lookups in myHashSet, n and m being the number of entries in theirHashSet and myHashSet respectively. Each HashSet.Contains call is O(1) on average, so the cost is linear rather than n × m, but HashSet's own set operations express the intent more directly. And what is actually better than Except is ExceptWith, because Except is an extension method on IEnumerable<T>, whereas ExceptWith is an instance method of HashSet<T>.
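The three variants discussed here (Where with Contains, Except, and ExceptWith) produce the same set of paths; they differ in allocation and in whether the input set is modified. A sketch that runs them side by side on made-up inputs (DiffVariants and its methods are illustrative names):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DiffVariants
{
    // Variant 1: Where + Contains, one hash lookup per element of "theirs".
    public static HashSet<string> WithWhere(HashSet<string> mine, HashSet<string> theirs)
        => new HashSet<string>(theirs.Where(x => !mine.Contains(x)));

    // Variant 2: LINQ Except, which returns a new lazily evaluated sequence.
    public static HashSet<string> WithExcept(HashSet<string> mine, HashSet<string> theirs)
        => new HashSet<string>(theirs.Except(mine));

    // Variant 3: HashSet.ExceptWith, applied to a copy so "theirs" is left unchanged.
    public static HashSet<string> WithExceptWith(HashSet<string> mine, HashSet<string> theirs)
    {
        var copy = new HashSet<string>(theirs);
        copy.ExceptWith(mine);
        return copy;
    }
}
```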

EDIT:

The difference is that Except returns a new, lazily evaluated IEnumerable<T> sequence, whereas ExceptWith removes the matching entries from theirHashSet itself. ExceptWith is also typically cheaper, because it works directly on the existing HashSet instead of building a new internal set, while Except is a general-purpose extension method.
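Both behaviours can be seen in a small sketch with illustrative names and inputs: Except defers its work until the result is enumerated (so later changes to the second set are still picked up), while ExceptWith changes the set it is called on immediately and returns nothing.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredVsInPlace
{
    // Except is deferred: "mine" is read only when the result is enumerated,
    // so changes made to "mine" before that point are reflected in the output.
    public static List<string> ExceptAfterChange()
    {
        var mine = new HashSet<string> { "a.txt" };
        var theirs = new HashSet<string> { "a.txt", "b.txt", "c.txt" };
        IEnumerable<string> pending = theirs.Except(mine);
        mine.Add("b.txt");          // happens before enumeration
        return pending.ToList();    // enumeration (and the actual diff) happens here
    }

    // ExceptWith modifies the set it is called on.
    public static HashSet<string> ExceptWithInPlace()
    {
        var mine = new HashSet<string> { "a.txt", "b.txt" };
        var theirs = new HashSet<string> { "a.txt", "b.txt", "c.txt" };
        theirs.ExceptWith(mine);    // "theirs" now holds only the difference
        return theirs;
    }
}
```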

Here is roughly what they look like under the hood:

Except

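// Set<TSource> is an internal hash set type used by the LINQ implementation.
Set<TSource> set = new Set<TSource>(comparer);
// Load every element of the second sequence into the set.
foreach (TSource tSource in second)
{
    set.Add(tSource);
}
// Yield each element of first that is neither in second nor already yielded.
foreach (TSource tSource1 in first)
{
    if (set.Add(tSource1))
    {
        yield return tSource1;
    }
}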
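One consequence of the set.Add check is that Except also drops duplicates from first. A sketch that mirrors the loop above with the public HashSet<T> instead of the internal Set<TSource>; ExceptSketch and ExceptLike are hypothetical names, and this is an approximation, not the framework source.

```csharp
using System.Collections.Generic;

public static class ExceptSketch
{
    // Approximation of the Except iterator using the public HashSet<T>.
    public static IEnumerable<T> ExceptLike<T>(
        IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer = null)
    {
        // A null comparer makes HashSet<T> fall back to the default equality comparer.
        var seen = new HashSet<T>(second, comparer);
        foreach (T item in first)
        {
            // Add returns false if the item is in "second" or was already yielded,
            // so the result contains no duplicates.
            if (seen.Add(item))
            {
                yield return item;
            }
        }
    }
}
```

For example, ExceptLike(new[] { "a", "b", "a", "c" }, new[] { "c" }) yields "a" and "b" once each.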

ExceptWith

// Simplified: remove each element of other from this set; absent elements are ignored.
foreach (T t in other)
{
    this.Remove(t);
}

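You can see the difference immediately: Except builds a new set, adds every element of second to it and then tests every element of first, while ExceptWith only iterates over other and removes each of its elements from the existing set.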

Other tips

You could try something like this:

var result = theirHashSet.Where(x => !myHashSet.Contains(x));

The documentation is quite clear on this: Except returns the values from the first sequence that are not in the second one.

Is this correct, and is it the best way to compare large files?

However, in your case it compares only the lists of file paths, not the contents of the files. That is, after the check you know whether you have files with the same names, but not whether the files themselves are the same.
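If you also need to confirm that two files with the same name have the same content, one option is to compare their lengths first and then a hash of their bytes. A minimal sketch, assuming both copies are reachable as local paths; FileContent.SameContent is a hypothetical helper, and SHA-256 is one reasonable choice of hash rather than something the question prescribes.

```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class FileContent
{
    // Returns true when both files have the same length and the same SHA-256 hash.
    public static bool SameContent(string pathA, string pathB)
    {
        // Cheap check first: files of different length cannot be identical.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
        {
            return false;
        }
        return HashOf(pathA).SequenceEqual(HashOf(pathB));
    }

    // Streams the file through SHA-256 without loading it all into memory.
    private static byte[] HashOf(string path)
    {
        using (SHA256 sha = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            return sha.ComputeHash(stream);
        }
    }
}
```

When both files are local, a direct byte comparison would work just as well; hashing mainly pays off if one side's hash can be computed once and stored.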

License: CC-BY-SA with attribution
Not affiliated with StackOverflow