Question

I'm hashing a file with one or more hash algorithms. When I tried to parametrize which hash types I want, it got a lot messier than I was hoping.

I think I'm missing a chance to make better use of generics or LINQ. I also don't like that I have to use a Type[] as the parameter instead of limiting it to a more specific set of types (HashAlgorithm descendants). I'd like to specify types as the parameter and let this method do the constructing, but maybe this would look better if I had the caller new up instances of HashAlgorithm to pass in?

public List<string> ComputeMultipleHashesOnFile(string filename, Type[] hashClassTypes)
{
    var hashClassInstances = new List<HashAlgorithm>();
    var cryptoStreams = new List<CryptoStream>();

    FileStream fs = File.OpenRead(filename);
    Stream cryptoStream = fs;

    // Chain one CryptoStream per algorithm so a single read pass feeds every hash.
    foreach (var hashClassType in hashClassTypes)
    {
        object obj = Activator.CreateInstance(hashClassType);
        var cs = new CryptoStream(cryptoStream, (HashAlgorithm)obj, CryptoStreamMode.Read);

        hashClassInstances.Add((HashAlgorithm)obj);
        cryptoStreams.Add(cs);

        cryptoStream = cs;
    }

    CryptoStream cs1 = cryptoStreams.Last();

    // Drain the outermost stream; the bytes themselves are discarded.
    byte[] scratch = new byte[1 << 16];
    int bytesRead;
    do { bytesRead = cs1.Read(scratch, 0, scratch.Length); }
    while (bytesRead > 0);

    foreach (var stream in cryptoStreams)
    {
        stream.Close();
    }

    // Collect the hashes so the method returns what its signature promises.
    var hashes = new List<string>();
    foreach (var hashClassInstance in hashClassInstances)
    {
        string hash = HexStr(hashClassInstance.Hash).ToLower();
        Console.WriteLine("{0} hash = {1}", hashClassInstance.ToString(), hash);
        hashes.Add(hash);
    }
    return hashes;
}

Solution

Let's start by breaking the problem down. Your requirement is that you need to compute several different kinds of hashes on the same file. Assume for the moment that you don't need to actually instantiate the types. Start with a function that has them already instantiated:

public IEnumerable<string> GetHashStrings(string fileName,
    IEnumerable<HashAlgorithm> algorithms)
{
    byte[] fileBytes = File.ReadAllBytes(fileName);
    return algorithms
        .Select(a => a.ComputeHash(fileBytes))
        .Select(b => HexStr(b));
}

That was easy. If the files might be large and you need to stream them (keeping in mind that this costs much more I/O, because the file is re-read once per algorithm, in exchange for lower memory use), you can do that too; it's just a little more verbose:

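public IEnumerable<string> GetStreamedHashStrings(string fileName,
    IEnumerable<HashAlgorithm> algorithms)
{
    using (Stream fileStream = File.OpenRead(fileName))
    {
        // ToList() forces the query to run while the stream is still open;
        // a deferred query would only execute after the using block had disposed it.
        return algorithms
            .Select(a => {
                fileStream.Position = 0;
                return a.ComputeHash(fileStream);
            })
            .Select(b => HexStr(b))
            .ToList();
    }
}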

It's a little gnarly and in the second case it's highly questionable whether or not the Linq-ified version is any better than an ordinary foreach loop, but hey, we're having fun, right?
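If the extra I/O of re-reading the file for each algorithm matters, one possible alternative is to read the file once and push every chunk into each algorithm with TransformBlock. This is only a sketch under assumptions: the SinglePassHasher class and its method layout are made up for illustration, and HexStr here is a stand-in for the helper the question uses but never shows.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class SinglePassHasher
{
    // Hypothetical stand-in for the HexStr helper from the question (uppercase hex, no separators).
    public static string HexStr(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    // Reads the stream once and feeds each chunk to every algorithm.
    public static List<string> GetHashStrings(Stream input, IEnumerable<HashAlgorithm> algorithms)
    {
        List<HashAlgorithm> list = algorithms.ToList();
        byte[] buffer = new byte[1 << 16];
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            foreach (HashAlgorithm a in list)
            {
                // A null output buffer means "hash only, do not copy the data".
                a.TransformBlock(buffer, 0, bytesRead, null, 0);
            }
        }
        foreach (HashAlgorithm a in list)
        {
            // An empty final block completes the hash so that a.Hash becomes available.
            a.TransformFinalBlock(buffer, 0, 0);
        }
        return list.Select(a => HexStr(a.Hash)).ToList();
    }

    // Convenience overload that opens the file and disposes the stream afterwards.
    public static List<string> GetHashStrings(string fileName, IEnumerable<HashAlgorithm> algorithms)
    {
        using (Stream fileStream = File.OpenRead(fileName))
        {
            return GetHashStrings(fileStream, algorithms);
        }
    }
}
```

The caller still owns the HashAlgorithm instances here, so disposing them remains the caller's job.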

Now that we've disentangled the hash-generation code, instantiating them first isn't really that much more difficult. Again we'll start with code that's clean - code that uses delegates instead of types:

public IEnumerable<string> GetHashStrings(string fileName,
    params Func<HashAlgorithm>[] algorithmSelectors)
{
    if (algorithmSelectors == null)
        return Enumerable.Empty<string>();
    var algorithms = algorithmSelectors.Select(s => s());
    return GetHashStrings(fileName, algorithms);
}

Now this is much nicer, and the benefit is that it allows instantiation of the algorithms within the method, but doesn't require it. We can invoke it like so:

var hashes = GetHashStrings(fileName,
    () => new MD5CryptoServiceProvider(),
    () => new SHA1CryptoServiceProvider());

If we really, really, desperately need to start from the actual Type instances, which I'd try not to do because it breaks compile-time type checking, then we can do that as the last step:

public IEnumerable<string> GetHashStrings(string fileName,
    params Type[] algorithmTypes)
{
    if (algorithmTypes == null)
        return Enumerable.Empty<string>();
    var algorithmSelectors = algorithmTypes
        .Where(t => t.IsSubclassOf(typeof(HashAlgorithm)))
        .Select(t => (Func<HashAlgorithm>)(() =>
            (HashAlgorithm)Activator.CreateInstance(t)))
        .ToArray();
    return GetHashStrings(fileName, algorithmSelectors);
}

And that's it. Now we can run this (bad) code:

var hashes = GetHashStrings(fileName, typeof(MD5CryptoServiceProvider),
    typeof(SHA1CryptoServiceProvider));

At the end of the day, this seems like more code but it's only because we've composed the solution effectively in a way that's easy to test and maintain. If we wanted to do this all in a single Linq expression, we could:

public IEnumerable<string> GetHashStrings(string fileName,
    params Type[] algorithmTypes)
{
    if (algorithmTypes == null)
        return Enumerable.Empty<string>();
    byte[] fileBytes = File.ReadAllBytes(fileName);
    return algorithmTypes
        .Where(t => t.IsSubclassOf(typeof(HashAlgorithm)))
        .Select(t => (HashAlgorithm)Activator.CreateInstance(t))
        .Select(a => a.ComputeHash(fileBytes))
        .Select(b => HexStr(b));
}

That's all there really is to it. I've skipped the delegated "selector" step in this final version because if you're writing this all as one function you don't need the intermediate step; the reason for having it as a separate function earlier is to give as much flexibility as possible while still maintaining compile-time type safety. Here we've sort of thrown it away to get the benefit of terser code.


Edit: I will add one thing, which is that although this code looks prettier, it actually leaks the unmanaged resources used by the HashAlgorithm descendants. You really need to do something like this instead:

public IEnumerable<string> GetHashStrings(string fileName,
    params Type[] algorithmTypes)
{
    if (algorithmTypes == null)
        return Enumerable.Empty<string>();
    byte[] fileBytes = File.ReadAllBytes(fileName);
    return algorithmTypes
        .Where(t => t.IsSubclassOf(typeof(HashAlgorithm)))
        .Select(t => (HashAlgorithm)Activator.CreateInstance(t))
        .Select(a => {
            byte[] result = a.ComputeHash(fileBytes);
            a.Dispose();
            return result;
        })
        .Select(b => HexStr(b));
}

And again we're kind of losing clarity here. It might be better to just construct the instances first, then iterate through them with foreach and yield return the hash strings. But you asked for a Linq solution, so there you are. ;)
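For comparison, here is roughly what the foreach and yield return variant could look like. Treat it as a sketch: the YieldingHasher class name is invented for the example, HexStr is a stand-in for the helper that is never shown in the question, and each instance is created inside the loop (rather than all up front) so that a using block can dispose it even if ComputeHash throws.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

public static class YieldingHasher
{
    // Hypothetical stand-in for the HexStr helper from the question.
    public static string HexStr(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    public static IEnumerable<string> GetHashStrings(string fileName,
        params Type[] algorithmTypes)
    {
        if (algorithmTypes == null)
            yield break;

        // Because this is an iterator, the file is not read until the caller starts enumerating.
        byte[] fileBytes = File.ReadAllBytes(fileName);

        foreach (Type t in algorithmTypes)
        {
            // Same filter as the LINQ version: silently skip anything that is not a HashAlgorithm.
            if (!t.IsSubclassOf(typeof(HashAlgorithm)))
                continue;

            using (var algorithm = (HashAlgorithm)Activator.CreateInstance(t))
            {
                yield return HexStr(algorithm.ComputeHash(fileBytes));
            }
        }
    }
}
```

It would be called the same way as the Type-based overload above, for example YieldingHasher.GetHashStrings(fileName, typeof(MD5CryptoServiceProvider)).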

OTHER TIPS

Why are you supplying the types as Types and creating them rather than just allowing the user to pass in instances of HashAlgorithm? It seems like that would alleviate the problem altogether.

If this is a requirement, then what you have is really the only solution, since you can't specify a variable number of type parameters on a generic type or function (which it seems like you'd need, since it's an array now), and you can't enforce the types passed in to be of a particular inheritance line (any more than you can enforce that an integer parameter be between 1 and 10). That sort of validation can only be done at runtime.
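To make the runtime validation concrete, here is a minimal sketch of what such a check might look like. The AlgorithmFactory class and CreateAlgorithms method are hypothetical names, and throwing ArgumentException (instead of silently skipping bad types) is a design choice of this example, not something the question requires.

```csharp
using System;
using System.Linq;
using System.Security.Cryptography;

public static class AlgorithmFactory
{
    // Validates every type up front, then constructs the instances.
    public static HashAlgorithm[] CreateAlgorithms(params Type[] algorithmTypes)
    {
        if (algorithmTypes == null)
            throw new ArgumentNullException("algorithmTypes");

        foreach (Type t in algorithmTypes)
        {
            bool usable = t != null
                && typeof(HashAlgorithm).IsAssignableFrom(t)  // right inheritance line
                && !t.IsAbstract                               // e.g. MD5 itself cannot be constructed
                && t.GetConstructor(Type.EmptyTypes) != null;  // needs a public parameterless constructor

            if (!usable)
            {
                throw new ArgumentException(
                    "Not a constructible HashAlgorithm type: " + (t == null ? "null" : t.FullName),
                    "algorithmTypes");
            }
        }

        return algorithmTypes
            .Select(t => (HashAlgorithm)Activator.CreateInstance(t))
            .ToArray();
    }
}
```

The compiler still accepts typeof(string) as an argument; the mistake only surfaces when the method runs, which is exactly the limitation described above.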

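Just a minor point here, nothing groundbreaking. Whenever you foreach over a List<T>, you can use its ForEach method instead (strictly speaking a List<T> method rather than LINQ, but in the same lambda style). It's especially nice for one-liners: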

cryptoStreams.ForEach(s => s.Close());
hashClassInstances.ForEach(h => CW("{0} ...", h.ToString()...);

What about something like this?

    public string ComputeMultipleHashesOnFile<T>(string filename, T hashClassType)
        where T : HashAlgorithm
    {

    }

The where clause restricts T to HashAlgorithm or a type derived from it. So you can create a class inheriting from HashAlgorithm and implement the abstract class members:

public class HA : HashAlgorithm
{
    protected override void HashCore(byte[] array, int ibStart, int cbSize)
    {
        throw new NotImplementedException();
    }

    protected override byte[] HashFinal()
    {
        throw new NotImplementedException();
    }

    public override void Initialize()
    {
        throw new NotImplementedException();
    }
}
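
If the goal is to let the method do the constructing, the constrained generic method above could also take a new() constraint and drop the instance parameter. The following is a sketch for a single algorithm only (C# has no variable-length type parameter lists); GenericHasher and ComputeHashOnFile are hypothetical names, and HexStr is a stand-in for the helper the question does not show.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class GenericHasher
{
    // Hypothetical stand-in for the HexStr helper from the question.
    public static string HexStr(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    // T must derive from HashAlgorithm and expose a public parameterless constructor,
    // so the compiler rejects calls such as ComputeHashOnFile<string>(...).
    public static string ComputeHashOnFile<T>(string filename)
        where T : HashAlgorithm, new()
    {
        using (T algorithm = new T())
        using (Stream stream = File.OpenRead(filename))
        {
            return HexStr(algorithm.ComputeHash(stream));
        }
    }
}
```

A call would then look like GenericHasher.ComputeHashOnFile<SHA1CryptoServiceProvider>(filename), with the type checked at compile time.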
Licensed under: CC-BY-SA with attribution