Question

I have a structure that describes an address. It looks like this:

class Address
{
    public string AddressLine1 { get; set; }
    public string AddressLine2 { get; set; }
    public string City { get; set; }
    public string Zip { get; set; }
    public string Country { get; set; }
} 

I'm looking for a way to create a unique identifier for this structure (I assume it should also be a string) that depends on all of the structure's properties (e.g. changing AddressLine1 would also change the identifier).

I know I could just concatenate all the properties, but that gives too long an identifier. I'm looking for something significantly shorter.

I also assume that the number of different addresses will not exceed 100 million.

Any ideas on how this identifier can be generated?

Thanks in advance.

Some background:

There are several different tables in the database that hold some information plus address data. The data is stored in a format similar to the one described above.

Unfortunately, moving the address data into a separate table is very costly right now, but I hope it will be done in the future.

I need to associate some additional properties with the address data, and I am going to create a separate table for this. That's why I need to uniquely identify the address data.


Solution

Serialize all fields to a single binary value, for example by concatenating them with proper domain separation, so that the boundaries between fields stay unambiguous.

Then hash that value with a cryptographic hash of sufficient length. I prefer 256 bits, but 128 is probably fine. Collisions are extremely rare with good hashes; with a 256-bit hash like SHA-256 they are practically impossible.
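One way to get domain separation is to length-prefix each field. The following is only a minimal sketch under that assumption; the class name AddressSerializer and the choice of -1 as the null marker are illustrative and not part of the original answer.

```csharp
using System;
using System.IO;
using System.Text;

static class AddressSerializer
{
    // Writes each field as a 4-byte length followed by its UTF-8 bytes.
    // A length of -1 marks null, so null and "" serialize differently.
    public static byte[] Serialize(params string[] fields)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (string field in fields)
            {
                if (field == null)
                {
                    writer.Write(-1);
                    continue;
                }
                byte[] bytes = Encoding.UTF8.GetBytes(field);
                writer.Write(bytes.Length);
                writer.Write(bytes);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

With this scheme, Serialize("ab", "c") and Serialize("a", "bc") produce different bytes, whereas plain concatenation would yield "abc" for both.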
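As a rough sanity check on those sizes, the standard birthday approximation can be applied to the 100 million addresses mentioned in the question. This assumes the hash output behaves like a uniformly random b-bit value, which is an idealization:

```latex
p \approx \frac{n^2}{2^{\,b+1}}, \qquad n = 10^{8}:
\quad b = 128 \;\Rightarrow\; p \approx 1.5 \times 10^{-23},
\quad b = 256 \;\Rightarrow\; p \approx 4.3 \times 10^{-62}
```

Here p is the probability that any two of the n addresses share a hash, so even the 128-bit case leaves a very wide margin.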

Other suggestions

Here is a complete example using serialization, SHA-256 hashing and Base64 encoding (based on CodesInChaos's answer):

using System;
using System.IO;
using System.Security.Cryptography;
using System.Runtime.Serialization.Formatters.Binary;

namespace Uniq
{
    [Serializable]
    class Address
    {
        public string AddressLine1 { get; set; }
        public string AddressLine2 { get; set; }
        public string City { get; set; }
        public string Zip { get; set; }
        public string Country { get; set; }
    } 
    class MainClass
    {
        public static void Main (string[] args)
        {
            Address address1 = new Address(){AddressLine1 = "a1"};
            Address address2 = new Address(){AddressLine1 = "a1"};
            Address address3 = new Address(){AddressLine1 = "a2"};
            string unique1 = GetUniqueIdentifier(address1);
            string unique2 = GetUniqueIdentifier(address2);
            string unique3 = GetUniqueIdentifier(address3);
            Console.WriteLine(unique1);
            Console.WriteLine(unique2);
            Console.WriteLine(unique3);
        }
        // Returns a Base64-encoded SHA-256 hash of the serialized object.
        // Note: BinaryFormatter is marked obsolete in .NET 5 and later.
        public static string GetUniqueIdentifier(object obj){
            if (obj == null) return "0";
            using (SHA256 mySHA256 = SHA256.Create())
            using (MemoryStream stream = new MemoryStream())
            {
                BinaryFormatter formatter = new BinaryFormatter();
                formatter.Serialize(stream, obj);
                // ToArray() returns only the bytes written so far.
                byte[] hash = mySHA256.ComputeHash(stream.ToArray());
                return Convert.ToBase64String(hash);
            }
        }
    }
}

Edit: this is a version that does not use BinaryFormatter. You may replace the null representation and the field separator with anything that suits your needs, as long as those characters cannot occur in the data itself.

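// Requires: using System; using System.Linq; using System.Text;
//           using System.Security.Cryptography;
public static string GetUniqueIdentifier(object obj){
    if (obj == null) return "0";
    StringBuilder stringRep = new StringBuilder();
    // GetProperties() does not guarantee an order, so sort by name
    // to keep the identifier stable across runs.
    var props = obj.GetType().GetProperties()
                   .OrderBy(prop => prop.Name, StringComparer.Ordinal);
    // Append each value (or '¨' for null) followed by the '^' separator.
    foreach (var p in props)
        stringRep.Append(p.GetValue(obj, null) ?? '¨').Append('^');
    using (SHA256 mySHA256 = SHA256.Create())
    {
        byte[] hash = mySHA256.ComputeHash(Encoding.Unicode.GetBytes(stringRep.ToString()));
        return Convert.ToBase64String(hash);
    }
}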

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow