Question

I need to convert this C# function to a T-SQL UDF.

I need to get all the rows from a database that have a Hamming distance smaller than x. This function is just part of the solution.

The C# function returns 40 for these two hashes, while the T-SQL function returns 52:

14714557628763197901

15383788748848265778

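public static ulong csharp_hamming_distance(ulong hash1, ulong hash2)
{
    ulong x = hash1 ^ hash2;                 // bits set where the hashes differ
    const ulong m1 = 0x5555555555555555UL;   // binary 0101...
    const ulong m2 = 0x3333333333333333UL;   // binary 00110011...
    const ulong m4 = 0x0f0f0f0f0f0f0f0fUL;   // binary 00001111...
    const ulong h01 = 0x0101010101010101UL;  // 1 in every byte
    x -= (x >> 1) & m1;                      // set-bit count of each 2-bit pair
    x = (x & m2) + ((x >> 2) & m2);          // set-bit count of each 4-bit nibble
    x = (x + (x >> 4)) & m4;                 // set-bit count of each byte
    return (x * h01) >> 56;                  // top byte = sum of all byte counts
}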
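To see exactly what the C# function computes, here is a minimal Python port, written only for illustration. Python integers are unbounded, so the sketch assumes inputs in the range [0, 2**64) and masks with a hypothetical MASK64 constant to emulate C#'s wrap-around `ulong` arithmetic.

```python
MASK64 = (1 << 64) - 1  # emulates C# ulong wrap-around

def hamming_distance(hash1: int, hash2: int) -> int:
    """Port of csharp_hamming_distance; assumes 0 <= hash < 2**64."""
    m1 = 0x5555555555555555
    m2 = 0x3333333333333333
    m4 = 0x0F0F0F0F0F0F0F0F
    h01 = 0x0101010101010101
    x = (hash1 ^ hash2) & MASK64               # bits set where the hashes differ
    x = (x - ((x >> 1) & m1)) & MASK64         # counts per 2-bit pair
    x = (x & m2) + ((x >> 2) & m2)             # counts per nibble
    x = (x + (x >> 4)) & m4                    # counts per byte
    return ((x * h01) & MASK64) >> 56          # top byte holds the total

print(hamming_distance(14714557628763197901, 15383788748848265778))  # 40
```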

I have a sample, but it does not give me the same results:

create function HammingDistance1(@value1 char(8000), @value2 char(8000))
returns int
as
begin
    declare @distance int
    declare @i int
    declare @len int

    select @distance = 0,
           @i =1,
           @len = case when len(@value1) > len(@value2)
                       then len(@value1)
                       else len(@value2) end

    if (@value1 is null) or (@value2 is null)
        return null

    while (@i <= @len)
        select @distance = @distance +
                           case 
                           when substring(@value1,@i,1) = substring(@value2,@i,1)
                                then 0
                           when substring(@value1,@i,1) < substring(@value2,@i,1)
                                then  CAST(substring(@value2,@i,1) as smallint) -  CAST(substring(@value1,@i,1) as smallint)
                           when substring(@value1,@i,1) > substring(@value2,@i,1)
                                then  CAST(substring(@value1,@i,1) as smallint) - CAST(substring(@value2,@i,1) as smallint)
                          else 1 end,
               @i = @i +1
    return @distance
end 

Any help would be appreciated.

No correct solution

OTHER TIPS

In Hamming calculations on integers, the values are treated as strings of bits. The Hamming distance is the number of bit positions that differ, which can be calculated as the number of set (1) bits in the XOR of the two values. For the two integers you provide, the bitwise Hamming distance is indeed 40.

14714557628763197901=
   1100110000110100100111000011001111001001011100011101000111001101

15383788748848265778=
   1101010101111110001100100101110000111010110000000111101000110010

^= 0001100101001010101011100110111111110011101100011010101111111111

which has 40 non-zero bits. The C# shown is just a fast, branch-free way of counting them (a bit-parallel population count).
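The same result can be checked the slow, obvious way: XOR the values and count the 1 bits. This is a short Python sketch for illustration; `bit_hamming` is a hypothetical helper name.

```python
def bit_hamming(a: int, b: int) -> int:
    """Bitwise Hamming distance of two non-negative integers."""
    return bin(a ^ b).count("1")   # count bits set in the XOR

a = 14714557628763197901
b = 15383788748848265778
print(format(a ^ b, "064b"))       # matches the ^= line above
print(bit_hamming(a, b))           # 40
```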

This is not the case with strings. In the T-SQL you are computing a string Hamming distance, which classically is just the number of positions at which the characters differ. Computing the classic Hamming distance on those two values as strings gives:

"14714557628763197901"
"15383788748848265778"
 01111111110111111111 = 18

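Your example T-SQL code performs a modified calculation: where two digits differ, it adds their absolute difference instead of 1, which is how it arrives at 52. To get the classic Hamming distance, just remove the last two WHEN clauses (the < and > comparisons) so that every mismatch falls through to ELSE 1.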
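Both string metrics can be reproduced in a few lines. This Python sketch is illustrative only; the function names are hypothetical, and it assumes equal-length strings of decimal digits (the T-SQL version's char(8000) padding for unequal lengths is not modelled).

```python
def tsql_style_distance(s1: str, s2: str) -> int:
    """What the question's HammingDistance1 effectively computes:
    the sum of |digit1 - digit2| over all positions."""
    return sum(abs(int(c1) - int(c2)) for c1, c2 in zip(s1, s2))

def classic_string_hamming(s1: str, s2: str) -> int:
    """Classic Hamming distance: number of positions whose characters differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

s1 = "14714557628763197901"
s2 = "15383788748848265778"
print(tsql_style_distance(s1, s2))     # 52
print(classic_string_hamming(s1, s2))  # 18
```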

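Performing a binary Hamming distance in T-SQL is harder. The ^ operator does work on bigint, but these hashes exceed the signed bigint maximum of 9,223,372,036,854,775,807, and T-SQL has no bit-shift operators (SQL Server 2022 adds RIGHT_SHIFT, LEFT_SHIFT and BIT_COUNT). You could, however, perform the calculation on the left and right 32-bit halves separately using integer arithmetic (division and modulo by powers of two instead of shifts), and then add the two counts. The only tricky part is that damned MSB: in a signed type the most significant bit is the sign bit, so dividing a negative value is not the same as shifting it. Each 32-bit half held in a bigint is non-negative, which avoids the problem.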
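The split-halves idea can be sketched as follows. It is written in Python for illustration, but it deliberately uses only `%`, `//` and `divmod` (no XOR or shifts), so each step has a direct T-SQL counterpart (`%` and `/` on non-negative bigint). It assumes the unsigned 64-bit hashes are available as exact integers, e.g. stored as decimal(20,0) on the SQL side; the helper names are hypothetical.

```python
TWO_32 = 2 ** 32

def half_bit_differences(a: int, b: int) -> int:
    """Count differing bits of two non-negative 32-bit values using only
    % and // (the operations T-SQL integer arithmetic provides)."""
    count = 0
    for _ in range(32):
        if a % 2 != b % 2:   # compare the lowest bits
            count += 1
        a //= 2              # "shift" right by one bit
        b //= 2
    return count

def split_hamming(v1: int, v2: int) -> int:
    """Hamming distance of two unsigned 64-bit values, one 32-bit half at a time."""
    hi1, lo1 = divmod(v1, TWO_32)   # left half, right half
    hi2, lo2 = divmod(v2, TWO_32)
    return half_bit_differences(hi1, hi2) + half_bit_differences(lo1, lo2)

print(split_hamming(14714557628763197901, 15383788748848265778))  # 40
```

Because both halves are below 2**32, they stay non-negative in a signed bigint, so integer division behaves exactly like a logical right shift.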

Hamming distance on a decimal value is not well-defined. You would need to be more specific about what you mean by it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow