Question

Is there a case-insensitive variant of the Bob Jenkins hash function?

Generics.Defaults.BobJenkinsHash

provides a fast hash function. Unfortunately it cannot be used in combination with a case-insensitive compare function like so

TCustomStringComparer = class (TEqualityComparer <String>)
  function Equals(const Left, Right: String): Boolean; override;
  function GetHashCode(const Value: String): Integer; override;
end;
function TCustomStringComparer.Equals (const Left, Right : String) : Boolean;
begin
  Result := CompareText (Left, Right) = 0;
end;
function TCustomStringComparer.GetHashCode (const Value : String) : Integer;
begin
  Result := Generics.Defaults.BobJenkinsHash (Value [1], Length (Value) * SizeOf (Char), 0);
end;

This is because TDictionary first compares the hash codes and then uses the provided comparer when checking for equality.

Of course I could use UpperCase in my GetHashCode function, but I wondered if it would be faster if I could somehow modify the hash function itself.

Was it helpful?

Solution

No, there is no case invariant version of the hash function. Lower or upper case all strings before passing them to the hash function.

OTHER TIPS

It would be slightly faster, but it hurts your maintainability a lot. There is rarely a good reason for this type of micro-optimization. Just convert your strings to lower or upper case before hashing like you suggested.

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified" - Donald Knuth

IMO the whole question is wrong. To quote the Wikipedia article on hash functions:

A hash function is any well-defined procedure or mathematical function which converts a large, possibly variable-sized amount of data into a small datum, usually a single integer that may serve as an index to an array.

Note the "amount of data" - there is no type specified, and indeed the Bob Jenkins hash function has an untyped parameter const Data pointing to the data to be hashed. Since the input data is not necessarily a sequence of characters there is no way to compute a "case-insensitive" hash value. And even if it were a sequence of characters - the upper- or lowercasing would depend on character set and encoding. So you would need to have different hash functions for ASCII strings, UTF-8 encoded strings, UTF-16 LE encoded strings, ... (you get the idea).

I also needed such a function in a project. Bob Jenkin's one-at-a-time hash:

function hash(const s: string): cardinal;
var
  p, last: PByte;
begin
  if s = '' then exit(1);
  p := pbyte(pointer(s));
  last := p + length(s);
  result := 0;
  while p < last do begin
    if {$ifdef asciionly}p^ < 128{$else}true{$endif}  then begin
      result := result + p^;
      if (p^ >= ord('a')) and (p^ <= ord('z')) then result := result - ord('a') + ord('A');
      result := result + (result shl 10);
      result := result xor (result shr 6);
    end;
    inc(p);
  end;

  result := result + (result shl 3);
  result := result xor (result shr 11);
  result := result + (result shl 15);
end;        

If asciionly is set, it should also give the same hash for utf-8 and latin1 strings.

Do not forget to disable overflow checking.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top