Question

OLE Variants, as used by older versions of Visual Basic and pervasively in COM Automation, can store lots of different types: basic types like integers and floats, more complicated types like strings and arrays, and all the way up to IDispatch implementations and pointers in the form of ByRef variants.

Variants are also weakly typed: they convert the value to another type without warning depending on which operator you apply and what the current types are of the values passed to the operator. For example, comparing two variants, one containing the integer 1 and another containing the string "1", for equality will return True.

So assuming that I'm working with variants at the underlying data level (e.g. VARIANT in C++ or TVarData in Delphi - i.e. the big union of different possible values), how should I hash variants consistently so that they obey the right rules?

Rules:

  • Variants that hash unequally should compare as unequal, both in sorting and direct equality
  • Variants that compare as equal for both sorting and direct equality should hash as equal

It's OK if I have to use different sorting and direct comparison rules in order to make the hashing fit.

The way I'm currently working is I'm normalizing the variants to strings (if they fit), and treating them as strings, otherwise I'm working with the variant data as if it was an opaque blob, and hashing and comparing its raw bytes. That has some limitations, of course: numbers 1..10 sort as [1, 10, 2, ... 9] etc. This is mildly annoying, but it is consistent and it is very little work. However, I do wonder if there is an accepted practice for this problem.

Was it helpful?

Solution

Hash codes of VARIANTS that are equal should be equal.

Without knowing the equality and coercion rules that are used for testing equality, it is hard to come up with a proper implementation.

OTHER TIPS

There's a built in tension in your question between the use of a hash function and the stated requirements, which are to validated against the input of the hash. I'd suggest we keep in mind a few properties of hashes in general: information is lost during the hashing process, and hash collisions are to be expected. It is possible to construct a perfect hash without collisions, but it would be problematic (or impossible?) to construct a perfect hash function if the domain of the function is any possible OLE Variant. On the other hand, if we're not talking about a perfect hash, then your first rule is violated.

I don't know the larger context of what you're trying to accomplish, but I must push back on one of your assumptions: is a hash function really what you want? Your requirements could be met in a fairly straightforward way if you develop a system that encodes, not hashes, all of the possible OLE Variant attributes so that they can be recalled later and compared against other Variant images.

Your baseline implementation of converting the Variant to a string representation is moving in this direction. As you are no doubt aware, a Variant can contain pointers, double pointers, and arrays, so you'll have to develop consistent string representation of these data types. I question whether this approach could really be classified as a hash. Aren't you just persisting data attributes?

So in summary, to make stuff comparable you first stream to a common format, string or blob.

How do you handle e.g. localisation, e.g. formating of reals? A real compared to a string containing the same real created in another locale will fail. Or a real written to string with a different precision setting.

It sounds to me the definition of equal() is the problem, not the hashing. If "equal" values can be serialized to string (or blob) differently, hashing will fail.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top