Вопрос

I'm writting interpreter of language. There is problem: I want to create type-dictionary, where you can put value of any type by index, that value of any type (simple[int,float,string] or complex[list,array,dictionary] of simple types or of complex of simple types ...). That is the same like in python-lang. What algorithm of hash-function should I use?

For strings there are many examples of hashes - the simplest: sum of all characters multiplied by 31, divided by HASH_SIZE, that simple number.

But for DIFFERENT TYPES, I think, It must be more complicated algorithm. I find SHA256, but don't know, how use "unsigned char[32]" result type for adressing in hash-table - it is much more than RAM in computer. thank you.

Это было полезно?

Решение

There are hash tables in C++11, newest C++ standard - std::unordered_map, std::unordered_set.

EDIT:

Since every type has different distribution, usually every type has its own hash function. This is how it's done in Java (.hashCode() method inherited from Object), C#, C++11 and many other implementations.

EDIT2:

Typical hash function does two things:

1.) Create object representation in a natural number. (this is what .hashCode() in Java does) For example - string "CAT" can be transformed to:

67 * 256^2 + 65 * 256^1 + 84 = 4407636

2.) Map this number to position in array. One of the way to do this is:

integer_part(fractional_part(k*4407636)*m)

Where k is a constant (Donald Knuth in his book Art of Programming recommends (sqrt(5)+1)/2), m is size of your hash table and fractional_part and integer_part (obviously) calculate fractional part and integer part of real number.

In your hash table implementation, you need to handle collisions, especially when there are much more possible keys than size of your hash table.

EDIT3:

I read more on the subject, and it looks like 67 * 256^2 + 65 * 256^1 + 84 = 4407636 is really bad way to do hash_code. This is because, "somethingAAAAAABC" and "AAAAAABC" give exactly the same hash code.

Другие советы

Well, a common approach is to define the hash function as a method belonging to the type. That way you can call different algorithms for different types through a common API.

That ,of course, entails that you define wrapper classes for every baisc "c type" that you want to use in your interpreter.

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top