Question

I am working with an application that uses hashing, but I still cannot figure out how it works. Here is my problem: hashing is used to generate indexes, those indexes are used to look up values in several different tables, and the values retrieved from each table are then added together to produce the final result. This is supposedly done to reduce the memory requirements. The input to the hash function is the XOR of a random constant and some parameters from the application.
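
To make it concrete, here is a rough sketch of what I think the code is doing; the constant, the table sizes, and the mixing function below are placeholders I made up, not the application's actual values:

```python
RANDOM_CONSTANT = 0x9E3779B9          # placeholder "random constant"
TABLE_SIZE = 256
# three dummy lookup tables; the real application precomputes its own
tables = [[(i * (t + 1)) & 0xFF for i in range(TABLE_SIZE)] for t in range(3)]

def hash_fn(x, salt):
    # placeholder mixing function; the real one is application-specific
    x ^= salt
    x = (x * 2654435761) & 0xFFFFFFFF
    return x % TABLE_SIZE

def lookup(param):
    key = param ^ RANDOM_CONSTANT     # XOR the parameter with the constant
    # one index per table; the values found at those indexes are summed
    return sum(t[hash_fn(key, s)] for s, t in enumerate(tables))

print(lookup(12345))
```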

Is this a typical application of hashing? What I do not understand is how using hashing can reduce memory requirements. Can anyone clarify this?

Thank you

Solution

Hashing alone doesn't have anything to do with memory.

What it is often used for is a hash table. A hash table works by computing the hash of the key you are looking up, and using that hash as an index into a data structure.
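
As a minimal sketch of the idea (using Python's built-in hash() and separate chaining; all names here are illustrative):

```python
class TinyHashTable:
    # Minimal separate-chaining hash table sketch.
    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]  # one list of (key, value) pairs per slot

    def _index(self, key):
        # Reduce the (possibly large) key to a small array index.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                 # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def get(self, key):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

t = TinyHashTable()
t.put("some long string key", 42)
print(t.get("some long string key"))    # -> 42
```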

Hashing lets you reduce the key (a string, etc.) to a more compact value such as an integer or a small set of bits.

That might be the memory saving you're referring to: reducing a large key to a simple integer.

Note, though, that hashes are not unique! A good hashing algorithm minimizes collisions, but it cannot map every input to a unique value; that isn't possible in general. By the pigeonhole principle, if your hash outputs a 32-bit integer, it can produce only 2^32 distinct values, so any larger set of inputs must contain collisions.
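
A quick way to see this: once hashes are reduced to a small index range, distinct keys are forced to share slots. The keys below are arbitrary:

```python
# With 8 slots and up to 100 keys, the pigeonhole principle guarantees
# a collision within the first 9 keys, whatever hash() returns.
SLOTS = 8
seen = {}
for i in range(100):
    key = f"key-{i}"
    slot = hash(key) % SLOTS
    if slot in seen:
        print(f"{seen[slot]!r} and {key!r} both map to slot {slot}")
        break
    seen[slot] = key
```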

OTHER TIPS

Is it a Bloom filter you are talking about? A Bloom filter uses hash functions to provide a space-efficient way of testing membership in a set, which would explain the memory savings.
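
For illustration, a toy Bloom filter might look like this; the bit-array size, the number of hash functions, and the double-hashing scheme are arbitrary choices for the sketch:

```python
import hashlib

class BloomFilter:
    # Toy Bloom filter: m bits, k index positions per item.
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m // 8)

    def _positions(self, item):
        # Derive k indexes from two halves of one SHA-256 digest
        # (the classic double-hashing trick; other schemes work too).
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big")
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        # False positives are possible; false negatives are not.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

bf = BloomFilter()
bf.add("hello")
print(bf.might_contain("hello"))   # True
print(bf.might_contain("world"))   # almost certainly False
```

Note the trade-off: the filter never stores the items themselves, only a fixed-size bit array, which is where the space efficiency comes from.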

Most good hash implementations are actually memory-inefficient: they keep the table sparsely filled, because packing entries more densely would require more computation per operation, and that would defeat the point of hashing.

Hash implementations are used for processing efficiency: they give you expected constant running time for operations like insertion, removal, and retrieval.

One useful way to think about hashing is that all your data, no matter its type or size, is always represented in a single fixed-length form.
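
For example, a cryptographic hash such as SHA-256 returns a 32-byte digest no matter how large the input is:

```python
import hashlib

for data in (b"x", b"a much longer input " * 1000):
    digest = hashlib.sha256(data).digest()
    print(f"input: {len(data):6d} bytes -> digest: {len(digest)} bytes")
```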

This could be explained if the hashing being done isn't building a true hash table, but is just creating an index into a string/memory-block table. If you had the same string (or memory sequence) 20 times in your data and replaced all 20 instances with just its hash/table index, you would achieve data compression that way. If the table actually contains a collision chain for each hash value, however, then what I just described is not what's going on; in that case, the hashing is most likely there to speed up execution (by providing quick access to stored values) rather than to compress.
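
A minimal sketch of that compression idea, using a plain dict to find each string's index (the real application would use its own hash-based table):

```python
def compress(strings):
    # Store each distinct string once and replace every occurrence
    # with a small integer index into the shared table.
    table, index_of, out = [], {}, []
    for s in strings:
        if s not in index_of:          # first time we see this string
            index_of[s] = len(table)
            table.append(s)
        out.append(index_of[s])        # store only the index
    return table, out

data = ["alpha", "beta", "alpha", "alpha", "beta"]
table, indexes = compress(data)
print(table)    # ['alpha', 'beta']
print(indexes)  # [0, 1, 0, 0, 1]
```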

Licensed under: CC-BY-SA with attribution