If you assign fewer bits, or shorter code words, to the most frequently used symbols, you save a lot of storage space.
Suppose you want to assign 26 unique codes to the English alphabet and store an English novel (letters only) in terms of these codes. You will need less memory if you assign short codes to the most frequently occurring characters.
You might have observed that postal codes and STD codes for important cities are usually shorter (as they are used very often). This is a very fundamental concept in information theory.
Huffman encoding produces prefix codes: no code word is a prefix of any other, so an encoded bit stream can be decoded unambiguously.
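To make the prefix property concrete, here is a minimal sketch of a check for it. The function name is_prefix_free and both code tables are made up for illustration; they are not taken from any particular Huffman tree.

```python
def is_prefix_free(codes):
    """Return True if no code word is a prefix of another code word."""
    words = sorted(codes.values())
    # After sorting, a word that is a prefix of some other word is
    # immediately followed by a word that starts with it.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

# Hypothetical code tables, chosen only for illustration.
good = {"E": "0", "T": "10", "A": "110", "Q": "111"}
bad = {"E": "0", "T": "01", "A": "11"}  # "0" is a prefix of "01"

print(is_prefix_free(good))
print(is_prefix_free(bad))
```

With the second table, the bits "01" could be the start of either E or T, which is exactly the ambiguity a prefix code avoids.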
Construction of a Huffman tree:
A greedy approach to constructing a Huffman tree for n characters is as follows:
Place the n characters in n sub-trees, one single-node tree per character, weighted by its frequency.
Combine the two sub-trees with the least weight into a new tree whose root is assigned the sum of their two weights.
Repeat this until only a single tree remains.
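The steps above can be sketched in Python with a min-heap. This is one possible way to do it, not the only one: the function name build_huffman_tree, the tuple representation of internal nodes, and the tie-breaking counter are all assumptions made for this example.

```python
import heapq
from collections import Counter
from itertools import count

def build_huffman_tree(text):
    """Return (total_weight, root). Leaves are characters; internal
    nodes are (left, right) tuples."""
    if not text:
        raise ValueError("need at least one character")
    tiebreak = count()  # keeps heap entries comparable when weights tie
    # Step 1: one single-node sub-tree per distinct character.
    heap = [(w, next(tiebreak), sym) for sym, w in Counter(text).items()]
    heapq.heapify(heap)
    # Steps 2 and 3: keep merging the two lightest sub-trees.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))
    weight, _, root = heap[0]
    return weight, root

weight, root = build_huffman_tree("eeeettta")
print(weight, root)
```

The root's weight always equals the total number of characters in the input, since every merge adds the weights of the two sub-trees it joins.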
For example, consider the binary tree below, where E and T have high weights (because they occur very often).
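It is a prefix tree. To get the Huffman code for any character, start from the node corresponding to the character and backtrack until you reach the root node. If each branch is labeled 0 or 1, the labels collected along the way, read in reverse order (root to leaf), form the code.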
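The backtracking step can be sketched as follows. The tree here is hand-built and hypothetical (it is not the tree from the figure), the node names n1 and n2 are placeholders, and labeling left branches 0 and right branches 1 is only a common convention.

```python
# Hypothetical tree: each node maps to (parent, bit on the edge from
# that parent). The root has no entry of its own.
parent = {
    "E": ("root", "0"),
    "n1": ("root", "1"),
    "T": ("n1", "0"),
    "n2": ("n1", "1"),
    "A": ("n2", "0"),
    "Q": ("n2", "1"),
}

def code_for(symbol, parent):
    """Walk from a leaf up to the root, then reverse the collected bits."""
    bits = []
    node = symbol
    while node in parent:  # the loop ends once we reach the root
        node, bit = parent[node]
        bits.append(bit)
    return "".join(reversed(bits))

for sym in "ETAQ":
    print(sym, code_for(sym, parent))
```

In this toy tree, E sits closest to the root and gets a one-bit code, while A and Q sit deepest and get three bits each.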