Question

Everywhere you can find answer what are differences:

Map is storing keys-values, it is not synchronized(not a thread safe), allows null values and only one null key, faster to get value because all values have unique key, etc. Set - not sorted, slower to get value, storing only value, does not allow duplicates or null values I guess.

BUT what means Hash word (that is what they have the same). Is it something about hashing values or whatever I hope you can answer me clearly.

Was it helpful?

Solution

Both use hash value of the Object to store which internally uses hashCode(); method of Object class.

So if you are storing instances of your custom class then you need to override hashCode(); method.

OTHER TIPS

HashSet and HashMap have a number of things in common:

  1. The start of their name - which is a clue to the real similarity.
  2. They use Hash Codes (from the hashCode method built into all Java objects) to quickly process and organize Objects.
  3. They are both unordered collections - but both provide ordered varients (LinkedHashX to store objects in the order of addition)
  4. There is also TreeSet/TreeMap to sort all objects present in the collection and keep them sorted. A comparison of TreeSet to TreeMap will find very similar differences and similarities to one between HashSet and HashMap.

They are also both impacted by the strengths and limitations of Hash algorithms in general.

  1. Hashing is only effective if the objects have well behaved hash functions.
  2. Hashing breaks entirely if equals and hashCode do not follow the correct contract.
  3. Key objects in maps and objects in set should be immutable (or at least their hashCode and equals return values should never change) as otherwise behavior becomes undefined.

If you look at the Map API you can also see a number of other interesting connections - such as the fact that keySet and entrySet both return a Set.

None of the Java Collections are thread safe. Some of the older classes from other packages were but they have mostly been retired. For thread-safety look at the concurrent package for non-thread-safety look at the collections package.

Just look into HashSet source code and you will see that it uses HashMap. So they have the same properties of null-safety, synchronization etc:

public class HashSet<E>

...

    private transient HashMap<E,Object> map;

    // Dummy value to associate with an Object in the backing Map
    private static final Object PRESENT = new Object();

    /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * default initial capacity (16) and load factor (0.75).
     */
    public HashSet() {
        map = new HashMap<>();
    }

...

    public boolean contains(Object o) {
        return map.containsKey(o);
    }

...

    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

...

}

HashSet is like a HashMap where you don't care about the values but about the keys only. So you care only if a given key K is in the set but not about the value V to which it is mapped (you can think of it as if V is a constant e.g. V=Boolean.TRUE for all keys in the HashSet). So HashSet has no values (V set). This is the whole difference from structural point of view. The hash part means that when putting elements into the structure Java first calls the hashCode method. See also http://en.wikipedia.org/wiki/Open_addressing to understand in general what happens under the hood.

The hash value is used to check faster if two objects are the same. If two objects have same hash, they can be equal or not equal (so they are then compared for equality with the equals method). But if they have different hashes they are different for sure and the check for equality is not needed. This doesn't mean that if two objects have same hash values they overwrite each other when they are stored in the HashSet or in the HashMap.

Both are not Thread safe and store values using hashCode(). Those are common facts. And another one is both are member of Java collection framework. But there are lots of variations between those two.

Hash regards the technique used to convert the key to an index. Back in the data strucutures class we used to learn how to construct a hash table, to do that you would need to get the strings that were inserted as values and convert them to a number to index an array used internally as the storing data structure.

One problem that was also very discussed was to find a hashing function that would incurr in minimum colision so that we won't have two different objects, with different keys sharing the same position.

So, the hash is about how the keys are processed to be stored. If we think about it for a while, there isn't a (real) way to index memory with strings, only with numbers, so to have a 2d structure like a table that is indexed by a string (or an object as you wish) you need to generate a number (or a hash) for that string and store the value in an array in this index. However, if you need the key "name" you would need a different array to, in the same index, store the key "name".

Cheers

The "HASH" word is common because both uses hashing mechanism. HashSet is actually implemented using HashMap, using dummy object instance on every entry of the Set. And thereby a wastage of 4 bytes for each entry.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top