Question

Eclipse implements the hashCode() function for a singly linked list's Node class the following way:

class Node{
    int val;
    Node next;

    public Node(int val){
        this.val = val;
        next = null;
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result + ((next == null) ? 0 : next.hashCode());
        result = prime * result + val;
        return result;
    }
}

Now hashCode() for a node is dependent on the hash code of the nodes that follow it.

So, every call of hashCode() will take amortized linear time in the length of the linked list. Thus using a HashSet<Node> will become unfeasible.

One way to get around this is to cache the value of the hashCode in a variable(call it hash) so that it is computed only once. But even in this case, the hash will become invalid once any node's val is changed. And again it will take linear time to modify the hashCode of nodes that follow the current node.

So what are some good ways of implementing hashing for such a linked list Node?

Was it helpful?

Solution

My first thought upon reading your question was: what does LinkedList do? Digging into the source, we see that there is no hashCode() or equals() defined on the inner LinkedList.Node class (link to source).

Why does this make sense? Well, nodes are normally internal data structures, only visible to the list itself. They are not going to be placed into collections or any other data structure where comparing equality and hash-codes are necessary. No external code has access to them.

You say in your question:

Thus using a HashSet<Node> will become unfeasible.

But I would argue that you have no need to place your nodes in such a data structure. By definition, your nodes will link to each other and require no additional classes to facilitate that relationship. And unless you plan to expose this class outside your list (which isn't necessary), they will never end up in a HashSet.

I would propose you follow the LinkedList.Node model and avoid creating these methods on your nodes. The outer list can base its hashcode and equality on the values stored in the nodes (but not the nodes themselves), which is how LinkedList does it - see AbstractList (link to source).

Source links are to the OpenJDK source, but in this case they are identical to source supplied with Oracle JDKs

OTHER TIPS

You have to ask yourself what quality of hashing is valueable for you. The only restriction is to make sure another list with same number in same order has the same hash. That's achieved by using a contant number as well as using the first as well as by limiting on 5 numbers. How much numbers make sense for you depends on the structure of your data. If for example you always store consecutive, ascending numbers starting from 1 and the difference is only the length, that will be hard to optimize. If it's completly random over the entire range of int the first number will do the job well. How many numbers deliver the best ratio for you is found out by measuring I'd say.

In the end what you need is a good ration between collisions (objects put to the same bucket) and calculation time. Generated implementation typically try to maximize the calculation time, providing the human developer with the pleasure of much room for improvement. ;-)

And concerning the changing of contained value: java.util.HashSet (respectivly the HashMap it holds) will calulate its own hash upon yours, and cache that. So if an object containted in a HashSet can't be found again once it changed that far that its hash changed.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top