Question

Background information:

In my project I'm applying Reinforcement Learning (RL) to the Mario domain. For my state representation I chose to use a hashtable with custom objects as keys. My custom objects are immutable and have overwritten the .equals() and the .hashcode() (which were generated by the IntelliJ IDE).

This is the resulting .hashcode(), I've added the possible values in comments as extra information:

@Override
public int hashCode() {
    int result = (stuck ? 1 : 0);                // 2 possible values: 0, 1
    result = 31 * result + (facing ? 1 : 0);     // 2 possible values: 0, 1 
    result = 31 * result + marioMode;            // 3 possible values: 0, 1, 2
    result = 31 * result + (onGround ? 1 : 0);   // 2 possible values: 0, 1 
    result = 31 * result + (canJump ? 1 : 0);    // 2 possible values: 0, 1 
    result = 31 * result + (wallNear ? 1 : 0);   // 2 possible values: 0, 1 
    result = 31 * result + nearestEnemyX;        // 33 possible values: - 16 to 16
    result = 31 * result + nearestEnemyY;        // 33 possible values: - 16 to 16

    return result;
}

The Problem:

The problem here is that the result in the above code can exceed Integer.MAX_VALUE. I've read online this doesn't have to be a problem, but in my case it is. This is partly due to algorithm used which is Q-Learning (an RL method) and depends on the correct Q-values stored inside the hashtable. Basically I cannot have conflicts when retrieving values. When running my experiments I see that the results are not good at all and I'm 95% certain the problem lies with the retrieval of the Q-values from the hashtable. (If needed I can expand on why I'm certain about this, but this requires some extra information on the project which isn't relevant for the question.)

The Question:

Is there a way to avoid the integer overflow, maybe I'm overlooking something here? Or is there another way (perhaps another datastructure) to get reasonably fast the values given my custom-key?

Remark:

After reading some comments I do realise that my choice for using a HashTable wasn't maybe the best one as I want unique keys that do not cause collisions. If I still want to use the HashTable I will probably need a proper encoding.

Was it helpful?

Solution

You need a dedicated Key Field to guarantee uniqueness

.hashCode() isn't designed for what you are using it for

.hashCode() is designed to give good general results in bucketing algorithms, which can tolerate minor collisions. It is not designed to provide a unique key. The default algorithm is a trade off of time and space and minor collisions, it isn't supposed to guarantee uniqueness.

Perfect Hash

What you need to implement is a perfect hash or some other unique key based on the contents of the object. This is possible within the boundries of an int but I wouldn't use .hashCode() for this representation. I would use an explicit key field on the object.

Unique Hashing

One way to use use SHA1 hashing that is built into the standard library which has an extremely low chance of collisions for small data sets. You don't have a huge combinational explosion in the values you posts to SHA1 will work.

You should be able to calculate a way to generate a minimal perfect hash with the limited values that you are showing in your question.

A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers—usually [0..n−1] or [1..n]. A more formal way of expressing this is: Let j and k be elements of some finite set K. F is a minimal perfect hash function iff F(j) =F(k) implies j=k (injectivity) and there exists an integer a such that the range of F is a..a+|K|−1. It has been proved that a general purpose minimal perfect hash scheme requires at least 1.44 bits/key.2 The best currently known minimal perfect hashing schemes use around 2.6 bits/key.[3]

A minimal perfect hash function F is order preserving if keys are given in some order a1, a2, ..., an and for any keys aj and ak, j

A minimal perfect hash function F is monotone if it preserves the lexicographical order of the keys. In this case, the function value is just the position of each key in the sorted ordering of all of the keys. If the keys to be hashed are themselves stored in a sorted array, it is possible to store a small number of additional bits per key in a data structure that can be used to compute hash values quickly.[6]

Solution

Note where it talks about a URL it can be any byte[] representation of any String that you calculate from your object.

I usually override the toString() method to make it generate something unique, and then feed that into the UUID.nameUUIDFromBytes() method.

Type 3 UUID can be just as useful as well UUID.nameUUIDFromBytes()

Version 3 UUIDs use a scheme deriving a UUID via MD5 from a URL, a fully qualified domain name, an object identifier, a distinguished name (DN as used in Lightweight Directory Access Protocol), or on names in unspecified namespaces. Version 3 UUIDs have the form xxxxxxxx-xxxx-3xxx-yxxx-xxxxxxxxxxxx where x is any hexadecimal digit and y is one of 8, 9, A, or B.

To determine the version 3 UUID of a given name, the UUID of the namespace (e.g., 6ba7b810-9dad-11d1-80b4-00c04fd430c8 for a domain) is transformed to a string of bytes corresponding to its hexadecimal digits, concatenated with the input name, hashed with MD5 yielding 128 bits. Six bits are replaced by fixed values, four of these bits indicate the version, 0011 for version 3. Finally, the fixed hash is transformed back into the hexadecimal form with hyphens separating the parts relevant in other UUID versions.

My preferred solution is Type 5 UUID ( SHA version of Type 3)

Version 5 UUIDs use a scheme with SHA-1 hashing; otherwise it is the same idea as in version 3. RFC 4122 states that version 5 is preferred over version 3 name based UUIDs, as MD5's security has been compromised. Note that the 160 bit SHA-1 hash is truncated to 128 bits to make the length work out. An erratum addresses the example in appendix B of RFC 4122.

Key objects should be immutable

That way you can calculate toString(), .hashCode() and generate a unique primary key inside the Constructor and set them once and not calculate them over and over.

Here is a straw man example of an idiomatic immutable object and calculating a unique key based on the contents of the object.

package com.stackoverflow;

import javax.annotation.Nonnull;
import java.util.Date;
import java.util.UUID;

public class Q23633894
{

    public static class Person
    {
        private final String firstName;
        private final String lastName;
        private final Date birthday;
        private final UUID key;
        private final String strRep;

        public Person(@Nonnull final String firstName, @Nonnull final String lastName, @Nonnull final Date birthday)
        {
            this.firstName = firstName;
            this.lastName = lastName;
            this.birthday = birthday;
            this.strRep = String.format("%s%s%d", firstName, lastName, birthday.getTime());
            this.key = UUID.nameUUIDFromBytes(this.strRep.getBytes());
        }

        @Nonnull
        public UUID getKey()
        {
            return this.key;
        }

        // Other getter/setters omitted for brevity

        @Override
        @Nonnull
        public String toString()
        {
            return this.strRep;
        }

        @Override
        public boolean equals(final Object o)
        {
            if (this == o) { return true; }
            if (o == null || getClass() != o.getClass()) { return false; }
            final Person person = (Person) o;
            return key.equals(person.key);
        }

        @Override
        public int hashCode()
        {
            return key.hashCode();
        }
    }
}

OTHER TIPS

For a unique representation of your object's state, you would need 19 bits in total. Thus, it is possible to represent it by a "perfect hash" integer value (which can have up to 32 bits):

@Override
public int hashCode() {
    int result = (stuck ? 1 : 0); // needs 1 bit (2 possible values)
    result += (facing ? 1 : 0) << 1; // needs 1 bit (2 possible values)
    result += marioMode << 2; // needs 2 bits (3 possible values)
    result += (onGround ? 1 : 0) << 4; // needs 1 bit (2 possible values)
    result += (canJump ? 1 : 0) << 5; // needs 1 bit (2 possible values)
    result += (wallNear ? 1 : 0) << 6; // needs 1 bit (2 possible values)
    result += (nearestEnemyX + 16) << 7; // needs 6 bits (33 possible values)
    result += (nearestEnemyY + 16) << 13; // needs 6 bits (33 possible values)
}

Instead of using 31 as a your magic number, you need to use the number of possibilities (normalised to 0)

@Override
public int hashCode() {
    int result = (stuck ? 1 : 0);                // 2 possible values: 0, 1
    result = 2 * result + (facing ? 1 : 0);      // 2 possible values: 0, 1 
    result = 3 * result + marioMode;             // 3 possible values: 0, 1, 2
    result = 2 * result + (onGround ? 1 : 0);    // 2 possible values: 0, 1 
    result = 2 * result + (canJump ? 1 : 0);     // 2 possible values: 0, 1 
    result = 2 * result + (wallNear ? 1 : 0);    // 2 possible values: 0, 1 
    result = 33 * result + (16 + nearestEnemyX); // 33 possible values: - 16 to 16
    result = 33 * result + (16 + nearestEnemyY); // 33 possible values: - 16 to 16

    return result;
}

This will give you 104544 possible hashCodes() BTW you can reverse this process to get the original values from the code by using a series of / and %

Try Guava's hashCode() method or JDK7's Objects.hash(). It's way better than writing your own. Don't repeat code yourself (and anyone else when you can use out of box solution):

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top