A common pattern with a map is to check whether a key exists and then act on the value only if it does. Consider:
if (!map.containsKey(key)) {
    map.put(key, new DefaultValue());
}
return map.get(key);
However, this is generally considered poor form, as it requires two map lookups, whereas this alternative requires only one:
Value result = map.get(key);
if (result == null) {
    result = new DefaultValue();
    map.put(key, result);
}
return result;
Yet this second implementation has its own problems. In addition to being less concise and readable, it is potentially incorrect, as it cannot differentiate between the case where the key does not exist and the case where the key exists but maps explicitly to null. In individual cases we can, of course, establish an external invariant that the map will never contain null values, but in general we cannot rely on this second pattern and must fall back to the less efficient implementation.
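To make the ambiguity concrete, here is a minimal sketch (the map contents are hypothetical) showing that containsKey() can tell the two cases apart while get() alone cannot:

```java
import java.util.HashMap;
import java.util.Map;

public class NullAmbiguity {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("key", null); // the key exists, but maps explicitly to null

        // containsKey() correctly distinguishes the two cases...
        System.out.println(map.containsKey("key"));    // true
        System.out.println(map.containsKey("absent")); // false

        // ...but get() returns null either way, so the null-check
        // pattern cannot tell "missing" apart from "mapped to null"
        System.out.println(map.get("key"));    // null
        System.out.println(map.get("absent")); // null
    }
}
```

With this map, the null-check pattern would silently overwrite the explicit null mapping with a default value.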
But why does the first implementation need to be less efficient? HashMap's .containsKey() looks like this:
public boolean containsKey(Object key) {
    return getEntry(key) != null;
}
And Guava's ImmutableMap.containsKey() is similar:
public boolean containsKey(@Nullable Object key) {
    return get(key) != null;
}
Since these calls already do all the work of a .get(), what is the disadvantage of caching the result of that lookup and short-circuiting an immediately following call to .get() for the same key? The cost appears to be a single pointer, yet the benefit would be that the "right" way to implement this pattern is also the "efficient" way to do so.
private transient Entry<K,V> lastContainsKeyResult = null;

public boolean containsKey(Object key) {
    lastContainsKeyResult = getEntry(key);
    return lastContainsKeyResult != null;
}

public V get(Object key) {
    if (key != null && lastContainsKeyResult != null &&
            key.equals(lastContainsKeyResult.getKey())) {
        return lastContainsKeyResult.getValue();
    }
    // normal hash lookup
}