Why don't common Map implementations cache the result of Map.containsKey() for Map.get()?

Question 1

Because caching assumes a particular use case, but it will actually slow things down in others. It also adds a lot of complexity.

How do you cache the value? What happens when multiple threads are reading at once?

Sit down and think through all the edge cases and problems that can arise here. For example, what happens if the value changes between the containsKey() call and the get() call? Such a seemingly simple change actually introduces a lot of complexity, and it slows down many operations that are likely to be used far more often than this specific sequence: every put() and remove() would have to invalidate the cache, and every containsKey() that is not followed by get() would pay for filling it.
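To make the "value changes between the two calls" problem concrete, here is a deliberately broken sketch. NaiveCachingMap and StaleCacheDemo are hypothetical names, not part of any library: containsKey() remembers what it found so that get() can skip a lookup, but put() forgets to invalidate that cache. The change happens on the same thread here so the result is deterministic; with several threads, the cached fields would themselves become shared mutable state, which is worse.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, deliberately naive wrapper: containsKey() remembers the value it
// found so that a following get() can skip the hash lookup.
// BUG (on purpose): mutators do not invalidate the cached entry.
class NaiveCachingMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    private Object cachedKey;      // key seen by the last containsKey()
    private V cachedValue;         // value found for that key (null if absent)
    private boolean hasCached;     // whether cachedKey/cachedValue are populated

    boolean containsKey(K key) {
        V value = delegate.get(key);
        cachedKey = key;
        cachedValue = value;
        hasCached = true;
        return value != null || delegate.containsKey(key);
    }

    V get(K key) {
        if (hasCached && Objects.equals(cachedKey, key)) {
            return cachedValue;    // served from the cache, no lookup
        }
        return delegate.get(key);
    }

    V put(K key, V value) {
        return delegate.put(key, value);   // forgets to invalidate the cache
    }
}

class StaleCacheDemo {
    public static void main(String[] args) {
        NaiveCachingMap<String, Integer> map = new NaiveCachingMap<>();
        map.put("a", 1);
        if (map.containsKey("a")) {
            map.put("a", 2);                 // value changes between containsKey() and get()
            System.out.println(map.get("a")); // prints 1: a stale value, the map holds 2
        }
    }
}
```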

You should also consider that it's possible to build a "caching map" on top of a non-caching one, but the opposite is not possible.
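As a sketch of what building a caching layer on top might look like (LastLookupCachingMap is a hypothetical name; it wraps any java.util.Map, exposes only a few methods, and is not thread-safe), the cache lives entirely in the wrapper and every mutator has to invalidate it:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical wrapper: remembers the result of the last containsKey() so that an
// immediately following get() for the same key can reuse it.
// Sketch only: not thread-safe, and only the methods shown are supported.
class LastLookupCachingMap<K, V> {
    private final Map<K, V> delegate;
    private boolean cacheValid = false;
    private Object cachedKey;
    private V cachedValue;

    LastLookupCachingMap(Map<K, V> delegate) {
        this.delegate = delegate;
    }

    boolean containsKey(Object key) {
        V value = delegate.get(key);
        cachedKey = key;
        cachedValue = value;          // null if absent or explicitly mapped to null
        cacheValid = true;
        // Second lookup only when get() returned null, to tell absent from null.
        return value != null || delegate.containsKey(key);
    }

    V get(Object key) {
        if (cacheValid && Objects.equals(cachedKey, key)) {
            return cachedValue;       // reuse the lookup done by containsKey()
        }
        return delegate.get(key);
    }

    // Every mutator must invalidate the cache, or get() may return stale data.
    V put(K key, V value) {
        cacheValid = false;
        return delegate.put(key, value);
    }

    V remove(Object key) {
        cacheValid = false;
        return delegate.remove(key);
    }
}

class CachingWrapperDemo {
    public static void main(String[] args) {
        LastLookupCachingMap<String, Integer> map =
                new LastLookupCachingMap<>(new HashMap<String, Integer>());
        map.put("a", 1);
        if (map.containsKey("a")) {
            System.out.println(map.get("a"));  // prints 1, served from the cache
        }
    }
}
```

Code that never calls containsKey() right before get() can keep using the plain map and pay nothing for this; code that does can opt in. Note that the wrapper only sees mutations made through it: modifying the delegate directly would still leave a stale cache.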

Question 2

Caching is helpful in some situations and detrimental in others. Building caching into the basic map implementations would hurt performance in the situations where caching is unhelpful.

Remember that one can easily construct a wrapper around a non-caching map that caches as appropriate for a particular scenario.

Question 3

I guess it's not worth it:

- Usually, you simply don't care.
- Your "simple" caching is not trivial at all, as it needs to deal with modification and concurrency.
- In performance-critical code, you may write the ugly but fast workaround and avoid the overhead.
- In other performance-critical code, you may need to call containsKey() without a following get(), and your caching would slow it down.

You can use this snippet, which is always correct:

Value result = map.get(key);
if (result == null && !map.containsKey(key)) {
    // handle absent key
}

It performs only a single lookup unless the key is absent or mapped to null. I'd guess that in your use case this doesn't happen often.
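A runnable version of the snippet above, assuming a HashMap (which permits null values) and using String in place of the placeholder Value type; AbsentVsNullDemo and classify are illustrative names:

```java
import java.util.HashMap;
import java.util.Map;

class AbsentVsNullDemo {
    // Classifies a key with the get()-then-containsKey() idiom:
    // one lookup in the common case, a second one only when get() returned null.
    static String classify(Map<String, String> map, String key) {
        String result = map.get(key);
        if (result == null && !map.containsKey(key)) {
            return "absent";
        }
        return result == null ? "mapped to null" : "mapped to " + result;
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();  // HashMap permits null values
        map.put("a", "x");
        map.put("b", null);
        System.out.println(classify(map, "a"));  // mapped to x
        System.out.println(classify(map, "b"));  // mapped to null
        System.out.println(classify(map, "c"));  // absent
    }
}
```

If the map does not allow null values at all (ConcurrentHashMap, for example, rejects them), get() returning null already means the key is absent and the containsKey() call is never needed.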

Question 4

The main points are covered by the other answers, but I want to address this point in particular:

"This second implementation has its own problems. In addition to being less concise and readable, it's potentially incorrect, as it cannot differentiate between the case where the key does not exist and the case where the key exists but maps explicitly to null."

Something I took away from this answer (in this comment) is the following question: would you actually want to differentiate between null and an absent value?

I can't speak in general terms, but in my personal experience I have never needed to map a key explicitly to null.

A design where null is inserted into the map would, I speculate, mostly be used to indicate that a special or negative scenario has occurred. In such a case, I would probably consider the null object pattern instead: store an actual object whose method return values tell the caller that the special scenario has occurred.
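A minimal sketch of that null object approach, using a hypothetical Discount type (Discount, PercentDiscount and NullObjectDemo are made-up names for illustration): the map stores Discount.NONE instead of null, so a null from get() can only mean the key is absent, and a single lookup suffices.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain type: a discount applied to a price in cents.
interface Discount {
    long apply(long priceInCents);
    boolean isNone();

    // Null object: stored in the map instead of null to mean "explicitly no discount".
    Discount NONE = new Discount() {
        @Override public long apply(long priceInCents) { return priceInCents; }
        @Override public boolean isNone() { return true; }
    };
}

class PercentDiscount implements Discount {
    private final int percent;
    PercentDiscount(int percent) { this.percent = percent; }
    @Override public long apply(long priceInCents) { return priceInCents * (100 - percent) / 100; }
    @Override public boolean isNone() { return false; }
}

class NullObjectDemo {
    public static void main(String[] args) {
        Map<String, Discount> discounts = new HashMap<>();
        discounts.put("alice", new PercentDiscount(10));
        discounts.put("bob", Discount.NONE);  // instead of discounts.put("bob", null)

        for (String customer : new String[] {"alice", "bob", "carol"}) {
            Discount d = discounts.get(customer);  // a single lookup
            if (d == null) {
                System.out.println(customer + ": unknown customer");  // truly absent key
            } else {
                System.out.println(customer + ": " + d.apply(1000));
            }
        }
        // prints, one per line: alice: 900, bob: 1000, carol: unknown customer
    }
}
```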