Simply put, locking and unlocking many locks is more expensive than locking and unlocking a single lock. This shouldn't be surprising: doing anything N times instead of once obviously takes more time (all other things being equal). And for this kind of thing, economies of scale don't really apply, since there's no big one-time cost to amortize over all locking operations.
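To make the "N times instead of once" point concrete, here is a minimal Java sketch. It only counts lock acquisitions, it does not measure time, and `CountingLock`, `coarse` and `fine` are hypothetical names made up for this illustration (they are not part of any runtime). The assumption is an operation that touches n objects, guarded either by one global lock or by one lock per object.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockCountSketch {
    // Hypothetical wrapper that counts how often the lock is acquired.
    static final class CountingLock {
        private final ReentrantLock lock = new ReentrantLock();
        long acquisitions;

        void lock() {
            lock.lock();
            acquisitions++; // updated while holding the lock
        }

        void unlock() {
            lock.unlock();
        }
    }

    // Coarse (GIL-like): one global lock guards the whole operation over n objects.
    static long coarse(int n) {
        CountingLock global = new CountingLock();
        long[] objects = new long[n];
        global.lock();
        try {
            for (int i = 0; i < n; i++) {
                objects[i]++;
            }
        } finally {
            global.unlock();
        }
        return global.acquisitions;
    }

    // Fine-grained: every object has its own lock, taken once per access.
    static long fine(int n) {
        long[] objects = new long[n];
        long total = 0;
        for (int i = 0; i < n; i++) {
            CountingLock perObject = new CountingLock();
            perObject.lock();
            try {
                objects[i]++;
            } finally {
                perObject.unlock();
            }
            total += perObject.acquisitions;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("coarse acquisitions: " + coarse(1000));
        System.out.println("fine acquisitions:   " + fine(1000));
    }
}
```

The coarse version pays for one lock/unlock pair no matter how many objects it touches, while the fine-grained version pays for one pair per object. In a single-threaded program that extra work buys nothing.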
Edit: In principle, Java has the same problem, but due to the different focus of everyone involved, history, and perhaps other factors, Java gets by rather well with fine-grained locks. In short, single-threaded performance is not regarded as that important, and multi-threaded performance is probably better than that of a hypothetical free-threaded CPython.
Historically, I don't think there ever was a JVM with a GIL (though it started out with green threads running on a single OS thread - but this was long ago), so there are no historical reasons for keeping a GIL and no baseline single-threaded performance that makes people loathe locks. Instead, a lot of effort was put into making Java good at multi-threading, and this ability is widely used. In contrast, even if you solved the GIL issue with no performance cost for single-threaded Python or Ruby programs, most code out there wouldn't benefit from it and the libraries are... not awful, but not exactly on par with java.util.concurrent either.
Because Java (now) has a memory model which explicitly doesn't give a lot of guarantees, many common operations in Java programs don't need any kind of lock in general. The downside is, of course, that Java programmers have to add locks or other synchronization manually when it is needed. In addition, Java's locks have seen a lot of optimizations (some of which were original research and first introduced in a JVM) - thin locks, lock elision, etc. - which make locks without contention very cheap.
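As a rough illustration of "add synchronization manually when it is needed", here is a minimal Java sketch. The class and method names (`ManualSyncSketch`, `runAll`, `unsynchronizedCount` and so on) are hypothetical and exist only for this example. `synchronized` and `java.util.concurrent.atomic.AtomicLong` are standard Java. The plain increment gets no atomicity guarantee from the runtime, so updates may be lost when several threads race on it. The other two variants opt in to synchronization explicitly.

```java
import java.util.concurrent.atomic.AtomicLong;

public class ManualSyncSketch {
    // Hypothetical counter guarded by the object's built-in monitor.
    static final class LockedCounter {
        private long value;

        synchronized void increment() {
            value++;
        }

        synchronized long get() {
            return value;
        }
    }

    // Starts `threads` threads that each run `body` `perThread` times, then joins them.
    static void runAll(int threads, int perThread, Runnable body) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    body.run();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
    }

    // No synchronization: the increments race, so the result may be too small.
    static int unsynchronizedCount(int threads, int perThread) {
        int[] counter = new int[1];
        runAll(threads, perThread, () -> counter[0]++);
        return counter[0];
    }

    // Explicit lock requested by the programmer.
    static long synchronizedCount(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        runAll(threads, perThread, counter::increment);
        return counter.get();
    }

    // Lock-free alternative from java.util.concurrent.atomic.
    static long atomicCount(int threads, int perThread) {
        AtomicLong counter = new AtomicLong();
        runAll(threads, perThread, counter::incrementAndGet);
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("unsynchronized: " + unsynchronizedCount(4, 100_000));
        System.out.println("synchronized:   " + synchronizedCount(4, 100_000));
        System.out.println("atomic:         " + atomicCount(4, 100_000));
    }
}
```

The synchronized and atomic variants always end up at threads * perThread. The unsynchronized one only does so by luck, and how often it falls short depends on the JVM and the hardware.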
Another factor may be that a Java program runs almost entirely Java code (which, as I've described above, needs very little synchronization unless it's explicitly requested), with only a few calls into a runtime library. As a consequence, a free-threaded JVM could even have a global lock (or only a few coarse locks) for the JIT, the classloader, etc. without affecting most Java programs too much. In contrast, a Python program will spend a large part of its time in C code, either in the built-in modules or in third-party extension modules.