The problem is likely not with your Murmur3 hashing, but rather with the native library and how it allocates memory.
I'm not experienced with JNI calls, but they can be problematic when it comes to memory use (every such call allocates stack and heap space). You cannot be sure that the GC will be triggered when it should, because it does not see memory allocated on the native side (read the horror stories about GZIPInputStream).
You say you create 22*100 threads, each one likely allocating some stack for JNI calls, on a box with just 4 GB of memory. The machine seems quite crowded. I guess the constraint here is CPU/memory access rather than long external waits (the situation where only a few threads are really active in parallel)?
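If the work really is CPU-bound, one way to test that is to run the same lookups on a small fixed-size pool instead of one thread per task. This is a minimal sketch, assuming the lookups are independent of each other; `BoundedLookup`, `runAll` and `defaultThreads` are hypothetical names, and the `lookup` function stands in for your Murmur3/SimStrings query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs independent, CPU-bound lookups on a fixed-size pool
// instead of starting one thread per task.
final class BoundedLookup {
    private BoundedLookup() {}

    // A reasonable starting point for CPU-bound work: one thread per core.
    static int defaultThreads() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Applies 'lookup' to every input using at most 'threads' worker threads;
    // results come back in the same order as the inputs.
    static <T, R> List<R> runAll(List<T> inputs, Function<T, R> lookup, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> futures = new ArrayList<>();
            for (T input : inputs) {
                futures.add(pool.submit(() -> lookup.apply(input)));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> f : futures) {
                results.add(f.get()); // blocks until that task is done
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // lets the worker threads exit once tasks finish
        }
    }
}
```

Comparing memory use and throughput with `defaultThreads()` against your current 2200 threads should show quickly whether the thread count itself is the problem.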
What happens when you lower the number of threads radically? How is the SimStrings library meant to be used? Does it have an internal threading model that should be respected (i.e. letting only one thread make its queries at a time)?
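I'm afraid the native side may be effectively single-threaded: JNI itself allows calls from many threads, but the library behind it may not be thread-safe.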
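If it turns out SimStrings should only be queried by one thread at a time, you can enforce that on the Java side by gating every call into the library. This is a minimal sketch, assuming the native call can be wrapped in a plain `Function`; `GuardedNativeQuery` and `run` are hypothetical names, and the wrapped function stands in for the real SimStrings JNI binding, whose API is not shown here.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Hypothetical wrapper: 'query' stands in for a SimStrings JNI call.
// At most 'maxConcurrent' threads are inside the native code at once
// (1 = fully serialized).
final class GuardedNativeQuery<Q, R> {
    private final Function<Q, R> query;
    private final Semaphore permits;

    GuardedNativeQuery(Function<Q, R> query, int maxConcurrent) {
        if (maxConcurrent < 1) {
            throw new IllegalArgumentException("maxConcurrent must be >= 1");
        }
        this.query = query;
        this.permits = new Semaphore(maxConcurrent, true); // fair: waiting threads are served FIFO
    }

    R run(Q input) {
        permits.acquireUninterruptibly();
        try {
            return query.apply(input);
        } finally {
            permits.release(); // always hand the permit back, even if the call throws
        }
    }
}
```

Starting with `maxConcurrent = 1` and raising it step by step would show whether results stay correct and where memory use starts to climb.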
Read more about how native calls allocate memory.