You need to think a bit more about what you are really benchmarking with this program. I can tell you it is not Redis, but rather the ability of your system to run a ping-pong game between two processes (because all your hsetnx calls are synchronous).
Please read this page before trying to benchmark Redis; it will definitely help you.
Your assumption that the speed of Redis should approach the writing speed of RAM is somewhat naive. Redis is a remote store, and for O(1) operations, most of the overhead is due to the communication costs. For synchronous traffic (like your example), it is also due to the cost of the OS scheduler.
If you want to apply a lot of commands in sequence, you need to use pipelining. If you do not care about the sequence, you can work concurrently with several connections (this is the default mode for redis-benchmark). Or you can try to send asynchronous commands instead. In all cases, the idea is to amortize the cost of the roundtrips to the Redis server.
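To see why amortizing the roundtrips matters so much, here is a rough back-of-the-envelope model in Python. It is only a sketch: the function name `estimated_ops_per_sec` and both timing figures are assumptions chosen for illustration, not measurements of Redis or of your machine. It charges one roundtrip per batch of commands plus a fixed processing time per command, and ignores everything else (parsing, bandwidth, scheduler jitter).

```python
import math


def estimated_ops_per_sec(n_commands, round_trip_s, server_time_s, batch_size=1):
    """Estimate single-connection throughput under a simple cost model.

    Every batch of commands pays one roundtrip; every command pays the
    server processing time. batch_size=1 models fully synchronous calls.
    """
    round_trips = math.ceil(n_commands / batch_size)
    total_s = round_trips * round_trip_s + n_commands * server_time_s
    return n_commands / total_s


# Hypothetical figures, chosen only to illustrate the ratio.
ROUND_TRIP_S = 50e-6   # assumed cost of one client/server roundtrip
SERVER_TIME_S = 1e-6   # assumed cost of executing one O(1) command

sync_rate = estimated_ops_per_sec(100_000, ROUND_TRIP_S, SERVER_TIME_S)
pipelined_rate = estimated_ops_per_sec(
    100_000, ROUND_TRIP_S, SERVER_TIME_S, batch_size=100
)

print(f"synchronous: {sync_rate:,.0f} ops/s")
print(f"pipelined (batches of 100): {pipelined_rate:,.0f} ops/s")
```

With these assumed numbers the synchronous loop spends almost all of its time waiting on roundtrips, while the pipelined one is limited mostly by the per-command processing time. The real figures on your system will differ, but the shape of the result should be similar.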
With pipelining on several connections with asynchronous traffic, you will get the maximum throughput Redis can achieve on this machine.