Note there are three things here:
Getting the server to support two million connections. This is usually just tweaking the kernel such that the number of simultaneous connections are allowed and such that the context associated with each connection fit in (wired) main memory. The latter means you can't have a megabyte buffer space allocated for each connection.
Doing something with each connection. You map them into userspace in a process (Erlang in this case). Now, if each connection allocates too much data at the user space level we are back to square one. We can't do it.
Getting multiple cores to do something with the connections. This is necessary due to the sheer amount of work to be done. It is also the point where you want to avoid locking too much and so on.
You seem to be focused on 1. alone.