Domanda

suppose we have an erlang application which involves thousands of processes. Suppose there is a single resource X which may be a tuple, a list, or any erlang term, which all these processes may need to read / pick out something from it, at any moment in time.

An example of such an occurrence, is say, an API system, in which client processes may need to read and write on a remote machine. Ant it happens that you do not want, for each read/write request, a new connection to be created. So, what you do, you create a pool of connections, consider them as a pool of open pipes/sockets/channels.

Now, this pool of resources is to be shared by thousands of processes such that for each read or write demand, you want that process to retrieve any available open channel/resource.

Question is, what if i have a process (a single process) hold this information, whether in its process dictionary or in its receive loop. It would mean that all the processes would have to send a message to this process whenever they need a free resource. This single process would have a huge mailbox at any time because of the high demand for this single resource.

OR I could use an ETS Table, and have only one row, say, #resources{key=pool,value= List_of_openSockets_or_channels}. But this would mean that, all our processes would attempt to make a read from the ETS Table for the same row at (high probability) same instantaneous times.

How would the ETS Table handle, if 10,000 process atttempt a read, for the same row/record from it, at the same time/at almost same time ? and yet, if i use a process, its mailbox, if 10,000 processes send a message to it, at same time, for the same resource (and it would need to reply each requestor). And remember this action may occur so frequently. What option (dis-regarding availability issues of process going down blah blah), would provide higher throughput, in a way that, processes would get what they need faster ?

Is there any other better way, of handling high demand data structures in the Erlang VM in a way that will provide very fast access to millions of processes, even if they all needed that resource at the same time ?

È stato utile?

Soluzione

Short answer: profile. Try different approaches and verify how your system behaves.

Firstly, I would look at ETS' {read_concurrency, true} option. From the documentation:

{read_concurrency,boolean()} Performance tuning. Default is false. When set to true, the table is optimized for concurrent read operations. When this option is enabled on a runtime system with SMP support, read operations become much cheaper; especially on systems with multiple physical processors. However, switching between read and write operations becomes more expensive. You typically want to enable this option when concurrent read operations are much more frequent than write operations, or when concurrent reads and writes comes in large read and write bursts (i.e., lots of reads not interrupted by writes, and lots of writes not interrupted by reads). You typically do not want to enable this option when the common access pattern is a few read operations interleaved with a few write operations repeatedly. In this case you will get a performance degradation by enabling this option. The read_concurrency option can be combined with the write_concurrency option. You typically want to combine these when large concurrent read bursts and large concurrent write bursts are common.

Secondly, I would look at caching possibilities. Are the processes reading that information only once or multiple times? If they're accessing it multiple times, you could read it once and store it in your process state.

Thirdly, you could try to replicate and distribute that piece of information across your system. Divide et impera.

Altri suggerimenti

If you use the process approach, in order to avoid having all the read requests serialized on the message queue of the 'server' process you must replicate.

Using an ETS table with read_concurrency feels more natural and it is something that I used when developing the parallel version of Dialyzer. However, ETS access was never a bottleneck in that case.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top