Question

Our system has a structured model (about 30 different entity types with several kinds of relations) kept entirely in memory (about 10 GB) for performance reasons. On this model we have to perform 3 kinds of operations:

  1. update one or a few entities
  2. query for particular data (this usually requires reading thousands of entities)
  3. get statistical data (how much memory is used, how many queries of each kind, etc.)

Currently the architecture is a fairly standard one, with a pool of servlet threads that use a shared model. Inside the model there are a lot of concurrent collections, but there are still many waits because some entities are "hotter" than others and most threads want to read/write them. Note also that queries are usually much more CPU- and time-consuming than writes.

I'm studying the possibility of switching to a Disruptor architecture, keeping the model in a single thread and moving everything possible (validity checks, auditing, etc.) out of the model into separate consumers.
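A minimal sketch of what such a pipeline could look like with the Disruptor DSL (the event and handler names are illustrative, not from any existing code): validation and auditing run as upstream consumers on their own threads, and only the final handler ever touches the model, so the model itself stays single-threaded.

```java
import com.lmax.disruptor.EventHandler;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.util.DaemonThreadFactory;

public class ModelPipeline {
    // Mutable event reused by the ring buffer; 'command' is a stand-in
    // for the real update/query request type.
    static class ModelEvent {
        Object command;
        boolean valid;
    }

    public static void main(String[] args) {
        Disruptor<ModelEvent> disruptor = new Disruptor<>(
                ModelEvent::new, 1024, DaemonThreadFactory.INSTANCE);

        // Upstream consumers: run in parallel, never touch the model.
        EventHandler<ModelEvent> validator = (e, seq, end) -> e.valid = check(e.command);
        EventHandler<ModelEvent> auditor   = (e, seq, end) -> audit(e.command);

        // The only code that reads or writes the model: one thread, no locks.
        EventHandler<ModelEvent> model = (e, seq, end) -> {
            if (e.valid) apply(e.command);
        };

        // validator and auditor run first; the model handler runs after both.
        disruptor.handleEventsWith(validator, auditor).then(model);
        RingBuffer<ModelEvent> ring = disruptor.start();

        // Producers (e.g. the servlet threads) publish requests onto the ring.
        ring.publishEvent((event, sequence, cmd) -> event.command = cmd, "someCommand");
    }

    static boolean check(Object cmd) { return true; } // placeholder
    static void audit(Object cmd) { }                 // placeholder
    static void apply(Object cmd) { }                 // placeholder
}
```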

The first question, of course, is: does it make sense?

The second question is: ideally, write requests should take precedence over read ones. What is the best way to implement priority in the Disruptor? I was thinking about two ring buffers, and then trying to read from the high-priority one more often than from the low-priority one.
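For what it's worth, one way the two-ring idea could be sketched (the class names and bias factor are assumptions, not anything prescribed by the library): a single consumer thread owns an `EventPoller` on each ring and drains the high-priority ring several times for each pass over the low-priority one.

```java
import com.lmax.disruptor.EventPoller;
import com.lmax.disruptor.RingBuffer;

public class BiasedConsumer implements Runnable {
    static class Request { Runnable work; }   // stand-in for the real payload

    private final EventPoller<Request> writePoller;
    private final EventPoller<Request> readPoller;
    private static final int WRITE_BIAS = 4;  // drain writes 4x per read pass (tunable)

    BiasedConsumer(RingBuffer<Request> writeRing, RingBuffer<Request> readRing) {
        writePoller = writeRing.newPoller();
        writeRing.addGatingSequences(writePoller.getSequence());
        readPoller = readRing.newPoller();
        readRing.addGatingSequences(readPoller.getSequence());
    }

    @Override
    public void run() {
        EventPoller.Handler<Request> handler = (event, sequence, endOfBatch) -> {
            event.work.run();   // execute the request against the single-threaded model
            return true;        // keep draining the current batch
        };
        try {
            // Busy-spin loop for brevity; a real version would back off
            // (or use a wait strategy) when both rings are empty.
            while (!Thread.currentThread().isInterrupted()) {
                for (int i = 0; i < WRITE_BIAS; i++) {
                    writePoller.poll(handler);  // favour the high-priority ring
                }
                readPoller.poll(handler);       // then one pass over the queries
            }
        } catch (Exception e) {
            Thread.currentThread().interrupt();
        }
    }
}
```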

To clarify, the question is more architectural than about the actual code of the LMAX Disruptor.

Update with more details

The data is a complex domain, with many entities (>100k) of many different types (~20) linked to each other in a tree structure with many different collections.

Queries usually involve traversing thousands of entities to find the correct data. Updates are frequent but quite limited, say 10 entities at a time, so on the whole the data is not changing very much (about 20% per hour).

I did some preliminary tests and it appears the speed advantage of querying the model in parallel outweighs the occasional write-lock delays.


Solution

"ideally write requests should take precedence over read ones."

Why? Most fast locks, like C#'s ReaderWriterLockSlim, do the opposite: a write needs to block all reads to avoid partial reads, so such locks allow many concurrent reads and hope things get "quiet" before doing the write. (The write does run at its position in the queue, but it's very likely that many reads which came after it are processed before it acquires the lock.)
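The Java analogue is `ReentrantReadWriteLock`, which behaves the same way; a rough sketch of the pattern (the model methods are placeholders):

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SharedModel {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public long query() {
        lock.readLock().lock();        // shared: runs concurrently with other reads
        try {
            return traverseEntities(); // the long, CPU-heavy part
        } finally {
            lock.readLock().unlock();
        }
    }

    public void update() {
        lock.writeLock().lock();       // exclusive: every reader must drain first,
        try {                          // and no read can start until it releases
            mutateEntities();
        } finally {
            lock.writeLock().unlock();
        }
    }

    private long traverseEntities() { return 0; } // placeholder
    private void mutateEntities() { }             // placeholder
}
```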

Prioritizing writes is a good way to kill concurrency.

Is eventual consistency / CQRS an option?

Other suggestions

LMAX may be appropriate.

The LMAX people first implemented a traditional architecture, then they implemented actors (with queues) and found the actors spent most of their time in the queues. Then they went to the single-threaded architecture. Now, the Disruptor is not the key to the architecture; the key is a single-threaded BL. With 1 writer (a single thread) and small objects you're going to get a high cache hit rate and no contention. To do this they had to move all long-running code out of the business layer (which includes IO). And to do this they used the Disruptor, which is basically just a ring buffer with a single writer, as has been used in device-driver code for a while, but at a huge message scale.

First, I have one disagreement with this: LMAX is an actor system, where you have 1 actor for all the BL (and the disruptors connect other actors). They could have improved their actor system significantly instead of jumping to 1 actor for the BL, namely:

  1. Don't have lots of services/actors; try to have commonly used components in one service. (This comes up time and time again in SOA / distributed systems as well.)
  2. When communicating between actors, use point-to-point queues, not many-to-1 (like all the service buses!).
  3. When you have point-to-point queues, ensure the tail is a pointer to a separate memory area. With 2 and 3 you can now use lockless queues, and the queues/threads only have 1 writer (you can even use non-temporal 256-bit YMM writes into the queue). However, the system now has more threads (and, if you have done 1 correctly, a relatively small number of messages between actors). The queues are similar to disruptors: they can batch-process many entries and can use a ring-buffer style. (See the sketch below.)

With these actors you have a more modular (and hence maintainable) system, and the system could launch more actors to process the queues (note: 1 writer!).
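A rough illustration of points 2 and 3 (simplified: a real implementation would pad the cursors to avoid false sharing rather than use plain `AtomicLong`s): a single-producer/single-consumer ring where the head and tail cursors live in separate memory locations, so neither side ever needs a lock or a CAS.

```java
import java.util.concurrent.atomic.AtomicLong;

final class SpscRing<T> {
    private final Object[] buffer;
    private final int mask;
    // Head and tail are separate objects (point 3): the producer only writes
    // 'tail', the consumer only writes 'head', so they never contend.
    private final AtomicLong head = new AtomicLong();
    private final AtomicLong tail = new AtomicLong();

    SpscRing(int capacityPowerOfTwo) {
        buffer = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    boolean offer(T value) {                    // called only by the single producer
        long t = tail.get();
        if (t - head.get() == buffer.length) return false; // ring is full
        buffer[(int) t & mask] = value;
        tail.lazySet(t + 1);                    // ordered store: no lock, no CAS
        return true;
    }

    @SuppressWarnings("unchecked")
    T poll() {                                  // called only by the single consumer
        long h = head.get();
        if (h == tail.get()) return null;       // ring is empty
        T value = (T) buffer[(int) h & mask];
        buffer[(int) h & mask] = null;          // let the element be collected
        head.lazySet(h + 1);
        return value;
    }
}
```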

Re your case: I think 20% of changes in an hour is huge... Are the queries always on in-memory objects? Do you have in-memory hash tables/indexes? Can you use read-only collections? Does it matter if your data is old? E.g. eBay uses a 1-hour refresh on its items collection, so the item collection itself is static. With a static collection and static item briefs, they have a static index, and you can search and find items fast, all in memory. Every hour it gets rebuilt, and when complete (it could take minutes to rebuild) the system switches to the new data. Note the items themselves are not static.
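That rebuild-and-swap pattern can be sketched like this (class and method names are mine, not eBay's): readers always work against an immutable snapshot, while a background job rebuilds a new one and atomically publishes it.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

final class SnapshotStore {
    // Immutable view: once published, a snapshot is never mutated.
    static final class Snapshot {
        final Map<Long, String> index; // stand-in for the real static index
        Snapshot(Map<Long, String> index) { this.index = Map.copyOf(index); }
    }

    private final AtomicReference<Snapshot> current =
            new AtomicReference<>(new Snapshot(Map.of()));
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start() {
        // Rebuild hourly; readers keep using the old snapshot until the
        // new one is complete, then the swap is a single atomic store.
        scheduler.scheduleAtFixedRate(
                () -> current.set(new Snapshot(rebuild())), 1, 1, TimeUnit.HOURS);
    }

    Snapshot snapshot() { return current.get(); } // lock-free read path

    private Map<Long, String> rebuild() {
        // ... walk the live entities and build a fresh index (may take minutes)
        return Map.of();
    }
}
```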

In your case, with a huge domain, the single thread may get a lowish cache hit rate, which is different from LMAX, who have a smaller domain for each message to pass over.

An agent-based system may be the best bet, namely because a bunch of entities can be grouped and hence get a high cache hit rate. But I'd need to know more. E.g. moving validity checks, auditing, logging etc. out is probably a good plan: less code = smaller objects = higher cache hit rate, and LMAX objects were small.
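A minimal sketch of the agent idea (the partitioning scheme is an assumption and would need to fit your tree structure): hash each entity group onto one of N single-threaded executors, so every group keeps exactly one writer and a warm cache.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class EntityAgents {
    private final ExecutorService[] agents;

    EntityAgents(int n) {
        agents = new ExecutorService[n];
        for (int i = 0; i < n; i++) {
            // One thread per agent: each entity group has exactly one writer.
            agents[i] = Executors.newSingleThreadExecutor();
        }
    }

    // All operations on a given entity are serialized onto the same thread,
    // so that thread's cache stays warm for its group and no locks are needed.
    <T> CompletableFuture<T> submit(long entityId, Callable<T> op) {
        int slot = (int) Math.floorMod(entityId, (long) agents.length);
        return CompletableFuture.supplyAsync(() -> {
            try {
                return op.call();
            } catch (Exception e) {
                throw new CompletionException(e);
            }
        }, agents[slot]);
    }
}
```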

Hope this quick dump helps, but it's hard from only a glance.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow