Implemeting a distributed n-Producers/1-Consumer service for critical-mission system

https://softwareengineering.stackexchange.com/questions/414930

15-03-2021
|

문제

I am trying to implement a distributed version of multi-producer/1-consumer for a critical-mission system. I'm looking for good alternatives to the current approach based on RDBMS.

The problem

The system consists of serveral (50+) producers which continuously produce thousands of instances per second. Each instance is a well-defined, timestamped, flat structured. Each instance is stored into a single queue by the producers.

In the other side, I have a consumer which consumes the instances in a FIFO way.

Producers and the Consumer run on different machines connected by a TCP/IP private network.

For the sake of completeness, there are two strong requirements

The consumer cannot consume the same resource twice. It is an error.
Every resource must be consumed by the consumer. If a resource is missed it is a loss

In addition, the solution must run on Linux and Windows Servers.

Current approach

In the current version, the system implements this solution by using a relational database as data bus.

There is one database server which supports all producers and the consumer. Producers insert resources into a determined table and the consumer consumes the resources from that table as represented in the image above.

The database server/JDBC transaction model allows to control the inserts/deletions in order to avoid queue corruption.

That current approach works well but:

Introduces the overhead of maintaining a whole relational database server for a task where no data relationship is required;
The relational database server must to fit to critical-mission requirements what is hard to achieve on some real settings when the database server instance is not dedicate

Alternatives

Here I'm listing some alternatives to current relational database server data bus approach:

Dedicate lightweight relational database server

This seems to be the easiest approach: Using a dedicate lightweight relational database server as HSQLDB, Apache Derby or H2.

Pros

They have considerably less overhead to maintain if compared with an RDBMS such as MS SQL Server, Oracle DB Server or even MySQL. In addition, less code change and tests are required since they are basically SQL engines like the ones used in current solution.

Cons

They are relational database servers so it turns out that there exist still some level of overhead to perform a relationship-free task. Another point is the critical-mission aspect. We use Derby DB internally for ages for realtime system surpervision in both embedded and network modes. It works great, neither crash nor data corruption. However, the volume of transactions/sec for that new usage is higher.

Redis server

At first glance, Redis looks perfect for this use case. In memory, fast, no overhad for data relationship, straight-forward. Widely used as databus and reported as reliable. But not for Windows. As said in the docs, Redis on Windows isn’t recommended. The Microsoft Windows port is no longer maintained, the last release dates 2016 so attaching Redis to the system looks not promissing.

Implementing a solution from scratch

In last words, it is a producer-consumer problem. Implementing a network service using TCP or something more elegant such as Camel and using concurrent queue internally plus some local persistence engine will be time-costly, reinvent wheel but it is still a option.

These are the alternatives we are considering so far. I appreciate if someone can provide some insight or recommendation.

해결책

Looks like you're looking for a message queue. Depending on your tech stack, there are various implementations of distributed queue that might interest you, for example ZeroMQ or RabbitMQ.

Some approaches like ZeroMQ can run without having a message broker, that means the producers and consumers talks directly without needing another service or a database to orchestrate/broker the queue. Being brokerless has the advantage of being a lot simpler to manage operationally than a brokered message queues, and being simpler to understand and to scale and customise, but the primary disadvantage is that it lacks the services usually provided by brokers, so if a participant is offline, then messages may be lost. If you need the message to be reliably processed, you'll need to design your producers to be able to handle retry sending if the consumer isn't available, you'll need to add mechanism for acknowledgement off successful delivery, and the consumer needs to be designed to be idempotent (be able to detect duplicate messages and discard them). The main advantage of being brokerless is that you're free to implement as much or as little broker behaviour as your application needs, so you're not tied to a specific broker behaviour.

A brokered message queue like RabbitMQ is somewhat simpler during use, as the broker adds a persistence and messaging reliability layer to the fabric of the queue system rather than requiring producer and consumers to implement those, but it adds the complexity and overhead of managing the broker, and the broker adds latency so it may be unsuitable for scenarios where milliseconds matters or where your target scalability level exceeds what can be achieved in a brokered system.

there exist still some level of overhead to perform a relationship-free task

I'd suggest you to profile your application to actually find out whether or not that actually matters. Chances are if an in-process SQL database aren't sufficient for a non-concurrent application, it's most likely because you're using it inefficiently, rather than because of performance issues in the relationship management itself.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 softwareengineering.stackexchange