Question

I have an architecture/performance problem.

The problem (the original post illustrated it with a diagram: a load balancer in front of two web servers sharing one database):

  • One voting-based application with: 1 load balancer, 2 web servers, and a database.
  • 10 candidates and 200k voting users.
  • More than 100 users may be voting at the same time.
  • The database can only handle one write at a time and does not support transactions. If this limit is exceeded, the database immediately responds with an error code XXXXX.
  • A user should receive an immediate response that their vote was accepted.

The question: what is the best approach to avoid users getting this error while sustaining the demanded load (without changing the underlying database)?

PS: I don't have more details, it's a generic problem.

Possible Solution:

  • If it's not possible to add more resources, use both web servers, but only one will write to the DB.
  • Add an external queue to handle the requests.
  • One of the web servers will act as a "worker": it will read from the queue and write to the DB with concurrency = 1.
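The queue-plus-single-worker idea above can be sketched in a few lines of Python. This is a minimal in-process stand-in where `queue.Queue` plays the role of the external queue and a dict plays the role of the database; all names are illustrative, and a real deployment would use an external broker (RabbitMQ, SQS, etc.):

```python
import queue
import threading

votes = queue.Queue()  # stand-in for the external queue
tally = {}             # stand-in for the database

def accept_vote(user_id, candidate):
    """Web-server side: enqueue and acknowledge immediately."""
    votes.put((user_id, candidate))
    return "vote accepted"  # the user's immediate response

def worker():
    """Single worker: drains the queue, writing with concurrency = 1."""
    while True:
        user_id, candidate = votes.get()
        if user_id is None:
            break  # shutdown sentinel
        tally[candidate] = tally.get(candidate, 0) + 1  # the one-at-a-time DB write
        votes.task_done()

threading.Thread(target=worker, daemon=True).start()
accept_vote("u1", "candidate-3")
accept_vote("u2", "candidate-3")
votes.join()  # wait until the worker has drained the queue
```

Because only the worker touches the "database", the concurrency-limit error can never occur, no matter how many web-server threads call `accept_vote` at once.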

Solution

Reading the specs, the main point of your requirement is:

A user should receive an immediate response that their vote was accepted.

To achieve that, you could separate out the vote-taking part. Each of your two instances needs to talk to a component responsible for that.

The problem then is that this component becomes the bottleneck of your architecture. It would make no sense to split your traffic 50/50 across two servers that both wait on one shared component.

This makes it necessary that the component can deliver fast responses without having to wait (too long) on other components.

I) A proper way to achieve fast responses would be using (as you already said) a message queue.

The vote-taker hands the incoming vote request off to a queue and can immediately answer the client that its vote has been received.

Mission accomplished.

There is no need to switch databases.

II) Depending on your demands, you could use an in-memory key-value store like Redis or Hazelcast to take the votes. How you transfer the data to your DB afterwards is secondary.
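With Redis, for example, each vote can be a single atomic `INCR` on a per-candidate counter, so concurrent voters never conflict. Below is a sketch of that atomicity using an in-memory stand-in (a real deployment would call redis-py's `incr` against a live server; the `CounterStore` class here is purely illustrative):

```python
import threading

class CounterStore:
    """In-memory stand-in for Redis: incr() is atomic, so concurrent
    voters never conflict the way concurrent DB writes would."""
    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def incr(self, key):  # mirrors redis.Redis().incr(key)
        with self._lock:
            self._data[key] = self._data.get(key, 0) + 1
            return self._data[key]

store = CounterStore()

def cast_vote(candidate):
    return store.incr(f"votes:{candidate}")

# 100 concurrent voters, matching the stated load
threads = [threading.Thread(target=cast_vote, args=("candidate-1",))
           for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```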

III) Having introduced Hazelcast, there is also the possibility of using its locking mechanism to ensure only one component writes to the DB.

IV) Depending on your real-time demands, LMAX may be something you are looking for (discussed on SE-Radio).

There are many possibilities.

OTHER TIPS

The database can only handle one write at a time and does not support transactions. If this limit is exceeded, the database immediately responds with an error code XXXXX.

This is extremely bizarre for any modern database, but let's accept the premise.

At a high-level, there are 2 approaches to concurrent data modifications:

  1. Pessimistic - This is lock-based, which will acquire a lock on a resource before modifying it, and release the lock afterwards. If another process has that resource locked, requests to acquire a lock on that same resource will block. This basically serializes updates to this resource. I am intentionally being vague when I say resource, because it is somewhat ambiguous. Simplistically, a resource is usually a row in a database table, but it can also be pages/blocks, ranges, or entire tables (the specifics of this vary from one DBMS to another).

  2. Optimistic - Locks are never acquired, but when updating data there is an implied if-it-has-not-changed-since-I-read-it stipulation, and if that stipulation is violated, an error is thrown. (Sometimes this is implemented by literally comparing old values to new values, or by comparing a hash code, version number, etc. The details are irrelevant at this level.)
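Optimistic concurrency, as described in point 2, can be sketched with a version number per row. This is a toy model: the row store and `StaleWriteError` name are assumptions, with the error playing the role of the question's "error code XXXXX":

```python
class StaleWriteError(Exception):
    """Raised when the row changed since it was read."""

# Toy row store: key -> (value, version)
rows = {"votes:candidate-1": (0, 0)}

def read(key):
    return rows[key]  # returns (value, version)

def optimistic_update(key, new_value, expected_version):
    """Write only if the version still matches what the caller read."""
    value, version = rows[key]
    if version != expected_version:
        raise StaleWriteError(f"version is {version}, expected {expected_version}")
    rows[key] = (new_value, version + 1)

# The first writer succeeds and bumps the version...
value, version = read("votes:candidate-1")
optimistic_update("votes:candidate-1", value + 1, version)
# ...so any second writer still holding the old version will fail.
```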

Problem analysis

Since you are getting an error on concurrent inserts, I suspect you either have the latter, an optimistic concurrency strategy, in place, or you have some wonky DBMS that simply does not support concurrent updates whatsoever. If you are using optimistic concurrency then, however it was set up, this is a poor use case for it. Optimistic concurrency is, well, optimistic: it assumes that updates and overlapping changes are rare, so it optimizes for the common case (very efficient reading) at the expense of rare cases (conflicting changes). If you are hitting a bottleneck where inserts regularly fail due to such conflicts, then your read-write ratio is better suited to pessimistic locking.

Long story short, whatever the reason is, the solution is the same: these inserts need to be serialized. The other answers to this question suggest message queues, dropping to a single server (which, by the way, I don't think will really solve the problem...), using Hazelcast for cross-server locks, etc. They are all variations of the exact same thing: serialize the inserts!

Recommended solution

The fact is, any ACID-compliant DBMS will handle this for you automatically. If you are using an ACID-compliant DBMS but with optimistic concurrency (some sort of version field, probably?), take that out. Switch to plain insert/update/delete without any versioning and see if that solves your problem.

If you are not using an ACID-compliant DBMS, then my recommendation is to switch to a better database. Yes, you can solve this through other means (like Hazelcast, a message queue, etc.), but they either add single points of failure or re-invent database locking. Frankly, that scares me. Proper database locking is an extremely complicated subject, designed by armies of engineers with PhDs. I would never want to re-invent it myself; that would be a waste of my time. It would be cheaper to change databases.

Beyond that

For questions like this, it would be good to specify the database software you are using. I am skeptical that your database really has this limitation. It's like a highway which only allows one car at a time.

There is no point introducing a cluster of app servers fronted by a load balancer, if all are pointed at a common "database" which is not designed to handle concurrency. I wouldn't call it a database at all because that implies it handles concurrency as one expects from a modern DBMS.

Since the requirement is you cannot change that piece of the architecture, then you have to add functionality to the app servers to handle the concurrency. My assumption is when we say "voting" that means whatever is being voted on, each vote is not dependent on how others are voting. For example, when a person casts a vote for X, they do that regardless of how many other people are or are not voting for X. If this assumption is not true, then there is a lot more work needed.

If the votes are cast without the above-mentioned dependency, then it is fairly simple. Within the web app you make sure it can handle the number of sessions you anticipate (100), and the job of each session is to log in the user, accept their vote, and pass the vote to the back end. The web app session needs to handle the possibility that the error will come back, and simply retry until successful. It doesn't matter in what order or at what specific time the votes come in; it only matters that each is eventually received and counted. Of course, each session and its votes need to be logged so that the votes captured in the end can be reconciled against the votes submitted by the web app sessions.
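The retry-until-successful loop described above could look like the sketch below. The flaky `write_vote_to_db` simulates the database's one-writer-at-a-time error, and the error name, attempt limit, and backoff parameter are all assumptions:

```python
import random
import time

class ConcurrentWriteError(Exception):
    """Stands in for the database's 'error code XXXXX' on a concurrent write."""

def write_vote_to_db(vote):
    # Hypothetical flaky backend: fails when another write is in flight.
    if random.random() < 0.5:
        raise ConcurrentWriteError()
    return True

def submit_vote(vote, max_attempts=50, backoff_s=0.0):
    """Retry until the single-writer database accepts the vote."""
    for attempt in range(1, max_attempts + 1):
        try:
            write_vote_to_db(vote)
            return attempt  # number of tries it took
        except ConcurrentWriteError:
            time.sleep(backoff_s * attempt)  # linear backoff between retries
    raise RuntimeError("vote could not be recorded")
```

A small backoff between retries keeps 100 concurrent sessions from hammering the database in lockstep; the exact value would be tuned to how long a single write takes.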

If the assumption I mention is not true, and how people vote does depend on the other votes, then the web app needs to handle inter-session communication and locking. It should not take a tremendous amount of time to take each vote and write it to the back end, but the web app needs to ensure only one session is writing while all other sessions are blocked, and to make certain that all sessions have "seen" the vote just cast before allowing any other session to proceed. There are many ways to do this, but I'd be tempted to use a lightweight embedded DB which supports transactions, logging, and concurrent writing, and then transfer all the votes to the back end upon completion of the voting.

I would not look to message queuing, because if each vote might change based on the other votes, message queuing won't give you that. It is just a spooler that guarantees each vote is taken in and eventually processed. I don't think you need that level of complexity for taking in simple votes, because it shouldn't take a significant amount of time to write each vote to the back end, and handling the error and retrying will suffice.

The database can only handle one write at a time and does not support transactions

The lack of transactions is the main issue. I would suggest:

  1. If possible, migrate to a different RDBMS capable of transactions, e.g. PostgreSQL. You really need ACID properties (or at least an RDBMS capable of locking), so that is the only honest and professional solution.

  2. Otherwise, "simulate" a transaction or serialization mechanism. If your code runs on Linux, you might wrap the code performing the write in a shared semaphore (see sem_overview(7)) or a file lock (see lockf(3) or flock(2)) on the same file (if both web servers run on the same computer), or have some other process provide the synchronization.

  3. Have some way to detect which web requests would write to the database, and have the load balancer redirect all of them to the same web server (e.g. the left one in your figure).

  4. Remove the load balancer and one of the web servers, so that a single web server talks to the DBMS.
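Option 2 could be sketched like this on Linux, serializing writers across processes with an advisory `flock()` on a shared file (the lock-file path and function names are assumptions; this only works if both web servers run on the same machine):

```python
import fcntl
import os

LOCK_PATH = "/tmp/votes.lock"  # assumed path, shared by both web server processes

def write_vote_serialized(vote, do_write):
    """Hold an exclusive flock() for the duration of the single DB write.
    Any other process calling this blocks until the lock is released,
    so the database only ever sees one write at a time."""
    fd = os.open(LOCK_PATH, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks while another process holds it
        return do_write(vote)            # the one-at-a-time DB write
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)
```

Because the lock is advisory, every code path that writes to the database must go through this wrapper, or the serialization guarantee is lost.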

Be aware that the lack of transactions (or at least of locks) in the RDBMS will significantly slow down the website (assuming that the cost of ACIDity in the RDBMS is negligible); basically, solutions 2 and 3 are equivalent to solution 4: disabling the load balancing and one of the web servers.

Just like the other responders, I would really like to know what database this is.

To answer your question, I would use a distributed lock manager (DLM) to ensure only one process can write to this database. ZooKeeper is one of the most popular implementations at the time I write this, and Redis's Redlock looks very promising.
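The single-node locking pattern at the heart of Redlock is `SET key token NX PX ttl`, released only by the token holder. A sketch of that acquire/release cycle, run here against an in-memory stand-in rather than a live Redis server (`FakeRedis` and the function names are illustrative):

```python
import time
import uuid

class FakeRedis:
    """In-memory stand-in for a single Redis node (no server needed here)."""
    def __init__(self):
        self._data = {}
    def set(self, key, value, nx=False, px=None):
        if nx and key in self._data:
            return None  # key exists: NX write refused, as in Redis
        expiry = None if px is None else time.time() + px / 1000
        self._data[key] = (value, expiry)
        return True
    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expiry = entry
        if expiry is not None and time.time() > expiry:
            del self._data[key]  # lazily expire, roughly as Redis does
            return None
        return value
    def delete(self, key):
        self._data.pop(key, None)

def acquire_write_lock(r, name="db-writer", ttl_ms=5000):
    """SET name token NX PX ttl: only one process can hold the lock,
    and a crashed holder loses it after the TTL instead of forever."""
    token = str(uuid.uuid4())
    return token if r.set(name, token, nx=True, px=ttl_ms) else None

def release_write_lock(r, token, name="db-writer"):
    if r.get(name) == token:  # only the current holder may release
        r.delete(name)
```

The TTL is what distinguishes this from a plain mutex: if the writing process dies mid-write, the lock frees itself instead of wedging the whole voting system.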

Licensed under: CC-BY-SA with attribution