Question

Suppose we have a DB with a table holding

Person:
id
first name
last name 
age

The DB is a black box, and there are a lot of GET and POST requests for these records. We cannot change the DB; we can only call its API, and the API is very slow. How can we come up with a design to reduce the impact of slow DB access?

My initial thought is to use an LRU cache for GET requests, and a message queue so POST/PUT requests are processed asynchronously.

Second thought: maybe we can use a Bloom filter for the GET requests, keyed on first and last name combined, together with the LRU cache. The POST/PUT requests would still be processed asynchronously.

Is this the right approach? Please share your thoughts.


Solution

If possible, start simple (or spend more time analysing the actual usage patterns, e.g. read vs. write ratios). It might be that you can put a simple cache on the front to help with read performance alone:

  1. A read pulls from the cache; if the record is not cached, it is loaded from the DB into the cache first.
  2. A write goes through to the DB directly, but also clears the cache entry.

This approach is simplistic and the devil may be in the detail, but if it is appropriate, it might be the best result for the least effort. A Bloom filter is the opposite of the simplistic approach, but it could be used in conjunction with the cache.
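The two steps above can be sketched as a small read-through LRU cache, here in Python. `db_get`/`db_put` stand in for whatever slow black-box API calls you actually have; those names, the class name, and the `capacity` default are my assumptions, not part of the original design:

```python
from collections import OrderedDict

class ReadThroughCache:
    """LRU read-through cache; writes go straight to the DB and
    invalidate the cached entry (step 2 above)."""

    def __init__(self, db_get, db_put, capacity=1024):
        self.db_get = db_get          # slow black-box API read
        self.db_put = db_put          # slow black-box API write
        self.capacity = capacity
        self.cache = OrderedDict()    # id -> record, in LRU order

    def get(self, person_id):
        if person_id in self.cache:
            self.cache.move_to_end(person_id)      # mark as recently used
            return self.cache[person_id]
        record = self.db_get(person_id)            # miss: hit the slow DB
        self.cache[person_id] = record
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)         # evict least recently used
        return record

    def put(self, person_id, record):
        self.db_put(person_id, record)             # write through to the DB
        self.cache.pop(person_id, None)            # invalidate stale entry
```

Invalidating rather than updating the entry on a write keeps the cache trivially consistent with the DB, at the cost of one extra read on the next access to that record.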

Other tips

Fixing the database should be a priority.

Thoughts: Most data access follows the 80:20 rule: 80% of the requests use 20% of the data. And within that 20% the same applies: 80% of its accesses use only 20% of its data.

So caching can help. But now you have cache coherency issues. Does user A have a different cache from user B? Sometimes you don't care. If a database is read only, then separate caches are ok.

You also need to know what the correlation is between users. If both User A and User B want fiscal report 2019, quarter 3 data then even for read access you want a single cache. So caching may be best at the department or division level, not the worker level.

Write operations are a different kettle of fish. There is a lot of arcane magic in database programming and in the design of the database itself to handle multiple people wanting to write the same data. Doing this in your front end is the same order of difficulty as writing a database from scratch, especially as you don't know the mechanisms in the black box.

If not having perfectly up-to-date reports is tolerable, then you can write transactions to a log file and have a separate process commit them to the database. But now you have the issue of what happens when both UserA and UserB have pulled a copy of record 1234 and each modified one field in it. Same field? Different fields? The actual database is usually designed with some form of record locking:

UserA grabs record 1234. UserB also grabs it. UserB wants to make a modification. The database locks the record. UserA's client is notified that record 1234 is locked. B's modification is written to the database. A is given the updated record. If A tried to modify the record while B had it locked, A would receive a 'locked' error.
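Reduced to a toy sketch, that locking flow might look like this (all names here are illustrative; a real database's locking is far more involved):

```python
class RecordLockedError(Exception):
    pass

class LockingStore:
    """Toy illustration of the record-locking flow described above."""

    def __init__(self):
        self.records = {}   # record_id -> data
        self.locks = {}     # record_id -> user currently holding the lock

    def begin_modify(self, user, record_id):
        holder = self.locks.get(record_id)
        if holder is not None and holder != user:
            # another user has the record locked: raise the 'locked' error
            raise RecordLockedError(f"record {record_id} is locked by {holder}")
        self.locks[record_id] = user          # database locks the record

    def commit(self, user, record_id, data):
        assert self.locks.get(record_id) == user
        self.records[record_id] = data        # modification is written
        del self.locks[record_id]             # lock released; others see the update
```

Clients that re-read a record after a failed `begin_modify` get the committed version, which is the "A is given the updated record" step above.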

Not too bad when users are squabbling over an individual record. But what happens when User B is doing a bulk update? Imagine a photo database where User B wants to change the spelling of a keyword, or apply a copyright notice to 5 million image records.

You then have to consider relative speed. At one point in the Bad Old Days we had a box with some 256 MB of RAM in it. Our desktop machines had 4 MB. It worked out that the network was faster than disk, so for small data accesses (under 4K) it was faster to set up the big box as a memory server.

I had a similar problem with a Java application. It used a DB with super-slow access that I could do nothing about, and that slowness basically made the application unusable if the queries were synchronous.

I solved it by creating a cache: all reads went to the cache, and when the cache did not have the element in question, it was loaded from the DB and returned.

When a write was made, the action was added to a queue. A worker thread ran a loop (while the queue was not empty, perform the next operation). If the queue became empty, the thread was suspended and woken again when something was added to the queue. If the write was a deletion, all other pending writes on that element were removed from the queue, with the delete taking the position of the first of them (this could save a bit of I/O).
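That queue-and-worker scheme might look roughly like this (a sketch under my own naming; the original was Java, and the delete-coalescing here is simplified):

```python
import threading

class AsyncWriter:
    """Writes are queued and flushed to the slow DB by a background thread.
    A delete drops other pending writes for the same element and takes
    the position of the first of them."""

    def __init__(self, db_apply):
        self.db_apply = db_apply          # the slow black-box write call
        self.pending = []                 # list of (op, element_id, data)
        self.cond = threading.Condition()
        threading.Thread(target=self._worker, daemon=True).start()

    def submit(self, op, element_id, data=None):
        with self.cond:
            if op == "delete":
                pos, keep = len(self.pending), []
                for i, item in enumerate(self.pending):
                    if item[1] == element_id:
                        pos = min(pos, i)          # remember first dropped slot
                    else:
                        keep.append(item)
                keep.insert(min(pos, len(keep)), ("delete", element_id, None))
                self.pending = keep
            else:
                self.pending.append((op, element_id, data))
            self.cond.notify()            # wake the worker if it is suspended

    def _worker(self):
        while True:
            with self.cond:
                while not self.pending:
                    self.cond.wait()      # queue empty: suspend the thread
                op, element_id, data = self.pending.pop(0)
            self.db_apply(op, element_id, data)    # perform the slow operation
```

Reads would still go through the cache; the GUI can show the change immediately and revert with a warning if `db_apply` eventually fails, as described below.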

Changes made from the GUI were shown instantly, although if the operation then failed, the changes were reverted and a small warning about connection problems was shown.

This worked very well, however:

  • It was a single-user program, so only one client would ever be running.
  • The program assumed a direct DB connection.
  • The implementation used a singleton, which was probably ill-advised.

A web API might not be the best-suited candidate; perhaps a TCP service would fit better. The thing is: you want an ongoing connection, because every client needs to know when one of its models has a value updated.

Licensed under: CC-BY-SA with attribution