Best practice for inserting and querying data from memory

https://stackoverflow.com/questions/3178628

02-10-2019
|

Question

We have an application that takes real time data and inserts it into database. it is online for 4.5 hours a day. We insert data second by second in 17 tables. The user at any time may query any table for the latest second data and some record in the history...

Handling the feed and insertion is done using a C# console application...

Handling user requests is done through a WCF service...

We figured out that insertion is our bottleneck; most of the time is taken there. We invested a lot of time trying to finetune the tables and indecies yet the results were not satisfactory

Assuming that we have suffecient memory, what is the best practice to insert data into memory instead of having database. Currently we are using datatables that are updated and inserted every second A colleague of ours suggested another WCF service instead of database between the feed-handler and the WCF user-requests-handler. The WCF mid-layer is supposed to be TCP-based and it keeps the data in its own memory. One may say that the feed handler might deal with user-requests instead of having a middle layer between 2 processes, but we want to seperate things so if the feed-handler crashes we want to still be able to provide the user with the current records

We are limited in time, and we want to move everything to memory in short period. Is having a WCF in the middle of 2 processes a bad thing to do? I know that the requests add some overhead, but all of these 3 process(feed-handler, In memory database (WCF), user-request-handler(WCF) are going to be on the same machine and bandwidth will not be that much of an issue.

Please assist!

Solution

I would look into creating a cache of the data (such that you can also reduce database selects), and invalidate data in the cache once it has been written to the database. This way, you can batch up calls to do a larger insert instead of many smaller ones, but keep the data in-memory such that the readers can read it. Actually, if you know when the data goes stale, you can avoid reading the database entirely and use it just as a backing store - this way, database performance will only affect how large your cache gets.

Invalidating data in the cache will either be based on whether its written to the database or its gone stale, which ever comes last, not first.

The cache layer doesn't need to be complicated, however it should be multi-threaded to host the data and also save it in the background. This layer would sit just behind the WCF service, the connection medium, and the WCF service should be improved to contain the logic of the console app + the batching idea. Then the console app can just connect to WCF and throw results at it.

Update: the only other thing to say is invest in a profiler to see if you are introducing any performance issues in code that are being masked. Also, profile your database. You mention you need fast inserts and selects - unfortunately, they usually trade-off against each other...

OTHER TIPS

What kind of database are you using? MySQL has a storage engine MEMORY which would seem to be suited to this sort of thing.

Are you using DataTable with DataAdapter? If so, I would recommend that you drop them completely. Insert your records directly using DBCommand. When users request reports, read data using DataReader, or populate DataTable objects using DataTable.Load (IDataReader).

Storying data in memory has the risk of losing data in case of crashes or power failures.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow