Question

I have an interesting database problem. I have a DB that is 150GB in size. My memory buffer is 8GB.

Most of my data is rarely retrieved, or is mainly retrieved by backend processes. I would very much prefer to keep it around because some features require it.

Some of it (namely some tables, and some identifiable parts of certain tables) is used very often in a user-facing manner.

How can I make sure that the latter is always kept in memory? (There is more than enough space for it.)

More info: We are on Ruby on Rails. The database is MySQL, and our tables are stored using InnoDB. We are sharding the data across 2 partitions. Because we are sharding it, we store most of our data as JSON blobs, while indexing only the primary keys.

Update 2: The tricky thing is that the data is actually used by both backend processes and user-facing features, but it is accessed far less often by the latter.

Update 3: Some people are commenting that 8GB is a toy these days. I agree, but just increasing the size of the db is pure LAZINESS if there is a smarter, more efficient solution.


Solution

With MySQL, proper use of the Query Cache will keep the results of frequently run queries in memory. You can hint to MySQL not to cache certain queries (e.g. those issued by the backend processes) with the SQL_NO_CACHE keyword.
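
For example, a backend query can opt out of the query cache like so. This is a minimal sketch assuming a Rails app with a hypothetical Order model; note that the query cache (and SQL_NO_CACHE) exists only in MySQL versions before 8.0:

    # Hypothetical backend reporting query. SQL_NO_CACHE tells MySQL not to
    # store this result set in the query cache, leaving that memory for the
    # user-facing queries you do want cached.
    old_orders = Order.find_by_sql(<<-SQL)
      SELECT SQL_NO_CACHE id, payload
      FROM orders
      WHERE created_at < NOW() - INTERVAL 1 YEAR
    SQL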

If the backend processes are accessing historical data, or accessing data for reporting purposes, certainly follow S. Lott's suggestion to create a separate data warehouse and query that instead. If a data warehouse is too much to accomplish in the short term, you can replicate your transactional database to a different server and run those queries there (a data warehouse gives you MUCH more flexibility and capability, so go down that path if possible).

UPDATE 2:

I confirmed with MySQL support that there is no mechanism to selectively cache certain tables or indexes in the InnoDB buffer pool.

OTHER TIPS

This is why we have Data Warehouses. Separate the two things into either (a) separate databases or (b) separate schema within one database.

  1. Data that is current, for immediate access, being updated.

  2. Data that is historical fact, for analysis, not being updated.

150GB is not very big, and a single database can handle your little bit of live data and your big bit of history.

Use a "periodic" ETL process to get things out of the active database, denormalize them into a star schema, and load them into the historical data warehouse.
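
A minimal sketch of such a periodic job, assuming a Rails app where Order is the active table and ArchivedOrder points at the warehouse schema (all model and column names here are assumptions):

    # Hypothetical nightly ETL rake task: copy rows older than 90 days from
    # the active orders table into a denormalized warehouse table, then
    # delete them from the active database.
    namespace :etl do
      desc "Move historical orders into the data warehouse"
      task :archive_orders => :environment do
        Order.where("updated_at < ?", 90.days.ago).find_in_batches do |batch|
          batch.each do |order|
            ArchivedOrder.create!(
              :order_id    => order.id,
              :customer_id => order.customer_id,
              :payload     => order.payload,   # the JSON blob, copied verbatim
              :archived_on => Date.today
            )
          end
          Order.where(:id => batch.map(&:id)).delete_all
        end
      end
    end

Run it from cron or a scheduler during a quiet window; shrinking the active tables also means more of the hot data fits in the 8GB buffer pool.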

If the number of columns used in the customer-facing tables is small, you can create indexes containing all the columns used in the queries. This doesn't mean all of the data stays in memory, but it can make the queries much faster; it's trading space for response time.
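
For instance, as a Rails migration (a sketch; the users table and the column names are assumptions), a multi-column index like this lets MySQL answer SELECT status, display_name FROM users WHERE id = ? from the index alone:

    # Hypothetical covering index: every column the user-facing query reads
    # is in the index, so InnoDB never has to fetch the full row (and its
    # large JSON blob) to answer the query.
    class AddCoveringIndexToUsers < ActiveRecord::Migration
      def self.up
        add_index :users, [:id, :status, :display_name],
                  :name => "idx_users_covering"
      end

      def self.down
        remove_index :users, :name => "idx_users_covering"
      end
    end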

This calls for memcached! I'd recommend using cache-money, a great ActiveRecord write-through caching library. The ngmoco branch has support for enabling caching per-model, so you could only cache those things you knew you wanted to keep in memory.

You could also do the caching by hand using $cache.set/get/expire calls in controller actions or model hooks.
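
A rough sketch of that hand-rolled approach, using Rails.cache (backed by memcached) in place of a raw $cache client; the Profile model and the one-hour TTL are assumptions:

    # Read-through caching in a controller action: serve the hot row from
    # memcached and only hit MySQL on a cache miss.
    class ProfilesController < ApplicationController
      def show
        @profile = Rails.cache.fetch("profile/#{params[:id]}", :expires_in => 1.hour) do
          Profile.find(params[:id])
        end
      end
    end

    # Expire the entry from a model hook whenever the row changes.
    class Profile < ActiveRecord::Base
      after_save { |profile| Rails.cache.delete("profile/#{profile.id}") }
    end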

So, what is the problem?

First, 150GB is not very large today. It was 10 years ago.

Second, any non-total-crap database system will use your memory as a cache. If the cache is big enough (compared to the amount of data in use), it will be effective. If not, the only thing you CAN do is get more memory (because, sorry, 8GB of memory is VERY low for a modern server; it was low 2 years ago).

You should not have to do anything for the memory to be used efficiently. At least not on a commercial-level database; maybe MySQL sucks, but I would not assume this.

Licensed under: CC-BY-SA with attribution