Question

I am reading a book on databases, and a chapter in it covers the buffer manager and its replacement strategies.

I've noticed that the DBMS interacts directly with memory, as opposed to asking the OS to work through virtual memory. Is that correct? How exactly does this direct communication take place? How much memory is the DBMS allowed to allocate? I've never been exposed to such low-level function calls, and I'd love to see some explanations and examples.

This kind of info is hard to find on Google if you don't know the correct keywords to search for.


Solution

Let's talk about MySQL/MariaDB with ENGINE=InnoDB only. (That is virtually the only Engine in use today.)

  • All InnoDB activity happens in the buffer_pool.
  • All data and all secondary indexes are composed of 16KB blocks.
  • Those blocks are brought into the buffer_pool as needed, and flushed out as needed.
  • It uses an LRU (Least Recently Used) algorithm for deciding which blocks to bump out. (There are minor tweaks to LRU, so it is not exactly LRU; a toy sketch follows this list.)
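Here is a minimal, hypothetical sketch of the eviction idea in Python; the names are invented and it is nothing like InnoDB's actual code. The real buffer pool uses a midpoint-insertion variant of LRU (young/old sublists), but the principle is the same: the least recently touched block is the first candidate to be bumped out.

```python
from collections import OrderedDict

class LRUBufferPool:
    """Toy LRU cache of fixed-size blocks (not InnoDB's actual code)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()            # block_id -> 16KB of bytes

    def get(self, block_id, read_from_disk):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # touched: now most recently used
            return self.blocks[block_id]
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)    # bump out the least recently used
        data = read_from_disk(block_id)        # cache miss: do the I/O
        self.blocks[block_id] = data
        return data

# pool = LRUBufferPool(capacity_blocks=1024)
# pool.get(42, read_from_disk=lambda bid: b"\x00" * 16384)
```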

Hence, it is very important to give InnoDB most of the RAM, then let it do its thing. A simple rule of thumb: innodb_buffer_pool_size should be set to about 70% of RAM. For most applications, that is the most important setting and the only one that needs to be changed from the default.
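As a rough illustration of that 70% rule, here is a small hypothetical helper (not part of MySQL). It assumes the documented constraint that the pool size ends up a multiple of innodb_buffer_pool_chunk_size (default 128MB) times innodb_buffer_pool_instances, and rounds down accordingly.

```python
def suggested_buffer_pool_size(total_ram_bytes,
                               chunk_bytes=128 * 1024 ** 2,   # default chunk size
                               instances=1,
                               fraction=0.70):
    """Rough starting point for innodb_buffer_pool_size (the ~70% rule)."""
    target = int(total_ram_bytes * fraction)
    step = chunk_bytes * instances          # the server rounds to this multiple
    return max(step, (target // step) * step)

# e.g. on a 16GB server:
print(suggested_buffer_pool_size(16 * 1024 ** 3) / 1024 ** 2, "MB")  # ~11392 MB
```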

Setting the buffer_pool too large can lead to swapping, and since InnoDB treats it as one big array, swapping it hurts badly, slowing MySQL/MariaDB down severely.

InnoDB has threads for reading and writing blocks. In particular, flushing to disk is largely done "in the background". This allows an INSERT to appear to be faster than it really is.
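The sketch below shows, in a deliberately simplified and hypothetical form, why the write can appear cheap: the foreground path only modifies the cached block and queues it, while a background thread does the slow flush later. (The real engine also writes a redo log record for durability, which this toy code ignores.)

```python
import threading, queue, time

buffer_pool = {7: []}                 # block_id -> cached page contents (toy)
dirty_blocks = queue.Queue()          # block ids waiting to be flushed

def flusher():
    """Background thread: trickles dirty blocks out to 'disk'."""
    while True:
        block_id = dirty_blocks.get()
        time.sleep(0.01)              # stand-in for the slow disk write
        dirty_blocks.task_done()

def insert_row(block_id, row):
    """Foreground path: modify the cached block and return immediately."""
    buffer_pool[block_id].append(row) # change the in-memory 16KB block
    dirty_blocks.put(block_id)        # leave the disk write to the flusher

threading.Thread(target=flusher, daemon=True).start()
insert_row(7, ("alice", 42))          # returns without waiting for the disk
dirty_blocks.join()                   # (only this demo waits for the flush)
```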

I discuss memory allocation further here.

Using "virtual memory" for accessing a file is a lazy way to do things. It works OK for simple cases. However, a database engine needs to be efficient and fast. When the code knows the access pattern, it is best to use an access method that is tuned for such.

For example, if you are going to read sequentially through a file, it is best (for memory management) to ping-pong between two 'small' buffers; one being read into and one being parsed. (A single "circular" buffer is a good alternative.)
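A hedged sketch of that idea, using a one-slot queue so a reader thread can fill the next buffer while the main thread parses the current one; the file name and parse callback are placeholders.

```python
import threading, queue

CHUNK = 64 * 1024                      # one 'small' buffer

def read_ahead(path, handoff):
    """Producer: fills the next buffer while the consumer parses the last."""
    with open(path, "rb") as f:
        while True:
            buf = f.read(CHUNK)
            handoff.put(buf)           # blocks if the consumer falls behind
            if not buf:                # empty read signals end of file
                return

def parse_file(path, parse):
    handoff = queue.Queue(maxsize=1)   # at most one filled buffer in flight
    threading.Thread(target=read_ahead, args=(path, handoff),
                     daemon=True).start()
    while True:
        buf = handoff.get()            # take the buffer the reader finished
        if not buf:
            break
        parse(buf)                     # reader is already filling the next one

# parse_file("some_big_file.dat", lambda chunk: None)   # placeholder names
```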

How exactly does this direct communication take place?

Because everything is in 16KB blocks, there is a number that uniquely identifies each block. It is rather large (20? bytes) and includes at least the "tablespace" number and the block number within that tablespace. This identifier is taken modulo the number of buffer_pool_instances to decide which instance to look in.
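In sketch form (the exact key layout and hash InnoDB uses are not shown; this only illustrates the modulo routing):

```python
def pool_instance(space_id, page_no, n_instances):
    """Route a block to one buffer pool instance via a simple modulo."""
    return hash((space_id, page_no)) % n_instances

# Every lookup of the same block lands in the same instance:
print(pool_instance(space_id=5, page_no=1234, n_instances=8))
```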

Within a block, there are rows of the table, or rows of a secondary index, or entries of a node in the B+Tree (see Wikipedia) that it represents.
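A toy view of what such a page can hold; the field names are invented for illustration and this is not InnoDB's on-disk format.

```python
from dataclasses import dataclass, field
from typing import List

PAGE_SIZE = 16 * 1024                  # InnoDB's default block size

@dataclass
class BTreePage:
    """Toy view of one 16KB index page (not InnoDB's on-disk layout)."""
    page_no: int
    is_leaf: bool
    keys: List[bytes] = field(default_factory=list)
    records: List[bytes] = field(default_factory=list)  # leaf pages: the rows
    children: List[int] = field(default_factory=list)   # internal pages: child page numbers
```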

If the block is not in the buffer_pool, the query must wait while a read occurs. But before the read can occur, there might need to be a write to flush out some block. Normally, background threads flush "dirty" blocks so that there are always some free blocks in the buffer_pool. ("Free" or "easily removed" because they are not dirty.)
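Extending the earlier toy LRU with a dirty flag shows that miss path: take a free frame if one exists, otherwise evict the least recently used block, flushing it first if it is dirty, and only then read the wanted block. (read_page and write_page are placeholders for the actual I/O.)

```python
from collections import OrderedDict

pool = OrderedDict()                   # block_id -> {"data": ..., "dirty": bool}
CAPACITY = 4                           # tiny pool, just for illustration

def fetch_block(block_id, read_page, write_page):
    """Hit, or evict (flushing a dirty victim first), then read."""
    if block_id in pool:
        pool.move_to_end(block_id)                   # hit: just touch it
        return pool[block_id]["data"]
    if len(pool) >= CAPACITY:                        # no free frame left
        victim_id, victim = pool.popitem(last=False) # least recently used
        if victim["dirty"]:
            write_page(victim_id, victim["data"])    # must flush before reuse
    data = read_page(block_id)                       # the caller waits here
    pool[block_id] = {"data": data, "dirty": False}
    return data
```

In the real engine, the background flushers normally keep enough clean blocks available that the foreground read rarely has to pay for that extra write.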

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange