Question

I am working on MySQL 5.7 with the default configuration of the InnoDB storage engine. I have two questions, one for each of the scenarios below, about how MySQL internally sends data to the client.

Scenario 1:

If there is a SELECT query on a very large data set (say 1 GB), does MySQL pull all of the data from disk (from the .ibd files) into the InnoDB buffer pool, or does it send data to the client in batches without exhausting the server's memory?

Scenario 2:

A simple inner join of two tables without an ORDER BY or GROUP BY clause (i.e. the ordering of the data doesn't matter). In that case, does MySQL send the data as the joined rows are produced (i.e. in batches), or does it construct the entire JOIN result in the buffer pool and send it afterwards? Is the entire result loaded in memory?

The my.cnf configuration of my local machine is as follows:

[mysqld]
performance_schema=OFF
innodb_buffer_pool_load_at_startup=OFF
innodb_buffer_pool_dump_at_shutdown=OFF
innodb_buffer_pool_size=4294967296 
secure-file-priv = ""

Note: I have disabled the buffer pool dump at shutdown and the buffer pool load at startup, so as to simulate the condition where none of the data is cached when the server starts.


Solution

The buffer pool is not used to materialize query results. It stores table and index pages to be shared between all connected sessions.

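Each session allocates its own result buffer, into which the selected data is copied from the buffer pool. This applies whether a join is involved or not. (Per the documentation for net_buffer_length, this buffer starts at net_buffer_length bytes and can grow up to max_allowed_packet bytes.)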

If no additional operation is required, partial results will be sent to the client as soon as the result buffer is full. If the result set must be sorted, that will be done in the sort buffer (and temporary tables if necessary), then sorted data will be sent to the client, via the result buffer, as before.
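The "send as soon as the result buffer is full" behaviour can be sketched with a toy model. This is not MySQL code: scan_rows, send_in_batches and the buffer_bytes parameter are hypothetical names invented for illustration, and the per-row size estimate is a crude assumption. The point is only that memory use is bounded by the buffer size, not by the size of the result set.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]


def scan_rows(n: int) -> Iterator[Row]:
    """Hypothetical stand-in for the executor producing rows one at a time."""
    for i in range(n):
        yield (i, f"name-{i}")


def send_in_batches(rows: Iterable[Row], buffer_bytes: int) -> Iterator[List[Row]]:
    """Fill a fixed-size buffer and flush it whenever the next row would not fit."""
    buf: List[Row] = []
    used = 0
    for row in rows:
        size = len(repr(row))  # crude size estimate; an assumption of this sketch
        if buf and used + size > buffer_bytes:
            yield buf  # "send" the full buffer to the client
            buf, used = [], 0
        buf.append(row)
        used += size
    if buf:
        yield buf  # flush whatever is left at the end of the result set


# Only one batch is held in memory at a time while iterating.
for batch in send_in_batches(scan_rows(1000), buffer_bytes=256):
    pass  # a real server would write the batch to the network here
```

However many rows the scan produces, the sketch never holds more than one buffer's worth of them at once.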

You can read more about how MySQL uses memory in the documentation.

OTHER TIPS

mustaccio covers the Question well. Perhaps this gives another perspective...

Any database engine is designed to handle arbitrarily large resultsets, potentially much larger than the cache (buffer_pool) or RAM. To do so, it is best to 'stream' the resultset to the client as soon as final results are generated.

Let me digress into the client for a moment. Clients usually have two options: (1) don't return from the call until the whole result is available, or (2) hand over a few rows at a time. Before multi-gigabyte RAM was common, this choice was important. Now most clients can handle (1), so that is typically the default.
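The two client options can be illustrated with a small simulation. The names server_rows, fetch_all and fetch_streaming are hypothetical; in the MySQL C API the corresponding calls are mysql_store_result() (buffer everything) and mysql_use_result() (row by row). The sketch only shows the difference in ordering between the two styles, not any real driver.

```python
from typing import Iterable, Iterator, List


def server_rows(n: int, log: List[str]) -> Iterator[int]:
    """Hypothetical server side: produces rows lazily and records when it does."""
    for i in range(n):
        log.append(f"server produced {i}")
        yield i


def fetch_all(rows: Iterable[int]) -> List[int]:
    """Option (1): return only once the whole result is in client memory."""
    return list(rows)


def fetch_streaming(rows: Iterable[int]) -> Iterator[int]:
    """Option (2): hand rows to the caller one at a time."""
    for row in rows:
        yield row


# Option (1): every row is produced before the client sees any of them.
buffered_log: List[str] = []
for row in fetch_all(server_rows(3, buffered_log)):
    buffered_log.append(f"client processed {row}")

# Option (2): production and processing are interleaved.
streaming_log: List[str] = []
for row in fetch_streaming(server_rows(3, streaming_log)):
    streaming_log.append(f"client processed {row}")
```

With option (1) the client needs enough memory for the whole result set; with option (2) it needs room for only the row in hand.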

But does the Server do likewise? No. The goal is to ship the results ASAP. Finding space to hold it wastes time and resources. (And this pretty much answers the Question as asked.)

But, as already mentioned, there may need to be a sort (eg, ORDER BY without a suitable index). This is a severe break in the flow of the data. The engine must allocate space (RAM and/or on disk) to handle whatever needs sorting -- before continuing with the processing.

Meanwhile, InnoDB's buffer_pool should be thought of as a simple cache. And note that all processing of data and index rows is done only in the buffer_pool. So, when the query needs a block (of data or index), it loads that block into cache if it is not already there. (There are some low-level optimizations, but you can still view the buffer_pool as a simple cache.)
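To make the "simple cache" view concrete, here is a minimal sketch of a page cache with least-recently-used eviction. PageCache and read_from_disk are hypothetical names, and plain LRU is a simplification: InnoDB uses a variation of LRU, which is one of the low-level optimizations mentioned above. The sketch shows why scanning a table larger than the cache does not exhaust memory.

```python
from collections import OrderedDict
from typing import Callable


class PageCache:
    """Toy fixed-capacity page cache; a simplified model, not InnoDB's algorithm."""

    def __init__(self, capacity_pages: int, read_from_disk: Callable[[int], bytes]) -> None:
        self.capacity = capacity_pages
        self.read_from_disk = read_from_disk
        self.pages: "OrderedDict[int, bytes]" = OrderedDict()
        self.disk_reads = 0

    def get(self, page_no: int) -> bytes:
        if page_no in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(page_no)
            return self.pages[page_no]
        # Cache miss: load the page, then evict the oldest one if over capacity.
        self.disk_reads += 1
        data = self.read_from_disk(page_no)
        self.pages[page_no] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)
        return data


# Scan ten "pages" through a cache that can only hold three of them.
cache = PageCache(capacity_pages=3, read_from_disk=lambda n: f"page-{n}".encode())
for page_no in range(10):
    cache.get(page_no)
```

After the scan, the cache holds only the three most recently touched pages; the other seven were read, used, and evicted.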

The resultset, once it is finally being generated, is buffered as it goes out. This is a standard Computer Science technique for handing things off: one process (the query processor) sticks data into the buffer; some other process sucks the data out to send it in different-sized chunks to the client. (Cf "FIFO", "pipe")
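The hand-off can be sketched with a bounded FIFO shared by two threads. This is a generic producer/consumer illustration, not how mysqld is structured internally; run_pipeline and buffer_slots are names made up for the example. The producer blocks whenever the buffer is full, so a slow consumer throttles it instead of forcing it to hold the whole result.

```python
import queue
import threading
from typing import List


def run_pipeline(n_rows: int, buffer_slots: int) -> List[int]:
    """Pass n_rows through a bounded FIFO from a producer thread to a consumer thread."""
    buf: "queue.Queue[object]" = queue.Queue(maxsize=buffer_slots)
    received: List[int] = []
    done = object()  # sentinel marking the end of the result set

    def producer() -> None:
        for i in range(n_rows):
            buf.put(i)  # blocks while the buffer is full
        buf.put(done)

    def consumer() -> None:
        while True:
            item = buf.get()  # blocks while the buffer is empty
            if item is done:
                break
            received.append(item)  # type: ignore[arg-type]

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


rows = run_pipeline(n_rows=100, buffer_slots=4)
```

All 100 rows arrive in order even though at most four are ever queued at once.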

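A "filesort" may be handled entirely in RAM (using qsort or a priority queue) or, if the data does not fit in the sort buffer, may spill to disk. On-disk internal temporary tables were MyISAM in older versions; in MySQL 5.7 they use InnoDB by default (see internal_tmp_disk_storage_engine).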
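The in-RAM versus on-disk distinction can be sketched as a small external merge sort. This is a generic textbook technique under stated assumptions, not MySQL's filesort implementation: external_sort and sort_buffer_rows are hypothetical names, and the "sort buffer" here counts rows rather than bytes. It also shows why a sort breaks the streaming flow: no output row can be emitted until every input row has been read.

```python
import heapq
import tempfile
from typing import IO, Iterable, Iterator, List


def external_sort(values: Iterable[int], sort_buffer_rows: int) -> Iterator[int]:
    """Sort in memory if the input fits in the buffer; otherwise spill sorted runs to disk."""
    runs: List[IO[str]] = []
    buf: List[int] = []

    def spill() -> None:
        # Sort the full buffer and write it out as one sorted run.
        buf.sort()
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{v}\n" for v in buf)
        f.seek(0)
        runs.append(f)
        buf.clear()

    for v in values:
        buf.append(v)
        if len(buf) >= sort_buffer_rows:
            spill()

    if not runs:
        # Everything fit in the buffer: a purely in-memory sort.
        buf.sort()
        yield from buf
        return

    if buf:
        spill()
    try:
        # Merge the sorted runs, reading each temporary file lazily.
        yield from heapq.merge(*[(int(line) for line in f) for f in runs])
    finally:
        for f in runs:
            f.close()


small = list(external_sort([5, 3, 9, 1], sort_buffer_rows=100))  # stays in memory
large = list(external_sort(range(999, -1, -1), sort_buffer_rows=64))  # spills to disk
```

Memory use during the sort is bounded by the buffer size; the price is extra disk I/O for the runs.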

If you say GROUP BY b ORDER BY x there may even be two temp tables and two filesorts.

You asked about "sending data". Did you get that from the profiler (SHOW PROFILE)? If so, notice that nearly all of the time is almost always spent in "Sending data". Hence profiling is virtually useless. And, although the state says "sending" data, the server is probably also "fetching" the data to send.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange