Question

I am trying to understand read-ahead reads, but it seems a bit complicated to me. I searched on the web and got the following:

From Reading Pages (Microsoft documentation):

Read-ahead anticipates the data and index pages needed to fulfill a query execution plan and brings the pages into the buffer cache before they are actually used by the query.

From an answer to Why is 'physical reads' less than 'read-ahead reads' & 'logical reads' in SQL Server for first time execution of query? by huntharo on Stack Overflow:

Physical Read - The query is blocked waiting for the page to be read from disk into the cache for immediate use.

Read-Ahead Read - The page is being read before it blocks the query and is read into the cache as are all reads. Read-Aheads are possible when you are scanning an index, in which case the next leaf pages in the index can assume to be needed and the read can be initiated for them before the query actually says it needs them. This allows the disk to be busy while the db engine is examining the contents of previously fetched pages.

Maybe someone could clarify the above using their own explanation because I can't find a detailed explanation for read-ahead reads.

To set an example, look at statistics io info:

Table 'TestLarge'. Scan count 1, logical reads 159185, physical reads 348, read-ahead reads 159209
Was it helpful?

Solution

A query always reads data from memory (a logical read). Your example query scanning the TestLarge table touched 159,185 8KB memory pages during its execution.

During execution, SQL Server does two things.

1. It reads data from the pages that belong to the table.

If the required page is already in memory, a logical read is recorded.

If the required page is not in memory, a physical read is recorded.

  • The page is brought into memory from persistent storage.
  • The query is blocked until this read completes.
  • This happened 348 times to your test query.
  • A logical read is also counted when SQL Server processes the page (that is now in memory) to satisfy your query.

2. It issues read-ahead reads.

Every so often during the scanning operation, SQL Server spends a moment managing read-ahead:

  • SQL Server gathers a list of 8KB pages that may well be encountered by the current operation in the near future. You can think of this as the engine "looking ahead" of the current scan position for the pages that come next. It achieves using IAM (allocation map) pages or b-tree levels above the leaf, depending on the type of scan.
  • Any pages on this "look-ahead" list that are not already in memory are passed to the operating system in one or more asynchronous read requests. These are counted as read-ahead reads.
  • The operating system handles reading the pages into SQL Server memory, and notifies SQL Server when the reads are complete.
  • The SQL Server thread that issued the asynchronous reads is not blocked. It can continue scanning pages that are in memory while the operating system fetches read-ahead pages in the background, on a separate thread.
  • Your test query read 159,209 pages into memory via the read-ahead mechanism.

An analogy

Imagine there is a book. You are given the index only. The rest of the book is in the local library. The library has a rule that the whole book cannot be checked out, and only 50 pages at most can be taken from the library at each visit.

Your task is to assemble the book at home in the order pages are referenced in the index (a to z). You are not allowed to leave home, but you have a friend that can go to the library on your behalf.

The first entry in the index is for "aardvark", which appears on page 392 of the book.

You realize it will be very inefficient to do this task a page at a time, so instead of sending your friend to the library for page 392, you read 50 entries in index order, and give your friend that list of pages to take to the library. You count 50 read-ahead reads at this point.

Now you turn back to processing "aardvark". You don't have page 392 in front of you, so you have to wait, doing nothing, until your friend gets back. This is a physical read.

When your friend arrives you count a logical read when you process page 392.

You could start on the other 49 pages your friend brought back (counting a logical read for each one), but you realize it will be more efficient if you give your friend another list of pages to fetch from the library while you are busy with the work in front of you.

Each time you send your friend to the library with a list of pages to fetch, you count read-ahead reads. Each time you process a page in front of you, you count a logical read. If you find yourself without the next page you need (because your friend is too slow), you count a physical read.

The overall task completes quicker when you and your friend can overlap your activities effectively. They can be busy fetching pages you will need soon, while you are busy processing the pages you have in front of you. When this works well, you never have to wait for the next page you need, though you do spend a little time telling your friend what to do.

OTHER TIPS

Read-Ahead reads - this means those pages were not needed by the query you've just executed, but - SQL Server has cached them "just in case", as it thinks those pages can be requested by next queries

From your statistics, your query required total 159533 pages to run
159533 = 159185 that were already in cache (logical reads) + 348 were not in cache so it fetched them from disk (physical reads)

And it also fetched from disk more 159209 pages into cache (read-ahead reads) "because they can be requested by subsequent queries"

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top