Question

While reading SQL Server Query Performance Tuning by Grant Fritchey, I found it difficult to understand the following part: "avoid RAID 5 for t-logs because, for every write request, RAID 5 disk arrays incur twice the number of disk I/Os compared to RAID 1 or RAID 10."

I know that RAID 5 differs from other RAID levels by its use of parity. This means that if one of the drives fails, the lost data can be reconstructed from the remaining drives. I want to understand why it is not recommended to use RAID 5 for a transaction log file. The explanation in the book was not enough for me to get it. Maybe someone could explain it to me or point me to a good article.


Solution

Transaction log writes, when they occur, are synchronous operations; that is, the activity that caused the log write must wait until the log I/O completes before it can continue. As a result, log writes are very sensitive to the write throughput of the underlying storage.

As you have mentioned, every write to a RAID-5 device has an overhead¹ of calculating and writing a parity block in addition to the data block(s). However small, this extra work that RAID-5 performs on every write operation is the reason behind the recommendation not to use RAID-5 for log storage.


¹ More details in this Q&A
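
To make the book's "twice the number of disk I/Os" claim concrete, here is a rough Python sketch. It is purely illustrative (the function names and fixed counts are simplifications, not a model of any particular controller): it just counts the disk operations that a single small log write costs on each layout.

```python
# Rough illustration of why a small write costs roughly twice the disk I/Os
# on RAID 5 compared to RAID 1/10. Counts are the textbook simplification.

def raid1_small_write_ios():
    # RAID 1/10: the block is simply written to both mirrored disks.
    return {"reads": 0, "writes": 2}

def raid5_small_write_ios():
    # RAID 5 read-modify-write for a write smaller than a full stripe:
    # read old data block, read old parity block,
    # then write new data block and new parity block.
    return {"reads": 2, "writes": 2}

if __name__ == "__main__":
    r1 = raid1_small_write_ios()
    r5 = raid5_small_write_ios()
    print("RAID 1/10:", r1, "-> total", r1["reads"] + r1["writes"])  # 2 I/Os
    print("RAID 5  :", r5, "-> total", r5["reads"] + r5["writes"])   # 4 I/Os
```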

Other tips

RAID-5 maintains redundancy by using N-1 disks for data and 1 disk for the XOR of that data. (It's not actually the same disk used for all the parity; that's RAID-4. RAID-5 distributes the parity across all the disks, changing at each "stripe" boundary.) https://en.wikipedia.org/wiki/Standard_RAID_levels#RAID_5
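
To illustrate the parity idea, here is a minimal Python sketch (the xor_blocks helper and the four-byte blocks are made up for illustration): the parity block is simply the XOR of the data blocks in a stripe, so any single lost block can be rebuilt from the surviving blocks plus parity.

```python
# Minimal sketch of RAID-5-style parity: parity = XOR of the data blocks
# in the same stripe, so any one missing block can be reconstructed.
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-sized byte blocks together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), blocks)

data_blocks = [b"AAAA", b"BBBB", b"CCCC"]   # blocks on the N-1 data disks
parity      = xor_blocks(data_blocks)       # block on the parity disk

# Simulate losing disk 1: rebuild its block from the survivors plus parity.
rebuilt = xor_blocks([data_blocks[0], data_blocks[2], parity])
assert rebuilt == data_blocks[1]
```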

The biggest RAID-5 write overhead (read + rewrite of the parity disk for that block) only applies to short writes. A full-stripe write (e.g. as part of a large sequential I/O without a sync/flush after each small step) can just calculate the parity stripe from the data stripes and write them all in parallel, without having to read anything from disk.

As mustaccio points out, transaction log writes have to hit disk before we can allow later writes to hit disk. (Or at least reach the battery-backed memory of a RAID controller, i.e. become persistent.) This typically means they can't be buffered into a big contiguous full-stripe write.

In the optimal case, N-disk RAID-5 sequential write bandwidth in theory equals per-disk bandwidth times N-1. (Plus some CPU time, or not even that if the XOR parity computation is offloaded to a hardware RAID controller.)

In the pessimal case, yes, RAID-5 has to do extra disk I/O to read the old data and parity and update it by XORing the old data into the parity (to remove it), and then XORing in the new data.
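
A small sketch of that read-modify-write path, with byte strings standing in for disk blocks (the names old_data, old_parity and new_data are hypothetical): the controller can compute the new parity as old_parity XOR old_data XOR new_data, but only after paying two reads that a mirror never needs.

```python
# Sketch of the pessimal small-write path: update parity without touching the
# rest of the stripe, at the cost of two extra reads (old data + old parity).

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

# Pretend these were just read back from disk (the 2 extra I/Os):
old_data   = b"\x10\x20\x30\x40"
old_parity = b"\x55\x55\x55\x55"
new_data   = b"\x11\x22\x33\x44"

# Remove the old data's contribution from the parity, then add the new data's.
new_parity = xor(xor(old_parity, old_data), new_data)

# Two more I/Os finish the job: write new_data and new_parity to their disks.
# Total: 2 reads + 2 writes, versus 2 writes on RAID 1/10 for the same block.
```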


Notice that it's not just calculating the parity that adds the big overhead. It's that the data you need to calculate new parity might be sitting on disk, not in memory, for small writes.

RAID-5 is (very) bad at small writes, very good with large writes (almost as good as RAID-0), and good for reads in general.


Historically some RAID controllers would read the full length of a stripe to update parity, but at least Linux software RAID only reads the sectors that correspond to the actual small write. This helps somewhat, but a small-ish stripe size like 32k or 64k (I think) is usually a good thing anyway, so full-stripe writes are more common without having to buffer megabytes of data.

Still, that just goes from "very very bad" to "very bad" compared to RAID10 or RAID1 where small writes can just happen on both disks that hold the blocks being written.
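
To make the earlier stripe-size point concrete, here is a hedged sketch (illustrative numbers only, not any real controller's logic) that checks whether a write covers whole, aligned stripes and can therefore skip the read-modify-write path entirely.

```python
# Illustrative only: with a small-ish stripe unit, a modest sequential write
# can already cover a whole stripe and avoid the read-modify-write path.

def is_full_stripe_write(offset_kb: int, length_kb: int,
                         stripe_unit_kb: int, data_disks: int) -> bool:
    """True if the write covers one or more whole, aligned stripes."""
    full_stripe_kb = stripe_unit_kb * data_disks
    return offset_kb % full_stripe_kb == 0 and length_kb % full_stripe_kb == 0

# 4-disk RAID-5 (3 data disks + parity), 64 KB stripe unit -> 192 KB stripes.
print(is_full_stripe_write(0, 192, 64, 3))   # True:  no reads needed
print(is_full_stripe_write(0,   8, 64, 3))   # False: read-modify-write
```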
