SQL Server: database IO performance impact of splitting database tables into filegroups and multiple files per filegroup

dba.stackexchange https://dba.stackexchange.com/questions/207392

Question

SAN LUN ambiguity with respect to performance

MSDN recommends splitting random read/write activity onto one drive and sequential writes onto another (the PRIMARY filegroup typically holds data with read, update and write activity, while the log is mainly sequential writes). That documentation predates Storage Area Networks.

I get that grouping storage by read/write activity would optimise the heads on disk - minimise disk thrashing, that sort of thing... but how is this dealt with when a "disk" seen by the OS is an allocation from a storage pool built on many physical disks?

The "bottleneck" of IO is then the IO controller on the SAN, which controls the data flow from striped groups of disks? Meaning: LUN1 is exposed to Server1's OS as two 500GB drives, each split into two logical drives (D:, E:; F:, G:) of 250GB each. A database with a 750GB table created on a filegroup of four files (D:\f1.mdf, E:\f2.ndf, F:\f3.ndf, G:\f4.ndf) will have parallelism across all the logical drives (OS file driver optimised?), and SQL Server can use more file handler threads - but all the IO still comes through the one LUN controller, maxing out at 1GB/sec.

Or is the DISK separation then implied to be a LUN-level split? Meaning LUN1 250GB, LUN2 250GB, LUN3 250GB, LUN4 250GB... each exposed to the OS as a separate physical drive, each mapped to its own logical drive, and the SAME filegroup split would ONLY then realise the benefit of IO-optimised throughput? I guess my question then depends on whether the SAN storage controller can sustain 1GB/sec throughput on each LUN.
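For reference, the multi-file layout being asked about looks like this in T-SQL. Database name, file names, paths and sizes here are hypothetical, chosen only to mirror the D:/E:/F:/G: example above; SQL Server stripes allocations across the files in a filegroup (proportional fill), so IO fans out over all four volumes - but, as the question notes, that only buys throughput if the volumes map to independent IO paths underneath:

```sql
-- Hypothetical layout matching the question: one filegroup, four files,
-- one per logical drive. SQL Server fills the files proportionally and
-- issues IO against all of them, regardless of what backs the volumes.
CREATE DATABASE BigDb
ON PRIMARY
    (NAME = BigDb_f1, FILENAME = 'D:\Data\f1.mdf', SIZE = 200GB),
    (NAME = BigDb_f2, FILENAME = 'E:\Data\f2.ndf', SIZE = 200GB),
    (NAME = BigDb_f3, FILENAME = 'F:\Data\f3.ndf', SIZE = 200GB),
    (NAME = BigDb_f4, FILENAME = 'G:\Data\f4.ndf', SIZE = 200GB)
LOG ON
    -- Log on its own volume, per the sequential-write guidance above.
    (NAME = BigDb_log, FILENAME = 'L:\Log\BigDb_log.ldf', SIZE = 50GB);
```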

Or does it mean a storage controller for each LUN?

Can someone more experienced advise whether the gotcha of aligning the striping to the NTFS block size plays a part in this discussion? OS NTFS allocation unit size... oh! and the network packet size - is this also an influencing factor in terms of data throughput to the client? And is IPv6 vs IPv4 a factor?


Solution

Talk with your SAN administrator. That is the person who should be able to explain to you, in detail, what kind of SAN you have and what it is capable of. Have you got MPIO? What is the structure of your fabric?

Sometimes it makes sense to use multiple LUNs for different files. Behind the scenes, there might be different types of storage available.

A SAN with some SSD capacity might offer tiering: the hottest data is promoted to the SSD-backed part of the system, whilst colder data is persisted on HDDs. Some models provide auto-tiering, in which the system decides what is hot and what is not; other models require administrative action.

HP EVAs provide VRAID levels 0, 1 and 5. Using those spreads a LUN across all the disks in an array. This increases reliability, but with VRAID 5 the cost is increased controller CPU usage. The SAN admin needs to balance increased storage costs against increased CPU usage.

Aligning NTFS blocks with the storage depends on the SAN. If you are also using virtualization (Hyper-V, VMware, whatever), that introduces additional complexity. Your storage vendor is likely to offer white papers, best practices and consulting services on how to configure the storage for optimal performance.
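To see what allocation unit size a volume was actually formatted with (so you can compare it against the stripe size your SAN admin reports), you can query the volume with fsutil. The drive letter here is just an example:

```
:: Windows, elevated prompt; D: is a hypothetical data volume.
fsutil fsinfo ntfsinfo D:
:: Check "Bytes Per Cluster" in the output - 65536 means a 64 KB
:: allocation unit, the size commonly recommended for SQL Server
:: data and log volumes.
```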

Network packet size is relevant if you use iSCSI. For Fibre Channel, IP settings are not relevant.

Licensed under: CC-BY-SA with attribution