Problem

A Microsoft TechNet article recommends creating a secondary file group and making it the default file group (see the reference below). The secondary file group should have a number of files, say four, each placed on a separate disk. As an added rule of thumb from a colleague, the number of files should equal the number of CPU cores.
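For reference, a minimal sketch of that setup, assuming a hypothetical database named LogDb with four separate drives (the names, paths, and sizes are placeholders):

    -- Create the secondary filegroup.
    ALTER DATABASE LogDb ADD FILEGROUP [SECONDARY];
    GO

    -- One file per disk (four here, matching the four-core rule of thumb).
    ALTER DATABASE LogDb ADD FILE
        (NAME = LogDb_Data1, FILENAME = 'D:\Data\LogDb_Data1.ndf', SIZE = 10GB, FILEGROWTH = 1GB),
        (NAME = LogDb_Data2, FILENAME = 'E:\Data\LogDb_Data2.ndf', SIZE = 10GB, FILEGROWTH = 1GB),
        (NAME = LogDb_Data3, FILENAME = 'F:\Data\LogDb_Data3.ndf', SIZE = 10GB, FILEGROWTH = 1GB),
        (NAME = LogDb_Data4, FILENAME = 'G:\Data\LogDb_Data4.ndf', SIZE = 10GB, FILEGROWTH = 1GB)
    TO FILEGROUP [SECONDARY];
    GO

    -- New user objects now land in SECONDARY instead of PRIMARY.
    ALTER DATABASE LogDb MODIFY FILEGROUP [SECONDARY] DEFAULT;
    GO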

My understanding is that this setup is ideal for mechanical spinning disk drives, where streaming data from multiple heads in parallel gives a performance boost because a spinning disk drive is much slower than a solid-state drive. Is this understanding correct?

If yes, then my question is whether the cost-based optimizer takes the newer solid-state drives into account. It seems like the performance bottlenecks from spinning hard drives go away when switching to solid-state drives. Our IT Operations group has informed me that while I am currently allocated one virtual drive, the data is really stored on an iSCSI SAN, which has multiple solid-state drives.

This question is geared toward answering what the optimal setup is for a large database of this magnitude:

  1. Should I have a default secondary file group with just one file?
  2. Should I have a default secondary file group with a number of files that equals the number of cores on the CPU?
  3. Is there a performance boost by partitioning a database table when using solid-state drives?

The current project I am working on requires a database that will be a few terabytes in size and stores massive amounts of log data. A one-week sample has approximately 150 million records, and we need to store a rolling 3 years of logs. So, I am now looking at long-running queries to find data. I have tuned the indexes to the point where nearly all of the work is attributed to non-clustered index seeks; the optimizer does not recommend adding a missing index.
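For reference, one way to confirm this is to check the optimizer's accumulated suggestions directly in the missing-index DMVs rather than relying only on execution-plan warnings; a minimal sketch (the TOP count and the ordering are arbitrary, and the stats reset on instance restart):

    SELECT TOP (25)
           d.[statement] AS table_name,
           d.equality_columns,
           d.inequality_columns,
           d.included_columns,
           s.user_seeks,
           s.avg_user_impact
    FROM sys.dm_db_missing_index_details AS d
    JOIN sys.dm_db_missing_index_groups AS g
      ON g.index_handle = d.index_handle
    JOIN sys.dm_db_missing_index_group_stats AS s
      ON s.group_handle = g.index_group_handle
    ORDER BY s.user_seeks * s.avg_user_impact DESC;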

Note

The licensing on Microsoft SQL Server is currently per CPU core. So, there is sensitivity to throwing more cores at the problem, especially if that will not boost performance.

Additionally, I am currently developing on SQL Server 2014, but will be migrating to SQL Server 2017 for both development and production.

Update 1

The project will load logs nightly, and I expect few (perhaps no) updates or deletes because the logs are not expected to change at all - so they will not be re-loaded. Everything else will be reads for analytic purposes.

The PRIMARY file group is for the system tables, while the SECONDARY default file group is for everything else. The reason for doing this is explained by the link referenced at the bottom of this question.

Separate file groups will be created for the table partitions. There are other tables within the database that are small enough that they will reside within the SECONDARY file group - I am only partitioning two tables, where one exceeds 100 million records (partitioned by IDENTITY row number) and the other will grow into the billions of records (partitioned by time [monthly]).

I plan on partitioning by month over 3 years, so there will be 36 partitions. I will create a file group for each year and then place 12 files into the corresponding yearly file group. The partition strategy is to reduce read times, as there will be a lot of data scanning for analytic purposes. The annual file group strategy is strictly for ease of maintenance for the DBAs, who can remove a year's worth of data by removing a single file group. A sketch of this scheme follows.
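Here is a minimal sketch of the time-partitioned table, truncated to the first few monthly boundaries for brevity (the names pfLogMonthly, psLogMonthly, FG2014, FG2015, and dbo.EventLog are hypothetical placeholders, and the yearly filegroups are assumed to already exist with their files):

    -- RANGE RIGHT: each boundary is the first day of a month and belongs
    -- to the partition on its right.
    CREATE PARTITION FUNCTION pfLogMonthly (datetime2(0))
    AS RANGE RIGHT FOR VALUES ('2015-01-01', '2015-02-01', '2015-03-01');
    -- ...the remaining monthly boundaries across the 3-year window follow
    -- the same pattern, for 36 partitions in total.
    GO

    -- N boundaries produce N + 1 partitions, so the scheme needs one extra
    -- filegroup; FG2014 catches anything older than the first boundary.
    CREATE PARTITION SCHEME psLogMonthly
    AS PARTITION pfLogMonthly TO (FG2014, FG2015, FG2015, FG2015);
    -- ...with the full function, each year's 12 partitions map to that
    -- year's filegroup.
    GO

    -- The partitioning column must be part of the clustered key.
    CREATE TABLE dbo.EventLog
    (
        LogId   bigint IDENTITY(1,1) NOT NULL,
        LogDate datetime2(0)         NOT NULL,
        Message nvarchar(4000)       NULL,
        CONSTRAINT PK_EventLog PRIMARY KEY CLUSTERED (LogDate, LogId)
    ) ON psLogMonthly (LogDate);
    GO

    -- On SQL Server 2016+, a year's partitions can be emptied cheaply before
    -- the filegroup is dropped (partitions 2-4 in this truncated sketch;
    -- 12 at a time with the full 36-partition function).
    TRUNCATE TABLE dbo.EventLog WITH (PARTITIONS (2 TO 4));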

Reference:

SQL Server Best Practices: Setting a Default File Group


Solution

The cost-based optimizer does not (currently) change its estimated IO costs based on your hardware.

You say this question is geared toward answering what filegroup and partitioning strategy is best for your environment. In that case, who cares what the optimizer does? The question is what works well in your environment, and the real answer is to test it.
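A lightweight way to run such a test, assuming you have a representative query: capture per-query statistics and per-file I/O stalls under each candidate layout and compare (the DMV and SET options are standard; the test query itself is yours to supply):

    -- Per-query logical/physical reads and elapsed time for the test query.
    SET STATISTICS IO, TIME ON;

    -- Cumulative I/O stalls per database file; snapshot before and after a
    -- test run to see which files (and therefore LUNs) the workload waits on.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.name                  AS file_name,
           vfs.num_of_reads,
           vfs.io_stall_read_ms,
           vfs.num_of_writes,
           vfs.io_stall_write_ms
    FROM sys.dm_io_virtual_file_stats(DB_ID(), NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id = vfs.file_id;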

Other Tips

As a rule of thumb, add filegroups to improve maintenance or manageability, and add files to improve performance. Of course, not all workloads or scenarios will benefit from additional files or filegroups. Here are a few examples:

  1. Consider a Windows server with many CPUs and just one data file for the user database. Having just one data file means there is only one LUN. It is possible to create a workload that needs more I/O than a single LUN can provide. If the workload is bottlenecked by I/O, then it is possible to improve performance by adding more files to the database, with each file on its own LUN. You can tell if you're in this situation by looking at wait stats (see the query after this list).
  2. Consider a Windows server with many CPUs and just one data file for the user database. It is possible to create a workload that is bottlenecked by logical contention caused by having just one file. For example, certain types of queries that change data will all need to modify the same PFS (Page Free Space) page at once. If the workload is bottlenecked by logical contention in memory, then it is possible to improve performance by adding more files, even if those files are on the same LUN as before. Having additional files means there are more "active" PFS pages, which reduces contention. See this article for more details.

  3. You can assign tables or partitions to specific filegroups. You can't do that for files. So if you have old data that you want to purge, and you want to reclaim all of the storage, then filegroups might be a good option for you. Putting all user data in a secondary filegroup has some maintenance benefits that some folks swear by. The important point is that filegroups should be thought of as helping with management rather than performance.
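As a starting point for the wait-stats check mentioned in item 1, a minimal sketch (the reading is the conventional one: PAGEIOLATCH_* waits point at physical disk I/O, while PAGELATCH_* waits point at in-memory page contention such as the PFS contention in item 2):

    SELECT wait_type,
           waiting_tasks_count,
           wait_time_ms,
           signal_wait_time_ms
    FROM sys.dm_os_wait_stats
    WHERE wait_type LIKE 'PAGEIOLATCH%'  -- waits on physical reads from disk
       OR wait_type LIKE 'PAGELATCH%'    -- waits on pages already in memory
    ORDER BY wait_time_ms DESC;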
