Question

Have 2 tables, Doc and DocDetail:

CREATE TABLE [Doc](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [DocTypeId] [int] NOT NULL,
    [BusinessEntityId] [int] NOT NULL,
    [Created] [datetime] NOT NULL CONSTRAINT [DF_Doc_Created] DEFAULT (getdate()),
    [Updated] [datetime] NULL CONSTRAINT [DF_Doc_Updated] DEFAULT (getdate()),
    [Active] [bit] NOT NULL CONSTRAINT [DF_Doc_Active] DEFAULT ((1)),
    [ReadOnly] [bit] NOT NULL CONSTRAINT [DF_Doc_ReadOnly] DEFAULT ((0)),
    CONSTRAINT [PK_Doc] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]

CREATE TABLE [dbo].[DocDetail](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [DocId] [int] NOT NULL,
    [FieldId] [int] NOT NULL,
    [RowNumber] [int] NULL,
    [ParentRowNumber] [int] NULL,
    [vString] [varchar](4096) NULL,
    [vDate] [datetime] NULL,
    [vTime] [time] NULL,
    [vInteger] [int] NULL,
    [vNumber] [decimal](26, 7) NULL,
    [vReal] [float] NULL,
    CONSTRAINT [PK_DocDetail] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]

DocDetail will only have one of the vXXX columns populated. The others will be NULL.

All access to the tables is through a few stored procs and views. I can add columns if needed (such as a column in DocDetail that repeats the Doc Created date, populated via a trigger, for partitioning), but I can't rewrite the app.
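
For example, something along these lines is what I had in mind (the DocCreated column and the trigger name are just placeholders):

ALTER TABLE [dbo].[DocDetail] ADD [DocCreated] [datetime] NULL;
GO

CREATE TRIGGER [dbo].[trg_DocDetail_SetDocCreated]
ON [dbo].[DocDetail]
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    -- Copy the parent Doc row's Created date onto the newly inserted detail rows
    UPDATE dd
    SET    dd.DocCreated = d.Created
    FROM   [dbo].[DocDetail] dd
    JOIN   inserted i ON i.ID = dd.ID
    JOIN   [dbo].[Doc] d ON d.ID = dd.DocId;
END
GO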

DocDetail has over a billion rows, with about 9 million added per day now due to new business! In the past, data was 'archived' by moving rows to another database based upon the Doc table's Created column. Over time they were removed from that DB as the rows 'aged'. However, additional external requirements prevent us from continuing to do so and the data is piling up quickly. The server has 256GB RAM and a nice SAN with plenty of storage (so far). I have a decent test environment I've been playing with to verify my solution.

Looking into partitioning. About 45% of the OLTP activity is against data created within the last 30 days. Another 25% is against data between 30 and 90 days old. The remainder falls off in 90-day periods out to about 18 months. I need to keep 7 years' worth of data, but it doesn't all have to be in this database.

Any recommendations for a partitioning scheme?


Solution

Check out my presentation for SQL PASS on Effective Data Warehouse Storage Patterns.

http://craftydba.com/?page_id=880

The presentation reviews the following techniques to fix your woes. It has working code for a 1.2 M row database.

Coverage:

1 – What is horizontal partitioning?
2 – Database sharding for daily information.
3 – Working with files and file groups.
4 – Partitioned views for performance.
5 – Table and Index partitions.
6 – Row Data Compression.
7 – Page Data Compression.
8 – Programming a sliding window.
9 – What are Federations in Azure SQL?

As for which way to go, it is up to you.

Both sharding and partitioned views can be done with the Standard version of SQL Server.
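
As a rough sketch of the partitioned-view route (table, view, and constraint names are my own, and it assumes a DocCreated column on the detail rows like the one the question mentions):

-- One physical table per period, each with a trusted CHECK constraint on the key
CREATE TABLE [dbo].[DocDetail_2014_01](
    [ID] [int] NOT NULL,
    [DocId] [int] NOT NULL,
    [FieldId] [int] NOT NULL,
    [DocCreated] [datetime] NOT NULL,
    -- ... plus the vXXX value columns from the original table ...
    CONSTRAINT [PK_DocDetail_2014_01] PRIMARY KEY CLUSTERED ([DocCreated], [ID]),
    CONSTRAINT [CK_DocDetail_2014_01] CHECK ([DocCreated] >= '20140101' AND [DocCreated] < '20140201')
);
GO

CREATE TABLE [dbo].[DocDetail_2014_02](
    [ID] [int] NOT NULL,
    [DocId] [int] NOT NULL,
    [FieldId] [int] NOT NULL,
    [DocCreated] [datetime] NOT NULL,
    -- ... plus the vXXX value columns from the original table ...
    CONSTRAINT [PK_DocDetail_2014_02] PRIMARY KEY CLUSTERED ([DocCreated], [ID]),
    CONSTRAINT [CK_DocDetail_2014_02] CHECK ([DocCreated] >= '20140201' AND [DocCreated] < '20140301')
);
GO

-- The view your procs would read; the CHECK constraints let the optimizer
-- skip member tables that cannot contain the requested date range
CREATE VIEW [dbo].[vDocDetail]
AS
SELECT * FROM [dbo].[DocDetail_2014_01]
UNION ALL
SELECT * FROM [dbo].[DocDetail_2014_02];
GO

Making the view updatable carries extra rules (the partitioning column must be part of each member table's primary key, and the member tables cannot use IDENTITY for inserts through the view), which is why the sketch keys each table on DocCreated plus ID.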

Data compression and table partitioning are available in the Enterprise version of SQL Server.
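
If you do have Enterprise, a table/index partition on the same key with page compression might look roughly like this (boundary dates, object names, and the single-filegroup placement are only illustrative):

-- Monthly partition function and scheme on the DocCreated key
CREATE PARTITION FUNCTION [pfDocDetailMonthly] (datetime)
AS RANGE RIGHT FOR VALUES ('20140101', '20140201', '20140301');
GO

CREATE PARTITION SCHEME [psDocDetailMonthly]
AS PARTITION [pfDocDetailMonthly] ALL TO ([PRIMARY]);
GO

-- The existing clustered PK on ID has to move aside so the table can be
-- clustered on the partitioning key (re-add the PK afterwards, e.g. nonclustered)
ALTER TABLE [dbo].[DocDetail] DROP CONSTRAINT [PK_DocDetail];
GO

CREATE CLUSTERED INDEX [CIX_DocDetail_DocCreated]
ON [dbo].[DocDetail] ([DocCreated], [ID])
WITH (DATA_COMPRESSION = PAGE)
ON [psDocDetailMonthly] ([DocCreated]);
GO

A sliding window then becomes a matter of SPLIT, SWITCH, and MERGE against those boundary values.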

Since you are building the warehouse from scratch, you can change the data types to eke out space.
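
Purely as an illustration of that point, a warehouse copy of DocDetail with narrower types could look like the sketch below; every type choice in it is an assumption that would have to be checked against the real value ranges.

-- Illustrative only: narrower types for a from-scratch warehouse copy
CREATE TABLE [dbo].[DocDetailDW](
    [ID] [bigint] NOT NULL,              -- int tops out near 2.1 billion rows
    [DocId] [int] NOT NULL,
    [FieldId] [smallint] NOT NULL,       -- if there are fewer than ~32k field definitions
    [RowNumber] [smallint] NULL,
    [ParentRowNumber] [smallint] NULL,
    [vString] [varchar](4096) NULL,
    [vDate] [date] NULL,                 -- 3 bytes instead of 8 if no time portion is stored
    [vTime] [time](0) NULL,              -- 3 bytes instead of 5 if whole seconds are enough
    [vInteger] [int] NULL,
    [vNumber] [decimal](19, 7) NULL,     -- 9 bytes instead of 13 if the precision allows
    [vReal] [real] NULL                  -- 4 bytes instead of 8 if single precision is enough
);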

I was able to re-organize a 4 TB database into 500 GB using compression and partitioning.

In summary, both DATES and INTEGERS are good candidates for partition keys.

In my own warehouse, a date dimension mapped a date to an integer.
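
That mapping can be as simple as the sketch below (the table and column names are mine, not from the presentation):

-- A minimal date dimension keyed on an integer in yyyymmdd form
CREATE TABLE [dbo].[DimDate](
    [DateKey] [int] NOT NULL PRIMARY KEY,   -- e.g. 20140131
    [FullDate] [date] NOT NULL,
    [MonthKey] [int] NOT NULL                -- e.g. 201401, a handy coarser partition key
);
GO

-- Turning today's date into its integer key
SELECT CONVERT(int, CONVERT(char(8), GETDATE(), 112)) AS DateKey;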

Play around in a test environment to get a feel for how a real rebuild will work.

Good luck.

Licensed under: CC-BY-SA with attribution