You can always transform the data any way you want with a .NET client application. I just loaded data at 430 GB/hour with a trivial setup:
I spun up 8 .NET threads and had each one stream data via SqlBulkCopy in 10M-row batches, with each thread inserting into its own heap table. This is about the simplest possible setup. It ran on my quad-core (8-thread) Core i7 desktop with SQL Server on the same machine; SQL Server used about 50% of the CPU and my app used the other 50%, so throughput could easily be doubled by running the client on a second machine over a fast network.
This approach also lets you insert the final data directly into a target table (ideally a partitioned one, so each thread can load into its own partition).
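The setup above can be sketched roughly as follows. This is a hedged outline, not the exact code I ran: the connection string, table names, and the data source are placeholders you'd swap for your own.

```csharp
// Sketch: N threads, each streaming batches into its own heap table
// via SqlBulkCopy. TableLock on a heap enables minimally logged inserts.
using System;
using System.Data;
using System.Data.SqlClient;
using System.Threading.Tasks;

class BulkLoader
{
    // Placeholder connection string; point this at your own server/database.
    const string ConnStr = "Server=.;Database=Staging;Integrated Security=true";
    const int ThreadCount = 8;
    const int BatchSize = 10_000_000; // 10M rows per batch, as described above

    static void Main()
    {
        Parallel.For(0, ThreadCount, t =>
        {
            using var conn = new SqlConnection(ConnStr);
            conn.Open();

            using var bulk = new SqlBulkCopy(conn,
                SqlBulkCopyOptions.TableLock, null)   // TABLOCK on each heap
            {
                DestinationTableName = $"dbo.Load_{t}", // one heap table per thread (hypothetical names)
                BatchSize = BatchSize,
                BulkCopyTimeout = 0                     // don't time out on large batches
            };

            // Stream rows via an IDataReader so nothing is materialized in memory.
            bulk.WriteToServer(GetSourceReader(t));
        });
    }

    // Placeholder: in practice this reads and transforms your source data.
    static IDataReader GetSourceReader(int thread) =>
        throw new NotImplementedException();
}
```

The key points are one `SqlBulkCopy` per thread, one destination heap per thread (avoiding contention on a shared B-tree), and streaming from an `IDataReader` rather than buffering a `DataTable`.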