Which method to use when bulk-inserting data that needs to be transformed (Performance)

StackOverflow https://stackoverflow.com/questions/19544189

  •  01-07-2022

Question

I have been given the task of importing data from a file into a SQL Server database table daily. I researched different methods for doing bulk inserts, and my idea was to use the bcp utility from the command line, run as a daily scheduled task.

My biggest problem is that I don't know how, or whether it is even possible, to transform dates when importing data using bcp. For example, I have a date field in the format dd.mm.yyyy which I haven't been able to store as a SQL datetime type.

The file is updated daily (it currently has about 2 million rows, or 255 MB of data). A new file is created every day that contains all previous data plus some new data at the end of the file. The first row of the file contains the headers for the data. The data that follows is semicolon-separated and each row ends with a \n. The real file has 16 columns, so I have simplified it in an example:

data.txt:

NUMBER;START_DATE;END_DATE;GROUP_ID;IS_OPEN;TOTAL;
2262101;02.10.2010;01.11.2010;123456789012345678;0;268,75;
2291245;01.11.2010;01.12.2010;123456789012345678;0;67,25;
etc...
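
To make the row layout above concrete, here is a minimal Python sketch (not part of the original setup) of how one row could be split into typed values. The function name `parse_row` and the dictionary keys are illustrative assumptions; the keys simply mirror the simplified column names. Each row carries a trailing `;` before the newline and TOTAL uses a decimal comma, so the sketch handles both explicitly.

```python
from decimal import Decimal


def parse_row(line: str) -> dict:
    """Split one semicolon-separated data row into typed values.

    Hypothetical helper: the name and the returned keys are illustrative only.
    """
    text = line.rstrip("\r\n")
    # Every row ends with a ';' before the newline; drop exactly one so that
    # splitting does not produce an empty extra field.
    if text.endswith(";"):
        text = text[:-1]
    number, start_date, end_date, group_id, is_open, total = text.split(";")
    return {
        "Number": int(number),
        "StartDate": start_date,  # still dd.mm.yyyy text at this point
        "EndDate": end_date,
        "GroupId": group_id,      # 18 digits, kept as text
        "IsOpen": is_open == "1",
        "Total": Decimal(total.replace(",", ".")),  # decimal comma -> point
    }


row = parse_row("2262101;02.10.2010;01.11.2010;123456789012345678;0;268,75;\n")
print(row["Number"], row["Total"])
```

The trailing `;` may also matter for the terminator of the last field in a bcp format file, since the final value is followed by `;` and then the newline rather than by the newline alone.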


The format file I created is shown below (also an example with fewer columns). I am currently not importing the dates as SQLDATETIME because it doesn't seem possible due to the format (DD.MM.YYYY).

format.fmt:

10.0
4   
1       SQLINT     0 0  ""   1 Id        ""
2       SQLCHAR    0 4  ";"  2 Number    ""
3       SQLCHAR    0 50 ";"  3 StartDate Finnish_Swedish_CI_AS
4       SQLCHAR    0 50 ";"  4 EndDate   Finnish_Swedish_CI_AS
5       SQLCHAR    0 20 ";"  5 GroupId   Finnish_Swedish_CI_AS
6       SQLBIT     0 1  ";"  6 IsOpen    ""
7       SQLDECIMAL 0 18 "\n" 7 Total     ""


For the command I specify the table/db, the data file and the format file, plus -T (trusted connection) and -F 2 (first row = 2, to skip the header row).

In cmd:

bcp [database].[dbo].[table] in C:\...\data.txt -f C:\...\format.fmt -T -F 2


The database table:

CREATE TABLE [dbo].[table](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Number] [int] NOT NULL,
    [StartDate] [varchar](50) NULL,
    [EndDate] [varchar](50) NULL,
    [GroupId] [varchar](50) NULL,
    [IsOpen] [bit] NULL,
    [Total] [decimal](18, 2) NULL
)

My idea was to save the dates as strings and convert them when reading, using CAST(StartDate AS datetime), but that didn't seem to work with the dd.mm.yyyy format. I also tried executing SET DATEFORMAT dmy.

So, I will need to import thousands of rows automatically into the database table daily, and I need to transform the dates into datetime during this procedure. What is, performance-wise, the best way to do this? Is it even possible to achieve it with bcp in an efficient way?
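
Because each daily file repeats all earlier rows and only appends new ones, one way to keep the daily load small is to pass on only the tail of the file. The Python sketch below is an illustration under that append-only assumption; `write_new_rows`, the UTF-8 encoding and the idea of tracking `rows_already_imported` somewhere are all hypothetical and not part of the setup described above.

```python
import tempfile
from itertools import islice
from pathlib import Path


def write_new_rows(source: Path, target: Path, rows_already_imported: int) -> int:
    """Copy only the data rows that follow the ones imported earlier.

    Assumes the file is append-only and starts with one header line.
    Returns the number of rows written to `target`.
    """
    written = 0
    with source.open(encoding="utf-8") as src, \
            target.open("w", encoding="utf-8", newline="") as dst:
        # Skip the header line plus every previously imported data row.
        for line in islice(src, 1 + rows_already_imported, None):
            dst.write(line)
            written += 1
    return written


if __name__ == "__main__":
    # Tiny demonstration with a temporary file instead of the real data.txt.
    folder = Path(tempfile.mkdtemp())
    data = folder / "data.txt"
    data.write_text("NUMBER;TOTAL;\n1;1,00;\n2;2,00;\n3;3,00;\n", encoding="utf-8")
    count = write_new_rows(data, folder / "new_rows.txt", rows_already_imported=2)
    print(count)
```

The -F option already used above to skip the header takes a first-row number, so passing a larger value to bcp may achieve something similar without writing an intermediate file.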


Solution

Are you able to use SQL Server Integration Services?

SSIS would allow you to easily transform the data during the import.
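
If SSIS is not available, one alternative (an untested suggestion, not part of the answer above) is to rewrite the dates before the file reaches bcp. The Python sketch below converts dd.mm.yyyy text into the yyyy-mm-dd form; `to_iso_date` is a hypothetical helper name, and whether bcp then loads the result into a datetime column should be verified against the actual server.

```python
from datetime import datetime


def to_iso_date(text: str) -> str:
    """Convert 'dd.mm.yyyy' to 'yyyy-mm-dd'.

    Hypothetical helper; raises ValueError if the text is not a valid date
    in the expected layout.
    """
    return datetime.strptime(text.strip(), "%d.%m.%Y").strftime("%Y-%m-%d")


print(to_iso_date("02.10.2010"))  # 2010-10-02
```

On the SQL Server side, CONVERT(datetime, StartDate, 104) is the documented style for the dd.mm.yyyy layout, which may be the simpler route if the dates are first loaded as strings.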


Licensed under: CC-BY-SA with attribution