Question

I can't seem to figure out how this is happening.

Here's an example of the file that I'm attempting to bulk insert into SQL Server 2005:

***A NICE HEADER HERE***
0000001234|SSNV|00013893-03JUN09
0000005678|ABCD|00013893-03JUN09
0000009112|0000|00013893-03JUN09
0000009112|0000|00013893-03JUN09

Here's my bulk insert statement:

BULK INSERT sometable
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)

But for some reason, the only output I can get is:

0000005678|ABCD|00013893-03JUN09
0000009112|0000|00013893-03JUN09
0000009112|0000|00013893-03JUN09

The first record always gets skipped, unless I remove the header altogether and don't use the FIRSTROW parameter. How is this possible?

Thanks in advance!


Solution

I don't think you can skip rows in a different format with BULK INSERT/BCP.

When I run this:

TRUNCATE TABLE so1029384

BULK INSERT so1029384
FROM 'C:\Data\test\so1029384.txt'
WITH
(
--FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)

SELECT * FROM so1029384

I get:

col1                                               col2                                               col3
-------------------------------------------------- -------------------------------------------------- --------------------------------------------------
***A NICE HEADER HERE***
0000001234               SSNV                                               00013893-03JUN09
0000005678                                         ABCD                                               00013893-03JUN09
0000009112                                         0000                                               00013893-03JUN09
0000009112                                         0000                                               00013893-03JUN09

It looks like BULK INSERT expects the '|' even in the header row: because the header contains no field terminator, the engine keeps reading past the newline and swallows the header text into the first column of the first data row. In short, once you specify a field terminator, every row MUST contain it.

You could strip the header with a pre-processing step. Another possibility is to select only complete rows, then process them (excluding the header), as sketched below. Or use a tool that can handle this, like SSIS.
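As a rough sketch of the "select only complete rows" idea (the #raw staging table and the CHARINDEX/SUBSTRING parsing are just for illustration), you could load each line whole and keep only the lines that actually contain the pipe separators:

IF OBJECT_ID('tempdb..#raw') IS NOT NULL DROP TABLE #raw
CREATE TABLE #raw (line VARCHAR(MAX))

-- No FIELDTERMINATOR, so every line (header included) lands whole in the single column
BULK INSERT #raw
FROM 'E:\filefromabove.txt'
WITH (ROWTERMINATOR = '\n')

-- Keep only rows with both separators, then split them out
SELECT SUBSTRING(line, 1, CHARINDEX('|', line) - 1) AS col1,
       SUBSTRING(line, CHARINDEX('|', line) + 1,
                 CHARINDEX('|', line, CHARINDEX('|', line) + 1) - CHARINDEX('|', line) - 1) AS col2,
       SUBSTRING(line, CHARINDEX('|', line, CHARINDEX('|', line) + 1) + 1, 8000) AS col3
FROM #raw
WHERE line LIKE '%|%|%'   -- the header has no '|', so it is excluded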

OTHER TIPS

Maybe check that the header has the same line-ending as the actual data rows (as specified in ROWTERMINATOR)?
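For example, if the file was created on Windows with CRLF line endings, you could spell that out explicitly (just a sketch, using the same table and path as the question):

BULK INSERT sometable
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\r\n'   -- carriage return + line feed, as written by Windows editors
)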

Update: from MSDN:

The FIRSTROW attribute is not intended to skip column headers. Skipping headers is not supported by the BULK INSERT statement. When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows.

I found it easiest to just read the entire line into one column then parse out the data using XML.

IF (OBJECT_ID('tempdb..#data') IS NOT NULL) DROP TABLE #data
CREATE TABLE #data (data VARCHAR(MAX))

BULK INSERT #data FROM 'E:\filefromabove.txt' WITH (FIRSTROW = 2, ROWTERMINATOR = '\n')

IF (OBJECT_ID('tempdb..#dataXml') IS NOT NULL) DROP TABLE #dataXml
CREATE TABLE #dataXml (ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED, data XML)

INSERT #dataXml (data)
SELECT CAST('<r><d>' + REPLACE(data, '|', '</d><d>') + '</d></r>' AS XML)
FROM #data

SELECT  d.data.value('(/r//d)[1]', 'varchar(max)') AS col1,
        d.data.value('(/r//d)[2]', 'varchar(max)') AS col2,
        d.data.value('(/r//d)[3]', 'varchar(max)') AS col3
FROM #dataXml d

You can use the snippet below:

BULK INSERT TextData
FROM 'E:\filefromabove.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '|',  --field delimiter
ROWTERMINATOR = '\n',   --row delimiter
ERRORFILE = 'E:\ErrorRows.csv',
TABLOCK
)

Given how mangled some data can look after a BCP import into SQL Server from non-SQL data sources, I'd suggest doing all the BCP imports into scratch tables first.

For example

TRUNCATE TABLE Address_Import_tbl

BULK INSERT dbo.Address_Import_tbl
FROM 'E:\external\SomeDataSource\Address.csv'
WITH
(
FIELDTERMINATOR = '|',
ROWTERMINATOR = '\n',
MAXERRORS = 10
)

Make sure all the columns in Address_Import_tbl are nvarchar, to make the import as agnostic as possible and avoid type conversion errors.
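For illustration, such a scratch table might look like this (the column names are assumptions, since the Address.csv layout isn't shown):

-- Everything is nvarchar so the raw text always loads, whatever it contains
CREATE TABLE dbo.Address_Import_tbl
(
    AccountNumber NVARCHAR(50)  NULL,
    AddressLine   NVARCHAR(255) NULL,
    CreatedDate   NVARCHAR(50)  NULL
)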

Then apply whatever fixes you need to Address_Import_tbl, such as deleting the unwanted header row.
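A minimal sketch of that clean-up, still using the assumed column names from above:

-- Drop the header row (or anything else whose first column doesn't start with a digit)
DELETE FROM dbo.Address_Import_tbl
WHERE AccountNumber NOT LIKE '[0-9]%'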

Then run an INSERT ... SELECT query to copy from Address_Import_tbl to Address_tbl, along with any datatype conversions you need (for example, casting imported dates to SQL DATETIME).
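A sketch of that final copy, where Address_tbl and its column types are again assumptions for illustration:

-- Copy from the scratch table into the real table, converting types on the way
INSERT INTO dbo.Address_tbl (AccountNumber, AddressLine, CreatedDate)
SELECT CAST(AccountNumber AS INT),
       AddressLine,
       CAST(CreatedDate AS DATETIME)   -- imported text to a real SQL DATETIME
FROM dbo.Address_Import_tbl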
