Bulk Insert a File with a Text Field with Carriage Return (Enters)
-
22-07-2019 - |
Question
I´m having trouble with bulk insert a file that contains a field in data type text, and it holds enters and the delimiter for the fields are pipes "|" and the row terminator is "|\n"
I get an error of truncation for the fields next to the text field. I think that the bulk insert thinks that the enters in the second field are the next rows.
DECLARE @sql varchar(2000)
PRINT 'xyz table'
SET @sql =
'BULK INSERT xyz
FROM ''\\' + @@servername + '\Import\xyz.txt''
WITH
(
DATAFILETYPE = ''char'',
FIELDTERMINATOR = ''|'',
ROWTERMINATOR = ''|\n'',
TABLOCK,
ROWS_PER_BATCH = 2000,
MAXERRORS = 1000
)
'
PRINT @sql
EXECUTE (@sql)
PRINT ''
GO
This is the data example
467507**|**08-20-09\
[8:55:03 AM] *** miked@wkaiser-6793 is Not Available [Ext. away]\(CR)
[9:00:57 AM] *** miked@wkaiser-6793 is Online [Online]\(CR)
[9:01:00 AM] <jeffp> Howdy. Time slip update game! Update TS#467493 & 467507 with a (CR)comment and see if you win! [9:01:30 AM] *** miked@wkaiser-6793 is Away [Away]\
\(CR)
08-18-09\(CR)
[10:13:15 AM] *** miked@wkaiser-6793 is Online [Online]\(CR)
[10:13:59 AM] <jeffp> Howdy; welcome back from lunch.. Just pinging you for an update in TS#467493 & 467507. Since 467493 is pri9, want a stock e-mail to go out to the customer to get them moving?\(CR)
[10:14:47 AM] <miked@wkaiser-6793> thats ok i got it\(CR)
[10:14:53 AM] <jeffp> Aiight.**|**2009-08-18 00:00:00**|**2009-08-20 00:00:00**|**JDP**|**JDP**|**
The table schema is:
create table xyz
(
xyz_id VARCHAR(200), --INT TO VARCHAR
notes text,
create_date varchar(32), --DATETIME
create_user varchar(12),
modify_date varchar(32), --DATETIME
modify_user varchar(12)
)
As you can see (well, not can see) there are enters (marked with CR) and the bulk insert is confusing them with a new row.
Any help will be appreciated.
Thanks
Solution
EDIT: you don't need a format file:
IF OBJECT_ID('tempdb..#rawdata') IS NOT NULL DROP TABLE #rawdata
create table #rawdata (
xyz_id VARCHAR(200)
, notes text
, create_date varchar(32)
, modify_date varchar(32)
, create_user varchar(12)
, modify_user varchar(12)
)
BULK INSERT #rawdata
FROM 'C:\temp\file.txt'
WITH (
FIELDTERMINATOR = '**|**'
, ROWTERMINATOR = '**|**\n'
, TABLOCK
)
SELECT * FROM #rawdata
You may need to play with **|**\n
(LF) vs **|**\r\n
(CRLF) vs **|**\r
(CR), depending on whether the file is Unix, DOS or MAC-like.
/EDIT
It can't be done without a format file. So, create a format file:
8.0
7
1 SQLCHAR 0 0 "**|**" 1 xyz_id ""
2 SQLCHAR 0 0 "**|**" 2 notes ""
3 SQLCHAR 0 0 "**|**" 3 create_date ""
4 SQLCHAR 0 0 "**|**" 4 modify_date ""
5 SQLCHAR 0 0 "**|**" 5 create_user ""
6 SQLCHAR 0 0 "**|**" 6 modify_user ""
7 SQLCHAR 0 0 "\n" 0 omitted ""
Then:
IF OBJECT_ID('tempdb..#rawdata') IS NOT NULL DROP TABLE #rawdata
create table #rawdata (
xyz_id VARCHAR(200)
, notes text
, create_date varchar(32)
, modify_date varchar(32)
, create_user varchar(12)
, modify_user varchar(12)
)
BULK INSERT #rawdata
FROM '\\folder\file.txt'
WITH (
FORMATFILE = '\\folder\file.fmt'
, TABLOCK
)
Or, in SQL 2005+:
SELECT * FROM OPENROWSET(
BULK '\\folder\file.txt'
, FORMATFILE = '\\folder\file.fmt'
) a
OTHER TIPS
The data is inconsistent. You have some rows ending \(cr)
with one field, other (cr)
ending with | column separators
Either that or you have inconsistent column separators, some \(cr)
, some |
.
If so, then you'll need a format file to deal with each separate "column".