specify delimiter in openrowset bulk insert
-
26-01-2021 - |
Question
I'm trying to insert data from CSV files into tables using a cursor and open rowset bulk insert. this works for me 99% of the times but if one of the fields in the csv files contains a comma( , ) the data import is ruined and all the columns get imported one column after their place. For example it should look like this:
timestamp username name IP title
20190331 ABCD12G david hertz 1.1.1.1 null
But if i have a comma in the name column(for example) it will look like that:
timestamp username name IP title
20190331 ABCD12G david hertz 1.1.1.1
the csv example:
username,name,ip,title
ABCD12G,david,hertz,1.1.1.1,''
This is the syntax i'm using:
insert into [player table]
select 20190331,* FROM OPENROWSET(BULK 'D:\folder\2019\03-
31\Player_statistics.csv', FIRSTROW = 2,
FORMATFILE='D:\folder\test\xml\Player_statistics.xml')
as t1
how can i change this to ignore the comma in the middle of a word?(i can't limit the CSV since its taken from another client that will not limit characters.
the xml i'm using for the import with openrowset:
when you perform regular bulk insert you have an option to specify the delimiter fielterminator = '","' , I could not find something like that in openrowset.
What can help fix this issue?
Solution
but if one of the fields in the csv files contains a comma( , )
If all lines contains a comma, the solution can be
<ROW>
<COLUMN SOURCE="1" NAME="username" xsi:type="SQLSMALLINT"/>
<COLUMN SOURCE="2" NAME="name1" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="name2" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="4" NAME="IP" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="5" NAME="title" xsi:type="SQLNVARCHAR"/>
</ROW>
INSERT INTO [player table]
SELECT 20190331,
username,
name1+','+name2,
IP,
title
FROM OPENROWSET( BULK 'D:\folder\2019\03-31\Player_statistics.csv',
FIRSTROW = 2,
FORMATFILE='D:\folder\test\xml\Player_statistics.xml') t1
If only some lines contains commas, and none title
is null/empty, then try
INSERT INTO [player table]
SELECT 20190331,
`username`,
CASE WHEN title IS NULL
THEN name1
ELSE name1+','+name2 END,
CASE WHEN title IS NULL
THEN name2
ELSE IP END,
CASE WHEN title IS NULL
THEN IP
ELSE title END
IP,
title
FROM OPENROWSET( BULK 'D:\folder\2019\03-31\Player_statistics.csv',
FIRSTROW = 2,
FORMATFILE='D:\folder\test\xml\Player_statistics.xml') t1
If some title
is null/empty, I cannot find simple way to distinguish.
UPDATE
The solution can be: divide your CSV on two fields only - username
and the whole slack, the slack then can be divided to separate fields using common string functions in the SELECT part of importing query (maybe with using intermediate variables in chain calculations).
OTHER TIPS
You can find your answer from below link:
https://stackoverflow.com/questions/4123875/commas-within-csv-data
We also used to deal with BCP and Transfer however were replacing comma with space(if possible) after agreeing with Client based on the seriousness of column or with .
Hope this helps.