Question

I'm using SQL Server Express 2008 R2.

I'm importing from a csv file, and some of the columns contain a "£" sign as part of some free text. When this file is loaded into the database, the "£" sign is displayed as "ú". I assume this is related to the database collation. The current database collation is Latin1_General_CI_AS.

Which collation will store "£" as "£" in SQL Server?

Many thanks.

Further info: I created a small file to demonstrate my issue here: https://www.dropbox.com/s/yvcx4t9nk9p0bf7/poundTest.txt

use myDB;
go

create table test
(id int,
amt_range varchar(50));

bulk insert test
from 'F:\poundtest.txt'
with (
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n',
firstrow=1
);
select * from test;

This returns:

id  amt_range
1   <-ú200K
2   -ú200K to -ú20k
3   -ú20k to ú0k
4   ú0k to ú20k
5   ú20k to ú200k
6   >ú200k

Solution

SQL Server will definitely store "£" correctly in a varchar or nvarchar column using collation Latin1_General_CI_AS. I see it happening every day in the software I maintain.

I think the problem lies in how the text file is encoded and read, not in the collation. "£" has the code point value 163 (0xA3) in both Windows-1252 and Unicode. However, in the OEM code page 850 (DOS "extended ASCII"), "£" is 156 and "ú" is 163 — so a file written as Windows-1252 but read as code page 850 turns every "£" into "ú", which is exactly what you are seeing. Is your import interpreting the csv bytes in an OEM code page before passing the data to SQL Server?
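You can sanity-check the code point claims from T-SQL itself (this assumes a Windows-1252-based collation such as your Latin1_General_CI_AS):

```sql
-- U+00A3 is the pound sign in Unicode
SELECT NCHAR(163);      -- £

-- Byte 163 is also £ under a Windows-1252-based collation
SELECT CHAR(163);       -- £

-- And going the other way, £ maps back to code point 163
SELECT UNICODE(N'£');   -- 163
```

If these return "£" and 163 on your server, the collation is fine and the corruption is happening during the file read.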

UPDATE

Looking at MSDN, it appears that the BULK INSERT command performs character-set conversion, and OEM is the default if CODEPAGE is not specified:

CODEPAGE = { 'ACP' | 'OEM' | 'RAW' | 'code_page' }

The OEM default is exactly what is causing your problem. Ideally, you would specify UTF-8 (CODEPAGE = '65001'); however, MSDN says that UTF-8 is not supported.

I suggest that you change the encoding of your CSV file to Windows-1252, then use the CODEPAGE = 'ACP' option to import the data.
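Putting that together with the table from your question, the import would look something like this (a sketch — it assumes you have re-saved F:\poundtest.txt as Windows-1252/ANSI, so that 'ACP' matches the file's actual encoding):

```sql
bulk insert test
from 'F:\poundtest.txt'
with (
    CODEPAGE = 'ACP',        -- read the file as the system ANSI code page (Windows-1252)
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    FIRSTROW = 1
);

select * from test;          -- "£" should now round-trip intact
```

With CODEPAGE = 'ACP', byte 0xA3 in the file is interpreted as "£" rather than "ú", so no conversion damage occurs on the way into the varchar column.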

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow