Question

I was writing files using

l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file, 'Rosëttenville');

but I changed this to

l_file := utl_file.fopen_nchar('OUT', 'a.txt', 'w', 32767);
utl_file.put_line_nchar(l_file, 'Rosëttenville');

when I found out that extended ASCII characters (codes above 127) were not being written out correctly. However, the second, Unicode version also does not write the extended characters correctly. Instead of Rosëttenville I'm getting RosÃ«ttenville. Anyone know how to fix this?


Solution

You haven't said what your database character set is, and thus whether it's legitimate to have 'extended ascii' (probably 8859-1, with chr(235) in this case) in a string, or if this is just a demo. Either way, I think, your problem is trying to implicitly convert a non-unicode string.

ë is code point U+00EB, which encodes in UTF-8 as the two bytes C3 AB. You're getting the two separate characters Ã (code point C3) and « (code point AB). So it can't do a direct translation from chr(235), which is 0x00EB, to U+00EB; it seems to be going via the UTF-8 bytes C3 AB, treated as two separate characters. I'm not going to try to understand exactly why...
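The same byte-level behaviour can be sketched outside Oracle. This is a minimal Python illustration (not Oracle code) of the double encoding described above: the UTF-8 bytes of ë, reinterpreted one byte per character as Latin-1, become the pair Ã«:

```python
original = "Rosëttenville"

# ë (U+00EB) encodes in UTF-8 as the two bytes C3 AB
utf8_bytes = original.encode("utf-8")     # b'Ros\xc3\xabttenville'

# Reading those bytes back as Latin-1 treats C3 and AB as
# two separate characters: Ã (U+00C3) and « (U+00AB)
mojibake = utf8_bytes.decode("latin-1")

print(mojibake)  # RosÃ«ttenville
```

This is exactly the corruption seen in the question: the data went out as UTF-8 bytes but was read back in a single-byte character set.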

You can either use the convert function:

l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file,
  convert('Rosëttenville', 'WE8ISO8859P1', 'UTF8'));

... or, since Oracle discourages use of that function, the utl_raw.convert function:

l_file := utl_file.fopen('OUT', 'a.txt', 'w');
utl_file.put_line(l_file,
  utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw('Rosëttenville'),
    'ENGLISH_UNITED KINGDOM.WE8ISO8859P1', 'ENGLISH_UNITED KINGDOM.UTF8')));

Both give me the value you want, and your original code gave me the same value you saw (my DB character set is AL32UTF8, on 11gR2 on Linux). If your DB character set is not Unicode, your national character set certainly appears to be (it isn't clear from the question whether you got the same output with both attempts), so the nchar version should work instead:

l_file := utl_file.fopen_nchar('OUT', 'a.txt', 'w', 32767);
utl_file.put_line_nchar(l_file,
  utl_raw.cast_to_varchar2(utl_raw.convert(utl_raw.cast_to_raw('Rosëttenville'),
    'ENGLISH_UNITED KINGDOM.WE8ISO8859P1', 'ENGLISH_UNITED KINGDOM.UTF8')));

It would probably be better to be working with Unicode values in the first place, particularly if you currently have a mix of 'extended ascii' and other string types in a table; applying the conversion to everything in that case might give some odd results...

OTHER TIPS

UTL_FILE.PUT_LINE does not convert the data; it writes it out in the database's default character set.

So you need to apply the proper conversion on write:

UTL_FILE.PUT_LINE(file, CONVERT(text, 'WE8ISO8859P1'), FALSE);

You must also set the client environment:

LANG=GERMAN_AUSTRIA.WE8ISO8859P1;export LANG
LC_CTYPE=ISO-8859-1;export LC_CTYPE
NLS_LANG=GERMAN_AUSTRIA.WE8ISO8859P1;export NLS_LANG
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow