Oracle DB export, possible charset conversion if NLS_LANG not set
-
29-09-2020 - |
Question
I need to export an Oracle database from one server and import it on another server. But I am getting the following information message at the beginning:
Connected to: Oracle Database 10g Release 10.2.0.4.0 - 64bit Production Export done in US7ASCII character set and AL16UTF16 NCHAR character set server uses UTF8 character set (possible charset conversion)
Although the export still terminates successfully without warning, I was wondering whether the data in the database is still correct. Is there a need to set the NLS_LANG
variable to match the character set of the server (in this case UTF-8
)?
If I set the NLS_LANG variable to UTF-8
I don't get this information message. But the target server where I import the database again shows the following:
import done in UTF8 character set and AL16UTF16 NCHAR character set import server uses AL32UTF8 character set (possible charset conversion)
If I understand that correctly these two servers use (slightly) different character sets - UTF-8
and AL32UTF8
.
I am confused on what I should set the NLS_LANG
variable or even whether it is necessary to set. Should I use AL32UTF8
as NLS_LANG
for the export and import?
EDIT: In case this information is important: I am using a SUSE Linux and for the export I use the command exp
that comes with Oracle.
Solution
First of all, character set UTF-8
does not exist on Oracle, use AL32UTF8
or UTF8
(without the hyphen).
Usually you should get an error when your client character set is UTF-8
:
$ setenv NLS_LANG AMERICAN_AMERICA.UTF-8
$ sqlplus ...
ERROR:
ORA-12705: Cannot access NLS data files or invalid environment specified
Character set UTF8
is identical to AL32UTF8
for characters of BMP (Basic Multilingual Plane), i.e. the first 65536 characters of Unicode. Most likely all characters in your application fall in this range, so it does not make any difference. AL32UTF8
is what is commonly called UTF-8 encoding. Oracle character set UTF8
is commonly known as CESU-8.
If you do not set any NLS_LANG variable then Oracle defaults it to AMERICAN_AMERICA.US7ASCII
. Setting your client character set to US7ASCII
is like you tell the Oracle Database server: "My client is able to display (only) 7-bit ASCII characters and files are saved in 7-bit ASCII format." Thus the database will replace all russian/cyrillic characters to ?
or ¿
at export.
Set your NLS_LANG
to AL32UTF8
(no matter which database character sets you have in place), then you will get any characters properly exported and imported.
For more information check also this document: NLS_LANG FAQ
OTHER TIPS
The NLS_LANG
parameter is used by the Oracle network layer to do character translation between the client and the database. This way the client can display all the characters in the database in a 'correct' way.
Your NLS_LANG
should match the character-set of your database when you do an export. If you export a UTF-8 database in US7ASCII then you risk to loose information when you import this data back into a database. Why? The UTF-8 character-set can hold much more different characters then US7ASCII. The 'unknown' characters are replaced by a single character. In the past I have seen this happening that we lost all characters with accents. They were replaced by a '?' if I remember well.
If you import then you must be sure that the NLS_LANG
is set to the value of your import file. Oracle will then do the mapping to the character-set of the database in which you import.