Question

I need an encoding scheme for a table which is flexible enough to support the English language, but also languages such as Russian, Mandarin, and Arabic.


Solution

Tables in Oracle do not have encodings.

A database has a character set that drives the encoding of all the CHAR, VARCHAR, VARCHAR2, and CLOB columns in the database. If you want to support multiple languages, you would generally choose a Unicode-based character set; typically, that would be AL32UTF8.
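As a quick illustration (a Python sketch, not anything Oracle-specific; the sample strings are made up), UTF-8, which AL32UTF8 implements, round-trips text in all of the languages mentioned in the question:

```python
# UTF-8 (what Oracle's AL32UTF8 implements) can represent text in all of
# the languages from the question. Sample strings are illustrative only.
samples = {
    "English": "hello",
    "Russian": "привет",
    "Mandarin": "你好",
    "Arabic": "مرحبا",
}

for language, text in samples.items():
    encoded = text.encode("utf-8")
    # Every string round-trips losslessly through UTF-8.
    assert encoded.decode("utf-8") == text
    print(f"{language}: {len(text)} characters, {len(encoded)} bytes in UTF-8")
```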

A database also has a national character set, which drives the encoding of all the NCHAR, NVARCHAR2, and NCLOB columns in the database. If you have an existing database whose character set does not support all the characters you need, and you cannot migrate the database to a Unicode character set because existing applications don't support it, you can add NVARCHAR2 columns and use the national character set. I don't believe it is possible to create a new 11g database whose national character set is not Unicode-based (AL16UTF16, most likely), though you can end up with such a beast if a legacy system has been upgraded to 11g.

Using the national character set, however, generally requires more work than using the database character set: various front-ends, for example, don't support national character set data cleanly, and the API calls change when you're working with national character set data. UTF-16 also often takes more space than UTF-8: UTF-16 uses either 2 or 4 bytes per character (the vast majority of characters require 2 bytes), while UTF-8 uses between 1 and 4 bytes, with English characters requiring 1 byte, most European characters requiring 2 bytes, and most Asian characters requiring 3 bytes.
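The size comparison above is easy to check directly. This is a Python sketch, not Oracle code; `utf-16-le` is used so the byte-order mark doesn't skew the counts:

```python
# Compare the storage cost of UTF-8 (as in AL32UTF8) vs UTF-16
# (as in AL16UTF16) for a few sample strings. "utf-16-le" avoids
# adding a byte-order mark to the byte counts.
for text in ["hello", "привет", "你好"]:
    utf8_bytes = len(text.encode("utf-8"))
    utf16_bytes = len(text.encode("utf-16-le"))
    print(f"{text!r}: UTF-8 = {utf8_bytes} bytes, UTF-16 = {utf16_bytes} bytes")

# English: UTF-8 is half the size of UTF-16 (1 byte/char vs 2).
# Cyrillic: both encodings take 2 bytes per character.
# Chinese: UTF-16 is smaller here (2 bytes/char vs 3 in UTF-8).
```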

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow