Question

I want to change the character set of an existing database from NONE to UTF8. It seems that the safest way is to create a new database and then pump the data into it.

I have tried two methods, both gave me the same error:

  1. Create a new database with the UTF8 character set, then create a simple stored procedure to pump data via "execute block on external datasource"

  2. Use fbclone with this command line:

    fbclone -l fbembed.dll -v -s source.gdb -t destination.gdb -u SYSDBA -p masterkey -tc UTF8 -wc UTF8
    

Both gave me a malformed string error, and most of the rows were not copied to the destination database.

Example verbose error:

    Incompatible column/host variable data type
    GDS Code: 335544569 - SQL Code: -303 - Error Code: 249
    fields values ---
    VANUM = 244458
    RUBNUM = 5054
    VALEUR = Absence de germes pathogènes.
    ATBANA =
    DATEMODIF = 16/05/2018

Solution

In your current setup, you seem to be converting directly from NONE to UTF8. When you do that, the bytes stored as NONE are checked for being valid UTF8 and then stored as-is, without any transcoding. A malformed string error means that the data in a NONE column was not actually a valid sequence of bytes in UTF8.
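To see why that check fails, here is a minimal Python sketch. It assumes, purely for illustration, that the VALEUR text from the error report was written by a client using WIN1252 (Python codec `cp1252`); the real source character set cannot be known from the question. The helper name `is_valid_utf8` is hypothetical.

```python
# Assumption: the text was stored by a WIN1252 client; the actual charset is unknown.
raw = "Absence de germes pathogènes.".encode("cp1252")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# In WIN1252, 'è' is the single byte 0xE8. In UTF-8, 0xE8 opens a three-byte
# sequence and must be followed by continuation bytes, which the next byte ('n') is not.
print(b"\xe8" in raw)                        # True
print(is_valid_utf8(raw))                    # False
print(is_valid_utf8(b"Absence de germes"))   # True: pure ASCII is always valid UTF-8
```

Rows that contain only ASCII pass the check, which would fit the observation that some rows were copied and most were not.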

You will need to find out what the actual character set of the data is before you can convert it correctly (and even if you don't get errors, that does not necessarily mean the conversion was logically correct). Check whether the data currently stored is actually in a different character set (for example, WIN1252 is common in Western Europe and the US).
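One way to narrow down the candidates is to decode the same sample bytes under several character sets and look at the results. The sketch below is only an illustration: `try_charsets` is a hypothetical helper, the sample bytes are an assumed WIN1252 encoding of a word from the error report, and the Python codecs `cp1252`, `cp1250` and `cp1251` stand in for Firebird's WIN1252, WIN1250 and WIN1251.

```python
from typing import Dict, Iterable, Optional


def try_charsets(data: bytes, charsets: Iterable[str]) -> Dict[str, Optional[str]]:
    """Decode the same bytes under each candidate; None marks a decoding error."""
    results: Dict[str, Optional[str]] = {}
    for name in charsets:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


# Hypothetical sample: the bytes a WIN1252 client would have stored for this word.
sample = b"pathog\xe8nes"

for name, text in try_charsets(sample, ["utf-8", "cp1252", "cp1250", "cp1251"]).items():
    print(name, "->", text)
```

Only UTF-8 is rejected here. The three single-byte code pages all decode without error but give a different letter for byte 0xE8, so the absence of an error proves nothing by itself; someone who knows what the text should say has to pick the right one.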

When you pump manually, this means that you have to cast the data explicitly: first from NONE to WIN1252 (or whatever the actual character set is), and then to UTF8. For example, if the data is stored in column1, you would use:

cast(cast(column1 as varchar(<length>) character set WIN1252) as varchar(<length>) character set UTF8)
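The nested cast does two separate things: the inner cast only relabels the stored bytes as WIN1252, and the outer cast transcodes them to UTF8. The same two steps can be sketched in Python, as an illustration of the idea rather than something Firebird itself runs. `none_to_utf8` is a hypothetical name, and WIN1252 (`cp1252`) is still an assumption about the source data.

```python
def none_to_utf8(raw: bytes, source_charset: str = "cp1252") -> bytes:
    """Reinterpret untyped bytes as source_charset, then transcode to UTF-8."""
    text = raw.decode(source_charset)   # inner cast: treat the bytes as WIN1252
    return text.encode("utf-8")         # outer cast: transcode to UTF8


raw = b"pathog\xe8nes"                  # assumed WIN1252 bytes, 10 bytes long
converted = none_to_utf8(raw)
print(converted)                        # b'pathog\xc3\xa8nes'
print(len(raw), len(converted))         # 10 11

# Round trip: converting back must reproduce the original bytes exactly.
assert converted.decode("utf-8").encode("cp1252") == raw
```

The accented letter grows from one byte to two, while the number of characters stays the same.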

I have no experience with fbclone, but the fact that you specify UTF8 twice suggests that one is the source and the other the target character set. You may want to try changing one or the other to WIN1252 (or a different character set) to see which works.

Make sure to check the data afterwards.

In the worst case, you may have NONE data that is actually a mix of different character sets, and in that case you may have to inspect and convert each row manually.
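If the data does turn out to be mixed, a first pass can at least sort rows into groups before any manual inspection. The following Python sketch assumes the raw column bytes have already been fetched unconverted; `classify` and the sample keys other than 244458 are hypothetical. It is a heuristic only: bytes in a legacy code page can occasionally happen to be valid UTF-8.

```python
def classify(data: bytes) -> str:
    """Rough bucket for raw column bytes read from a NONE column."""
    if data.isascii():
        return "ascii"      # identical in ASCII-compatible charsets, safe to copy
    try:
        data.decode("utf-8")
        return "utf8"       # already well-formed UTF-8
    except UnicodeDecodeError:
        return "legacy"     # some single-byte code page; needs a charset decision


# Hypothetical rows keyed by VANUM; the values are raw bytes as stored.
rows = {
    244458: "Absence de germes pathogènes.".encode("cp1252"),
    244459: "Absence de germes pathogènes.".encode("utf-8"),
    244460: b"RAS",
}

buckets = {key: classify(value) for key, value in rows.items()}
print(buckets)   # {244458: 'legacy', 244459: 'utf8', 244460: 'ascii'}
```

Only the "legacy" group then needs a decision about which character set to cast from.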

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange