Question

I have a field with encoding utf8-general-ci in which many values contain non-ascii characters. I want to

  1. Search for all fields with any non-ascii characters

  2. Replace all non-ascii characters with their corresponding ascii version.

For example: côte-d'ivoire should be replaced with cote-d-i'voire, são-tomé should be replaced with sao-tome, etc.

How do I achieve this? If I just change the field type to ascii, non-ascii characters get replaced by '?'. I am not even able to search for all such fields using

RLIKE '%[^a-z]%'

For example

SELECT columname 
FROM tablename 
WHERE NOT columname REGEXP '[a-z]';

returns an empty set.

Thanks

Était-ce utile?

La solution

An sql fiddle example is at

http://www.sqlfiddle.com/#!2/c1d90/1/0

the query to select is

select * from test where maintext rlike  '[^\x00-\x7F]'

Hope this helps

Autres conseils

I'm presuming from your previous questions that you're using PHP.

https://github.com/silverstripe-labs/silverstripe-unidecode

You could then use skv's answer to return the object's you wish to use and then use unidecode to attempt to convert the object to it's ascii equivalents.

In Perl, you can use Text::Unidecode.

In MySQL, there isn't an easy function to convert from utf8 (or utf8mb4) into ascii without it spitting out some ugly '?' characters as replacements. It's best to replace these prior to inserting them in the DB, or run something in Perl (or whatever) to extract the data and re-update it one row at a time.

There are many different ports of Text::Unidecode in different languages: Python, PHP, Java, Ruby, JavaScript, Haskell, C#, Clojure, Go.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top