Question

I have data containing a note field that is full of invalid characters. The data was imported from an Excel spreadsheet (*.xls) produced by a faulty export tool.

These characters cause errors in XML, such as: Character reference "&#xb" is an invalid XML character.

How would I go about replacing Wingdings-style characters, such as gender signs, blocks, and other symbols, in PostgreSQL?

I tried to copy and paste these characters into a replace statement, without success. Is there a way to use a regular expression to replace every character that is not alphanumeric or one of "-=+"? Any help would be appreciated.


Solution

SELECT regexp_replace('123xabcABCxöäüxÖÄÜx¡‘’xæćčx=+-x"§$%&/()x'
                     ,'[^a-zA-Z0-9=+-]','_','g')

Result:

123xabcABCx___x___x___x___x=+-x________x

The leading ^ in the character class [^a-zA-Z0-9=+-] negates it. Read "all characters not in the following list".
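As a minimal sketch of how the negated class could be applied to stored data, the following uses a temporary table. The table name notes_demo and its columns are hypothetical, chosen only for illustration. The sample value embeds chr(11), the vertical tab that the XML error "&#xb" refers to.

```sql
-- Hypothetical table and sample row, for illustration only.
CREATE TEMP TABLE notes_demo (id int, note text);
INSERT INTO notes_demo VALUES (1, 'note' || chr(11) || 'text');

-- chr(11) is not in the allowed list, so the negated class matches it.
UPDATE notes_demo
SET    note = regexp_replace(note, '[^a-zA-Z0-9=+-]', '_', 'g');

SELECT note FROM notes_demo;  -- note_text
```

Running the statement as a SELECT first, before turning it into an UPDATE, is a cheap way to check which characters would be affected.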

Take care to place the - character at the end (or beginning) of the character class, or it will have a special meaning like in a-z.
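To illustrate the point about the position of -, here is a small sketch comparing two character classes. It assumes ranges are evaluated in code-point order, where + to = spans the characters from U+002B to U+003D and therefore includes the comma, the period, and the digits.

```sql
SELECT regexp_replace('a,b.c', '[^a-z=+-]', '_', 'g') AS hyphen_last,    -- a_b_c
       regexp_replace('a,b.c', '[^a-z+-=]', '_', 'g') AS hyphen_middle;  -- a,b.c
```

In the second pattern, +-= is read as a range, so the comma and the period count as allowed characters and are left untouched.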

Note the 4th parameter 'g' for "globally". Without it, only the first match would be replaced.
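The effect of the 'g' flag can be seen by running the same replacement with and without it. The input string is made up for this example.

```sql
SELECT regexp_replace('a!b!c', '[^a-z]', '_')      AS first_only,   -- a_b!c
       regexp_replace('a!b!c', '[^a-z]', '_', 'g') AS all_matches;  -- a_b_c
```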

Note also that characters like öäü get replaced as well. You may or may not want that. If not, you may be interested in the unaccent extension that provides the unaccent() function:

The unaccent() function removes accents (diacritic signs) from a given string.
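One possible way to combine the two steps is sketched below. It assumes the unaccent contrib module is available on the server and that its default rules are in use. The sample string is made up for this example.

```sql
-- Requires the unaccent contrib module to be installed on the server.
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Strip diacritics first, then replace whatever is still outside the allowed list.
SELECT regexp_replace(unaccent('Hôtel öäü'), '[^a-zA-Z0-9=+-]', '_', 'g');
```

With this ordering, accented letters are reduced to their base letters before the regular expression runs, so they are kept instead of being replaced by underscores.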

Details about PostgreSQL regular expressions are in the manual.

Licensed under: CC-BY-SA with attribution