Question

In Oracle 10g, I'd like to create a regular expression to list the characters that are different between two strings.

Here is the reason: I have a table with a field that contains sometimes Unicode characters that are not in the French language.

I am able to list the rows containing these nonstandards characters to make a future cleanup with this query:

SELECT DataID, Name, CONVERT(NAME, 'WE8ISO8859P1', 'WE8DEC')  
  FROM table
  WHERE NAME <> CONVERT(NAME, 'WE8ISO8859P1', 'WE8DEC' )

where WE8ISO8859P1 - West European (that I accept)

and WE8DEC - 8-bit character sets from Digital Equipment Corporation (that I know that the application support)

I imagine that with an Oracle regular expression I would be able to extract the list of all these nonstandards characters. But I'm not familiar with regexp in Oracle so any help would be appreciated.

Here is my (not working) idea :

select regexp_replace("éaé", '[a-z][A-Z]', '' ) from dual;

would give "é" as a character to cleanup.

Was it helpful?

Solution

Perhaps something like this might get you going in the right direction:

SQL> select regexp_replace('éaéABcdEF', '([a-zA-Z])', '' ) problems from dual;

PROBLEMS
--------
éé

It gives you each occurrence of the characters you want to identify, but perhaps that's not a problem or you can refine it...

OTHER TIPS

I think your only mistake was using double quotes for the first parameter in your question. Typically double quotes are only used for column/table names.

The parantheses that DCookie added make the regex match multiple times. Depending on what is in the name field, you may need to add more characters to the replace clause, including puncuation (use two single quotes to match a quote). Here's a start for you:

SELECT regexp_replace(name, '([a-zA-Z ,.;''"])\-?','') problem_characters , count(*) 
FROM table
  WHERE NAME <> CONVERT(NAME, 'WE8ISO8859P1', 'WE8DEC' )
group by regexp_replace(name, '([a-zA-Z ,.;''"])\-?','');
SELECT LISTAGG( letter, '' ) WITHIN GROUP(ORDER BY letter) 
       FROM ( SELECT DISTINCT substr( 'aaaaabcde', level, 1 ) letter 
                     FROM dual CONNECT BY level <= length('aaaaabcde') )
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top