Question

Due to a bug in one of our applications, a certain character was duplicated 2^n times in many CLOB fields, where n is anywhere between 1 and 24. For the sake of simplicity, let's say the character is X. It is safe to assume that any occurrence of two or more adjacent X characters indicates broken data.

We considered iterating over every CLOB field in the database and replacing the value where necessary. We quickly found that the value can easily be replaced using REGEXP_REPLACE, e.g. like this (typed from memory, so it might contain syntax errors):

-- Collapse every run of two or more X into a single X
SELECT REGEXP_REPLACE( clob_value, 'XX+', 'X' )
FROM   someTable
WHERE  clob_value LIKE '%XX%';
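In regular-expression terms, `XX*` means "one X followed by zero or more X". It therefore also matches every lone X and rewrites it to itself. `XX+` (or `X{2,}`) only matches genuine runs. The minimal Python sketch below illustrates both points. It assumes X is the duplicated character and uses Python's `re` module in place of Oracle's REGEXP_REPLACE. The helpers `corrupt` and `repair` are hypothetical names used for illustration only.

```python
import re

# Matches runs of two or more X only; a lone X is left untouched.
RUN_OF_X = re.compile(r"XX+")

def corrupt(text: str, n: int) -> str:
    """Hypothetical helper simulating the bug: every X is duplicated 2**n times."""
    return text.replace("X", "X" * (2 ** n))

def repair(text: str) -> str:
    """Collapse every run of two or more X into a single X."""
    return RUN_OF_X.sub("X", text)

original = "aXbXc"
broken = corrupt(original, 3)   # each X becomes 8 X
fixed = repair(broken)

# 'XX*' also matches the single X, while 'XX+' only matches the run.
print(len(re.findall(r"XX*", "aXbXXc")))  # 2
print(len(re.findall(r"XX+", "aXbXXc")))  # 1
print(fixed == original)                  # True
```

The repair is only lossless because of the stated assumption that correct data never contains two adjacent X characters.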

However, even when changing the WHERE clause to WHERE primary_key = 1234, for a record whose CLOB field contains around four million characters with broken data in two places, this query takes more than fifteen minutes to execute. We aborted the attempt at that point, so we don't know how long it would actually take.

By comparison, reading the same value into a C# application, fixing it there with a similar regular expression, and writing it back to the database takes only 3 seconds.
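The client-side approach amounts to three steps: fetch the CLOB value, run the regular expression in application memory, and write the result back. The minimal Python sketch below shows that round trip. It uses the standard-library `sqlite3` module as a stand-in for Oracle, which is an assumption; against the real database an Oracle driver would take its place. The table and column names are taken from the query above, and `fix_row` is a hypothetical helper.

```python
import re
import sqlite3

RUN_OF_X = re.compile(r"XX+")

def fix_row(conn: sqlite3.Connection, key: int) -> None:
    """Read one value, collapse runs of X in memory, and write it back."""
    row = conn.execute(
        "SELECT clob_value FROM someTable WHERE primary_key = ?", (key,)
    ).fetchone()
    if row is None:
        return
    fixed = RUN_OF_X.sub("X", row[0])
    if fixed != row[0]:  # only write back values that actually changed
        conn.execute(
            "UPDATE someTable SET clob_value = ? WHERE primary_key = ?",
            (fixed, key),
        )
        conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE someTable (primary_key INTEGER PRIMARY KEY, clob_value TEXT)")
conn.execute("INSERT INTO someTable VALUES (?, ?)", (1234, "a" + "X" * 1024 + "b"))
fix_row(conn, 1234)
print(conn.execute(
    "SELECT clob_value FROM someTable WHERE primary_key = 1234").fetchone()[0])
# prints aXb
```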

We could write such a C# application and run it, but due to security restrictions it would be much easier to send our customer a database script that they could execute themselves.

Is there any way to do a replacement like this much faster on an Oracle 10g (10.2.0.3) database?

Note: There are two configurations. One runs the database on Windows Server 2003 with Windows XP clients; the other runs both the database and the client on a standalone Windows XP notebook. Both configurations are affected.


Solution 3

As we didn't find any way to make it faster on the database side, we delivered the C# tool as part of an executable patch.

OTHER TIPS

How does your client access the Oracle server? If it is via a Unix environment (which is most likely the case), then maybe you can write a shell script that extracts the value from the database, fixes it using sed, and writes it back to the database. Replacing text in Unix should be really quick.
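sed processes its input as a stream, one line at a time, rather than holding the whole value in memory at once. The minimal Python sketch below applies the same streaming idea. It assumes the value arrives in chunks of arbitrary size, for example as read from an exported file. The one subtlety is a run of X that straddles a chunk boundary. `collapse_stream` is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

RUN_OF_X = re.compile(r"XX+")

def collapse_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield repaired chunks; a run of X may continue across chunk boundaries."""
    prev_ended_with_x = False
    for chunk in chunks:
        fixed = RUN_OF_X.sub("X", chunk)
        if prev_ended_with_x:
            # The run began in an earlier chunk, whose single X was already emitted.
            fixed = fixed.lstrip("X")
        if fixed:
            # An empty result leaves the state unchanged: the run is still open.
            prev_ended_with_x = fixed.endswith("X")
        yield fixed

chunks = ["aXX", "XXb", "X", "X", "c"]
print("".join(collapse_stream(chunks)))  # aXbXc
```

On a Unix system, `tr -s 'X'` squeezes each run of repeated X characters into one and would do the same job as a one-liner in such a shell script.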

Maybe you are facing a problem with LOB segment space fragmentation: each of your LOBs will be shorter than before. Try creating a new table and copying the modified CLOBs into it.
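The idea here is to write the repaired values into a freshly created table instead of shrinking the existing LOBs in place. The minimal sketch below shows that copy step. It again uses Python's standard-library `sqlite3` as a stand-in for Oracle, which is an assumption. A Python function is registered as the hypothetical SQL function `collapse_x`; in Oracle the same statement would be a `CREATE TABLE ... AS SELECT` calling `REGEXP_REPLACE`. The target table name `someTable_fixed` is likewise hypothetical.

```python
import re
import sqlite3

def collapse_x(value):
    """Hypothetical SQL function: collapse runs of two or more X into one X."""
    return None if value is None else re.sub(r"XX+", "X", value)

conn = sqlite3.connect(":memory:")
conn.create_function("collapse_x", 1, collapse_x)
conn.execute(
    "CREATE TABLE someTable (primary_key INTEGER PRIMARY KEY, clob_value TEXT)")
conn.executemany("INSERT INTO someTable VALUES (?, ?)",
                 [(1, "aXXXXb"), (2, "clean"), (3, None)])

# Copy every row into a new table, repairing the values on the way.
conn.execute("""
    CREATE TABLE someTable_fixed AS
    SELECT primary_key, collapse_x(clob_value) AS clob_value
    FROM someTable
""")

for row in conn.execute("SELECT * FROM someTable_fixed ORDER BY primary_key"):
    print(row)
```

Afterwards the old table could be dropped and the new one renamed. `CREATE TABLE ... AS SELECT` does not carry over primary keys or indexes, so those would need to be recreated.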

Licensed under: CC-BY-SA with attribution