Question

I'm completely new to Oracle's XDB, in particular using it to generate XML output from a database table, and am working on an application which is moving from 9i (Oracle9i Enterprise Edition Release 9.2.0.5.0 - Production) to 11g (Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production). Here's a small test case which illustrates the problem I'm having:

select xmlelement("test", test) from (select 'a' test from dual);

This works and gives me:

<test>a</test>

However, in 11g, if I swap 'a' for an invalid character such as U+0013, I get the following error:

ORA-31061: XDB error: special char to escaped char conversion failed.

Under 9i the same thing works successfully, with no error.

Obviously the ideal answer is to have some validation in place to prevent control characters getting into the simple character data that I'm trying to convert into XML, but unfortunately that's outside the scope of what I'm doing.

Is this something anyone else has experienced, and if so, is there a simple change I can make to my XML generating script, or do I need to do some other kind of cleansing? Or just manually fix the problem on the rare occasions that it happens (which would be a perfectly reasonable option for my needs).


Solution

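U+0013 is a control character that is not allowed in XML 1.0. See e.g. Valid characters in XML: the only characters below U+0020 that XML 1.0 permits are tab (U+0009), line feed (U+000A) and carriage return (U+000D). So 11g correctly raises an exception.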

SQL> select xmlelement("test", unistr('a\0013b')) from dual;
ERROR:
ORA-31061: XDB error: special char to escaped char conversion failed.

no rows selected

SQL> select xmlelement("test", unistr('a\00aeb')) from dual;

XMLELEMENT("TEST",UNISTR('A\00AEB'))
--------------------------------------------------------------------------------
<test>a®b</test>

SQL> 

I have no idea why this passes in 9i (I don't have that version available), but it is probably because Oracle's implementation has evolved to conform more closely to the standard, and/or the standard itself has evolved.

Your fix is correct.

OTHER TIPS

While always fixing the data at the source is the best solution, I also found this to be useful in the case where I cannot control the data at the source:

select xmlelement("test", test) 
  from (select regexp_replace(unistr('a\0013b'), '[[:cntrl:]]', '') test from dual);

The important piece is the regexp_replace(your_field, '[[:cntrl:]]', ''), which removes control characters from the data. Note that [[:cntrl:]] also matches tabs and line breaks, so those are removed as well.
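If you would rather find the offending rows and fix them by hand, the same character class can be used as a filter with regexp_like. This is only a sketch: the inline view and its test_id column are made up for illustration and stand in for your real table, and chr(19) is U+0013.

```sql
-- List the rows whose text contains at least one control character.
-- The inline view is a hypothetical stand-in for the real source table.
select test_id, test
  from (select 1 test_id, 'a' || chr(19) || 'b' test from dual
        union all
        select 2 test_id, 'plain text' test from dual)
 where regexp_like(test, '[[:cntrl:]]');
```

Only the first row should be reported, since the second contains no control characters. Rows containing ordinary tabs or line breaks would be reported too, because those are also in [[:cntrl:]].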

Just to follow up on this for anyone interested: as far as I can tell, 9i just passed the invalid character through, producing invalid XML. 11g throws an error, which is probably the more correct behaviour, even if it is annoying in my case.

The only reasonable solution I found was to fix the content at source.

If you wish to keep line breaks, you can try the following:

select xmlelement("test", regexp_replace(test, '[^[:print:][:space:]]', '#')) from  
    (select '-   <- to keep line break after weird char
-' test from dual ) 
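  • The leading ^ negates the bracket expression, so every character that is neither a printing character ([:print:]) nor whitespace ([:space:]) is replaced with '#'. Inside a bracket expression | is a literal character, not an alternation, so no | is needed between the two classes.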
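
Another option is to target only the control characters that XML 1.0 forbids and leave everything else alone, including tab, line feed and carriage return. The following is a sketch rather than something verified on both versions: it builds the bracket expression with chr(), assumes that the ranges follow code point order (which may depend on your NLS settings), and does not handle chr(0).

```sql
-- Strip chr(1)-chr(8), chr(11), chr(12) and chr(14)-chr(31).
-- chr(9) (tab), chr(10) (line feed) and chr(13) (carriage return) are kept,
-- because XML 1.0 allows them.
select xmlelement("test",
         regexp_replace(test,
           '[' || chr(1)  || '-' || chr(8)
               || chr(11) || chr(12)
               || chr(14) || '-' || chr(31) || ']',
           ''))
  from (select 'a' || chr(19) || 'b' || chr(10) || 'c' test from dual);
```

In the sample string the chr(19) between 'a' and 'b' is the character meant to be removed, while the line feed before 'c' is meant to survive.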
Licensed under: CC-BY-SA with attribution