Extra whitespace in XML file read from Oracle database - why?

https://stackoverflow.com/questions/15474875

24-03-2022
|

Pregunta

I am experimenting with Python and Oracle XML DB. I have a table with an XMLType column and an ID column in an Oracle 11g database. The storage model for the XML column is object relational. Sometimes I need to get a whole XML file, and often it is longer than 4000 characters, so I use this query to get a CLOB:

select t.representation.getclobval()
from myxmldocs t 
where id=:documentId

When I run this query the output includes extra whitespace, with newlines and tabs between XML elements that were definitely not there in the XML docs I inserted. The effect is of some kind of formatting, so that the output looks like this:

<A>\n
\t<B></B>\n
\t\t<C>Some text</C>\n
\t\t<C>Some more text</C>\n
\t<B></B>\n
...

and so on. Quite pretty and readable, but why am I getting it? It also messes other libraries that I am using that choke on the extra whitespaces.

If I remove getclobval() my Python client does not get a CLOB but an Object and I don't know what to do with it.

This appears consistent; I get this problem using the sqlplus command line client, and also creating other tables using different XML Schemas, and then querying them. In a previous version of my prototype I had the XMLType column use a CLOB storage model and didn't have this problem.

How should I rewrite the query to just get a CLOB with the XML file without the extra formatting?

Update: as requested in the comments, this is the output I get running the query select dump(t.representation) from myxmldocs t where id=:documentId from the command line client (replacing of course :documentId with an actual, existing ID from the database):

DUMP(T.REPRESENTATION)
--------------------------------------------------------------------------------
Typ=58 Len=218: 32,156,148,1,0,0,0,0,80,193,223,20,0,0,0,0,216,15,47,21,0,0,0,0,
80,44,55,21,0,0,0,0,0,202,154,59,160,15,0,0,160,15,0,0,1,0,4,0,220,190,195,71,1,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,174,33,65,0,15,0,72,0,1,0,0,0,0,0,0,0,49,0
,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
,0,0,0,0,0,0,0,0

Solución

in 11g you can use xmlserialize (in fact you should not use getclobval anymore. it's not recommended for performance reasons)

 SQL> select t.test.getclobval() from testxml t where id = 1;

T.TEST.GETCLOBVAL()
--------------------------------------------------------------------------------
<A>
  <B>
    <C>foo</C>
    <C>foo2</C>
  </B>
</A>


SQL> select xmlserialize(document t.test as clob no indent) from testxml t where id = 1;

XMLSERIALIZE(DOCUMENTT.TESTASCLOBNOINDENT)
--------------------------------------------------------------------------------
<A><B><C>foo</C><C>foo2</C></B></A>

Otros consejos

The getClobVal() method shouldn't modify the indentation of data. Your XML may have been formatted during or before the insertion.

You can transform the XMLType to remove whitespaces:

SQL> SELECT XMLTYPE.createxml(
  2  '<a>
  3     <b><c></c></b>
  4  </a>'
  5  ).transform(XMLTYPE(
  6  '<?xml version="1.0"?>
  7  <xsl:stylesheet version="1.0"
  8     xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
  9     <xsl:output method="xml" indent="no"/>
 10     <xsl:strip-space elements="*"/>
 11     <xsl:template match="@*|node()">
 12     <xsl:copy>
 13        <xsl:apply-templates select="@*|node()"/>
 14     </xsl:copy>
 15  </xsl:template>
 16  </xsl:stylesheet>
 17  ')).getClobVal() FROM dual;

<?xml version="1.0" encoding="utf-8"?> 
<a><b><c></c></b></a>

Licenciado bajo: CC-BY-SA con atribución

No afiliado a StackOverflow