At least part of the problem is that you're specifying the file should be US7ASCII, which is a 7-bit character set: it only covers the 128 ASCII characters (0-127), not the extended values from 128-255. You're doing that in this line:
dbms_xslprocessor.clob2file( result, 'MYXML', ''||V_TABLE_NAME||'.xml',1);
You're passing 1 as the fourth parameter, csid. That value represents US7ASCII:
SQL> select nls_charset_name(1) from dual;
NLS_CHAR
--------
US7ASCII
Your XML is UTF-8, but specifying that with encoding="UTF-8" has no bearing on how the file is written. Any unrecognised characters are replaced with ?. So you might want to use the same setting for the file:
SQL> select nls_charset_id('UTF8') from dual;
NLS_CHARSET_ID('UTF8')
----------------------
871
So:
dbms_xslprocessor.clob2file( result, 'MYXML', ''||V_TABLE_NAME||'.xml',871);
or to be clearer:
dbms_xslprocessor.clob2file( result, 'MYXML', ''||V_TABLE_NAME||'.xml',
nls_charset_id('UTF8'));
But leaving it as the default might be OK, depending on your database environment. If you don't specify csid at all, or explicitly set it to zero, the file is written in the database character set.
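To judge whether the default would be good enough, you can check what the database character set is. This is only a sketch: the view and parameter names are the standard ones, but whether the result suits your file depends on your environment:

```sql
-- The database character set, which clob2file uses when csid is 0 or omitted
select value
from nls_database_parameters
where parameter = 'NLS_CHARACTERSET';

-- The matching character set ID, in case you prefer to pass it explicitly
select nls_charset_id(value) as csid
from nls_database_parameters
where parameter = 'NLS_CHARACTERSET';
```

If that reports a Unicode character set such as AL32UTF8, the default should already preserve your extended characters.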
You mentioned you avoid the ORA-31061 error if you "replace all ASCIICHAR (0-30) like ♂ : 11 ♀ : 12 ♫ : 14 ☼ : 15 ► : 16 ◄ : 17 ↕ : 18 ‼ : 19 ¶ : 20". Those symbols aren't what you'd expect from ASCII, so your character set or client or something seems to be interpreting them differently.
I get the error with all the ASCII control characters, 0 through to 31, except the whitespace ones: 9 (tab), 10 (line feed) and 13 (carriage return). But that's what's expected; the other characters in that range are not valid in XML 1.0:
- U+0009, U+000A, U+000D: these are the only C0 controls accepted in XML 1.0;
The same page shows that more, but still not all, control characters are allowed in XML 1.1; as far as I'm aware, though, Oracle only supports 1.0. If you really do have control characters in your data you'll need to strip them (retaining tabs, new lines and carriage returns); the rest would be meaningless in the final XML anyway, and perhaps of limited use in your existing data. I'm not sure if this is real data, or if you generated those values as a test.
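One way to strip them is with TRANSLATE, applied to the source values before the XML is generated. This is a minimal sketch rather than a drop-in fix: l_text and l_strip are hypothetical names, and the sample string just stands in for a value from your own table. TRANSLATE works on character strings rather than directly on CLOBs, so it would fit best in the query that feeds the XML:

```sql
-- Sketch only: l_text stands in for a value from your own table.
-- In SQL*Plus or SQL Developer, run "set serveroutput on" first to see the output.
declare
  l_strip varchar2(32);
  l_text  varchar2(100) := 'abc' || chr(11) || 'def' || chr(9) || 'ghi';
begin
  -- Collect every control character from 0 to 31 except tab (9),
  -- line feed (10) and carriage return (13), which XML 1.0 does allow.
  for i in 0 .. 31 loop
    if i not in (9, 10, 13) then
      l_strip := l_strip || chr(i);
    end if;
  end loop;

  -- TRANSLATE removes any character in its second argument that has no
  -- counterpart in the third. The leading 'x' maps to itself so that the
  -- third argument is not empty (an empty string is null in Oracle).
  l_text := translate(l_text, 'x' || l_strip, 'x');

  dbms_output.put_line(length(l_text)); -- 10: chr(11) is gone, the tab is kept
end;
/
```

The same translate expression could be used directly in a select against the column, with the list of characters to strip built once and passed in as a bind variable.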