Use CDATA to store raw binary streams?
Question
Instead of accepting the overhead of saving binary as Base64, I was wondering if you could directly store double-byte binary streams in XML files, using CDATA, or commenting it out, or something?
Solution
You can store it as CDATA, but there's the risk that some byte sequences will be read as the "]]>" delimiter that closes the CDATA section. After a quick look at http://www.w3.org/TR/2006/REC-xml-20060816/#sec-cdata-sect, it seems you can have any sequence of characters except "]]>". Have a look at what counts as a valid XML character too.
OTHER TIPS
The NUL character ('\0' in C) is not valid anywhere in XML, even as a character reference (&#0;).
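A quick way to see this is to feed a parser a few tiny documents. The sketch below uses Python's standard-library xml.etree.ElementTree (which wraps expat); the helper name parses is made up for this example, and other parsers may word their errors differently.

```python
import xml.etree.ElementTree as ET

def parses(xml_bytes: bytes) -> bool:
    """Hypothetical helper: True if the stdlib parser accepts the document."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False

ok_tab = parses(b"<a>&#9;</a>")    # tab is a legal XML character
bad_ref = parses(b"<a>&#0;</a>")   # NUL written as a character reference
bad_raw = parses(b"<a>\x00</a>")   # NUL written as a literal byte

print(ok_tab, bad_ref, bad_raw)
```

Only the first document is accepted; both forms of NUL are rejected as not well-formed.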
No, you can't use CDATA alone to inject binary data into an XML file.
In XML 1.0 (XML 1.1 is more permissive, but it still does not allow NUL or raw control characters), the following restrictions apply to CDATA characters:
CData ::= (Char* - (Char* ']]>' Char*))
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
That means several characters are illegal, among them:
- illegal XML control characters 0x00 to 0x1F, except tabs, line feeds and carriage returns
- illegal UTF-8 sequences like 0xFF or the non-canonical (overlong) 0b1100000x 0b10xxxxxx
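These restrictions can be checked mechanically before you try to put anything into a CDATA section. Here is a minimal sketch in Python, assuming the document is encoded as UTF-8; the function name cdata_problems and the message strings are invented for this example.

```python
import re

# Matches any character NOT allowed by the XML 1.0 Char production.
_INVALID_XML_CHAR = re.compile(
    r"[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def cdata_problems(raw: bytes) -> list:
    """Return the reasons why raw cannot be placed verbatim in a CDATA section."""
    try:
        # Python's decoder rejects 0xFF and overlong forms such as 0xC0 0x80.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]
    problems = []
    if _INVALID_XML_CHAR.search(text):
        problems.append("contains a character outside the XML 1.0 Char range")
    if "]]>" in text:
        problems.append("contains the CDATA terminator ]]>")
    return problems

print(cdata_problems(b"plain text"))    # no problems
print(cdata_problems(b"\xc0\x80"))      # overlong encoding
print(cdata_problems(b"a\x01b]]>"))     # control character and terminator
```

Arbitrary binary data will almost always trip at least one of these checks, which is why CDATA alone is not enough.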
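In addition to that, in ordinary content without CDATA:
- a literal "<" is illegal, and ">" is illegal when it completes the sequence "]]>"
- "&" use is restricted (a character reference such as &#233; is OK, an undeclared entity such as &zajdalkdza; is not)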
So CDATA is just a way to allow "<", ">" and "&", at the cost of forbidding "]]>" instead. It doesn't solve the problem of characters that are illegal in XML, Unicode or UTF-8, which is the main issue.
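For comparison, Base64 sidesteps all of these restrictions because its output consists only of ASCII letters, digits, "+", "/" and "=". A minimal round-trip sketch with Python's standard library follows; the element name blob and its encoding attribute are made up for illustration.

```python
import base64
import xml.etree.ElementTree as ET

payload = bytes(range(256))  # every possible byte value, including NUL and 0xFF

# Hypothetical element layout: <blob encoding="base64">...</blob>
elem = ET.Element("blob", {"encoding": "base64"})
elem.text = base64.b64encode(payload).decode("ascii")
xml_bytes = ET.tostring(elem)

# Parse the document back and decode the text content.
restored = base64.b64decode(ET.fromstring(xml_bytes).text)
overhead = len(elem.text) / len(payload)

print(restored == payload, len(elem.text), overhead)
```

The round trip is lossless for any input, and the price is the roughly 4/3 size overhead the question wanted to avoid.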
Solutions:
XML is a plain-text format; don't use it to store binary data. Put the binary blobs in separate files and add an element to your XML that references these files. If you want to store all binary blobs in a single file, add an offset attribute or something like that...
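One way the single-file variant could look, as a rough sketch in Python: the element and attribute names (blobs, blob, file, name, offset, length) and the helper functions pack and read_blob are all assumptions made for this example, not an established format.

```python
import os
import xml.etree.ElementTree as ET

def pack(blobs: dict, bin_path: str) -> ET.Element:
    """Write all blobs back to back into bin_path and return an XML index."""
    root = ET.Element("blobs", {"file": os.path.basename(bin_path)})
    with open(bin_path, "wb") as out:
        for name, data in blobs.items():
            # Record where this blob starts and how long it is.
            ET.SubElement(root, "blob", {
                "name": name,
                "offset": str(out.tell()),
                "length": str(len(data)),
            })
            out.write(data)
    return root

def read_blob(root: ET.Element, name: str, bin_path: str) -> bytes:
    """Look up name in the XML index and read its bytes from bin_path."""
    for node in root.iter("blob"):
        if node.get("name") == name:
            with open(bin_path, "rb") as f:
                f.seek(int(node.get("offset")))
                return f.read(int(node.get("length")))
    raise KeyError(name)
```

The XML then contains only text (names and decimal numbers), while the binary file can hold any bytes at all, including NUL and "]]>". Storing a length next to the offset is a design choice of this sketch; it lets a reader fetch one blob without parsing the rest.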