Question

Test.xml:

<?xml version="1.0" encoding="UTF-8"?>
<node>line1
line2
line3</node>

CF code:

<cfset xmlfile = ExpandPath("test.xml")>
<cffile action="read" file="#xmlfile#" variable="xmlstring">
<cffile action="write" file="test1.xml" output="#xmlstring#">

<cfset xmldoc = XmlParse(xmlstring)> 
<cfset xmltext = ToString(xmldoc)>
<cffile action="write" file="test2.xml" output="#xmltext#">

The input file, test.xml, uses CRLF line endings, is UTF-8 encoded, and is 77 bytes. The first output file (test1.xml) uses CRLF line endings, is ANSI encoded, and is 76 bytes. The second output file (test2.xml) uses Unix (LF) line endings, is ANSI encoded, and is 71 bytes.

The content of the XML node in the input file is line1 Chr(13)Chr(10) line2 Chr(13)Chr(10) line3 (spaces added for readability). The node content in the first output file is the same. In the second output file, it is line1 Chr(10) line2 Chr(10) line3.
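To check descriptions such as "CRLF format" or "UNIX format" without relying on an editor's status bar, the line endings can be counted directly from the file bytes. The following is a minimal Java sketch, not part of the original question; Java is used because ColdFusion runs on the JVM, and the class name `LineEndingReport` is hypothetical. It assumes an ASCII-compatible encoding such as UTF-8 or Windows-1252 ("ANSI"), where CR and LF are the single bytes 0x0D and 0x0A, and it also reports whether the file starts with a UTF-8 byte order mark.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LineEndingReport {
    // Counts CRLF pairs, bare LFs, and bare CRs in raw bytes.
    // Assumes an ASCII-compatible encoding (UTF-8, Windows-1252, ...).
    static int[] count(byte[] data) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\r') {
                if (i + 1 < data.length && data[i + 1] == '\n') {
                    crlf++;
                    i++; // skip the LF that belongs to this CRLF pair
                } else {
                    cr++;
                }
            } else if (data[i] == '\n') {
                lf++;
            }
        }
        return new int[] {crlf, lf, cr};
    }

    // True if the data starts with the UTF-8 byte order mark EF BB BF.
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    public static void main(String[] args) throws IOException {
        for (String name : args) {
            byte[] data = Files.readAllBytes(Path.of(name));
            int[] c = count(data);
            System.out.printf("%s: %d bytes, CRLF=%d, LF=%d, CR=%d, UTF-8 BOM=%b%n",
                    name, data.length, c[0], c[1], c[2], hasUtf8Bom(data));
        }
    }
}
```

With Java 11 or later it can be run directly on the three files, for example `java LineEndingReport.java test.xml test1.xml test2.xml`.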

Any ideas why the carriage return character Chr(13) was not preserved after the XmlParse/ToString sequence?

UPDATE: The problem lies only with XmlParse; it is not caused by ToString or cffile. Here is a more relevant example that you can test yourselves:

<cfsavecontent variable="xmlvar">
<nodes>
    <node>
line1
line2
line3
    </node>
</nodes>
</cfsavecontent>

<cfset vtext = "#xmlvar#">
<cfset vtext = Replace(vtext,Chr(10),'LF','All')>
<cfset vtext = Replace(vtext,Chr(13),'CR','All')>
<cfdump var = "#vtext#">
<!--- outputs CRLF<nodes>CRLF <node>CRLFline1CRLFline2CRLFline3CRLF </node>CRLF</nodes>CRLF --->
<cfset xmldoc = XmlParse(xmlvar)>
<cfset vtext = "#xmldoc.nodes.node.XmlText#">
<cfset vtext = Replace(vtext,Chr(10),'LF','All')>
<cfset vtext = Replace(vtext,Chr(13),'CR','All')>
<cfdump var = "#vtext#">
<!--- outputs LFline1LFline2LFline3LF --->
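The same experiment can be run outside ColdFusion to see whether the behavior comes from XML parsing in general. Below is a minimal Java sketch using the JDK's built-in DOM parser; it assumes that this parser is a fair stand-in for whatever parser XmlParse uses internally, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class EolNormalizationDemo {
    // Parses an XML string with the JDK's default DOM parser and returns the root element's text.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // Makes line-break characters visible, like the Replace() calls in the question.
    static String visible(String s) {
        return s.replace("\r", "CR").replace("\n", "LF");
    }

    public static void main(String[] args) {
        // Each literal CRLF pair in the input becomes a single LF.
        System.out.println(visible(rootText("<node>line1\r\nline2\r\nline3</node>"))); // prints line1LFline2LFline3
        // A lone CR is translated to LF as well.
        System.out.println(visible(rootText("<node>line1\rline2</node>"))); // prints line1LFline2
    }
}
```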

Solution

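XML parsers normalize CR/LF line endings to LF, as the XML 1.0 specification requires (section 2.11, End-of-Line Handling); a lone CR is also converted to LF. To keep CR/LF line endings, write them as character references (&#x000D;&#x000A;), which are expanded after this normalization has taken place. See below: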

<cfsavecontent variable="xmlvar">
<nodes>
    <!--- Keep the text on one line: literal line breaks here would add extra LFs after each reference --->
    <node>line1&#x000D;&#x000A;line2&#x000D;&#x000A;line3</node>
</nodes>
</cfsavecontent>
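When the XML is built from a string rather than typed by hand, the same fix can be applied programmatically: escape each CR as a character reference before the text is embedded. A minimal Java sketch follows, again assuming the JDK's DOM parser behaves like XmlParse; `CrPreservingXml` and `escapeText` are hypothetical names, not an existing API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

class CrPreservingXml {
    // Hypothetical helper: escapes text for element content and writes each CR as a character reference.
    static String escapeText(String s) {
        return s.replace("&", "&amp;") // first, so the references added below are not re-escaped
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\r", "&#xD;"); // expanded after end-of-line handling, so the CR survives
    }

    // Parses an XML string with the JDK's default DOM parser and returns the root element's text.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String original = "line1\r\nline2\r\nline3";
        // Only a bare LF remains as a literal line break; the CR is now "&#xD;".
        String xml = "<node>" + escapeText(original) + "</node>";
        System.out.println(rootText(xml).equals(original)); // prints true
    }
}
```

A bare LF can stay literal because end-of-line handling leaves it unchanged; escaping it as &#xA; as well would also work.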

OTHER TIPS

Have you tried adding the attribute charset='utf-8' to the cffile tag?

I cannot reproduce what you describe with ColdFusion 9.0.1 on Mac OS X: whitespace is preserved exactly as it goes in. I tried both of your examples above, and they worked (mostly) as expected. I did not actually see any CRs in the Replace() output, only LFs, but all of them were preserved.

Licensed under: CC-BY-SA with attribution