Question

Doing something like this:

using (XmlWriter myMamlHelpWriter = XmlWriter.Create(myFileStream, XmlHelpExToMamlXslTransform.OutputSettings))
{
    XmlHelpExToMamlXslTransform.Transform(myMsHelpExTopicFilePath, null, myMamlHelpWriter);
}

where

private static XslCompiledTransform fMsHelpExToMamlXslTransform;

private static XslCompiledTransform XmlHelpExToMamlXslTransform
{
    get
    {
        if (fMsHelpExToMamlXslTransform == null)
        {
            // Load the stylesheet from the embedded resource into a local first,
            // so a failed Load does not leave a half-initialized transform cached.
            XslCompiledTransform myTransform = new XslCompiledTransform();
            using (Stream myStream = typeof(XmlHelpBuilder).Assembly.GetManifestResourceStream(
                typeof(XmlHelpBuilder),
                MamlXmlTopicConsts.cMsHelpExToMamlTransformationResourceName))
            using (XmlTextReader myReader = new XmlTextReader(myStream))
            {
                myTransform.Load(myReader, null, null);
            }
            fMsHelpExToMamlXslTransform = myTransform;
        }

        return fMsHelpExToMamlXslTransform;
    }
}

And every time, the entity &quot; is replaced with a literal quotation mark (") in the result file.
I can't understand why this happens...


Solution

The reason is that in the XSLT's internal representation, &quot; is exactly the same character as ". Both represent the code point 0x22 (decimal 34). It would seem that when XslCompiledTransform produces its output, it writes a literal " wherever it's legal to do so. I would imagine that it still outputs &quot; inside an attribute value.

Is it a problem for you that &quot; is produced as " in the output?

I just ran the following XSLT in Visual Studio using an arbitrary input file:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/*">
    <xml>
      <xsl:variable name="chars">&quot;&apos;&lt;&gt;&amp;</xsl:variable>
      <node a='{$chars}' b="{$chars}">
        <xsl:value-of select="$chars"/>
      </node>
    </xml>
  </xsl:template>
</xsl:stylesheet>

The output was:

<xml>
  <node a="&quot;'&lt;&gt;&amp;" b="&quot;'&lt;&gt;&amp;">"'&lt;&gt;&amp;</node>
</xml>

As you can see, even though all five characters were originally written as entities, apostrophes are output as ' everywhere, and quotation marks are output as " in text nodes. Furthermore, the a attribute, which had ' delimiters in the stylesheet, uses " delimiters in the output. As I said, as far as XSLT is concerned, a quotation mark is just a quotation mark and an attribute is just an attribute. How they are serialized in the output is up to the XSLT processor.
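To see that the serializer, not the stylesheet, makes this choice, here is a minimal C# sketch (the class and method names are made up for illustration) that writes the same characters through a default XmlWriter obtained from XmlWriter.Create. Assuming the standard .NET writer behavior, the quotation mark is escaped as &quot; only inside the attribute value and the apostrophe is not escaped at all, which matches the XSLT output above.

```csharp
using System.IO;
using System.Xml;

public static class WriterEscapingDemo
{
    // Serializes <node a="value">value</node> with a default XmlWriter and
    // returns the markup, so the escaping chosen by the writer is visible.
    public static string Serialize(string value)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("node");
            writer.WriteAttributeString("a", value); // attribute value: " becomes &quot;
            writer.WriteString(value);               // text node: " and ' stay literal
            writer.WriteEndElement();
        }
        return output.ToString();
    }
}
```

Calling WriterEscapingDemo.Serialize("\"'<>&") should reproduce the node line from the XSLT output above, without the indentation.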

Edit: The root cause of this behavior appears to be the XmlWriter class, which decides how each character is escaped. The general suggestion for those who want customized escaping seems to be to extend the XmlTextWriter class. This page has an implementation that looks fairly promising:

using System.Diagnostics;
using System.Xml;

public class KeepEntityXmlTextWriter : XmlTextWriter
{
    // REPLACE_CHARS[i] is written out as ENTITY_SUBS[i].
    private static readonly string[] ENTITY_SUBS = new string[] { "&apos;", "&quot;" };
    private static readonly char[] REPLACE_CHARS = new char[] { '\'', '"' };

    public KeepEntityXmlTextWriter(string filename) : base(filename, null) { }

    private void WriteStringWithReplace(string text)
    {
        // WriteString accepts null, but Split would throw on it.
        if (string.IsNullOrEmpty(text))
        {
            base.WriteString(text);
            return;
        }

        string[] textSegments = text.Split(KeepEntityXmlTextWriter.REPLACE_CHARS);

        if (textSegments.Length > 1)
        {
            // pos is the index in text of the separator that follows segment i.
            for (int pos = -1, i = 0; i < textSegments.Length; ++i)
            {
                // Ordinary characters are escaped by the base writer as usual.
                base.WriteString(textSegments[i]);
                pos += textSegments[i].Length + 1;

                // Assertion: Replace the following if-else when the number of
                // replacement characters and substitute entities has grown
                // greater than 2.
                Debug.Assert(2 == KeepEntityXmlTextWriter.REPLACE_CHARS.Length);

                if (pos != text.Length)
                {
                    // The entity is written verbatim, bypassing escaping.
                    if (text[pos] == KeepEntityXmlTextWriter.REPLACE_CHARS[0])
                        base.WriteRaw(KeepEntityXmlTextWriter.ENTITY_SUBS[0]);
                    else
                        base.WriteRaw(KeepEntityXmlTextWriter.ENTITY_SUBS[1]);
                }
            }
        }
        else base.WriteString(text);
    }

    public override void WriteString(string text)
    {
        this.WriteStringWithReplace(text);
    }
}
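A hypothetical variant of the same idea is sketched below. It takes any TextWriter instead of a file name, which makes it easy to try against a StringWriter, and it scans the string once instead of splitting it, so supporting more characters would not need the position bookkeeping or the two-entity assertion. The class name KeepEntityTextWriter is made up, and only element text content has been considered; how XmlTextWriter handles WriteRaw inside attribute values is not covered here.

```csharp
using System.IO;
using System.Xml;

// Hypothetical variant of KeepEntityXmlTextWriter that writes to a TextWriter.
public class KeepEntityTextWriter : XmlTextWriter
{
    public KeepEntityTextWriter(TextWriter output) : base(output) { }

    public override void WriteString(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            base.WriteString(text);
            return;
        }

        int start = 0; // start of the run of ordinary characters not yet written
        for (int i = 0; i < text.Length; i++)
        {
            string entity = text[i] == '"' ? "&quot;"
                          : text[i] == '\'' ? "&apos;"
                          : null;
            if (entity == null)
                continue;

            base.WriteString(text.Substring(start, i - start)); // escaped as usual
            base.WriteRaw(entity);                               // written verbatim
            start = i + 1;
        }
        base.WriteString(text.Substring(start));
    }
}
```

Writing the text a"b'c<d inside a node element with this writer should give <node>a&quot;b&apos;c&lt;d</node>: the quote and apostrophe keep their entities, while < is still escaped by the base class.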

On the other hand, the MSDN documentation recommends using XmlWriter.Create() rather than instantiating XmlTextWriters directly.

In the .NET Framework 2.0 release, the recommended practice is to create XmlWriter instances using the XmlWriter.Create method and the XmlWriterSettings class. This allows you to take full advantage of all the new features introduced in this release. For more information, see Creating XML Writers.

One way around that would be to use the same logic as above, but put it in a class that wraps an XmlWriter. This page has a ready-made implementation of an XmlWrappingWriter that you can modify as needed.

To use the above code with the XmlWrappingWriter, you would subclass the wrapping writer, like this:

using System.Diagnostics;
using System.Xml;

public class KeepEntityWrapper : XmlWrappingWriter
{
    public KeepEntityWrapper(XmlWriter baseWriter)
        : base(baseWriter)
    {
    }

    // REPLACE_CHARS[i] is written out as ENTITY_SUBS[i].
    private static readonly string[] ENTITY_SUBS = new string[] { "&apos;", "&quot;" };
    private static readonly char[] REPLACE_CHARS = new char[] { '\'', '"' };

    private void WriteStringWithReplace(string text)
    {
        // WriteString accepts null, but Split would throw on it.
        if (string.IsNullOrEmpty(text))
        {
            base.WriteString(text);
            return;
        }

        string[] textSegments = text.Split(REPLACE_CHARS);

        if (textSegments.Length > 1)
        {
            // pos is the index in text of the separator that follows segment i.
            for (int pos = -1, i = 0; i < textSegments.Length; ++i)
            {
                base.WriteString(textSegments[i]);
                pos += textSegments[i].Length + 1;

                // Assertion: Replace the following if-else when the number of
                // replacement characters and substitute entities has grown
                // greater than 2.
                Debug.Assert(2 == REPLACE_CHARS.Length);

                if (pos != text.Length)
                {
                    if (text[pos] == REPLACE_CHARS[0])
                        base.WriteRaw(ENTITY_SUBS[0]);
                    else
                        base.WriteRaw(ENTITY_SUBS[1]);
                }
            }
        }
        else base.WriteString(text);
    }

    public override void WriteString(string text)
    {
        this.WriteStringWithReplace(text);
    }
}

Note that this is essentially the same code as KeepEntityXmlTextWriter, but with XmlWrappingWriter as the base class and a different constructor.

I don't recognize the Guard class that the XmlWrappingWriter code uses in two places, but since you'll be consuming the code yourself, it should be fairly safe to delete lines like the following. They only ensure that null isn't passed to the constructor or to the BaseWriter property (which is inaccessible in the case above):

Guard.ArgumentNotNull(baseWriter, "baseWriter");

To create an instance of the wrapper, create an XmlWriter however you need to, and then use:

KeepEntityWrapper wrap = new KeepEntityWrapper(writer);

And then you'd use this wrap variable as the XmlWriter you pass to your XSL transform.

OTHER TIPS

The XSLT processor doesn't know whether a character was represented by a character entity or not. This is because the XML parser substitutes any character entity with its code-value.

Therefore, the XSLT processor sees exactly the same character, regardless of whether it was represented as " or as &quot; or as &#x22; or as &#34;.
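A small C# sketch of this point (the class and method names are hypothetical): whichever of the four spellings the input uses, the parser hands on the same single character, U+0022, so nothing downstream can tell them apart.

```csharp
using System.Xml;

public static class ParserDemo
{
    // Parses a one-element document and returns its text content, i.e. what
    // an XSLT processor would receive after entity and character-reference expansion.
    public static string TextOf(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.InnerText;
    }
}
```

For example, ParserDemo.TextOf("<a>&#34;</a>") and ParserDemo.TextOf("<a>\"</a>") both return a one-character string containing a quotation mark.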

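What you want can be achieved in XSLT 2.0 by using so-called character maps (xsl:character-map). Note, though, that XslCompiledTransform implements only XSLT 1.0, so this needs an XSLT 2.0 processor.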

Here is the trick you wanted:

  1. replace every & in the input document with &amp;
  2. perform the XSLT transformation
  3. replace every &amp; in the output with &
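The three steps can be sketched in C# roughly as follows. This is plain string replacement around the transform, not an XSLT feature: after step 1 the parser reports &quot; as the literal text "&quot;", the writer escapes it to &amp;quot;, and step 3 turns that back into &quot;. The class name is made up, and the sketch assumes the input has no CDATA sections, comments, or DTD entity declarations containing & (step 1 would corrupt them), and that the stylesheet itself never outputs a literal & (step 3 would turn its &amp; into a bare &, producing malformed XML).

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class AmpersandTrick
{
    // Runs xslt over inputXml while preserving entity spellings such as &quot;
    // by hiding every '&' from the parser and restoring it afterwards.
    public static string Transform(XslCompiledTransform xslt, string inputXml)
    {
        // Step 1: &quot; becomes &amp;quot;, which the parser reads as the text "&quot;".
        string escapedInput = inputXml.Replace("&", "&amp;");

        // Step 2: run the transform; the writer escapes that text back to &amp;quot;.
        var output = new StringWriter();
        using (XmlReader reader = XmlReader.Create(new StringReader(escapedInput)))
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(reader, writer);
        }

        // Step 3: &amp;quot; becomes &quot; again.
        return output.ToString().Replace("&amp;", "&");
    }
}
```

In the question's setup, the loaded XmlHelpExToMamlXslTransform would be passed as xslt, with the topic file read into a string first.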
Licensed under: CC-BY-SA with attribution