Why is XSLT adding attributes by itself in <xsl:copy>?
06-07-2021
Question
I use XSLT 1.0 to do some manipulations on an XHTML file, but I wanted to start from an identical copy. To my surprise, the XSLT processor adds attributes that were absent in the original file. Please explain this phenomenon. I would rather avoid it, to make it easier to compare the source and result files.
I tried both xsltproc and msxsl. No difference. I get rowspan and colspan added to all td elements.
Input:
<?xml version="1.0" encoding="windows-1250" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1250" />
<title>Anything</title>
</head>
<body>
<table>
<tr><td class="skl" >test</td><td class="kwota" >1 800,00</td></tr>
</table>
</body>
</html>
XSLT:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
>
<xsl:output method="xml"
omit-xml-declaration="no"
encoding="windows-1250"
doctype-public="-//W3C//DTD XHTML 1.0 Strict//EN"
doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"
/>
<xsl:template match="node()|@*">
<xsl:copy>
<!-- node() already matches elements, text, comments and processing instructions -->
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The only difference in the output is this line:
<tr><td class="skl" rowspan="1" colspan="1">test</td><td class="kwota" rowspan="1" colspan="1">1 800,00</td></tr>
Validating the source file against the DTD shows no errors. I could insert these attributes into the source file to work around the problem, but I'm curious about the cause of this mess.
Edit:
I use the original DTD, downloaded (with a 20-second delay) from http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd, which declares:
<!ATTLIST td
%attrs;
abbr %Text; #IMPLIED
axis CDATA #IMPLIED
headers IDREFS #IMPLIED
scope %Scope; #IMPLIED
rowspan %Number; "1"
colspan %Number; "1"
%cellhalign;
%cellvalign;
>
Solution
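Your XSLT processors are behaving perfectly correctly. No new attributes are being added: the rowspan and colspan attributes were always in your input document, via the DTD reference. The XML specification says that when an attribute with a declared default value is omitted, a processor that reads the declaration must behave as though the attribute were present with that default value. Whether the value "1" of rowspan is serialized as an explicit attribute or implied by your DOCTYPE declaration makes no difference to the data model.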
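The ATTLIST above shows that rowspan and colspan have a default value of 1, so any processor that reads this DTD reports them on every td element, whether or not they are written in the source. As long as the document is processed against the XHTML 1.0 Strict DTD, there is no way not to have these attributes. The other attributes are declared #IMPLIED, which means they are optional and have no default, so nothing is added for them.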
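The same attribute defaulting can be observed outside XSLT. Below is a minimal sketch using Python's standard-library expat parser, under stated assumptions: the document is a hypothetical miniature of the question's input, and the relevant ATTLIST is inlined as an internal DTD subset, because expat does not fetch external DTDs by default. expat's specified_attributes flag is roughly analogous to xsltproc's --nodtdattr: while it is false (the default), attributes defaulted from the DTD are reported alongside the ones actually written in the file.

```python
import xml.parsers.expat

# Hypothetical miniature document. The ATTLIST mirrors the rowspan/colspan
# defaults of the XHTML DTD, inlined as an internal subset so expat sees it.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE table [
  <!ATTLIST td
    class   CDATA #IMPLIED
    rowspan CDATA "1"
    colspan CDATA "1">
]>
<table><tr><td class="skl">test</td></tr></table>"""


def td_attributes(doc: bytes, specified_only: bool) -> dict:
    """Return the attributes expat reports for the <td> element."""
    parser = xml.parsers.expat.ParserCreate()
    # False (expat's default): DTD-defaulted attributes are reported too.
    # True: only attributes literally written in the document are reported.
    parser.specified_attributes = specified_only
    found = {}

    def start(name, attrs):
        if name == "td":
            found.update(attrs)

    parser.StartElementHandler = start
    parser.Parse(doc, True)
    return found


print(td_attributes(DOC, specified_only=False))  # includes rowspan and colspan
print(td_attributes(DOC, specified_only=True))   # only class
```

Both calls parse the same bytes; only the reporting of attributes that exist solely because of the DTD differs.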
I hope that explains it.
Other tips
Here are several ways to disable this "feature" in the processors I was able to test:
libxml:
- xsltproc: --nodtdattr
- libxslt / libxml: don't specify XML_PARSE_DTDATTR when loading the source, for example in xmlReadFile
msxml:
- msxsl: -xe (don't resolve externals)
- Msxml.DomDocument: set doc.resolveExternals = False and doc.validateOnParse = False before load; this also disables the whole DTD
The Microsoft documentation says: "In MSXML 3.0 and MSXML 6.0 the default resolveExternals value is True. In MSXML 6.0, the default setting is False."
Yes, that's contradictory, but I only copied it from Microsoft. I guess it should read: True in 3.0 and 4.0, False in 6.0.
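The MSXML switches work one step earlier than --nodtdattr: they keep the parser from loading the external DTD at all, so no defaults are known in the first place. The sketch below imitates that with Python's standard-library expat, as an analogy rather than MSXML code. FAKE_DTD and the system id "hypothetical.dtd" are made-up stand-ins, and the external subset is supplied from memory through ExternalEntityRefHandler, so no network access is involved.

```python
import xml.parsers.expat

# Made-up stand-in for the external DTD: only the relevant ATTLIST is kept.
FAKE_DTD = b'<!ATTLIST td rowspan CDATA "1" colspan CDATA "1">'

# Hypothetical document that references the DTD by a made-up system id.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE html SYSTEM "hypothetical.dtd">
<html><td class="skl">test</td></html>"""


def td_attrs(resolve_externals: bool) -> dict:
    """Return the <td> attributes, with or without reading the external DTD."""
    parser = xml.parsers.expat.ParserCreate()
    found = {}

    def start(name, attrs):
        if name == "td":
            found.update(attrs)

    parser.StartElementHandler = start

    if resolve_externals:
        # Ask expat to process the external DTD subset...
        parser.SetParamEntityParsing(
            xml.parsers.expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE)

        # ...and feed it the in-memory stand-in instead of downloading it.
        def external(context, base, system_id, public_id):
            child = parser.ExternalEntityParserCreate(context)
            child.Parse(FAKE_DTD, True)
            return 1  # non-zero: continue parsing

        parser.ExternalEntityRefHandler = external

    parser.Parse(DOC, True)
    return found


print(td_attrs(resolve_externals=True))   # rowspan/colspan defaulted from the DTD
print(td_attrs(resolve_externals=False))  # DTD never read: only class
```

Without the external subset the parser simply has no ATTLIST to take defaults from, which matches the observation above that these settings also disable the whole DTD.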
The PopulateElementDefaultValues property, introduced in MSXML 6.0 SP1, has a promising description, but it doesn't work for me with DTDs.