Question

I am trying to set up a Schematron test that validates special characters in XML.

More specifically, I would like to raise a warning wherever the copyright symbol (Unicode U+00A9) occurs.

The Schematron file cannot be parsed when I use any of the following notations in the rules:

<iso:rule context="myelement">
   <iso:report test="matches(., '\u00A9')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule>

<iso:rule context="myelement">
   <iso:report test="matches(., '\u{00A9}')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule>

<iso:rule context="myelement">
   <iso:report test="matches(., '\u{A9}')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule>

<iso:rule context="myelement">
   <iso:report test="matches(., '\x{00A9}')">{ES1037} Copyright Symbol Detected</iso:report>
</iso:rule>

Do any Schematron experts know how to embed a Unicode character in a regex?

Thanks in advance...

Solution

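You need to write the character as a numeric character reference (here &#xa9;), as you would in an XML Schema pattern. XPath 2.0 regular expressions follow the XML Schema regex syntax, which has no \u or \x escapes; the XML parser resolves the reference to the actual character before the regex is evaluated: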

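<?xml version="1.0" encoding="UTF-8"?>
<!-- matches() is an XPath 2.0 function, hence queryBinding="xslt2" -->
<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <!-- a pattern id must be a valid XML ID, so it cannot contain spaces -->
    <iso:pattern id="unicode-in-regex">
        <iso:rule context="a">
            <!-- the XML parser resolves &#xa9; to U+00A9 before the regex is compiled -->
            <iso:report test="matches(., '&#xa9;')">
                Copyright found
            </iso:report>
        </iso:rule>
    </iso:pattern>
</iso:schema>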

Output in XML ValidatorBuddy
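
Applied to the element and message from the question, the same idea might look like the sketch below. It is only an illustration under a few assumptions: the processor supports the XSLT 2.0 query binding, the pattern id copyright-check and the second report (with its message) are made up for the example, and role="warning" is treated as a severity hint, which depends on the Schematron implementation. For a single fixed character, contains() is enough; matches() becomes useful once a character class is needed.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: queryBinding="xslt2" requests XPath 2.0, which provides matches() -->
<iso:schema xmlns:iso="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">
    <!-- "copyright-check" is an illustrative id, not taken from the original post -->
    <iso:pattern id="copyright-check">
        <iso:rule context="myelement">
            <!-- A fixed character needs no regex: contains() does a plain substring search.
                 role="warning" is an assumption; how it is reported is implementation-dependent. -->
            <iso:report test="contains(., '&#xA9;')" role="warning">{ES1037} Copyright Symbol Detected</iso:report>
            <!-- Hypothetical second check: a character class covering U+00AE and U+2122 -->
            <iso:report test="matches(., '[&#xAE;&#x2122;]')" role="warning">Trademark symbol detected</iso:report>
        </iso:rule>
    </iso:pattern>
</iso:schema>
```

Because the file declares encoding="UTF-8", typing the literal © character inside the test attribute should behave the same way, as long as the file is really saved in that encoding.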

License: CC BY-SA with attribution
Not affiliated with StackOverflow