Question

I've started using RelaxNG to specify XML message schemas, and using PHP DOMDocument to validate and parse incoming messages, but can't figure out how to define a text node so that it cannot be empty. Example schema:

<?xml version="1.0"?>
<element name="amhAPI" xmlns="http://relaxng.org/ns/structure/1.0">
    <element name="auth">
        <element name="validateUser">
            <element name="username">
                <text/>
            </element>

            <element name="password">
                <text/>
            </element>
        </element>
    </element>
</element>

However, the message below is accepted as valid by the DOMDocument::relaxNGValidate method, since the RELAX NG text pattern matches any string, including an empty one, so an empty element such as <username/> satisfies it:

<?xml version="1.0"?>
<amhAPI>
    <auth>
        <validateUser>
            <username/>
            <password/>
        </validateUser>
    </auth>
</amhAPI>

Because of this, I have to add in a bunch of checks and validation for fields that are not supposed to be empty, which could be removed if the validator identified them as non-empty elements.

Is there a way to force non-empty text?

Solution

If your RELAX NG validator supports XSD data types (most do), then you can use regular expressions to refine the constraints for text content:

<?xml version="1.0"?>
<element name="amhAPI" xmlns="http://relaxng.org/ns/structure/1.0"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="auth">
    <element name="validateUser">
      <element name="username">
        <data type="string">
          <param name="pattern">.+</param>
        </data>
      </element>
      <element name="password">
        <data type="string">
          <param name="pattern">.+</param>
        </data>
      </element>
    </element>
  </element>
</element>

OTHER TIPS

The other solutions posted here don't always work well. If you set the minLength facet to 1, a single whitespace character (or a single newline) is accepted. If you use the pattern .*[\S]+.* then newline characters are rejected, because . does not match a newline in XSD regular expressions. That restriction is only desirable for single-line fields such as "username" and "password" in the example above.

Regular expressions are the right approach, but to define a generic non-empty element I prefer the pattern (.|\n|\r)*\S(.|\n|\r)*, which requires at least one non-whitespace character and still allows newline characters anywhere.
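
As a rough sketch, the generic pattern goes into the same kind of data element used in the accepted solution. The comment element here is hypothetical (it is not part of the original schema) and stands in for any multi-line field. The \n, \r and \S escapes are interpreted by the XSD regular-expression engine, not by the XML parser.

```xml
<!-- Hypothetical multi-line field; "comment" is not in the original schema. -->
<element name="comment" xmlns="http://relaxng.org/ns/structure/1.0"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="string">
    <!-- At least one non-whitespace character; newlines allowed anywhere. -->
    <param name="pattern">(.|\n|\r)*\S(.|\n|\r)*</param>
  </data>
</element>
```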

Alternatively, using minLength seems more direct and cleaner than regexes. (This also requires XSD data types.)

<element name="amhAPI" xmlns="http://relaxng.org/ns/structure/1.0"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <element name="auth">
    <element name="validateUser">
      <element name="username">
        <data type="string">
          <param name="minLength">1</param>
        </data>
      </element>
      <element name="password">
        <data type="string">
          <param name="minLength">1</param>
        </data>
      </element>
    </element>
  </element>
</element>
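
One possible variation, offered as an untested sketch: if whitespace-only values should be rejected as well, the XSD token type may help, because its value is whitespace-collapsed before facets are checked, so a run of spaces or newlines ends up with length 0. This assumes your validator applies that normalization as the XSD datatypes specification describes, so it is worth checking against your own setup. The nonEmptyText name is made up for this example; the define/ref wrapper only avoids repeating the constraint.

```xml
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
  datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <start>
    <element name="amhAPI">
      <element name="auth">
        <element name="validateUser">
          <element name="username"><ref name="nonEmptyText"/></element>
          <element name="password"><ref name="nonEmptyText"/></element>
        </element>
      </element>
    </element>
  </start>

  <!-- Hypothetical named pattern: token collapses whitespace,
       so a whitespace-only value has length 0 and fails minLength. -->
  <define name="nonEmptyText">
    <data type="token">
      <param name="minLength">1</param>
    </data>
  </define>
</grammar>
```

Validation only checks the text; it does not change the document, so the application still receives the original, uncollapsed value.
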
Licensed under: CC-BY-SA with attribution