Question

I'm reasonably new to working with XML schemas, so excuse my incompetence if this is more trivial than I myself believe it must be.

I'm trying to create a required attribute that must contain one or more whitespace-separated string values from a list. The list is the four typical HTTP request methods: get, post, put, and delete.

So valid elements would include:

<rule methods="get" />
<rule methods="get post" />
<rule methods="post put delete" />

Whereas invalid elements would include:

<rule methods="get get" />
<rule methods="foobar post" />
<rule methods="get;post;put" />

I've tried experimenting with the enumeration and length facets, but I don't think I understand what I need to do (or, for that matter, whether it is possible at all, though it seems as though it should be).


This is where I'm at now, thanks to @tdrury:

<xs:attribute name="methods" use="required">
    <xs:simpleType>
        <xs:restriction base="xs:string">
            <xs:whiteSpace value="collapse" />
            <xs:pattern value="(?:(?:get|post|put|delete)\s?){1,4}" />
        </xs:restriction>
    </xs:simpleType>
</xs:attribute>

This works, except that it also accepts repetition (such as get get or post post post) and missing whitespace (such as getpost or postputdelete).


Edit:

After playing around with this a bit, I came up with an idea: an enumeration of all possible sequences. Thankfully, this list is (for the time being) fixed to the four usual transport methods, get, post, put, and delete, so I figured:

<xs:restriction base="xs:string">
    <xs:whiteSpace value="collapse" />
    <xs:enumeration value="delete" />
    <xs:enumeration value="put" />
    <xs:enumeration value="put delete" />
    <xs:enumeration value="post" />
    <xs:enumeration value="post delete" />
    <xs:enumeration value="post put" />
    <xs:enumeration value="post put delete" />
    <xs:enumeration value="get" />
    <xs:enumeration value="get delete" />
    <xs:enumeration value="get put" />
    <xs:enumeration value="get put delete" />
    <xs:enumeration value="get post" />
    <xs:enumeration value="get post delete" />
    <xs:enumeration value="get post put" />
    <xs:enumeration value="get post put delete" />
</xs:restriction>

Can anyone see a reason that this would not be a good idea?
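
One way to weigh this idea is to generate the list instead of typing it. The sketch below is only an illustration (the names METHODS and enumeration_values are made up for it, not part of any schema tooling): it builds every non-empty subset of the methods in one fixed order and prints the matching xs:enumeration lines.

```python
from itertools import combinations

METHODS = ["get", "post", "put", "delete"]  # hypothetical fixed ordering


def enumeration_values(methods):
    """Return every non-empty subset of methods, joined in list order."""
    values = []
    for size in range(1, len(methods) + 1):
        # combinations() keeps the input order, so "post get" is never produced
        for subset in combinations(methods, size):
            values.append(" ".join(subset))
    return values


VALUES = enumeration_values(METHODS)

for value in VALUES:
    print(f'<xs:enumeration value="{value}" />')
print(f"<!-- {len(VALUES)} values for {len(METHODS)} methods -->")
```

Two trade-offs show up here. The list has 2^n - 1 entries for n methods, so it roughly doubles with each method added, and only one ordering of each combination is accepted (post get is not in the list).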


Solution 3

After periodically screwing around with this, I came up with this hulk of a pattern; first in PCRE pretty-print:

^
(
  (get     (\s post)?    (\s put)?     (\s delete)?  (\s head)?    (\s options)?)
| (post    (\s put)?     (\s delete)?  (\s head)?    (\s options)?)
| (put     (\s delete)?  (\s head)?    (\s options)?)
| (delete  (\s head)?    (\s options)?)
| (head    (\s options)?)
| (options)
)
$

And XML compatible:

((get(\spost)?(\sput)?(\sdelete)?(\shead)?(\soptions)?)|(post(\sput)?(\sdelete)?(\shead)?(\soptions)?)|(put(\sdelete)?(\shead)?(\soptions)?)|(delete(\shead)?(\soptions)?)|(head(\soptions)?)|(options))

This matches any non-empty combination of get, post, put, delete, head and options with no repeats, and further requires that the tokens appear in that order (which is kinda nice too).

Anyways, in summary:

"get post put delete head options" // match

"get put delete options"           // match

"get get post put"                 // fail; double get

"get foo post put"                 // fail; invalid token, foo

"post delete"                      // match

"options get"                      // fail; ordering

This pattern doesn't scale especially well, as each new "token" needs to be included in every group, but given that the problem domain is HTTP methods, change is unlikely and I figure it should work just fine.
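
The match/fail summary above can be checked mechanically. This is a rough sketch using Python's re module rather than an actual schema validator; re.fullmatch stands in for the rule that an XML Schema pattern must match the whole value, and the names PATTERN, is_valid and CASES are only for illustration.

```python
import re

# The "XML compatible" pattern, split across lines for readability only.
PATTERN = (
    r"((get(\spost)?(\sput)?(\sdelete)?(\shead)?(\soptions)?)"
    r"|(post(\sput)?(\sdelete)?(\shead)?(\soptions)?)"
    r"|(put(\sdelete)?(\shead)?(\soptions)?)"
    r"|(delete(\shead)?(\soptions)?)"
    r"|(head(\soptions)?)"
    r"|(options))"
)


def is_valid(value):
    """True if the whole value matches, as a schema pattern facet requires."""
    return re.fullmatch(PATTERN, value) is not None


# Expected results taken from the summary in the answer.
CASES = {
    "get post put delete head options": True,
    "get put delete options": True,
    "get get post put": False,   # double get
    "get foo post put": False,   # invalid token, foo
    "post delete": True,
    "options get": False,        # ordering
}

for value, expected in CASES.items():
    status = "ok" if is_valid(value) == expected else "MISMATCH"
    print(f"{value!r}: {status}")
```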


Also, here's a quick script (PHP) to generate the pattern:

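$tokens = ['get', 'post', 'put', 'delete', 'head', 'options'];

// For each token, emit "(token(\snext)?(\snext)?...)" using only the tokens after it.
echo implode('|', array_map(function ($token) use (&$tokens) {
    // array_shift() removes the current token, leaving only the later ones in $tokens.
    return sprintf('(%s%s)', array_shift($tokens),
        implode('', array_map(function ($token) {
            return sprintf('(\s%s)?', $token);
        }, $tokens)));
}, $tokens));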

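It omits the outermost () because they aren't necessary: an XML Schema pattern has to match the entire value, so the top-level alternation needs no extra grouping.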
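
For anyone who doesn't have PHP at hand, here is a rough Python equivalent of the generator. It is a sketch, not a port that has been tested against a schema validator; the function name ordered_subset_pattern is made up, and it assumes the tokens are plain words that need no regex escaping.

```python
def ordered_subset_pattern(tokens):
    """Build one alternative per token: the token, then each later token as optional."""
    alternatives = []
    for index, first in enumerate(tokens):
        # Every token after the current one may appear once, in list order.
        optional_tail = "".join(rf"(\s{later})?" for later in tokens[index + 1:])
        alternatives.append(f"({first}{optional_tail})")
    return "|".join(alternatives)


TOKENS = ["get", "post", "put", "delete", "head", "options"]
PATTERN = ordered_subset_pattern(TOKENS)
print(PATTERN)
```

Each token adds one alternative plus one optional group to every alternative before it, so the pattern grows quadratically with the number of tokens.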

OTHER TIPS

The basic problem can be addressed with enumerations as well:

<xs:attribute name="methods" use="required">
    <xs:simpleType>
        <xs:restriction>
            <xs:simpleType>
                <xs:list>
                    <xs:simpleType>
                        <xs:restriction base="xs:token">
                            <xs:enumeration value="get"/>
                            <xs:enumeration value="post"/>
                            <xs:enumeration value="put"/>
                            <xs:enumeration value="delete"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:list>
            </xs:simpleType>
            <xs:minLength value="1"/>
        </xs:restriction>
    </xs:simpleType>
</xs:attribute>

This unfortunately has the same limitation as the <xs:pattern> solution and cannot validate that each token in the list is unique. It does however address the whitespace issue (getpost would be rejected).
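
To make that behaviour concrete, here is a small Python model of what the list type checks. It is only an approximation of a real validator, and the names ALLOWED and list_type_accepts are invented for the example: the value is split on whitespace, each item must be one of the enumerated tokens, and minLength counts items rather than characters.

```python
ALLOWED = {"get", "post", "put", "delete"}


def list_type_accepts(value):
    """Approximate the xs:list check: split on whitespace, validate each item."""
    items = value.split()  # list items are separated by whitespace
    has_min_length = len(items) >= 1  # minLength on a list counts items
    return has_min_length and all(item in ALLOWED for item in items)


for sample in ["get", "get post", "getpost", "get;post;put", "get get", ""]:
    print(f"{sample!r}: {list_type_accepts(sample)}")
```

Nothing in this check compares items with each other, which is why get get still passes.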

You can use regular expressions as a restriction on a simpleType: http://www.w3.org/TR/xmlschema-2/#dt-pattern

I'm not a regex expert but it would be something like this:

<xs:attribute name="methods" use="required">
   <xs:simpleType>
      <xs:restriction base="xs:string">
         <xs:pattern value='((get|post|put|delete)\s*){1,4}'/>
      </xs:restriction>
   </xs:simpleType>
</xs:attribute>

You could handle the whitespace like this:

(get|post|put|delete)(\sget|\spost|\sput|\sdelete){0,3}

It will not match getpost.
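
A quick way to see what this pattern does and doesn't catch is to try it outside the schema. The sketch below uses Python's re.fullmatch as a stand-in for the whole-value matching that a schema pattern performs; the names PATTERN and matches are only for illustration.

```python
import re

PATTERN = r"(get|post|put|delete)(\sget|\spost|\sput|\sdelete){0,3}"


def matches(value):
    """True if the entire value matches the pattern."""
    return re.fullmatch(PATTERN, value) is not None


for sample in ["get", "get post", "getpost", "get get", "get post put delete get"]:
    print(f"{sample!r}: {matches(sample)}")
```

The required whitespace before every token after the first rules out getpost, and the {0,3} bound caps the list at four tokens, but repeated tokens such as get get are still accepted.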

I needed something similar to what you wanted, but I didn't want any order to be enforced, and I didn't want the pattern to grow exponentially as more possible values were added.

Using your enumeration as the example, the pattern I've come up with goes like this:

(?:get|post|put|delete|head|options)(?:\s(?:(?<!.*\bget\b.*)get|
(?<!.*\bpost\b.*)post|(?<!.*\bput\b.*)put|(?<!.*\bdelete\b.*)delete|
(?<!.*\bhead\b.*)head|(?<!.*\boptions\b.*)options))*

This part

(?:[values])

simply requires that at least one of the options is chosen. If an empty value is also allowed, wrap the entire expression like this: (?:[...])?

The remainder

(?:\s(?:[values-with-restraints]))*

allows for zero-or-more whitespace-plus-value combinations. The values are given in this format

(?<!.*\b[value]\b.*)[value]

which uses negative look-behind (?<![...]) to make sure that the value doesn't already appear earlier in the text. I'm using word boundary markers \b to make sure that options that are substrings of other options don't cause problems. For example, if you have options foo, bar and foobar, you don't want the option foobar to prevent the foo and bar options from being legal.

Just keep in mind that since this is going into XML, you'll have to replace the < character with &lt; when you put it into your schema.

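Also, a final warning: not all regular expression processors support lookbehind. The regular expression syntax defined by the XML Schema specification itself has no lookaround, \b, or non-capturing (?:...) groups, so this pattern only works in validators that hand patterns to a richer regex engine.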
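
As an example of that last warning, Python's built-in re module refuses to compile this kind of lookbehind because it is not fixed-width. The sketch below shows the failure and, next to it, a plain-code model of what the pattern is meant to accept (unique tokens, any order, single-space separated). The names compiles and unique_any_order are invented for the example.

```python
import re

ALLOWED = ("get", "post", "put", "delete", "head", "options")

# One of the value-with-restraint fragments from the pattern above.
LOOKBEHIND_FRAGMENT = r"(?<!.*\bget\b.*)get"


def compiles(pattern):
    """Report whether this regex engine accepts the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def unique_any_order(value):
    """Model of the intended rule: known tokens only, no repeats, any order."""
    items = value.split(" ")
    return all(item in ALLOWED for item in items) and len(set(items)) == len(items)


print("lookbehind fragment compiles:", compiles(LOOKBEHIND_FRAGMENT))
for sample in ["options get", "get get", "get foobar"]:
    print(f"{sample!r}: {unique_any_order(sample)}")
```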

Licensed under: CC-BY-SA with attribution