Question

I'm using PHP's Filter Functions (FILTER_VALIDATE_REGEXP specifically) to validate the input data. I have a list of options and the $input variable can specify a number of options from the list.

The options are (case-insensitive):

  1. all
  2. rewards
  3. join
  4. promotions
  5. stream
  6. checkin
  7. verified_checkin

The $input variable can have almost any combination of the values. The possible success cases are:

  • all (value can either be all or a comma separated list of other values but not both)
  • rewards,stream,join (a comma separated list of values excluding all)
  • join (a single value)

The Regular Expression I've been able to come up with is:

/^(?:all|(?:checkin|verified_checkin|rewards|join|promotions|stream)?(?:,(?:checkin|verified_checkin|rewards|join|promotion|stream))*)$/

So far, it works for the following example scenarios:

  • all (passes)
  • rewards,join,promotion,checkin,verified_checkin (passes)
  • join (passes)

However, it lets a value with a leading comma and duplicates through:

  • ,promotion,checkin,verified_checkin (starts with a comma but also passes when it shouldn't)

Also, checking for duplicates would be a bonus, but not necessarily required.

  • rewards,join,promotion,checkin,join,verified_checkin (duplicate value but still passes but not as critical as a leading comma)

I've been at it for a couple of days now and having tried various implementations, this is the closest I've been able to get.

Any ideas on how to handle the leading comma false positive?

UPDATE: Edited the question to better explain that duplicate filtering isn't really a requirement, just a bonus.

Solution

Sometimes regular expressions just make things more complicated than they should be. Regular expressions are really good at matching patterns, but when you introduce external rules that depend on the number of matched patterns, things get complicated fast.

In this case I would just split the list on commas and check the resulting strings against the rules you just described.

function is_valid_selection($input_string)   // wrapper name is illustrative; $input_string is the string to match
{
    $valid_choices = array('checkin','join','promotions','rewards','stream','verified_checkin');

    // the options are case-insensitive, so normalise before comparing
    $tokens = explode(',', strtolower($input_string));

    if(in_array('all', $tokens, true))
        return count($tokens) == 1;      // 'all' is only valid on its own

    sort($tokens);                       // sort the tokens to make it easy to find duplicates

    if(!in_array($tokens[0], $valid_choices, true))
        return FALSE;                    // fail (invalid first choice)

    for($i = 1; $i < count($tokens); $i++)
    {
        if($tokens[$i] === $tokens[$i-1])
           return FALSE;                 // fail (duplicates)

        if(!in_array($tokens[$i], $valid_choices, true))
           return FALSE;                 // fail (choice not valid)
    }

    return TRUE;                         // every rule passed
}
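
The same split-and-check rules can also be written with PHP's array functions instead of an explicit loop. This is only a minimal sketch: the function name validate_options and the $reject_duplicates flag are made up for illustration, and it assumes the options should be compared case-insensitively, as the question states.

```php
<?php
// Hypothetical helper; the name and the $reject_duplicates flag are illustrative.
function validate_options(string $input, bool $reject_duplicates = true): bool
{
    $valid_choices = ['checkin', 'join', 'promotions', 'rewards', 'stream', 'verified_checkin'];

    // The options are case-insensitive, so normalise before comparing.
    $tokens = explode(',', strtolower($input));

    // "all" is only valid when it is the sole token.
    if (in_array('all', $tokens, true)) {
        return count($tokens) === 1;
    }

    // Any token outside the list (including the empty string produced by a
    // leading, trailing, or doubled comma) makes the input invalid.
    if (array_diff($tokens, $valid_choices) !== []) {
        return false;
    }

    // array_unique() drops repeated values, so a shorter result means duplicates.
    if ($reject_duplicates && count(array_unique($tokens)) !== count($tokens)) {
        return false;
    }

    return true;
}

var_dump(validate_options('all'));                  // bool(true)
var_dump(validate_options('rewards,stream,join'));  // bool(true)
var_dump(validate_options(',promotions,checkin'));  // bool(false)
var_dump(validate_options('join,rewards,join'));    // bool(false)
```

Passing false as the second argument would accept duplicates while still rejecting stray commas and the all + other options combination.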

EDIT

Since you edited your question to say that duplicates are acceptable but that you definitely want a regex-based solution, this one should do:

^(all|((checkin|verified_checkin|rewards|join|promotions|stream)(,(checkin|verified_checkin|rewards|join|promotions|stream))*))$

It will not fail on duplicates, but it will take care of leading or trailing commas and of the all + other choices combination.
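
Since the question uses FILTER_VALIDATE_REGEXP, here is a rough sketch of how such a pattern could be plugged into filter_var(). The helper name passes_filter is hypothetical, the option names are spelled as in the question's list, and the i modifier is an addition made here because the question says the options are case-insensitive.

```php
<?php
// The pattern wrapped in delimiters; the "i" modifier is an assumption added
// for case-insensitive matching.
$pattern = '/^(all|((checkin|verified_checkin|rewards|join|promotions|stream)'
         . '(,(checkin|verified_checkin|rewards|join|promotions|stream))*))$/i';

// Hypothetical helper name, used only for this example.
function passes_filter(string $input, string $pattern): bool
{
    $result = filter_var($input, FILTER_VALIDATE_REGEXP, [
        'options' => ['regexp' => $pattern],
    ]);

    // filter_var() returns the input string on a match and false otherwise.
    return $result !== false;
}

var_dump(passes_filter('all', $pattern));                  // bool(true)
var_dump(passes_filter('rewards,join', $pattern));         // bool(true)
var_dump(passes_filter(',promotions,checkin', $pattern));  // bool(false)
var_dump(passes_filter('all,join', $pattern));             // bool(false)
```

In PCRE, $ also matches just before a trailing newline, so adding the D modifier (or using \z) may be worth considering if that matters for your input.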

Filtering out duplicates with a regex would be pretty difficult, but maybe not impossible (if you use a look-ahead with a back-reference to a capture group).
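
As a rough illustration of that look-ahead idea, the sketch below puts a single negative look-ahead in front of the list pattern. The variable names are made up for the example, and the look-ahead assumes tokens are separated by commas only (no surrounding whitespace).

```php
<?php
$choices = 'checkin|verified_checkin|rewards|join|promotions|stream';
$list    = '(?:' . $choices . ')';

// Negative look-ahead: fail if some whole token, captured as ([^,]+), is
// followed later in the string by the same whole token (back-reference \1).
$no_duplicates = '(?!.*(?:^|,)([^,]+)(?:,.*)?,\1(?:,|$))';

// All other groups are non-capturing, so the token in the look-ahead is group 1.
$pattern = '/^' . $no_duplicates . '(?:all|' . $list . '(?:,' . $list . ')*)$/i';

var_dump(preg_match($pattern, 'rewards,join,promotions'));   // int(1)
var_dump(preg_match($pattern, 'rewards,join,rewards'));      // int(0)
var_dump(preg_match($pattern, 'checkin,verified_checkin'));  // int(1)
```

Because the look-ahead only accepts whole comma-delimited tokens, checkin followed by verified_checkin is not mistaken for a duplicate.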

SECOND EDIT

Although you mentioned that detecting duplicate entries is not critical, I figured I'd try my hand at crafting a pattern that would also check for them.

As you can see below, it's not very elegant, nor does it scale easily, but with the finite list of options you have it gets the job done using negative look-aheads.

^(all|(checkin|verified_checkin|rewards|join|promotions|stream)(,(?!\2)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(?!\6)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(?!\6)(?!\8)(checkin|verified_checkin|rewards|join|promotions|stream))?(,(?!\2)(?!\4)(?!\6)(?!\8)(?!\10)(checkin|verified_checkin|rewards|join|promotions|stream))?)$

Since the final regex is so long, I'm going to break it up into parts for the sake of making it easier to follow the general idea:

^(all|
  (checkin|verified_checkin|rewards|join|promotions|stream)
  (,(?!\2)(checkin|verified_checkin|rewards|join|promotions|stream))?
  (,(?!\2)(?!\4)(checkin|verified_checkin|rewards|join|promotions|stream))?
  (,(?!\2)(?!\4)(?!\6)(checkin|verified_checkin|rewards|join|promotions|stream))?
  (,(?!\2)(?!\4)(?!\6)(?!\8)(checkin|verified_checkin|rewards|join|promotions|stream))?
  (,(?!\2)(?!\4)(?!\6)(?!\8)(?!\10)(checkin|verified_checkin|rewards|join|promotions|stream))?
 )$

You can see that the mechanism used to form the pattern is iterative: each additional optional group adds one more negative look-ahead for the item captured before it. Such a pattern could be generated automatically if you wanted to provide a different list, but the resulting pattern would get rather large, rather quickly.
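
To make the iterative construction concrete, here is a hedged sketch of a generator for this kind of pattern. The function name build_unique_list_pattern is hypothetical, and the approach assumes that no option is a prefix of another option, since the look-aheads only compare the start of the next token.

```php
<?php
// Hypothetical generator; the function name is illustrative only.
function build_unique_list_pattern(array $choices): string
{
    $alternation = '(' . implode('|', array_map('preg_quote', $choices)) . ')';

    // The first item lands in capture group 2 (group 1 is the outer wrapper).
    $body = $alternation;
    $seen = [2];   // group numbers that hold previously matched items

    for ($i = 1; $i < count($choices); $i++) {
        // One negative look-ahead per item that may already have been matched.
        $guards = '';
        foreach ($seen as $group) {
            $guards .= '(?!\\' . $group . ')';
        }
        $body .= '(,' . $guards . $alternation . ')?';

        // The "(," wrapper takes the next group number, the item the one after.
        $seen[] = 2 * $i + 2;
    }

    return '/^(all|' . $body . ')$/';
}

$pattern = build_unique_list_pattern(
    ['checkin', 'verified_checkin', 'rewards', 'join', 'promotions', 'stream']
);

var_dump(preg_match($pattern, 'rewards,join,promotions'));  // int(1)
var_dump(preg_match($pattern, 'rewards,join,rewards'));     // int(0)
```

For n options the pattern repeats the alternation n times and contains n(n-1)/2 look-aheads, which is why it grows so quickly.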

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow