Question

The regex validation can receive strings like the following:

t/E/s/t
t/E/s/t/
t/E/s/t/////...
t/E/s/t/////?page=10
t/E/s/t/////?page=10/
t/E/s/t/////?page=10////...

I need to split the string into two parts:

1. t/E/s/t
2. ?page=10////...
  • [the dots mean that the "/" symbol can be repeated many times]. If the "?..." part does not exist, the second result string should be empty.

I wrote the regex ^(.*[^\/])\/+(\?.*)$ but it does not match when the string lacks the "?page=10///..." part. To validate strings without the "?page..." part, I need a second regex: ^(.*[^\/])\/+$

I want to have only one validation rule.

Any ideas how to combine them?


Solution

It would be nice if something like /(.*[^\/])\/*(\?.*)?/ worked. But the problem is that the regex engine will find the best possible match for (.*[^\/])\/*, even if this means matching (\?.*)? against the empty string.*
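To make the failure concrete, here is a minimal PHP sketch of that optional-group attempt. It assumes PHP's preg_match (PCRE) and uses ~ as the delimiter so the slashes need no escaping; the variable names are only illustrative.

```php
<?php
// The optional-group attempt from the paragraph above, written with ~ delimiters.
$naive = '~(.*[^/])/*(\?.*)?~';

preg_match($naive, 't/E/s/t/////?page=10', $m);

// Group 1 greedily takes the whole input, because the input ends in a non-slash ("0").
$path = $m[1];
// Group 2 never participates, and preg_match omits trailing unmatched groups from $m.
$query = $m[2] ?? '';

var_dump($path, $query);
```

With this input the whole string should end up in the first group and the second stays empty, which is the opposite of the split that was asked for.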

You could do the following:

/(.*[^\/])\/*(\?.*)|(.*[^\/])/

This is slightly unsatisfactory in that you get 3 capture groups even though you only wanted 2. So you could do this instead, if your regex engine supports the branch-reset construct (?|...), as PCRE and Perl 5.10+ do:

/(?|(.*[^\/])\/*(\?.*)|(.*[^\/]))/
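As a rough sketch of how the branch-reset version might be used from PHP (whose preg_* functions are built on PCRE), the snippet below wraps it in a helper. The function name splitPathAndQuery and the sample list are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper: returns [path, query], or null when nothing matches.
function splitPathAndQuery(string $input): ?array
{
    // Branch reset (?|...): both alternatives number their groups starting from 1.
    $regex = '~(?|(.*[^/])/*(\?.*)|(.*[^/]))~';
    if (!preg_match($regex, $input, $m)) {
        return null;
    }
    // When the second alternative matches, group 2 is absent, so default it to ''.
    return [$m[1], $m[2] ?? ''];
}

$samples = ['t/E/s/t', 't/E/s/t/', 't/E/s/t/////?page=10', 't/E/s/t/////?page=10/'];
foreach ($samples as $s) {
    [$path, $query] = splitPathAndQuery($s);
    printf("%-24s => '%s' | '%s'\n", $s, $path, $query);
}
```

Whichever alternative matches, the path is read from group 1 and the query (if any) from group 2, so the calling code does not need to know which branch was taken.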

*More generally, suppose the regex engine is faced with a regex /AB/. The match it returns will contain the best possible match for /A/ (by which I mean the best match that can actually be extended to a match for /AB/). To put it another way, it doesn't backtrack into A until it's finished searching for matches for B.
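A small illustration of this footnote, assuming PCRE via PHP's preg_match: take A = (\d+) and compare an optional B with a mandatory one. The patterns and variable names here are made up for the example.

```php
<?php
// A = (\d+), B = (\d) made optional: A keeps every digit and B matches nothing.
preg_match('~^(\d+)(\d)?$~', '123', $optional);

// Same A, but B is now mandatory: A has to give back one digit so that AB can match.
preg_match('~^(\d+)(\d)$~', '123', $required);

var_dump($optional[1]);                // what A captured when B was optional
var_dump($required[1], $required[2]);  // what A and B captured when B was mandatory
```

In the first call A never has to give anything back, because an empty B already completes the match; only in the second call is the engine forced to backtrack into A.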

OTHER TIPS

As a quick side note, I used ~ instead of / for delimiters so your / don't need to be escaped. Also, I used a character class for the question mark ([?]) instead of having to escape it (\?)...this is just personal preference for readability.

First we capture the literal string t/E/s/t. Then we match zero or more / characters (if there must be at least one / between t/E/s/t and ?, change the * to +). Finally we capture the question mark followed by the rest of the line with ([?].*). This group is made optional by the trailing ?, so a string without ?page=10 still matches, with an empty second capture.

~(t/E/s/t)/*([?].*)?~

Regex101
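A minimal sketch of how this pattern could be exercised in PHP. The inputs and the $results array are illustrative only, and the ?? '' default is there because preg_match leaves a trailing unmatched group out of the match array.

```php
<?php
$regex = '~(t/E/s/t)/*([?].*)?~';

$results = [];
foreach (['t/E/s/t', 't/E/s/t/////', 't/E/s/t/////?page=10////'] as $input) {
    if (preg_match($regex, $input, $m)) {
        // An unmatched trailing group is omitted from $m, so default it to ''.
        $results[$input] = [$m[1], $m[2] ?? ''];
        echo $m[1], ' | ', $results[$input][1], "\n";
    }
}
```

Because the prefix is hard-coded, this only works when the leading part is known in advance.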

Is this what you are looking for?

<?php
$strings = array(
    "t/E/s/t",
    "t/E/s/t/",
    "t/E/s/t/////...",
    "t/E/s/t/////?page=10",
    "t/E/s/t/////?page=10/",
    "t/E/s/t/////?page=10////...",
);
// Split on a run of slashes, but only where it directly follows "t/E/s/t".
$regex = '~(?<=t/E/s/t)/+~';
foreach ($strings as $str) {
    print_r(preg_split($regex, $str));
    echo "<br />";
}

Output:

Array ( [0] => t/E/s/t )
Array ( [0] => t/E/s/t [1] => )
Array ( [0] => t/E/s/t [1] => ... )
Array ( [0] => t/E/s/t [1] => ?page=10 )
Array ( [0] => t/E/s/t [1] => ?page=10/ )
Array ( [0] => t/E/s/t [1] => ?page=10////... ) 
Licensed under: CC-BY-SA with attribution